Web Analytics - An Introduction

There are two main technological approaches to collecting web analytics data:

  • Logfile analysis reads the logfiles in which the web server records all its transactions.
  • Page tagging uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser.

Each approach is briefly outlined below.

Logfile Analysis

(e.g. Webalizer)

Web servers record all their transactions in a logfile. Analysis software reads this data to provide information about the site's traffic. Page views and visits are commonly displayed metrics, but are now considered rather unsophisticated measurements.

  • Page View: 
    A request made to the web server for a page, as opposed to a graphic.
  • Visit (session):
    A sequence of requests from a uniquely identified client that expires after a certain amount of inactivity, usually 30 minutes.
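
As a rough illustration of the two definitions above, the following Python sketch counts page views and visits from a handful of already-parsed log records. The record layout, the client identifiers, and the list of graphic file extensions are assumptions made for the example; a real log analyzer would parse the server's own log format first.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (client id, timestamp, requested path).
REQUESTS = [
    ("client-a", datetime(2024, 1, 1, 10, 0), "/index.html"),
    ("client-a", datetime(2024, 1, 1, 10, 1), "/logo.png"),
    ("client-a", datetime(2024, 1, 1, 10, 5), "/products.html"),
    ("client-a", datetime(2024, 1, 1, 11, 0), "/index.html"),
    ("client-b", datetime(2024, 1, 1, 10, 2), "/index.html"),
]

# Assumption: requests ending in these extensions are graphics, not pages.
GRAPHIC_EXTENSIONS = (".png", ".gif", ".jpg", ".jpeg", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)


def is_page_view(path):
    """A request counts as a page view unless it is for a graphic."""
    return not path.lower().endswith(GRAPHIC_EXTENSIONS)


def count_page_views(requests):
    """Count requests for pages, skipping graphics."""
    return sum(1 for _, _, path in requests if is_page_view(path))


def count_visits(requests):
    """Count visits: a client starts a new visit after 30 minutes of inactivity."""
    last_seen = {}
    visits = 0
    for client, timestamp, _ in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(client)
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[client] = timestamp
    return visits


print(count_page_views(REQUESTS), count_visits(REQUESTS))
```

For this sample data the sketch reports four page views (the request for the logo is skipped) and three visits (client-a returns after 55 minutes of inactivity, which starts a new visit).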

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.
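
One simple way to ignore requests from known spiders is to match the User-Agent header against a list of crawler names. The sketch below assumes a small hand-maintained list and a hypothetical record layout; real log analyzers typically ship much longer lists.

```python
# Assumption: a short, hand-maintained list of substrings found in crawler User-Agent headers.
KNOWN_SPIDER_MARKERS = ("googlebot", "bingbot", "slurp", "spider", "crawler")

# Hypothetical parsed log records.
SAMPLE_REQUESTS = [
    {"path": "/index.html",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"path": "/index.html",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
]


def is_spider(user_agent):
    """Return True if the User-Agent contains a known spider marker."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in KNOWN_SPIDER_MARKERS)


def split_traffic(requests):
    """Separate requests into (human, spider) lists without discarding either."""
    human, spiders = [], []
    for request in requests:
        if is_spider(request["user_agent"]):
            spiders.append(request)
        else:
            human.append(request)
    return human, spiders


human, spiders = split_traffic(SAMPLE_REQUESTS)
print(len(human), "human request(s),", len(spiders), "spider request(s)")
```

Keeping the spider requests in a separate list, rather than dropping them, leaves them available for search engine optimization work.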

The extensive use of web caches also presents a problem for logfile analysis. If a person revisits a page, the second request will often be retrieved from the browser's cache, and so no request will be received or recorded by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.
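
How caching is defeated depends on the web server in use. As a sketch only, the Python snippet below shows the kind of HTTP response headers such a configuration typically sends; the helper name is hypothetical and the exact header values are one common choice, not the only one.

```python
# Response headers that ask browsers and proxies not to reuse a stored copy.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",  # understood by older HTTP/1.0 caches
    "Expires": "0",
}


def with_caching_disabled(headers):
    """Return a copy of the response headers with the no-cache headers applied."""
    merged = dict(headers)
    merged.update(NO_CACHE_HEADERS)
    return merged


print(with_caching_disabled({"Content-Type": "text/html"}))
```

With headers like these every revisit reaches the server and is logged, which is also why performance for the visitor can suffer.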

Advantages of logfile analysis

  • The data is readily available. (Collecting data via page tagging requires changes to the website.)
  • The web server reliably records every transaction.
  • The data is on our servers in a standard format. This makes it easy to use several different programs, or analyze historical data with a new program.
  • Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, it is important data for performing search engine optimization.
  • Logfiles contain information on failed requests; page tagging only records an event if the page is successfully viewed.

Disadvantages of logfile analysis

  • Logfiles contain information on visits from search engine spiders. Logfile analysis must be configured to filter out this traffic for a more reliable estimate of human traffic.
  • With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL.
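
To illustrate the last point: extra information can be carried in the query string of the requested URL, where it ends up in the logfile and can be read back during analysis. The parameter names below (campaign, source) are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def extra_fields(logged_url):
    """Extract the query-string parameters recorded with a logged request URL."""
    query = urlsplit(logged_url).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical URL as it might appear in a logfile.
print(extra_fields("/products.html?campaign=spring&source=newsletter"))
```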

Page Tagging

(e.g. Google Analytics)

Concerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or 'Web bugs'.

In the mid 1990s, Web counters were commonly seen - these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated.
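
The tag itself runs as JavaScript in the visitor's browser. The sketch below uses Python only to illustrate what the image request carries and how a collecting server might decode it. The collector address and the parameter names (page, ref, screen) are hypothetical and do not describe any particular vendor's format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of the invisible image on the analytics server.
COLLECTOR = "https://stats.example.com/pixel.gif"


def build_beacon_url(page, referrer, screen):
    """Build the image URL a page tag would request, with page data in the query string."""
    params = {"page": page, "ref": referrer, "screen": screen}
    return COLLECTOR + "?" + urlencode(params)


def decode_beacon(url):
    """Recover the page data on the collecting side from the requested image URL."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_beacon_url("/products.html", "https://example.org/", "1280x720")
print(url)
print(decode_beacon(url))
```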

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits.

With the increasing popularity of Ajax-based solutions, an alternative to the invisible image is a call back to the server from the rendered page. In this case, when the page is rendered in the web browser, a piece of Ajax code calls back to the server and passes information about the client, which can then be aggregated by a web analytics company. This approach is limited by browser restrictions on which servers can be contacted with XMLHttpRequest objects.

Advantages of page tagging

  • The JavaScript is automatically run every time the page is loaded, so there are fewer worries about caching.
  • It is easier to add additional information to the JavaScript, which can then be collected by the remote server (screen size, order value, etc.).
  • Can report on events which do not involve a request to the web server, such as interactions within Flash movies.
  • The page tagging service manages the process of assigning cookies to visitors; with logfile analysis, the server has to be configured to do this.
  • Page tagging is available to companies who do not run their own web servers.

Disadvantages of page tagging

  • Page tagging relies on the visitor's browser, which may not always cooperate (for example, if JavaScript is disabled).
  • Page tagging only records an event if the page is successfully viewed; failed requests, which logfiles do contain, go unrecorded.
  • Page tagging solutions involve vendor lock-in (for example, having your web stats with Google).

The bottom line: web analytics packages are not accurate.

Web analytics packages installed on the same website and configured the same way produce different numbers, sometimes radically different ones. In some cases the package showing the highest numbers reported 150% more traffic than the package reporting the least. The web is a mess, and counting on it is far from deterministic. Users from AOL have IP addresses that change mid-session. Proxy servers strip referrer information. About 3% of users disable JavaScript, and 2-3% do not accept cookies. There are many issues of this type.

Each package handles these issues differently. For example, some packages do not collect session-related information from people without cookies, while others fall back on IP and User Agent tracking. Each package has different strengths and weaknesses, so choose one based on how well those strengths and weaknesses match your specific requirements. Don't buy the higher-priced package simply because your site is larger.
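
The fallback mentioned above can be sketched as follows: use the cookie when one is present, otherwise derive an identifier from the IP address and User Agent. The function name and key format are assumptions for illustration, and the fallback is only approximate, since several people can share an IP address and browser type, and one person's IP address can change mid-session.

```python
import hashlib


def visitor_key(cookie_id, ip_address, user_agent):
    """Identify a visitor by cookie, falling back to IP address plus User Agent."""
    if cookie_id:
        return "cookie:" + cookie_id
    # No cookie available: hash the IP address and User Agent into a stable key.
    raw = (ip_address + "|" + user_agent).encode("utf-8")
    return "fallback:" + hashlib.sha256(raw).hexdigest()[:16]


# Example values are made up (203.0.113.0/24 is a documentation address range).
print(visitor_key("abc123", "203.0.113.7", "Mozilla/5.0"))
print(visitor_key(None, "203.0.113.7", "Mozilla/5.0"))
```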

The bottom line: We recommend using a complementary solution - pulling some data from the logfiles and other data from page tagging.

Sources:
http://en.wikipedia.org/wiki/Web_analytics