There are two main technological approaches to collecting web analytics data: logfile analysis and page tagging. Each approach is briefly outlined below.
In logfile analysis (e.g. Webalizer), the web server records all its transactions in a logfile, and analysis software reads this data to report on the site's traffic. Page views and visits are commonly displayed metrics, but they are now considered rather unsophisticated measurements.
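As a minimal sketch of what such analysis software does, the Python below counts page views per path from lines in the Common Log Format written by Apache and other servers. Treating only successful `GET` requests for non-asset paths as page views is an assumption for illustration, and the `ASSET_SUFFIXES` list and `count_page_views` name are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

# Assumption: requests for these file types are page assets, not page views.
ASSET_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg", ".ico")


def count_page_views(lines: Iterable[str]) -> Counter:
    """Count successful GET requests per page path."""
    views: Counter = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if not match:
            continue  # skip malformed or truncated lines
        method, path, status = match.group(3), match.group(4), int(match.group(5))
        path = path.split("?", 1)[0]  # ignore query strings
        if method == "GET" and status == 200 and not path.endswith(ASSET_SUFFIXES):
            views[path] += 1
    return views
```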
The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses at large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits with cookies and by ignoring requests from known spiders.
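Both countermeasures can be sketched together, assuming each log record already carries the visitor's cookie value (standard log formats do not include one, so this presumes a customized log format). The spider substrings, the 30-minute inactivity timeout, and all names below are illustrative assumptions, not fixed rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical user-agent substrings; real analyzers maintain much longer lists.
KNOWN_SPIDERS = ("googlebot", "bingbot", "slurp", "spider", "crawler")
# Assumption: 30 minutes without a request ends a visit.
VISIT_TIMEOUT = timedelta(minutes=30)


@dataclass
class Hit:
    cookie_id: str  # value of the visitor cookie logged with the request
    timestamp: datetime
    user_agent: str


def is_spider(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_SPIDERS)


def count_visits(hits: List[Hit]) -> int:
    """Count visits per cookie, ignoring requests from known spiders."""
    last_seen: Dict[str, datetime] = {}
    visits = 0
    for hit in sorted(hits, key=lambda h: h.timestamp):
        if is_spider(hit.user_agent):
            continue
        previous = last_seen.get(hit.cookie_id)
        # A first request, or one after a long gap, starts a new visit.
        if previous is None or hit.timestamp - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[hit.cookie_id] = hit.timestamp
    return visits
```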
The extensive use of web caches also presents a problem for logfile analysis. If a person revisits a page, the second request will often be served from the browser's cache, so no request is received or recorded by the web server, and the person's path through the site is lost. Caching can be defeated by configuring the web server to send headers that forbid or restrict caching, but this can degrade performance for the visitor to the website.
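One way to defeat caching without forbidding it entirely is sketched below as a hypothetical WSGI app. `Cache-Control: no-cache` lets the browser keep its copy but requires it to revalidate with the server before reusing it, so each revisit still produces a logged (often `304 Not Modified`) request. The page content and the ETag scheme are assumptions for illustration.

```python
import hashlib

# Hypothetical page; the ETag is derived from its content.
PAGE = b"<html><body><h1>Products</h1></body></html>"
ETAG = '"' + hashlib.sha256(PAGE).hexdigest()[:16] + '"'


def app(environ, start_response):
    headers = [
        # Browsers may store the page but must revalidate before reusing it,
        # so every revisit reaches (and is logged by) the server.
        ("Cache-Control", "no-cache"),
        ("ETag", ETAG),
    ]
    # Simplification: only an exact single-ETag match is recognized.
    if environ.get("HTTP_IF_NONE_MATCH") == ETAG:
        # The cached copy is still valid: a cheap 304 with no body.
        start_response("304 Not Modified", headers)
        return [b""]
    headers += [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

It could be served locally with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`. Revalidation still costs a round trip per view, which is the performance penalty mentioned above; `no-store` would be stricter still, forcing a full download on every view.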
Concerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method: page tagging, or 'Web bugs' (e.g. Google Analytics).
In the mid-1990s, Web counters were commonly seen: these were images included in a web page that showed the number of times the image had been requested, which served as an estimate of the number of visits to that page. In the late 1990s this concept evolved to use a small invisible image instead of a visible one and, by using JavaScript, to pass certain information about the page and the visitor along with the image request. A web analytics company can then process this information remotely and generate extensive statistics.
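The collector side of a web bug can be sketched in framework-free Python. `build_beacon_url` stands in for what the page's JavaScript would compute before requesting the invisible image, and `handle_pixel` records the query parameters and returns a 1x1 transparent GIF. The parameter names (`page`, `ref`, `screen`), the collector URL, and both function names are hypothetical.

```python
import base64
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode

# A widely used 1x1 transparent GIF.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Stand-in for the analytics company's storage.
hits: List[Dict[str, str]] = []


def build_beacon_url(collector: str, page: str, referrer: str, screen: str) -> str:
    """Mimics what the page's JavaScript would put in the image's src."""
    return collector + "?" + urlencode({"page": page, "ref": referrer, "screen": screen})


def handle_pixel(query_string: str) -> Tuple[str, List[Tuple[str, str]], bytes]:
    """Record the data carried in the image request and return the pixel."""
    params = parse_qs(query_string)
    hits.append({name: values[0] for name, values in params.items()})
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
        # Stop browsers from reusing a cached pixel, which would skip the hit.
        ("Cache-Control", "no-store"),
    ]
    return "200 OK", headers, PIXEL_GIF
```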
The web analytics service also assigns the user a cookie, which can uniquely identify them during their visit and on subsequent visits.
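A minimal sketch of that cookie assignment, using Python's standard `http.cookies` module: if the request already carries the visitor cookie, its value is reused; otherwise a random identifier is issued. The cookie name `vid`, the two-year lifetime, and the function name are assumptions.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "vid"  # hypothetical cookie name
TWO_YEARS = 2 * 365 * 24 * 3600  # assumed lifetime, in seconds


def assign_visitor_id(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, Set-Cookie value, or None if the browser already has one)."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None
    visitor_id = uuid.uuid4().hex  # random, not derived from the user
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = TWO_YEARS
    return visitor_id, out[COOKIE_NAME].OutputString()
```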
With the increasing popularity of Ajax-based solutions, an alternative to the invisible image is to have the rendered page call back to the server. In this case, when the page is rendered in the web browser, a piece of Ajax code calls back to the server and passes information about the client, which a web analytics company can then aggregate. This approach is limited by browser restrictions (the same-origin policy) on which servers can be contacted with XMLHttpRequest objects.
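Browsers later added Cross-Origin Resource Sharing (CORS), which lets a collector on another domain opt in to receiving such calls. The handler below is a hypothetical, framework-free sketch of a collector endpoint that answers the browser's preflight `OPTIONS` request and accepts JSON events by `POST`; the allowlisted origin, the event shape, the decision to reject other origins, and the function name are all assumptions.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical site allowed to send events to this collector.
ALLOWED_ORIGINS = {"https://www.example.com"}
events: List[Dict[str, Any]] = []

Headers = List[Tuple[str, str]]


def handle_event(method: str, origin: str, body: bytes) -> Tuple[str, Headers]:
    headers: Headers = []
    if origin in ALLOWED_ORIGINS:
        # Without this header the browser fails the preflight and blocks
        # the page's script from reading the response.
        headers += [("Access-Control-Allow-Origin", origin), ("Vary", "Origin")]
    if method == "OPTIONS":
        # Preflight sent by the browser before a POST with a JSON content type.
        headers += [
            ("Access-Control-Allow-Methods", "POST"),
            ("Access-Control-Allow-Headers", "Content-Type"),
        ]
        return "204 No Content", headers
    if method != "POST":
        return "405 Method Not Allowed", headers + [("Allow", "POST, OPTIONS")]
    if origin not in ALLOWED_ORIGINS:
        return "403 Forbidden", headers  # assumption: ignore unknown sites
    try:
        events.append(json.loads(body))
    except ValueError:  # includes json.JSONDecodeError
        return "400 Bad Request", headers
    return "204 No Content", headers
```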
Page tagging solutions involve vendor lock-in (for example, having your web statistics held by Google).
Web analytics packages installed on the same website and configured the same way produce different numbers, sometimes radically different ones. In some cases the package reporting the most traffic showed 150% more than the package reporting the least. The web is a mess, and counting on the web is far from deterministic: users from AOL have IP addresses that change mid-session, proxy servers strip referrer information, about 3% of users disable JavaScript, and 2 to 3% of users do not accept cookies. There are many issues of this type.
Each package handles these issues differently. For example, some packages do not collect session-related information from visitors without cookies, while others fall back on tracking by IP address and User-Agent. Each package has different strengths and weaknesses, so the best package for your company is the one whose strengths and weaknesses best match your specific requirements. Don't buy the higher-priced package simply because your site is larger.
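The effect of those two policies on a visitor count can be seen in a small sketch. With `fallback=False`, hits without a cookie are not attributed to any visitor; with `fallback=True`, they are keyed by a hash of IP address and User-Agent. The function names and the `(cookie, ip, user_agent)` tuple layout are hypothetical.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def visitor_key(cookie_id: Optional[str], ip: str, user_agent: str, fallback: bool) -> Optional[str]:
    if cookie_id:
        return "cookie:" + cookie_id
    if not fallback:
        return None  # no cookie: this package records no session information
    # Approximate a visitor by hashing IP address and User-Agent together.
    digest = hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()[:16]
    return "ipua:" + digest


def count_unique_visitors(hits: Iterable[Tuple[Optional[str], str, str]], fallback: bool) -> int:
    keys = {visitor_key(cookie, ip, ua, fallback) for cookie, ip, ua in hits}
    keys.discard(None)
    return len(keys)
```

The fallback counts more visitors, but it also miscounts when several people share a proxy's IP address or, as with AOL users, when one visitor's IP address changes mid-session.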
The bottom line: we recommend a complementary solution, pulling some data from the logfiles and other data from page tagging.
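One simple way to combine the two sources is to compare per-page counts from each and flag where they diverge. The sketch below assumes both counts have already been computed (for example, page views from the logfile and image requests from the page tag); the function name and the interpretation strings are hypothetical heuristics, not definitive diagnoses.

```python
from collections import Counter
from typing import Dict, Tuple


def reconcile(log_views: Counter, tag_views: Counter) -> Dict[str, Tuple[int, int, str]]:
    """Pair logfile and page-tag counts per page and note which is higher."""
    report: Dict[str, Tuple[int, int, str]] = {}
    for page in sorted(set(log_views) | set(tag_views)):
        logged, tagged = log_views[page], tag_views[page]  # missing pages count as 0
        if logged > tagged:
            note = "logs higher: possible spiders or visitors without JavaScript"
        elif tagged > logged:
            note = "tags higher: possible views served from caches"
        else:
            note = "agree"
        report[page] = (logged, tagged, note)
    return report
```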