Using Page Tags or Log Files for Web Analytics
Should I Use Page Tags or Log Files?
This question comes up frequently when discussing data collection methods and strategy for a web analytics software deployment. Most customers request, and expect, a page-tag-based solution by default.
This expectation is usually driven by the predominance of page tagging among web analytics vendors, including the ubiquitous Google Analytics and most hosted web analytics solutions. Before addressing the fundamental question of 'Page Tags or Log Files?' it is useful to explore some definitions.
What is a Page Tag?
A page tag is typically a small piece of JavaScript code placed in each page of your website (usually via a template). Every time the page is requested, the web browser runs the JavaScript, which sends information about website usage in a request to the web analytics data collection server. The data is captured on the remote server in a log or database and can then be processed to monitor website activity. The data is similar to that stored in a standard web server log file, but it also lets you capture more in-depth information about intra-page activity, usage of rich internet applications and client-side data such as the visitor's screen resolution, screen colour depth and the versions of the plug-ins they are running. This gives you the opportunity to check that your website is functional and operational for all of your users.
If the browser is unable to execute JavaScript, the fallback is a request for a single-pixel image on the web server, from which only a limited amount of information can be gathered.
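As a rough illustration of the collection side, the sketch below decodes one tag request into a record that could be written to a log or database. It is a minimal example, not any vendor's actual format: the host collect.example.com and the parameter names page, res, depth, lang and rnd are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def decode_tag_request(request_url: str) -> dict:
    """Turn one page-tag request URL into a flat record of what the tag reported."""
    query = urlsplit(request_url).query
    # parse_qs returns a list of values per key; this sketch assumes the tag
    # sends each field once, so only the first value is kept.
    fields = {key: values[0] for key, values in parse_qs(query).items()}
    # "rnd" (hypothetical name) is random cache-busting data with no meaning.
    fields.pop("rnd", None)
    return fields


# Hypothetical request, as a tag might send it to a collection server.
example = (
    "https://collect.example.com/tag.gif"
    "?page=%2Fproducts%2Findex.html&res=1920x1080&depth=24&lang=en-GB&rnd=83920174"
)
record = decode_tag_request(example)
```

Here record holds the page path, screen resolution, colour depth and language reported by the browser, which is the kind of client-side detail a standard web server log does not contain.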
What is a Log File?
When a web page is requested from within your web browser (Internet Explorer, Firefox etc.), the request is made to a web server, for example IIS or Apache, which fetches the page from either a database or its file system. As the server serves this page and all its associated images, style sheets and the like, it records each of these requests in a log file. The log file includes data on who requested the page (indirectly at least), which site referred them to your site, what time they made the request and which page, image or object was requested.
These logs can then be processed at a later date to monitor your website activity.
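To make the contents of a log entry concrete, here is a minimal sketch that parses one line into named fields. It assumes the Apache "combined" log format; IIS and custom configurations write different layouts, and the sample line is invented for illustration.

```python
import re
from datetime import datetime

# Field layout of the Apache "combined" log format (an assumption; your
# server may be configured to log a different set of fields).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line: str):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


# Invented sample line: who asked, when, for what, and where they came from.
sample = (
    '203.0.113.7 - - [10/Oct/2011:13:55:36 +0100] '
    '"GET /products/index.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/5.0 (Windows NT 6.1)"'
)
entry = parse_log_line(sample)
```

The host, time, path and referrer fields correspond to the who, when, what and "referred by" data described above; the requester is identified only indirectly, by IP address and User Agent string.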
So, Which One Should I Use?
As with most technical solutions, the answer depends on your individual requirements: what data you need and how you prefer to capture it.
Page Tag Benefits
Accuracy, because page tag requests tend not to be cached (assisted by the addition of some random data to the tag request).
Client browser information - Language, plug-in versions, screen resolution.
Overcomes the problem of not having access to the server log files - analytics environment may be hosted distant from the sites being measured.
Tag instrumentation can be included in Flash applications.
Can track on-page events in dynamic (DHTML/AJAX/HTML5) applications.
Can track clicks on outbound links.
Search engine robots/spiders don't (usually) request the tags - so there are fewer data-quality issues to worry about.
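The caching point can be sketched as follows. A real tag would do this in JavaScript in the browser; the logic is shown here in Python purely for illustration, and the collector URL and the parameter names page and rnd are hypothetical.

```python
import random
from urllib.parse import urlencode


def build_tag_url(collector: str, page: str, rng=random) -> str:
    """Build a tag request URL that carries some random data.

    Because the random value makes each URL different, browsers and proxies
    are unlikely to answer the request from a cache, so the collection
    server sees each page view.
    """
    params = {"page": page, "rnd": rng.randrange(10**8)}
    return f"{collector}?{urlencode(params)}"


# Two views of the same page will almost always produce different request URLs.
first = build_tag_url("https://collect.example.com/tag.gif", "/products/index.html")
second = build_tag_url("https://collect.example.com/tag.gif", "/products/index.html")
```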
Page Tag Drawbacks
If the tag fails to load, the page may load with errors.
Tags must be added to every page. No tags = no data.
Vendor lock-in because the tags are usually proprietary and you may not have access to the raw data.
Vendor-hosted tagging may not be located in your country - introducing the possibility of privacy / data protection issues.
Not all resources can be tagged, PDF files for example. The PDF standard does allow JavaScript (ECMAScript) in PDF files, but it is usually not reliable, so people typically tag the link to the PDF rather than the PDF itself.
You can't reliably track search engine robots and spiders.
Mitigating Page Tag Drawbacks
Technologies exist to allow for the automated insertion, validation and management of page tags which reduces the cost of deployment and maintenance.
Some page tags, like those used for the NetInsight web analytics software, give you all the advantages of any other tagging solution, but also allow historical data to be stored and re-processed.
Page tags may be hosted in your environment - tagging does not necessarily imply vendor-hosted.
Log File Benefits
Captures Robot/Spider and RSS reader activity, useful for tracking click fraud, bots and understanding how your site is being indexed.
The web server will always produce logs if the server is operational.
Log files capture web server errors and page redirects.
Can confirm the loading of files such as PDF documents.
No vendor lock-in as you can load log files into any log analyser.
Log File Drawbacks
Tedious to filter out all robot/spider activity.
No client browser information logged (other than the User Agent string).
No analysis of intra-page activity.
Cookies (for visitor identification) must be managed by the web server / your application.
Mitigating Log File Drawbacks
Log file analysis tools (such as NetInsight) have a comprehensive list of robots / spiders, so their activity can be isolated from human visitor activity. Isolating the data rather than excluding it matters, because it may have value of its own.
While it takes more effort to set up at the outset, the options for setting visitor identification cookies are broader than with page tags - for example setting 3rd-party cookies to track (as much as is possible) cross-domain activity.
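Isolating robot activity can be sketched as a simple split on the User Agent string. This is a minimal illustration only: the marker list below is a tiny, hypothetical stand-in for the comprehensive lists that log analysis tools maintain, and substring matching is far cruder than what such tools do.

```python
# Hypothetical, deliberately short list of substrings that suggest a robot.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent: str) -> bool:
    """Guess whether a User Agent string belongs to a robot/spider."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def split_traffic(requests):
    """Separate requests into (humans, robots) without discarding either."""
    humans, robots = [], []
    for request in requests:
        (robots if is_robot(request["agent"]) else humans).append(request)
    return humans, robots


# Invented requests, reduced to the two fields this sketch needs.
requests = [
    {"path": "/", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"path": "/products/", "agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/8.0"},
    {"path": "/robots.txt", "agent": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"},
]
humans, robots = split_traffic(requests)
```

Keeping both lists, rather than dropping the robot requests, leaves the robot data available for questions such as how the site is being indexed.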
Other options
Tags and logs aren't the only options. There are many variations that may prove useful, most notably active (tag/content-inserting) and passive (wire-sniffing) collection. These are generally provided as a separate device, either as an appliance in your environment or, potentially, hosted by a third party. Provided these data collection solutions supply data in a 'reasonable' format, the data can probably be loaded by any analytics package. Of course NetInsight can read many different formats of data, and we have worked with several of the providers of these sorts of solutions to present the data that they record.
Ideal Approach
From a data-quality perspective, hosting simple page tag collection yourself and loading both the tag-based data and the traditional web server log file data will give the best results without limiting your options. Of course, the cost and complexity of this solution may outweigh the benefits, so compromises may be necessary. What is clear is that (at the moment at least) there is no clear-cut 'right' answer for everyone.


