
Difference between raw log files and Google Analytics

I have been using Google Analytics for some time to analyse my web site's traffic, and I also use another program (WebLog Expert) to analyse the raw log files produced by IIS.

To use Google Analytics, you have to add a JavaScript snippet to each page you want to track. When the page loads in a browser, the script runs, collects the metrics it can find in the browser (such as the referrer, the screen resolution, …) and requests a blank image from the Google Analytics server, passing those metrics along with the request.
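
The mechanism described above is an image beacon. The script gathers a few values the browser exposes, encodes them into the query string of an image URL, and lets the browser request that image. Here is a minimal sketch of the idea. It is not the actual urchin.js code. The endpoint and the parameter names (`page`, `ref`, `sw`, `sh`) are hypothetical, chosen only for illustration.

```javascript
// Minimal sketch of an image-beacon tracker.
// Hypothetical endpoint and parameter names; this is NOT the real urchin.js code.

// Build the beacon URL: every metric becomes a URL-encoded query parameter.
function buildBeaconUrl(endpoint, metrics) {
  const query = Object.keys(metrics)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(metrics[key])))
    .join("&");
  return endpoint + "?" + query;
}

// Read the values the browser exposes; outside a browser, fall back to empty values.
function collectMetrics() {
  const inBrowser = typeof document !== "undefined" && typeof screen !== "undefined";
  return {
    page: inBrowser ? document.location.pathname : "",
    ref: inBrowser ? document.referrer : "",
    sw: inBrowser ? screen.width : 0,
    sh: inBrowser ? screen.height : 0,
  };
}

// Setting the src of an Image makes the browser issue a GET request for it,
// and that request is the "hit" the statistics server records.
function trackPageView(endpoint) {
  const url = buildBeaconUrl(endpoint, collectMetrics());
  if (typeof Image !== "undefined") {
    const beacon = new Image(1, 1);
    beacon.src = url;
  }
  return url;
}
```

The hit is only sent when the script actually runs. A visitor without JavaScript, or a robot that does not execute scripts, never reaches the statistics server, while the page request itself still appears in the raw IIS log.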


Here is the code I added to my page:

 <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
 </script>
 <script type="text/javascript">
 _uacct = "UA-1062297-1";
 urchinTracker();
 </script>

It was no surprise to see that the number of visitors reported by Google Analytics was only a quarter of what the log files reported.

Other site owners have noticed the same difference.

But why?

The reasons usually given are these:

  • Visitors disable JavaScript in their browsers, so Google Analytics underestimates.
  • The raw log files also contain traffic sent by robots (like Feedburner) and search engine bots, so the log files overestimate.

I doubted the first reason: it seemed unlikely that so many people disable JavaScript when more and more web sites rely on it.

To rule out the second reason, I set up a series of filters in WebLog Expert to exclude that traffic.

How? I filter on the user agent recorded for each hit in the log file, and I do a reverse DNS lookup on the requester's IP address to see if the request comes from Google, Live.com, Ask.com, …
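
The two checks above can be sketched in Node.js. This only illustrates the idea and is not how WebLog Expert works internally. The user-agent patterns and crawler host-name suffixes are example values I picked, not a complete spider list. `dns.promises.reverse` and `dns.promises.resolve4` are standard Node.js calls. The forward lookup after the reverse lookup is an extra step I added, so that a spoofed reverse DNS record is not trusted on its own.

```javascript
const dns = require("dns").promises;

// Example user-agent fragments of known robots (illustrative, not exhaustive).
// IIS logs replace spaces in the user agent with "+", hence "[ +]".
const BOT_UA_PATTERNS = [/googlebot/i, /msnbot/i, /ask[ +]jeeves|teoma/i, /feedburner/i, /\b(bot|crawler|spider)\b/i];

// Example host-name suffixes of search-engine crawlers (assumed values).
const CRAWLER_HOST_SUFFIXES = [".googlebot.com", ".google.com", ".search.msn.com", ".ask.com"];

function isBotUserAgent(userAgent) {
  const ua = userAgent || "";
  return BOT_UA_PATTERNS.some((re) => re.test(ua));
}

function isCrawlerHost(hostname) {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing root dot
  return CRAWLER_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix));
}

// Reverse-resolve the IP, keep host names that look like a crawler, then
// resolve each one back to IPv4 addresses and require the original IP among them.
async function isVerifiedCrawlerIp(ip) {
  let hostnames;
  try {
    hostnames = await dns.reverse(ip);
  } catch (err) {
    return false; // no PTR record: cannot be confirmed as a crawler
  }
  for (const host of hostnames.filter(isCrawlerHost)) {
    try {
      const addresses = await dns.resolve4(host);
      if (addresses.includes(ip)) return true;
    } catch (err) {
      // forward lookup failed; try the next host name
    }
  }
  return false;
}

// A hit is left out of the visitor statistics if either check says "robot".
async function isRobotHit(hit) {
  return isBotUserAgent(hit.userAgent) || (await isVerifiedCrawlerIp(hit.ip));
}
```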

So who is right? And how can I test it?

What did I test?
My traffic is fairly stable from week to week, so I spread my test over four weeks to try to find an answer.
The first tests tried different placements of the Google Analytics script to see whether the placement affected the result.

First and second week: Position of the Google JavaScript in the page.

In the first week, I put the script at the end of the page, just before the closing </body> tag.

In the second week, I moved the script to just after the opening <body> tag.

If the script is at the end of the page and the user stops the page from loading, Google Analytics is never called, and maybe that is why I see a difference.

And the answer is “No”! The statistics for the two weeks were practically the same.

So for my web site, the position of the script has no influence.


Third week: Is the Google script broken?
For the third week, I had the nerve to think: “Maybe the Google code is broken.” I know Analytics is used by thousands and thousands of sites, so it should be well debugged, but who knows?
To test whether the problem came from JavaScript in general or from Google, I added to my page a simple script that downloads a specific image:

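<script type="text/javascript">
 // Request a test image: it only appears in the IIS log if the browser ran this script.
 var tempImage = new Image();
 tempImage.src = "http://www.quickelsoft.com/images/testjs.gif";
</script>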

I added this code just before the Google script.

Then, I tracked the file “testjs.gif” in WebLog Expert to check whether, on a given day, more visitors requested this file than Google Analytics reported. Again, I did not see much difference.

On some days, “testjs.gif” had more visitors than Analytics reported; on other days, it was the opposite.

So maybe a lot of visitors do disable JavaScript after all?
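
Counting the visitors who downloaded testjs.gif means filtering the log for that URL and counting distinct clients per day. Here is a minimal sketch. It assumes an IIS log in W3C Extended format with the `date`, `c-ip`, `cs-uri-stem` and `cs(User-Agent)` fields enabled. It also assumes that one IP address plus user agent pair is one visitor, which is a simplification: WebLog Expert's own visitor definition may differ.

```javascript
// Count distinct visitors per day who requested a given file, from the text of
// an IIS log in W3C Extended format.
// Assumption: one visitor = one (IP address, user agent) pair.
function countVisitorsPerDay(logText, uriStem) {
  let fields = [];
  const visitorsByDay = new Map(); // date -> Set of "ip|userAgent"

  for (const rawLine of logText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.startsWith("#Fields:")) {
      // The #Fields directive names the columns of the lines that follow.
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
      continue;
    }
    if (line.startsWith("#")) continue; // other directives (#Software, #Date, ...)

    // IIS writes spaces inside values as "+", so whitespace separates the columns.
    const values = line.split(/\s+/);
    const entry = {};
    fields.forEach((name, i) => { entry[name] = values[i]; });

    if ((entry["cs-uri-stem"] || "").toLowerCase() !== uriStem.toLowerCase()) continue;
    const day = entry["date"];
    const visitor = entry["c-ip"] + "|" + (entry["cs(User-Agent)"] || "-");
    if (!visitorsByDay.has(day)) visitorsByDay.set(day, new Set());
    visitorsByDay.get(day).add(visitor);
  }

  const counts = {};
  for (const [day, visitors] of visitorsByDay) counts[day] = visitors.size;
  return counts;
}
```

Comparing these per-day counts with the daily visitor numbers in Google Analytics is the comparison described above.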


Fourth week.
I added even more filters in WebLog Expert so that only hits whose user agent is Internet Explorer or Firefox are counted.

Some of my visitors use Safari and Opera, but they account for only 1% of my traffic.

With these rules, I was also fairly sure that, unless a robot fakes its user agent to pretend to be a browser, only real human visitors would appear in the WebLog Expert reports.
And yet again, the raw log files showed 3 to 4 times more visitors.
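
The fourth-week rule is a whitelist test on the user-agent string. Here is a rough sketch. The patterns are my own assumption, not WebLog Expert's actual filter definitions, and so is the rule that rejects Opera when it identifies itself as MSIE. As noted above, a robot that fakes a browser user agent would still pass this test.

```javascript
// Sketch of a "count only Internet Explorer and Firefox" rule
// (hypothetical patterns, not WebLog Expert's real filter definitions).
// IIS logs replace spaces in the user agent with "+", so both are accepted.
const IE_PATTERN = /MSIE[ +]\d/;
const FIREFOX_PATTERN = /Firefox\/\d/;
// Opera can identify itself as MSIE, so an "Opera" token disqualifies the hit.
const OPERA_PATTERN = /Opera/;

function isIeOrFirefox(userAgent) {
  const ua = userAgent || "";
  if (OPERA_PATTERN.test(ua)) return false;
  return IE_PATTERN.test(ua) || FIREFOX_PATTERN.test(ua);
}

// Keep only the hits whose user agent passes the browser filter.
function keepBrowserHits(hits) {
  return hits.filter((hit) => isIeOrFirefox(hit.userAgent));
}
```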


Conclusion

I thought the problem was JavaScript, but:

  • My test with my own script gave the same number of visitors as Google.
  • How could Google be so successful with AdSense if that many visitors disabled JavaScript? AdSense also relies on JavaScript.

So where does the difference come from?

At first, I wanted to prove that Google Analytics underestimated my traffic.

Why? Because the raw log files show better performance for my web site in terms of visitors, which is reassuring.

But in fact, Google Analytics is right.

Update: WebLog Expert has just released version 5.1, which includes a more complete spider list. I now get almost the same results from Google Analytics and from the log files.


Friday, January 04, 2008



Safety study of IE and Firefox

Jeff Jones published an analysis of the vulnerabilities in Internet Explorer and Firefox.

I expected to find a list of security problems, but I only found a “mine-is-better-than-yours” report. It could have been interesting, but to me this kind of comparison is just a waste of time. Who would trust me if I denigrated my competitors, or spent my time trying to prove that they failed while I still have to improve myself?

It reminds me of a story about Ingvar Kamprad, the founder of Ikea.
In its early years, when Ikea did not yet design its own furniture, pressure from competitors led suppliers to boycott Ikea. The competitors tried to "demolish" Ikea instead of improving themselves. Ikea kept innovating and, in the end, won.

What I found more interesting in this report was the link to the National Vulnerability Database, which tracks vulnerabilities found in various products. Useful!


Thursday, December 06, 2007



Blog Action Day and the environment

The site “Blog Action Day” asked every blogger to blog about the environment.

So here is a link to an environmental web site built with QuickelSoft CMS. This company offsets your CO2 emissions.

From their web site:

A practical example of carbon cancelling is the emissions caused by one's automobile. Individuals emit CO2 through the use of their car for their necessary daily chores. Company cars emit CO2 whilst carrying out regular business activities which are required for the survival of the corporation and its employees. This all contributes to the global warming problem.

However by participating in green activities in developing countries one can offset these effects by supporting e.g. biomass, wind power, solar power or hydropower projects. These projects are a step in the right direction, they will slowly be replacing the fossil fuels.

Monday, October 15, 2007