
Difference between raw log files and Google Analytics

I have been using Google Analytics for some time now to analyse the traffic of my web site and I also use another program (WebLog Expert) to analyse the raw log files provided by IIS.

 

To use Google Analytics, you have to add a piece of JavaScript to each page you want to track. Once the page is loaded in a browser, this script runs and requests a blank image from the Google Analytics server, passing along all the metrics it can find in the browser (such as the referrer, the screen resolution, …).
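To make the mechanism concrete, here is a minimal sketch of an image-beacon tracker. It is not Google's code: the function names, the endpoint and the parameter names (`page`, `referrer`, `resolution`) are hypothetical and only illustrate the idea of sending metrics as the query string of an image request.

```javascript
// Minimal sketch of an image-beacon tracker. This is NOT Google's code:
// the endpoint and the parameter names are hypothetical.
function buildBeaconUrl(endpoint, metrics) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(metrics)) {
    // Skip metrics the browser could not provide.
    if (value !== undefined && value !== null && value !== "") {
      params.append(name, String(value));
    }
  }
  return endpoint + "?" + params.toString();
}

// In a browser, the page collects what it can and requests the image.
function trackPageView(endpoint) {
  const url = buildBeaconUrl(endpoint, {
    page: window.location.pathname,
    referrer: document.referrer,
    resolution: window.screen.width + "x" + window.screen.height,
  });
  new Image().src = url; // the GET request carries the metrics to the server
}
```

If JavaScript is disabled, or if the visitor is a robot that does not run scripts, this request is never sent, so the visit never reaches the statistics server.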

 

Here is the code I added to my page:

 <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
 </script>
 <script type="text/javascript">
 _uacct = "UA-1062297-1";
 urchinTracker();
 </script>

It was not a surprise to see that the number of visitors reported by Google Analytics was only a quarter of what the log files reported.

Other site owners have already noticed this difference in traffic.

 

But why?

The reasons usually given are:

  • Some visitors disable JavaScript in their browser, so Google Analytics underestimates the traffic.
  • The raw log files also contain the traffic sent by robots (like Feedburner) and search bots, so the log files overestimate it.

As for the first reason, I really doubted that so many people disable JavaScript, since more and more web sites rely on it.

As for the second reason, I set up a series of filters in WebLog Expert to make sure that traffic is not counted.

How? I filter on the User Agent found in each hit in the log file, and I do a reverse DNS lookup on the requester's IP address to see if the request comes from Google, Live.com, Ask.com, …
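The two checks can be sketched as follows. This is an illustration under assumptions, not WebLog Expert's implementation: the function names are hypothetical, and the user agent patterns and host name suffixes are examples rather than a complete spider list. The reverse DNS function is passed in as a parameter so the logic can be tried without network access.

```javascript
// Sketch of the two checks. The patterns and domain suffixes are
// illustrative assumptions, not WebLog Expert's spider list.
const BOT_AGENT_PATTERNS = [/bot/i, /crawl/i, /spider/i, /slurp/i, /feedburner/i];
const CRAWLER_HOST_SUFFIXES = [".googlebot.com", ".search.msn.com", ".ask.com"];

// Check 1: does the User-Agent string identify itself as a robot?
function isBotUserAgent(userAgent) {
  return BOT_AGENT_PATTERNS.some((pattern) => pattern.test(userAgent || ""));
}

// Check 2: does a reverse DNS host name belong to a known crawler domain?
function isCrawlerHost(hostName) {
  const host = hostName.toLowerCase();
  return CRAWLER_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix));
}

// Combine both checks. `reverseLookup` is injected: it takes an IP address
// and resolves to an array of host names.
async function isRobotHit(hit, reverseLookup) {
  if (isBotUserAgent(hit.userAgent)) return true;
  try {
    const hostNames = await reverseLookup(hit.ip);
    return hostNames.some(isCrawlerHost);
  } catch (err) {
    // No reverse DNS record: the address cannot be attributed to a crawler.
    return false;
  }
}
```

In Node.js, `require("dns").promises.reverse` has the shape expected for `reverseLookup`. A robot that sends a browser-like user agent from an address without a recognizable host name passes both checks, which is the weakness of this kind of filter.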

 

So who is right? How can I test it?

 

What did I test?
My traffic is pretty stable from week to week, so I spread my tests over four weeks to try to find an answer to this problem.
The first tests tried different placements of the Google Analytics script to see whether the position affected the result.

 

First and second week: Position of the Google JavaScript in the page.

The first week, I put the script at the end of the page, just before the closing </body> tag.

The second week, I moved the script to just after the opening <body> tag.

My reasoning: if the script is at the end of the page and the visitor stops the download before it finishes, Google Analytics is never called, and maybe that explains the difference.

And the answer is “No”! During the two weeks, the statistics were pretty much the same.

So for my web site, the position of the script has no influence.


Third week: is the Google script broken?
For the third week, I had the nerve to think: “Maybe it is the Google code which is broken”. I know Analytics is used by thousands and thousands of sites, so it should have been debugged correctly, but who knows?
To test whether the problem was with JavaScript or with Google, I added to my page a simple script that downloads a specific image from my own server.

<script>
 var tempImage = new Image();
 tempImage.src = "http://www.quickelsoft.com/images/testjs.gif";
</script>

I added this code just before the Google one.

Then, I tracked the file “testjs.gif” in WebLog Expert to check whether, for a given day, it had more visitors than Google Analytics reported. Again, I did not see much difference.

On some days, I had more visitors for “testjs.gif” than in Analytics, and on others it was the opposite.
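The same count can be approximated directly from the raw IIS log. The sketch below assumes the log is in W3C extended format with a `#Fields:` line that includes `date`, `c-ip` and `cs-uri-stem`, and it treats each distinct client IP per day as one visitor, which may differ from how WebLog Expert defines a visitor. The function name is hypothetical.

```javascript
// Count distinct client IPs per day that requested a given path, from the
// text of an IIS log in W3C extended format. Column positions are read from
// the "#Fields:" directive, so the field order does not matter.
function countDailyVisitors(logText, path) {
  let fields = [];
  const visitorsByDate = new Map(); // date -> Set of client IPs

  for (const line of logText.split(/\r?\n/)) {
    if (line.startsWith("#Fields:")) {
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
      continue;
    }
    // Skip other directives (#Software, #Date, ...) and blank lines.
    if (line.startsWith("#") || line.trim() === "") continue;

    const values = line.trim().split(/\s+/);
    const entry = {};
    fields.forEach((name, i) => { entry[name] = values[i]; });

    const stem = entry["cs-uri-stem"];
    if (!stem || stem.toLowerCase() !== path.toLowerCase()) continue;

    if (!visitorsByDate.has(entry.date)) visitorsByDate.set(entry.date, new Set());
    visitorsByDate.get(entry.date).add(entry["c-ip"]);
  }

  const counts = {};
  for (const [date, ips] of visitorsByDate) counts[date] = ips.size;
  return counts;
}
```

Calling `countDailyVisitors(logText, "/images/testjs.gif")` on the contents of a log file returns an object mapping each date to its visitor count, which can then be compared day by day with the figures from Google Analytics.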

So the Google code is not broken. Does that mean a lot of visitors really do disable JavaScript?

 

Fourth week: browser traffic only.
I added even more filters in WebLog Expert to keep only the hits whose user agent is Internet Explorer or Firefox.

I have some visitors using Safari and Opera, but they account for only 1% of my traffic.

With these rules, I was also pretty sure that, unless a robot uses a fake user agent to pretend to be a browser, only real human visitors would appear in the WebLog Expert reports.
And again, I had three to four times more visitors in the raw log files.
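A browser allowlist of this kind can be sketched as below. The function names and the matching rules are assumptions for illustration, not WebLog Expert's filter syntax. The regular expressions accept both a space and a `+` after `MSIE`, because IIS writes spaces in the user agent as `+` in its logs, and Opera is excluded explicitly because some Opera versions also announced themselves as MSIE.

```javascript
// Keep only hits whose User-Agent looks like Internet Explorer or Firefox.
// Assumption: simple substring rules on "MSIE" and "Firefox/".
function isTrackedBrowser(userAgent) {
  const ua = userAgent || "";
  if (/Opera/i.test(ua)) return false; // some Opera versions also claim MSIE
  return /MSIE[ +]\d/.test(ua) || /Firefox\/\d/.test(ua);
}

// `hits` is an array of objects with a `userAgent` property.
function keepBrowserHits(hits) {
  return hits.filter((hit) => isTrackedBrowser(hit.userAgent));
}
```

The filter only sees what the client claims to be: a robot that copies a real Internet Explorer or Firefox user agent string is kept, so the result is an upper bound on human traffic.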

 

Conclusion

I thought the problem was JavaScript but:

  • My test with my own script gave the same number of visitors as Google Analytics.
  • If so many visitors disabled JavaScript, how could Google be so successful with AdSense, which is also based on JavaScript?

So where does the difference come from?

At first, I wanted to prove that Google Analytics underestimated my traffic.

Why? Because the raw log files tend to show a better performance for my web site in terms of visitors, which is reassuring.

But in fact, it is Google Analytics that is right.

 

Update: WebLog Expert has just released version 5.1, which contains a more complete spider list. I now get almost the same results from Google Analytics and from the log files.