What You Miss When Ignoring Raw Log Files

Back when I started as an SEO, there wasn’t really a good website analytics system. To understand what users were doing, I had to learn to read raw access log files. As time passed, several companies launched analytics services, but their high prices put them out of reach for a lot of my clients. The most popular system was called Urchin and cost about $400 a month. Urchin was so popular that Google decided to buy the company and then did the unthinkable: it offered the system (branded as Google Analytics) for FREE. This put site usage information in the hands of any interested webmaster, regardless of the size of their site. The system has become more and more sophisticated, and many companies now employ people whose jobs are dedicated to web analytics.

All these years of graphics-heavy, easy-to-understand analytics have spoiled most SEOs. Not many have been around long enough to have learned how to read raw log files, so most SEOs don’t know anything about them. You might think “Well, things change, analytics got better and everybody moved on”, but not so fast. The user data is great, but something big is missing: robots. Most robots (including search engine robots like Googlebot) do not fire the analytics tracking code and are not recorded in pretty systems like Google Analytics. That leaves webmasters with no idea how bots are progressing through their sites, whether something is causing them problems, or what URLs they are finding.

Additionally, without raw log files you have no way of knowing how much bandwidth bots are using. One client of mine was running their server ragged, and the traffic reported in analytics didn’t correspond with the server’s load. The access logs showed that Google was accessing 120+ pages per minute, every minute during peak usage times. That was 5x the number of actual people on the site.

Most SEOs rely on Google’s Webmaster Tools to report where the bots are having problems accessing their sites. The biggest problems will show up in the reports, but it’s likely that there are a bunch of things that Google just doesn’t feel like telling you about: things that could improve site performance for Google and for users. Better performance means better ranking.
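To make the bot-traffic point concrete, here is a minimal sketch of how you might count bot requests per minute and total bytes served to bots from an access log. It assumes the log uses the common "combined" format written by Apache and nginx; the function name, the list of bot markers, and the sample lines are all made up for illustration, so adjust them to whatever your own server actually writes.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative substrings to look for in the user-agent field.
BOT_MARKERS = ("googlebot", "bingbot")


def summarize_bots(lines):
    """Return (requests per bot per minute, total bytes per bot)."""
    per_minute = Counter()
    bytes_by_bot = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        bot = next((m for m in BOT_MARKERS if m in agent), None)
        if bot is None:
            continue  # not one of the bots we are tracking
        # The time looks like 10/Oct/2023:13:55:36 +0000; the first
        # 17 characters drop the seconds and give a per-minute bucket.
        minute = match.group("time")[:17]
        per_minute[(bot, minute)] += 1
        size = match.group("bytes")
        bytes_by_bot[bot] += 0 if size == "-" else int(size)
    return per_minute, bytes_by_bot


# Made-up sample lines: two Googlebot hits and one ordinary visitor.
SAMPLE_LINES = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:55:50 +0000] "GET /b HTTP/1.1" 200 1000 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2023:13:55:51 +0000] "GET /a HTTP/1.1" 200 2326 '
    '"http://example.com/" "Mozilla/5.0 (Windows NT 10.0) Firefox/118.0"',
]

if __name__ == "__main__":
    per_minute, bytes_by_bot = summarize_bots(SAMPLE_LINES)
    for (bot, minute), hits in sorted(per_minute.items()):
        print(bot, minute, hits)
    for bot, total in bytes_by_bot.items():
        print(bot, "bytes:", total)
```

Keep in mind that a user-agent string is easy to fake, so treat counts like these as a rough estimate rather than proof that the requests really came from Google.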

I’ve decided to address this topic over a few installments instead of in one massive post. Check back soon for the next post in this series, which will discuss the anatomy of a server access log.