In an earlier post I mentioned the importance of server access log files. I figured it was time for another installment in this little series. Today I’ll be discussing the anatomy of a log file.
Web server access logs contain a lot of information and can become really large files, really fast. Most servers rotate or cap log files so the logs don't fill the drive, and unfortunately, some shared hosting environments don't keep logs at all. A server log is a record of every interaction between your server and a web browser or robot. Access logs can be formatted in several different ways, but they mostly contain the same information. As I mentioned in my previous post, this is the only place you can really see what Googlebot does on a website, because Googlebot doesn't fire analytics. Depending on where your site is hosted and what type of server you're on, access to your log files will vary. If you don't know how to get to your site's logs, ask your host if and where they are available.
Once you've downloaded your server log file, you can open it with a text editor like PSPad. (In a future post I'll talk about analyzing log files with dedicated software.) When you open the log file you'll see something like the image below. It looks intimidating at first, but since the information follows a standardized format, it's fairly easy to read.
Let's start with a single entry. Below is one line taken from my log file.
66.249.75.120 - - [03/Aug/2015:07:21:21 -0700] "GET /robots.txt HTTP/1.1" 200 915 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Breaking that line down into its delimited fields makes it easier to understand. The chart below shows what each field means and the corresponding data.
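If you'd rather pull those fields apart programmatically, a few lines of Python will do it. Here's a minimal sketch that parses the example entry with a regular expression, assuming the standard "combined" log format shown above; the field names in the pattern are just my own labels, not anything your server emits.

```python
import re

# The example entry from above, in the common "combined" log format.
entry = ('66.249.75.120 - - [03/Aug/2015:07:21:21 -0700] '
         '"GET /robots.txt HTTP/1.1" 200 915 "-" '
         '"Mozilla/5.0 (compatible; Googlebot/2.1; '
         '+http://www.google.com/bot.html)"')

# Combined log format, field by field: client IP, identd, authenticated
# user, timestamp, request line, status code, response size in bytes,
# referrer, and user agent.
pattern = re.compile(
    r'(?P<ip>\S+) (?P<identd>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

match = pattern.match(entry)
if match:
    for field, value in match.groupdict().items():
        print(f'{field:>10}: {value}')
```

Run against the example line, this prints the IP, the timestamp, the request ("GET /robots.txt HTTP/1.1"), the 200 status, the 915-byte response size, and the Googlebot user agent, each labeled. The two dashes after the IP are placeholders for identd and authenticated user, which are almost always empty.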
According to the example, Googlebot requested my robots.txt file. That's the first file Googlebot requests when visiting a website, so if I looked through every entry after this one for that same IP, I could follow Googlebot around my site. It can get a bit tough, though, because when a page loads, every resource the page calls (scripts, images, stylesheets, and so on) is logged too, so you have to really keep track of what's going on, or use special log analysis software. When I first started out as an SEO, reading through the log file in Notepad was the only way to analyze these files. I had to remember which IP I was keeping track of, find every log entry for that IP after a certain timestamp, and then either piece them together in my mind or pull them out into Excel. Fortunately times have changed, and now there are quite a few programs and websites dedicated to making log file information less headache-inducing.
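Even a short script beats that Notepad-and-Excel routine. Below is a rough sketch of the idea, again assuming the combined log format; the file name 'access.log' and the IP address are just the example values from above, so swap in your own.

```python
# Rough sketch: walk the whole log and pull out every request made by
# one IP address, in order, so you can follow a single crawler around
# the site. Each hit is reported with its timestamp and request line.
def follow_ip(logfile_path, ip):
    with open(logfile_path, encoding='utf-8', errors='replace') as f:
        for line in f:
            if line.startswith(ip + ' '):
                # The timestamp sits between the square brackets; the
                # request line ("GET /page HTTP/1.1") is the first
                # quoted field.
                timestamp = line.split('[', 1)[1].split(']', 1)[0]
                request = line.split('"')[1]
                yield timestamp, request

# Example usage with the Googlebot IP from the entry above.
for timestamp, request in follow_ip('access.log', '66.249.75.120'):
    print(timestamp, request)
```

Keep in mind that the output will still include every script and image that IP fetched along the way; the dedicated tools I'll cover next time do a much better job of separating page views from supporting resources.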
In my next installment I’ll discuss some of the different software and websites that will help you make sense of your log file and provide you with actionable information.