Optimize PDF Meta Info to Stand Out

It’s always good practice to give Google as much information as possible about your documents, no matter the type. PDFs are often overlooked, and taking some time to optimize them will really make them stand out among the rest. The same principles of tag relevance apply to PDFs as to webpages. Don’t just put your company’s slogan in the title and/or description tag, because Google chooses to display title and description information that is relevant to the searcher’s query. The best thing to do is be specific. Use part numbers and exact part nomenclature wherever possible, because the results for those types of searches are where PDFs are most likely to perform well.

Free PDF makers also support the addition of meta information, but in this post I will highlight Adobe Acrobat because that’s what we use. Open a PDF in Acrobat and right-click the document to bring up the pop-up menu. Select “Document Properties” from the list. This brings up a window that allows you to enter a lot of meta information. Enter a title for the PDF that is about 55 characters long, including spaces. When properly optimized, this will control the blue link in Google. Next, check out the “author” box. Depending on the program you used to generate the PDF, this box will sometimes contain your name or initials, or nothing at all. It’s best to enter your company’s name in this box.

The subject is also very important for displaying a good listing in the results. When properly optimized, the information in the subject box will control the two-line blurb in Google. Try to include relevant part numbers and specific nomenclature in the subject while keeping it to 150 characters or fewer (including spaces). Making this tag relevant to the user’s search will help ensure the subject tag is displayed in the results instead of whatever Google chooses on its own.
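If you have a lot of PDFs to update, it can help to check your titles and subjects against those length guidelines before typing them into Acrobat. Here is a minimal sketch in Python, not something we use in production. The function name `build_pdf_info`, the two limit constants, and the sample part number and company name are all made up for illustration. The `/Title`, `/Author`, and `/Subject` keys are the standard names of those fields in a PDF’s document information dictionary.

```python
# Length guidelines taken from this post (both counts include spaces).
TITLE_LIMIT = 55      # the title should be about this long at most
SUBJECT_LIMIT = 150   # the subject should be this long or shorter


def build_pdf_info(title, author, subject):
    """Check the lengths and return the three fields keyed by their
    standard PDF information-dictionary names.

    Hypothetical helper for illustration only; it does not touch any file.
    """
    problems = []
    if not title.strip():
        problems.append("title is empty")
    elif len(title) > TITLE_LIMIT:
        problems.append(
            f"title is {len(title)} characters (limit {TITLE_LIMIT})"
        )
    if not author.strip():
        problems.append("author is empty")
    if not subject.strip():
        problems.append("subject is empty")
    elif len(subject) > SUBJECT_LIMIT:
        problems.append(
            f"subject is {len(subject)} characters (limit {SUBJECT_LIMIT})"
        )
    if problems:
        raise ValueError("; ".join(problems))
    return {"/Title": title, "/Author": author, "/Subject": subject}


# Example with a made-up part number and company name.
info = build_pdf_info(
    title="AB-1234 Hydraulic Pump Datasheet",
    author="Example Co.",
    subject="Dimensions, flow ratings and mounting specs for the AB-1234 pump.",
)
print(info)
```

The function only validates and packages the text. You would still enter the values in the Document Properties window, or hand the returned dictionary to whatever PDF library you use to write metadata.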

Document Properties box in Adobe Acrobat

If you don’t enter this information into your PDFs, Google will try to index the content of the PDF and display what it thinks is relevant information. When the document doesn’t contain indexable text, Google will do its best to run OCR (optical character recognition) and extract text to use. That’s how you end up with descriptions like the one you see in the screenshot below. The client had about 150 PDFs that would rank for some queries, and all of them had this wonky capitalization, but the best part was that each one was different! The PDF had its text on a dark background, which seems to have wreaked havoc on Google’s OCR. (The screenshot of the full listing is below.)

Google’s attempt at OCR for this PDF yielded some awesome capitalization!

As an interesting side note, back in 2008 I pointed out how capitalization was impacting search results, after which Google did some live index editing to “fix” it. According to Google, its searches are generally not case sensitive. The set of results shown in the screenshot below is definitely case sensitive, though, because it changes depending on which letters I capitalize in the query. Interesting.

Full results to show the OCR capitalization craziness, redacted to protect my client.