Or, dispelling myths about how applicant tracking systems (should) work.
After publishing my last post, I decided to sort through all of the articles written about automated resumé screening software. What I began to realize was that, for the most part, nobody knows what they’re talking about. I followed links to cited works and found no evidence supporting claims–anywhere. I read a book that was referenced in an article. Its sources led me back to articles on questionable websites, some without authors. Many of the “experts” I was able to track down have dubious credentials.
My concern is that the advice isn’t supplied by people who have designed, or have extensive experience with, such software. As an engineer, my aim is to separate fact from fiction quite simply: by using existing parsing tools and building my own resumé screening functions. After a few hours of coding, we’ll be able to see how well popular advice holds up, or perhaps how inadequately some of the applicant tracking systems on the market function.
“Submit Your Resumé in .docx Format Instead of .pdf”
– hireright.com; Conclusion: Myth
Actually, discrepancies between versions of Microsoft Word can cause formatting errors, whereas PDF files always display the exact same way. PDF files are more complex for a program to parse, but the data is more structured, so the format is more useful. It’s likely that a .docx file would be converted to the standardized PDF before being run through a parser anyway. This claim is ridiculous.
“Don’t Use Headers or Footers, They Jam Most Parsing Algorithms”
– time.com; Conclusion: Myth
This is referring to Microsoft Word’s unique header and footer areas. You may be thinking “how difficult could it be to pull data from them?” The answer: not difficult at all. Some applicant tracking systems could be skipping the field, but it’s not going to break anything.
Why?
When a .docx file is converted to a .pdf file (or an .xml file, which is also likely), the header section loses any special significance. At that point, it’s just a regular part of the document.
Just for fun, I looked into parsers specifically for Microsoft Word files. It turns out that the python-docx library for Python allows you to perform operations specifically on headers or footers. It’s wholly possible that a company’s screener skips the header and footer, but that raises the question: why?
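To show how little effort this takes, here is a minimal sketch that pulls header and footer text out of a .docx file using only Python's standard library. It is not the code used for this post, and it does not use the docx library mentioned above. It relies on the fact that a .docx file is a zip archive of XML parts. The function name docx_header_footer_text is made up for illustration, and the word/header1.xml style of part naming is Word's usual convention, which is assumed here rather than guaranteed.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_header_footer_text(path):
    """Return {part_name: text} for each header/footer part found in a .docx.

    Assumption: header and footer parts follow Word's usual naming,
    e.g. word/header1.xml and word/footer1.xml.
    """
    found = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if re.fullmatch(r"word/(header|footer)\d*\.xml", name):
                root = ET.fromstring(archive.read(name))
                paragraphs = []
                for para in root.iter(W_NS + "p"):
                    # A paragraph's text is split across one or more w:t runs
                    runs = [t.text or "" for t in para.iter(W_NS + "t")]
                    paragraphs.append("".join(runs))
                found[name] = "\n".join(paragraphs)
    return found
```

Once the text is out, a screener can treat it like any other part of the document, which is why skipping it would be a design choice rather than a technical limitation.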
“Don’t Use Graphics”
– ezinearticles.com; Conclusion: Myth
Most resumés are, objectively, better off without graphics. However, there’s no reason images should “jam” a parser. The idea that “extraction tools work on the basis of text recognition” wrongly suggests (by use of the word “recognition”) that a submitted file may be scanned visually by a program, but that will never happen. Your resumé can easily be broken down into machine-friendly objects, and it’s easy to preserve (or skip over) images.
Why?
Using pdftotext and pdftohtml, I was able to work with images flawlessly. As you may expect, pdftotext skips over images, while pdftohtml preserves them. Here’s the text generated by pdftotext, followed by the format-preserving pdftohtml output.
(The ^M character can be removed using sed, but I’ve left it in to demonstrate that something between the two bodies of text has been removed)
Tim^M O'Hearn^M ^M ^M ^M ^M ^M ^M ^M ^M There^M is^M a^M large^M apple^M in^M this^M resumé.^M
“Use Bullets Rather Than Paragraphs to Describe Your Work”
– biginterview.com; Conclusion: Myth
Though it is a general rule of thumb that I agree with, following this advice might actually be detrimental when dealing with screening software. There seems to be consensus that keyword-matching and, in more advanced cases, keyword contextualization, is the most important component of screening software. It’s going to be easier for a primitive program to assess your depth of experience if it’s contained in large paragraphs rather than bullets. Bullets being “easier for screeners to navigate” is a hard claim to substantiate.
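For reference, the most primitive form of keyword matching might look something like the sketch below. This is a guess at the general idea, not a description of any real applicant tracking system. The function name keyword_hits and the sample keywords are invented for illustration.

```python
import re
from collections import Counter


def keyword_hits(resume_text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword."""
    counts = Counter()
    for kw in keywords:
        # Lookarounds stop "Java" from matching inside "JavaScript"
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        counts[kw] = len(re.findall(pattern, resume_text, flags=re.IGNORECASE))
    return counts


if __name__ == "__main__":
    text = "Built Python services. Python and SQL daily; no Java, only JavaScript."
    print(keyword_hits(text, ["Python", "SQL", "Java", "C++"]))
```

Contextualization would have to go further than this, for example by looking at the words surrounding each hit.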
“Use Web-Standard Fonts”
– mashable.com; Conclusion: Truth
Technically true, but as far as anyone should be concerned, this is a myth. I created PDF files using some of the most ridiculous fonts I could find online and I had no trouble parsing them. Any fonts that weren’t recognized (that is, not web-friendly) were just converted to a basic typeface. I even had some luck with other languages, such as Mandarin, while parsing Greek and Arabic produced some extra characters. I want this to be an imaginary problem, but it is conceivable that some parsers will depend on fonts, or at least encoding schemes, being available. I don’t think this is something worth worrying about.
“Stuff Your Resumé with Invisible Text”
– jobs-resumes.wonderhowto.com; Conclusion: Myth
Software that parses text files can indeed detect font color. While you may be able to fool certain systems, it’s extremely risky, because, if you get caught, your resumé is getting thrown out. Further, if you do get the job and get found out later, this is grounds for getting fired.
Why?
Here, I can use pdftohtml and then search through the output for any text marked as white. If you’re dealing with a multitude of background colors, it won’t be this simple. But, in those cases, you can detect the background color and then heavily scrutinize any identical color settings, whether it’s white-on-white, black-on-black, or anything in between.
The “Invisible Keywords” are white
Tim O'Hearn's Resumé
Visible qualifications
- Hungry
- Culinary Skills
- Table Manners
Invisible Keywords
- CEO
- Master Sommelier
- Ph.D
pdftohtml InvisibleText.pdf output_folder
sed -i.bak 's/^M//g' output_folder/*.html
grep "color:#ffffff" output_folder/*.html | sed -E 's/.*color:#ffffff;\">(.*)<\/span.*/\1/g'
Invisible Keywords
CEO
Master Sommelier
Ph.D
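The same check can be generalized beyond white text. The sketch below is one possible way to do it in Python, not the method used for this post. It assumes the converted HTML carries colors as inline style attributes with six-digit hex values, like the color:#ffffff the grep command looks for, and that the background color is already known. The class name HiddenTextFinder is hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches "color:#rrggbb" but not "background-color:#rrggbb"
COLOR_RE = re.compile(r"(?<![-\w])color\s*:\s*(#[0-9a-fA-F]{6})")
# Elements that never get a closing tag, so they must not touch the stack
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class HiddenTextFinder(HTMLParser):
    """Collect text inside elements whose inline color equals the background."""

    def __init__(self, background="#ffffff"):
        super().__init__()
        self.background = background.lower()
        self.stack = []   # one flag per open element: is its text hidden?
        self.hidden = []  # text fragments found in hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        match = COLOR_RE.search(style)
        if match:
            self.stack.append(match.group(1).lower() == self.background)
        else:
            # No color of its own: inherit the enclosing element's state
            self.stack.append(self.stack[-1] if self.stack else False)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden.append(data.strip())
```

To use it, create a finder with the page's background color, call feed() with the HTML, and read the hidden list. Anything in that list deserves a closer look.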
“Applicant Tracking Systems Don’t Care About Resumé Length”
– hireright.com; Conclusion: Myth
If your company is using an applicant tracking system that doesn’t check how many pages a resumé is, I hope you’re not paying any money for it. This is a simple pre-check that can be performed programmatically before your resumé is even opened. If your resumé greatly exceeds conventional length and you make it to an interview with a real person, you’ll come off as a dweeb. This is terrible advice.
Why?
Using a command-line utility called PDF Toolkit (pdftk), a one-line script will return the number of pages in a PDF file. Configuring a program to parse only resumés that fall within a certain page range would be trivial.
pdftk Five_Page_Resume.pdf dump_data | grep NumberOfPages
NumberOfPages: 5
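Wrapping that one-liner in a pre-check might look like the sketch below. It assumes pdftk is installed and prints a NumberOfPages line as shown above. The function names and the two-page limit are hypothetical choices for illustration, not taken from any real screening product.

```python
import re
import subprocess


def parse_page_count(dump_data_output):
    """Extract N from a 'NumberOfPages: N' line, or return None if absent."""
    match = re.search(r"^NumberOfPages:\s*(\d+)\s*$", dump_data_output,
                      flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pdf_page_count(pdf_path):
    """Run 'pdftk <file> dump_data' (pdftk must be installed) and parse it."""
    result = subprocess.run(["pdftk", pdf_path, "dump_data"],
                            capture_output=True, text=True, check=True)
    return parse_page_count(result.stdout)


def passes_length_check(page_count, max_pages=2):
    """Hypothetical rule: accept only resumés of 1 to max_pages pages."""
    return page_count is not None and 1 <= page_count <= max_pages
```

With the five-page file above, pdf_page_count would return 5 and passes_length_check would reject it before any parsing happens.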
Much of the advice out there regarding resumé screeners is either outdated, made up, or, worse, true. It’s entirely possible that some systems are designed so poorly that they are indeed plagued by common issues referenced in the above articles. As for what you can do to beat the system, focus on your resumé’s content rather than blindly implementing formatting tricks. Whenever possible, though, you should just contact a recruiter and avoid this mess entirely.
Share your thoughts in the comments below or contact me at Timothy.Joseph.OHearn@gmail.com
Disclaimer: This post was written with absolutely no direct knowledge of my current or any past employer’s resume screening procedures.