STORY UPDATED: check for updates below.
Compare the following two headlines:
Thе rеаsоn аrmy hеlіcоptеrs аrе nаmеd аftеr nаtіvе trіbеs wіll mаkе yоu smіlе
The reason Army helicopters are named after native tribes will make you smile
Notice anything different about them? OK, the "A" in "army" is capitalized in one and not in the other, but other than that they seem identical, no? That 'a' vs. 'A' shouldn't matter to Google if you copy and paste these titles into the search field and run a search, right?
Let's try it out, here is the first headline:
OK, so apparently nobody wrote anything yet with that headline in it, at least not to Google's knowledge. Nothing strange going on, there are millions of hypothetical headlines that won't return a single Google search result.
Let's try the second one by simply typing it into Google (including the lower case 'a' so we can properly compare results):
Wait, what? 489,000 results? And why wasn't Google able to find that article on wearethemighty.com only seconds ago? What changed?
On Google's side nothing did. But the first headline is not as identical to the second one as you might think. Most of its Latin vowels (a, e, i, o) have been replaced with their visual equivalents from the Cyrillic alphabet (а, е, і, о). In most fonts there is no visible difference between these characters (i.e. they are homoglyphs), but to a computer (or Google...) they are entirely different characters: the Latin "a" is code point U+0061, for example, while the Cyrillic "а" is U+0430. Cyrillic characters are mainly used in Eastern European countries like Russia and Ukraine but also in places like Macedonia or Kosovo.
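For readers who want to check a suspicious headline themselves, here is a minimal sketch in Python using only the standard library. It flags words that mix Latin and Cyrillic letters by looking at each character's official Unicode name. The function names `script_of` and `mixed_script_words` are our own illustrations, not part of any tool mentioned in this article, and the check only distinguishes these two scripts.

```python
import unicodedata

def script_of(ch):
    """Return a rough script label based on the character's Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"

def mixed_script_words(text):
    """Return the words in text that contain both Latin and Cyrillic letters."""
    flagged = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if {"Latin", "Cyrillic"} <= scripts:
            flagged.append(word)
    return flagged

# Written with escapes so the look-alike letters are unambiguous:
# U+0435 is Cyrillic "е", U+0456 is Cyrillic "і", U+043E is Cyrillic "о".
disguised = "h\u0435l\u0456c\u043Ept\u0435rs"
plain = "helicopters"

print(disguised == plain)             # the two strings look alike but differ
print(mixed_script_words(disguised))  # the disguised word is flagged
print(mixed_script_words(plain))      # nothing is flagged
```

A word written entirely in Cyrillic would not be flagged by this sketch; it only catches the mix of scripts inside a single word, which is what the disguised headline contains.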
Check what happens when we paste both headlines into a document on Google Docs and switch the font to "Syncopate":
Not only do the letters look different now, but because Google doesn't recognize these weird mixed strings of Latin and Cyrillic characters as words anymore, the built-in spellchecker goes crazy with the red wavy underlines (seemingly for no reason, as long as you are using one of the standard fonts like Arial or Times New Roman).
So why would anyone want to do this? Doesn't using headlines like these make it harder or even impossible to be found in Google? Compare these two posts we found on Facebook:
The first one links to the original article on We Are The Mighty published in December 2017 (archived here), the second one points to a plagiarized copy on Native Animals which was published in March of 2018 (archived here). Crucially the second headline has the Cyrillic characters in it (although you couldn't tell by just looking at it).
"We Are The Mighty" is a website for the military community run by David M. Gale from Los Angeles. "Native Animals" on the other hand appears to be a site about Native American issues run out of... Kosovo by Mirsad Rexhepi (archived WHOIS data here).
The site is part of a growing list of fake Native American pages run out of places like Macedonia, Kosovo or Vietnam. Running such pages is seen as an easy way to make a little money with advertising. They get most of their traffic by reposting content stolen from elsewhere. Craig Silverman of BuzzFeed and Alex Kaplan of Media Matters have already written about this phenomenon if you want to learn more about it.
But why would a site go through the trouble of obfuscating its headlines and making itself harder to find in Google? Doesn't it want the traffic? We don't think so: these sites tend to get most of their visitors from Facebook and other social networks, and their stolen content will rank much lower than the originals in Google anyway. But being visible in Google carries a risk for them: the original copyright owner might spot the plagiarized articles, leading to angry emails and DMCA requests to advertisers, Facebook's abuse department and website hosting providers.
Also: fact checkers like ourselves might be looking for new sites copying articles from known fake news websites by plugging the headlines into tools like our own Trendolizer or BuzzSumo in order to add these sites to various fake news watchlists. (Note: in this case we haven't fact checked the article, so we are not claiming it is true or false here.) Plugging the original headline into BuzzSumo reveals several copycats:
When operating a fake news or plagiarism website you would much prefer it if copying and pasting your headline into a search returned zero results, like this:
It keeps those pesky fact checkers and copyright lawyers off your back. At least, that was the theory. As you can see, we found out eventually...
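One way a checker could undo the trick before searching is to fold the look-alike letters back to their Latin counterparts. The sketch below is only an illustration under stated assumptions: the mapping `CYRILLIC_TO_LATIN` and the function `fold_homoglyphs` are hypothetical names of our own, the table covers just the four letters discussed in this article, and it does not describe how Trendolizer or BuzzSumo work internally.

```python
# Hypothetical, deliberately tiny mapping: only the four Cyrillic letters
# named in this article. Real-world look-alike tables are far larger.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u0456": "i",  # Cyrillic і
    "\u043E": "o",  # Cyrillic о
}
_TABLE = str.maketrans(CYRILLIC_TO_LATIN)

def fold_homoglyphs(headline):
    """Replace the mapped Cyrillic look-alikes with their Latin twins."""
    return headline.translate(_TABLE)

# Escapes keep the disguised letters unambiguous in the source code.
disguised = "n\u0430t\u0456v\u0435 tr\u0456b\u0435s"
print(fold_homoglyphs(disguised))
```

Note that a fold like this would also mangle genuine Cyrillic text, so it only makes sense for headlines that are expected to be in English.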
Update: A previous version of this article erroneously mentioned Albania as the location in Mirsad Rexhepi's registration of the nativeallnews.com domain name.