Detecting Fraud and Deceit: Five Linguistic Fingerprints

Last month, I described how computer-aided textual analysis can help detect fraud and deception in company communications. But what other insights can we glean from this research into scandalous companies?

We used Deception and Truth Analysis (DATA) to examine 10 of the biggest corporate scandals in recent history and found that the average lead time between our textual identification of the deception and public recognition of the potential scandal was more than six years.

Corporate Scandals: The Time Between Textual Evidence and Public Confession

Ticker | Company | Size (US$ millions) | Year of scandal | Average alert score in the run-up | Average alert score before scandal | Years of warning
LEH | Lehman Bros. | $50,000 | 2008 | -37.2% | -3.8% | 13
SAY | Satyam | $1,400 | 2009 | -28.9% | -38.4% | 6
TYC | Tyco International | $600 | 2002 | -77.1% | -81.7% | 7
WM | Waste Management | $6,000 | 1997 | -39.4% | -41.1% | 2
Total / average | | $144,290 | | -40.3% | | 6.6

The question that arises is why. Why is it taking so long for regulators and markets to learn about these scandals? And a follow-up question: What insights from the text-based analysis can we use to better identify these scandals early on? Let’s take this in turn.

Theory: It’s about behavior

Why does DATA spot deception faster than highly motivated investors and regulators? After thinking about this for a while, we developed a theory. It boils down to 86.5%: the percentage of financial information in annual reports that is expressed in text rather than numbers. Text communications reveal the behavior of a company’s management team, and that behavior drives outcomes, which eventually show up as numerical performance.

So the 6.6 years between the first textual indication of deception and the moment the scandal broke is, on average, how long a badly behaved company can fake it before it can no longer massage the numbers.

Interestingly, the two scandals that took more than a decade to come to light both involved financial firms: AIG and Lehman Brothers. Their annual reports ran to hundreds of pages, and the velocity of money flowing through their balance sheets, income statements, and cash flow statements was very high. As a result, it took a long time for their bad behaviors and choices, the inputs, to finally show up in the numbers, the outputs.

If this theory explains the lag, then the communications of scandal-bound companies should carry linguistic hallmarks that investors can pick up on, either as an early warning system or as a second opinion alongside the fundamental work that investment research teams already do.

Language that reveals a potential scandal

After examining the 10 scandals mentioned above, as well as Wirecard and other recent controversies, we identified five linguistic fingerprints that differ by more than 50% from those of the most honest companies.

Scandal words and company communications

Linguistic fingerprint | Frequency relative to average
Words denoting friendship | +56.1%
Words that indicate risk | +55.9%
Impersonal pronouns | +54.1%
Words that indicate differences | -53.6%
Words that negate a statement | +50.4%
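As a rough illustration of how relative figures like those in the table can be produced, the sketch below counts words from each category per 1,000 words and expresses a target text's rate as a percentage difference from a baseline rate. The tiny word lists come from the examples in this article and are hypothetical stand-ins; DATA’s actual dictionaries, baseline corpus, and scoring method are not described here, so treat this as an assumed approach rather than the authors’ implementation.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries built only from example words in this
# article; they are NOT the dictionaries used by DATA.
CATEGORIES = {
    "friendship": {"friend", "pal", "neighbor", "gang"},
    "risk": {"aversion", "avoidance", "concern", "difficulty", "prevent", "stop"},
    "impersonal_pronouns": {"else", "everyone", "someone", "whichever"},
    "negation": {"no", "never", "don't", "shouldn't"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; an ASCII apostrophe inside a word is kept ("don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def rates_per_thousand(text: str) -> dict[str, float]:
    """Occurrences of each category per 1,000 tokens (this normalization is an assumption)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty input
    return {
        name: 1000 * sum(counts[w] for w in words) / total
        for name, words in CATEGORIES.items()
    }


def relative_to_baseline(target: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Percentage difference of each target rate versus the baseline rate."""
    return {
        name: 100 * (target[name] - baseline[name]) / baseline[name]
        for name in target
        if baseline[name] > 0  # a zero baseline has no defined percentage change
    }


baseline = rates_per_thousand("We stop and prevent risk. No friend came.")
target = rates_per_thousand("Everyone is a friend and pal. Never stop.")
print(relative_to_baseline(target, baseline))
# {'friendship': 100.0, 'risk': -50.0, 'negation': 0.0}
```

In practice the baseline would be a large corpus of ordinary annual reports rather than one sentence; a positive result plays the role of figures such as +56.1%, and a negative one the role of -53.6%.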

In addition to the text-based analysis, we also conducted one-on-one interviews to better distinguish deception from truth and to identify some of the broader deceptive behaviors that people engage in. Our results are consistent with what previous deception researchers have found: each of the five potential indicators of deception that emerged from the text-based analysis also appeared in the interviews.

So let’s delve deeper into each of them.

1. Words denoting friendship

Deception researchers have shown that deceivers often use obfuscation to create confusion. One way to do this is to use words that denote friendship more often than is usual in business communications. According to our analysis, scandal-hit companies use such terms 56.1% more often than average. So if an annual report contains an unusually high number of these terms, it may be evidence of obfuscation and deception.

But an important distinction applies here: words that denote friendship, such as “friend,” “pal,” “neighbor,” and “gang,” are different from friendly words.
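For anyone building a similar screen, one practical consequence is that naive substring matching would count “friendly” as a hit for “friend” and “palace” as a hit for “pal.” The minimal sketch below, assuming a hypothetical word list made of the four examples above, contrasts substring matching with whole-word matching.

```python
import re

# Hypothetical word list taken from the article's examples.
FRIENDSHIP_WORDS = {"friend", "pal", "neighbor", "gang"}


def substring_hits(text: str) -> int:
    """Naive count: any token that merely contains a listed word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for t in tokens if any(w in t for w in FRIENDSHIP_WORDS))


def whole_word_hits(text: str) -> int:
    """Count only tokens that exactly equal a listed word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for t in tokens if t in FRIENDSHIP_WORDS)


sample = "Our friendly staff met a friend from the neighboring palace."
print(substring_hits(sample), whole_word_hits(sample))  # 4 1
```

Only the exact token “friend” is a friendship word here; “friendly,” “neighboring,” and “palace” are false positives under substring matching.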

2. Words that indicate risk

Scandal-prone companies use words that denote risk far more often than the average company does: 55.9% more often, according to our analysis. These include terms such as “aversion,” “avoidance,” “concern,” “difficulty,” “prevent,” and “stop.” Such words already tend to put securities analysts on alert, and, as we pointed out in the last article, companies proactively scrub these kinds of “red flag” words from their annual reports.

3. Impersonal pronouns

“Else,” “everyone,” “someone,” and “whichever” are the kinds of impersonal pronouns that dishonest companies use 54.1% more often than their honest peers. Why do they prefer impersonal language? Researchers believe deceivers are trying to create emotional distance between themselves and those they wish to mislead.

4. Words that indicate difference

Lying requires cognitive effort. One consequence is that, in the process of deceiving, the liar often lacks the capacity to weigh competing viewpoints and is therefore less likely to make comparisons. So the use of words that indicate difference is actually a sign of sincerity: scandal-hit companies use them 53.6% less often than average. Constructions that present contrasting perspectives, such as “Compared to other years…,” are examples.

Deceivers also have an agenda: to convince their target to believe their preferred story. They are therefore less likely to distinguish between alternative narratives and tend to focus on the one they favor.
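Contrastive constructions such as “Compared to other years…” span several words, so a single-word dictionary misses them. A minimal sketch, assuming a hypothetical list of contrast markers (only “compared to” comes from the article; “in contrast,” “whereas,” and “unlike” are illustrative additions), counts multi-word phrases by sliding over the token sequence.

```python
import re

# Hypothetical contrast markers stored as token tuples; multi-word phrases
# are matched as consecutive tokens.
CONTRAST_PHRASES = [("compared", "to"), ("in", "contrast"), ("whereas",), ("unlike",)]


def count_phrases(text: str, phrases: list[tuple[str, ...]]) -> int:
    """Count every position where a phrase's tokens appear consecutively."""
    tokens = re.findall(r"[a-z]+", text.lower())
    count = 0
    for i in range(len(tokens)):
        for phrase in phrases:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                count += 1
    return count


sample = "Compared to other years, margins fell; in contrast, revenue rose."
print(count_phrases(sample, CONTRAST_PHRASES))  # 2
```

Under the article’s finding, a lower rate of such markers than in peer reports would count toward the deception signal, not away from it.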

5. Words that negate a statement

Research also indicates that liars often use negative terms more than truth-tellers do. This is why the distinction between words denoting friendship and friendly words matters.

Researchers do not always find that deceivers are more negative than truth-tellers. However, our analysis of dishonest company communications indicates that they tend to use words like “no,” “never,” “shouldn’t,” and “don’t” 50.4% more often than average.
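Several of the negation words listed above are contractions, and published reports often set them with a typographic apostrophe (’, U+2019) rather than the ASCII one. A minimal sketch, assuming a hypothetical negation list drawn from the article’s examples, shows how skipping apostrophe normalization silently undercounts them.

```python
import re

# Hypothetical negation list based on the article's examples.
NEGATIONS = {"no", "never", "don't", "shouldn't"}
TOKEN_RE = r"[a-z]+(?:'[a-z]+)*"  # keeps ASCII apostrophes inside words


def normalize_apostrophes(text: str) -> str:
    """Map typographic apostrophes (U+2019, U+2018) to the ASCII apostrophe."""
    return text.replace("\u2019", "'").replace("\u2018", "'")


def naive_count(text: str) -> int:
    """Without normalization, "don\u2019t" splits into "don" and "t"."""
    return sum(1 for t in re.findall(TOKEN_RE, text.lower()) if t in NEGATIONS)


def count_negations(text: str) -> int:
    """Normalize apostrophes first, then count negation tokens."""
    tokens = re.findall(TOKEN_RE, normalize_apostrophes(text).lower())
    return sum(1 for t in tokens if t in NEGATIONS)


sample = "We don\u2019t expect losses. We never have, and no one should."
print(naive_count(sample), count_negations(sample))  # 2 3
```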

So what is the strongest indicator of deception? The number of swear words in the annual report. Although rare, swear words occur in the annual reports of scandal-hit companies a whopping 277.1% more often than average.

If you liked this post, don’t forget to subscribe to Enterprising Investor.

All posts are the opinion of the author. As such, they should not be construed as investment advice, nor do the opinions expressed necessarily reflect the views of CFA Institute or the author’s employer.

Photo credit: © Getty Images / Matthias Kulka

Professional learning for CFA Institute members

CFA Institute members are empowered to self-determine and self-report professional learning (PL) credits earned, including content on Enterprising Investor. Members can record credits easily using their online PL tracker.
