Dec 10, 2006

Not All Differences Are Created Equal - Problems in Detecting Deception

First, I want to apologize for this rather long post. But there is an issue that has always bothered me when it comes to detecting deception, an issue which I don’t think gets nearly enough attention.

There is a fallacy underlying most of the research on detecting deception. And the fallacy goes something like this:

  • People’s nonverbal behavior changes when telling the truth versus lying.
  • If we can identify some of the nonverbal differences involved, then we can train people to detect deception.
This argument seems pretty logical and straightforward, but it’s not.

There is already much debate surrounding the idea that the nonverbal cues of deception can be reliably identified. But, just for the sake of argument, let’s assume that it can be done.

Here’s where I think the real problem starts. There is a big difference between identifying nonverbal cues associated with deception, and being able to use those cues to detect deception.

Specifically, the problem is that the nonverbal cues which have been identified are based on statistically significant differences. These significant differences are not the same as diagnostic differences; that is, differences which can be used to distinguish group members from each other.

I am going to try to explain this distinction using a concrete example, but first some basics about statistical significance and research on detecting deception.

To begin with, a significant difference refers to the idea that an observed outcome is probably not due to chance. And a great description of statistically significant differences can be found on the Cancer Guide website.

But, for our purposes, let's use an example. Let’s assume that we watched two groups of 60 people each. People in the first group were instructed to tell the truth about their favorite vacation and include as many details as possible. The individuals in the second group were given the same instructions, but told to lie. We could videotape everyone’s stories and count how often certain types of nonverbal behaviors occurred. Watching the tapes, we might notice that people in the lying group touched their face more often than people in the truth telling group.

And there might even be a significant difference between the two groups with respect to this nonverbal behavior. Let’s say that, on average, liars touched their face 5 times, while truth tellers only touched their face 3 times. Even though we’ve found a statistically significant difference – a difference that is unusual – this does not necessarily mean that we can use this information to detect deception. Significant differences cannot always be used in a diagnostic way. That is, in a way that reliably distinguishes group members (liars from truth tellers) from each other.

Ok, let me show you a concrete example of why this can’t typically be done.

I often teach the same course twice during the semester - a day course and a night course. And every time this happens, the students in the day class earn better grades than the students in the night class. I think this difference occurs because students taking the night course are more likely to work full-time during the day and have little free time left for studying.

Here are 2 sets of grades from the last time this happened (see Table 1).

Table 1 - Scores from two separate classes (full data is provided in Table 3).

Day Students' Scores    Night Students' Scores
92.37    81.84
94.47    75.90
...    ...
83.69    38.09
87.54    97.15

Now, the average score for the students in the day class is 85.02, or a "B," while the average score for students in the night class is 80.31 - right around a "C+" or "B-."

And there is a significant difference in grades between these two classes – the difference observed is probably not due to chance (t(118) = 1.679, p < .05). It’s a small difference, but it’s still a statistically significant difference. In other words, I can say with some confidence that students in my day class really did earn better grades than students in my night class. So far, so good. I’ve identified a significant difference that exists between two groups.
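To get a feel for just how small this difference is, here is a rough sketch that turns the reported t value into a standardized effect size. It assumes a pooled two-sample t-test with 60 students per class (which is what 118 degrees of freedom implies), and it borrows a textbook idealization that is not from my data: normally distributed scores with equal spread in both classes. The real grades have a long low tail, so treat the last number as a ballpark only.

```python
from math import sqrt
from statistics import NormalDist

# Values reported in the post.
t_stat = 1.679
mean_diff = 85.02 - 80.31          # day mean minus night mean
n_day = n_night = 60               # ASSUMPTION: equal class sizes, so df = 60 + 60 - 2 = 118

# For a pooled two-sample t-test, t = d / sqrt(1/n1 + 1/n2),
# so the standardized difference (Cohen's d) can be recovered from t.
cohens_d = t_stat * sqrt(1 / n_day + 1 / n_night)

# The pooled standard deviation implied by the mean difference and d.
pooled_sd = mean_diff / cohens_d

# ASSUMPTION: normal scores, equal standard deviations, equal group sizes.
# Under that idealized model the best single cutoff sits midway between the
# two means, and the share of students it classifies correctly is Phi(d / 2).
ideal_accuracy = NormalDist().cdf(cohens_d / 2)

print(f"Cohen's d      = {cohens_d:.2f}")
print(f"Pooled SD      = {pooled_sd:.1f} points")
print(f"Ideal accuracy = {ideal_accuracy:.1%}")
```

Under these assumptions the two classes differ by only about 0.31 standard deviations, and even a perfectly placed cutoff would sort students into the right class only about 56% of the time.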

Or to think about it in terms of detecting deception, I’ve found two groups which statistically differ from each other – just like noticing a difference between liars and truth tellers with respect to some nonverbal behavior.

But here is the twist, and where the problem emerges. I’ve now combined the two classes and reordered their scores from highest to lowest (see Table 2). You know the two groups of students are significantly different with respect to their grades, but can you tell them apart based on their scores?

In other words, can you reverse engineer the problem?

This is the same problem as trying to catch a liar by looking at his or her nonverbal behavior. Give it a try. Here are all of my students' scores. Which class does each student come from – the day class or the night class?

Table 2

Combined Students' Scores    Guess What Class?
100.00 D or N
100.00 D or N
98.44 D or N
97.88 D or N
97.61 D or N
97.15 D or N
96.82 D or N
96.59 D or N
96.57 D or N
96.30 D or N
96.29 D or N
96.05 D or N
95.85 D or N
95.85 D or N
95.84 D or N
95.78 D or N
95.60 D or N
94.47 D or N
94.02 D or N
93.90 D or N
93.84 D or N
93.78 D or N
93.71 D or N
93.51 D or N
93.25 D or N
93.15 D or N
92.73 D or N
92.64 D or N
92.49 D or N
92.47 D or N
92.37 D or N
92.09 D or N
91.96 D or N
91.85 D or N
91.85 D or N
91.85 D or N
91.70 D or N
91.57 D or N
91.44 D or N
91.44 D or N
90.92 D or N
90.66 D or N
90.15 D or N
89.77 D or N
89.74 D or N
89.50 D or N
89.37 D or N
89.10 D or N
88.84 D or N
88.73 D or N
88.33 D or N
88.17 D or N
88.17 D or N
88.08 D or N
88.05 D or N
87.89 D or N
87.81 D or N
87.54 D or N
87.54 D or N
87.29 D or N
87.13 D or N
87.08 D or N
86.77 D or N
86.60 D or N
86.56 D or N
86.34 D or N
86.06 D or N
85.81 D or N
85.75 D or N
85.73 D or N
85.72 D or N
85.03 D or N
84.95 D or N
84.44 D or N
83.69 D or N
83.68 D or N
83.16 D or N
82.63 D or N
82.40 D or N
82.13 D or N
82.10 D or N
81.89 D or N
81.84 D or N
81.84 D or N
81.06 D or N
80.79 D or N
80.55 D or N
80.36 D or N
80.28 D or N
79.51 D or N
79.25 D or N
79.00 D or N
78.70 D or N
78.48 D or N
78.22 D or N
78.21 D or N
77.96 D or N
77.95 D or N
77.43 D or N
75.90 D or N
75.13 D or N
75.01 D or N
74.33 D or N
72.41 D or N
67.33 D or N
66.83 D or N
66.55 D or N
65.75 D or N
57.99 D or N
47.20 D or N
45.77 D or N
44.30 D or N
44.19 D or N
44.19 D or N
40.42 D or N
38.09 D or N
38.09 D or N
38.09 D or N
36.30 D or N
32.64 D or N

Now, even if you were to play it safe and assume that anyone with a grade above the middlemost score was most likely from my day class, and anyone below that score was in my night class… you’d still not get it right. Take a look at the data again, this time with the right answers provided.

Table 3 – Best Guess Plus Real Answer

Combined Students' Scores    Best Guess    Actual Class    Correct Guess
100.00 D D Correct
100.00 D N Incorrect
98.44 D N Incorrect
97.88 D D Correct
97.61 D D Correct
97.15 D N Incorrect
96.82 D D Correct
96.59 D D Correct
96.57 D D Correct
96.30 D D Correct
96.29 D D Correct
96.05 D D Correct
95.85 D N Incorrect
95.85 D N Incorrect
95.84 D N Incorrect
95.78 D D Correct
95.60 D D Correct
94.47 D D Correct
94.02 D N Incorrect
93.90 D D Correct
93.84 D D Correct
93.78 D N Incorrect
93.71 D D Correct
93.51 D N Incorrect
93.25 D N Incorrect
93.15 D D Correct
92.73 D N Incorrect
92.64 D D Correct
92.49 D N Incorrect
92.47 D N Incorrect
92.37 D D Correct
92.09 D D Correct
91.96 D N Incorrect
91.85 D D Correct
91.85 D D Correct
91.85 D D Correct
91.70 D N Incorrect
91.57 D D Correct
91.44 D N Incorrect
91.44 D N Incorrect
90.92 D N Incorrect
90.66 D N Incorrect
90.15 D N Incorrect
89.77 D D Correct
89.74 D D Correct
89.50 D D Correct
89.37 D N Incorrect
89.10 D N Incorrect
88.84 D N Incorrect
88.73 D D Correct
88.33 D N Incorrect
88.17 D D Correct
88.17 D D Correct
88.08 D D Correct
88.05 D N Incorrect
87.89 D D Correct
87.81 D N Incorrect
87.54 D N Incorrect
87.54 D D Correct
87.29 D N Incorrect
87.13 N D Incorrect
87.08 N D Incorrect
86.77 N N Correct
86.60 N D Incorrect
86.56 N D Incorrect
86.34 N D Incorrect
86.06 N D Incorrect
85.81 N D Incorrect
85.75 N N Correct
85.73 N N Correct
85.72 N N Correct
85.03 N D Incorrect
84.95 N N Correct
84.44 N N Correct
83.69 N D Incorrect
83.68 N D Incorrect
83.16 N N Correct
82.63 N D Incorrect
82.40 N D Incorrect
82.13 N D Incorrect
82.10 N N Correct
81.89 N D Incorrect
81.84 N N Correct
81.84 N N Correct
81.06 N D Incorrect
80.79 N N Correct
80.55 N N Correct
80.36 N D Incorrect
80.28 N N Correct
79.51 N D Incorrect
79.25 N N Correct
79.00 N N Correct
78.70 N D Incorrect
78.48 N N Correct
78.22 N N Correct
78.21 N N Correct
77.96 N D Incorrect
77.95 N D Incorrect
77.43 N D Incorrect
75.90 N N Correct
75.13 N N Correct
75.01 N D Incorrect
74.33 N N Correct
72.41 N D Incorrect
67.33 N N Correct
66.83 N D Incorrect
66.55 N N Correct
65.75 N N Correct
57.99 N N Correct
47.20 N N Correct
45.77 N D Incorrect
44.30 N N Correct
44.19 N D Incorrect
44.19 N D Incorrect
40.42 N N Correct
38.09 N N Correct
38.09 N N Correct
38.09 N N Correct
36.30 N D Incorrect
32.64 N N Correct

Even with this play-it-safe rule, you’d only be right 53.3% of the time (64 of 120 guesses). “Just guessing” or flipping a coin would get you about the same result.
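If you want to check the 53.3% figure yourself, here is a small sketch. The string `ACTUAL` is transcribed from the "Actual Class" column of Table 3, top score first. The helper names (`median_split_guesses`, `prob_at_least`) are just labels I made up for this illustration. The second half asks how often pure coin flipping would do at least as well.

```python
from math import comb

# Actual class of each student in Table 3, highest score first
# (D = day class, N = night class), written in groups of ten.
ACTUAL = (
    "DNNDDNDDDD" "DDNNNDDDND" "DNDNNDNDNN"
    "DDNDDDNDNN" "NNNDDDNNND" "NDDDNDNNDN"
    "DDNDDDDDNN" "NDNNDDNDDD" "NDNNDNNDND"
    "NNDNNNDDDN" "NDNDNDNNNN" "DNDDNNNNDN"
)


def median_split_guesses(n):
    """Guess 'D' for the top half of the ranking and 'N' for the bottom half."""
    half = n // 2
    return "D" * half + "N" * (n - half)


def prob_at_least(k, n):
    """Chance that n fair coin flips produce at least k correct guesses."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


guesses = median_split_guesses(len(ACTUAL))
correct = sum(g == a for g, a in zip(guesses, ACTUAL))
accuracy = correct / len(ACTUAL)
coin_flip_p = prob_at_least(correct, len(ACTUAL))

print(f"{correct} of {len(ACTUAL)} correct = {accuracy:.1%}")
print(f"Chance a coin flipper does at least this well: {coin_flip_p:.2f}")
```

The median split gets 64 of 120 right. Someone flipping a coin for every student would match or beat that score roughly a quarter of the time, so the rule is doing very little work.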

This example illustrates just one of the problems that can occur when trying to catch liars based on significant differences in nonverbal behavior. And decades of research on detecting deception reveal a very similar pattern of results. Significant differences between truth tellers and liars are identified. Training programs are created. Testing shows only modest gains in people’s ability to detect deception. In fact, most studies on detecting deception show that people are not very good at it – the accuracy rate is usually around 50 to 60%.

Personally, I believe the main problem underlying research on detecting deception is that significant differences are not necessarily diagnostic differences.

For the most part, nonverbal cues associated with deception can only be seen when looking at group averages, not specific individuals.

1 comment:

Juan Cristian said...

I just stumbled across your post while trying to figure out how to better unmask an ex-spouse that fabricated domestic violence charges against me two years ago. The state of the law in the US in this regard is a shambles. Apparently it is common knowledge among family lawyers that organizations intended to protect people from legitimate spousal abuse regularly help their clients fabricate declarations and statements intended to leverage their power and entitlement in the court system. This has certainly been the case with my ex-spouse.

The trouble is that judges could really use some education when it comes to statistics, because absent any real rules of evidence, they basically profile based on gender and then run roughshod over every law governing them without much fear of repercussion, since they are pretty much unsupervised...

So I have been looking around to see if these courts have anything that they accept as evidence from men, other than very expensive lawyers, and was thinking that there must be some sort of scientific determination that the courts cannot ignore - something along the lines of polygraph testing.

This is how I landed here, but the thing that impressed me about your explanation is your point that statistically significant differences between groups are not the same as reliable predictors of individual behaviors. I'm pretty sure this is a variant on a basic rule of predicate logic - the common fallacy that if A=B then -A=-B. What it made me think of is black people in America and the notion, in big urban centers where this is the case, that since most crime is at least blamed on black people, then when you see a black person, they are most probably a criminal. Sad state of affairs. I think it is human nature to pre-judge, but when we pre-judge without an awareness of the ease with which we can believe the irrational to be rational (such as the aforementioned proposition that the black kid walking up the street must be up to no good), we become victims of our own thin veneer of rationality, misapplied. So the next time you see someone that you think is suspicious, reflect for a second and see if you're not trying to pigeonhole them based on the confusion between the observable and the diagnostic...