Evidence for shallow voters, or mere exposure? November 15, 2007 Posted by Johan in Applied, Face Perception, Social Psychology.
Picture by Brandt Luke Zorn, Wikimedia Commons
Iacoboni has gotten in trouble recently for some bizarre, non-peer-reviewed and much publicised studies investigating voters’ neural reactions to the different presidential candidates. Vaughan noted that it is a little surprising that Iacoboni, who has done some fantastic work, would put his name on such weak research. I couldn’t help but be reminded of a post over at Dr Petra Boynton’s blog on the shameless proposals she has received from marketing companies. Essentially, the business model is that you as a researcher either gather some junk data yourself for handsome compensation, or alternatively, you simply sign off on a ready-made article. It is a credibility-for-cash transaction.
Unfortunately, such spin doctor stories might get in the way of real research on voter behaviour. In the latest issue of PNAS, Ballew and Todorov (2007) report that election outcomes can be predicted from fast face judgements in participants who know neither of the candidates. In other words, to some extent voting behaviour is influenced by quick judgments of appearance – maybe the guy with the better hair really does win. Although this study is very interesting, there are a few shortcomings that will be discussed at the end of this post.
Ballew and Todorov gathered pictures of the winner and the runner-up from 89 gubernatorial races. The pairs were shown to participants, who picked the candidate that seemed more competent (other measures were also used, but I’ll spare you the details). In order to avoid familiarity effects, Ballew and Todorov also included a check for whether the participants recognised any of the candidates. Trials in which the participant did recognise a candidate were excluded. The paper contains three experiments, of which I will cover the first two.
In experiment 1, participants were specifically instructed to base their decision on their gut feeling of which candidate would be more competent. The stimuli were presented for 100 ms, 250 ms, or until the participants responded.
Across all conditions, the competence judgements were significantly above chance (50 percent) in predicting the elected candidate. The three conditions did not differ significantly amongst themselves. Looking across all races, the participants’ averaged “vote” achieved an accuracy of 64 percent in predicting the election outcome. This may seem like a trivial increase over chance, but keep in mind that the participants based this decision on only a very brief exposure to an unfamiliar face. The fact that they could predict the winner suggests that voter behaviour is to some extent determined by the same type of fast, automatic evaluations.
In experiment two, Ballew and Todorov sought to investigate whether this effect could be modulated by the instructions that the participants received. Since Ballew and Todorov are advocating the notion that these judgments are automatic and fast, it becomes important to show that participants gain nothing when they have more time to plan their response. Thus, one group was instructed to deliberate carefully over their decision, and were given no time limits for viewing or responding. A response deadline group viewed the stimulus until they responded, which they had to do within 2 seconds. Finally, the 250 ms condition from experiment 1 was replicated for comparison.
In addition to this, Ballew and Todorov restricted the candidate photos to pairs in which the candidates shared the same gender and ethnicity. This was done since results in experiment 1 indicated that predictions were stronger for such pairs.
As in experiment 1, participants in all conditions were significantly more likely to pick a winning candidate. However, when investigating how each group’s “vote” predicted the election outcome, the deliberation group was not significantly above chance, while the two short-exposure non-deliberation groups were above chance, achieving an average accuracy of 70.9 percent between the two. In other words, careful deliberation and slow responding actually hindered performance.
I think these results are nice, since they offer an explanation for why candidates are so well-groomed (particularly the winners), even though no voter would ever admit to basing their choice on the candidate’s appearance. However, I see two issues with this research. First, although Ballew and Todorov asked their participants to rate competence, was this really what the participants were responding to? Given the fast processing that was necessary in the conditions where the participants performed well, it is perhaps unlikely that they were able to incorporate the instructions. Ballew and Todorov compared the ‘gut feeling’ instructions to a condition where participants were asked to deliberate, but unfortunately they confounded the ‘instructions’ variable by giving the participants in the deliberation group unlimited time in addition to different instructions, so the two factors cannot be separated. It would also have been nice to see a control condition where participants indicated which face was more attractive rather than more competent, to show that participants were responding to something more abstract than attractiveness.
The second problem is more fundamental. Ballew and Todorov used participants from the US who viewed US gubernatorial candidates. In other words, it is likely that participants had been exposed to some of the candidates beforehand. We know from a phenomenon called the mere exposure effect that we tend to like things that we know better. It is not unlikely that winning candidates received more media exposure, so the participants may simply have responded to their increased familiarity with the winning candidate.
Ballew and Todorov tried to control for this by removing trials where the participants reported that they recognised the candidate, but this may be insufficient. Research on the mere exposure effect shows that even subliminal exposure to an object can increase self-rated liking for it. So even if the participants didn’t recognise the face, they may still have been exposed to it, and this may have biased their ratings. You might also think that winning candidates may have gained more exposure simply by acting as governor following the election. However, this account can be ruled out by the third experiment, which I haven’t reported here. Essentially, Ballew and Todorov replicated their findings with voters before an election.
To rule out mere exposure effects more conclusively, Ballew and Todorov would have done well to use candidates from local elections in other countries, where any kind of prior exposure would be more unlikely. You can’t help but feel that in using US voters and US gubernatorial candidates, Ballew and Todorov are sacrificing accuracy of measurement for face validity and impact. It is quite powerful to show that US voters respond this way to US candidates – it drives home the point that this is an effect that likely operates outside of the lab too. That being said, I’m not sure if this is a reasonable trade-off to make.
Finally, it’s worth noting that even if Ballew and Todorov’s results really do measure mere exposure (we would need to carry out more research to confirm that), that doesn’t render the findings invalid. It merely means that the mechanism that brings about the behaviour isn’t fast, automatic judgment of facial features, but fast, unconscious biasing based on prior exposure.
Ballew, C.C., and Todorov, A. (2007). Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences (USA), 104, 17948-17953.
It’s the socialising, not just the bingo: new take on brain training November 5, 2007 Posted by Johan in Applied, Cognition, Social Psychology.
Brain training is everywhere these days. From the Nintendo DS phenomenon Dr Kawashima’s Brain Training to the Neuroscientist-endorsed Mindfit, it is suddenly obvious to everyone that giving your brain a proper workout is as important to warding off dementia as getting your pulse up a few times a week is to avoiding heart disease.
I confess to being skeptical. Will my brain really benefit if I suffer through a mind-numbingly tedious working memory task? I think it depends on what your alternatives are. If your alternative is to sit silent in front of the TV, I suspect you will benefit, but isn’t there some other, less boring activity that might also help your brain?
A paper by Ybarra et al (in press) suggests that the answer to that is yes, and the alternative is socialising. Ybarra et al combined correlational and experimental designs to arrive at this conclusion. First, they used questionnaire data to show a positive relationship between the number of social interactions and cognitive functioning. The relationship held for all age groups (24-96), while controlling for a range of other factors.
This is a nice finding, but since there is no experimental manipulation, it is just as valid to interpret the findings to mean that intelligent people socialise more. So Ybarra et al went a step further, and recruited participants for an experimental study.
Participants were randomly assigned to three groups, where each group spent 10 minutes carrying out their task: the social interaction group discussed a current political issue, while the intellectual group did reading comprehension and mental rotation tasks, along with a crossword puzzle. There was also a passive control group who simply spent 10 minutes watching Seinfeld.
In order to assess how these different tasks affected cognitive functioning, Ybarra et al estimated processing speed via a task where participants made same-different judgements about dots, and a working memory task, where participants were read sentences which they had to answer questions about, all the while keeping a section of the sentence in memory.
The table below gives the results.
While the scores for the social interaction and intellectual groups are similar, the passive control group appears to have fared worse. Indeed, significance testing revealed that on each task, the experimental groups did significantly better than the passive control group, while the social interaction and intellectual groups did not differ from each other.
It is worth noting that the intellectual task is quite similar to the type of tasks that brain training programs consist of. These results indicate that instead of suffering under Dr Kawashima, you might as well get into an argument over politics with a friend (The alternative and equally valid interpretation of the data is that watching Seinfeld rots your brain). Discussing politics might just be more fun in any case – Ybarra et al did ask the participants to rate the tasks, but found no significant difference in how much the participants liked their tasks. Still, I would argue that most people will choose a debate over a working memory task any day.
This study is quite inspiring in that a single 10-minute session of intellectual or social stimulation was enough to bring about significant differences in task performance. Furthermore, it really is a testament to the power of social interaction that the intellectual task group didn’t come out ahead, even though they had basically spent 10 minutes doing very similar tasks to the ones they were assessed with. However, a few caveats should be considered. First of all, although the intellectual task resembles actual brain training, they are not one and the same. I would love to see a direct comparison between something like Mindfit and the social interaction condition used here. Secondly, although I wasn’t entirely serious about the possibility of Seinfeld rotting your brain, the fact that performance was tested immediately following the 10-minute training session is potentially problematic. It may be that carrying out an activity, any activity, simply raises your overall awareness more than watching TV does. It would have been nice to see a re-test the following day. Finally, this test only shows an immediate effect. If social interaction is to be taken seriously as an alternative to brain training, more longitudinal studies are needed, where regular training over a longer time is used.
So to conclude, these results indicate that bingo isn’t only good for granny for this reason:
But also for this reason:
Ok, so this particular bingo game might not be to granny’s taste, but you get the point.
Of course, no one can sell you social interactions, so expect brain training business to continue as usual.
Ybarra, O., Burnstein, E., Winkielman, P., Keller, M.C., Manis, M., Chan, E., and Rodriguez, J. (in press). Mental exercising through simple socializing: Social interaction promotes general cognitive functioning. Personality and Social Psychology Bulletin.
Hearing limitations, pt. 2: Distinguishing MP3 from CD October 16, 2007 Posted by Johan in Applied, Sensation and Perception, Social Psychology.
As a continuation of the recent post on audiophiles, let’s look closer at how good we are at detecting the compression in digital music formats.
Most music formats, such as MP3 or the AAC format used by iTunes, define the rate of compression as the number of bits that is used to encode each second of music. The standard bitrate, as used by the iTunes Music Store and elsewhere, is 128 kbit/s. Music geeks (myself included) tend to use slightly higher bitrates, while the proper audiophiles use lossless formats that compress the file without actually removing any information. Recently, Radiohead released their new album as a free download, only to experience some fan backlash for their choice of a 160 kbit/s bitrate. Critics bemoaned the fact that this was half as much as the 320 kbit/s rate that is used on the mp3s available for purchase on their website. By comparison, the bitrate of a normal audio CD is approximately 1411 kbit/s, so clearly a lot of information is removed.
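To make the arithmetic behind the CD figure concrete, here is a minimal sketch. It assumes the standard audio CD parameters (44.1 kHz sampling, 16 bits per sample, two channels), which the post itself does not spell out; the function names are made up for illustration.

```python
# Standard audio CD parameters (an assumption; the post only quotes the result).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo


def pcm_bitrate_kbits(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_factor(compressed_kbits, source_kbits):
    """How many times smaller the compressed stream is than the source."""
    return source_kbits / compressed_kbits


cd_kbits = pcm_bitrate_kbits(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"CD bitrate: {cd_kbits:.1f} kbit/s")

# The bitrates mentioned in the post.
for rate in (128, 160, 320):
    factor = reduction_factor(rate, cd_kbits)
    print(f"{rate} kbit/s is roughly {factor:.1f}x smaller than CD audio")
```

By this reckoning a 128 kbit/s file carries about one eleventh of the bits of the CD original, and even the 320 kbit/s version carries less than a quarter.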
But can you tell the difference? I dug out a few non-peer-reviewed sources to get an idea – if someone knows of peer-reviewed studies into this, I’d be interested to hear about them. The most serious source is probably this 1998 report from the international organisation for standardisation (PDF), which reports some evidence that participants could distinguish 128 kbit/s compression from the original, uncompressed source. Unfortunately, no tests were made above 128 kbit/s. More recent, but less rigorous tests have been reported by Maximum PC and PC World.
Maximum PC elected to report their results participant-by-participant, and with a sample size of 4, maybe that’s just as well. There isn’t enough data reported in this article to actually run a binomial or another significance test, but the overall conclusion seems to be that none of the testers did well at distinguishing 160 kbit/s from the original source.
PC World’s test actually contains some descriptives, and used a sample size of 30. However, they used some fairly obscure ways of reporting their results. Clearly, in a case like this one, the optimal method is to ask the participants to guess which file is the MP3 and which is the CD, and run a number of trials without feedback. With this approach, you can easily assess whether performance is over the level of chance (50%) for each bitrate. With this in mind, here are their results:
The percentages represent the proportion of listeners who “felt they couldn’t tell the difference” – once again, this measure is far from ideal. While we have no idea which of these differences are significant, the trend is that the differences in ratings flatten off: there appears to be no difference in quality between 192 kbit/s and 256 kbit/s, and in the case of MP3s, no real difference between 128 and 192.
Although these studies aren’t exactly hard science, they do seem to indicate that those complaining about Radiohead’s 160 kbit/s bitrate wouldn’t necessarily be able to distinguish it from CD quality, let alone a 256 kbit/s mp3. This illustrates the human tendency to overestimate our own perceptual ability – if we know that two things are different, we will find differences, imagined or otherwise. Blind testing is the only way to establish whether a genuine difference in sound quality exists, yet this is very rarely done.
If you want to test your own ears, try these examples. With the above in mind, it would be best to get a friend to operate the playback, so that you can’t tell from the outset which file is which. If you run a large number of trials, you can also look up whether your performance is above chance in this Binomial probability table. In psychology, .05 is the commonly accepted significance threshold, so as an example you would need to get 15 out of 20 trials correct for your performance to be significantly better than chance at this level.
Update: Dave over at Cognitive Daily has answered my prayers by carrying out a nicely designed test of performance at discriminating different bitrates. In a nutshell, his results confirm the ones reported here – although the participants rated the 64 kbit/s tracks as significantly poorer in quality, no differences appeared between 128 and 256 kbit/s. Read the complete write-up here.
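If you would rather compute the cut-off than look it up, the sketch below works out the one-sided binomial probability of getting at least a given number of trials right by pure guessing. It is only an illustration: the function names are made up, and it assumes independent trials with a 50% chance of guessing correctly on each.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """Probability of k or more correct out of n trials when guessing with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n, alpha=0.05, p=0.5):
    """Smallest number correct whose one-sided p value falls below alpha.

    Returns None if even a perfect score cannot reach significance.
    """
    for k in range(n + 1):
        if p_at_least(k, n, p) < alpha:
            return k
    return None


print(min_correct(20))                # 15
print(round(p_at_least(15, 20), 4))   # 0.0207
print(round(p_at_least(14, 20), 4))   # 0.0577
```

With 20 trials, 14 correct just misses the .05 level while 15 clears it, which matches the figure from the table. Note also that with very few trials (four, say) even a perfect score cannot reach significance, so it pays to run plenty of them.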
Audiophiles and the limitations of human hearing October 13, 2007 Posted by Johan in Applied, Sensation and Perception, Social Psychology.
The other week Gizmodo posted an amusing rant about a set of $7250 speaker cables, and the gushing review they received. Among other things, the reviewer referred to the cables as “danceable.” James Randi soon popped around to offer his $1 million prize to the cable company, if they could prove that their cables outperform “normal” Monster cables in a double-blind test.
This is actually an issue of the limits of human perception. Is it really possible to tell the difference between normal high-end equipment, and equipment that veers into the audiophile range? It’s clear that according to many audiophiles, the answer is going to be yes. Wikipedia informs us that there are actually two schools among audio enthusiasts: the objectivist school, which favours double-blind testing, and subjectivists, who favour a more philosophical approach. The review that caused so much ire comes from Positive Feedback, an online magazine that concerns itself with the “audio arts” – guess which school they subscribe to.
Among subjectivist audiophiles, there is a belief that almost any change to the stereo setup results in a perceptible difference in sound. This results in bizarre behaviours, as in this picture from the Positive Feedback staff page:
Note how the speaker cables are carefully propped up on stilts to keep them off the floor, and how what looks like power amplifiers are propped up on massive slabs of wood (I can assure you those didn’t come from the local lumberyard). Another nice example comes from an article in Hi-Fi magazine Masters on Video and Audio:
“The [product] tightened up the sounds of a wide variety of equipment, the improvements often most noticeable in the bass. Imaging and focus usually improved, as did the interstitial quiet, which raised the level of overall palpability, air, and transparency.”
The product? Shelves.
There is one obvious objection to raise here: judging by the pictures of the reviewers on sites such as Positive Feedback, most of them are in their 40s and beyond. As the following figure shows, this spells trouble:
I grabbed this from a lecture handout, so unfortunately I don’t know the source. The lines plot performance at detecting sounds over age, with each line representing a frequency. In this case “hearing level” is a standardised measure where normal hearing is at 0 dB. The clear pattern is that the higher frequencies disappear with age. This figure only goes up to 6 kHz, but it’s worth noting that the human ear can hear up to 20 kHz, and that the loss is more dramatic the higher up you go.
In other words, it’s not really worth trusting an audio reviewer who is older than you are, because there is a range of higher frequencies that you can hear while they cannot.
Apart from the overall lack of evidence and the sheer physical implausibility of some of the products, there is some classic research in social psychology that has implications for this topic.
Cognitive dissonance theory was primarily developed by Festinger. Briefly, the idea is that when the individual finds himself in a state where internal beliefs conflict with reality, there is dissonance, which is an unpleasant state. The individual may then employ a number of mechanisms to get around the dissonance, ranging from simply acknowledging that the beliefs were wrong to attacking the reality of external events, or devaluing the conflict.
The classic cognitive dissonance study is one where students perform a dull experiment, and are then paid a small or a large amount of money for telling the next participant that the experiment is actually fun (Festinger & Carlsmith, 1959). Surprisingly, students who are paid less actually rate the dull task as more interesting. In this case, the student finds himself (all males in Psychology studies in those days, generally) in a conflict: he has just done a boring experiment and lied to a fellow student for a very small reward. According to Festinger and Carlsmith, the student then reduces dissonance by re-evaluating the task. If the task was actually fun, then there is no dissonance between the student’s actions and beliefs.
The implication for consumer behaviour is that when your green $7250 cables arrive in the mail and you plug them in, finding that they do nothing would result in unacceptable dissonance. In fact, cognitive dissonance theory predicts that the more you pay for the cables, the more inclined you will be to conclude that they sound good, regardless of the actual quality of the cables. In this context, it is worth noting that the Positive Feedback website states that their policy is that reviewers should own the equipment they review, which is a very unusual policy in light of cognitive dissonance theory.
There is another classic social psychology study that is relevant here: Sherif’s investigation of the autokinetic effect (1935). To observe this effect, place yourself in an absolutely dark room with a single, faint light source. The spot of light will appear to move around as a result of small eye movements that your brain normally filters out. Sherif’s participants didn’t know about this however, so they really thought the light moved.
When the participants rated the light’s movement individually, there was considerable variation between the participants in how far they judged the light to have moved. Sherif then placed participants in groups and asked them to call out the movements of the light. Now, there was a convergence effect, so that the estimates of the different participants came closer to each other, and remained close in subsequent individual re-tests.
If you and your friend are listening to a new stereo and she mentions that the low bass sounds a bit flat, you are going to hear it too. The sound itself is ambiguous, not to mention the terminology that audiophiles use, so Sherif’s study suggests that in such situations, you will align with the group. You can imagine that this tendency to conform is quite useful in many real-life contexts, but it does mean that wine sampling and stereo testing are unlikely to reflect anything other than your tendency toward conformity. That doesn’t mean it can’t be fun, of course.
You can test this out yourself if you ever find yourself at a wine sampling. Make up associations: say the wine tastes like blackcurrant (always a winner), sandalwood, tobacco, myrrh. As long as your ideas aren’t too far off, you will find that others suddenly experience the taste too.
While our senses are rather limited, our ability to fool ourselves is almost endless.
Festinger, L., and Carlsmith, J.M. (1959). Cognitive Consequences of Forced Compliance. Journal of Abnormal and Social Psychology, 58, 203-210.
Sherif, M. (1935). A study of some social factors in perception. Archives of Psychology, 27.