
Reducing the problem of face recognition to an average February 5, 2008

Posted by Johan in Applied, Cognition, Face Perception, Theory.
1 comment so far

Although computer software is now adept at face detection – Google’s image search does it, and so does your camera if you bought it within the past year – the problem of recognising a face as belonging to a specific individual has proved a hard nut to crack.

Essentially, this is a problem of classification. A model for this process should be able to sort images of three persons into three separate categories. This is remarkably difficult to do. If you look at the sheer physical differences between images of the same person, they easily outnumber the differences between images of different persons, taken from the same angle under the same lighting conditions. In other words, the bulk of the physical variability between different face images is uninformative, as far as face recognition is concerned. Thus, this remains an area where humans effortlessly outperform any of the currently-available face recognition models.

Recent work by Mike Burton at the Glasgow Face Recognition Group suggests a solution by which computer models can achieve human-like performance at face recognition. By implication, such a model may also offer a plausible mechanism for how humans perform this task. The model that Burton et al (2005) proposed is best explained by this figure, which outlines the necessary processing steps:

For each face that the model is to learn, a number of example images are collected (as shown in A). These images are morphed to a standard shape (B), which makes it possible to carry out pixel-by-pixel averaging to create a composite (C). This composite is then used by the model to attempt to recognise a new set of images of the person.
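
To make step C concrete, here is a minimal sketch of the pixel-by-pixel averaging, assuming the exemplars have already been morphed to the standard shape and saved as equally sized images (the morphing in step B, which is the hard part, is not shown). The folder name and file format are hypothetical.

```python
# A minimal sketch of step C: pixel-by-pixel averaging of shape-normalised images.
# Assumes every image has already been morphed to the standard shape and has the
# same dimensions; the folder "faces/clinton" is a hypothetical example.
from pathlib import Path

import numpy as np
from PIL import Image

def average_face(image_paths):
    """Return the pixel-wise mean of a set of aligned face images."""
    stack = np.stack([np.asarray(Image.open(p).convert("RGB"), dtype=np.float64)
                      for p in image_paths])
    return stack.mean(axis=0).astype(np.uint8)

exemplars = sorted(Path("faces/clinton").glob("*.png"))
composite = Image.fromarray(average_face(exemplars))
composite.save("clinton_average.png")
```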

This may sound relatively straightforward, but the idea is novel. Most face recognition models that work with photographs use an exemplar-based algorithm, in which the model stores each of the images it is shown. Such models do improve as more images of a face are added (since there are more exemplars that might possibly match), but not as much as an averaging model does as more pictures are added to the average (Burton et al, 2005). Furthermore, when noise is added in the form of greater variations in lighting, the exemplar model breaks down rapidly while the averaging model is largely unaffected.
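
The contrast between the two approaches can be sketched in a few lines. This is only a toy illustration, assuming each face image has been reduced to a flattened, shape-normalised pixel vector; Euclidean distance stands in for whatever matcher a real system would use and is not the measure used by Burton et al.

```python
# Toy contrast between an exemplar model and an averaging model. Each face image
# is assumed to be a flattened, shape-normalised vector (a numpy array);
# Euclidean distance is a placeholder for a real matching function.
import numpy as np

def recognise_by_exemplars(probe, exemplars_by_person):
    """Label the probe with the person who owns the single nearest stored exemplar."""
    return min(exemplars_by_person,
               key=lambda person: min(np.linalg.norm(probe - ex)
                                      for ex in exemplars_by_person[person]))

def recognise_by_average(probe, exemplars_by_person):
    """Label the probe with the person whose average image is nearest."""
    averages = {person: np.mean(exemplars, axis=0)
                for person, exemplars in exemplars_by_person.items()}
    return min(averages,
               key=lambda person: np.linalg.norm(probe - averages[person]))
```

The exemplar model has to hope that one of its stored snapshots happens to resemble the probe; the averaging model bets that lighting and other incidental variation cancels out in the mean.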

Why is this model so effective? The averaging process appears to remove variability that is not relevant to personal identity (such as differences in lighting and shading, changes in hair style), while preserving information that is informative for recognition (eyebrows, eyes, nose, mouth, perhaps skin texture). The figure at the top of this post highlights this (from Burton et al, 2005). The pictures are shape-free averages, created from 20 exemplar pictures of each celebrity. To the extent that hair is present, it is usually blurry. But the pictures are eminently recognisable, even though you have in fact never seen any of these particular images before (since they are composites). Indeed, Burton et al (2005) showed that participants were faster to recognise these averages than they were at recognising the individual exemplar pictures.

In the latest issue of Science, Jenkins and Burton (2008) presented an unusual demonstration of the capabilities of this model. They pitted their model against one of the dominant commercial face-recognition systems (FaceVACS). The commercial model has been implemented at MyHeritage, a website that matches pictures you submit to a database of celebrities.

Jenkins and Burton (2008) took advantage of this by feeding the website a number of images from the Burton lab’s own celebrity face database. Note that the website is all about matching your face to a celebrity, so if an image of Bill Clinton from the Burton database is given as input, you would expect the face recognition algorithm to find a strong resemblance to the Bill Clinton images stored by MyHeritage. Overall, performance was unimpressive – 20 different images of each of 25 male celebrities were used, and the commercial face recognition algorithm matched only 54% of these images to the correct person. This highlights how computationally difficult face recognition is.

In order to see how averaging might affect the model’s performance, Jenkins and Burton (2008) took the same 20 images and created a shape-free average for each celebrity. Each average was then fed into the model.

This raised the hit rate from 54% to 100%.

The model that Burton is advocating is really one where individual face images are recognised with reference to a stored average. This finding is essentially the converse – the commercial model, which attempts to store information about each exemplar, is used to identify an average. But there is no reason why it wouldn’t work the other way around.

This demonstration suggests that as far as computer science is concerned, the problem of face recognition may be within our grasp. There are a few remaining kinks before we all have to pose for 20 passport pictures instead of one, however: the model only works if each exemplar is transformed, as shown in the figure above. As I understand it, this process cannot be automated at present.

While we’re on the computer science side I think it is also worth mentioning that there may be some ethical implications to automatic face recognition, especially in a country with one CCTV camera for every 5 inhabitants (according to Wikipedia). I have always dismissed the typical Big Brother concerns with the practical issue of how anyone would have time to actually watch the footage. If, however, automatic face recognition becomes common-place, you had better hope that your government remains (relatively) benevolent, because there will be no place to hide.

Turning to psychology, the assertion by Burton et al is that this model also captures, to some extent, what the human face recognition system is doing. This sounds good until you realise that human face recognition is not hugely affected by changes in viewing position – you can recognise a face from straight on, in profile, or anywhere in between. This model can’t do that (hence the generation of a shape-free average). So if the human system works this way, it must either transform a profile view into a frontal one before comparing it to a single, frontal average, or it must store a number of averages for different orientations. The latter option leads to some bizarre predictions – for example, you should have an easier time recognising the guy who sits next to you in lectures from a profile view, because that’s how you have usually seen him.

That being said, this model offers an extremely elegant account of how face recognition might occur – read the technical description of FaceVACS to get a taste for how intensely complex most conventional face recognition models are (and by implication, how complex the human face recognition system is thought to be). The Burton model has a few things left to explain, but it is eminently parsimonious compared to previous efforts.

References
Burton, A.M., Jenkins, R., Hancock, P.J.B., & White, D. (2005). Robust representations for face recognition: The power of averages. Cognitive Psychology, 51, 256-284.

Jenkins, R., & Burton, A.M. (2008). 100% Accuracy in Automatic Face Recognition. Science, 319, 435. DOI: 10.1126/science.1149656

Evidence for shallow voters, or mere exposure? November 15, 2007

Posted by Johan in Applied, Face Perception, Social Psychology.
2 comments

Picture by Brandt Luke Zorn, Wikimedia Commons

Iacoboni has gotten in trouble recently for some bizarre, non-peer-reviewed and much-publicised studies investigating voters’ neural reactions to the different presidential candidates. Vaughan noted that it is a little surprising that Iacoboni, who has done some fantastic work, would put his name on such weak research. I couldn’t help but be reminded of a post over at Dr Petra Boynton’s blog on the shameless proposals she has received from marketing companies. Essentially, the business model is that you as a researcher either gather some junk data yourself for handsome compensation, or alternatively, you simply sign off on a ready-made article. It is a credibility-for-cash transaction.

Unfortunately, such spin doctor stories might get in the way of real research on voter behaviour. In the latest issue of PNAS, Ballew and Todorov (2007) report that election outcomes can be predicted from fast face judgements in participants who know neither of the candidates. In other words, to some extent voting behaviour is influenced by quick judgments of appearance – maybe the guy with the better hair really does win. Although this study is very interesting, there are a few shortcomings that will be discussed at the end of this post.

Ballew and Todorov gathered pictures of the winner and the runner-up from 89 gubernatorial races. The pairs were shown to participants, who picked the candidate that seemed more competent (other measures were also used, but I’ll spare you the details). In order to avoid familiarity effects, Ballew and Todorov also included a check for whether the participants recognised any of the candidates. Trials in which the participant did recognise a candidate were excluded. The paper contains three experiments, of which I will cover the first two.

In experiment 1, participants were specifically instructed to base their decision on their gut feeling of which candidate would be more competent. The stimuli were presented for 100 ms, 250 ms, or until the participants responded.

Across all conditions, the competence judgements were significantly above chance (50 percent) in predicting the elected candidate. The three conditions did not differ significantly amongst themselves. Looking across all races, the participants’ averaged “vote” achieved an accuracy of 64 percent in predicting the election outcome. This may seem like a trivial increase over chance, but keep in mind that the participants based this decision on only a very brief exposure to an unfamiliar face. The fact that they could predict the winner suggests that voter behaviour is to some extent determined by the same type of fast, automatic evaluations.

In experiment 2, Ballew and Todorov sought to investigate whether this effect could be modulated by the instructions that the participants received. Since Ballew and Todorov are advocating the notion that these judgments are automatic and fast, it becomes important to show that participants gain nothing when they have more time to plan their response. Thus, one group was instructed to deliberate carefully over their decision, and was given no time limits for viewing or responding. A response deadline group viewed the stimulus until they responded, which they had to do within 2 seconds. Finally, the 250 ms condition from experiment 1 was replicated for comparison.

In addition to this, Ballew and Todorov restricted the candidate photos to pairs in which the candidates shared the same gender and ethnicity. This was done because results in experiment 1 indicated that predictions were stronger for such pairs.

As in experiment 1, participants in all conditions were significantly more likely to pick a winning candidate. However, when investigating how each group’s “vote” predicted the election outcome, the deliberation group was not significantly above chance, while the two short-exposure non-deliberation groups were above chance, achieving an average accuracy of 70.9 percent between the two. In other words, careful deliberation and slow responding actually hindered performance.

I think these results are nice, since they offer an explanation for why candidates are so well-groomed (particularly the winners), even though no voter would ever admit to basing their choice on a candidate’s appearance. However, I see two issues with this research. First, although Ballew and Todorov asked their participants to rate competence, was this really what the participants were responding to? Given the fast processing that was necessary in the conditions where the participants performed well, it is perhaps unlikely that they were able to incorporate the instructions. Ballew and Todorov compared the ‘gut feeling’ instructions to a condition where participants were asked to deliberate, but unfortunately they confounded the instructions variable by giving the deliberation group unlimited time as well as different instructions. It would also have been nice to see a control condition where participants indicated which face was more attractive rather than more competent, to show that participants were responding to something more abstract than attractiveness.

The second problem is more fundamental. Ballew and Todorov used participants from the US who viewed US gubernatorial candidates. In other words, it is likely that participants had been exposed to some of the candidates beforehand. We know from a phenomenon called the mere exposure effect that we tend to like things that we know better. It is not unlikely that winning candidates received more media exposure, so the participants may simply have responded to their increased familiarity with the winning candidate.

Ballew and Todorov tried to control for this by removing trials where the participants reported that they recognised the candidate, but this may be insufficient. Research on the mere exposure effect shows that even subliminal exposure to an object can increase self-rated liking for it. So even if the participants didn’t recognise the face, they may still have been exposed to it, and this may have biased their ratings. You might also think that winning candidates may have gained more exposure simply by acting as governor following the election. However, this account can be ruled out by the third experiment, which I haven’t reported here. Essentially, Ballew and Todorov replicated their findings with voters before an election.

To rule out mere exposure effects more conclusively, Ballew and Todorov would have done well to use candidates from local elections in other countries, where any kind of prior exposure would be more unlikely. You can’t help but feel that in using US voters and US gubernatorial candidates, Ballew and Todorov are sacrificing accuracy of measurement for face validity and impact. It is quite powerful to show that US voters respond this way to US candidates – it drives home the point that this is an effect that likely operates outside of the lab too. That being said, I’m not sure if this is a reasonable trade-off to make.

Finally, it’s worth noting that even if Ballew and Todorov’s results really do measure mere exposure (we would need to carry out more research to confirm that), that doesn’t render the findings invalid. It merely means that the mechanism that brings about the behaviour isn’t fast, automatic judgment of facial features, but fast, unconscious biasing based on prior exposure.

References
Ballew, C.C., and Todorov, A. (2007). Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences (USA), 104, 17948-17953.

It’s the socialising, not just the bingo: new take on brain training November 5, 2007

Posted by Johan in Applied, Cognition, Social Psychology.
2 comments

Brain training is everywhere these days. From the Nintendo DS phenomenon Dr Kawashima’s Brain Training to the Neuroscientist-endorsed Mindfit, it is suddenly obvious to everyone that giving your brain a proper workout is as important to warding off dementia as getting your pulse up a few times a week is to avoiding heart disease.

I confess to being skeptical. Will my brain really benefit if I suffer through a mind-numbingly tedious working memory task? I think it depends on what your alternatives are. If your alternative is to sit silent in front of the TV, I suspect you will benefit, but isn’t there some other, less boring activity that might also help your brain?

A paper by Ybarra et al (in press) suggests that the answer to that is yes, and the alternative is socialising. Ybarra et al combined correlational and experimental designs to arrive at this conclusion. First, they used questionnaire data to show a positive relationship between the number of social interactions and cognitive functioning. The relationship held across all age groups (24-96 years), while controlling for a range of other factors.

This is a nice finding, but since there is no experimental manipulation, it is just as valid to interpret the findings to mean that intelligent people socialise more. So Ybarra et al went a step further, and recruited participants for an experimental study.

Participants were randomly assigned to three groups, where each group spent 10 minutes carrying out their task: the social interaction group discussed a current political issue, while the intellectual group did reading comprehension and mental rotation tasks, along with a crossword puzzle. There was also a passive control group who simply spent 10 minutes watching Seinfeld.

In order to assess how these different tasks affected cognitive functioning, Ybarra et al measured processing speed with a task in which participants made same-different judgements about dot patterns, and working memory with a task in which participants listened to sentences and answered questions about them, all the while keeping a section of each sentence in memory.

The table below gives the results.

While the scores for the social interaction and intellectual groups are similar, the passive control group appears to have fared worse. Indeed, significance testing revealed that on each task, the experimental groups did significantly better than the passive control group, while the social interaction and intellectual groups did not differ from each other.

It is worth noting that the intellectual task is quite similar to the type of tasks that brain training programs consist of. These results indicate that instead of suffering under Dr Kawashima, you might as well get into an argument over politics with a friend (The alternative and equally valid interpretation of the data is that watching Seinfeld rots your brain). Discussing politics might just be more fun in any case – Ybarra et al did ask the participants to rate the tasks, but found no significant difference in how much the participants liked their tasks. Still, I would argue that most people will choose a debate over a working memory task any day.

This study is quite inspiring in that a single 10-minute session of intellectual or social stimulation was enough to bring about significant differences in task performance. Furthermore, it really is a testament to the power of social interaction that the intellectual task group didn’t come out ahead, even though they had basically spent 10 minutes doing very similar tasks to the ones they were assessed with. However, a few caveats should be considered. First of all, although the intellectual task resembles actual brain training, they are not one and the same. I would love to see a direct comparison between something like Mindfit and the social interaction condition used here. Secondly, although I wasn’t entirely serious about the possibility of Seinfeld rotting your brain, the fact that performance was tested immediately following the 10-minute training session is potentially problematic. It may be that carrying out an activity, any activity, simply raises your overall awareness more than watching TV does. It would have been nice to see a re-test the following day. Finally, this test only shows an immediate effect. If social interaction is to be taken seriously as an alternative to brain training, more longitudinal studies are needed, where regular training over a longer time is used.

So to conclude, these results indicate that bingo isn’t only good for granny for this reason:

But also for this reason:

Ok, so this particular bingo game might not be to granny’s taste, but you get the point.

Of course, no one can sell you social interactions, so expect brain training business to continue as usual.

References
Ybarra, O., Burnstein, E., Winkielman, P., Keller, M.C., Manis, M., Chan, E., and Rodriguez, J. (in press). Mental exercising through simple socializing: Social interaction promotes general cognitive functioning. Personality and Social Psychology Bulletin.

Thanks to Flickr users monkey123, Keees, and aphrodite-in-nyc for fantastic pictures.

In Defense of Electroconvulsive Therapy October 30, 2007

Posted by Johan in Abnormal Psychology, Applied, Emotion.
2 comments

The TED talks website contains material for a hundred posts, but a video posted earlier today hits particularly close to home. In this talk, Sherwin Nuland, a surgeon turned writer, gives an authoritative and unexpectedly personal account of the history of electroconvulsive therapy (ECT), sometimes known as electric shock therapy. The talk is only about 20 minutes long, and gets very interesting around the 7-minute mark, where Nuland describes how ECT once saved his life, as he puts it.

If the general public could be accused of placing too much trust in antidepressant medication, the reverse is certainly true of ECT. Ask anyone about electric shock therapy, and they’ll conjure up horror stories, and associations with frontal lobotomy. This is unfair, since there is some evidence that ECT actually works for depression.

The research on this issue has produced mixed results and plenty of controversy, as reviews by Challiner and Griffiths (2000) and by the UK ECT Review Group (2003) outline. However, there is no shortage of positive findings, and this in itself is rather remarkable when you consider the patients who receive it. Since ECT is seen as rather drastic, it is only really considered for patients who are severely depressed and who have failed to respond to antidepressants. In other words, ECT is usually reserved for cases with the worst possible prognosis, so the fact that it does seem to help at times is quite powerful in itself, given how unlikely spontaneous recovery is in such conditions. That being said, a read of the ECT literature is unsatisfying. Because ECT is viewed as such a dramatic intervention (even in the absence of evidence that it causes long-term harm), it has rarely been tested on “normal” depressives in randomised controlled trials.

As Challiner and Griffiths (2000) outline, a lot of the popular conceptions of ECT are untrue. It doesn’t cause massive spasms – muscle relaxants are administered. It is not going to be a traumatic experience, because you will be put under a general anaesthetic. Although bilateral administration of ECT has been associated with memory loss, this does not appear to happen with unilateral administration, where both electrodes are kept on one side of the head (as shown in the picture at the top).

There is another issue with ECT, one which I think bothers practitioners more than clients. In the case of antidepressants, we at least know how they work, although it is far from clear why boosting synaptic serotonin levels should help, given the weak evidence for a lack of serotonin in depression. With ECT, there are no convincing explanations for either the how or the why. Psychiatrists stumbled upon ECT in the happy days of wild experimentation that preceded ethics committees, without much of a theory. It is quite embarrassing that even to this day, we can say so little about what this treatment does, or indeed whether it does anything at all – a pertinent question given the claim on Wikipedia that 1 million people receive ECT each year worldwide.

If I ever developed a severe depression, I would try ECT before antidepressants. Unlike antidepressants, the effects of ECT can be instantaneous, and there are no long-term side-effects, nor any withdrawal symptoms when the treatment ends. Since the treatment is extremely safe when administered properly, there is really very little to lose.

References
Challiner, V., and Griffiths, L. (2000). Electroconvulsive therapy: a review of the literature. Journal of Psychiatric and Mental Health Nursing, 7, 191-198.

The UK ECT Review Group. (2003). Efficacy and safety of electroconvulsive therapy in depressive disorders: a systematic review and meta-analysis. Lancet, 361, 799-808.

Hearing limitations, pt. 2: Distinguishing MP3 from CD October 16, 2007

Posted by Johan in Applied, Sensation and Perception, Social Psychology.
add a comment

As a continuation of the recent post on audiophiles, let’s look closer at how good we are at detecting the compression in digital music formats.

Most music formats, such as MP3 or the AAC format used by iTunes, define the rate of compression as the number of bits that is used to encode each second of music. The standard bitrate, as used by the iTunes Music Store and elsewhere, is 128 kbit/s. Music geeks (myself included) tend to use slightly higher bitrates, while the proper audiophiles use lossless formats that compress the file without actually removing any information. Recently, Radiohead released their new album as a free download, only to experience some fan backlash for their choice of a 160 kbit/s bitrate. Critics bemoaned the fact that this was half as much as the 320 kbit/s rate that is used on the mp3s available for purchase on their website. By comparison, the bitrate of a normal audio CD is approximately 1411 kbit/s, so clearly a lot of information is removed.
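
As a rough back-of-the-envelope check on these numbers (my own arithmetic, not from any of the sources below), the quoted bitrates translate into file sizes per minute of audio roughly as follows:

```python
# Approximate file size per minute of audio at the bitrates quoted above
# (kilobits per second converted to megabytes per minute).
def megabytes_per_minute(kbit_per_s):
    return kbit_per_s * 1000 * 60 / 8 / 1_000_000

for label, rate in [("iTunes Store", 128), ("Radiohead free download", 160),
                    ("Radiohead paid MP3", 320), ("Audio CD", 1411)]:
    print(f"{label:24s} {rate:5d} kbit/s  ~ {megabytes_per_minute(rate):5.1f} MB/min")

# At 128 kbit/s, only about 128/1411 (roughly 9%) of the CD bit budget remains,
# i.e. an ~11:1 compression ratio.
```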

But can you tell the difference? I dug out a few non-peer-reviewed sources to get an idea – if someone knows of peer-reviewed studies into this, I’d be interested to hear about them. The most serious source is probably this 1998 report from the International Organisation for Standardisation (PDF), which reports some evidence that participants could distinguish 128 kbit/s compression from the original, uncompressed source. Unfortunately, no tests were made above 128 kbit/s. More recent, but less rigorous, tests have been reported by Maximum PC and PC World.

Maximum PC elected to report their results participant-by-participant, and with a sample size of 4, maybe that’s just as well. There isn’t enough data reported in this article to actually run a binomial or another significance test, but the overall conclusion seems to be that none of the testers did well at distinguishing 160 kbit/s from the original source.

PC World’s test actually contains some descriptives, and used a sample size of 30. However, they used some fairly obscure ways of reporting their results. Clearly, in a case like this one, the optimal method is to ask the participants to guess which file is the MP3 and which is the CD, and to run a number of trials without feedback. With this approach, you can easily assess whether performance is above the level of chance (50%) for each bitrate. With this in mind, here are their results:

The percentages represent the proportion of listeners who “felt they couldn’t tell the difference” – once again, this measure is far from ideal. While we have no idea which of these differences are significant, the trend is that the differences in ratings flatten off: there appears to be no difference in quality between 192 kbit/s and 256 kbit/s, and in the case of MP3s, no real difference between 128 and 192.

Although these studies aren’t exactly hard science, they do seem to indicate that those complaining about Radiohead’s 160 kbit/s bitrate wouldn’t necessarily be able to distinguish it from CD quality, let alone from a 256 kbit/s MP3. This illustrates the human tendency to overestimate our own perceptual abilities – if we know that two things are different, we will find differences, imagined or otherwise. Blind testing is the only way to establish whether a genuine difference in sound quality exists, yet this is very rarely done.

If you want to test your own ears, try these examples. With the above in mind, it would be best to get a friend to operate the playback, so that you can’t tell from the outset which file is which. If you run a large number of trials, you can also look up whether your performance is above chance in this Binomial probability table. In psychology, .05 is the commonly accepted p value, so as an example you would need to get 15 out of 20 trials correct for your performance to be significantly better than chance at this level.
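
If you’d rather not use the table, the 15-out-of-20 figure is easy to verify with a one-sided binomial test against chance. Here is a quick sketch; the printed values are my own calculation, not taken from any of the sources above.

```python
# One-sided binomial test against chance (p = 0.5): the probability of getting
# k or more trials correct out of n purely by guessing.
from math import comb

def p_at_least(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(p_at_least(14, 20))  # ~0.058 – not significant at the .05 level
print(p_at_least(15, 20))  # ~0.021 – significant at the .05 level
```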

Update: Dave over at Cognitive Daily has answered my prayers by carrying out a nicely designed test of performance at discriminating different bitrates. In a nutshell, his results confirm the ones reported here – although his participants rated the 64 kbit/s tracks as significantly poorer in quality, no differences appeared between 128 and 256 kbit/s. Read the complete write-up here.
