
Reducing the problem of face recognition to an average February 5, 2008

Posted by Johan in Applied, Cognition, Face Perception, Theory.

Although computer software is now adept at face detection (Google’s image search does it, and so does your camera if you bought it within the past year), the problem of recognising a face as belonging to a specific individual has proved a hard nut to crack.

Essentially, this is a problem of classification. A model for this process should be able to sort, say, images of three people into three separate categories, one per person. This is remarkably difficult to do. The physical differences between two images of the same person can easily exceed the differences between images of two different people taken from the same angle under the same lighting conditions. In other words, as far as face recognition is concerned, the bulk of the physical variability between face images is uninformative. Thus, this remains an area where humans effortlessly outperform any of the currently available face recognition models.

Recent work by Mike Burton at the Glasgow Face Recognition Group suggests a way in which computer models can achieve human-like performance at face recognition. By implication, such a model may also offer a plausible mechanism for how humans perform this task. The model that Burton et al (2005) proposed is best explained by this figure, which outlines the necessary processing steps:

For each face that the model is to learn, a number of example images are collected (as shown in A). These images are morphed to a standard shape (B), which makes it possible to carry out pixel-by-pixel averaging to create a composite (C). This composite is then used by the model to attempt to recognise a new set of images of the person.
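Steps B and C can be sketched in a few lines. This is a minimal illustration, not Burton et al’s implementation. It assumes the morphing in step B has already been done, so that every image is a flat list of grey-level pixel values of equal length. The function name `shape_free_average` is hypothetical.

```python
from statistics import fmean


def shape_free_average(images: list[list[float]]) -> list[float]:
    """Pixel-by-pixel mean of images already morphed to a standard shape.

    Assumption: step B (morphing) was done elsewhere, so pixel i refers to
    the same facial location in every image.
    """
    if not images:
        raise ValueError("need at least one image")
    width = len(images[0])
    if any(len(img) != width for img in images):
        raise ValueError("all images must share the standard shape")
    # zip(*images) groups together the values found at each pixel position
    return [fmean(pixel_values) for pixel_values in zip(*images)]


# Toy example: the same two-pixel "face" photographed once brighter and once
# darker. The lighting offsets cancel out in the composite.
bright = [1.5, 2.5]
dark = [0.5, 1.5]
print(shape_free_average([bright, dark]))  # [1.0, 2.0]
```

Real images have many more pixels, but the principle is the same. Anything that varies haphazardly across exemplars, such as lighting, tends to be averaged out, while anything that stays constant survives into the composite.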

This may sound relatively straightforward, but the idea is novel. Most face recognition models that work with photographs use an exemplar-based algorithm, in which the model stores each of the images it is shown. Such models do improve as more images are added, because there are more exemplars that might match a new image. However, they do not improve as much as an averaging model does when more pictures are added to the average (Burton et al, 2005). Furthermore, when noise is added in the form of greater variation in lighting, the exemplar model breaks down rapidly while the averaging model is largely unaffected.
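The difference between the two kinds of model is easiest to see side by side. The sketch below is a toy and not either of the models Burton et al (2005) tested. Faces are two-number vectors, and similarity is assumed to be Euclidean distance. The names `anna` and `ben` and the probe values are made up, and they were chosen by hand so that the two models disagree. The exemplar model keeps every image and answers with the label of the single nearest one. The averaging model keeps one mean per person and answers with the label of the nearest mean.

```python
import math
from statistics import fmean

Image = tuple[float, ...]


def exemplar_recognise(probe: Image, gallery: dict[str, list[Image]]) -> str:
    """Exemplar model: store every image, answer with the nearest single image."""
    best_name, best_dist = None, math.inf
    for name, images in gallery.items():
        for image in images:
            d = math.dist(probe, image)
            if d < best_dist:
                best_name, best_dist = name, d
    if best_name is None:
        raise ValueError("gallery is empty")
    return best_name


def average_recognise(probe: Image, gallery: dict[str, list[Image]]) -> str:
    """Averaging model: collapse each person into one mean, answer with the nearest mean."""
    averages = {
        name: tuple(fmean(values) for values in zip(*images))
        for name, images in gallery.items()
    }
    return min(averages, key=lambda name: math.dist(probe, averages[name]))


# Hypothetical gallery: two photos per person, one lit brightly, one darkly.
gallery = {
    "anna": [(1.0, 1.0), (-1.0, -1.0)],  # mean (0, 0)
    "ben": [(3.0, 1.0), (5.0, 3.0)],     # mean (4, 2)
}
probe = (2.2, 0.4)  # hand-picked: closest to one of Ben's photos, closer to Anna's mean
print(exemplar_recognise(probe, gallery))  # ben
print(average_recognise(probe, gallery))   # anna
```

The point here is structural rather than empirical. A single unusually lit photo can capture the exemplar model, whereas the averaging model answers relative to each person’s central tendency. Whether that helps with real photographs is what Burton et al actually tested.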

Why is this model so effective? The averaging process appears to remove variability that is irrelevant to personal identity (such as differences in lighting and shading, or changes in hair style), while preserving information that is useful for recognition (eyebrows, eyes, nose, mouth, perhaps skin texture). The figure at the top of this post highlights this (from Burton et al, 2005). The pictures are shape-free averages, each created from 20 exemplar pictures of a celebrity. To the extent that hair is present, it is usually blurry. But the pictures are eminently recognisable, even though you have in fact never seen any of these particular images before (since they are composites). Indeed, Burton et al (2005) showed that participants were faster at recognising these averages than at recognising the individual exemplar pictures.

In the latest issue of Science, Jenkins and Burton (2008) presented an unusual demonstration of the capabilities of this model. They pitted it against one of the dominant commercial face recognition systems, FaceVACS. This commercial system is used by MyHeritage, a website that matches pictures you submit to a database of celebrities.

Jenkins and Burton (2008) took advantage of this by feeding the website a number of images from the Burton lab’s own celebrity face database. Because the website is all about matching your face to a celebrity, an image of Bill Clinton from the Burton database should lead the algorithm to find a strong resemblance to the Bill Clinton images stored by MyHeritage. Overall, performance was unimpressive. Using 20 different images of each of 25 male celebrities, the commercial algorithm matched only 54% of the images to the correct person. This highlights how computationally difficult face recognition is.

In order to see how averaging might affect the model’s performance, Jenkins and Burton (2008) took the same 20 images and created a shape-free average for each celebrity. Each average was then fed into the model.

This raised the hit rate from 54% to 100%.
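The two percentages are computed over different numbers of test items, so it is worth spelling out the arithmetic. Using only the figures given above:

```latex
\begin{aligned}
\text{individual images:}\quad & 25 \times 20 = 500 \text{ trials}, & 0.54 \times 500 &= 270 \text{ correct} \\
\text{averages:}\quad & 25 \times 1 = 25 \text{ trials}, & 1.00 \times 25 &= 25 \text{ correct}
\end{aligned}
```

The 100% figure therefore rests on 25 trials, one per celebrity. The 54% figure rests on 500.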

The model that Burton is advocating is really one where individual face images are recognised with reference to a stored average. This finding is essentially the converse – the commercial model, which attempts to store information about each exemplar, is used to identify an average. But there is no reason why it wouldn’t work the other way around.

This demonstration suggests that as far as computer science is concerned, the problem of face recognition may be within our grasp. There are a few remaining kinks before we all have to pose for 20 passport pictures instead of one, however: the model only works if each exemplar is transformed, as shown in the figure above. As I understand it, this process cannot be automated at present.

While we’re on the computer science side, I think it is also worth mentioning that automatic face recognition may have some ethical implications, especially in a country with one CCTV camera for every 5 inhabitants (according to Wikipedia). I have always dismissed the typical Big Brother concerns by pointing to the practical issue of how anyone would find the time to actually watch the footage. If, however, automatic face recognition becomes commonplace, you had better hope that your government remains (relatively) benevolent, because there will be no place to hide.

Turning to psychology, Burton et al assert that this model also captures, to some extent, what the human face recognition system is doing. This sounds good until you realise that face recognition is not hugely affected by changes in viewpoint: you can recognise a face from straight on, in profile, or somewhere in between. This model can’t do that (hence the generation of a shape-free average). If the human system works this way, it must take one of two routes. Either it transforms a profile image into a full-face image in order to compare it to a single full-face average, or it stores a number of averages for different orientations. The latter leads to some bizarre predictions. For example, you should have an easier time recognising the guy who sits next to you in lectures from a profile image, because that’s how you have usually viewed him.
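The second option, storing several averages per person, can be sketched as follows. This is my own toy illustration of the idea, not something Burton et al propose. The view labels, the "classmate" and the two-number "images" are all hypothetical, and similarity is again assumed to be Euclidean distance. The key feature is that each (person, view) average is built only from exemplars seen from that view, so the quality of each average depends on how often you have seen someone from that angle.

```python
import math
from collections import defaultdict
from statistics import fmean


def build_view_averages(training):
    """training: iterable of (identity, view, image) triples.

    Returns ({(identity, view): average image}, {(identity, view): exemplar count}).
    """
    grouped = defaultdict(list)
    for identity, view, image in training:
        grouped[(identity, view)].append(image)
    averages = {key: tuple(fmean(v) for v in zip(*imgs)) for key, imgs in grouped.items()}
    counts = {key: len(imgs) for key, imgs in grouped.items()}
    return averages, counts


def recognise(probe, averages):
    """Return the (identity, view) pair whose stored average is nearest the probe."""
    return min(averages, key=lambda key: math.dist(probe, averages[key]))


# Hypothetical classmate, mostly seen side-on in lectures.
training = [
    ("classmate", "profile", (0.9, 2.1)),
    ("classmate", "profile", (1.1, 1.9)),
    ("classmate", "profile", (1.0, 2.0)),
    ("classmate", "frontal", (3.0, 0.5)),
]
averages, counts = build_view_averages(training)
print(counts)
print(recognise((1.05, 2.0), averages))
```

In this scheme, the classmate’s profile average pools three exemplars and the frontal one only a single exemplar. That asymmetry is what lies behind the prediction about recognising him from the side.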

That being said, this model offers an extremely elegant account of how face recognition might occur – read the technical description of FaceVACS to get a taste for how intensely complex most conventional face recognition models are (and by implication, how complex the human face recognition system is thought to be). The Burton model has a few things left to explain, but it is eminently parsimonious compared to previous efforts.

Burton, A.M., Jenkins, R., Hancock, P.J.B., & White, D. (2005). Robust representations for face recognition: The power of averages. Cognitive Psychology, 51, 256-284.

Jenkins, R., & Burton, A.M. (2008). 100% Accuracy in Automatic Face Recognition. Science, 319, 435. DOI: 10.1126/science.1149656


It’s the socialising, not just the bingo: new take on brain training November 5, 2007

Posted by Johan in Applied, Cognition, Social Psychology.


Brain training is everywhere these days. From the Nintendo DS phenomenon Dr Kawashima’s Brain Training to the neuroscientist-endorsed Mindfit, it is suddenly obvious to everyone that giving your brain a proper workout is as important to warding off dementia as getting your pulse up a few times a week is to avoiding heart disease.

I confess to being skeptical. Will my brain really benefit if I suffer through a mind-numbingly tedious working memory task? I think it depends on what the alternative is. If the alternative is to sit silently in front of the TV, I suspect you will benefit, but isn’t there some other, less boring activity that might also help your brain?

A paper by Ybarra et al (in press) suggests that the answer is yes, and that the alternative is socialising. Ybarra et al combined correlational and experimental designs to arrive at this conclusion. First, they used questionnaire data to show a positive relationship between the number of social interactions and cognitive functioning. The relationship held across the whole age range studied (24 to 96 years), even when controlling for a range of other factors.

This is a nice finding, but since there is no experimental manipulation, it is just as valid to interpret the findings to mean that intelligent people socialise more. So Ybarra et al went a step further, and recruited participants for an experimental study.

Participants were randomly assigned to one of three groups, each of which spent 10 minutes on its task. The social interaction group discussed a current political issue, while the intellectual group did reading comprehension and mental rotation tasks, along with a crossword puzzle. The third group was a passive control group, who simply spent 10 minutes watching Seinfeld.

To assess how these different tasks affected cognitive functioning, Ybarra et al used two measures. Processing speed was measured with a task in which participants made same-different judgements about dots. Working memory was measured with a task in which participants were read sentences and had to answer questions about them while keeping part of each sentence in memory.

The table below gives the results.

While the scores for the social interaction and intellectual groups are similar, the passive control group appears to have fared worse. Indeed, significance testing revealed that on each task, the experimental groups did significantly better than the passive control group, while the social interaction and intellectual groups did not differ from each other.

It is worth noting that the intellectual task is quite similar to the type of tasks that brain training programs consist of. These results indicate that instead of suffering under Dr Kawashima, you might as well get into an argument over politics with a friend. (The alternative, and equally valid, interpretation of the data is that watching Seinfeld rots your brain.) Discussing politics might just be more fun in any case. Ybarra et al did ask the participants to rate their tasks and found no significant difference in how much they liked them. Still, I would argue that most people will choose a debate over a working memory task any day.

This study is quite inspiring in that a single 10-minute session of intellectual or social stimulation was enough to produce significant differences in task performance. Furthermore, it really is a testament to the power of social interaction that the intellectual group didn’t come out ahead, even though they had spent 10 minutes doing tasks very similar to the ones they were assessed with. However, a few caveats should be considered. First of all, although the intellectual task resembles actual brain training, the two are not one and the same. I would love to see a direct comparison between something like Mindfit and the social interaction condition used here. Secondly, I wasn’t entirely serious about Seinfeld rotting your brain, but testing performance immediately after the 10-minute session is potentially problematic. It may be that carrying out an activity, any activity, simply raises your overall alertness more than watching TV does. It would have been nice to see a re-test the following day. Finally, and relatedly, the study only demonstrates an immediate effect. If social interaction is to be taken seriously as an alternative to brain training, longitudinal studies are needed in which participants train regularly over a longer period.

So, to conclude, these results indicate that bingo is good for granny not only for this reason:

But also for this reason:

Ok, so this particular bingo game might not be to granny’s taste, but you get the point.

Of course, no one can sell you social interaction, so expect the brain training business to continue as usual.

Ybarra, O., Burnstein, E., Winkielman, P., Keller, M.C., Manis, M., Chan, E., and Rodriguez, J. (in press). Mental exercising through simple socializing: Social interaction promotes general cognitive functioning. Personality and Social Psychology Bulletin.

Thanks to Flickr users monkey123, Keees, and aphrodite-in-nyc for fantastic pictures.

Can research be both relevant and fun? April 29, 2007

Posted by Johan in Cognition, Economics.

While most science bloggers were up in arms over Shelley’s successful campaign against Wiley, a bit of controversy has been stirring over in economics. (I had no idea I was interested in economics, but judging by the amount of blogging I’ve done on it, I am. Go figure.) Noam Scheiber wrote an article in The New Republic, subtly titled How Freakonomics is Ruining the Dismal Science. The article has now found its way online, thanks to a blogger who, unlike the Retrospectacle head honcho, is almost certainly in violation of fair use.

For those of you who have somehow missed it, Freakonomics is a book that rogue economist Steve Levitt co-wrote with Stephen Dubner. Essentially, it’s a collection of pop-science write-ups of studies Levitt has published over the years. This research covers unusual topics like the economics of drug dealing, and includes regression analyses investigating whether sumo wrestling is rigged. It turned out to have quite a bit of mass appeal: back in 2005, Freakonomics promptly sold marginally fewer copies than the Bible.

Not everyone is so impressed. As the title hints, Scheiber’s article is a scathing attack on Levitt’s research, with some borderline ad hominem elements. The article’s central thesis is that Levitt’s popular and academic success is part of a larger movement in economics that has had a dangerous influence on impressionable economics grad students. Apparently, they have now abandoned the rigorous and perhaps dull study of the macroeconomy in favour of fast and fun studies of unusual topics, Freak-style. Scheiber argues that the consequence of this development is that method has become more important than theory. The studies no longer reveal anything of theoretical significance. Instead they are novelties, getting attention not because of what they reveal, but because of how they reveal it. Oh, and along the way we also learn that Levitt has a squeaky voice and is a poor lecturer, in a perhaps less well-considered comment towards the end of the article.

Anyone who achieves success on Levitt’s level is bound to have a few scathing critics on the web, but the interesting thing about this particular case is that Levitt has responded to Scheiber’s criticisms on the Freakonomics blog. Apart from responding to Scheiber’s ad hominem remarks, Levitt points out a few inaccuracies (apparently, Scheiber does not have a PhD in economics and has never met Levitt, contrary to what his article seems to suggest). He also argues rather forcefully that the use of “clever” methods in no way precludes theoretical relevance. In support of this claim, he points to a number of hard, real-life issues that his research has tackled (surprisingly, without citing the sumo study).

In a way, Levitt is absolutely right. Many of the studies in Freakonomics meet the awarding criterion for the Ig Nobel prize, which Levitt is sure to win sooner or later: they first make people laugh and then make them think. For instance, a chapter in the book is dedicated to Levitt’s somewhat controversial notion that the vast drop in violent crime that the US experienced in the 1990s is a direct consequence of Roe v. Wade, some 15-20 years earlier. Levitt marshals a range of statistics and deductive reasoning to support an argument that goes something like this:

1. If aborted fetuses are unwanted, then the babies who were carried to term before 1973 only because abortion was unavailable were also unwanted.
2. Unwanted children are at risk for crime and anti-social behaviour.
3. Thus, following Roe v. Wade in 1973, fewer unwanted children were born. This produced a drop in crime some 15-20 years later, because that is when those children would otherwise have started their criminal careers.
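If we take the 15-20 year lag in step 3 at face value, simple arithmetic gives the window in which the argument predicts the crime drop:

```latex
1973 + 15 = 1988, \qquad 1973 + 20 = 1993
```

The argument therefore points to a decline that begins somewhere between the late 1980s and the early 1990s.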

The argument is simple enough, but it is also quite original. Most people do have an initial visceral reaction to the notion of somehow equating unborn babies with potential criminals, but once you get past that point the idea is not entirely easy to refute.

To be fair, Scheiber has a point in that Levitt’s research is light on theory – this is something that Levitt himself admits to in Freakonomics. The controversial crime drop theory aside, most of the research in Freakonomics makes a practical point about real life, but cannot be easily fitted into the theoretical framework of economics. A lot of it is really best classified as sociology or political science. Perhaps part of the reason why Levitt seems to bother some economists is that he does this research as a professor of economics, often publishing his results in economics journals.

It’s not unlike the way most empirically minded psychologists react to psychoanalysts, reflexologists, or even Dr Phil. By calling themselves psychologists, they contribute to a definition or stereotype of psychology that many researchers detest. Much of the ire that both Levitt and Dr Phil receive from their peers is probably caused by the way they “make us look bad.” Neither would get nearly as much of a reaction if they didn’t insist on calling themselves an economist and a psychologist, respectively.

Anyway, I wonder if there will be a Freakonomics of psychology. The best-selling psychology researchers, people like Pinker or Damasio, are perhaps better known for their style of writing and insight than for the sheer originality or wow-factor of their research. Still, there is some psychology research out there that would fit the bill. For one, Godden and Baddeley’s (1975) study on the context dependency of memory comes to mind. In this study, divers encoded and recalled word lists either on land or under water. This produced a nice crosswise interaction: recall was better when the encoding and recall contexts were identical, as can be seen below.

(Apologies for the poor quality)
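For readers who think in terms of ANOVA, a crosswise (crossover) interaction of this kind can be written compactly. The notation is mine, not Godden and Baddeley’s. Let M_ER be mean recall when lists are encoded in context E and recalled in context R, with L for land and W for water. No values from the study are used here.

```latex
\begin{aligned}
&M_{LL} - M_{LW} > 0 \quad\text{and}\quad M_{WL} - M_{WW} < 0
  &&\text{(the effect of recall context reverses sign)}\\
&\Rightarrow\; (M_{LL} - M_{LW}) - (M_{WL} - M_{WW})
  = (M_{LL} - M_{LW}) + (M_{WW} - M_{WL}) > 0
  &&\text{(interaction contrast)}
\end{aligned}
```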

Another prime example would be the (now numerous) studies that use a person in a gorilla suit to probe inattentional blindness. The idea is to have participants perform a demanding visual task while a gorilla casually walks by. Strikingly often, when asked afterwards, participants report that they never saw the gorilla at all.

However, there is no real Levitt in psychology, yet. Psychologists win Ig Nobels all the time, but perhaps most are too concerned with their reputations to don the gorilla suit for more than the odd study…

Godden, D.R., & Baddeley, A.D. (1975). Context-dependent memory in two natural environments: On land and underwater. British Journal of Psychology, 66, 325-331.

Encephalon #20 at Neurontic April 9, 2007

Posted by Johan in Cognition, Neuroscience.

The new Encephalon is out, with a very nice write-up.

I particularly liked Madam Fathom’s account of a recent neuronal theory of sensory integration, using dodgeball as an example.

The Object Recognition Demons March 8, 2007

Posted by Johan in Cognition, Off Topic, Sensation and Perception.

This figure popped up in one of my lectures, and I thought it was a rather nice summary of how most theories of object recognition imagine that the process works:

A bit easier to remember than your average boxes-and-arrows model, isn’t it? The traditional, functional box-and-arrow model is already being replaced by connectionist models, of course. But I promise, once that fad blows over, the next paradigm is going to be little demons in your head. Psychologists will be scratching their heads, sweating over animation courses instead of programming languages.
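In the meantime, the demons can still be programmed. Demon-based figures like this one usually illustrate Selfridge’s Pandemonium model, and I am assuming that is what this figure shows. The sketch below is a toy version of that idea, not a faithful implementation. The letter inventory and stroke names are hypothetical, and the image demon is skipped, so strokes are supplied directly. The rule for how loudly a cognitive demon shouts is also my own assumption.

```python
from collections import Counter

# Hypothetical stroke inventories for three capital letters (illustrative only).
COGNITIVE_DEMONS = {
    "A": Counter(oblique=2, horizontal=1),
    "H": Counter(vertical=2, horizontal=1),
    "T": Counter(vertical=1, horizontal=1),
}


def feature_demons(strokes):
    """Each feature demon reports how many instances of its feature it sees."""
    return Counter(strokes)


def shout(expected, observed):
    """A cognitive demon shouts louder the better the features fit its letter.

    Assumed rule: shared features count for it; features it expected but did
    not hear, and features it heard but did not expect, count against it.
    """
    overlap = sum((expected & observed).values())
    missing = sum((expected - observed).values())
    extra = sum((observed - expected).values())
    return overlap - missing - extra


def decision_demon(strokes):
    """Pick the letter whose cognitive demon shouts loudest."""
    observed = feature_demons(strokes)
    return max(COGNITIVE_DEMONS, key=lambda letter: shout(COGNITIVE_DEMONS[letter], observed))


print(decision_demon(["vertical", "vertical", "horizontal"]))  # H
print(decision_demon(["vertical", "horizontal"]))              # T
```

The penalty for missing features is what lets the T demon beat the H demon when only one vertical stroke is present. Without it, H would shout just as loudly, since every stroke T hears is also one H expects.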