Domain specificity follows from interactions between overlapping maps March 10, 2008

Posted by Johan in Face Perception, Neuroscience, Sensation and Perception, Theory.
Object-face by Multitude

I can’t simplify the title beyond that, but don’t run away yet: the idea itself is straightforward once the terminology is explained. Skip ahead two paragraphs if you know what domain specificity means.

Recognition of objects in the visual scene is thought to arise in inferior temporal and occipital cortex, along the ventral stream (see also this planned Scholarpedia article on the topic by Ungerleider and Pessoa – might be worth waiting for). That general notion is pretty much where consensus ends, with the issue of how different object categories are represented remaining controversial. Currently, the dominant paradigm is that of Nancy Kanwisher and colleagues, who hold that a number of domain-specific (that is, modular) areas exist, which each deal with the recognition of one particular object category. The most widely accepted among these are the fusiform face area (FFA), the parahippocampal place area (PPA), the occipital face area (OFA), the extrastriate body area (EBA), and the lateral occipital complex (LO), which is a bit of a catch-all region for the recognition of any object category that doesn’t fall into one of the domains with their own area. Usually, the face-selective part of the superior temporal sulcus (STS) is also included.

Typical locations of object areas by category. B is upside down, C is flattened

This modular view of visual recognition has received a lot of criticism. However, the undeniable success of the functional localiser approach to fMRI analysis, in which responses are averaged across all voxels in each of the previously-mentioned areas, has led to widespread acceptance of the approach. Essentially, then, the domain specific account seems to be accepted because recording from a functionally-defined FFA, for instance, seems to yield results that make a lot of sense for face perception.

When you think about it, the domain specific account in itself is a pretty lousy theory of object recognition. It does map object categories onto cortex, but it is considerably more difficult to explain how such a specific representation might be built on input from earlier, non-object specific visual areas. This brings us to today’s paper, which proposes a possible solution (op de Beeck et al, 2008). The bulk of the paper is a review of previous research in this area, so give it a read for that reason if you want to get up to speed. The focus of this post is on the theoretical proposal that op de Beeck et al (2008) make towards the end of the paper, which goes something like this:

Ventral stream areas contain a number of overlapping and aligned topographical maps, where each map encodes one functional property of the stimulus. Op de Beeck et al (2008) suggest that properties might include shape, functional connectivity, process, and eccentricity. Let’s go through each of those suggestions in turn (the following is based on my own ideas – op de Beeck et al don’t really specify how the topography of these featural maps might work):

A shape map might encode continuous variations of for instance angularity and orientation of parts of the stimulus. So one imaginary neuron in this map might be tuned to a sharp corner presented at an upright orientation (see Pasupathy & Connor, 2002 for an example of such tuning in V4), and topographically, the map might be laid out with angularity and curvature as the x and y dimensions in the simplest case.

Functional connectivity is hard to explain – read the article I just linked if you’re curious, but let’s just call it brain connectivity here. A map of brain connectivity is a topographical layout of connections to other areas – for instance, one part of the map might be more connected to earlier visual areas (such as V4), while another part of the map might connect more with higher-order areas that deal with memory or emotion (e.g., hippocampus, amygdala).

The process map is a tip of the hat to some of Kanwisher’s strongest critics, such as Tarr & Gauthier (2000), who argued that the ventral stream isn’t divided by object category, but by the visual processing that is used. So for example, the FFA is actually an area specialised for expert within-category discrimination of objects (faces or otherwise), which happens to appear face-specific because we have more experience with faces than with other categories. Some parts of the map might deal with such expertise discriminations, while others might deal with more general between-category classification.

Eccentricity is a fancy term for distance from the fixation point (i.e., the fovea) in retinal coordinates. If you hold your finger slightly left of your fixation point and continue to move it left, you are increasing the eccentricity of the stimulus. Eccentricity and its complicated partner polar angle (the angular position around the fixation point) reflect the two basic large-scale topographical principles in early visual areas, but such maps can be found throughout the visual system.

Incidentally, the eccentricity map is the only one of these proposed maps for which there is currently good evidence in this part of the brain (Levy et al, 2001). The part that corresponds to the FFA has a foveal (or central) representation of the visual field, which makes sense considering that we tend to look directly at faces. Conversely, the PPA has a peripheral representation, as might be expected since most of us don’t spend much time fixating on the scenery.

The central proposal is that in an area such as the FFA, the face-specific response is actually the combination of the concurrent, aligned activation of a number of different maps. For example, the FFA might correspond to responses tuned to rounded shapes in the shape map, to input from earlier visual areas in the functional connectivity map, to expert within-category discrimination in the process map, and to a foveal (central) representation in the eccentricity map.

To really get the kind of strong domain-specificity that is observed, these maps must display multiplicative interactions – op de Beeck et al (2008) suggest that if their simultaneous activations were just added to make up the fMRI response, you wouldn’t get the strong selectivity that is observed (so by implication, less strict modularists could do away with the multiplicative bit and get a map that corresponds better to their view of ventral areas).
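To make the additive-versus-multiplicative point concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the four activation values per site are invented, and taking the mean or the product is just the simplest way to stand in for "added" and "multiplied" responses. It is not a model from op de Beeck et al (2008).

```python
import math

# Hypothetical activation (0 to 1) of each map at two cortical sites.
# Order: shape, connectivity, process, eccentricity. All values are invented.
SITES = {
    "all maps aligned": [0.9, 0.9, 0.9, 0.9],
    "two maps aligned": [0.9, 0.9, 0.3, 0.3],
}


def additive_response(activations):
    """Combine the maps by averaging them (an additive interaction)."""
    return sum(activations) / len(activations)


def multiplicative_response(activations):
    """Combine the maps by multiplying them (a multiplicative interaction)."""
    return math.prod(activations)


def selectivity_ratio(combine):
    """Response at the fully aligned site relative to the partly aligned one."""
    return combine(SITES["all maps aligned"]) / combine(SITES["two maps aligned"])


additive_ratio = selectivity_ratio(additive_response)
multiplicative_ratio = selectivity_ratio(multiplicative_response)
print(f"additive: {additive_ratio:.1f}x, multiplicative: {multiplicative_ratio:.1f}x")
```

With these made-up numbers, the site where all four maps line up responds about 1.5 times as strongly as the partly aligned site under addition, but about 9 times as strongly under multiplication. That sharper contrast is the kind of selectivity the multiplicative account is meant to explain.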

This is a pretty interesting idea, although wildly speculative. Note that with the exception of eccentricity, there really is very little evidence for this form of organisation. In other words, this theory is a theory not just in the scientific sense, but also in the creationist sense of the word. It definitely is an inspiring source of possible future experiments, however.

Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organization of human object areas. Nature Neuroscience, 4, 533-539. DOI: 10.1038/87490

Op de Beeck, H.P., Haushofer, J., Kanwisher, N.G. (2008). Interpreting fMRI data: maps, modules and dimensions. Nature Reviews Neuroscience, 9, 123-135. DOI: 10.1038/nrn2314

Pasupathy, A., & Connor, C.E. (2002). Population coding of shape in area V4. Nature Neuroscience, 5, 1332-1338.

Tarr, M.J., & Gauthier, I. (2000). FFA: a flexible fusiform area for subordinate-level visual processing automated by expertise. Nature Neuroscience, 3, 764-769. DOI: 10.1038/77666


Learning to recognise faces: perceptual narrowing? January 11, 2008

Posted by Johan in Animals, Developmental Psychology, Face Perception, Sensation and Perception.

That image certainly piques your interest, doesn’t it? Sugita (2008) was interested in addressing one of the ancient debates in face perception: the role of early experience versus innate mechanisms. In a nutshell, some investigators hold that face perception is a hardwired process, others that every apparently special face perception result can be explained by invoking the massive expertise we all possess with faces, compared to other stimuli. Finally, there is some support for a critical period during infancy, where a lack of face exposure produces irreparable face recognition deficits (see for example Le Grand et al, 2004). Unfortunately, save for the few children who are born with cataracts, there is no real way to address this question in humans.

Enter the monkeys, and the masked man. Sugita (2008) isolated monkeys soon after birth, and raised them in a face-free environment for 6, 12 or 24 months. After this, the monkeys were exposed to strictly monkey or human faces for an additional month.

At various points during this time, Sugita (2008) tested the monkeys on two tasks that were originally pioneered in developmental psychology as means of studying pre-lingual infants. In the preferential looking paradigm, two items are presented, and the time spent looking at either item in the pair is recorded. The monkeys viewed human faces, monkey faces, and objects, in various combinations. It is assumed that the monkey (or infant) prefers whichever item it looks at more. In the paired-comparison procedure, the monkey is primed with the presentation of a face, after which it views a face pair, where one of the faces is the same as that viewed before. If the monkey views the novel face more, it is inferred that the monkey has recognised the other face as familiar. So the preferential looking paradigm measures preference between categories, while the paired-comparison procedure measures the ability to discriminate items within a category.

Immediately following deprivation, the monkeys showed equal preference for human and monkey faces. By contrast, a group of control monkeys who had not been deprived of face exposure showed a preference for monkey faces. This finding suggests that at the very least, the orthodox hard-wired face perception account is wrong, since the monkeys should then prefer monkey faces even without previous exposure to them.

In the paired-comparison procedure, the control monkeys could discriminate between monkey faces but not human faces. By contrast, the face-deprived monkeys could discriminate between both human and monkey faces. This suggests the possibility of perceptual narrowing (the Wikipedia article on it that I just linked is probably the worst I’ve read – if you know this stuff, please fix it!), that is, a tendency for infants to lose their ability to discriminate between categories which are not distinguished in their environment. The classic example occurs in speech sounds, where infants can initially discriminate phoneme boundaries (e.g., the difference between /bah/ and /pah/ in English) that aren’t used in their own language, although this ability is lost relatively early on in the absence of exposure to those boundaries (Aslin et al, 1981). But if this is what happens, surely the face-deprived monkeys should lose their ability to discriminate non-exposed faces, after exposure to faces of the other species?

Indeed, this is what Sugita (2008) found. When monkeys were tested after one month of exposure to either monkey or human faces, they now preferred the face type that they had been exposed to over the other face type and non-face objects. Likewise, they could now only discriminate between faces from the category they had been exposed to.

Sugita (2008) didn’t stop there. The monkeys were now placed in a general monkey population for a year, where they had plenty of exposure to both monkey and human faces. Even after a year of this, the results were essentially identical to those obtained immediately following the month of face experience. This implies that once the monkeys had been tuned to one face type, that developmental door was shut, and no re-tuning occurred. Note that in this case, one month of exposure to one type trumped one year of exposure to both types, which shows that as far as face recognition goes, what comes first seems to matter more than what you get the most of.

Note a little quirk in Sugita’s (2008) results – although the monkeys were face-deprived for durations ranging from 6 to 24 months, these groups did not differ significantly on any measures. In other words, however the perceptual narrowing system works for faces, it seems to be flexible about when it kicks in – it’s not a strictly maturational process that kicks in at a genetically-specified time. This conflicts quite harshly with the cataract studies I discussed above, where human infants seem to lose face processing ability quite permanently when they miss out on face exposure in their first year. One can’t help but wonder if Sugita’s (2008) results could be replicated with cars, houses, or any other object category instead of faces, although this is veering into the old ‘are faces special’ debate… It’s possible that the perceptual narrowing observed here is a general object recognition process, unlike the (supposedly) special mechanism with which human infants learn to recognise faces particularly well.

On the applied side, Sugita (2008) suggests that his study indicates a mechanism for how the other-race effect occurs – that is, the advantage that most people display in recognising people of their own ethnicity. If you’ve only viewed faces of one ethnicity during infancy (e.g., your family), perhaps this effect has less to do with racism or living in a segregated society, and more to do with perceptual narrowing.

Sugita, Y. (2008). Face perception in monkeys reared with no exposure to faces. Proceedings of the National Academy of Sciences (USA), 105, 394-398.

Visual Cortex: A Schematic Map October 22, 2007

Posted by Johan in Neuroscience, Sensation and Perception, Social Neuroscience.

I came across this figure in a review by Grill-Spector and Malach (2004). It condenses an already-dense 40-page review into a single figure, so I would have to write a post of similar length to explain it entirely in layman’s terms. This may be one post to skip if you haven’t the slightest idea about visual perception.

Even if you know your vision, this figure isn’t entirely straightforward. Still, I think it serves as a useful reference for those dense vision papers. With one or two notable exceptions, vision scientists insist on ridiculous naming conventions (the motion sensitive area hMT+ being the case in point), so this might help you remember the plot.

This map is only schematic. However, it represents the rough relationships, as they stood in 2004. The areas are mapped onto the right hemisphere occipital lobe, which has been flattened so that the dark areas represent sulci (grooves), and the light areas gyri. The posterior-anterior axis is sort of bottom-left to top-right, so V1 is (predictably) at the very back of the brain, while the Parahippocampal Place Area (PPA) is on the ventral (bottom) side. Height in this picture represents hierarchy in the processing, as Grill-Spector conceives of it. In other words, the first area is V1, and then we move up the stairs to V2, V3, and so on.

The colours code specialization. In the early areas (V1-V3), this is represented by central versus peripheral mappings, where the cortical magnification factor ensures that the central representation is the largest and has the highest acuity. Helpfully, they are labelled P-D (down) for the superior end of each map, and P-U (up) for the ventral end (your retinal image of the world is upside down, and apparently the visual system has no need to reverse this representation in later areas).

In later areas, the areas are filled in with one colour, presumably for simplicity – Grill-Spector actually believes that these have some retinotopic organisation as well. The colours still reflect specialization though – we can see that areas such as the Fusiform Face Area (FFA) and the Lateral Occipital complex (LO) are based on central, high-acuity representations, while other areas such as the PPA are based on more peripheral, lower-resolution representations. The letters that are strewn over the areas are meant to approximate locations of sensitivity to certain object categories: places (Pl), objects (O), and faces (F).

Do note that the Superior Temporal Sulcus (STS) is treated as somewhat of a black sheep, placed out in the corner with no colouring or height. This is probably because it is relatively poorly understood. The STS responds to biological motion, such as Johansson figures (see a demo), but its activation also appears to be strongly modulated by the social significance of the stimulus. For instance, Pelphrey et al (2005) found that the STS response in normal controls was greater when a face looked away from an obvious object rather than when gaze was directed towards it, which suggests that the STS does more than merely detect biological motion. Interestingly, people with autism failed to show the same modulation by expectation in the STS.

The poor understanding of the STS is in part because it responds so specifically to biological motion, which makes conventional retinotopy techniques impossible. Also, I suspect there is a deep-rooted fear in some vision scientists of anything that starts with “Social.”

Another thing to note is the chasm between the last V area and the STS. Presumably, the intervening areas are also involved in vision, but we don’t know much about what they do yet.

Grill-Spector, K, and Malach, R. (2004). The Human Visual Cortex. Annual Review of Neuroscience, 27, 649-677.

Pelphrey, K.A., Morris, J.P., and McCarthy, G. (2005). Neural Basis of Eye Gaze Processing Deficits in Autism. Brain, 128, 1038-1048.

Hearing limitations, pt. 2: Distinguishing MP3 from CD October 16, 2007

Posted by Johan in Applied, Sensation and Perception, Social Psychology.

As a continuation of the recent post on audiophiles, let’s look closer at how good we are at detecting the compression in digital music formats.

Most music formats, such as MP3 or the AAC format used by iTunes, define the rate of compression as the number of bits that is used to encode each second of music. The standard bitrate, as used by the iTunes Music Store and elsewhere, is 128 kbit/s. Music geeks (myself included) tend to use slightly higher bitrates, while the proper audiophiles use lossless formats that compress the file without actually removing any information. Recently, Radiohead released their new album as a free download, only to experience some fan backlash for their choice of a 160 kbit/s bitrate. Critics bemoaned the fact that this was half as much as the 320 kbit/s rate that is used on the mp3s available for purchase on their website. By comparison, the bitrate of a normal audio CD is approximately 1411 kbit/s, so clearly a lot of information is removed.
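The CD figure is easy to check by hand. The sketch below assumes the standard CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels), which are not stated above but are where the roughly 1411 kbit/s number comes from. The ratios only describe how much smaller the data rate is; they say nothing about how audible the loss is.

```python
# Standard CD audio parameters (assumed here; not given in the post itself).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Bits per second of uncompressed CD audio, expressed in kbit/s.
cd_kbits = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def size_ratio(bitrate_kbits):
    """How many times smaller the data rate is than uncompressed CD audio."""
    return cd_kbits / bitrate_kbits


print(f"CD audio: {cd_kbits:.1f} kbit/s")
for bitrate in (128, 160, 320):
    print(f"{bitrate} kbit/s is about {size_ratio(bitrate):.1f} times smaller than CD")
```

So a 160 kbit/s file carries a bit under one ninth of the CD data rate, and even a 320 kbit/s file carries less than a quarter.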

But can you tell the difference? I dug out a few non-peer-reviewed sources to get an idea – if someone knows of peer-reviewed studies into this, I’d be interested to hear about them. The most serious source is probably this 1998 report from the International Organization for Standardization (PDF), which reports some evidence that participants could distinguish 128 kbit/s compression from the original, uncompressed source. Unfortunately, no tests were made above 128 kbit/s. More recent, but less rigorous tests have been reported by Maximum PC and PC World.

Maximum PC elected to report their results participant-by-participant, and with a sample size of 4, maybe that’s just as well. There isn’t enough data reported in this article to actually run a binomial or another significance test, but the overall conclusion seems to be that none of the testers did well at distinguishing 160 kbit/s from the original source.

PC World’s test actually contains some descriptives, and used a sample size of 30. However, they used some fairly obscure ways of reporting their results. Clearly, in a case like this one, the optimal method is to ask the participants to guess which file is the MP3 and which is the CD, and run a number of trials without feedback. With this approach, you can easily assess whether performance is above the level of chance (50%) for each bitrate. With this in mind, here are their results:

The percentages represent the proportion of listeners who “felt they couldn’t tell the difference” – once again, this measure is far from ideal. While we have no idea which of these differences are significant, the trend is that the differences in ratings flatten off: there appears to be no difference in quality between 192 kbit/s and 256 kbit/s, and in the case of MP3s, no real difference between 128 and 192.

While these studies aren’t exactly hard science, they do seem to indicate that those complaining about Radiohead’s 160 kbit/s bitrate wouldn’t necessarily be able to distinguish it from CD quality, let alone a 256 kbit/s MP3. This illustrates the human tendency to overestimate our own perceptual ability – if we know that two things are different, we will find differences, imagined or otherwise. Blind testing is the only way to establish whether a genuine difference in sound quality exists, yet this is very rarely done.

If you want to test your own ears, try these examples. With the above in mind, it would be best to get a friend to operate the playback, so that you can’t tell from the outset which file is which. If you run a large number of trials, you can also look up whether your performance is above chance in this Binomial probability table. In psychology, .05 is the commonly accepted p value, so as an example you would need to get 15 out of 20 trials correct for your performance to be significantly better than chance at this level.
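If you would rather compute the cut-off than look it up, the sketch below does the same job as the table. It assumes a one-tailed binomial test against a 50% guessing rate; the function names are my own, and only the Python standard library is used.

```python
from math import comb


def p_at_least(correct, trials, chance=0.5):
    """Probability of getting `correct` or more trials right by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score whose one-tailed p value falls below alpha, or None."""
    for correct in range(trials + 1):
        if p_at_least(correct, trials) < alpha:
            return correct
    return None  # too few trials to ever reach significance


threshold = min_correct_for_significance(20)
print(f"Need {threshold} of 20 correct (p = {p_at_least(threshold, 20):.3f})")
```

For 20 trials this gives 15 correct (p of about .021), whereas 14 correct only reaches about .058. With very few trials the function returns None, because even a perfect score could plausibly be luck.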

Update: Dave over at Cognitive Daily has answered my prayers by carrying out a nicely designed test of performance at discriminating different bitrates. In a nutshell, his results confirm the ones reported here – although participants there rated the 64 kbit/s tracks as significantly poorer in quality, no differences appeared between 128 and 256 kbit/s. Read the complete write-up here.

Audiophiles and the limitations of human hearing October 13, 2007

Posted by Johan in Applied, Sensation and Perception, Social Psychology.

The other week Gizmodo posted an amusing rant about a set of $7250 speaker cables, and the gushing review they received. Among other things, the reviewer referred to the cables as “danceable.” James Randi soon popped around to offer his $1 million prize to the cable company, if they could prove that their cables outperform “normal” Monster cables in a double-blind test.

This is actually an issue of the limits of human perception. Is it really possible to tell the difference between normal high-end equipment, and equipment that veers into the audiophile range? It’s clear that according to many audiophiles, the answer is going to be yes. Wikipedia informs us that there are actually two schools among audio enthusiasts: the objectivist school, which favours double-blind testing, and subjectivists, who favour a more philosophical approach. The review that caused so much ire comes from Positive Feedback, an online magazine that concerns itself with the “audio arts” – guess which school they subscribe to.

Among subjectivist audiophiles, there is a belief that almost any change to the stereo setup results in a perceptible difference in sound. This results in bizarre behaviours, as in this picture from the Positive Feedback staff page:

Note how the speaker cables are carefully propped up on stilts to keep them off the floor, and how what looks like power amplifiers are propped up on massive slabs of wood (I can assure you those didn’t come from the local lumberyard). Another nice example comes from an article in Hi-Fi magazine Masters on Video and Audio:

“The [product] tightened up the sounds of a wide variety of equipment, the improvements often most noticeable in the bass. Imaging and focus usually improved, as did the interstitial quiet, which raised the level of overall palpability, air, and transparency.”

The product? Shelves.

There is one obvious objection to raise here: judging by the pictures of the reviewers on sites such as Positive Feedback, most of them are in their 40’s and beyond. As the following figure shows, this spells trouble:

I grabbed this from a lecture handout, so unfortunately I don’t know the source. The lines plot performance at detecting sounds over age, with each line representing a frequency. In this case “hearing level” is a standardised measure where normal hearing is at 0 dB. The clear pattern is that the higher frequencies disappear with age. This figure only goes up to 6 kHz, but it’s worth noting that the human ear can hear up to 20 kHz, and that the loss is more dramatic the higher up you go.

In other words, it’s not really worth trusting an audio reviewer who is older than you are, because there is a range of higher frequencies that you can hear while they cannot.

Apart from the overall lack of evidence and the sheer physical implausibility of some of the products, there is some classic research in social psychology that has implications for this topic.

Cognitive dissonance theory was primarily developed by Festinger. Briefly, the idea is that when the individual finds himself in a state where internal beliefs conflict with reality, there is dissonance, which is an unpleasant state. The individual may then employ a number of mechanisms to get around the dissonance, ranging from simply acknowledging that the beliefs were wrong to attacking the reality of external events, or devaluing the conflict.

The classic cognitive dissonance study is one where students perform a dull experiment, and are then paid a small or a large amount of money for telling the next participant that the experiment is actually fun (Festinger & Carlsmith, 1959). Surprisingly, students who are paid less actually rate the dull task as more interesting. In this case, the student finds himself (all males in Psychology studies in those days, generally) in a conflict: he has just done a boring experiment and lied to a fellow student for a very small reward. According to Festinger and Carlsmith, the student then reduces dissonance by re-evaluating the task. If the task was actually fun, then there is no dissonance between the student’s actions and beliefs.

The implication for consumer behaviour is that when your green $7250 cables arrive in the mail and you plug them in, finding that they do nothing would result in unacceptable dissonance. In fact, cognitive dissonance theory predicts that the more you pay for the cables, the more inclined you will be to conclude that they sound good, regardless of the actual quality of the cables. In this context, it is worth noting that the Positive Feedback website states that their policy is that reviewers should own the equipment they review, which is a very unusual policy in light of cognitive dissonance theory.

There is another classic social psychology study that is relevant here: Sherif’s investigation of the autokinetic effect (1935). To observe this effect, place yourself in an absolutely dark room with a single, faint light source. The spot of light will appear to move around as a result of small eye movements that your brain normally filters out. Sherif’s participants didn’t know about this however, so they really thought the light moved.

When the participants individually rated how far the light moved, there was considerable variation between their estimates. Sherif then placed participants in groups and asked them to call out the movements of the light. Now, there was a convergence effect, so that the estimates of the different participants came closer to each other, and remained close in subsequent individual re-tests.

If you and your friend are listening to a new stereo and she mentions that the low bass sounds a bit flat, you are going to hear it too. The sound itself is ambiguous, not to mention the terminology that audiophiles use, so Sherif’s study suggests that in such situations, you will align with the group. You can imagine that this tendency to conform is quite useful in many real-life contexts, but it does mean that wine sampling and stereo testing are unlikely to reflect anything other than your tendency toward conformity. That doesn’t mean it can’t be fun, of course.

You can test this out yourself if you ever find yourself at a wine sampling. Make up associations: say the wine tastes like blackcurrant (always a winner), sandalwood, tobacco, myrrh. As long as your ideas aren’t too far off, you will find that others suddenly experience the taste too.

While our senses are rather limited, our ability to fool ourselves is almost endless.

Festinger, L., and Carlsmith, J.M. (1959). Cognitive Consequences of Forced Compliance. Journal of Abnormal and Social Psychology, 58, 203-210.

Sherif, M. (1935). A study of some social factors in perception. Archives of Psychology, 27.