2.5 Object Recognition - Sensation and Perception

MCAT Behavioral Sciences Review - Kaplan Test Prep 2021–2022

After Chapter 2.5, you will be able to:

· Compare and contrast bottom-up processing and top-down processing

· Describe each of the Gestalt principles: proximity, similarity, good continuation, subjective contours, closure, and prägnanz

Modern theories of object recognition assume at least two major types of psychological processing: bottom-up processing and top-down processing. Bottom-up (data-driven) processing refers to object recognition by parallel processing and feature detection, as described earlier. Essentially, the brain takes the individual sensory stimuli and combines them to create a cohesive image before determining what the object is. Top-down (conceptually driven) processing is driven by memories and expectations that allow the brain to recognize the whole object first and then recognize its components based on these expectations. In other words, top-down processing allows us to quickly recognize objects without needing to analyze their specific parts.

Neither system is sufficient by itself. If we only performed bottom-up processing, we would be extremely inefficient at recognizing objects; every time we looked at an object, it would be like looking at the object for the first time. On the other hand, if we only performed top-down processing, we would have difficulty discriminating slight differences between similar objects.

This distinction is also partially responsible for the feeling of déjà vu described in the introduction to this chapter. When we believe we are experiencing something for the first time, we expect to rely on bottom-up processing; however, when the mind is able to recognize an experience more quickly than expected (through top-down processing), the mind searches for a reason for this recognition. In other words, déjà vu is often evoked when we have recognition without an obvious reason: I know that guy from somewhere . . . but where? The distinction between top-down and bottom-up processing is relevant for all senses, but is most commonly applied in the context of vision.

Perceptual organization refers to the ability to create a complete picture or idea by combining top-down and bottom-up processing with all of the other sensory cues gathered from an object. Most of the images we see in everyday life are incomplete; often, we can see only part of an object and must infer what the rest of it looks like. By using the information available about depth, form, motion, constancy, and other cues, we can often “fill in the gaps” using Gestalt principles (described below).

Depth perception relies on a number of visual cues that are interpreted by the brain to deduce an object's distance. These visual cues are separated into monocular and binocular cues. Monocular cues require only one eye and include relative size, interposition, linear perspective, motion parallax, and other minor cues. Relative size refers to the idea that objects appear larger the closer they are. Interposition means that when two objects overlap, the one in front is closer. Linear perspective refers to the convergence of parallel lines at a distance: the greater the convergence, the farther the distance. Motion parallax is the perception that, when our viewpoint moves, objects closer to us seem to move faster across the field of vision than objects farther away.
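The relative size cue can be made concrete with a little geometry. The following is a minimal sketch, not a model of how the visual system computes size: it uses the standard visual angle formula, and the function name and the 1.7 m example height are assumptions chosen for illustration.

```python
import math

def visual_angle_deg(object_height_m: float, distance_m: float) -> float:
    """Visual angle (in degrees) subtended by an object viewed head-on.

    Standard geometry: angle = 2 * atan(height / (2 * distance)).
    """
    return math.degrees(2 * math.atan(object_height_m / (2 * distance_m)))

# Hypothetical example: the same 1.7 m tall person at three distances.
for distance in (2.0, 10.0, 50.0):
    angle = visual_angle_deg(1.7, distance)
    print(f"{distance:5.1f} m -> {angle:6.2f} degrees")
```

The object's physical height never changes, but the angle it subtends at the eye shrinks as distance grows, which is the information the relative size cue relies on.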

Real World

If you've ever looked out the side window of your car on a clear night, you've experienced motion parallax. Parallax is the reason why the objects on the side of the road are a blur as you drive past, why objects farther away move more slowly as you pass them, and why the moon seems to follow your car as you drive along.
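The driving example can be put into rough numbers. This is a simplified sketch under stated assumptions: the observer moves in a straight line, the object is stationary, and the object is directly to the side at that instant, so its angular speed is approximately the observer's speed divided by the object's distance. The function name, the car speed, and the distances are illustrative choices, not values from the text.

```python
import math

def angular_speed_deg_per_s(observer_speed_m_s: float,
                            perpendicular_distance_m: float) -> float:
    """Apparent angular speed of a stationary object directly to the side.

    Assumes straight-line motion; omega = v / d (radians per second).
    """
    return math.degrees(observer_speed_m_s / perpendicular_distance_m)

CAR_SPEED_M_S = 25.0  # assumed speed, roughly highway driving

# Assumed distances for a roadside sign, a far-off tree line, and the moon.
scene = {"roadside sign": 5.0, "distant trees": 500.0, "moon": 3.84e8}

for name, distance in scene.items():
    speed = angular_speed_deg_per_s(CAR_SPEED_M_S, distance)
    print(f"{name:14s} {speed:.3g} degrees per second")
```

Under these assumptions the nearby sign sweeps across the visual field hundreds of times faster than the trees, and the moon's angular speed is effectively zero, which is why it seems to keep pace with the car.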

Binocular cues primarily involve retinal disparity, which refers to the slight difference in images projected on the two retinas. This feature of depth perception is exploited in virtual reality (VR) devices: the images supplied to each eye are slightly different, giving the perception of depth even though the VR device displays 2D images. A secondary binocular cue is convergence, in which the brain detects the angle between the two eyes required to bring an object into focus. If a person were looking at a distant object, both eyes would stare straight ahead. However, if they were looking at something nearby (perhaps their own nose!), the left and right eyes would be turned inward at an extreme angle. This difference in the degree of convergence is used to perceive distance.
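The relationship between viewing distance and convergence can be sketched with simple trigonometry. The example below is illustrative only: it assumes the object lies straight ahead on the midline and uses an interpupillary distance of 6.3 cm, a typical adult figure that is an assumption here rather than a value from the text. The function name is hypothetical.

```python
import math

def convergence_angle_deg(distance_m: float, ipd_m: float = 0.063) -> float:
    """Angle between the two lines of sight when fixating a midline object.

    ipd_m is the distance between the eyes (assumed default: 6.3 cm).
    Geometry: angle = 2 * atan((ipd / 2) / distance).
    """
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))

# Assumed fixation distances: very near, arm's length, across a room, far away.
for distance in (0.05, 0.6, 5.0, 100.0):
    print(f"{distance:6.2f} m -> {convergence_angle_deg(distance):6.2f} degrees")
```

The angle is large for nearby objects and falls toward zero as distance increases, so convergence carries the most distance information at close range.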

The form of an object is usually determined through parvocellular cells and feature detection, and the motion of an object is perceived through magnocellular cells, as described earlier. Constancy refers to our ability to perceive that certain characteristics of objects remain the same, despite changes in the environment. For example, we perceive a white piece of paper as essentially the same color whether the paper is illuminated by fluorescent lights, incandescent bulbs, or sunlight; this type of constancy is called color constancy. We also have constancy for brightness, size, and shape, depending on context.


The brain constantly uses incomplete information to try to create a complete picture of the environment. Gestalt principles are a set of general rules that account for the fact that the brain tends to view incomplete stimuli in organized, patterned ways. There are dozens of Gestalt principles, but the highest-yield ones are summarized below and can be visualized in Figure 2.11.

Figure 2.11. Gestalt Principles

The law of proximity says that elements close to one another tend to be perceived as a unit. In Figure 2.11a, we do not see ten unrelated dots; rather, we see a triangle and a square, each composed of a certain number of dots.

The law of similarity says that objects that are similar tend to be grouped together. In Figure 2.11b, we see the big hollow dots as being distinct from the others, forming a triangle against a background of small filled-in dots.

The law of good continuation says that elements that appear to follow in the same pathway tend to be grouped together. That is, there is a tendency to perceive continuous patterns in stimuli rather than abrupt changes. As seen in Figure 2.11c, our mind tends to break down this complex figure into a sawtooth line and a wavy line, rather than two lines that contain both sawtooth and wavy elements.

Some researchers have argued that the phenomenon of subjective contours may arise from the law of good continuation. Subjective contours have to do with perceiving contours and, therefore, shapes that are not actually present in the stimulus. In Figure 2.11d, subjective contours lead to the perception of a white diamond on a black square with its corners lying on the four circles.

Finally, the law of closure says that when a space is enclosed by a contour, the space tends to be perceived as a complete figure. Closure also refers to the fact that certain figures tend to be perceived as more complete (or closed) than they really are. In Figure 2.11e, we don’t see four right angles; instead, we see a square, even though the four sides aren’t complete.

All these laws operate to create the most stable, consistent, and simplest figures possible within a given visual field. Taken all together, the Gestalt principles are governed by the law of prägnanz, which says that perceptual organization will always be as regular, simple, and symmetric as possible.
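As a loose computational analogy for the law of proximity, the sketch below groups dots that lie within a chosen distance of one another, chaining neighbors together. It is not a model of the visual system, and the dot coordinates, the `max_gap` threshold, and the function name are all hypothetical; they do not reproduce Figure 2.11a.

```python
import math

def group_by_proximity(points, max_gap):
    """Group point indices so that dots within max_gap of each other,
    directly or through a chain of neighbors, end up in the same group."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            # Collect every remaining dot close enough to the current one.
            near = [i for i in unvisited
                    if math.dist(points[current], points[i]) <= max_gap]
            for i in near:
                unvisited.remove(i)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

# Hypothetical layout: three dots forming a triangle, four forming a square.
dots = [(0, 0), (1, 0), (0.5, 1),          # triangle
        (5, 0), (6, 0), (5, 1), (6, 1)]    # square

print(group_by_proximity(dots, max_gap=1.5))
```

With this layout, the dots fall into two groups matching the triangle and the square, because the gap between the two shapes is much larger than the spacing within each shape.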

MCAT Concept Check 2.5:

Before you move on, assess your understanding of the material with these questions.

1. How is sensory information integrated in bottom-up processing? Top-down processing?

o Bottom-up processing:

o Top-down processing:

2. Briefly describe each of the Gestalt principles below:

Gestalt Principle

Proximity

Similarity

Good continuation

Subjective contours