Typified by Selfridge’s pandemonium model, feature net models explain perception entirely in terms of data-driven processes. They do not account for the complexity of the human environment, nor do they explain the importance of context.
Both approaches are unified in a bi-directional model of pattern recognition, such as that offered by McClelland, Rumelhart, and Hinton (1986; cited in Gleitman, 1999).
This system offers answers where there is ambiguity, and bi-directional inhibition of alternative detectors explains why priming effects result in faster reaction times. The model begins with a knowledge-driven hypothesis, which makes the visual system more sensitive to data from the relevant feature detectors. The important difference is that each level is capable of influencing any other level, in both directions, and thus the term parallel processing has been used. When presented with the ambiguous Dalmatian figure, for example, data-driven processing alone is not able to organise the stimulus into a recognisable pattern, but after the viewer is told what to look for, the task is easily solved.
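The link between priming, inhibition and reaction time can be illustrated with a toy simulation. The sketch below is not the McClelland, Rumelhart and Hinton model. It is a minimal two-detector illustration in which the function name, the inhibition weight, the threshold and the input values are all assumptions chosen for the example. A top-down hypothesis is modelled as a head start in activation for one detector, and reaction time is modelled as the number of update cycles that detector needs to reach threshold.

```python
def cycles_to_threshold(evidence, prime=0.0, inhibition=0.2,
                        threshold=1.0, max_cycles=1000):
    """Count update cycles until the target detector reaches threshold.

    evidence   -- (target_input, rival_input): bottom-up input per cycle
    prime      -- hypothetical top-down head start given to the target
    inhibition -- hypothetical weight of the mutual inhibition
    Returns None if the threshold is never reached.
    """
    target, rival = prime, 0.0
    for cycle in range(1, max_cycles + 1):
        # Each detector gains its bottom-up input and is suppressed
        # in proportion to the current activation of its competitor.
        new_target = target + evidence[0] - inhibition * rival
        new_rival = rival + evidence[1] - inhibition * target
        # Activations are not allowed to fall below zero.
        target, rival = max(new_target, 0.0), max(new_rival, 0.0)
        if target >= threshold:
            return cycle
    return None


# An ambiguous stimulus: the two detectors receive similar input.
ambiguous = (0.10, 0.08)
unprimed = cycles_to_threshold(ambiguous)
primed = cycles_to_threshold(ambiguous, prime=0.3)
print("cycles without priming:", unprimed)
print("cycles with priming:   ", primed)
```

In this toy version the primed detector suppresses its rival earlier, so it reaches threshold in fewer cycles. That is the sense in which inhibition of alternatives could produce faster responses after priming.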
A major achievement of our visual system is the creation of a three-dimensional reality from a flat retinal image. Depth cues describe how this is achieved. There is evidence to suggest that the use of depth cues in simpler animals such as salamanders is innate (see Sperry, 1943; cited in Coren et al, 1994). Visual cliff experiments show that depth cues are used by a wide range of very young animals (Walk and Gibson, 1961; cited in Gleitman, 1999). Binocular depth perception has been found in humans as young as four months (Granrud, 1986; cited in Coren et al, 1994). However, work with dark-reared kittens shows that depth perception is also improved through learning during sensitive periods in an animal’s development (see Tees, 1974; cited in Coren et al, 1994).
Monocular or pictorial depth cues such as occlusion and familiar size offer information about relative distances in the world around us. Since these cues are learned, they must be examples of top-down processes. As evidence of this, Turnbull (1961; cited in Chandler….) found that forest-dwelling pygmies, unused to environments requiring long-range depth perception, applied size constancy only over short distances: they interpreted a far-off herd of buffalo as insects. Other learned cues are texture gradients, height in the plane, and linear and aerial perspective.
The remaining depth cues are now believed to be physiological. These include accommodation, convergence, retinal size and stereopsis; of these, convergence and stereopsis are binocular.
Julesz (1959) illustrated that the visual system is able to reconstruct three-dimensional perception from random dot stereograms, in the absence of any other depth cues. This occurs when the visual system is able to match and fuse the disparate images on the two retinas. In other words, stereopsis produces form and not the other way around. In support of this, Bishop and Pettigrew (1986; cited in Coren et al, 1994) located disparity-tuned detectors in the visual cortex of cats.
Marr and Poggio (1976) proposed a solution to the matching problem: how the visual system determines which elements in each eye’s view belong together, and how alternative fusional possibilities are eliminated. Evidence of neurons that respond to crossed (near) and uncrossed (far) disparities has been found in monkeys (Poggio and Fischer, 1977; cited in Coren et al, 1994). Marr and Poggio’s computational theory illustrates the level of complexity that an automatic, bottom-up process is capable of, and led the way for developments in object recognition. Their model of object recognition allows for a three-dimensional representation that is independent of the observer’s viewpoint. Earlier models such as template matching theories were confounded by the enormous variation in appearance that any one object has in three dimensions.
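The matching problem can be made concrete with a one-dimensional random dot stereogram. The sketch below does not implement Marr and Poggio's cooperative algorithm. It swaps in a much simpler technique, window matching, in which each candidate disparity is scored by how many dots in a small window agree between the two eyes' rows. All function names, sizes and parameters are assumptions made for the illustration.

```python
import random


def make_stereo_pair(width=60, start=20, stop=40, shift=3, seed=0):
    """Build two rows of random dots (0 or 1), one per eye.

    The right row copies the left row, except that the dots between
    `start` and `stop` are displaced by `shift` positions, which gives
    that region a horizontal disparity relative to its surround.
    """
    rng = random.Random(seed)
    left = [rng.randint(0, 1) for _ in range(width)]
    right = list(left)
    for x in range(start, stop):
        right[x - shift] = left[x]
    # The strip uncovered by the displacement is filled with new dots.
    for x in range(stop - shift, stop):
        right[x] = rng.randint(0, 1)
    return left, right


def match_score(left, right, x, d, half_window=8):
    """Count agreeing dots when the window around left[x] is compared
    with the window around right[x - d]."""
    score = 0
    for k in range(-half_window, half_window + 1):
        xl, xr = x + k, x + k - d
        if 0 <= xl < len(left) and 0 <= xr < len(right):
            score += left[xl] == right[xr]
    return score


def best_disparity(left, right, x, half_window=8, max_shift=5):
    """Return the smallest disparity in 0..max_shift with the top score."""
    best_d, best_score = 0, -1
    for d in range(max_shift + 1):
        score = match_score(left, right, x, d, half_window)
        if score > best_score:
            best_d, best_score = d, score
    return best_d


left, right = make_stereo_pair()
print("disparity found in the surround:", best_disparity(left, right, 8))
print("disparity found in the centre:  ", best_disparity(left, right, 30))
```

Neither row contains any recognisable form on its own, yet the displaced region can be recovered from the comparison alone. A single dot would match many candidates equally well, which is why a window is used here, and why a fuller account needs some way of eliminating false matches.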
It is important to note that various depth cues may be used at once, and the effectiveness of a particular cue will be determined by its reliability in the past. Ittelson (1951; cited in Coren et al, 1994) presented participants with different-sized playing cards in a dark room. Since all other depth cues were lacking, the largest cards were perceived as being closer, illustrating how retinal size was interpreted in terms of past experience (familiar size). Our ability to combine depth cues, then, is learned.
Biederman’s geon theory of object recognition (1987, 1990) is a development of computational theories. It reduces the information about the components of a visual object to 36 basic shapes, or geons, which are matched against representations in long-term memory. The success of the theory relies on the understanding of non-accidental properties: those aspects of a percept that do not change as we look at an object from different angles (for example symmetry, parallel lines, curvature). Biederman’s theory explains object recognition as consisting of various processes. Thus patients with visual agnosia are able to recognise each separate aspect of an object, such as the handle and bristles of a brush, but are incapable of organising the features into a recognisable whole.
However, the theory relies on bottom-up processes and does not explain the importance of context in object recognition. This was illustrated by Bruner et al (1951; cited in Eysenck, M. 1998), who found that hypotheses or expectations (top-down processes) might influence the perception of colour: playing cards of irregular colours, such as black hearts, were reported as brown or purple. The implication is that bottom-up processing is used in optimal viewing conditions, and is supplemented by top-down processes in less than optimal viewing conditions.
Gregory (1970, 1980), a constructivist, used the Müller-Lyer illusion to illustrate how context might be applied to perception, in this case in terms of misapplied size constancy. He suggests that the line that appears longer reminds us of the inside corner of a room, and thus we interpret it as further away than the other figure. Since the retinal images are the same size, the visual system mistakenly infers that the apparently more distant line must be bigger. Experiments reported in 1966 by Segall, Campbell and Herskovits (cited in Chandler) suggested that the Müller-Lyer illusion may be absent or reduced amongst people who grow up in right-angle free environments. This would indeed be evidence for a knowledge-driven influence on perception. However, Gross (1992; cited in Eysenck, M. 1998) found the same illusory effect when the arrows were replaced with other shapes that could not represent three-dimensional corners.
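The reasoning behind misapplied size constancy can be worked through numerically. The sketch below assumes the standard geometric relation between visual angle, distance and physical size (size = 2 × distance × tan(angle / 2)). The function name and the particular angle and distances are hypothetical values chosen for the example, not figures from Gregory.

```python
import math


def inferred_size(visual_angle_deg, assumed_distance):
    """Physical size implied by a visual angle at an assumed distance.

    Uses the geometric relation size = 2 * distance * tan(angle / 2).
    The result is in the same units as the assumed distance.
    """
    half_angle = math.radians(visual_angle_deg) / 2
    return 2 * assumed_distance * math.tan(half_angle)


# Both lines subtend the same visual angle (hypothetical value).
angle = 2.0
# Hypothetical distances: one line is read as an outside corner (nearer),
# the other as an inside corner (further away).
near_line = inferred_size(angle, assumed_distance=2.0)
far_line = inferred_size(angle, assumed_distance=3.0)
print("inferred size of the 'near' line:", round(near_line, 4))
print("inferred size of the 'far' line: ", round(far_line, 4))
```

With the visual angle held constant, the inferred size scales directly with the assumed distance. A line taken to be one and a half times further away is therefore inferred to be one and a half times longer, even though the two retinal images are identical.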
Helmholtz, and later Gregory (1978) and Rock (1983), exemplify the Intelligent Perception approach. In their view, visual perception of space goes beyond the image itself to include previous experience and habitual cognitive processing strategies.
The main problem with the constructivist approach is that many of the top-down effects described are produced in a lab, under brief exposure, in conditions removed from everyday life. As Tulving et al (1964; cited in Gleitman, 1999) found, top-down processing is important under ambiguous circumstances, such as brief exposure of a stimulus, where bottom-up processes are reduced in effectiveness. If expectation played such an important role, however, we would expect perception to be inaccurate much more of the time, and this is not the case in everyday life.
Gibson’s direct perception approach is a bottom-up ecological theory, which suggests that higher-order patterns among such elements as size, shape and distance remain the same as we move around, and are picked up automatically to produce a coherent view of the world. These invariants explain the phenomena of size and shape constancy. Thus, although the size of the retinal image may change as we move closer to the visual stimulus, other relationships between visual elements are invariant.
However, size constancy can be found even when other relationships between object and background, such as texture cues, are absent from the visual scene. Also, as distance cues are reduced, size constancy is also reduced (see Holway and Boring, 1947; cited in Gleitman, 1999).
Favouring one process over another in perception, as reflected in the opposition between constructivist and direct perception theories, is too limiting to explain all of the evidence. Innate components exist and are important, but to achieve high levels of visual functioning, these components must mature, and experience allows this to happen. Both processes may occur together, or in sequence, but both must occur.
The idea that perception is immediate and begins with primitive features has been investigated in the work of Treisman through Feature Integration Theory. She suggests that primitive features do not have to be analysed or located; they jump into perception effortlessly. When response times are measured in visual search tasks, differences in shape, colour, orientation and direction of movement embedded in a display are perceived very quickly. Thus a single letter ‘O’ embedded in a display of many ‘Vs’ can be perceived as fast as a single ‘O’ between only two ‘Vs’. This indicates that the visual system doesn’t have to inspect every item to determine whether it has the relevant properties, and implies that this process is pre-attentive.
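The logic of this inference can be sketched by comparing two idealised predictions of response time as the display grows. The code below is an illustration only. The function names, the 450 ms baseline and the 50 ms cost per inspected item are hypothetical values, not measurements from Treisman's experiments.

```python
def preattentive_rt(set_size, base_ms=450.0):
    """Predicted response time if the odd feature is registered across
    the whole display at once: independent of the number of items."""
    return base_ms


def item_by_item_rt(set_size, base_ms=450.0, per_item_ms=50.0):
    """Predicted response time if items had to be inspected one at a
    time, stopping when the target is found. With the target equally
    likely to be at any position, (set_size + 1) / 2 items are
    inspected on average."""
    return base_ms + per_item_ms * (set_size + 1) / 2


for n in (3, 10, 30):
    print(n, "items:",
          "pre-attentive", preattentive_rt(n), "ms,",
          "item-by-item", item_by_item_rt(n), "ms")
```

Under these assumptions, only the item-by-item account predicts that an ‘O’ among many ‘Vs’ should take longer to find than an ‘O’ between two ‘Vs’. A flat response time across display sizes is what points to a pre-attentive process.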
However, perception of a combination of features is not as immediate. When participants were presented with features such as a colour and a letter simultaneously for 200 ms, illusory conjunctions were apparent: letters were identified, but their colours were often confused or swapped (see Treisman and Gelade, 1980; cited in Gleitman, 1999). This indicates that coordinating several features in a stimulus requires a discrete step that occurs after feature identification. This stage requires active focal attention. Importantly, the process is independent of the stimulus, and involves selecting a locus in space and integrating the features there into a perceptual object. Feature integration is less prone to error when we know which objects to expect. This aspect of the process involves attention, which is an important conceptually driven aspect of perception.