An Evaluation of the Theories of Viewpoint-Invariant and Viewpoint-Dependent Approaches to the Recognition of Three-Dimensional Objects

by aaronscott (student)

Melanie Rodger    X4416846


We are able, with relative ease, to recognise a multitude of objects from many different and sometimes unfamiliar viewpoints. How do we achieve this, and what cognitive processes are involved? Explanations that can definitively account for how we compensate for the many changes to an image and still recognise it remain elusive, and the question continues to be a subject of controversy and debate. Two main approaches have been adopted, holding that object recognition is either viewpoint-invariant or viewpoint-dependent. A more recent study has suggested that we use a combination of the two. This essay will first describe these two main approaches. It will then compare two viewpoint-invariant theories, those of Marr and Nishihara (1978) and Biederman, before comparing these theories with the viewpoint-dependent account. Foster and Gilson’s (2002) combination theory will be discussed briefly. The essay will conclude by summarising what has been learnt so far and by suggesting that the visual system may use multiple representations for different object recognition tasks, implying that a combination theory may be more useful for understanding the object recognition process.

As mentioned earlier, two main approaches have been adopted to address the issues of object recognition. The first holds that recognition is viewpoint-invariant: it suggests that there are specific cues that carry enough information to allow easy recognition from any viewing angle (G. Pike and N. Brace 2005). As long as the appropriate invariants are recovered, recognition will succeed. (The appropriate invariants will be discussed later.)

The second approach postulates that recognition is viewpoint-dependent: recognition occurs when the features of a novel view are compared with feature representations stored in visual memory (Tarr and Bülthoff 1995). Recognition therefore depends on how far the current viewpoint differs from the view in which the object was represented when it was originally learnt (William G. Hayward 2003).

Marr and Nishihara’s (1978) viewpoint-invariant theory of object recognition is based on the assumption that four visual representations, or modules, increasing in both detail and complexity, are formed during the recognition process (G. Pike & G. Edgar 2005, ch3, p89).

The intensity of light at each point of the image on the retina allows edges and textures to be identified (the primal sketch). This, they suggest, is then used to form what Marr named the ‘2½D sketch’ (G. Pike & G. Edgar 2005, ch3, p86, 87). This sketch contains more detail and includes vital information such as the orientation and depth of visual surfaces. According to Marr (1978), the 2½D sketch is formed through the simultaneous analysis of further information such as shading, depth and binocular disparity, which, he states, is vital for the next stages of recognition. At this point the representation is viewer-centred, meaning that it depends on the exact angle from which the object is projected onto the retina (G. Pike & G. Edgar 2005, ch3, p116). Viewer-centred representations, however, do not allow an object to be recognised from all viewing angles. So how does the brain process the input from irregular views? Marr and Nishihara (1978) attempt to solve this problem by proposing that object recognition is based on the generation of 3D object-centred models that allow the object to be recognised from all angles (viewpoint-invariant). To achieve this, they suggest that an object must be described within a frame of reference that is based on the shape itself. They termed this a ‘canonical coordinate frame’.
To set up such a frame, they state, it is vital (G. Pike & N. Dawson, ch4, p126) to define a central axis for the representation of the shape, which Marr and Nishihara’s theory restricts to easily described generalised cones (G. Pike & N. Dawson, ch4, p116, 117). The axis and the shape representation are ultimately derived by taking the occluding contours from the 2½D sketch and combining them with the locations of concavities and convexities (primitives) (figure 4.16, G. Pike & N. Dawson 2005, ch4, p120). To explain how both the global and the detailed information necessary for a full description of an object is provided, Marr and Nishihara suggest that the 3D model is hierarchical in nature: a coarse description at the top of the hierarchy, for example a human body, is progressively broken down into constituent parts at finer and finer scales. For example, an arm can be decomposed into a forearm, a hand and ultimately fingers (fig 4.17, G. Pike & N. Dawson 2005).


The final part of the processing involves comparing ‘the target’ (G. Pike & N. Dawson, ch4, p123) with the hierarchically detailed ‘organised catalogue’ of previously seen objects stored in visual memory.

As has been seen, according to Marr and Nishihara (1978) viewpoint-invariant object recognition depends on the ability to establish a central axis. It would follow that if this axis were difficult to locate, recognition of the object would be impaired. The findings of Lawson and Humphreys (1996, cited in G. Pike & N. Dawson 2005) support this. Their study found that whilst ...
