Problems of structural description theories
Some researchers believe that a major problem with structural description theories of object recognition is that
Object recognition overview
Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as [please select: 3D-based; neuroscience-based; object recognition; view-based; picture-based; 2D-based; global statistic; Gestalt; summary statistic; structural description; early vision based] models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called ______, published in [please select: 2010; 2003; 1968; 1996; 1982; 1987; 1971; 1975; 1979] by _____ [surname only!]. An opposing theory – usually referred to as [please select: Gestalt; picture-based; global statistic; structural description; 3D-based; summary statistic; view-based; neuroscience-based; object recognition; early vision based; 2D-based] models – instead holds that the human visual system recognizes objects by storing and matching "2D images" or "snapshots" of objects, interpolating and extrapolating between them when an object must be recognized from a novel viewpoint. One of the most well-known proponents of this theory is _________ [surname only!]. [please select: Luckily; Nicely; Convincingly; Unfortunately] both theories are [please select: not; well; intimately; necessarily] connected to the findings, theories and models of early spatial vision.
Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the [please select: view-based; structural description; Gestalt; summary statistic; global statistic; early vision based] approach to object recognition.
Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as structural description models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called recognition-by-components, published in 1987 by Biederman. An opposing theory – usually referred to as view-based models – instead holds that the human visual system recognizes objects by storing and matching "2D images" or "snapshots" of objects, interpolating and extrapolating between them when an object must be recognized from a novel viewpoint. One of the most well-known proponents of this theory is Tarr. Unfortunately, both theories are not connected to the findings, theories and models of early spatial vision.
Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the view-based approach to object recognition.
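The view-based idea in the answer above (store 2D snapshots, then match a novel view against them) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two point-cloud "objects", the stored angles, the orthographic projection, and the sum-of-squares mismatch with known point correspondences. It is not a model proposed by Tarr or any other author.

```python
import math

def snapshot(points, angle_deg):
    """Orthographic 2D view of 3D points after rotating about the vertical (y) axis."""
    a = math.radians(angle_deg)
    return [(x * math.cos(a) + z * math.sin(a), y) for x, y, z in points]

def mismatch(view_a, view_b):
    """Sum of squared distances between corresponding 2D points (correspondence assumed known)."""
    return sum((xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(view_a, view_b))

def recognize(novel_view, memory):
    """Return (object name, stored angle, mismatch) for the best-matching stored snapshot."""
    candidates = ((name, angle, mismatch(novel_view, view))
                  for name, views in memory.items()
                  for angle, view in views.items())
    return min(candidates, key=lambda item: item[2])

# Hypothetical toy objects: three 3D points each.
objects = {
    "A": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    "B": [(1, 2, 0), (0, 3, 0), (0, 2, 1)],
}

# "Memory" holds only a few 2D snapshots per object, not a 3D model.
stored_angles = [0, 45, 90]
memory = {name: {angle: snapshot(points, angle) for angle in stored_angles}
          for name, points in objects.items()}

# A viewpoint that was never stored: object A rotated by 30 degrees.
novel = snapshot(objects["A"], 30)
name, angle, error = recognize(novel, memory)
print(name, angle, round(error, 3))
```

In this toy setup the novel 30° view is matched to the stored 45° snapshot of object A. The sketch covers only matching against stored views; interpolating between neighbouring snapshots is left out.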
Yamins et al.’s HMO-Model (PNAS 2014)
Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for ______. The HMO model belongs to the larger class of [please select: DDN; NDD; NND; DNN; DND] models, standing for _______ model. The HMO model's essential architectural characteristic is its [please select: computational complexity; heterogeneity; harmony; homogeneity]: there are, for example, many [please select: bypass; feedback] connections and different parameter settings [please select: only at different levels of the hierarchy; even at the same level of the hierarchy]. [please select: However; In addition; Furthermore], the basic operations performed locally are [please select: the same; different; heterogeneous; more or less the same] throughout the network. In their 2014 article, Yamins et al. report a large-scale modelling effort, evaluating around [please select: 5,000; 100; 500; 1,000; 10,000; 100,000] [please select: DNN parameterizations; DNN architectures; HMO model parametrizations; HMO architectures]. Yamins et al. compared their models both to the responses of cells in IT cortex (roughly N = [please select: 1,000; 300; 3,000; 100; 30] cells) and to how well the models categorized a set of images (roughly N = [please select: 600; 10,000; 3,000; 6,000; 1,000] images). One central finding was that models optimized for [please select: explained variance in IT; categorization performance; IT-cell predictivity; discrimination performance] were also superior at [please select: explaining variance in IT; categorization performance; IT-cell predictivity; discrimination performance].
In order to obtain a categorization performance from the HMO model, a [please select: linear; non-linear; partially linear] decoder was [please select: derived from; trained on; assumed to exist; taken from] the activity of units at the [please select: highest; intermediate; lowest; across] level(s) of the HMO network. Using such a procedure, the HMO model's performance was [please select: better; only slightly worse; on par] than that of [please select: both computer vision and neuronally inspired; computer vision; neuronally inspired] models of object recognition on the difficult [please select: low; high] variation task.
Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for hierarchical modular optimization. The HMO model belongs to the larger class of DNN models, standing for deep neural network model. The HMO model's essential architectural characteristic is its heterogeneity: there are, for example, many bypass connections and different parameter settings even at the same level of the hierarchy. However, the basic operations performed locally are the same throughout the network. In their 2014 article, Yamins et al. report a large-scale modelling effort, evaluating around 5,000 DNN architectures. Yamins et al. compared their models both to the responses of cells in IT cortex (roughly N = 300 or 100 cells) and to how well the models categorized a set of images (roughly N = 6,000 images). One central finding was that models optimized for categorization performance were also superior at explaining variance in IT.
In order to obtain a categorization performance from the HMO model, a linear decoder was trained on the activity of units at the highest level(s) of the HMO network. Using such a procedure, the HMO model's performance was better than that of both computer vision and neuronally inspired models of object recognition on the difficult high-variation task.
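To make "a linear decoder trained on the activity of units" concrete, here is a minimal sketch under loudly stated assumptions. The "top-level unit activity" is synthetic (a class prototype plus Gaussian noise) and the decoder is a plain multiclass perceptron. The function names, sizes and learning rule are all hypothetical and are not taken from Yamins et al.; the only point is that the readout is a weighted sum of unit activities per category, fitted while the units themselves stay fixed.

```python
import random

def make_activity(prototypes, n_per_class, rng, noise=1.0):
    """Synthetic stand-in for top-level unit activity: class prototype plus noise."""
    data = []
    for label, proto in enumerate(prototypes):
        for _ in range(n_per_class):
            data.append(([p + rng.gauss(0.0, noise) for p in proto], label))
    return data

def scores(weights, biases, x):
    """Linear readout: one weighted sum of unit activity per category."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def predict(weights, biases, x):
    s = scores(weights, biases, x)
    return s.index(max(s))

def train_linear_decoder(data, n_classes, n_units, rng, epochs=20, lr=0.1):
    """Multiclass perceptron: only the readout weights change, never the features."""
    weights = [[0.0] * n_units for _ in range(n_classes)]
    biases = [0.0] * n_classes
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            guess = predict(weights, biases, x)
            if guess != label:
                # Move the correct category's weights toward x, the wrong one's away.
                for i in range(n_units):
                    weights[label][i] += lr * x[i]
                    weights[guess][i] -= lr * x[i]
                biases[label] += lr
                biases[guess] -= lr
    return weights, biases

rng = random.Random(0)
n_classes, n_units = 4, 20
prototypes = [[rng.gauss(0.0, 3.0) for _ in range(n_units)] for _ in range(n_classes)]
train = make_activity(prototypes, 50, rng)
test = make_activity(prototypes, 25, rng)

weights, biases = train_linear_decoder(train, n_classes, n_units, rng)
accuracy = sum(predict(weights, biases, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the decoder is linear, its held-out accuracy reflects how linearly separable the categories already are in the unit activity it reads from.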
Viewpoint invariance
Viewpoint invariance refers to the idea that
Subordinate level category terms
Which of the following is a subordinate level category term?
Time required for object recognition
Tarr and his colleagues found that the amount of time needed to recognize novel objects is at least partially determined by
Fundamentals of view-based theories
What are object representations made of, according to view-based theories of object recognition?
Specificity of IT cells
A study of cells in IT cortex of a human patient showed that they responded to very specific stimuli, such as
Properties of generalized-cone components in RBC Theory (2)
The essential non-accidental properties of the generalized-cone components are:
(more than one can be true)
Superordinate level category terms
Which of the following is a superordinate level category term?
Stages of visual processing
Which of the following is a loosely defined stage of visual processing that comes after basic features have been extracted from the image, and before object recognition and scene understanding?
Entry-level categorization terms
Which of the following is an entry-level category term?
Fundamentals of RBC theory
What are object representations made of, according to the recognition-by-components (RBC) model of object recognition?
Give examples:
Superordinate level:
Basic level:
Subordinate level:
Superordinate level: Animal
Basic level: Dog
Subordinate level: Golden Retriever
Failure of object recognition
[please select: Prosopagnosia; Agnosia; Anomia; Alexia; Dyslexia] is a failure to recognize objects in spite of the ability to see them.
Agnosia
Central empirical findings about human object recognition
Typically … (choose all that are true)
Problems of view-based theories
A major problem with naïve template theories of object recognition is that
Prosopagnosia
Prosopagnosia is a neuropsychological disorder in which the patient
Properties of generalized-cone components in RBC Theory (1)
One central property of the generalized-cone components is that they have so-called [please select: generic; non-accidental; non-generic; accidental] properties, because these properties would [please select: rarely; often; always; never] be produced by [please select: non-accidental; non-generic; generic; accidental] alignments of viewpoint and object features. Thus RBC theory claims that certain properties of the [please select: 3D; 1D; 2D] image are taken by the visual system as strong evidence that the edges in the [please select: 3D; 1D; 2D] world contain the same properties.
One central property of the generalized-cone components is that they have so-called non-accidental properties, because these properties would rarely be produced by accidental alignments of viewpoint and object features. Thus RBC theory claims that certain properties of the 2D image are taken by the visual system as strong evidence that the edges in the 3D world contain the same properties.
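The "rarely produced by accidental alignments" argument can be checked numerically for one property, parallelism. The sketch below is an illustration under assumptions that are not part of RBC theory itself: orthographic projection, uniformly random viewpoints and edge directions, and an arbitrary 2° tolerance for calling two image edges parallel. It compares how often edges look parallel in the 2D image when they are parallel in 3D with how often they do so when their 3D directions are unrelated.

```python
import math
import random

def random_unit_vector(rng):
    """Uniformly distributed direction in 3D."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [c / norm for c in v]

def project(direction, view):
    """Orthographic projection: drop the component along the viewing direction."""
    along = sum(d * v for d, v in zip(direction, view))
    return [d - along * v for d, v in zip(direction, view)]

def image_angle_deg(edge_a, edge_b, view):
    """Angle between two edges as they appear in the image (0 means parallel)."""
    pa, pb = project(edge_a, view), project(edge_b, view)
    na = math.sqrt(sum(c * c for c in pa))
    nb = math.sqrt(sum(c * c for c in pb))
    if na < 1e-9 or nb < 1e-9:
        return None  # edge seen end-on: it has no orientation in the image
    cos = abs(sum(a * b for a, b in zip(pa, pb))) / (na * nb)
    return math.degrees(math.acos(min(1.0, cos)))

def fraction_parallel_in_image(parallel_in_3d, trials, rng, tolerance_deg=2.0):
    """Share of random viewpoints from which the two edges look parallel in 2D."""
    hits = valid = 0
    for _ in range(trials):
        view = random_unit_vector(rng)
        edge_a = random_unit_vector(rng)
        edge_b = edge_a if parallel_in_3d else random_unit_vector(rng)
        angle = image_angle_deg(edge_a, edge_b, view)
        if angle is None:
            continue
        valid += 1
        hits += angle < tolerance_deg
    return hits / valid

rng = random.Random(0)
p_from_parallel = fraction_parallel_in_image(True, 20000, rng)
p_from_random = fraction_parallel_in_image(False, 20000, rng)
print(f"3D-parallel edges look parallel in the image: {p_from_parallel:.3f}")
print(f"unrelated 3D edges look parallel in the image: {p_from_random:.3f}")
```

Under these assumptions, 3D-parallel edges always appear parallel in the image, while unrelated edges do so in only a few percent of viewpoints. This is why seeing parallel edges in the 2D image counts as strong evidence for parallel edges in the 3D world.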
Anatomy of object recognition
Evidence indicates that structures in [please select: striate; parietal; occipital; frontal; inferotemporal] cortex are especially important in end-stage object recognition processes.
inferotemporal