
Q7

by David

Object recognition overview


Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as [select: 3D-based / Neuroscience-based object recognition / view based / picture based / 2D-based / global statistic / Gestalt / summary statistic / structural description / early vision based] models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called ______, published in [select: 2010 / 2003 / 1968 / 1996 / 1982 / 1987 / 1971 / 1975 / 1979] by _____ [surname only!]. An opposing theory – usually referred to as [select: Gestalt / picture based / global statistic / structural description / 3D-based / summary statistic / view based / Neuroscience-based object recognition / early vision based / 2D-based] models – instead believes the human visual system recognizes objects by storing and matching "2D-images" or "snap-shots" of objects and inter- and extrapolates between them if required to recognize an object from a novel viewpoint. One of the most well-known proponents of this theory is _________ [surname only!]. [select: Luckily / Nicely / Convincingly / Unfortunately] both theories are [select: not / well / intimately / necessarily] connected to the findings, theories and models of early spatial vision.

Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the [select: view-based / structural description / Gestalt / summary statistic / global statistic / early vision based] approach to object recognition.

Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as structural description models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called recognition-by-components, published in 1987 by Biederman. An opposing theory – usually referred to as view-based models – instead holds that the human visual system recognizes objects by storing and matching "2D images" or "snapshots" of objects, interpolating and extrapolating between them when an object has to be recognized from a novel viewpoint. One of the most well-known proponents of this theory is Tarr. Unfortunately, both theories are not connected to the findings, theories and models of early spatial vision.

Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the view-based approach to object recognition.

Yamins et al.’s HMO-Model (PNAS 2014)


Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for ______. The HMO model belongs in the larger class of [select: DDN / NDD / NND / DNN / DND] models, standing for _______ model. The HMO model's essential architectural characteristic is its [select: computational complexity / heterogeneity / harmony / homogeneity]: There are, for example, many [select: bypass / feedback] connections and different parameter settings [select: only at different levels of the hierarchy / even at the same level of the hierarchy]. [select: However / In addition / Furthermore], the basic operations performed locally are [select: the same / different / heterogeneous / more or less the same] throughout the network. In the Yamins et al. (2014) article they report a large-scale modelling effort, evaluating around [select: 5,000 / 100 / 500 / 1,000 / 10,000 / 100,000] [select: DNN parameterizations / DNN architectures / HMO model parameterizations / HMO architectures]. Yamins et al. compared their models both to the response of cells in IT cortex (roughly N = [select: 1,000 / 300 / 3,000 / 100 / 30] cells) as well as on how well the models categorized a set of images (roughly N = [select: 600 / 10,000 / 3,000 / 6,000 / 1,000] images). One central finding was that models optimized for [select: explained variance in IT / categorization performance / IT-cell predictivity / discrimination performance] were also superior at [select: explaining variance in IT / categorization performance / IT-cell predictivity / discrimination performance].

In order to obtain a categorization performance from the HMO model, a [select: linear / non-linear / partially linear] decoder was [select: derived from / trained on / assumed to exist / taken from] the activity of units at the [select: highest / intermediate / lowest / across] level(s) of the HMO network. Using such a procedure, the HMO model's performance was [select: better / only slightly worse / on par] than that of [select: both computer vision and neuronally inspired / computer vision / neuronally inspired] models of object recognition on the difficult [select: low / high] variation task.

Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for hierarchical modular optimization. The HMO model belongs in the larger class of DNN models, standing for deep neural network models. The HMO model's essential architectural characteristic is its heterogeneity: there are, for example, many bypass connections and different parameter settings even at the same level of the hierarchy. However, the basic operations performed locally are the same throughout the network. The Yamins et al. (2014) article reports a large-scale modelling effort, evaluating around 5,000 DNN architectures. Yamins et al. compared their models both to the response of cells in IT cortex (roughly N = 300 or 100 cells) and on how well the models categorized a set of images (roughly N = 6,000 images). One central finding was that models optimized for categorization performance were also superior at explaining variance in IT.
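To make "explaining variance in IT" concrete, here is a minimal sketch of how explained variance (R²) can be computed. It is a deliberate simplification: a single hypothetical model unit predicts a single hypothetical neuron with a least-squares line. The function name and all numbers are invented for illustration and are not taken from Yamins et al.

```python
from statistics import mean


def explained_variance(feature, response):
    """R^2 of a least-squares line predicting `response` from `feature`."""
    fx, ry = mean(feature), mean(response)
    sxx = sum((x - fx) ** 2 for x in feature)
    sxy = sum((x - fx) * (y - ry) for x, y in zip(feature, response))
    slope = sxy / sxx
    intercept = ry - slope * fx
    # Residual variance left over after the linear fit
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(feature, response))
    # Total variance of the neural response around its mean
    ss_tot = sum((y - ry) ** 2 for y in response)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one model unit's activation on five images,
# and two made-up neurons' responses to the same images.
model_unit = [0.0, 1.0, 2.0, 3.0, 4.0]
neuron_well_predicted = [1.0, 3.0, 5.0, 7.0, 9.0]
neuron_poorly_predicted = [2.0, 1.0, 4.0, 2.0, 3.0]

r2_good = explained_variance(model_unit, neuron_well_predicted)
r2_poor = explained_variance(model_unit, neuron_poorly_predicted)
```

A model that "explains more variance in IT" is one whose units yield higher R² values of this kind across the recorded cells.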

In order to obtain a categorization performance from the HMO model, a linear decoder was trained on the activity of units at the highest level of the HMO network. Using such a procedure, the HMO model's performance was better than that of both computer vision and neuronally inspired models of object recognition on the difficult high variation task.
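The idea of a linear decoder trained on top-level activity can be sketched as follows. This is only an illustration under stated assumptions: it uses a simple perceptron rule, two made-up categories, and invented two-dimensional "top-level activations". The card does not say which linear classifier Yamins et al. actually used, so the training rule here is a stand-in.

```python
def train_linear_decoder(features, labels, epochs=20, lr=0.1):
    """Perceptron-style training of a linear readout: sign(w . x + b)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):  # y is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decode(w, b, x):
    """Return the predicted category (+1 or -1) for activation vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical top-level activations for six training images
top_level_activity = [
    [2.0, 0.5], [1.5, 0.2], [1.8, 0.9],   # category +1
    [0.3, 1.7], [0.5, 2.0], [0.1, 1.4],   # category -1
]
categories = [1, 1, 1, -1, -1, -1]

w, b = train_linear_decoder(top_level_activity, categories)
predictions = [decode(w, b, x) for x in top_level_activity]
```

The decoder itself adds no non-linearity, so categorization performance measured this way reflects how linearly separable the categories already are in the network's top-level representation.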
