Buffl

All

by David

Prior to the seminal work of [__________] and [__________] published in [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], researchers in pattern perception, often referred to as [__________] [no capital letters!], thought of the stimuli exclusively in the [space; Fourier] domain, in terms of [lines, corners and edges; spatial frequency content]. After the publication of "Application of Fourier Analysis to the Visibility of Gratings" in the Journal of Physiology, however, vision researchers to this day always [also; only; never] consider stimuli in the [Fourier; space] domain. Additional experimental data [consistent; inconsistent] with the linear, independent multi-channel model came, e.g., from Blakemore and Campbell's [adaptation; masking; recognition; identification; detection] studies, from the famous 1f, 3f and phase manipulation experiments by [__________] [surname only!] and [__________] [surname only!], published in [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], and from the elegant experiment by [__________] [surname only!] and [__________] [surname only!] from [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], showing that [a single; two; many] cycle(s) of a sine-wave grating could be [easier; equal] to detect than [many; a single; two] cycle(s) if the signal was [inhibited; masked; adapted away] by [narrow-band; wide-band; broad-band; white; pink] visual noise. Whilst there exists a large body of work [questioning; supporting; non-conclusive with respect to] the linear, independent multi-channel model, there are notable exceptions. One of the most prominent is a study by [__________] [surname only!] and colleagues from [1996; 1968; 1975; 1971; 1982; 1987; 2010; 2003; 1979], based on an [__________] phenomenon, the "missing fundamental".

Prior to the seminal work of Campbell and Robson published in 1968, researchers in pattern perception, often referred to as early spatial vision (or simply spatial vision), thought of the stimuli exclusively in the space domain, in terms of lines, corners and edges. After the publication of "Application of Fourier Analysis to the Visibility of Gratings" in the Journal of Physiology, however, vision researchers to this day always also consider stimuli in the Fourier domain. Additional experimental data consistent with the linear, independent multi-channel model came, e.g., from Blakemore and Campbell's adaptation studies, from the famous 1f, 3f and phase manipulation experiments by Graham and Nachmias, published in 1971, and from the elegant experiment by Carter and Henning from 1971, showing that a single cycle of a sine-wave grating could be easier to detect than many cycles if the signal was masked by narrow-band visual noise. Whilst there exists a large body of work supporting the linear, independent multi-channel model, there are notable exceptions. One of the most prominent is a study by Henning and colleagues from 1975, based on an auditory phenomenon, the "missing fundamental".
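The logic of the 1f/3f phase manipulation can be illustrated numerically. The Python/NumPy sketch below builds a compound of a fundamental and its third harmonic in "peaks-add" and "peaks-subtract" phase. The component contrasts and the sampling are illustrative assumptions, not the stimulus values of Graham and Nachmias.

Both compounds have the same amplitude spectrum but different peak contrasts. A single-channel peak detector therefore predicts different thresholds for them, whereas independent, frequency-selective channels predict the same threshold.

```python
import numpy as np


def compound_grating(x, c1, c3, phase3):
    """Fundamental (1f) plus third harmonic (3f); x is in cycles of the fundamental."""
    return c1 * np.cos(2 * np.pi * x) + c3 * np.cos(2 * np.pi * 3 * x + phase3)


def amplitude_spectrum(signal):
    """Magnitudes of the discrete Fourier coefficients (normalized by length)."""
    return np.abs(np.fft.rfft(signal)) / len(signal)


n = 4096                      # samples across one cycle of the fundamental
x = np.arange(n) / n
c1, c3 = 1.0, 1.0             # illustrative (assumed) component contrasts

peaks_add = compound_grating(x, c1, c3, 0.0)         # 1f and 3f peaks coincide
peaks_subtract = compound_grating(x, c1, c3, np.pi)  # 3f trough at the 1f peak

# Same amplitude spectrum: energy only at 1f and 3f, with equal magnitudes.
same_spectrum = np.allclose(amplitude_spectrum(peaks_add),
                            amplitude_spectrum(peaks_subtract))

# Different peak contrast, the quantity a single peak detector would respond to.
peak_add = np.max(np.abs(peaks_add))
peak_subtract = np.max(np.abs(peaks_subtract))
print(same_spectrum, peak_add, peak_subtract)
```

With equal component contrasts, the peaks-add compound peaks at 2 and the peaks-subtract compound at about 1.54, even though their Fourier amplitudes are identical.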

Bruce Henning and colleagues published a series of experiments in the mid 1970s which were [inconsistent; consistent] with the [non-independent; independent; correlated; covarying] [multi-channel; single-channel] model of Campbell & Robson. Henning et al.'s experiments were inspired by the "missing fundamental" in [object recognition without the fundamental spatial frequency; auditory frequency discrimination; auditory pitch perception; motion perception; auditory sound localization], and they used both [__________] (AM) and [__________] (QFM) gratings as stimuli. Figure A shows the amplitude spectrum of [AM; QFM; both] gratings. Figure B shows the appearance and the cross-section through a [__________] grating and its constituent 4f, 5f and 6f gratings; Figure C shows the respective graphs for a [__________] grating. Henning et al. reported strong interactions (masking) between [a sine-wave with frequency 1f and a QFM; a sine-wave with frequency 1f and an AM] grating composed of 4f, 5f and 6f. According to Campbell and Robson, however, there should have been [a; no] interaction (masking) between the stimuli. Furthermore, there was clearly less masking with a [AM; QFM] grating as opposed to the [QFM; AM] grating, and this should [not have; have] happened, pointing to the importance of [phase relations; …] between the stimulus components, contrary to the findings of [Campbell; Robson; Nachmias; Graham] and [Campbell; Robson; Nachmias; Graham].






Bruce Henning and colleagues published a series of experiments in the mid 1970s which were inconsistent with the independent multi-channel model of Campbell & Robson. Henning et al.'s experiments were inspired by the "missing fundamental" in auditory pitch perception, and they used both amplitude-modulated (AM) and quasi-frequency-modulated (QFM) gratings as stimuli. Figure A shows the amplitude spectrum of both AM and QFM gratings. Figure B shows the appearance and the cross-section through a QFM grating and its constituent 4f, 5f and 6f gratings; Figure C shows the respective graphs for an AM grating. Henning et al. reported strong interactions (masking) between a sine-wave with frequency 1f and an AM grating composed of 4f, 5f and 6f. According to Campbell and Robson, however, there should have been no interaction (masking) between the stimuli, because the masker contains no energy at 1f. Furthermore, there was clearly less masking with a QFM grating than with the AM grating. AM and QFM gratings have identical amplitude spectra and differ only in the phases of their components, so under the independent multi-channel model this should not have happened. The result points to the importance of phase relations between the stimulus components, contrary to the findings of Graham and Nachmias.
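The key point of the AM/QFM manipulation is that both gratings share the same amplitude spectrum (components at 4f, 5f and 6f) yet have very different luminance profiles. The AM grating has a pronounced envelope at the "missing" fundamental f, while the QFM grating's envelope is much flatter.

The Python/NumPy sketch below illustrates this. It makes two assumptions: a modulation depth m = 1, and the usual construction in which the QFM grating is obtained by shifting the centre (5f) component by 90°. The values used by Henning et al. may differ. The envelope is computed as the magnitude of the analytic signal, using the FFT-based Hilbert-transform method.

```python
import numpy as np


def envelope(signal):
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(signal)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(signal) * weights))


def modulation_depth(env):
    """Michelson-style depth of an envelope: (max - min) / (max + min)."""
    return (env.max() - env.min()) / (env.max() + env.min())


n = 4096
theta = 2 * np.pi * np.arange(n) / n    # one cycle of the missing fundamental f
m = 1.0                                 # assumed modulation depth

# Components at 4f, 5f and 6f. AM: all components in cosine phase.
am = 0.5 * m * np.cos(4 * theta) + np.cos(5 * theta) + 0.5 * m * np.cos(6 * theta)
# QFM: same components, but the 5f (carrier) component shifted by 90 degrees.
qfm = (0.5 * m * np.cos(4 * theta) + np.cos(5 * theta + np.pi / 2)
       + 0.5 * m * np.cos(6 * theta))

same_amplitudes = np.allclose(np.abs(np.fft.rfft(am)), np.abs(np.fft.rfft(qfm)))
am_depth = modulation_depth(envelope(am))    # envelope 1 + cos(theta): 0 to 2
qfm_depth = modulation_depth(envelope(qfm))  # envelope sqrt(1 + cos^2(theta)): 1 to sqrt(2)
print(same_amplitudes, am_depth, qfm_depth)
```

The AM envelope is fully modulated at f (depth 1), whereas the QFM envelope has a depth of only about 0.17, despite the identical amplitude spectra.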

The graph shows one of the two central figures from Campbell & Robson (Fig. 3, p. 556). For the open symbols, [contrast thresholds; contrast sensitivities (1 over detection threshold); luminance thresholds; luminance sensitivities (1 over detection threshold)] are plotted on the y-axis against [line width in deg; spatial frequency in wavelength lambda; line width in cpd; spatial frequency in cpd; luminance modulation rate]. The open squares show the data for the [square-wave grating; sine-wave grating; rectangular-wave grating; sawtooth grating], the open circles for the [square-wave grating; sine-wave grating; rectangular-wave grating; sawtooth grating]. The filled black circles show the ratio of the [square-to-sine; square-to-rectangular; square-to-sawtooth; sine-to-rectangular; sine-to-sawtooth] [thresholds; sensitivities]. The solid black line marks the prediction at [4/pi; pi/4; 4/3; 3/4] derived from the Fourier series of the stimuli. The dashed line marks the prediction of a [simple corner detector; simple edge detector; multi-channel; simple peak-detector] model of early spatial vision.



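The graph shows one of the two central figures from Campbell & Robson (Fig. 3, p. 556). For the open symbols, contrast sensitivity (1 over detection threshold) is plotted on the y-axis against spatial frequency in cycles per degree (cpd). The open squares show the data for the square-wave grating, the open circles for the sine-wave grating. The filled black circles show the ratio of the square-to-sine sensitivities. The solid black line marks the prediction at 4/pi derived from the Fourier series of the stimuli. The fundamental component of a square wave has an amplitude 4/pi times that of the square wave itself, so if only the fundamental determines detection, the square-wave grating should be 4/pi times more detectable. The dashed line marks the prediction of a simple peak-detector model of early spatial vision. Such a model predicts equal sensitivity to square- and sine-wave gratings of the same contrast, since their peak luminances are then equal.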
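The 4/pi prediction follows from the Fourier series of a square wave of amplitude A: (4A/pi)(sin x + sin 3x / 3 + sin 5x / 5 + …). The Python/NumPy sketch below checks this numerically for a sampled square wave. The sampling density is an arbitrary choice, and the "detection by the fundamental only" rule is the multi-channel assumption described above, not a fitted model.

```python
import numpy as np

n = 8192
x = (np.arange(n) + 0.5) / n              # sample midpoints across one period
contrast = 1.0                            # square-wave amplitude (contrast)
square = np.where(x < 0.5, contrast, -contrast)

# For a real signal, the amplitude of harmonic k is 2 * |c_k|,
# where c_k are the length-normalized DFT coefficients.
harmonic_amplitude = 2 * np.abs(np.fft.rfft(square)) / n

fundamental = harmonic_amplitude[1]       # expected: (4/pi) * contrast
third = harmonic_amplitude[3]             # expected: (4/pi) * contrast / 3

# If only the fundamental's channel determines detection, a square wave reaches
# threshold when its fundamental equals the sine-wave threshold, so the
# square-to-sine sensitivity ratio is fundamental / contrast.
predicted_ratio = fundamental / contrast
print(predicted_ratio, 4 / np.pi)
```

The third harmonic has one third of the fundamental's amplitude. This is why it only contributes to detection at low spatial frequencies, where it falls near the peak of the contrast sensitivity function.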

Thorpe, Fize & Marlot published a study in Nature which exerted a very strong influence on the object recognition community. In their paper they showed that [human observers; monkeys; cats] could decide whether a previously unseen [photograph; line drawing; painting] of a natural scene contained an animal or not. The median reaction time (RT) of the observers was around [400-500; 100-200; 200-300; 500-600; 600-700] ms with a mean accuracy of [85-90; 90-95; 95-100; 80-85] % correct (note that the observers showed [a slight; no; a strong] speed-accuracy trade-off). Subsequent [ERP; fMRI; PET; multi-unit; single-unit] analyses showed that roughly [150; 100; 200; 250] ms after stimulus onset the measured neurophysiological correlate could already reliably signal the presence or absence of an animal in a post-hoc analysis. Thus, processing of the natural scene stimulus was apparently already completed within this comparatively short time. According to the authors, this result provides strong evidence in favour of essentially [feedforward; dynamic feedback; multi-level feedforward & feedback; feedback; deep-belief neural network] theories of visual object recognition. This, in turn, argues against object recognition theories requiring an explicit [image segmentation; 2D-to-3D; Fourier transform; Wavelet transform; multi-scale image decomposition] step prior to recognition, as such a step is presumed to require [time-consuming; fast; computationally complex] [iterative; feedback; feedforward; non-linear processing; linear decomposition; fast Fourier] algorithms.

Thorpe, Fize & Marlot published a study in Nature which exerted a very strong influence on the object recognition community. In their paper they showed that human observers could decide whether a previously unseen photograph of a natural scene contained an animal or not. The median reaction time (RT) of the observers was around 400-500 ms with a mean accuracy of 90-95 % correct (note that the observers showed a slight speed-accuracy trade-off). Subsequent ERP analyses showed that roughly 150 ms after stimulus onset the measured neurophysiological correlate could already reliably signal the presence or absence of an animal in a post-hoc analysis. Thus, processing of the natural scene stimulus was apparently already completed within this comparatively short time. According to the authors, this result provides strong evidence in favour of essentially feedforward theories of visual object recognition. This, in turn, argues against object recognition theories requiring an explicit image segmentation step prior to recognition, as such a step is presumed to require time-consuming iterative (or feedback; both options seem to count as correct) algorithms.

Object recognition overview


Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as [3D-based; Neuroscience-based object recognition; view-based; picture-based; 2D-based; global statistic; Gestalt; summary statistic; structural description; early vision based] models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called [__________], published in [2010; 2003; 1968; 1996; 1982; 1987; 1971; 1975; 1979] by [__________] [surname only!]. An opposing theory – usually referred to as [Gestalt; picture-based; global statistic; structural description; 3D-based; summary statistic; view-based; Neuroscience-based object recognition; early vision based; 2D-based] models – instead proposes that the human visual system recognizes objects by storing and matching "2D images" or "snapshots" of objects, interpolating and extrapolating between them if required to recognize an object from a novel viewpoint. One of the most well-known proponents of this theory is [__________] [surname only!]. [Luckily; Nicely; Convincingly; Unfortunately] both theories are [not; well; intimately; necessarily] connected to the findings, theories and models of early spatial vision.

Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the [view-based; structural description; Gestalt; summary statistic; global statistic; early vision based] approach to object recognition.

Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as structural description models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called recognition-by-components, published in 1987 by Biederman. An opposing theory – usually referred to as view-based models – instead proposes that the human visual system recognizes objects by storing and matching "2D images" or "snapshots" of objects, interpolating and extrapolating between them if required to recognize an object from a novel viewpoint. One of the most well-known proponents of this theory is Tarr. Unfortunately, neither theory is connected to the findings, theories and models of early spatial vision.

Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the view-based approach to object recognition.

Yamins et al.’s HMO-Model (PNAS 2014)


Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for [__________]. The HMO model belongs to the larger class of [DDN; NDD; NND; DNN; DND] models, standing for [__________] model. The HMO model's essential architectural characteristic is its [computational complexity; heterogeneity; harmony; homogeneity]: there are, for example, many [bypass; feedback] connections and different parameter settings [only at different levels of the hierarchy; even at the same level of the hierarchy]. [However; In addition; Furthermore], the basic operations performed locally are [the same; different; heterogeneous; more or less the same] throughout the network. In the article, Yamins et al. (2014) report a large-scale modelling effort, evaluating around [5,000; 100; 500; 1,000; 10,000; 100,000] [DNN parameterizations; DNN architectures; HMO model parameterizations; HMO architectures]. Yamins et al. compared their models both to the responses of cells in IT cortex (roughly N = [1,000; 300; 3,000; 100; 30] cells) and on how well the models categorized a set of images (roughly N = [600; 10,000; 3,000; 6,000; 1,000] images). One central finding was that models optimized for [explained variance in IT; categorization performance; IT-cell predictivity; discrimination performance] were also superior at [explaining variance in IT; categorization performance; IT-cell predictivity; discrimination performance].

In order to obtain a categorization performance from the HMO model, a [linear; non-linear; partially linear] decoder was [derived from; trained on; assumed to exist; taken from] the activity of units at the [highest; intermediate; lowest; across] level(s) of the HMO network. Using such a procedure, the HMO model's performance was [better; only slightly worse; on par] compared with that of [both computer vision and neuronally inspired; computer vision; neuronally inspired] models of object recognition on the difficult [low; high] variation task.

Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for hierarchical modular optimization. The HMO model belongs to the larger class of DNN models, standing for deep neural network models. The HMO model's essential architectural characteristic is its heterogeneity: there are, for example, many bypass connections and different parameter settings even at the same level of the hierarchy. However, the basic operations performed locally are the same throughout the network. In the article, Yamins et al. (2014) report a large-scale modelling effort, evaluating around 5,000 DNN architectures. Yamins et al. compared their models both to the responses of cells in IT cortex (roughly N = 300 or 100 cells) and on how well the models categorized a set of images (roughly N = 6,000 images). One central finding was that models optimized for categorization performance were also superior at explaining variance in IT.
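"Explaining variance in IT" is typically quantified in two steps. First, a linear mapping from model-unit activity to each recorded site's responses is fitted on one set of images. Second, the explained variance is computed on held-out images.

The Python/NumPy sketch below is a simplified stand-in for that procedure. It uses closed-form ridge regression on purely synthetic data. The array sizes, the regularization strength and the choice of ridge regression are assumptions, and the exact mapping procedure of Yamins et al. (2014) may differ.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression weights mapping model units to neural sites."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ responses)


def explained_variance(predicted, observed):
    """Coefficient of determination (R^2), computed separately for each site."""
    residual = np.sum((observed - predicted) ** 2, axis=0)
    total = np.sum((observed - observed.mean(axis=0)) ** 2, axis=0)
    return 1.0 - residual / total


rng = np.random.default_rng(0)
n_images, n_units, n_sites = 600, 50, 20            # hypothetical sizes

# Synthetic data: a "good" model whose units relate linearly to the sites,
# and an unrelated "bad" model with random units.
good_model = rng.normal(size=(n_images, n_units))
bad_model = rng.normal(size=(n_images, n_units))
mixing = rng.normal(size=(n_units, n_sites))
sites = good_model @ mixing + rng.normal(scale=2.0, size=(n_images, n_sites))

train, test = slice(0, 300), slice(300, 600)        # held-out evaluation


def predictivity(model_units):
    """Fit on training images, return per-site R^2 on held-out images."""
    weights = fit_ridge(model_units[train], sites[train])
    return explained_variance(model_units[test] @ weights, sites[test])


r2_good = predictivity(good_model)
r2_bad = predictivity(bad_model)
print(np.median(r2_good), np.median(r2_bad))
```

Comparing the median held-out R^2 across candidate models is one simple way to rank them by "IT predictivity".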

In order to obtain a categorization performance from the HMO model, a linear decoder was trained on the activity of units at the highest level(s) of the HMO network. Using this procedure, the HMO model performed better than both computer vision and neuronally inspired models of object recognition on the difficult high-variation task.
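A linear decoder reading out category labels from the top level of a network can be sketched as follows. The Python/NumPy example trains a regularized least-squares classifier on synthetic "top-layer" features. Both the classifier type and the data are assumptions for illustration; linear SVMs or logistic regression are common alternatives, and this is not necessarily the exact read-out used by Yamins et al. The point is that all non-linear processing happens inside the network, while the read-out itself is only a weighted sum followed by an argmax.

```python
import numpy as np


def train_linear_decoder(features, labels, n_classes, alpha=1e-2):
    """Least-squares linear decoder: regress one-hot labels on features (plus bias)."""
    x = np.hstack([features, np.ones((len(features), 1))])   # append bias column
    targets = np.eye(n_classes)[labels]                        # one-hot encoding
    gram = x.T @ x + alpha * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def decode(features, weights):
    """Predicted class = argmax of the linear read-out."""
    x = np.hstack([features, np.ones((len(features), 1))])
    return np.argmax(x @ weights, axis=1)


rng = np.random.default_rng(1)
n_classes, n_units, per_class = 8, 100, 100      # hypothetical sizes
class_means = rng.normal(size=(n_classes, n_units))
labels = np.repeat(np.arange(n_classes), per_class)
# Synthetic stand-in for top-layer activity: class mean plus noise.
features = class_means[labels] + rng.normal(scale=2.0, size=(len(labels), n_units))

order = rng.permutation(len(labels))
train_idx, test_idx = order[:600], order[600:]
weights = train_linear_decoder(features[train_idx], labels[train_idx], n_classes)
accuracy = np.mean(decode(features[test_idx], weights) == labels[test_idx])
print(f"held-out accuracy: {accuracy:.2f} (chance = {1 / n_classes:.2f})")
```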

Wichmann, Drewes, Rosas and Gegenfurtner published a paper [casting doubt on; confirming] the conclusions drawn by Torralba & Oliva. The two main conclusions of the study – based on [computational analysis; psychophysical experiments; neuro-imaging techniques] – were, first, that for human observers animal detection in typical photographs of natural scenes [is independent of the power spectrum; relies on many PCA components of the power spectrum; depends on the power spectrum as claimed by Torralba & Oliva; is independent of the phase spectrum]. Second, that in typical commercial image databases the statistics of the images may [be as; not be as; even be more] natural as/than often presumed, because photographs typically represent a [true; random sample; Gaussian sample; biased; unbiased] view of the world.

Wichmann, Drewes, Rosas and Gegenfurtner published a paper casting doubt on the conclusions drawn by Torralba & Oliva. The two main conclusions of the study – based on psychophysical experiments – were, first, that for human observers animal detection in typical photographs of natural scenes is independent of the power spectrum. Second, that in typical commercial image databases the statistics of the images may not be as natural as often presumed, because photographs typically represent a biased view of the world.
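Claims about the role of the power spectrum are typically tested by manipulating the amplitude (power) spectrum while keeping each image's phase spectrum fixed. If detection is independent of the power spectrum, performance should be essentially unchanged after such a manipulation.

The Python/NumPy sketch below shows one common way to do this: it gives every image in a set the average amplitude spectrum of the set. The random arrays are stand-ins for photographs, and this is not necessarily the exact manipulation used by Wichmann et al.

```python
import numpy as np


def impose_amplitude_spectrum(image, target_amplitude):
    """Combine the phase spectrum of `image` with `target_amplitude`."""
    phase = np.angle(np.fft.fft2(image))
    combined = target_amplitude * np.exp(1j * phase)
    # Amplitude and phase both stem from real-valued images, so `combined` is
    # Hermitian-symmetric and the inverse transform is real up to rounding.
    return np.real(np.fft.ifft2(combined))


rng = np.random.default_rng(0)
images = rng.normal(size=(10, 64, 64))     # stand-ins for photographs

# Equalize: give every image the average amplitude spectrum of the set.
mean_amplitude = np.mean(np.abs(np.fft.fft2(images)), axis=0)
equalized = np.array([impose_amplitude_spectrum(im, mean_amplitude) for im in images])

# Every equalized image now has the same amplitude spectrum ...
same_amplitude = np.allclose(np.abs(np.fft.fft2(equalized[0])), mean_amplitude)
# ... while keeping its own phase spectrum.
same_phase = np.allclose(np.exp(1j * np.angle(np.fft.fft2(equalized[0]))),
                         np.exp(1j * np.angle(np.fft.fft2(images[0]))))
print(same_amplitude, same_phase)
```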

Torralba and Oliva published a paper entitled "Statistics of natural image categories". They report that, contrary to what was believed before, the [power spectrum; pixel intensity histogram] of (images of) natural scenes is only [non-isotropic; isotropic] if averaged across image categories; if analysed separately for different image categories, they found strong correlations between [the shape; the total power; phase; the complex conjugate] of the power spectrum and image categories. Typically, a density plot of the power spectrum of an image of a man-made scene is more [egg-shaped; circular; triangular-shaped; star-shaped]. Based on [two; a large number of; a small number of; a single] component(s) of a principal component analysis (PCA) performed on the power spectrum, Torralba & Oliva were able to correctly categorise images into animal and non-animal scenes in [__________] % of the cases. Calculating the PCA of the power spectrum is a [non-linear; linear] operation. [However; Still], this operation could be performed [only with great difficulty; already in the retina; in a feedforward manner; only using feedback] in the human brain given what is currently known about physiology. Thus Torralba & Oliva concluded that animal versus non-animal categorization is so rapid because their [structural description model; summary statistic; image segmentation; view-based] approach does not require an explicit [image segmentation; Fourier transformation; image alignment] step.

Torralba and Oliva published a paper entitled "Statistics of natural image categories". They report that, contrary to what was believed before, the power spectrum of (images of) natural scenes is only isotropic if averaged across image categories; if analysed separately for different image categories, they found strong correlations between the shape of the power spectrum and image categories. Typically, a density plot of the power spectrum of an image of a man-made scene is more star-shaped than that of a natural scene, reflecting the predominance of horizontal and vertical structures. Based on a small number of components of a principal component analysis (PCA) performed on the power spectrum, Torralba & Oliva were able to correctly categorise images into animal and non-animal scenes in 80 % of the cases. Calculating the PCA of the power spectrum is a non-linear operation, because the power spectrum is itself a non-linear (squared-magnitude) function of the image. Still, this operation could be performed in a feedforward manner in the human brain given what is currently known about physiology. Thus Torralba & Oliva concluded that animal versus non-animal categorization is so rapid because their summary statistic approach does not require an explicit image segmentation step.
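The pipeline described above has two stages. First, each image's power spectrum is computed. Second, the spectra are projected onto their first few principal components, giving a handful of global features per image. The Python/NumPy sketch below illustrates this.

It omits the windowing and spectral normalization that real analyses usually apply, as well as the classifier trained on the features. The random arrays are stand-ins for photographs. It also makes the non-linearity explicit: the PCA projection is linear in the spectrum, but the spectrum is quadratic in the image.

```python
import numpy as np


def power_spectrum(image):
    """Squared magnitude of the 2D Fourier transform, zero frequency centred."""
    return np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2


def spectral_pca_features(images, n_components):
    """Project each image's power spectrum onto the first principal components."""
    spectra = np.array([power_spectrum(im).ravel() for im in images])
    centred = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
images = rng.normal(size=(40, 32, 32))      # stand-ins for photographs
features = spectral_pca_features(images, n_components=5)

# Doubling the image contrast quadruples the power spectrum (a linear
# operation would only double it), so image -> features is non-linear.
im = images[0]
quadratic = np.allclose(power_spectrum(2 * im), 4 * power_spectrum(im))
print(features.shape, quadratic)
```

Each image ends up summarized by only five numbers, which is the sense in which such a "summary statistic" approach needs no segmentation of the scene into objects.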





Author

David
