The principle of univariance refers to the fact that
Consider a sine wave, 𝑔_{sine}(𝑥) = sin[𝜔₀𝑥], as a function of 𝑥 with fundamental frequency 𝑓₀ (and thus 𝜔₀ = 2𝜋𝑓₀) and unit amplitude. Which of the following equations correctly expresses the Fourier series of a rectangular wave 𝑔_{sq}(𝑥) with the same fundamental frequency and the same unit amplitude?
Prior to the seminal work of [__________] and [__________] published in [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], researchers in pattern perception, often referred to as [__________] [no capital letters!], thought of the stimuli exclusively in the [space; Fourier] domain, in terms of [lines, corners and edges; spatial frequency content]. After the publication of "Application of Fourier Analysis to the Visibility of Gratings" in the Journal of Physiology, however, vision researchers up to this day always consider stimuli [also; only; never] in the [Fourier; space] domain. Additional experimental data [consistent; inconsistent] with the linear, independent multi-channel model came, e.g., from Blakemore and Campbell's [adaptation; masking; recognition; identification; detection] studies, or from the famous 1f, 3f and phase manipulation experiments by [__________] [surname only!] and [__________] [surname only!], published in [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], or the elegant experiment by [__________] [surname only!] and [__________] [surname only!] from [1996; 2010; 1971; 1975; 1968; 1987; 1982; 2003; 1979], showing that [a single; two; many] cycle(s) of a sine-wave grating could be [easier; equal] to detect than [many; a single; two] cycle(s) if the signal was [inhibited; masked; adapted away] by [narrow-band; wide-band; broad-band; white; pink] visual noise. Whilst there exists a large body of work [questioning; supporting; non-conclusive with respect to] the linear, independent multi-channel model, there are notable exceptions. One of the most prominent is a study by [__________] [surname only!] and colleagues from [1996; 1968; 1975; 1971; 1982; 1987; 2010; 2003; 1979], based on an [__________] phenomenon, the "missing fundamental".
Prior to the seminal work of Campbell and Robson published in 1968, researchers in pattern perception, often referred to as early spatial vision (or simply spatial vision), thought of the stimuli exclusively in the space domain, in terms of lines, corners and edges. After the publication of "Application of Fourier Analysis to the Visibility of Gratings" in the Journal of Physiology, however, vision researchers up to this day always consider stimuli also in the Fourier domain. Additional experimental data consistent with the linear, independent multi-channel model came, e.g., from Blakemore and Campbell's adaptation studies, or from the famous 1f, 3f and phase manipulation experiments by Graham and Nachmias, published in 1971, or the elegant experiment by Carter and Henning from 1971, showing that a single cycle of a sine-wave grating could be easier to detect than many cycles if the signal was masked by narrow-band visual noise. Whilst there exists a large body of work supporting the linear, independent multi-channel model, there are notable exceptions. One of the most prominent is a study by Henning and colleagues from 1975, based on an auditory phenomenon, the "missing fundamental".
Bruce Henning and colleagues published a series of experiments in the mid 1970s which were [inconsistent; consistent] with the [non-independent; independent; correlated; covarying] [multi-channel; single-channel] model of Campbell & Robson. Henning et al.'s experiments were inspired by the "missing fundamental" in [object recognition without the fundamental spatial frequency; auditory frequency discrimination; auditory pitch perception; motion perception; auditory sound localization], and they used both [___________] (AM) as well as [___________] (QFM) gratings as stimuli. Figure A shows the amplitude spectrum of [AM; QFM; both] gratings. Figure B shows the appearance and the cross-section through a [__] grating and its constituent 4f, 5f and 6f gratings; Figure C shows the respective graphs for a [___] grating. Henning et al. reported strong interactions (masking) between [a sine-wave with frequency 1f and a QFM; a sine-wave with frequency 1f and an AM] grating composed of 4f, 5f and 6f. According to Campbell and Robson, however, there should have been [an; no] interaction (masking) between the stimuli. Furthermore, there was clearly less masking with the [AM; QFM] grating as opposed to the [QFM; AM] grating, and this should [not have; have] happened, pointing to the importance of [phase relations; …] between the stimulus components, contrary to the findings of [Campbell; Robson; Nachmias; Graham] and [Campbell; Robson; Nachmias; Graham].
Bruce Henning and colleagues published a series of experiments in the mid 1970s which were inconsistent with the independent multi-channel model of Campbell & Robson. Henning et al.'s experiments were inspired by the "missing fundamental" in auditory pitch perception, and they used both amplitude modulated (AM) as well as quasi-frequency modulated (QFM) gratings as stimuli. Figure A shows the amplitude spectrum of both AM and QFM gratings. Figure B shows the appearance and the cross-section through a QFM grating and its constituent 4f, 5f and 6f gratings; Figure C shows the respective graphs for an AM grating. Henning et al. reported strong interactions (masking) between a sine-wave with frequency 1f and an AM grating composed of 4f, 5f and 6f. According to Campbell and Robson, however, there should have been no interaction (masking) between the stimuli. Furthermore, there was clearly less masking with a QFM grating as opposed to the AM grating, and this should not have happened, pointing to the importance of phase relations between the stimulus components, contrary to the findings of Graham and Nachmias.
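The difference between AM and QFM gratings can be illustrated with a short Python sketch. It assumes the common construction in which both gratings consist of components at 4f, 5f and 6f and differ only in that the 5f carrier is shifted by 90 degrees in the QFM case. The modulation depth m = 1, the sample count N and the function names are illustrative choices, not values taken from Henning et al.

```python
import math

N = 512  # samples across one period of the 1f modulation (illustrative)

def grating(carrier_phase, m=1.0, fc=5):
    """Three-component grating at (fc-1)f, fc*f and (fc+1)f.

    Only the phase of the carrier (fc*f) differs between AM and QFM.
    """
    out = []
    for n in range(N):
        x = n / N
        carrier = math.cos(2 * math.pi * fc * x + carrier_phase)
        sidebands = 0.5 * m * (math.cos(2 * math.pi * (fc - 1) * x)
                               + math.cos(2 * math.pi * (fc + 1) * x))
        out.append(carrier + sidebands)
    return out

def amplitude(signal, k):
    """Amplitude of harmonic k, from a direct discrete Fourier projection."""
    re = sum(s * math.cos(2 * math.pi * k * n / N) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * n / N) for n, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / N

am = grating(0.0)             # carrier in phase with the sidebands
qfm = grating(math.pi / 2.0)  # carrier shifted by 90 degrees

# Amplitudes at 1f ... 7f: identical for both gratings, with nothing at 1f.
am_spectrum = [amplitude(am, k) for k in range(1, 8)]
qfm_spectrum = [amplitude(qfm, k) for k in range(1, 8)]

# The waveforms nevertheless differ: the AM grating has a much deeper 1f envelope.
am_peak, qfm_peak = max(am), max(qfm)
```

Both gratings have the same amplitude spectrum and contain no energy at 1f, so a model with independent, phase-insensitive channels would predict that neither masks a 1f sine wave and that the two behave alike.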
[________] is a mathematical procedure by which a signal can be separated into component sine waves at different frequencies. Combining these sine waves will reproduce the original signal.
What is Fourier analysis and why is it useful for understanding perception? How is Fourier analysis related to the types of stimuli that psychophysicists use?
Fourier analysis is a mathematical procedure for decomposing a complex signal into its component sine waves. If the individual sine waves are re-combined, they will reproduce the original signal. Fourier analysis is used extensively by perception researchers because it provides a good description of stimuli and also because several perceptual systems perform Fourier analysis when processing stimuli (e.g., the visual and auditory systems). In terms of stimuli based on sine waves, psychophysicists tend to use pure tones in the auditory domain and sine wave gratings in the visual domain. Sine waves may vary in their wavelength (distance for one full cycle of oscillation of the wave), period (time for one full cycle of oscillation of the wave), phase (relative shift of the sine wave) and amplitude (height of the sine wave, i.e. contrast in vision and loudness in hearing).
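As a minimal sketch of the "re-combining" step (Fourier synthesis), the following Python code rebuilds a unit-amplitude square wave from its odd sine harmonics, using the standard series g_sq(x) = (4/π) · [sin(ω₀x) + sin(3ω₀x)/3 + sin(5ω₀x)/5 + ...]. The function name and the number of harmonics are illustrative choices.

```python
import math

def square_wave_partial_sum(x, f0=1.0, n_harmonics=50):
    """Partial Fourier sum of a unit-amplitude square wave at position x."""
    omega0 = 2.0 * math.pi * f0
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1  # only the odd harmonics 1f, 3f, 5f, ... contribute
        total += math.sin(k * omega0 * x) / k
    return 4.0 / math.pi * total

# With f0 = 1 the period is 1: the square wave is +1 at a quarter period
# and -1 at three quarters of a period.
approx_high = square_wave_partial_sum(0.25, n_harmonics=2000)
approx_low = square_wave_partial_sum(0.75, n_harmonics=2000)
```

Adding more harmonics brings the partial sum closer to the square wave away from the edges; with only a few harmonics the result still looks like a rippled sine wave.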
In [______] Campbell and Robson published their seminal paper entitled
Application of Convolution Analysis to the Visibility of Natural Stimuli
Application of Fourier Synthesis to the Visibility of Natural Stimuli
Application of Fourier Analysis to the Visibility of Natural Stimuli
Application of Fourier Synthesis to the Visibility of Gratings
Application of Fourier Analysis to the Visibility of Gratings
Application of Convolution Analysis to the Visibility of Gratings
In 1968 Campbell and Robson published their seminal paper entitled "Application of Fourier Analysis to the Visibility of Gratings".
The graph shows one of the two central figures from Campbell & Robson (Fig. 3, p. 556). Plotted as open symbols are the [contrast thresholds; contrast sensitivities (1 over detection threshold); luminance thresholds; luminance sensitivities (1 over detection threshold)] on the y-axis against [line width in deg; spatial frequency in wavelength lambda; line width in cpd; spatial frequency in cpd; luminance modulation rate]. The open squares show the data for the [square-wave grating; sine-wave grating; rectangular-wave grating; sawtooth grating], the open circles for the [square-wave grating; sine-wave grating; rectangular-wave grating; sawtooth grating]. The filled black circles show the ratio of the [square-to-sine; square-to-rectangular; square-to-sawtooth; sine-to-rectangular; sine-to-sawtooth] [thresholds; sensitivities]. The solid black line marks the prediction at [4/pi; pi/4; 4/3; 3/4] derived from the Fourier series of the stimuli. The dashed line marks the prediction of a [simple corner detector; simple edge detector; multi-channel; simple peak-detector] model of early spatial vision.
The graph shows one of the two central figures from Campbell & Robson (Fig. 3, p. 556). Plotted as open symbols are the contrast sensitivities (1 over detection threshold) on the y-axis against spatial frequency in cpd. The open squares show the data for the square-wave grating, the open circles for the sine-wave grating. The filled black circles show the ratio of the square-to-sine sensitivities. The solid black line marks the prediction at 4/pi derived from the Fourier series of the stimuli. The dashed line marks the prediction of a simple peak-detector model of early spatial vision.
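One way to see where the 4/pi prediction comes from: the fundamental sine component of a unit-amplitude square wave has amplitude 4/pi, whereas a unit-amplitude sine wave has a fundamental of amplitude 1. If detection depends only on the fundamental, the sensitivity ratio should equal the ratio of these amplitudes. The sketch below estimates both amplitudes numerically; the function names and the sample count are illustrative.

```python
import math

def fundamental_amplitude(wave, n_samples=100_000):
    """Estimate the sine Fourier coefficient b1 of a period-1 waveform (midpoint rule)."""
    total = 0.0
    for i in range(n_samples):
        x = (i + 0.5) / n_samples
        total += wave(x) * math.sin(2.0 * math.pi * x)
    return 2.0 * total / n_samples

def square(x):
    """Unit-amplitude square wave with period 1."""
    return 1.0 if (x % 1.0) < 0.5 else -1.0

def sine(x):
    """Unit-amplitude sine wave with period 1."""
    return math.sin(2.0 * math.pi * x)

# Ratio of the fundamental amplitudes, square wave over sine wave.
ratio = fundamental_amplitude(square) / fundamental_amplitude(sine)
```

The resulting ratio is numerically very close to 4/pi (about 1.27).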
Weber’s Law
Weber proposed that the smallest change in a stimulus that can be detected is a(n) _____ proportion of the stimulus level.
The method of ______ requires the observer to alter the strength of a stimulus until it matches some criterion.
Stimuli and private experience
_____ is the ability to detect a stimulus and, perhaps, to turn that detection into a private experience.
Psychophysical methods (no. 4)
If you are asked to taste a lemon and then adjust a light until it is as bright as the lemon is sour, you have been asked to engage in
Relation between stimulus and sensation
(Fechner’s law / Stevens’ power law / Weber’s law / Fourier analysis / signal detection theory) describes the relationship between a stimulus and its resulting sensation by proposing that the JND is a constant fraction of the stimulus intensity.
Weber’s law
Stevens’ power law
Stevens’ power law describes the relationship between a
Binary decisions (no. 1)
If a stimulus is present and the observer reports it as present, this is called a
Meaning and sensation
(Perception/Distortion/Observation/Adaptation) is the act of giving meaning to a detected sensation.
Perception
Founder of psychophysics
(Wundt/Berkeley/Fechner/Weber/Plato) is the founder of psychophysics.
Fechner
Matter and consciousness
The idea that all matter has consciousness is known as
Binary decisions (no. 2)
If a stimulus is present and the observer reports it as absent, this is called a
Internal threshold in SDT
In signal detection theory the (shift/criterion/method of limits/sensitivity/method of adjustment) is an internal threshold that is set by the observer.
In signal detection theory, the criterion is an internal threshold that is set by the observer.
SDT visualization
The curves in the figure below are known as
_____ describes the relationship between a stimulus and its resulting sensation by proposing that the JND is a constant fraction of the stimulus intensity.
Weber’s law describes the relationship between a stimulus and its resulting sensation by proposing that the JND is a constant fraction of the stimulus intensity.
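Written as a formula, Weber's law says that the JND equals k times the stimulus intensity, where k is the Weber fraction. A minimal Python sketch follows; the Weber fraction of 0.02 is a hypothetical value chosen only for illustration.

```python
def jnd(intensity, weber_fraction):
    """Just noticeable difference predicted by Weber's law: delta_I = k * I."""
    return weber_fraction * intensity

k = 0.02  # hypothetical Weber fraction, for illustration only

jnd_small = jnd(100.0, k)   # JND for a weak reference stimulus
jnd_large = jnd(1000.0, k)  # JND for a reference ten times as intense

# The JND grows with intensity, but the ratio JND / intensity stays constant.
fraction_small = jnd_small / 100.0
fraction_large = jnd_large / 1000.0
```

A tenfold increase in the reference intensity leads to a tenfold larger JND, which is the sense in which the JND is a constant fraction of the stimulus level.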
Difference between sensitivity and criterion
What is the difference between sensitivity and criterion in signal detection theory?
Sensitivity describes how far apart the peaks of the two internal-response distributions (stimulus absent and stimulus present) lie along the perceptual axis. If the peaks are far apart (high sensitivity), the observer makes fewer mistakes (false alarms and misses). The criterion is the point on the perceptual axis at which the observer switches from answering "no" to answering "yes". It can be the point where both distributions are equally likely, or any other point, and unlike sensitivity it is set by the observer.
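Under the standard equal-variance Gaussian model of signal detection theory, both quantities can be estimated from the hit rate H and the false-alarm rate FA: d' = z(H) - z(FA) and c = -(z(H) + z(FA)) / 2, where z is the inverse of the standard normal cumulative distribution. The sketch below uses Python's standard library; the hit and false-alarm rates are hypothetical.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian SDT: returns (d', c) for given hit and false-alarm rates."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF (z-transform)
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    sensitivity = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return sensitivity, criterion

# Hypothetical observers with similar sensitivity but different criteria.
d_neutral, c_neutral = dprime_and_criterion(0.84, 0.16)  # unbiased observer
d_liberal, c_liberal = dprime_and_criterion(0.95, 0.40)  # says "yes" more readily
```

The liberal observer has a negative criterion value (more hits, but also more false alarms) while d' is roughly the same, which is the sense in which criterion and sensitivity are separate properties.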
SDT and airport security
Airport security is very tight. If a traveler even jokes about a bomb, they are detained and questioned to ensure that no real terrorist threat succeeds. In terms of signal detection theory, airport security would rather have a
Binary decisions (no. 3)
If a stimulus is absent but the observer reports it as present, this is called a
Binary decisions (no. 4)
If a stimulus is absent and the observer reports it as absent, this is called a
Sensation and perception
What is the difference between sensation and perception?
Sensation is concerned with how our senses transduce energy from the world (light, sound, mechanical pressure) into neural energy. Perception is the interpretation of sensations and the assignment of meaning to them. For example, this paragraph is an array of black and white spots of light on the backs of our retinas that excites our rods and cones (sensation), but we perceive it as text that has meaning (perception).
Materialism
Materialism is the notion that
What is the difference between sensation and perception? (Essay)
SDT and objective difficulty
In signal detection theory, the (shift/criterion/method of limits/sensitivity/method of adjustment) is a value that defines the objective difficulty with which an ideal observer could tell the difference between the presence and absence of a stimulus or the difference between two stimuli.
In signal detection theory, the sensitivity is a value that defines the objective difficulty with which an ideal observer could tell the difference between the presence and absence of a stimulus or the difference between two stimuli.
Dualism
Dualism is the idea that
Thorpe, Fize & Marlot published a study in Nature which exerted a very strong influence on the object recognition community. In their paper they showed that [human observers; monkeys; cats] could decide whether a previously unseen [photograph; line drawing; painting] of a natural scene contained an animal or not. The median reaction time (RT) of the observers was around [400-500; 100-200; 200-300; 500-600; 600-700] ms with a mean percentage correct of [85-90; 90-95; 95-100; 80-85] % (note that the observers showed [a slight; no; a strong] speed-accuracy trade-off). Subsequent [ERP; fMRI; PET; multi-unit; single-unit] analyses showed that roughly [150; 100; 200; 250] ms after stimulus onset the measured neurophysiological correlate could already reliably signal the presence or absence of an animal in a post-hoc analysis. Thus processing of the natural scene stimulus was already completed after such a comparatively short time. According to the authors this result provides strong evidence in favour of essentially [feedforward; dynamic feedback; multi-level feedforward & feedback; feedback; deep-belief neural network] theories of visual object recognition. This, in turn, argues against object recognition theories requiring an explicit [image segmentation; 2D-to-3D; Fourier transform; Wavelet transform; multi-scale image decomposition] step prior to recognition, as such a step is presumed to require [time consuming; fast; computationally complex] [iterative; feedback; feedforward; non-linear processing; linear decomposition; fast Fourier] algorithms.
Thorpe, Fize & Marlot published a study in Nature which exerted a very strong influence on the object recognition community. In their paper they showed that human observers could decide whether a previously unseen photograph of a natural scene contained an animal or not. The median reaction time (RT) of the observers was around 400-500 ms with a mean percentage correct of 90-95 % (note that the observers showed a slight speed-accuracy trade-off). Subsequent ERP analyses showed that roughly 150 ms after stimulus onset the measured neurophysiological correlate could already reliably signal the presence or absence of an animal in a post-hoc analysis. Thus processing of the natural scene stimulus was already completed after such a comparatively short time. According to the authors this result provides strong evidence in favour of essentially feedforward theories of visual object recognition. This, in turn, argues against object recognition theories requiring an explicit image segmentation step prior to recognition, as such a step is presumed to require time consuming iterative or feedback algorithms (both answers appear to be accepted).
The average and distribution of properties, like orientation or color, over a set of objects or a region in a scene are called the [ensemble statistics; guiding features; none of the two answers is true] of the scene, and are computed by the [selective; none of the two answers is true; nonselective] pathway.
The average and distribution of properties, like orientation or color, over a set of objects or a region in a scene are called the ensemble statistics of the scene, and are computed by the nonselective pathway.
Object recognition overview
Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as [3D-based; Neuroscience-based; object recognition; view based; picture based; 2D-based; global statistic; Gestalt; summary statistic; structural description; early vision based] models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called ______, published in [2010; 2003; 1968; 1996; 1982; 1987; 1971; 1975; 1979] by _____ [surname only!]. An opposing theory – usually referred to as [Gestalt; picture based; global statistic; structural description; 3D-based; summary statistic; view based; Neuroscience-based; object recognition; early vision based; 2D-based] models – instead believes the human visual system recognizes objects by storing and matching "2D-images" or "snap-shots" of objects and inter- and extrapolates between them if required to recognize an object from a novel viewpoint. One of the most well-known proponents of this theory is _________ [surname only!]. [Luckily; Nicely; Convincingly; Unfortunately] both theories are [not; well; intimately; necessarily] connected to the findings, theories and models of early spatial vision.
Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the [view-based; structural description; Gestalt; summary statistic; global statistic; early vision based] approach to object recognition.
Many researchers in vision science believe that object recognition is one of the most important functions of the human visual system. Thus it is perhaps not surprising that there exists a large body of research on object recognition. One dominant approach – usually referred to as structural description models – postulates a "visual alphabet" made from 3D geometric primitives. The most prominent theory of this kind is called recognition-by-components, published in 1987 by Biederman. An opposing theory – usually referred to as view-based models – instead believes the human visual system recognizes objects by storing and matching "2D-images" or "snap-shots" of objects and inter- and extrapolates between them if required to recognize an object from a novel viewpoint. One of the most well-known proponents of this theory is Tarr. Unfortunately both theories are not connected to the findings, theories and models of early spatial vision.
Very recently DiCarlo, Cox and colleagues have argued for a computational neural network approach to explain object recognition. This can be seen as a neuroscience-inspired computational instantiation of the view-based approach to object recognition.
Problems of structural description theories
Some researchers believe that a major problem with structural description theories of object recognition is that
Yamins et al.’s HMO-Model (PNAS 2014)
Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for ______. The HMO model belongs in the larger class of [DDN; NDD; NND; DNN; DND] models, standing for _______ model. The HMO model's essential architectural characteristic is its [computational complexity; heterogeneity; harmony; homogeneity]: There are, for example, many [bypass; feedback] connections and different parameter settings [only at different levels of the hierarchy; even at the same level of the hierarchy]. [However; In addition; Furthermore], the basic operations performed locally are [the same; different; heterogeneous; more or less the same] throughout the network. In the Yamins et al. (2014) article they report a large-scale modelling effort, evaluating around [5,000; 100; 500; 1,000; 10,000; 100,000] [DNN parameterizations; DNN architectures; HMO model parameterizations; HMO architectures]. Yamins et al. compared their models both to the response of cells in IT cortex (roughly N = [1,000; 300; 3,000; 100; 30] cells) as well as on how well the models categorized a set of images (roughly N = [600; 10,000; 3,000; 6,000; 1,000] images). One central finding was that models optimized for [explained variance in IT; categorization performance; IT-cell predictivity; discrimination performance] were also superior at [explaining variance in IT; categorization performance; IT-cell predictivity; discrimination performance].
In order to obtain a categorization performance from the HMO model, a [linear; non-linear; partially linear] decoder was [derived from; trained on; assumed to exist; taken from] the activity of units at the [highest; intermediate; lowest; across] level(s) of the HMO network. Using such a procedure, the HMO model's performance was [better; only slightly worse; on par] than that of [both computer vision and neuronally inspired; computer vision; neuronally inspired] models of object recognition on the difficult [low; high] variation task.
Yamins and colleagues from the DiCarlo lab at MIT published an article in 2014 in which they presented their HMO model, standing for hierarchical modular optimization. The HMO model belongs in the larger class of DNN models, standing for deep neural network model. The HMO model's essential architectural characteristic is its heterogeneity: There are, for example, many bypass connections and different parameter settings even at the same level of the hierarchy. However, the basic operations performed locally are the same throughout the network. In the Yamins et al. (2014) article they report a large-scale modelling effort, evaluating around 5,000 DNN architectures. Yamins et al. compared their models both to the response of cells in IT cortex (roughly N = 300 or 100 cells) as well as on how well the models categorized a set of images (roughly N = 6,000 images). One central finding was that models optimized for categorization performance were also superior at explaining variance in IT.
In order to obtain a categorization performance from the HMO model, a linear decoder was trained on the activity of units at the highest level(s) of the HMO network. Using such a procedure, the HMO model's performance was better than that of both computer vision and neuronally inspired models of object recognition on the difficult high variation task.
Viewpoint invariance
Viewpoint invariance refers to the idea that
Time required for object recognition
Tarr and his colleagues found that the amount of time needed to recognize novel objects is at least partially determined by
Specificity of IT cells
A study of cells in IT cortex of a human patient showed that they responded to very specific stimuli, such as
Superordinate level category terms
Which of the following is a superordinate level category term?
Fundamentals of RBC theory
What are object representations made of, according to the recognition-by-components (RBC) model of object recognition?
Give examples:
Superordinate level:
Basic level:
Subordinate level:
Superordinate level: Animal
Basic level: Dog
Subordinate level: Golden Retriever
What kinds of processes happen in middle vision?
Middle vision refers to a set of processes that combine features detected in early vision (such as edges and contours) into objects. Middle vision utilizes rules and principles for combining elements into perceptual groups, many of which were discovered by psychologists from the Gestalt tradition. Some important steps in middle vision include finding edges of objects, dealing with occlusion, texture segmentation and grouping, and determining figure/ground assignments.
What is unique about face perception and how is it different than object perception?
Faces are different than other objects because all faces have the same parts in the same relationships with one another (e.g., eyes above nose, which is above the mouth). Therefore, fine metric details of faces are important in recognition, and it seems the visual system represents faces holistically in terms of these fine metric details, whereas it does not in the case of objects. Further evidence that the visual system treats faces and objects differently is the double dissociation between face and object recognition regions of the brain. Some patients with brain damage develop object agnosia and cannot recognize objects but can still recognize faces. Other patients develop prosopagnosia and thus cannot recognize faces but can recognize other objects. Finally, inverted faces are much harder for us to recognize than inverted objects, suggesting that faces are processed differently than objects.
Gestalt Psychology
Gestalt psychologists emphasize that
Perceptual organization no. 4
Perceptual organization no. 9
Perceptual organization no. 7
Refer to the figure; which portion of the figure is interpreted as “ground” according to the Gestalt figure-ground assignment principles? (Neither the red nor the yellow portions / The yellow portion / Both the red and yellow portions / There is no “ground” portion in the figure / The red portion)
Again refer to the figure; which Gestalt figure-ground assignment principle is most responsible for this interpretation of “ground”? (Symmetry / Size / Parallelism / Surroundedness / Proximity)
Refer to the figure; which portion of the figure is interpreted as “ground” according to the Gestalt figure-ground assignment principles? The yellow portion
Again refer to the figure; which Gestalt figure-ground assignment principle is most responsible for this interpretation of “ground”? Surroundedness
Perceptual organization no. 1
Perceptual organization no. 2
Perceptual organization no. 5
Which of the following is a viewing position that produces some regularity in the visual image that is not present in the world?
Perceptual organization no. 6
The word “figure” in the term “figure-ground assignment” refers to
What are the “What” and “Where” pathways?
The “What” pathway, also known as the ventral pathway, extends from the occipital lobe to the temporal lobe of the brain and is primarily concerned with object identity. The “Where” pathway, also known as the dorsal pathway, extends from the occipital lobe to the parietal lobe and is primarily concerned with the locations of objects in space and the actions required to interact with them.
Compare and contrast the structural description and view-based approaches to understanding object recognition
Structural description theories of object recognition, such as Biederman’s recognition-by-components (RBC) model, suggest that when an object is perceived, it is represented as a series of volumetric parts (e.g., geons in RBC) and the categorical relations between the parts (e.g., above, below, beside). Once an object is represented as volumetric parts and spatial relations, the process of object recognition itself is rather straightforward and invariant with viewpoint. View-based models of object recognition, on the other hand, propose that objects are represented as a collection of remembered views of the object, where views are stored as templates. Accordingly, initial representation of the object is easy, but matching the perceived view to representations in memory is difficult. Structural description models propose that object recognition is viewpoint invariant, whereas view-based theories propose that object recognition should be slower for objects seen from novel viewpoints. There is much debate in the literature, but it seems that observers do not show complete viewpoint invariance in object recognition.
Anatomy of object recognition
Evidence indicates that structures in (striate/parietal/occipital/frontal/inferotemporal) cortex are especially important in end-stage object recognition processes.
inferotemporal
Prosopagnosia
Prosopagnosia is a neuropsychological disorder in which the patient
Central empirical findings about human object recognition
Typically … (choose all that are true)
Problems of view-based theories
A major problem with naïve template theories of object recognition is that
Failure of object recognition
(Prosopagnosia/Agnosia/Anomia/Alexia/Dyslexia) is a failure to recognize objects in spite of the ability to see them.
Agnosia
Entry-level categorization terms
Which of the following is an entry-level category term?
Stages of visual processing
Which of the following is a loosely defined stage of visual processing that comes after basic features have been extracted from the image, and before object recognition and scene understanding?
Properties of generalized-cone components in RBC theory (2)
The essential non-accidental properties of the generalized-cone components are:
(more than one can be true)
Fundamentals of view-based theories
What are object representations made of, according to view-based theories of object recognition?
Subordinate level category terms
Which of the following is a subordinate level category term?
What are ensemble statistics?
(50 characters)
Ensemble statistics are rapidly extracted representations of visual scenes that include the average and distribution of properties like orientation or color over a set of objects or a region of space. Ensemble statistics represent knowledge about the properties of a group of objects rather than individual objects themselves.
[Physical organization; Spatial layout; Physical setting; Spatial organization; Setting] describes the structure of a scene without reference to the identity of specific objects in the scene.
Spatial layout describes the structure of a scene without reference to the identity of specific objects in the scene.
What are the two pathways to scene perception?
(max 380 characters)
There is both a selective and nonselective pathway for scene perception. The selective pathway involves the allocation of attention to one or a few objects at a time and is governed by the attentional bottleneck. Thus, there is selective processing of objects in the selective pathway, meaning that it is responsible for visual search, binding, and the existence of phenomena such as the attentional blink, change blindness, and inattentional blindness. The nonselective pathway, on the other hand, processes visual scenes holistically, encoding scene gist, spatial layout, and ensemble statistics very quickly. The representations in the nonselective pathway are generated as a whole and do not include descriptions of individual objects within the scene. The nonselective pathway has connections with the selective pathway and can, for instance, guide visual search for particular objects in a scene by helping the observer restrict attention to particular locations in the scene.
Wichmann, Drewes, Rosas and Gegenfurtner published a paper [casting doubt on; confirming] the conclusions made by Torralba & Oliva. The two main conclusions of the study – based on [computational analysis; psychophysical experiments; neuro-imaging techniques] – were, first, that for human observers animal detection in typical photographs of natural scenes [is independent of the power spectrum; relies on many PCA components of the power spectrum; depends on the power spectrum as claimed by Torralba & Oliva; is independent of the phase spectrum]. Second, they may indicate that in typical, commercial databases the statistics of the images may [be as; not be as; even be more] natural as/than often presumed, because photographs typically represent a [true; random sample; Gaussian sample; biased; unbiased] view of the world.
Wichmann, Drewes, Rosas and Gegenfurtner published a paper casting doubt on the conclusions made by Torralba & Oliva. The two main conclusions of the study – based on psychophysical experiments – were, first, that for human observers animal detection in typical photographs of natural scenes is independent of the power spectrum. Second, they may indicate that in typical, commercial databases the statistics of the images may not be as natural as often presumed, because photographs typically represent a biased view of the world.
Torralba and Oliva published a paper entitled "Statistics of natural image categories". They report that, contrary to what was believed before, the [power spectrum; pixel intensity histogram] of (images of) natural scenes is only [non-isotropic; isotropic] if averaged across image categories, but if analysed separately for different image categories, they found strong correlations between [the shape; the total power; phase; the complex conjugate] of the power spectrum and image categories. Typically, a density plot of the power spectrum of an image of a man-made scene is more [egg-shaped; circular; triangular-shaped; star-shaped]. Based on [two; a large number of; a small number of; a single] component(s) of a principal component analysis (PCA) performed on the power spectrum, Torralba & Oliva were able to correctly categorise images into animal and non-animal scenes in [_______] % of the cases. Calculating the PCA of the power spectrum is a [non-linear; linear] operation. [However; Still] this operation could be performed [only with great difficulty; already in the retina; in a feedforward manner; only using feedback] in the human brain given what is currently known about physiology. Thus Torralba & Oliva concluded that animal versus non-animal categorization is so rapid because their [structural description; model; summary statistic; image segmentation; view-based] approach does not require an explicit [image segmentation; Fourier transformation; image alignment] step.
Torralba and Oliva published a paper entitled "Statistics of natural image categories". They report that, contrary to what was believed before, the power spectrum of (images of) natural scenes is only isotropic if averaged across image categories, but if analysed separately for different image categories, they found strong correlations between the shape of the power spectrum and image categories. Typically, a density plot of the power spectrum of an image of a man-made scene is more star-shaped. Based on a small number of components of a principal component analysis (PCA) performed on the power spectrum, Torralba & Oliva were able to correctly categorise images into animal and non-animal scenes in 80 % of the cases. Calculating the PCA of the power spectrum is a non-linear operation. Still, this operation could be performed in a feedforward manner in the human brain given what is currently known about physiology. Thus Torralba & Oliva concluded that animal versus non-animal categorization is so rapid because their summary statistic approach does not require an explicit image segmentation step.
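One way to see why a computation built on the power spectrum is non-linear: the power spectrum of a sum of two signals is generally not the sum of their power spectra. The sketch below checks this on a 1-D toy signal with a hand-written DFT. It is an illustration only, with invented inputs, and is not Torralba & Oliva's actual pipeline.

```python
import cmath

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform of a 1-D signal."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

# Two toy signals (invented for the example) and their sum.
a = [1.0, 0.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0]
summed = [x + y for x, y in zip(a, b)]

ps_of_sum = power_spectrum(summed)
sum_of_ps = [p + q for p, q in zip(power_spectrum(a), power_spectrum(b))]

# A linear operation would make these two lists equal; here they differ.
print(ps_of_sum)
print(sum_of_ps)
```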
Measuring optical quality
In 1966, Campbell and Gubisch estimated the (pointspread/linespread) of the human eye using the ___ method. They found that for (large/medium/small) pupil sizes, the retinal image quality was mainly limited by ____.
In 1966, Campbell and Gubisch estimated the linespread of the human eye using the double pass method. They found that for small pupil sizes, the retinal image quality was mainly limited by diffraction.
Diffraction is the bending and spreading of light waves when they pass through a small aperture.
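To get a feel for the size of the effect, the sketch below applies the standard formulas for a circular aperture: the Airy disk's first dark ring lies at an angle of about 1.22 λ/D, and the incoherent cutoff frequency is D/λ cycles per radian. The 555 nm wavelength and the pupil diameters are assumptions chosen for illustration, and the calculation ignores the eye's aberrations entirely.

```python
import math

def airy_radius_arcmin(wavelength_m, pupil_diameter_m):
    """Angular radius of the first dark ring of the Airy pattern, in arcminutes."""
    theta_rad = 1.22 * wavelength_m / pupil_diameter_m
    return math.degrees(theta_rad) * 60.0

def cutoff_cycles_per_degree(wavelength_m, pupil_diameter_m):
    """Highest spatial frequency passed by a diffraction-limited circular aperture."""
    cycles_per_radian = pupil_diameter_m / wavelength_m
    return cycles_per_radian * math.pi / 180.0

WAVELENGTH_M = 555e-9  # assumed mid-spectrum wavelength

for pupil_mm in (2.0, 6.0):  # assumed small and large pupil diameters
    d = pupil_mm * 1e-3
    print(pupil_mm, airy_radius_arcmin(WAVELENGTH_M, d), cutoff_cycles_per_degree(WAVELENGTH_M, d))
```

The smaller pupil gives the larger blur radius, which is the sense in which diffraction dominates at small pupil sizes.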
Nearest focal point
To bring near objects into focus, the ciliary muscles (contract/relax/strengthen), making the lens of the eye (less thick/more round/more thick), in a process known as ____. This capacity decreases with age such that by (40-50/50-60/60-70) years, accommodation amplitude is less than 2.5 diopters (compared to around __ diopters for 20-year-olds).
To bring near objects into focus, the ciliary muscles contract, making the lens of the eye more thick, in a process known as accommodation. This capacity decreases with age such that by 50-60 years, accommodation amplitude is less than 2.5 diopters (compared to around 10 diopters for 20-year-olds).
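The diopter values translate into distances as follows: a diopter is an inverse metre, so for an eye whose relaxed focus is at infinity (an assumption of this sketch), the nearest point in focus is roughly the reciprocal of the accommodation amplitude. `near_point_m` is a hypothetical helper name.

```python
def near_point_m(accommodation_amplitude_diopters):
    """Nearest distance (metres) in focus, assuming the relaxed eye is focused at infinity."""
    return 1.0 / accommodation_amplitude_diopters

for amplitude in (10.0, 2.5):
    print(f"{amplitude:4.1f} D -> near point {near_point_m(amplitude) * 100:.0f} cm")
```

Under this simplification, 10 diopters corresponds to a near point of about 10 cm and 2.5 diopters to about 40 cm.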
Crowding is a bottleneck of [object recognition; object detection; text recognition]. Crowding zones follow "[Bouma's; Treisman's; Rosenholtz's] Law", which states that flanking objects will [cause; reduce] crowding when they fall within approximately half the retinal eccentricity of the target.
Crowding is a bottleneck of object recognition. Crowding zones follow "Bouma's Law", which states that flanking objects will cause crowding when they fall within approximately half the retinal eccentricity of the target.
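The rule can be written as a simple check. This is a minimal sketch that takes "approximately half" literally as a factor of 0.5; the function names are hypothetical, and the real critical spacing varies with stimulus and direction.

```python
BOUMA_FACTOR = 0.5  # "approximately half" the eccentricity; treated as exact here

def critical_spacing_deg(eccentricity_deg, bouma_factor=BOUMA_FACTOR):
    """Target-flanker spacing below which crowding is expected."""
    return bouma_factor * eccentricity_deg

def is_crowded(eccentricity_deg, flanker_spacing_deg, bouma_factor=BOUMA_FACTOR):
    """True if a flanker at this spacing falls inside the crowding zone."""
    return flanker_spacing_deg < critical_spacing_deg(eccentricity_deg, bouma_factor)

# A target at 10 deg eccentricity has a critical spacing of 5 deg.
print(is_crowded(10.0, 3.0))  # flanker 3 deg away: inside the zone
print(is_crowded(10.0, 6.0))  # flanker 6 deg away: outside the zone
```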
According to pioneering American psychologist William James, who knows what "attention" is?
According to Feature Integration Theory ([Broadbent; Rosenholtz; Wolfe; Treisman] and [Gelade; Treisman; Rosenholtz; Posner], 1980), a limited set of basic visual features such as colour, orientation and [good continuation; gist; size; ensemble] can be processed [spatially; serially; temporally; preattentively]. Search slopes for stimuli defined by these features alone should therefore be approximately [_______] ms/item, and slopes for [slow; incorrect; correct; fast] target present and target absent trials should be [parallel; serial].
According to Feature Integration Theory (Treisman and Gelade, 1980), a limited set of basic visual features such as colour, orientation and size can be processed preattentively. Search slopes for stimuli defined by these features alone should therefore be approximately 0 ms/item, and slopes for correct target present and target absent trials should be parallel.
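A search slope is the slope of the line relating mean reaction time to the number of items in the display. The sketch below estimates it with an ordinary least-squares fit. All reaction times are invented for illustration, and `search_slope` is a hypothetical helper.

```python
def search_slope(set_sizes, mean_rts_ms):
    """Least-squares slope (ms/item) of mean reaction time against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx

set_sizes = [4, 8, 16, 32]

# Invented feature-search data: both trial types are essentially flat.
target_present_rts = [450, 452, 449, 451]
target_absent_rts = [480, 479, 482, 481]

present_slope = search_slope(set_sizes, target_present_rts)
absent_slope = search_slope(set_sizes, target_absent_rts)

# Sanity check on a purely synthetic line with a known slope of 25 ms/item.
synthetic_slope = search_slope(set_sizes, [400 + 25 * n for n in set_sizes])
print(present_slope, absent_slope, synthetic_slope)
```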
A circle flashed at the spatial location in the visual periphery that a target stimulus might appear is an example of a(n)
Relationship between pointspread and linespread
The pointspread and linespread functions describe the spread (blurring) induced by an optical system on a point and a line respectively. Which statements are true (multiple answers possible):
Optical path through the eye
Before reaching the photoreceptors, light passes through the (aqueous humor/cornea/lens/retina/vitreous humor/pupil),
then the (aqueous humor/cornea/lens/retina/vitreous humor/pupil),
then the (aqueous humor/cornea/lens/retina/vitreous humor/pupil),
followed by the (aqueous humor/cornea/lens/retina/vitreous humor/pupil),
then the (aqueous humor/cornea/lens/retina/vitreous humor/pupil).
Before reaching the photoreceptors, light passes through the cornea,
then the aqueous humor,
then the pupil,
then the lens,
followed by the vitreous humor,
then the retina.
(The picture does not show the aqueous humor.)
Blind spot
The "blind spot" is a region of the retina containing (only rods/only cones/no photoreceptors). It lies on the (temporal/nasal/central) side of the retina. To find the blind spot in your right eye, focus on a point in front of you, close your left eye, then move your finger (leftwards/rightwards) from the focal point.
The "blind spot" is a region of the retina containing no photoreceptors. It lies on the nasal side of the retina. To find the blind spot in your right eye, focus on a point in front of you, close your left eye, then move your finger rightwards from the focal point.
ML approaches to saliency make [minimal; maximal] [posterior; prior] assumptions regarding the computational architecture of the visual system. Typically a [rather generic; carefully tuned; highly specific] ML algorithm is trained on a [small; large] dataset and prediction of the fixation locations is [optimised; randomised; treated as nuisance variable]. The currently best performing visual saliency models are based on [deep neural networks; spatial point processes; support vector machines; Gaussian processes; Bayesian analysis].
ML approaches to saliency make minimal prior assumptions regarding the computational architecture of the visual system. Typically a rather generic ML algorithm is trained on a large dataset and prediction of the fixation locations is optimised. The currently best performing visual saliency models are based on deep neural networks.
In the beginning, from the mid-1990s until DNNs became prominent as models of visual saliency, saliency was thought to be a [bottom-up; top-down] [stimulus-driven; salient; predictive] signal. The main idea of early saliency models is that signals [select; compete for; align visual; synchronise the] representation(s).
In the beginning, from the mid-1990s until DNNs became prominent as models of visual saliency, saliency was thought to be a bottom-up, stimulus-driven signal. The main idea of early saliency models is that signals compete for representation(s).
Splitting attention between two or more stimuli is called
What is visual saliency?
Retinal cell types 1
Cells that send signals from the retina to the rest of the brain are called
Rods and Cones
Cones and rods are photoreceptors. Cones are mainly responsive in the (night/day/dawn), meaning they have (low/high/medium) sensitivity to absolute light levels. They are concentrated in the (retina/lens/fovea), and there are estimated to be about __ million cones.
Rods are not found in the __ [one word]. They express the photopigment ____ and are mainly sensitive to (short/medium/long) wavelength light. There are estimated to be about __ million rods.
Cones and rods are photoreceptors. Cones are mainly responsive in the day, meaning they have low sensitivity to absolute light levels. They are concentrated in the fovea, and there are estimated to be about 5 million cones.
Rods are not found in the fovea. They express the photopigment rhodopsin and are mainly sensitive to short wavelength light. There are estimated to be about 100 million rods.
Astigmatism
An optical system that demonstrates astigmatism is one whose point spread function is (antisymmetric/asymmetric/circular). This means that the system's linespread function will depend on the (line orientation/line length/input size).
An optical system that demonstrates astigmatism is one whose point spread function is asymmetric. This means that the system's linespread function will depend on the line orientation.
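The dependence on orientation can be made concrete with a toy model. The sketch below assumes an elliptical Gaussian pointspread aligned with the x and y axes (an assumption for illustration, not a description of any real eye). A line is blurred only across its length, so the linespread width is the pointspread width measured perpendicular to the line.

```python
import math

def linespread_sigma(sigma_x, sigma_y, line_angle_deg):
    """Standard deviation of the linespread for a line at the given orientation.

    Assumes an elliptical Gaussian pointspread with widths sigma_x and sigma_y.
    An angle of 0 degrees means the line runs along the x axis.
    """
    theta = math.radians(line_angle_deg)
    variance = (sigma_x * math.sin(theta)) ** 2 + (sigma_y * math.cos(theta)) ** 2
    return math.sqrt(variance)

# Asymmetric pointspread: blur differs between horizontal and vertical lines.
print(linespread_sigma(1.0, 3.0, 0.0), linespread_sigma(1.0, 3.0, 90.0))

# Circularly symmetric pointspread: the same blur at every orientation.
print(linespread_sigma(2.0, 2.0, 0.0), linespread_sigma(2.0, 2.0, 37.0))
```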
Eigenfunction of a shift-invariant linear system
Consider the shift-invariant linear system 𝑓 with time-varying input 𝑥(𝑡), and time-varying response 𝑦(𝑡). Thus:
𝑦(𝑡)=𝑓[𝑥(𝑡)]
Sinewaves are the eigenfunctions of a shift-invariant linear system. Which of the following mathematical statements express this fact correctly?
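The eigenfunction property can be checked numerically: passing a sinusoid through a shift-invariant linear system returns a sinusoid of the same frequency, changed only in amplitude and phase. The sketch below uses circular convolution with an arbitrary three-tap impulse response; the kernel, window length and frequency are all assumptions made for the demonstration.

```python
import cmath
import math

N = 64                     # number of samples in the window
CYCLES = 3                 # sinusoid frequency, in cycles per window
KERNEL = [0.5, 0.3, 0.2]   # arbitrary impulse response (assumed for this demo)

def circular_convolve(x, h):
    """Response of a shift-invariant linear system with impulse response h to periodic input x."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

omega = 2 * math.pi * CYCLES / N
x = [math.sin(omega * i) for i in range(N)]
y = circular_convolve(x, KERNEL)

# Frequency response of the system at the input frequency.
H = sum(h_m * cmath.exp(-1j * omega * m) for m, h_m in enumerate(KERNEL))
gain, phase = abs(H), cmath.phase(H)

# Prediction: the same sinusoid, scaled by the gain and shifted by the phase.
predicted = [gain * math.sin(omega * i + phase) for i in range(N)]
max_error = max(abs(a - b) for a, b in zip(y, predicted))
print(gain, phase, max_error)
```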
Superposition of a linear system
A system is a transformation from one signal, called the input, to another signal, called the output or the response of the system. Assume you have a system 𝑓 with time-varying input 𝑥(𝑡), and time-varying response 𝑦(𝑡). Thus:
𝑦(𝑡)=𝑓[𝑥(𝑡)]
If your system 𝑓 is a linear system, its input-output relationship obeys two important properties: homogeneity and superposition. Which of the following statements expresses the property of superposition mathematically?
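Superposition says that the response to a sum of inputs equals the sum of the responses to each input alone, i.e. 𝑓[𝑥₁(𝑡)+𝑥₂(𝑡)] = 𝑓[𝑥₁(𝑡)] + 𝑓[𝑥₂(𝑡)]. The sketch below tests this on one pair of inputs for two toy systems chosen for illustration. A single pair can refute superposition but cannot prove it in general.

```python
def moving_average(x):
    """Linear system: each output is the mean of the current and previous input sample."""
    return [(x[i] + (x[i - 1] if i > 0 else 0.0)) / 2.0 for i in range(len(x))]

def squarer(x):
    """Non-linear system: squares each sample."""
    return [v * v for v in x]

def obeys_superposition(system, x1, x2, tol=1e-12):
    """Check f[x1 + x2] == f[x1] + f[x2] for one pair of inputs."""
    lhs = system([a + b for a, b in zip(x1, x2)])
    rhs = [a + b for a, b in zip(system(x1), system(x2))]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 2.0, 0.0]

print(obeys_superposition(moving_average, x1, x2))
print(obeys_superposition(squarer, x1, x2))
```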
Mixing paints to create new colours is an example of [additive; subtractive; multiplicative] colour mixing, while shining lights to create new colours is an example of [multiplicative; subtractive; divisive; additive] colour mixing.
Mixing paints to create new colours is an example of subtractive colour mixing, while shining lights to create new colours is an example of additive colour mixing.
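The difference can be sketched with a heavily idealised three-band model, in which each light or paint is a triple of (long, medium, short) wavelength values between 0 and 1. Lights add their intensities. Each paint removes part of the incoming light, which is modelled here by multiplying reflectances band by band. The band values and helper names are assumptions for illustration; real pigments and lights have continuous spectra.

```python
def mix_lights(*lights):
    """Additive mixing: intensities in each band add, clipped to 1."""
    return tuple(min(1.0, sum(band)) for band in zip(*lights))

def mix_paints(*paints):
    """Subtractive mixing (idealised): each pigment removes light from what the others reflect."""
    result = (1.0, 1.0, 1.0)
    for paint in paints:
        result = tuple(r * p for r, p in zip(result, paint))
    return result

RED_LIGHT, GREEN_LIGHT = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
YELLOW_PAINT, CYAN_PAINT = (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)

# Red plus green light fills the long and medium bands, which looks yellow.
print(mix_lights(RED_LIGHT, GREEN_LIGHT))

# Yellow and cyan paint together reflect only the medium band, which looks green.
print(mix_paints(YELLOW_PAINT, CYAN_PAINT))
```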
What is the term for the light that illuminates a surface?
A unique blue is a blue that has no ___________ or green tint.
A _________________ is an individual who suffers from colour blindness that is due to the absence of L-cones.
_________________ is a colour perception effect in which two colours bleed into each other, each taking on some of the chromatic quality of the other.
Some animals achieve colour vision not with different photopigments, but rather with
_________________ is the idea that basic perceptual experiences may be determined in part by the cultural environment.
Which of the following is not a unique hue?
Which of the following is a typical argument from the textbook about the usefulness of colour vision?
Which of the following is not one of the colour-opponent pairs coded by the visual system?
A __________________ is an individual who suffers from colour blindness that is due to the absence of M-cones.
Which of the following colour pairs is furthest apart in wavelength?
Which of the following is not a basic colour term?
Which of the following colours is “illegal” for our visual systems?
________________ is the inability to perceive colours due to damage to the central nervous system.
A ________________ is an individual who suffers from colour blindness that is due to the absence of S-cones.
According to the _______________ theory, the colour of any light is defined in our visual system by the relationships among three numbers of a set.
Which of the following is not a type of cone?
In the case of a negative afterimage, a yellow stimulus would produce a ________________ afterimage.
What kind of lighting conditions are depicted in the photograph?
________________ are different mixtures of wavelengths that look identical.
In the hue cancellation experiments described in the textbook, if the starting colour were too reddish, you would add
Which scientist developed the colour-matching technique depicted in the figure below?
How many lights (of the correct type) are required to match any colour that humans can see? (More precisely: can be made indistinguishable?)
_________________ is a colour perception effect in which the colour of one region induces the opponent colour in a neighboring region.
What type of lighting conditions occurs during the daytime in full sunlight?
Homogeneity of a linear system
If your system 𝑓 is a linear system, its input-output relationship obeys two important properties: homogeneity and superposition. Which of the following statements expresses the property of homogeneity mathematically?
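Homogeneity says that scaling the input by a constant scales the response by the same constant, i.e. 𝑓[𝑎·𝑥(𝑡)] = 𝑎·𝑓[𝑥(𝑡)]. The sketch below tests this for one input and one scale factor on two toy systems chosen for illustration; passing a single test does not prove the property in general.

```python
def gain_of_three(x):
    """Linear system: multiplies every sample by 3."""
    return [3.0 * v for v in x]

def add_offset(x):
    """Not linear: adds a constant offset to every sample."""
    return [v + 1.0 for v in x]

def obeys_homogeneity(system, x, a, tol=1e-12):
    """Check f[a * x] == a * f[x] for one input and one scale factor."""
    lhs = system([a * v for v in x])
    rhs = [a * v for v in system(x)]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))

x = [1.0, -2.0, 0.5]

print(obeys_homogeneity(gain_of_three, x, 2.0))
print(obeys_homogeneity(add_offset, x, 2.0))
```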