Integration of multimodal cues: Temporal segmentation & visual motion
Robert Sekuler and Victoria Wong Volen Center, Brandeis University, Waltham MA
Results: Cues one at a time
The visual world's continuous stream of spatio-temporal events
must be segmented into appropriate constituents. Normally,
visual segmentation cues are accompanied by correlated inputs
from other senses. Content-based, automatic image recognition
systems can learn to exploit correlated multi-sensory
information (Rui et al., 2000). Can humans do the same? And
if so, what rules are used to harmonize multiple segmentation cues?
For answers, we studied responses to a bistable motion
stimulus whose visual outcome varies with perceived
segmentation. The basic stimulus is shown in Fig. 1.
Fig. 1. Space-time plot
of the stimulus. Two
identical discs move
steadily toward and then
past one another. Their
an ambiguous, bistable
percept: The discs seem
to stream through or to
bounce off one another.
When their trajectories are perceived as continuous and
uninterrupted, the discs appear to stream through one another;
when the trajectories are perceived as broken or interrupted, the
discs appear to bounce off one another. We take streaming to be
the default response to the bistable stimulus. Departures from
this default require evidence of a spatial or temporal perturbation
in the discs' trajectories (Sekuler & Sekuler, 1999; Tripathy &
Barrett, 2003; MacKay, 1958).
Using this bistable stimulus, we gauged the perceptual
influences of auditory and visual segmentation cues, presented
singly and in combination. But first, we equated the cues so that
individually all would be equally effective for each subject.
Materials and methods
Sekuler, Sekuler & Lau (1997) showed that various cues,
auditory or visual, could bias the perceptual result of the basic
stimulus (Figure 1), altering the relative probabilities of seeing
bouncing, pr(bouncing), or seeing streaming, 1-pr(bouncing).
For the basic stimulus, a pair of black, 1.0 deg diameter discs
moved at 5.9 deg/sec. To this stimulus, four different cues were
added; two cues were auditory, two were visual.
Timing (T). A single tapping sound, 85 dB SPL, presented at
varying times relative to the discs' coincidence.
Intensity (I). The same tapping sound presented at varying
intensity levels, but always while the discs coincided.
Contrast (C). When the discs coincided, their contrast was
temporarily reduced by varying amounts.
Duration (D). The duration of the discs' period of coincidence
was varied.
For each subject individually, we identified the value at
which each cue produced pr(bouncing) = 0.21, and the value at
which pr(bouncing) = 0.33. The subject's two values for each
cue were then used to study integration of cues.
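
One way to read such equated cue values off a measured psychometric function is sketched below. It assumes simple linear interpolation between measured points, which is not necessarily how the values were estimated here; the function name `cue_value_at` and the sample data are hypothetical.

```python
def cue_value_at(target, levels, probs):
    """Linearly interpolate the cue level at which pr(bouncing) equals target.

    levels and probs are hypothetical measurements, ordered so that
    pr(bouncing) rises monotonically with the cue level.
    """
    points = list(zip(levels, probs))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target lies outside the measured range")


# Hypothetical psychometric data for one cue (not values from the study).
levels = [0.0, 1.0, 2.0, 3.0]
probs = [0.10, 0.20, 0.30, 0.40]

weak = cue_value_at(0.21, levels, probs)    # cue level giving pr(bouncing) = 0.21
strong = cue_value_at(0.33, levels, probs)  # cue level giving pr(bouncing) = 0.33
```

A fitted psychometric function (for example a logistic) could replace the interpolation step without changing how the two target levels are used.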
Results from 10 subjects showed that pr(bouncing) increased as
The coincident discs' contrast momentarily decreased
The discs' period of coincidence lengthened
The added sound increased in intensity
The sound occurred nearer the time when the discs coincided.
Fig. 2. Psychometric functions for one subject. Arrows
mark stimuli that produce pr(bouncing) = 0.21 and 0.33.
Putting cues together: 1+1=?
We added the four cues together, in pairs or in trios, using both
cue strengths, pr(bouncing)= 0.21 or 0.33, which had been
estimated previously for each subject.
Note that the two auditory cues (Timing and Intensity) were
mutually exclusive and could not be combined. This left 37
different combinations of cues. Each combination was
presented 24 times in random order to the 10 subjects. Subjects'
responses to various combinations were used to evaluate
alternative models of sensory integration.
The first model, Winner Take All, incorporates a nonlinear, max operator:
$$pr(b_{1,\ldots,n}) = \max[\,pr(b_1), \ldots, pr(b_n)\,]$$
where max returns the highest probability associated with any
cue. The model asserts that pr(bouncing) is controlled solely by
the strongest cue present in any combination. Fig. 3A plots the
obtained data against predictions from Winner Take All.
(Predictions are shown by the red Xs.) This model clearly fails.
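
As a minimal sketch of the max operator, the Winner Take All prediction for a combination is just the largest of the component probabilities. The function name is hypothetical; the two input values are the equated cue strengths described above.

```python
def winner_take_all(component_probs):
    """Predict pr(bouncing) for a cue combination as the strongest single cue."""
    return max(component_probs)


# Two cues at the weaker strength combined with one at the stronger strength.
prediction = winner_take_all([0.21, 0.21, 0.33])
```

Under this rule, adding further cues can never raise the prediction above the strongest cue already present.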
Fig. 3. Observed pr(bouncing) vs. pr(bouncing)
predicted by various alternative models.
Three nested, linear sum models
We evaluated three other, nested models. Each assumes that
influences from various cues sum linearly, but the models differ
in their depictions of the summing process.
The Equal Weights model is structurally simple:
We assumed that streaming is the default perceptual response to
our basic stimulus, and that departures from default are
governed by evidence -- unimodal or multimodal -- of a
perturbation in the discs' trajectories.
Not all putative cues actually affect the bistable percept.
For example, pr(bouncing) is unaffected when the discs deform
so as to simulate a collision between non-rigid objects. We
cannot rule out the possibility, though, that this ineffective cue
might gain power when combined with other cues.
The quality of fit achieved by the Cue-Specific Weights
model was good, but showed a small, systematic error at the
highest predicted values (Fig. 3D). The error suggests that the
simple, linear summation model may have ignored a genuine,
but small, non-linear interaction among cue effects.
It seems that all cues lose some potency when placed in
combination. The loss is most substantial for the two time-based
cues, Duration and Timing. It remains to be seen whether this
result is merely coincidental, or signifies a genuine difference
between time-related cues and other cues.
The reduction of individual cues' influence in mixture is
qualitatively consistent with a recent fMRI study. Using a
stimulus like ours, Bushara et al. (2003) presented a brief sound
when the moving discs coincided. Brain activation patterns
associated with the two perceptual outcomes, streaming and
bouncing, suggested a competitive interaction between a system
of multi-modal regions and a system of predominantly unimodal
regions. Activation in unimodal regions diminished when the
sound promoted bouncing, which seems to parallel the failure of
complete summation in our results. As Bushara et al. used just
one value for sound timing and intensity, other comparisons are
not possible.
Finally, to avoid possible saturation, our individual cues
were relatively weak, which maximizes linear or near-linear
summation. We cannot say whether linear summation would
continue to hold with stronger cues.
$$pr(b_{1,\ldots,n}) = w \sum_{i=1}^{n} pr(b_i)$$
When w=1, the model has no free parameters: the prediction for
$pr(b_{1,\ldots,n})$ is given by the sum of the component
probabilities. This prediction is shown by the solid line in Fig.
3B. With RMSD=0.058, it fares more poorly than Winner Take
All. Treating w as a free, scaling parameter improves the fit
(shown by the dashed line in Fig. 3B), with RMSD=0.018.
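
The Equal Weights prediction and its RMSD can be sketched as follows. The fitting method for w is not stated here, so the closed-form least-squares estimate below is an assumption, and the combinations and observed values are hypothetical rather than data from the study.

```python
import math


def equal_weights(component_probs, w=1.0):
    """Predict pr(bouncing) as a scaled sum of the component probabilities."""
    return w * sum(component_probs)


def rmsd(predicted, observed):
    """Root-mean-square deviation between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_w(combos, observed):
    """Least-squares estimate of the scaling weight w (assumed fitting method)."""
    sums = [sum(c) for c in combos]
    return sum(s * o for s, o in zip(sums, observed)) / sum(s * s for s in sums)


# Hypothetical cue combinations and hypothetical observed pr(bouncing).
combos = [[0.21, 0.21], [0.21, 0.33], [0.33, 0.33], [0.21, 0.21, 0.33]]
observed = [0.30, 0.40, 0.50, 0.55]

w = fit_w(combos, observed)
unscaled = [equal_weights(c) for c in combos]      # w fixed at 1
fitted = [equal_weights(c, w) for c in combos]     # w free
error_unscaled = rmsd(unscaled, observed)
error_fitted = rmsd(fitted, observed)
```

Because w is chosen to minimize squared error, the fitted RMSD can never exceed the RMSD obtained with w fixed at 1.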
Although all cues had been adjusted to produce equal effects
individually, when placed into combination, some cues gained,
while other cues lost influence. Generally, the two visual cues
retained stronger influence. Therefore, we tested a Modality-Specific
Pooling model, which sums cues' effects within separate
modality-specific pools. This model is described as:
$$pr(b_{1,\ldots,n}) = w_A \sum_{i=1} pr(b_{A_i}) + w_V \sum_{i=1} pr(b_{V_i})$$
where $V_i$ and $A_i$ are the ith visual and auditory cues, and $w_M$ is the
weight for modality M. Weights were determined by multiple
linear regression with dummy variables; interaction terms were
set to zero. The weight for visual cues was 5x the weight for
auditory cues. The two free, modality-specific parameters
produced a slightly improved fit (Fig. 3C), with RMSD = 0.015.
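
A simplified sketch of fitting the two modality weights is given below. It regresses observed pr(bouncing) on the summed auditory and summed visual component probabilities, with no intercept and no interaction terms, by solving the 2x2 normal equations directly. This stands in for the dummy-variable regression described above and is not a reproduction of it; all names and data are hypothetical.

```python
def pooled_prediction(aud_probs, vis_probs, w_a, w_v):
    """Predict pr(bouncing) from separately weighted auditory and visual pools."""
    return w_a * sum(aud_probs) + w_v * sum(vis_probs)


def fit_modality_weights(aud_sums, vis_sums, observed):
    """Least-squares weights (w_a, w_v) from the 2x2 normal equations."""
    saa = sum(a * a for a in aud_sums)
    svv = sum(v * v for v in vis_sums)
    sav = sum(a * v for a, v in zip(aud_sums, vis_sums))
    say = sum(a * y for a, y in zip(aud_sums, observed))
    svy = sum(v * y for v, y in zip(vis_sums, observed))
    det = saa * svv - sav * sav
    if det == 0:
        raise ValueError("auditory and visual sums are collinear")
    w_a = (say * svv - sav * svy) / det
    w_v = (saa * svy - sav * say) / det
    return w_a, w_v


# Hypothetical per-combination sums of auditory and visual component probabilities.
aud_sums = [0.21, 0.33, 0.21, 0.0]
vis_sums = [0.21, 0.21, 0.54, 0.42]

# Hypothetical observations generated from known weights, so the fit is checkable.
true_w_a, true_w_v = 0.2, 1.0
observed = [pooled_prediction([a], [v], true_w_a, true_w_v)
            for a, v in zip(aud_sums, vis_sums)]

w_a, w_v = fit_modality_weights(aud_sums, vis_sums, observed)
```

With noiseless synthetic data the fit recovers the generating weights, which is a quick check that the normal equations are set up correctly.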
The final model, Cue-Specific Weights, added two
additional free parameters, assigning each of the four cue types its
own weight, $w_i$:
$$pr(b_{1,\ldots,n}) = \sum_{i=1}^{n} w_i \, pr(b_i)$$
The additional free parameters reduced RMSD to 0.073 (Fig. 3D).
An information criterion, which takes account of a
model's degrees of freedom when evaluating its goodness of fit,
identified the Cue-Specific Weights model as far superior to the
other nested, linear models.
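
To illustrate how such a criterion trades goodness of fit against free parameters, the sketch below assumes the Akaike Information Criterion under Gaussian errors, AIC = n ln(RSS/n) + 2k. Which criterion was actually used is not stated in the text above, and n = 37 (one data point per cue combination) is likewise an assumption. The two RMSD values are the ones reported above for the Equal Weights model with free w and for the Modality-Specific Pooling model.

```python
import math


def aic_from_rmsd(rmsd, n_points, n_params):
    """AIC under Gaussian errors, computed from an RMSD (assumed criterion)."""
    rss = n_points * rmsd ** 2  # residual sum of squares recovered from RMSD
    return n_points * math.log(rss / n_points) + 2 * n_params


n_points = 37  # assumption: one observation per cue combination

aic_equal = aic_from_rmsd(0.018, n_points, n_params=1)    # Equal Weights, w free
aic_pooling = aic_from_rmsd(0.015, n_points, n_params=2)  # Modality-Specific Pooling

# The lower AIC is preferred; the extra parameter costs 2 units.
preferred = "pooling" if aic_pooling < aic_equal else "equal"
```

Under these assumptions the pooled model's lower RMSD outweighs the penalty for its additional parameter.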
Weber's Law and Cue Impact
The detectability of a pause or change in motion varies with
stimulus velocity, in accord with Weber's Law (Dzhafarov et al.,
1993). If perceptual evidence governed the bistable percept, the
impact of pause duration should also vary with velocity.
To test this idea, we measured pr(bouncing) with twenty
combinations of pause duration and disc speed. Normalized for
the distance that the discs would have traveled during their
pause, the results are shown in Fig. 4. The pause's impact is
governed by perceptual quality, not by physical value alone.
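
The normalization can be sketched as the distance the discs would have covered during the pause, that is, speed multiplied by pause duration. The function name and the second speed and both pause values are hypothetical; 5.9 deg/sec is the speed of the basic stimulus.

```python
def distance_not_traveled(speed_deg_per_s, pause_s):
    """Distance (deg) the discs would have moved had they not paused."""
    return speed_deg_per_s * pause_s


# Two hypothetical conditions with different physical pause durations
# that map onto the same normalized value.
slow = distance_not_traveled(5.9, 0.10)
fast = distance_not_traveled(11.8, 0.05)
```

On this scale, a short pause at high speed and a longer pause at low speed count as equivalent perturbations.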
Fig. 4. pr(bouncing) for
varying combinations of
disc speed and pause
duration, normalized for
distance discs would
have traveled during
pause had motion not
account of pause
duration and disc speed.
Bushara, K.O., Hanakawa, T., Immisch, I., Toma, K., Kansaku, K. & Hallett, M.
(2003) Neural correlates of cross-modal binding. Nature Neuroscience 6, 190-195.
Dzhafarov, E.N., Sekuler, R. & Allik, J. (1993) Detection of changes in speed and
direction of motion: Reaction time analysis. Perception & Psychophysics, 54,
MacKay, D.M. (1958) Perceptual stability of a stroboscopically lit visual field
containing self-luminous objects. Nature 181, 507-508.
Rui, Y., Gupta, A. & Acero, A. (2000) Automatically extracting highlights for TV
baseball programs. In Proceedings of ACM Multimedia, L.A., pp. 105-115.
Sekuler, A.B. & Sekuler, R. (1999) Collisions between moving visual targets: What
controls alternative ways of seeing an ambiguous display? Perception 28, 415-432.
Sekuler, R., Sekuler, A.B., & Lau, R. (1997) Sound alters visual motion perception.
Nature 385, 308.
Tripathy, S.P. & Barrett, B.T. (2003) Gross misperceptions in the perceived
trajectories of moving dots. Perception 32, 1403-1408.
Acknowledgments & information
We thank Takeo Watanabe, Larry Abbott, Yuko Yotsumoto, and
Allison B. Sekuler for excellent suggestions. Victoria Wong is
now at the Medical School of the University of Hawaii. For a
QuickTime version of the basic stimulus, or a full description of
experimental details, e-mail: [email protected]