3 types of resource limitations (Munzner 2014)
Visualization designers must take into account three very different kinds of resource limitations: those of computers, of humans and of displays
Computational limits: Processing time / system memory
Display limits: Number of pixels, Information density (ratio of used space vs unused whitespace)
Human limits: Perception, attention and memory (e.g. change blindness)
Explain the visualization pipeline. What are the four stages?
Visualization Pipeline
A general model of the visualization process with 4 stages, where the user can interact with the process at each stage
4 stages: data acquisition -> filtering / enhancement -> visualization mapping -> rendering
Explain the data acquisition stage. What are three general cases?
3 general cases: simulation, databases, sensors
In all three cases the result is raw data, which is then visualized
Examples: global climate simulation, images of particles in fluid, weather radar data,…
Explain the filtering / enhancement stage. Give at least two examples.
process to obtain useful data (e.g. 3D volume) / derived data computed from the raw data
Examples:
data format conversion
co-registration of data sets
resampling to grid
interpolation / approximation of missing values
data reduction
clipping / cleaning / denoising
etc.
Explain the visualization mapping stage. Give at least two examples.
Derived data that was computed during the filtering / enhancement step is now mapped to some renderable representation
scalar field mapped to (->) isosurface
2D field -> height field
Decide
which parts of the data are shown?
How to represent them?
Graphical primitives: points, lines, surfaces, volumes
Visual channels: color, texture, transparency
Explain the rendering stage. Give at least two examples.
from renderable representation to display images / video
Generate 2D image / video
viewpoint specification
visibility calculation
lighting / shading, composition
compute values at sampling points
In which stage of the visualization pipeline happens resampling to a regular grid?
Filtering / enhancement step
In which stage of the visualization pipeline are the viewpoint and lighting parameters specified?
Rendering stage
In which stage of the visualization pipeline happen lighting and shading?
Rendering stage
In which stage of the visualization pipeline are colors assigned to every voxel?
Visualization mapping stage, for instance using a transfer function to assign color and opacity to each voxel.
In which stage of the visualization pipeline happen smoothing and noise suppression?
Filtering / enhancement stage (smoothing and noise suppression are enhancement operations)
Discuss independent vs. dependent variables in data. Give at least two examples each.
Data Representation
independent variables refer to the dimensionality of the domain of the problem
e.g. 2D or 3D Space, time
Do not depend on anything else
Dependent variables refer to the type and dimension of the data to be visualized
e.g. temperature, density values, velocity vectors
Depend on the location where we measured those, e.g. measured in respect to the independent variables
What are the independent and dependent variables in a 3D spatial curve 𝟇 : ℝ -> ℝ^3?
ℝ is the independent variable and ℝ^3 contains the dependent variables
The curve is a function of one parameter, and the output is always a 3D location, e.g. a position along the curve for each parameter value t
What are independent and dependent variables in a 3D vector field?
We have 3D space = independent variables
3D vectors given at every location so the vectors itself would be the dependent variables
What type of attribute are the following: Categorical, ordinal, or quantitative?
a) Type of cheese (e.g. Swiss, Brie)
b) Tire pressure (e.g. 2.3 bar, 2.5 bar)
c) first name (e.g. Alice, Bob)
d) Unemployment rate (e.g. 6%, 10%)
e) T-shirt sizes (e.g. medium, large)
a) categorical
b) quantitative
c) categorical
d) quantitative
e) ordinal
Draw an illustration of a Cartesian grid. Describe how such a grid is different from a regular grid. Which information needs to be specified explicitly for such a grid?
Cartesian grid = equidistant grid
Spacing is constant in x and y (or z in 3D) dimension
Difference to a uniform / regular grid: same spacing within each direction, but the spacing in x and y direction can differ
Information that has to be specified explicitly: grid dimensions and the spacing per axis (coordinates then follow from the indices)
Positions of the individual cells can be computed from the indices - don’t have to be stored
Neighborship information is implicit
What is a curvilinear grid? How is it characterized? How is it different from an unstructured grid? Which information needs to be specified explicitly for such a grid?
non-orthogonal grid
Grid points / position of individual vertices specified explicitly
still have regular grid cells so neighborhood information is still implicit
Unstructured grid: both location of the individual vertices and neighborhood information has to be specified explicitly
cells are tetrahedra or hexahedra
How to proof for a triangulation that it is a Delaunay triangulation?
Consider the circumcircle of each triangle: no vertex of another triangle may lie inside it. If a vertex does lie inside, the Delaunay property is violated, the triangulation is locally non-Delaunay, and we would have to perform e.g. an edge-flip operation to make it Delaunay.
Visualization (Definition)
The use of computer-supported, interactive, visual representations of data to amplify cognition
Medical Visualization
Preoperative planning of tumor resection, Virtual fetoscopy (4D Ultrasound), MRI scans,…
Why visualization?
lets you see things that would otherwise go unnoticed (data trends, outliers, dependencies, etc.)
gives answers faster
lets you interact with your data, study causes and effects, etc
helps to deal with increasing size and diversity of data
produces pretty, informative and interactive pictures
Lie factor
Size of effect shown in graphic / Size of effect in data (e.g. drawing a 2× change in the data as a 6× visual change gives a lie factor of 3)
What is Visualization good for?
Visual exploration (Nothing is known about the data)
find the unknown / unexpected
generate new hypothesis
Visual analysis (confirmative vis.) (There are hypotheses)
confirm or reject hypotheses
information drill-down
Presentation („Everything“ is known)
effective / efficient communication of results
The visible human project
bodies were frozen in a special material to preserve tissues and organs
Micro-thin layers were „shaved“ off the frozen block to expose the underlying tissues
a picture is taken
A „stack“ of 2D images is obtained
Passive Visualization Scenario
complexity / technical demands: low
benefits, possibilities: Low
Interactive Visualization Scenario
complexity / technical demands: middle
benefits, possibilities: middle
Computational / Visual Steering Scenario
complexity / technical demands: high
benefits, possibilities: high
Characteristics of data values
Attribute types (quantitative vs. qualitative)
Domain (continuous vs. discrete)
Value range (includes precision of values)
Data type (categorical, scalar, vector, tensor data)
Dimension (number of components)
Error and uncertainty
(physical) interpretation
Attribute types
Quantitative: numerical, measurable
e.g. length, mass, temperature
metric scale - allows measure of distance
Continuous (real) or discrete (distinct & separate values)
Qualitative: categorical data, not measurable
No metric scale; cannot be measured
Requires a subjective decision in order to be categorized
Discrete
Nominal
No natural ordering or indication of values, only equivalence and membership (=, ≠)
e.g. eye color (blue, green, brown)
Ordinal
Logical order relation (<, >) but no relative size or degree of difference
e.g. judgement of size (small, medium, large), Attitudes (strongly disagree, disagree, neutral, agree, strongly agree)
Categorical data: Values from a fixed number of categories
Scalar data: Given by a function
Vector data: Represent direction and magnitude, given by an n-tuple; e.g. a 2D vector field where every sample represents a 2D vector
Tensor data: A multi-dimensional matrix
Scientific Visualization
Deals with the reconstruction of a continuous real object from a given discrete representation
Data that has some physical or geometric correspondence
Information Visualization / Visual Analytics
Deals with data that is discrete and more abstract
Does not have a physical or geometric correspondence
Symbolic, tabular, networked, graphs, textual information
Scattered Data
Grid-free data
data points given without neighborhood relationship
influence on neighborhood defined by spatial proximity
Scattered data interpolation
Isocontours
Curves on which all points have a certain (constant) value
Within a (bilinearly interpolated) cell they are hyperbolas, not straight lines
Radial basis function
Construct a continuous function f from a given set of points and values which approximates the given values
Independent of the dimension of the parameter domain (1D / 2D / 3D)
Function represented as weighted sum of N radial functions
Nearby points have higher influence than far-away points
Each radial function is centered around a data point. Values decrease quickly the further away from this point / the functions center
Interpolation: we want a curve which is going through the initial data points and changes smoothly in between them
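A minimal sketch of RBF interpolation in Python, assuming a Gaussian radial function and an illustrative shape parameter eps (both are choices for this sketch, not from the lecture); the weights come from solving the N×N linear system mentioned above.

```python
import numpy as np

def rbf_interpolate(centers, values, query, eps=1.0):
    """Interpolate scattered data with Gaussian radial basis functions."""
    def phi(r):
        return np.exp(-(eps * r) ** 2)          # radial function, decays with distance

    # Pairwise distances between the N data points -> N x N system matrix
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    w = np.linalg.solve(phi(d), values)         # weights so that f(x_i) = value_i

    # f(query) = weighted sum of radial functions centered at the data points
    dq = np.linalg.norm(query[:, None, :] - centers[None, :, :], axis=-1)
    return phi(dq) @ w

# 2D scattered samples and one evaluation point
pts  = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.0, 1.0, 1.0, 2.0])
print(rbf_interpolate(pts, vals, np.array([[0.5, 0.5]])))
```

Note how this illustrates the drawbacks listed next: adding a sample changes the system matrix, so the weights must be re-solved.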
Drawbacks of radial basis functions
Every sample point has influence on whole domain
Adding a new sample requires re-solving the equation system
Computationally expensive (solving a system of linear equations)
What can we do?
Find a different radial function
Give up finding a smooth reconstruction and try finding a piecewise (local) reconstruction function
What is a good triangulation?
A measure for the quality of a triangulation is the aspect ratio of the so-defined triangles
Avoid long, thin triangles
Make triangles as „round“ as possible
Maximize the minimum angle in the triangulation
Maximize radius_of_in-circle / radius_of_circumcircle ratio
A Delaunay triangulation is an optimal triangulation
Delaunay triangulation
The circumcircle of any triangle does not contain another point of the set
Maximizes the minimum angle in the triangulation
Such a triangulation is unique (independent of the order of samples) for all but trivial cases
Building a Delaunay triangulation from an initial, non-optimal triangulation: successively improve the initial triangulation via local operations
Edge-flip operation:
An edge is locally Delaunay if there exists an empty circumcircle / the circumcircle of an adjacent triangle does not contain another point of the set
If an edge shared by two triangles is illegal, a flip operation generates a new edge that is legal
If a triangulation is locally Delaunay everywhere -> globally Delaunay
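A small sketch of the in-circle test behind the edge-flip operation; the function name and the assumption that triangle (a, b, c) is ordered counter-clockwise are illustrative, not from the lecture.

```python
import numpy as np

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of CCW triangle (a, b, c)."""
    m = np.array([
        [a[0] - d[0], a[1] - d[1], (a[0] - d[0])**2 + (a[1] - d[1])**2],
        [b[0] - d[0], b[1] - d[1], (b[0] - d[0])**2 + (b[1] - d[1])**2],
        [c[0] - d[0], c[1] - d[1], (c[0] - d[0])**2 + (c[1] - d[1])**2],
    ])
    return np.linalg.det(m) > 0   # positive -> d inside -> the shared edge is illegal

# The shared edge of triangles (a, b, c) and (a, b, d) is locally Delaunay
# iff d is NOT inside the circumcircle of (a, b, c):
print(in_circumcircle((0, 0), (1, 0), (0, 1), (1.2, 1.2)))  # False: outside, edge legal
print(in_circumcircle((0, 0), (1, 0), (0, 1), (0.3, 0.3)))  # True: inside -> flip edge
```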
Voronoi Diagram
Problem: Looking for nearest neighbor
Partitions domain into Voronoi regions: Each Voronoi region contains one initial sample - the Voronoi samples
Points in a Voronoi region are closer to the respective sample than to any other sample
Voronoi vertices are the centers of the circumcircles of the Delaunay triangulation
The geometric dual (topologically equal) of the Delaunay triangulation
Isolines (2D)
An isoline (iso-contour) consists of all points at which the data has a specific value c: {(x,y) | f(x,y) = c} (Given a 2D scalar function and a scalar iso-value c)
Can be seen as a special kind of data condensation
Isolines are always closed curves (except when they exit the domain)
Isolines never (self-) intersect, thus they are nested
Isolines are always orthogonal to the scalar field’s gradient
The true isolines within a cell are hyperbolas
Marching-Cubes Algorithm
Approximates the surface by a triangle mesh
Surface vertices are found by linear interpolation along cell edges
Efficient triangulation by means of lookup tables
the standard geometry-based surface extraction algorithm for 3D scalar field
Computes isosurface for specific iso-value
Cell consists of 8 vertices
indices: (i+[0,1], j+[0,1], k+[0,1])
Consider a cell (defined by 8 vertices with associated data values) independently
Classify each vertex as inside or outside (outside the surface: value < iso-value; inside the surface: value >= iso-value)
Use the binary labeling of each cell to compute an index: outside = 0, inside = 1
Get per-cell triangulation from index: look up the triangulation for every index from a pre-computed table
Interpolate the edge location : for each triangle edge, find the vertex location along the edge using linear interpolation of the vertex values
Compute gradients : Calculate normals at each cube vertex (via finite differences), and interpolate along the edges
Consider ambiguous cases: use asymptotic decider as in 2D for this
Go to next cell
Summary:
256 Cases, Reduce to 15 Cases by symmetry
Causes holes if arbitrary choices are made
Up to 5 triangles per cube
Note that the triangulation is only an approximation of the true isosurface produced by trilinear interpolation
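A sketch of the per-cell classification and edge-interpolation steps described above; the 256-entry triangle lookup table itself is omitted, and the vertex numbering is an illustrative convention.

```python
import numpy as np

def cell_index(values, iso):
    """8-bit case index: bit i is set if cube vertex i is inside (value >= iso)."""
    idx = 0
    for i, v in enumerate(values):      # values: scalar data at the 8 cube vertices
        if v >= iso:
            idx |= 1 << i
    return idx                          # 0..255, used to look up the triangulation

def edge_vertex(p0, p1, v0, v1, iso):
    """Linear interpolation of the isosurface crossing along a cut cell edge."""
    t = (iso - v0) / (v1 - v0)          # assumes v0 != v1, i.e. the edge is cut
    return np.asarray(p0) + t * (np.asarray(p1) - np.asarray(p0))

vals = [0.2, 0.8, 0.1, 0.3, 0.0, 0.9, 0.4, 0.2]
print(cell_index(vals, iso=0.5))        # 34: vertices 1 and 5 are inside
print(edge_vertex([0, 0, 0], [1, 0, 0], 0.2, 0.8, iso=0.5))  # [0.5 0. 0.]
```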
Voxel
Data values are initially given at vertices of a 3D grid = voxels (volume elements)
Voxel = point sample in 3D
Phong’s illumination model (+ 3 components)
Considers ambient light and point lights as well as the material color and reflection properties
Ambient light: background light, constant everywhere
Diffuse reflector: reflects equally into all directions
Specular reflector: reflects mostly into the mirror direction
Lighting
Necessary to emphasize iso-surface shape: simulate reflection of light and its effect on color
Phong’s illumination model
How can a perfect mirror be simulated via the Phong illumination model?
By letting the shininess exponent n in the specular term go towards infinity (see specular reflection below)
Ambient light
Formula: C = ka Ca Od
Background light
constant everywhere
ka = ambient reflection coefficient in [0,1]
Ca = Color of the ambient light
Od = object color
Diffuse reflection
Scatters light equally in all direction
C = kd Cp Od cos θ, or C = kd Cp Od (l · n)
kd = diffuse reflection coefficient in [0,1]
if kd = 0 -> black; if kd = 1 -> brightest
Cp = color of the point light
θ = angle between light vector l and normal n
if l = n, meaning the light is precisely above the point and θ = 0°, then cos 0° = 1 -> 100% intensity
if the light shines onto the point at a 45° angle -> cos 45° ≈ 0.7 -> 70% intensity
at a 90° angle -> 0% intensity
highest diffuse reflection when normal vector = light vector, i.e. the point is precisely below the light source
Specular reflection
Highlight = reflection of light source
Glossy surfaces
Reflects mostly into the mirror direction
view dependent!
C = ks Cp Od cos^n 𝞅, or C = ks Cp Od (r · v)^n
ks = specular reflection coefficient in [0,1]
𝞅 = angle between the reflected light ray r and the vector to the viewer v
n = shininess exponent (controls the extent of the highlight)
the larger n, the smaller the highlight
highest specular reflection when reflected vector = view vector
when the exponent in the specular reflection formula (shininess factor) approaches infinity, we get almost a perfect mirror
Calculate vector to the viewer (Phong illumination model)
Camera position vector minus surface point vector
Calculate light vector (Phong illumination model)
Vector of position of a point light source minus vector of surface point
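A runnable sketch of the full model (ambient + diffuse + specular, one point light, scalar intensities), following the formulas above including the light and view vector construction from the last two answers; all coefficient values are illustrative.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong(point, normal, light_pos, view_pos,
          Od=1.0, Ca=1.0, Cp=1.0, ka=0.1, kd=0.6, ks=0.3, n=32):
    """Evaluate C = ka*Ca*Od + kd*Cp*Od*(l.n) + ks*Cp*Od*(r.v)^n."""
    N = normalize(normal)
    l = normalize(light_pos - point)   # light vector: light position minus surface point
    v = normalize(view_pos - point)    # view vector: camera position minus surface point
    r = 2 * np.dot(N, l) * N - l       # reflection of the light vector about the normal

    ambient  = ka * Ca * Od
    diffuse  = kd * Cp * Od * max(np.dot(l, N), 0.0)       # cos(theta), clamped
    specular = ks * Cp * Od * max(np.dot(r, v), 0.0) ** n  # cos^n(phi), clamped
    return ambient + diffuse + specular

p = np.array([0.0, 0.0, 0.0])
print(phong(p, normal=np.array([0.0, 0.0, 1.0]),
            light_pos=np.array([0.0, 0.0, 5.0]),
            view_pos=np.array([0.0, 2.0, 2.0])))
```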
Where does volumetric data come from?
medical scanners e.g. CT scan
automotive engineering
Volume rendering techniques
Techniques for 2D scalar fields (e.g. slicing)
Indirect volume rendering techniques (e.g. surface fitting)
direct volume rendering techniques
Indirect volume rendering techniques
e.g. surface fitting
Convert / reduce volume data / raw data to intermediate representation (surface representation) first, which can then be rendered with traditional techniques
MC-algorithm (Marching-cubes algorithm)
Direct volume rendering (DVR) techniques
consider data as a semi-transparent gel with physical properties and directly get a 3D representation of it
e.g. Ray-casting
Volume material attenuates reflected or emitted light
Get a 3D representation of the volume data taking into account emission and absorption (without making an intermediate representation first)
considers the physics of light transport in a dense medium
Optical properties are mapped to each voxel (emission = color, absorption = opacity)
The light reaching the viewer is simulated by ray casting
Slicing
Display volume data, mapped to colors, on a slice plane
Iso-surfacing
Generate opaque / semi-opaque surfaces (e.g. via the MC-Algorithm)
Transfer function
performs mapping of data values to visual properties like color and opacity
associate distinct materials (value ranges) to disting properties (color & opacity: as (RGBa))
assign a different color to each scalar value (for scalar data)
Opacity
Opacity alpha = 1 -> completely opaque
Opacity alpha = 0 -> completely transparent
Volume rendering integral
Ray-casting (Direct volume rendering)
Numerical approximation of the volume rendering integral
A ray is cast into the volume for each output pixel -> the volume is resampled at equidistant intervals along the ray (integral becomes a sum over samples) -> sample values are tri-linearly interpolated -> the transfer function is applied
Volumetric ray integration uses a front-to-back strategy (alpha-compositing), first-hit, average, or maximum
Ray-casting method (Direct volume rendering)
Defines a virtual image plane where viewer is looking through
Cast a ray through every pixel on the screen
for each pixel on the image plane
compute entry- and exit-point in volume
while current position inside volume
read density at current position
apply transfer function: scalar value -> color + alpha-value
compute shading
apply compositing
compute new position along ray
end while
set pixel color in image plane
end for
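A runnable 1D stand-in for the loop above, showing the front-to-back alpha-compositing and early-ray-termination steps; the toy transfer function and density profile are illustrative assumptions.

```python
import numpy as np

def transfer(density):
    """Toy transfer function: scalar value -> (color, opacity)."""
    return density, 0.3 * density        # denser material: brighter and more opaque

def cast_ray(samples):
    C, A = 0.0, 0.0                      # accumulated color and opacity
    for d in samples:                    # samples along the ray, front to back
        c, a = transfer(d)
        C += (1.0 - A) * a * c           # front-to-back alpha compositing
        A += (1.0 - A) * a
        if A > 0.99:                     # early ray termination: nearly opaque
            break
    return C, A

densities = np.clip(np.sin(np.linspace(0, np.pi, 50)), 0.0, 1.0)
print(cast_ray(densities))
```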
Volumetric compositing (Ray-casting)
accumulation of color and opacity along rays
Variations of compositing schemes
alpha-compositing
Surface rendering / first hit
Average
Maximum
First hit: stop ray traversal once an iso-surface is hit (value larger than a certain threshold) and shade the surface point
produces the same result as marching cubes, but with higher accuracy
Average compositing scheme
simply accumulates colors but does not account for opacity
values along the viewing ray are averaged
produces an x-ray image
Maximum intensity projection (MIP) scheme
only takes the maximum color along the ray and displays it
doesn’t account for opacity
Often used for magnetic resonance angiograms
good to extract vessel structures
Problems when doing direct rendering: Sampling artifacts
Too few samples along the ray
Interrupted artifacts -> no smooth surface of the visualization
Solution: Increase sampling rate to Nyquist frequency -> at least 2 samples per voxel
remove artifacts by stochastic jittering of ray-start position
Direct volume rendering vs. Surface rendering
Direct volume rendering
Direct representation
Conveys volume impression
Often realized in software (slow?!)
Transfer function specification
Surface rendering
Indirect representation
conveys surface impression
Hardware supported rendering (fast?!)
Iso-value-definition
How do you get values along the viewing ray (from volume data)?
Data in volumetric datasets are usually given in some kind of grid
ray casting: interpolation scheme e.g. trilinear interpolation or some other higher-order interpolation scheme
Flow visualisation
Visualize moving phenomena, e.g. wind tunnel / wind fields, weather / climate simulations, aerospace / car / ship design
Flow visualization - data sources
Flow simulation:
Design of ships, cars, airplanes
Weather simulations (e.g. atmospheric flow)
Medical blood flows
Measurements
wind tunnel
schlieren imaging
Modeling
Differential equations systems (dynamical systems)
Main application of flow visualization
Motion of fluids (gases, liquids)
Geometric boundary conditions
Velocity / flow field v(x,t)
Conservation of mass, energy, and momentum
Navier-Stokes equations
Computational fluid dynamics (CFD)
Flow visualization - classification
Dimension (2D or 3D)
Time-dependency: steady vs. time-varying flows
Direct vs. indirect flow visualization
Steady (time-independent) vs. time-varying (unsteady) flow
Steady (time-independent) flow:
flow static over time
e.g. laminar flows
simpler interrelationships
time-dependent (unsteady) flow
flow changes over time
e.g. turbulent flows
more complex interrelationships
Flow visualization - Approaches
Direct flow visualization (arrows, color coding,…)
Geometric flow visualization (stream lines/surfaces,…)
Sparse (feature-based) visualization
Dense (texture-based) visualization
1. Direct flow visualization
e.g. color coding, arrow plots, glyphs
Gives overview on current flow state
Visualization of vectors
Glyphs
Visualize local features of the vector field
Map vector or curl to arrow glyphs
Can visualize more features of vector field, e.g.using Velocity, Curvature, Rotation, Convergence,…
Flow visualisation with arrows
Vector per grid point pointing into the flow direction
use arrow length and / or color to highlight special regions
Arrows in 3D: Advantages - Disadvantages
Advantages
Simple
3D effects
Disadvantages
Ambiguity
Difficult spatial perception (1D-objects in 3D)
Inherent occlusion effects
Poor results if magnitude of velocity varies significantly and changes rapidly
-> Use 3D arrows of constant length and color code magnitude
2. Geometric flow visualization
Use intermediate representation (vector-field integration over time)
Visualization of temporal evolution (also consider data over time)
Stream lines, path lines, streak lines
Basic idea: trace particles along characteristic trajectories and map trajectories to particles, lines, balls, bands
Types of characteristic lines (Geometric flow visualization)
Stream lines: trajectories of massless particles in a “frozen” (steady) vector field
trajectories of massless particles at one time step
does not show movement over time, but within a frozen flow / vector field
Path lines: trajectories of massless particles in (unsteady / time-varying) flow
follow one particle through time and space
Streak lines: trace of dye that is continuously released into (unsteady / time-varying) flow at a fixed position
connect all particles that started at the same seed point
a new particle is continuously injected at the same seed point
all existing particles are advected and connected (from youngest to oldest)
-> Comparison of path / streak / stream lines: Identical for steady flows
Stream ribbons (flow oriented)
we would like to see places where the flow twists (vortices)
Trace two close-by particles (keep distance constant)
Or rotate band according to curl
Streak surface
simultaneously release particles along a seeding structure (line) and connect them all to form a surface
e.g. particle-based, triangulated, or semi-transparent streak surface
Characteristic lines are tangential to the flow. What does that mean
means that the line tangent (1st derivative) is aligned with the vector field / points in the direction of the vector field
Particle Tracing on Grids - most simple case: Cartesian grid
(Basic algorithm)
Select start point (seed point)
Find cell that contains start point
While (particle in domain) do
interpolate vector field at current position
integrate to new position
find new cell
draw line segment between latest particle positions
EndWhile
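A Python sketch of the algorithm above, assuming a 2D Cartesian grid with unit spacing, bilinear interpolation, and explicit Euler integration (the integration scheme, step size, and example field are illustrative choices).

```python
import numpy as np

def bilinear(field, x, y):
    """Bilinearly interpolate a vector field given at integer grid points."""
    i, j = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - i, y - j
    return ((1 - fx) * (1 - fy) * field[j, i]     + fx * (1 - fy) * field[j, i + 1] +
            (1 - fx) * fy       * field[j + 1, i] + fx * fy       * field[j + 1, i + 1])

def trace_stream_line(field, seed, h=0.1, steps=200):
    """Integrate from the seed point; stops when the particle leaves the domain."""
    ny, nx = field.shape[:2]
    path = [np.asarray(seed, dtype=float)]
    for _ in range(steps):
        p = path[-1]
        if not (0 <= p[0] < nx - 1 and 0 <= p[1] < ny - 1):
            break                               # particle left the domain
        v = bilinear(field, p[0], p[1])         # interpolate the vector field
        path.append(p + h * v)                  # explicit Euler step to new position
    return np.array(path)

# Circular flow v(x, y) = (-(y - c), x - c) around the grid center
ys, xs = np.mgrid[0:32, 0:32]
field = np.dstack([-(ys - 16.0), xs - 16.0]) * 0.1
print(trace_stream_line(field, seed=(20.0, 16.0))[:3])
```

In practice a higher-order scheme (e.g. Runge-Kutta) is preferred over Euler for accuracy.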
Stream line placement in 2D
seeding at regular grid points gives irregular results
Evenly-spaced streamlines:
Idea: streamlines should not get too close to each other
Choose seed point with distance d_sep from existing stream line
forward- and backward-integration until distance d_test
Stop stream line integration
when distance to neighboring stream line <= d_test
when stream line leaves flow domain
when stream line runs into fixed point (v(x*)=0)
When stream line gets too close to itself
after a certain number of maximal steps
the smaller d_sep the nearer together the streamlines are
the smaller d_test in relation to d_sep the nearer together the streamlines are
Streamline placement on surfaces
Image-space technique
Vector field is first projected to 2D image
2D image is scanned at intervals d_sep
Seedpoint is found and stream lines are traced in that region
Scanning continues until seedpoint in new region is found
Seeding and integration happen in image space
Discontinuity detection
Stop stream line integration when z-depth drops to zero (edge of model) or when z-depth changes too abruptly (edge of overlapping regions)
What challenges does arrow-based direct flow visualization have?
due to perspective projection it can be unclear in which direction an arrow points, so 3D arrows carry some ambiguity; there can also be occlusion effects
Give two examples for geometric (integration-based) flow visualization. How do these techniques relate to direct flow visualization?
Streamlines
Streaklines
Pathlines
-> When using direct flow visualization we directly show the flow.
-> With geometric (integration-based) visualization we consider movement of particles along trajectories in the flow.
True or False: The Jacobian matrix at a point in a constant 3D vector field has non-zero elements on the main diagonal.
False
True or False: If the Jacobian matrix at every point in a 3D vector field is the identity matrix, then the vector field is divergence free.
False (the divergence is the trace of the Jacobian, which would be 3, not 0)
True or False: The divergence at every point in a 3D vector field is a scalar value.
True
True or False: Streamlines in a steady 3D vector field never cross.
True (at a crossing point the field would need two directions at once; streamlines can only meet at critical points)
True or False: Path lines in a time-varying 2D vector field never cross.
False (path lines can cross, since a crossing point can be passed at different times)
3. Sparse (feature-based) visualization
Global computation of flow features
Vortices, shockwaves, vector field topology
Vortices
one of the most prominent features
Important in many applications (turbulent flows)
No formal, well accepted definition yet (“something swirling”)
Shock waves
characterized by sharp discontinuities in flow attributes (pressure, velocity magnitude,…)
Vector field topology (2D)
Idea: do not draw “all” stream lines, but only the “important” ones
show only the topological skeleton
Connection of critical points
Characterization of global flow structures
Critical points: singularities in vector field such that v(x*) = 0 (source, saddle, sink)
Points where magnitude of vector goes to zero and direction of vector is undefined
Stream lines reduced to single point
Type of critical point determines flow pattern around it
Vector field topology (3D)
Critical points in 3D
More complicated
Line and surface separatrices exist
Saddle connectors in 3D (Vector field topology (3D))
The intersection of the separation surfaces of the two saddles is the saddle connector
Dense (Texture-based) flow visualization
Global method to visualize vector fields
Dense sampling
better coverage of information
Critical point detection and classification
(Partially) solved problem of seeding
Flexibility in visual representation
Good controllability of visual style
From line-like (crisp) to fuzzy
Line Integral Convolution (LIC)
Global visualization technique (not only one particle path)
Dense representation of flow fields
Convolution along characteristic lines -> correlation along these lines
for 2D and (3D flows)
Start with a random texture (white noise)
Smear out the texture along trajectories of vector field
Results in high correlation along a stream line (in flow direction), but low correlation between neighboring stream lines
Algorithm for 2D LIC
look at stream line that passes through a pixel
Smear out -convolve- noise texture in direction of vector field (along stream line)
LIC is a convolution of
a noise texture T(x,y)
and a smoothing filter
Algorithm:
Stream line containing the point
randomly generated noise texture
compute intensity using an integral
smoothing filter kernel, normalized and usually symmetric
Influence of the filter length: the larger L, the finer the lines
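A hedged 2D LIC sketch with a box filter kernel; the nearest-neighbor texture sampling, unit-speed steps, and circular example field are simplifying assumptions for this sketch.

```python
import numpy as np

def lic(field, noise, L=10, h=0.5):
    """Smear a noise texture along the stream lines of a 2D vector field."""
    ny, nx = noise.shape
    out = np.zeros_like(noise)
    for y in range(ny):
        for x in range(nx):
            total, count = noise[y, x], 1        # box filter, start at the pixel itself
            for sign in (1.0, -1.0):             # integrate forward and backward
                p = np.array([x, y], float)
                for _ in range(L):
                    i, j = int(p[0]), int(p[1])
                    v = field[j, i]
                    speed = np.linalg.norm(v)
                    if speed == 0:
                        break                    # critical point: stop integration
                    p += sign * h * v / speed    # unit-speed step along the stream line
                    i, j = int(p[0]), int(p[1])
                    if not (0 <= i < nx and 0 <= j < ny):
                        break                    # left the domain
                    total += noise[j, i]         # accumulate noise along the line
                    count += 1
            out[y, x] = total / count
    return out

rng = np.random.default_rng(0)
noise = rng.random((64, 64))
ys, xs = np.mgrid[0:64, 0:64]
field = np.dstack([-(ys - 32.0), xs - 32.0])     # circular flow around the center
print(lic(field, noise).shape)                   # (64, 64)
```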
Filtering by convolution
Sliding a function g(x) along a function f(x)
Function f is averaged with a weight function g
-> Horizontal / Vertical Gaussian blur
Oriented LIC (OLIC)
Visualizes orientation (in addition to direction)
uses a sparse texture; i.e. smearing of individual drops
Asymmetric convolution kernel
3D LIC
only good if non-relevant data is discarded
True or False: LIC is a local method for visualizing a vector field
False, it's a global method
True or False: The larger the extent of the convolution kernel used in LIC, the lower is the correlation between adjacent intensity values along a streamline
False, a larger kernel smears the noise further along the stream line and thus increases the correlation along it
True or False: LIC images show high correlation between the intensity values at adjacent streamlines
False, the correlation between adjacent stream lines is low; the high correlation is along each stream line
True or False: LIC is restricted to 2D vector fields
False, LIC also works for 3D flows (3D LIC)
True or False: The convolution kernel used in LIC must be symmetric
False, e.g. Oriented LIC (OLIC) uses an asymmetric kernel to also show orientation
Mapping techniques - Graphical primitives
represent data items or links
points, lines, areas, surfaces
representation of links between data items: Connection, Containment
Mapping techniques - Visual channels
control appearance of graphical primitives based on data attributes
Position (horizontal, vertical, both), Color, Slope, Size (length (1D), 2D area, 3D volume), Shape
Effectiveness principle
some visual channels are better than others
encode most important data attributes with most effective / accurate channels
Expressive mapping
match type of visual channel to data type
Properties of visual channels
Pop-out (emphasize important information)
Discriminability (how many usable steps?)
Separability (judge each channel independently)
Relative vs. absolute judgement
-> Perceived color is highly context dependent
-> perception is relative
-> use popout to emphasize data
-> choose carefully with the mapping and color
Pop-out
Preattentive processing: automatic and parallel detection of basic features in visual information (200-250 msec)
Speed independent of distractor count
Works on many individual channels
Discriminability
How many usable steps?
Must be sufficient for number of discriminable bins
we can only distinguish a limited number of colors / brightness level
Separable vs. integral visual channels
Relative vs. absolute judgements
Perception highly context-dependent
Perceptual system mostly operates with relative judgements, not absolute ones
Weber's Law: the just-noticeable difference is a fixed percentage of the magnitude of the stimulus (e.g. bar length) -> if the difference between two stimuli is smaller than this percentage of the total magnitude, the difference cannot be perceived anymore
Diagram techniques
Categorical + quantitative data: Bar / pie chart, stacked bars
Time-dependent data: Line graph, ThemeRiver, Horizon graph
Single and multiple variables: Histogram, scatterplot, parallel coordinates, Glyphs, color mapping
Quantitative data
numerical, measurable
objective data produced through a systematic process, not subject to interpretation (e.g. length, mass, temperature)
Metric scale: allows measure of distance
Qualitative data
categorical, not measurable
no metric scale, cannot be measured
Bar chart
Attribute 1: categorical -> horizontal position
Attribute 2: quantitative (dependent) -> length / vertical position
Bars should always start at zero!
Bars support comparison
Pie chart
Attribute 1: categorical -> color
Attribute 2: quantitative (dependent) -> angle
angle / area judgement less accurate than bar length
often bar chart better choice
Stacked bar chart
Quantitative data wrt 2 categorical vars (horizontal & vertical)
Investigate part-to-whole relationship (100%)
Length and color hue
Parallel sets
Quantitative data wrt. multiple categorical attributes
Shows connections and proportions
Given a 2D scatterplot using color, point size, and position to encode data. With respect to separability of visual channels there is…
A) some interference between color and position
B) some interference between color and point size
C) some interference between point size and position
D) some interference between all visual channels
E) no interference betweem the different channels
B
Which ranking sorts the visual channels for encoding quantitative data according to accuracy, starting with the highest accuracy?
A) angle/tilt - position - luminance
B) position - luminance - angle/tilt
C) length - 2D area - curvature
D) position - 2D area - length
C
Which statement on pre-attentive processing or pop-out is true?
A) it works on many combinations of visual channels
B) Speed depends on the number of distractors
C) Automatic and parallel detection of basic visual features
C
Line Graph
quantitative data on common scale(s) wrt. time
Connection between points - trends, structures, groups
Banking to 45 degrees
Perceptual principle: most accurate angle judgement around 45°
Pick the aspect ratio (height/width) accordingly
Lines imply trends
ThemeRiver
Thematic changes in documents
Occurrence per topic / category mapped to width of river band
less distorted around center
Rearranging bands
Horizon graph
Reduces vertical space without losing precision
Split vertically into layered bands
Collapse color bands to show values in less vertical space
Optional mirroring of negative values
What is the pop-out effect / pre-attentive processing? How can it be used?
pre-attentive processing: automatic and parallel detection of basic features in visual information (200-250 msec)
independent of the number of distractors
usually only works on individual channels, when having a combination of channels it usually requires a serial search instead of pre-attentive processing
Sort the following visual channels according to how accurately humans can compare them starting with the highest accuracy: 2D area – length – curvature – angle/slope
quantitative data: length - angle / slope - 2D area - curvature
qualitative / categorical data: different ranking but didn’t talk about it in the lecture
What is the difference between separable and integral visual channels?
two visual channels that are separable (color and position): we can focus on one group (either one color or one position) -> separable visual channels
integral visual channels: not possible to see the different channels separately
Name an example for fully separable / integral visual channels.
fully separable visual channel: color + position, fully separable, 2 groups each
fully integral visual channel: red + green make different colors, major interference, 4 groups total: integral color
Which visual channel(s) can be used in a bar chart? For what types of data?
Horizontal position for the categorical attribute; length / vertical position for the quantitative (dependent) attribute; color hue can encode an additional categorical attribute
From a perceptual point of view, what works better: Bar charts or pie charts? Why?
Pie chart: usually estimate the angle or area
Bar chart: length and precision of the ending is read out
humans are much better at comparing lengths and positions than angles and areas -> bar chart is the better choice
How do Parallel sets work? What kind of data can be shown?
quantitative data with respect to multiple categorical attributes
shows connections and proportions
How does the ThemeRiver work? Which visual channels are used for which type(s) of data?
usually have categorical data with a certain frequency (how often it appears)
How do Horizon Graphs work? How can you read out values at a position?
split an original graph into some layers and use color coding to distinguish different layers, those layers can be collapsed on top of each other
Histogram
Binning: group values into equally spaced intervals (bins)
Bin width affects representation
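A tiny binning example with numpy; the bin count of 20 is an arbitrary illustrative choice (changing it changes the representation, as noted above).

```python
import numpy as np

data = np.random.default_rng(0).normal(size=1000)
counts, edges = np.histogram(data, bins=20)   # 20 equally spaced bins
print(counts.sum())                           # 1000: every sample falls into a bin
print(edges[0], edges[-1])                    # bin edges span the data range
```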
Box plot variations
shows summary statistics of a distribution (1 variable)
Probability density function (PDF)
q1: lower quartile: 25% of data below
q2: median
q3: upper quartile: 25% of data above, 75% of data below
Interquartile range: between q1 and q3
Variations:
Tukey’s box plot
Tufte’s quartile plot
median = dot
easier to follow trend of median, more compact
Scatterplots
Show correlations between 2 dependent variables
Typically quantitative (measurable) data attributes
find trends, outliers, distributions, correlations, clusters,…
encode additional attributes by size, color, shape,…
Scatterplot matrix
show (all possible) combinations of attributes in a scatterplot matrix
Each row / column is one attribute
overview of correlation and patterns between data attributes
-> Brushing: mark data subset
-> Linking: highlight brushed data in linked views
-> Move / alter / extend brush
Parallel coordinates
Represent multiple data variables
each variable is represented by a vertical axis, which are organized as evenly spaced parallel lines
Data on each axis is normalized to min / max
One data sample is represented by a connected set of points, one on each axis
recognize patterns between adjacent axes
steep learning curve for novices
Axis ordering is major challenge: order by quality metrics
Line point duality
Points in scatterplot map to lines in parallel coordinates
points in parallel coordinates map to lines in scatterplot
Radar chart (star plot, spider chart)
Radial axis arrangement
Items are polylines
axes in center very crowded: too much information in the center
Function plot for 1D scalar field
showing single variable
1D curve
Mapping of a discrete set of points to a set of lines by connecting adjacent points
Height field for a 2D scalar field
function plot for 2 independent variables x and y and 1 dependent variable
2D surface, f(x,y) can be interpreted as height value at (x,y)
Mapping of a discrete set of points to a set of faces by connecting adjacent points
Glyphs
small independent visual objects that depict attributes of a data record
Discretely placed in a display space
Data attributes are represented by different visual channels (e.g. shape, color, size, orientation)
Visual channels should be easy to distinguish and combine
Mainly used for multivariate data
Star glyphs
A star is composed of equally spaced spikes, originating from the center
Length of spikes represents value of respective attribute
ends of the spikes connected by a line
Stick figures
2D figures with limbs
Data encoded by
length
line thickness
angle between lines
recognize texture patterns: see changes by looking at texture and neighborhood; not individual glyphs, but the patterns they form
Chernoff faces
Data attributes represented by features of a face (eye position, nose length, mouth form,…)
Problem:
Faces are perceived holistically: we interpret the mood of the glyphs, although it is actually a data visualization
Efficiency?
Color
light is electromagnetic radiation
different wavelengths are perceived as different colors
Human eye can only see light between 380nm and 780nm (visible spectrum)
Visual effect of chromatic light (light spectrum) can be characterized by 3 channels
Hue: dominant wavelength
Saturation: pureness, amount of white light
Luminance / Brightness: intensity of light
Color mapping
Emphasize a specific target in a crowded display (pop-out)
Group, categorize, and chunk information
Possible problems:
Dependent on viewing and stimulus conditions
distract the user when inadequately used
Ineffective for color deficient individuals
Results in information overload
Color maps can be
categorical vs. ordered
sequential vs. diverging
discrete vs. continuous
Color mapping - Perceptual linear
equal steps in color map (i.e. magnitude of data) should be perceived equally
caution: for many color maps, equal steps in the data are not perceived as equal steps
Color mapping - Perceptual ordering
ordering of data should be represented by ordering of colors
rainbow colormap is perceptually unordered
Things to know when mapping data to colors
Perceived color is highly context dependent
Size matters
vary luminance too
make sure contrast is high
Colors are more useful for qualitative statements
do not use color if it is necessary to read out precise values
use “good” color maps
Which visual channels can be used in a scatterplot besides position?
size
color
shape
…
How does a scatterplot matrix work? How can you see correlations?
used to show more than two variables: but the combination of a lot of variables
positive correlation: points run from bottom-left to top-right
negative correlation: points run from top-left to bottom-right
How does linking and brushing work?
brushing: mark (interesting) data subset
Linking: highlight brushed data in linked views
a set of data points is selected (brushed) in one plot of the scatterplot matrix, and the corresponding data points are highlighted in the other scatterplots of the matrix
What does it mean when the lines between two axes in a parallel coordinates visualization meet in a point?
if the lines of a parallel coordinates plot meet in one point it means that we have a negative correlation between the two axes / attributes
What are glyphs? For which type of data are they typically used?
small independent visual objects that show attributes of a data record
can be placed individually somewhere in the visualization in display space
different data attributes can be encoded by different visual channels of a glyph (e.g. shape, color, size, orientation)
visual channels should be easy to distinguish and combine
mainly used for multivariate data to show different data attributes of a data record
How do star glyphs / stick figures work? How do they encode the data?
star glyphs:
show multiple attributes of a data record
composed of equally spaced spikes that start in the center; the length of each spike represents the value of a data attribute, and the ends of the spikes are connected by lines
stick figures:
data attributes can be mapped to the angle between the different limbs of the stick figure and the length / thickness of the individual limbs can be used to encode data
usually not perceived individually but in combination
recognize texture patterns
What are the advantages / disadvantages of a rainbow color map?
Disadvantages:
since it uses color hue to encode the information, some details may be lost
perceptually non-linear: the same step in the data is not represented by the same perceived color difference
perceptually unordered: no naturally ordering of the colors
Advantages:
many colors
good for categorical data, but not for quantitative data because it is perceptually non-linear and unordered
What does it mean, when a visual channel (e.g. color) is perceptually linear / ordered?
perceptual linear: equal steps in color map (i.e. magnitude of data) should be perceived equally
perceptual ordering: ordering of data should be represented by ordering of colors
What are the characteristics of a sequential / diverging color map?
Sequential:
are suited to ordered data that progress from low to high / min to max
Diverging:
has a neutral center
put equal emphasis on mid-range critical values and extremes at both ends of the data range.
Visualization is good for…
Visual exploration
find unknown / unexpected
generate new hypotheses
Visual analysis (confirmative vis.)
verify or reject hypotheses
Presentation
show / communicate results
Visual Analytics / Analysis
Visual Analysis of Scientific Data
Combines computational & interactive visual methods
multiple linked views
Interpret large & complex data
Drill-down into information
Find relations (“read between the lines”)
Detect features / patterns that are difficult to describe
Integrate expert knowledge
Multi-faceted Scientific Data
Spatiotemporal data
Multi-variate / multi-field data (multiple data attributes, e.g. temperature or pressure)
Multi-modal data (CT, MRI, large-scale measurements, simulations, etc.)
Multi-run / ensemble simulations (repeated with varied parameter settings)
Multi-modal scenarios (e.g. coupled climate model)
Cartography, geovis, etc.
Linear vs cyclic time
automatic animations
Flow visualization
Visualize summary statistics
Multi-variate / Multi-field data
-> comes from 1 simulation / measurement device
-> multiple data attributes, e.g. temperature or pressure
Attribute views (scatterplots, parallel coordinates, etc.)
Find patterns such as correlations or outliers
lack spatial relationships of data
which of the many data variables to show?
Volume rendering
Difficult to see multi-variate patterns
Layering and glyphs
Feature-based visualisation (brushing, segmentation,…)
Clustering, dimensionality reduction, etc.
Multi-modal data
-> comes from different data sources / different modalities
-> CT, MRI, large-scale measurements, simulations, etc.
Various types of grids with different resolution
Coregistration and normalization
Multi-volume rendering
Visual data fusion
Comparative visualization
How are visualization, interaction & computer analysis combined?
Comparative visualization taxonomy
Side-by-side comparison (juxtaposition)
Overlay in same coordinate system (superposition)
Explicit encoding of differences / correlations
Navigation
Change item visibility
Change which items are visible
Camera metaphor
Zoom, pan, rotate (3D)
Automated viewpoint selection
-> Guided navigation between characteristic views
-> Based on information-theoretic measures
Compute a rating for each view v_i and object o_j
Optimal viewpoint estimation based on object visibility, location in image, and distance to viewer
Animated transition
View 1 optimal for o1 (emphasize o1)
Overview optimal for o1 and o2
View 2 optimal for o2 (emphasize o2)
Ranking / quality metrics
Automatically order views / axes by quality metrics
Enhance clustering, correlations, outliers, image quality, etc.
Overview + detail visualization
Spatially separate overview / detail (e.g. juxtaposed views)
User has to switch attention between representations
Focus + context (F+C) visualization
Seamlessly integrates focus / context in single visualization
Originally spatial distortion was used
More space for focus
Keep context, without cropping away data outside of zoom area
Generalized F+C visualization
emphasize data in focus
keep context for orientation / navigation
focus specification, e.g. by pointing, brushing or querying
Line brush
Select function graphs that intersect a user-specified line
Similarity-based brushing
Select function graphs by similarity to user-sketched pattern
Similarity evaluated based on gradients (1st derivative)
Machine Learning Approaches
Supervised learning: learning with a labeled training set
Unsupervised learning: discovering patterns in unlabeled data
Reinforcement learning: Learning based on feedback or reward
Semi-Automatic Labeling Tool (SALT)
Labeling of large time series data by domain experts
Integrates supervised & unsupervised segmentation methods
User can iteratively improve labeling
Algorithmic extraction of values & patterns
Dimensionality reduction
Aggregation, summary statistics
Clustering, classification, outliers, etc.
Clustering
Given some data points, we’d like to understand their structure
Given a set of data points with some notion of distance between points, group them into clusters such that
members of a cluster are close / similar to each other
members of different clusters are dissimilar
Usually
points are in high-dimensional space
similarity defined by a distance measure (e.g. Euclidean)
Clustering is a hard problem: many applications involve 10 to 10,000 dimensions, and in high-dimensional spaces almost all pairs of points are at similar distances
Cluster Calendar View
Time series clustered by similarity (K-means)
Cluster affiliation of daily pattern shown in calendar
Density-based Clustering (DBSCAN)
Identify dense regions in data
Clusters can be arbitrarily shaped
Difficult to find good parameter settings
2 parameters: Radius epsilon and Number of Minimum Points MinPts
core point: epsilon-neighborhood contains at least minimum number (MinPts) of points
border point: in the epsilon-neighborhood of core point
noise: neither a core object nor a border object
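A short sketch using scikit-learn's DBSCAN on toy data; the eps and min_samples values are illustrative and in practice hard to choose, as noted above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob1 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob2 = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
X = np.vstack([blob1, blob2, [[10.0, 10.0]]])   # two dense regions + one outlier

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(set(labels))   # e.g. {0, 1, -1}: two clusters, -1 marks noise points
```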
Dimensionality reduction
Derive a low-dimensional target space from the high-dimensional measured space
use when you can’t directly measure what you care about
True dimensionality of dataset assumed to be smaller than dimensionality of measurements
Latent factors, hidden variables
Principal Component Analysis (PCA)
Find directions of largest variance
neglect directions of small variance (not descriptive)
Coordinate system transformation (rigid rotation)
Result:
new axes (eigenvectors) & explained variances (eigenvalues)
New axes usually don’t mean anything physical
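A minimal PCA sketch via eigendecomposition of the covariance matrix; the toy data with one nearly redundant axis is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] + 0.01 * rng.normal(size=200)   # 3rd axis nearly redundant

Xc = X - X.mean(axis=0)                   # center the data
cov = np.cov(Xc, rowvar=False)            # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # new axes = eigenvectors
order = np.argsort(eigvals)[::-1]         # sort by explained variance (eigenvalues)

k = 2                                     # keep the two largest-variance directions
proj = Xc @ eigvecs[:, order[:k]]         # low-dimensional representation
print(eigvals[order])                     # variance explained per new axis
```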
What is the main goal of visual exploration?
start with unknown dataset -> don’t know which features are in the data set, we want to explore it to find some hidden / unknown information
Purpose: to generate new hypotheses -> which we can verify / reject later in visual analysis step
Which three major areas / concepts are combined in visual analysis / analytics?
Visualization (e.g. Information / Scientific Vis., Computer Graphics)
Interaction techniques (e.g. Human Computer interaction)
Data Analysis (e.g. machine learning, data mining)
-> Visual Analytics aims to combine those areas in different approaches
Give examples how multivariate data can be encoded in a spatial context?
2 ways to show multivariate data:
using attribute views (scatterplots, parallel coordinates,…) that mainly focus on the attributes -> problem: we don't see the spatial context in these views
using volume rendering: show the data in its spatial context, e.g. combined with glyphs or layering
What are challenges when fusing multi-modal data stemming from different data sources?
here we usually have data coming from different sources (e.g. tumor from an MR scan, vessels from an MRA scan, skull from a CT scan,…)
this means the data can be given on various types of grids with different resolutions
challenge:
what should be hidden and which data should be visible and shown to the user
comparison of multiple modalities -> need different techniques for that
Visual data fusion intermixes data in a single visualization using a common frame of reference. Give at least two general approaches.
Layering techniques (e.g. glyphs, color, transparency)
Multi-volume rendering (coregistration, segmentation)
Helix glyphs
What are three general approaches for comparative visualization (according to the taxonomy of Gleicher et al. 2011)?
side-by-side comparison (juxtaposition)
overlay in the same coordinate system (superposition)
explicit encoding of differences / correlations
What is focus+context visualization? Explain the general approach. How is it different from an overview+detail visualization?
idea: to combine both the important area / focus and the context information in one single visualization
e.g. highlight with color, opacity / transparency, blurring, enlargement of focus
in overview + detail visualization the overview and the detail are shown next to each other (spatially separated)
issue: user has to switch attention between the representations
Give at least three examples of visual channels (graphical resources) that can be used for focus+context discrimination.
style
frequency / blurring
opacity / transparency
fisheye views
Give two examples for focus+context visualization techniques which use spatial distortion.
e.g. fisheye views and bifocal / perspective-wall displays
What is the main idea in clustering? Is clustering a supervised or unsupervised method?
unsupervised method -> tries to find structures in the dataset by itself
given a dataset with data points and some notion of distance between them, group them into clusters -> similar data points should be grouped together in a cluster, and dissimilar data points should be in different clusters
What is the main idea in dimensionality reduction? Name one example method? How does it work?
idea: given some high-dimensional data, it is often possible to derive a low-dimensional target space in which it is easier to find the interesting features
true dimensionality of dataset assumed to be smaller than dimensionality of measurements
example method: principal component analysis
Principal component analysis transforms data from a cartesian coordinate system into another coordinate system. Why is it then still considered a dimensionality reduction method?
transformation of the coordinate system
since the new axes are aligned with the directions of largest variance, it is easier to distinguish / find groups, and we can dismiss the principal components with small variance
Cartesian / equidistant grid
Samples at equidistant intervals along Cartesian coordinate axes
Neighboring samples are connected via edges
Cells formed by 4 (2D) or 8 (3D) samples
Cells and samples (grid vertices) are numbered sequentially with respect to increasing coordinates
It is a structured grid:
Neighboring information (topology) is given implicitly
Neighbors obtained by incrementing / decrementing indices
Structured Grids:
Uniform / Regular grid: orthogonal and equidistant grid
Rectilinear grid: varying sample-distances
Curvilinear grid: non-orthogonal grid, Grid-points specified explicitly, implicit neighborhood relationship
Unstructured Grids: grid points and neighborhood specified explicitly; Cells: tetrahedra, hexahedra