Difficulty of Testing CPS
Usually, no useful requirements available
Usually, no clear oracle available
Usually, system operates in dynamic open context
Usually, huge dimensionality of input parameter space
Testing Cyber-Physical Systems - Continuous Controllers
X-in-the-Loop tests:
Model-in-the-Loop: Model of the system is run in the simulation environment (see the sketch after this list)
Software-in-the-Loop: The actual software is used, sometimes on the dedicated computation hardware
Hardware-in-the-Loop: Actual hardware is used, e.g. mechanical steering parts
(Vehicle-in-the-Loop for autonomous driving systems: The whole car is used)
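A minimal Model-in-the-Loop sketch, assuming a hypothetical first-order plant and a simple P-controller; the point is only the closed simulation loop, not the models themselves:

```python
# Model-in-the-Loop sketch: a hypothetical plant model and controller
# are stepped together in a simulation environment.
def plant_step(state: float, actuation: float, dt: float = 0.01) -> float:
    """Hypothetical first-order plant: state drifts toward the actuation."""
    return state + dt * (actuation - state)

def controller(setpoint: float, measurement: float, k_p: float = 2.0) -> float:
    """Hypothetical proportional controller (stands in for the real model)."""
    return k_p * (setpoint - measurement)

state, setpoint, trace = 0.0, 1.0, []
for _ in range(1000):                        # 10 s of simulated time at dt = 0.01
    actuation = controller(setpoint, state)
    state = plant_step(state, actuation)
    trace.append(state)                      # recorded for later quality analysis
```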
Quality criteria for controller: precision, responsiveness, smoothness, stability, steadiness
-> Non-functional, not functional
Search-Based Test Case Generation
Search Space: “All possible input signals”
Fitness Function: “The bigger T_stable, the higher the fitness value” (f = -T_stable, to be minimized; see the sketch below)
Oracle: If the biggest identified T_stable exceeds some desired threshold, the system is faulty
Possible Fault Models: precision, responsiveness, smoothness, …
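The fitness computation could look as follows; this is a sketch that assumes T_stable is the settling time of the output trace, i.e. the time until the output stays within a tolerance band around the setpoint:

```python
def t_stable(trace, setpoint, tol=0.05, dt=0.01):
    """Assumed definition of T_stable: time of the last sample that
    still violates the +/- tol band around the setpoint."""
    last_violation = -1
    for i, y in enumerate(trace):
        if abs(y - setpoint) > tol:
            last_violation = i
    return (last_violation + 1) * dt

def fitness(trace, setpoint):
    # f = -T_stable: minimizing f is the same as maximizing T_stable,
    # i.e. the search favors inputs that destabilize the controller longest.
    return -t_stable(trace, setpoint)
```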
Testing Controllers - Approach
Step 1:
Partition input space into equally sized blocks + select n points per block
Assess satisfaction of non-functional requirements per point
have formal definitions
can be measured
create heat map
Step 2:
fine-grained AI search methods for the selected blocks
find the global maximum of the deviation within these blocks (see the sketch below)
Example: Stability heat map of an AC
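A sketch of Step 1, assuming a 2-D input space on [lo, hi]² and a hypothetical assess() function that runs the controller for one test point and returns its requirement deviation:

```python
import random

def heat_map(assess, lo=0.0, hi=1.0, blocks=10, n=5):
    """Partition a 2-D input space into blocks x blocks equally sized cells,
    sample n random points per cell, keep the worst deviation per cell."""
    size = (hi - lo) / blocks
    grid = [[0.0] * blocks for _ in range(blocks)]
    for i in range(blocks):
        for j in range(blocks):
            for _ in range(n):
                x = lo + (i + random.random()) * size
                y = lo + (j + random.random()) * size
                grid[i][j] = max(grid[i][j], assess((x, y)))
    return grid  # cells with high values become candidates for Step 2's search
```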
Testing Autonomous Driving - Goals, Components, Pitfall
Goals: Safe, comfortable, non-disruptive driving.
Components: Planning, situation understanding, perception.
Pitfalls: Random scenarios, reusing test cases, simulation limitations.
Defects in Machine-Learned Models
Metamorphic Testing
Addressing the Oracle Problem: Metamorphic testing addresses the oracle problem when testing ML models that lack a clear oracle.
Pseudo-Oracle Approach: It uses metamorphic relations between input-output pairs as pseudo-oracles.
Derived Test Cases: Metamorphic relations enable generating new test cases from known ones, e.g. deriving sin(x + 2π) from sin(x) (see the sketch below).
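For the sin example this can be written directly; the relation sin(x + 2π) = sin(x) is the pseudo-oracle, so no expected output for any individual x is needed:

```python
import math, random

# Metamorphic relation: sin(x + 2*pi) must equal sin(x) for every x.
for _ in range(1000):
    x = random.uniform(-100.0, 100.0)        # source test case
    follow_up = x + 2 * math.pi              # derived test case
    assert math.isclose(math.sin(x), math.sin(follow_up), abs_tol=1e-9)
```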
Advantages and Disadvantages of Metamorphic Testing
Advantages:
Alleviates the test oracle problem
Works for both machine-learned and conventional programs
Less labeled test data is needed → cost effective
Disadvantages:
Domain knowledge needed for identifying metamorphic relations
Different metamorphic relations for different domains (image, lidar, radar, language, search engines, …)
Robustness Testing in ML - Symbolic Execution
Translate Neural Network:
Convert neural network into imperative program.
Identify Key Pixels:
Find important pixels in labeled image.
Calculate pixel importance from the coefficients of the input pixels
Sort pixels by importance.
Create Adversarial Image:
Craft new image from original.
Select high-importance pixels (1- or 2-pixel attacks) and make them symbolic
Formulate a constraint problem over the symbolic values (see the sketch after this list)
-> e.g. Pixel Attack
-> similar to Fuzzing
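A sketch of the constraint step, assuming the z3 solver (PyPI package z3-solver) and a toy linear "network" of four pixels standing in for the translated imperative program; for a linear model, pixel importance reduces to the coefficient magnitude:

```python
from z3 import Real, Solver, sat   # assumes the z3-solver package is installed

# Toy linear "network" over 4 pixels, standing in for the imperative
# program obtained by translating the real neural network.
weights = [0.9, -0.1, 0.05, -0.8]
image   = [0.7, 0.2, 0.5, 0.3]           # original score: 0.395 (> 0)

# Pixel importance = coefficient magnitude (the model is linear).
order = sorted(range(4), key=lambda i: abs(weights[i]), reverse=True)
p0, p1 = order[:2]                       # 2-pixel attack: top-2 pixels

x0, x1 = Real("x0"), Real("x1")          # the selected pixels become symbolic
score = sum(weights[i] * image[i] for i in range(4) if i not in (p0, p1))
score = score + weights[p0] * x0 + weights[p1] * x1

s = Solver()
s.add(0 <= x0, x0 <= 1, 0 <= x1, x1 <= 1)   # pixels must stay in a valid range
s.add(score < 0)                             # flip the sign of the score
if s.check() == sat:
    print("adversarial pixel values:", s.model())
```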
Robustness Testing in ML: Neuron Coverage/Fuzzing
Tries to mimic coverage-based testing
Coverage metrics:
Neuron Coverage: Ratio of activated neurons to total neurons in the network (see the sketch after this list)
K-Multisection Neuron Coverage: For each neuron, partition its output value range into k sections
Neuron Boundary Coverage: How many corner-case regions have been covered
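A sketch of the plain neuron coverage metric, assuming recorded activations as one (num_inputs, num_neurons) array per layer; the activation threshold is an assumption:

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.0):
    """Ratio of neurons that exceeded `threshold` for at least one input
    to the total number of neurons in the network."""
    activated = total = 0
    for layer in layer_activations:              # one array per layer,
        fired = (layer > threshold).any(axis=0)  # shape (num_inputs, num_neurons)
        activated += int(fired.sum())
        total += fired.size
    return activated / total
```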
Watchdogs
Watchdogs monitor system behavior and intervene in case of unwanted actions
system test level
very abstract -> only used in fail-safe cases
also used as runtime monitors during production (see the sketch below)
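A watchdog can be as small as a bounds check wrapped around the controller output; this sketch assumes hypothetical limits and a neutral fail-safe action:

```python
def watchdog(actuation, speed, max_actuation=1.0, max_speed=30.0):
    """Very abstract safety monitor: intervene only when the requested
    action or the observed behavior leaves the safe envelope
    (both limits are assumptions for illustration)."""
    if abs(actuation) > max_actuation or speed > max_speed:
        return 0.0           # fail-safe: override with a neutral actuation
    return actuation         # otherwise pass the command through unchanged
```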
Genetic Algorithms - Structure
Gene: one dimension of the search space
Individual (Test Case): Consists of genes for each dimension of the search space
-> concrete value for each gene
Population: Consists of individuals
Genetic Algorithms - Process
Compute fitness for all individuals in population (= run all test cases in population)
Keep the best individuals
Replace bad individuals by re-combining good ones according to an evolutionary heuristic (see the sketch below)
-> Always converges, but not necessarily to the global optimum
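A minimal genetic algorithm sketch tying the structure and process together; truncation selection, one-point crossover, and Gaussian mutation are generic choices, not prescribed by the source:

```python
import random

def genetic_search(fitness, dims, pop_size=20, generations=50):
    """An individual = one concrete value per gene (per search dimension);
    the population keeps the best individuals and recombines them."""
    pop = [[random.random() for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)     # compute fitness = run all tests
        survivors = pop[: pop_size // 2]        # keep the best individuals
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)  # pick two parents
            cut = random.randrange(dims)        # one-point crossover
            child = a[:cut] + b[cut:]
            g = random.randrange(dims)          # mutate one gene
            child[g] = min(1.0, max(0.0, child[g] + random.gauss(0, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)                # best found, maybe a local optimum
```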
SBST for CPS - (Dis-)advantages
Can cope with high dimensionality
does not guarantee finding the best/worst cases
needs methodological guidance for fitness functions (templates)
Tests can generally not be re-used
Testing challenges in Machine-Learned Models
Neural networks have no clear specification
Functional tests are hard to specify for neural networks
Adversarial examples will always be a problem
Insufficient labeled data
Robustness Testing - DeepGini
Used to prioritize inputs for manual labeling
Prioritization according to the class probability output
-> inputs with the highest likelihood of misclassification get manually labeled first
Uses the Gini index (see the sketch below)
-> Better results than neuron coverage metrics
finds more failing tests with this prioritization
metric calculation is faster (than computing + comparing coverage metrics)
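A sketch of the DeepGini score, assuming the model's softmax outputs are available as a (num_inputs, num_classes) array; the score is the Gini impurity of the predicted class distribution:

```python
import numpy as np

def deepgini_scores(softmax_outputs):
    """DeepGini score per input: 1 - sum_i p_i^2. Uniform (unsure)
    predictions score highest, confident one-hot predictions score 0."""
    p = np.asarray(softmax_outputs)
    return 1.0 - np.sum(p ** 2, axis=1)

def prioritize(softmax_outputs):
    # Indices of the test inputs, most likely misclassified first:
    # label these manually before the rest.
    return np.argsort(-deepgini_scores(softmax_outputs))
```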