Difficulty of Testing CPS
Usually, no useful requirements available
Usually, no clear oracle available
Usually, system operates in dynamic open context
Usually, huge dimensionality of input parameter space
Testing Cyber-Physical Systems - Continuous Controllers
X-in-the-Loop tests:
Model-in-the-Loop: Model of the system is run in the simulation environment (see the sketch after this list)
Software-in-the-Loop: The actual software is used, sometimes on the dedicated computation hardware
Hardware-in-the-Loop: Actual hardware is used, e.g. mechanical steering parts
(Vehicle-in-the-Loop for autonomous driving systems: The whole car is used)
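A minimal Model-in-the-Loop sketch, assuming a hypothetical first-order plant and a simple P-controller; the point is only the closed simulation loop, not the models themselves:

```python
# Model-in-the-Loop sketch: a hypothetical plant model and controller
# are stepped together in a simulation environment.
def plant_step(state: float, actuation: float, dt: float = 0.01) -> float:
    """Hypothetical first-order plant: state drifts toward the actuation."""
    return state + dt * (actuation - state)

def controller(setpoint: float, measurement: float, k_p: float = 2.0) -> float:
    """Hypothetical proportional controller (stands in for the real model)."""
    return k_p * (setpoint - measurement)

state, setpoint, trace = 0.0, 1.0, []
for _ in range(1000):                        # 10 s of simulated time at dt = 0.01
    actuation = controller(setpoint, state)
    state = plant_step(state, actuation)
    trace.append(state)                      # recorded for later quality analysis
```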
Quality criteria for controller: precision, responsiveness, smoothness, stability, steadiness
-> Non-functional, not functional
Search-Based Test Case Generation
Search Space: “All possible input signals”
Fitness Function: “The bigger T_stable, the higher the fitness value” (f = -T_stable, to be minimized; see the sketch below)
Oracle: If the biggest identified T_stable exceeds some desired threshold, the system is faulty
Possible Fault Models: precision, responsiveness, smoothness, …
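The fitness computation could look as follows; this is a sketch that assumes T_stable is the settling time of the output trace, i.e. the time until the output stays within a tolerance band around the setpoint:

```python
def t_stable(trace, setpoint, tol=0.05, dt=0.01):
    """Assumed definition of T_stable: time of the last sample that
    still violates the +/- tol band around the setpoint."""
    last_violation = -1
    for i, y in enumerate(trace):
        if abs(y - setpoint) > tol:
            last_violation = i
    return (last_violation + 1) * dt

def fitness(trace, setpoint):
    # f = -T_stable: minimizing f is the same as maximizing T_stable,
    # i.e. the search favors inputs that destabilize the controller longest.
    return -t_stable(trace, setpoint)
```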
Testing Controllers - Approach
Step 1:
Partition input space into equally sized blocks + select n points per block
Assess satisfaction of non-functional requirements per point
have formal definitions
can be measured
create heat map
Step 2:
fine-grained AI search methods for the selected blocks
find the global maximum of the deviation within these blocks (see the sketch below)
Example: Stability heat map of an AC
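A sketch of Step 1, assuming a 2-D input space on [lo, hi]² and a hypothetical assess() function that runs the controller for one test point and returns its requirement deviation:

```python
import random

def heat_map(assess, lo=0.0, hi=1.0, blocks=10, n=5):
    """Partition a 2-D input space into blocks x blocks equally sized cells,
    sample n random points per cell, keep the worst deviation per cell."""
    size = (hi - lo) / blocks
    grid = [[0.0] * blocks for _ in range(blocks)]
    for i in range(blocks):
        for j in range(blocks):
            for _ in range(n):
                x = lo + (i + random.random()) * size
                y = lo + (j + random.random()) * size
                grid[i][j] = max(grid[i][j], assess((x, y)))
    return grid  # cells with high values become candidates for Step 2's search
```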
Testing Autonomous Driving - Goals, Components, Pitfall
Goals: Safe, comfortable, non-disruptive driving.
Components: Planning, situation understanding, perception.
Pitfalls: Random scenarios, reusing test cases, simulation limitations.
Defects in Machine-Learned Models
Metamorphic Testing
Addressing the Oracle Problem: Metamorphic testing addresses the oracle problem when testing ML models that lack a clear oracle.
Pseudo-Oracle Approach: It uses metamorphic relations between input-output pairs as pseudo-oracles.
Derived Test Cases: Metamorphic relations enable generating new test cases from known ones, e.g. deriving sin(x + 2π) from sin(x) (see the sketch below).
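For the sin example this can be written directly; the relation sin(x + 2π) = sin(x) is the pseudo-oracle, so no expected output for any individual x is needed:

```python
import math, random

# Metamorphic relation: sin(x + 2*pi) must equal sin(x) for every x.
for _ in range(1000):
    x = random.uniform(-100.0, 100.0)        # source test case
    follow_up = x + 2 * math.pi              # derived test case
    assert math.isclose(math.sin(x), math.sin(follow_up), abs_tol=1e-9)
```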
Advantages and Disadvantages of Metamorphic Testing
Advantages:
Alleviates the test oracle problem
Works for both machine-learned and conventional programs
Less labeled test data is needed → cost effective
Disadvantages:
Domain knowledge needed for identifying metamorphic relations
Different metamorphic relations for different domains (image, lidar, radar, language, search engines, …)
Robustness Testing in ML - Symbolic Execution
Translate Neural Network:
Convert neural network into imperative program.
Identify Key Pixels:
Find important pixels in labeled image.
Calculate pixel importance from the coefficients of the input pixels
Sort pixels by importance.
Create Adversarial Image:
Craft new image from original.
Select high-importance pixels (1- or 2-pixel attacks) and make them symbolic
Formulate a constraint problem over the symbolic values (see the sketch after this list)
-> e.g. Pixel Attack
-> similar to Fuzzing
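A sketch of the constraint step, assuming the z3 solver (PyPI package z3-solver) and a toy linear "network" of four pixels standing in for the translated imperative program; for a linear model, pixel importance reduces to the coefficient magnitude:

```python
from z3 import Real, Solver, sat   # assumes the z3-solver package is installed

# Toy linear "network" over 4 pixels, standing in for the imperative
# program obtained by translating the real neural network.
weights = [0.9, -0.1, 0.05, -0.8]
image   = [0.7, 0.2, 0.5, 0.3]           # original score: 0.395 (> 0)

# Pixel importance = coefficient magnitude (the model is linear).
order = sorted(range(4), key=lambda i: abs(weights[i]), reverse=True)
p0, p1 = order[:2]                       # 2-pixel attack: top-2 pixels

x0, x1 = Real("x0"), Real("x1")          # the selected pixels become symbolic
score = sum(weights[i] * image[i] for i in range(4) if i not in (p0, p1))
score = score + weights[p0] * x0 + weights[p1] * x1

s = Solver()
s.add(0 <= x0, x0 <= 1, 0 <= x1, x1 <= 1)   # pixels must stay in a valid range
s.add(score < 0)                             # flip the sign of the score
if s.check() == sat:
    print("adversarial pixel values:", s.model())
```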
Robustness Testing in ML: Neuron Coverage/Fuzzing
Tries to mimic coverage-based testing
Coverage metrics:
Neuron Coverage: Ratio of activated neurons to total neurons in the network (see the sketch after this list)
K-Multisection Neuron Coverage: For each neuron, partition its output value range into k sections
Neuron Boundary Coverage: How many corner-case regions have been covered
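A sketch of the plain neuron coverage metric, assuming recorded activations as one (num_inputs, num_neurons) array per layer; the activation threshold is an assumption:

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.0):
    """Ratio of neurons that exceeded `threshold` for at least one input
    to the total number of neurons in the network."""
    activated = total = 0
    for layer in layer_activations:              # one array per layer,
        fired = (layer > threshold).any(axis=0)  # shape (num_inputs, num_neurons)
        activated += int(fired.sum())
        total += fired.size
    return activated / total
```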
Watchdogs
Watchdogs monitor system behavior and intervene in case of unwanted actions
system test level
very abstract -> only used in fail-safe cases
also used as runtime monitors during production (see the sketch below)
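A watchdog can be as small as a bounds check wrapped around the controller output; this sketch assumes hypothetical limits and a neutral fail-safe action:

```python
def watchdog(actuation, speed, max_actuation=1.0, max_speed=30.0):
    """Very abstract safety monitor: intervene only when the requested
    action or the observed behavior leaves the safe envelope
    (both limits are assumptions for illustration)."""
    if abs(actuation) > max_actuation or speed > max_speed:
        return 0.0           # fail-safe: override with a neutral actuation
    return actuation         # otherwise pass the command through unchanged
```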
Genetic Algorithms - Structure
Gene: one dimension of the search space
Individual (Test Case): Consists of genes for each dimension of the search space
-> concrete value for each gene
Population: Consists of individuals
Genetic Algorithms - Process
Compute fitness for all individuals in population (= run all test cases in population)
Keep the best individuals
Replace bad individuals by re-combining good ones according to an evolutionary heuristic (see the sketch below)
-> Always converges, but not necessarily to the global optimum
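A minimal genetic algorithm sketch tying the structure and process together; truncation selection, one-point crossover, and Gaussian mutation are generic choices, not prescribed by the source:

```python
import random

def genetic_search(fitness, dims, pop_size=20, generations=50):
    """An individual = one concrete value per gene (per search dimension);
    the population keeps the best individuals and recombines them."""
    pop = [[random.random() for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)     # compute fitness = run all tests
        survivors = pop[: pop_size // 2]        # keep the best individuals
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)  # pick two parents
            cut = random.randrange(dims)        # one-point crossover
            child = a[:cut] + b[cut:]
            g = random.randrange(dims)          # mutate one gene
            child[g] = min(1.0, max(0.0, child[g] + random.gauss(0, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)                # best found, maybe a local optimum
```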
SBST for CPS - (Dis-)advantages
Can cope with high dimensionality
does not guarantee finding the best/worst cases
needs methodological guidance for fitness functions (templates)
Tests can generally not be re-used
Testing challenges in Machine-Learned Models
Neural networks have no clear specification
Functional tests are hard to specify for neural networks
Adversarial examples will always be a problem
Insufficient labeled data
Robustness Testing - DeepGini
Used to prioritize inputs for manual labeling
Prioritization according to the class probability output
-> inputs with the highest likelihood of misclassification get manually labeled first
Uses the Gini index (see the sketch below)
-> Better results than neuron coverage metrics
finds more failing tests with this prioritization
metric calculation is faster (than computing + comparing coverage metrics)
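A sketch of the DeepGini score, assuming the model's softmax outputs are available as a (num_inputs, num_classes) array; the score is the Gini impurity of the predicted class distribution:

```python
import numpy as np

def deepgini_scores(softmax_outputs):
    """DeepGini score per input: 1 - sum_i p_i^2. Uniform (unsure)
    predictions score highest, confident one-hot predictions score 0."""
    p = np.asarray(softmax_outputs)
    return 1.0 - np.sum(p ** 2, axis=1)

def prioritize(softmax_outputs):
    # Indices of the test inputs, most likely misclassified first:
    # label these manually before the rest.
    return np.argsort(-deepgini_scores(softmax_outputs))
```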