System Definition
one or more components of differing nature
The individual components interact with one another via internal interfaces
fulfils a defined purpose by providing or executing functions
A system progresses through a life-cycle, from development through realisation, commissioning and operation to its disposal
In order to define a system, it must be delineated
system boundary separates the system and its components from the system environment
components interact via external interfaces with the system environment
The system environment is not part of the system
The components of a system can be systems themselves
They are then referred to as subsystems
The complete system is then also termed a System-of-Systems (SoS)
The separation can occur on the same level and/or in a hierarchical fashion
System Boundaries
System Structures
Complexity of Systems
there is no objective definition of complexity
some properties correlate with higher complexity
comparison of transistor count and size of instruction sets of various processors
Combinatorial explosion of the state space of comparatively small state machines executed in parallel
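An illustration (the numbers are my own, not from the source): n state machines with k states each, executed in parallel, span up to k^n combined states; ten 10-state machines already yield 10^10 states.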
Failure
Any deviation of a function’s behaviour from its (intended) specification is a failure of the function or service
Failures are caused by errors in the components of a system
Error
An error is an internal system state that deviates from the expected state necessary to perform a function/service and always occurs at run time of the system
The occurrence of an error can lead to a propagation of further errors inside the system
When a propagating error reaches the system boundary it causes a failure
The reason for the occurrence of an error is a fault
Fault Classes
Development faults … occur during the development of a system
Physical faults … only affect physical components
Interaction faults … include all external faults
natural faults
malicious faults
operational faults
Fault
An active fault is a fault that causes an error during run time of the system
Otherwise the fault is still present in the system but in a non-activated, dormant state (a so-called dormant fault)
An external fault acts on components of a system from outside the system boundary
The external fault, either by itself or by activating an existing internal fault, causes an error in the system
The internal fault is also termed a vulnerability in this context
Natural faults … arising from natural phenomena
Malicious faults … intentionally brought into a system; affect a system from outside in order to inhibit its operation or to make it possible to gain control over it
Operational faults … faulty interactions of a user with the system during run time
Failure Modes
type of deviation from a specified service when a failure occurs
content failure … deviation of the content of information
timing failure … deviation of the timing
silent failure … complete absence of delivered information due to termination of the service
(in)consistent failures … whether the failure is experienced by all users of the service in the same way (consistent failure) or in differing ways (inconsistent failures)
magnitude of failure
Propagation of Failure
system consisting of more than one component
When a component experiences a failure due to a fault, this can cause further failures in the dependent components
in the worst case, a chain reaction occurs that propagates to the system boundary, causing a system failure
Random failure
caused by a random fault
The probability of random faults and their associated random failures can be quantified within certain limits
A random fault is created and activated during run time of a system with a certain probability
Systematic failures
failures with deterministic causes
Systematic failures are caused by systematic faults
can be (theoretically) eliminated from a system altogether
the probability of existence and activation of a systematic fault cannot be quantified
Software only contains systematic faults
Definition of Dependability
Dependability of a system is the ability to avoid service failures that are more frequent and more severe than is acceptable.
Dependability is a collective term used to describe the availability performance and its influencing factors: reliability performance, maintainability performance and maintenance support performance.
Dependability is a property of a system that defines its resilience against (service) failures
The criteria for acceptability are formalised by dependability requirements for the system
Does a dependable system have failures?
yes
Dependability allows for the occurrence of failures, as long as they occur rarely enough and with only minor consequences
Attributes of Dependability
Reliability … the probability that a system operates up to a certain time point without experiencing a failure (i. e. the “survival probability”)
Availability … the probability that a system can provide its service(s) at a certain time point (i. e. the probability of being “up and running”)
Maintainability … the probability that a renewal (i. e. a repair or replacement) is finished by a certain time point
Safety … but only in regard to failures!
Integrity … protection from undetected alterations of information or structure
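As formulas (a sketch in common notation, not from the source: T_F denotes the time to failure, T_R the duration of a renewal):

    R(t) = P(T_F > t)                  (reliability: survival up to time t)
    A(t) = P(system is up at time t)   (availability at time point t)
    M(t) = P(T_R <= t)                 (maintainability: renewal finished by t)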
Fault Prevention/Avoidance
The introduction of faults into the system should be prevented
Achieved via constraints enforced on the development activities
Fault Removal
Existing faults should be detected and removed from the system
Achieved by performing verification activities on implementation artefacts
Fault Tolerance
The effects of residual faults during operation of the system should be controlled to prevent failures
methodology employed in the design and operation of a system in order to increase a system’s dependability when residual faults are present
Fault Forecasting
The effects of residual faults which are not tolerated should be estimated (and the consequences accepted)
Achieved by empirical observation of components and stochastic modelling of system structures or qualitative analysis of failure consequences
Residual faults
faults that were neither prevented nor removed during development of the system – their presence must be assumed for every nontrivial system!
Fault Tolerance Phases
Error Detection … Detecting the errors caused by residual faults via acceptance checks on the output/state of modules
Damage Assessment/Confinement … Assessing/limiting the extent of the corruption of the system state due to the detected errors
Error Recovery … Correcting the corrupted system state to arrive at a correct system state
Fault Treatment … Locating and deactivating the responsible fault(s) to prevent immediate recurrence of errors
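A condensed Python sketch of the four phases (the module, the acceptance check and checkpoint-based backward recovery are assumptions for illustration, not the only possible realisation):

    def acceptance_check(result):
        # Error Detection: plausibility check on a module's output
        return isinstance(result, (int, float)) and 0 <= result <= 100

    def fault_tolerant_call(module, alternate, state):
        checkpoint = dict(state)      # establish a recovery point
        result = module(state)
        if acceptance_check(result):
            return result
        # Damage Assessment/Confinement: treat the state as corrupted
        state.clear()
        # Error Recovery (backward): roll back to the recovery point
        state.update(checkpoint)
        # Fault Treatment: switch to an alternate module so the
        # responsible fault is not activated again immediately
        return alternate(state)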
Redundancy domains
Physical (or HW) redundancy, by adding physical components
Information redundancy, by adding information (see the parity sketch below)
Temporal redundancy, by repeating operations
Software redundancy, by adding SW components
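A small Python example of information redundancy from the list above: a single parity bit (the helper names are my own):

    def add_parity(bits):
        # append one redundant bit so the total number of 1s is even
        return bits + [sum(bits) % 2]

    def check_parity(word):
        # detects any single bit flip (but cannot correct or locate it)
        return sum(word) % 2 == 0

    word = add_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
    word[2] ^= 1                      # a single bit flip occurs
    assert not check_parity(word)     # the flip is detected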
Full Fault Tolerance
a service is provided according to its specification without any impairments at all
Partial Fault Tolerance
a service is provided in a degraded mode only, when a failure would otherwise occur
also known as Graceful Degradation
Fault/Error Injection
Faults/errors are deliberately introduced into the system
Error injection is used when injection of an actual fault is too expensive
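A minimal error-injection sketch in Python (flipping a bit of an internal state variable to emulate the error a physical fault would cause; the names and bit width are illustrative):

    import random

    def inject_bit_flip(value, width=32):
        # emulate the *error* a hardware fault would cause, without
        # having to physically disturb the hardware itself
        return value ^ (1 << random.randrange(width))

    state = 42
    corrupted = inject_bit_flip(state)
    assert corrupted != state   # the injected error is now observable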
Physical Fault Tolerance
most basic FT strategy and applicable to all systems
cost- and space-intensive
based on adding redundant physical components to a system
Homogeneous redundancy
components are replicated and differ only unintentionally
Inhomogeneous redundancy
components differ intentionally in some aspects
Standby System
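A minimal Python sketch of a standby system (the component functions and failure detection via exceptions are assumptions for illustration; real standby systems typically detect failures via heartbeats or watchdogs):

    def run_with_standby(primary, standby, request):
        try:
            return primary(request)   # normal operation on the primary unit
        except Exception:
            # failure detected: switch over to the redundant standby unit
            return standby(request)

    def faulty_primary(request):
        raise RuntimeError("primary failed")

    def standby_unit(request):
        return "handled %s on standby" % request

    print(run_with_standby(faulty_primary, standby_unit, "req-1"))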
Software Fault Tolerance
Homogeneous redundancy of SW is meaningless
SW only contains systematic faults
Trust in reused components is often unjustified (“Software of Unknown Pedigree” – SOUP)
N-Version Programming (see the sketch after this list)
Recovery Block
Signature-based Control Flow Monitor
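Illustrating N-Version Programming from the list above: several independently developed versions of the same specification run on the same input, and a voter masks a minority of faulty versions (2-out-of-3 majority here). The toy versions below are invented for illustration:

    from collections import Counter

    def v1(xs):                       # straightforward mean
        return sum(xs) / len(xs)

    def v2(xs):                       # incremental mean, written differently
        m = 0.0
        for i, x in enumerate(xs, 1):
            m += (x - m) / i
        return m

    def v3(xs):                       # faulty version: returns the minimum
        return sorted(xs)[0]

    def majority_vote(results):
        value, count = Counter(results).most_common(1)[0]
        if count >= 2:                # 2-out-of-3 majority masks one failure
            return value
        raise RuntimeError("no majority: inconsistent results")

    xs = [2.0, 4.0, 6.0]
    print(majority_vote([round(v(xs), 9) for v in (v1, v2, v3)]))  # -> 4.0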
Distributed System
system consisting of digital computers, called nodes, connected by a network, that interact with one another to provide a set of services to its users
distinctions from conventional systems
Remoteness
Concurrency
Lack of global state
Partial failures
Asynchrony
Heterogeneity
Autonomy
Evolution
Mobility
Requirements for a Distributed System
Openness
Scalability
Security
Distribution Transparency
Openness
Interfaces for invoking services and communication between nodes are standardised and these standards are made available to the public
defined via Interface Description Languages (IDL)
requires portability, so that a service implementation can continue running even when the underlying components change
Scalability
system can also handle an increased workload
needs:
replication
hierarchical structures for localisation and management
decentralisation of services
replacing synchronous with asynchronous communication
replacing discovery services based on broadcast communication with actual location services based on point-to-point communication that work over wide-area networks
Security
enforcing policies over extensions and modifications
ensuring authentication and authorisation for different groups, including mobile ones
maintaining security for mobile users during location changes
ensuring the CIA triad when using mobile code
ensuring availability in the case of DoS attacks
Distribution Transparency
hiding the physical or logical distribution of nodes and resources from the user so that the DS appears to them as a single, monolithic system
Access transparency
hides differences in the invocation of services of the DS and access to its resources
Location transparency
hides the actual physical location of a service or resource
Relocation/migration transparency
hides the effects of relocating a resource between components of the DS while they are accessed (relocation) or otherwise (migration)
Replication transparency
hides the replication of resources inside the DS for increasing performance and resilience against component failures
Failure transparency
hides the effects of failures and recoveries of components of the DS
Persistence transparency
hides the persistence properties of a resource
Transaction transparency
hides interaction between components for achieving consistency when resources are modified by an invoked service
Concurrency transparency
hides the effects of simultaneous, competitive (i. e. non-cooperating) access to shared resources by multiple users/services
Entities in a Distributed System
node … single computer that executes a program
process … instantiation of a program at run time executed by an operating system (OS) running on a node
threads … further subdivision of a process; share the virtual memory space of their parent process
object … virtual entity providing a set of methods that can be invoked by other objects; realised by the processes/threads of a suitable runtime environment
Communication Paradigms
Direct communication
Indirect communication
Direct communication … the sender must know the receiver’s identity (and vice versa) and both must be active at the same time
RPC (Remote Procedure Call) … a process invokes a procedure in a remote process
RMI (Remote Method Invocation) … an object invokes a method of a remote object
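A minimal RPC example using Python’s standard xmlrpc module (host, port and the add function are arbitrary choices for illustration):

    # server.py
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        return a + b

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(add, "add")
    server.serve_forever()

    # client.py -- invokes the remote procedure as if it were local
    from xmlrpc.client import ServerProxy

    proxy = ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))   # -> 5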
Indirect communication … sender and receiver need not know each other (decoupled in space) or need not be active at the same time (decoupled in time)
Publish-Subscribe … also called event-based; publishers generate events which are delivered to subscribers by an intermediary via notifications
Message Queues … producers store messages in persistent message queues, from which they can be extracted by consumers at a later time
Shared Data Space … entities read from or write to a shared storage independently from one another
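A toy in-memory Publish-Subscribe sketch in Python (the Broker class is my own; it shows only the space decoupling, whereas real brokers also decouple in time via persistence):

    from collections import defaultdict

    class Broker:
        """Intermediary: publishers and subscribers never know each other."""
        def __init__(self):
            self.subscribers = defaultdict(list)   # topic -> callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, event):
            for notify in self.subscribers[topic]:  # deliver notifications
                notify(event)

    broker = Broker()
    broker.subscribe("sensor/temp", lambda e: print("got", e))
    broker.publish("sensor/temp", 21.5)   # -> got 21.5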
client … an entity that invokes a service provided by a server by sending a request to the server’s interface and waiting for a reply (or response) from the server
server … an entity that provides one or more services by waiting for a request from a client and, after processing it, answering with a reply (or response) back to the client
Tiered Architecture
Presentation layer … performs visualisation and handles user interaction
Application layer … performs the core/business logic
Data layer … provides and manages data
Two-Tier architecture
layers are split up between a client and a server entity
Presentation layer and, optionally, part of the application layer on the client (so-called Thin Client), remaining layers on the server
Presentation and application and, optionally, part of the data layer on the client (so-called Fat Client), remaining parts of the data layer on the server
Three-Tier architecture
layers are split up between a client and two server entities, usually in the following way:
Presentation layer on client
Application layer on an application server
Data layer on a database server
Peer-to-Peer
an alternative principle to client-server architecture for providing decentralised services
Each entity provides the same services and implements the same interface; the P2P model is therefore symmetric
Each entity participates in sharing the load – combined with an even distribution of peer entities over the network this avoids performance bottlenecks
Resources are distributed evenly (in the best case) over the entities, so localised failures have a smaller impact on the service
Middleware
facilitates achieving the requirements for a DS in a standardised way
Without middleware, each distributed application would need to achieve the requirements by itself, causing needless duplication of functionality and interoperability problems