What are advantages and drawbacks of centralized embedded systems?
Advantages:
-> high performance and security
-> suitable for small-scale systems
-> simple programming model
Drawbacks:
-> does not scale well
-> confined location of components
What are advantages and drawbacks of decentralized embedded systems?
Advantages:
-> higher flexibility
-> easier cabling than in centralized systems
Drawbacks:
-> one single control unit does not scale well
-> lower performance than centralized systems
-> reliability and bandwidth of the communication network become concerns
What are advantages and drawbacks of distributed systems? What does it look like?
Distributed systems introduce more control units, where each of them is directly connected to some components (sensors and actuators). The system is built by connecting nodes that cooperate and provide services for each other.
-> High performance
-> High scalability
-> more flexible reconfiguration
-> improved debugging (use a control unit on the network to debug and diagnose)
-> Modularity
-> physical distribution (allows placement of computing power near the occurring events)
-> easier maintainability
Drawbacks:
-> Complexity
-> network may be unreliable (bandwidth is not infinite)
-> latency of communication is not zero
-> dealing with faults gets difficult
-> mutual exclusion to a shared resource may cause problems
What are some definitions of distributed systems?
-> A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another, in order to achieve a common goal.
-> Multiple computers interconnected by a network that share some common state and that cooperate to achieve some common goal
What are some examples for distributed systems?
-> modern cars
-> underwater robots
-> city-wide air quality measurement
-> wireless home automation systems
What is a big challenge with distributed systems regarding clocks? What are clock skew and clock drift?
Each node has its own clock. Due to different oscillators with manufacturing variations, temperature changes, or other effects, the frequency of each oscillator may vary over time.
-> clocks may run at different rates than a reference clock
Clock skew: difference between the reading of two clocks
clock drift: relative difference in clock frequency
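A quick back-of-the-envelope illustration of how drift accumulates; the 50 ppm drift rate is an assumed example value, not from the lecture:
```python
# Worked example (assumed numbers): deviation of a drifting clock from a
# reference clock after one day.
drift_ppm = 50                        # assumed drift of 50 parts per million
seconds_per_day = 24 * 3600
drift_per_day = drift_ppm * 1e-6 * seconds_per_day
print(f"Drift after one day: {drift_per_day:.2f} s")   # ~4.32 s
```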
Why is time synchronization needed in distributed systems?
-> Determine the correct temporal ordering of events
-> serialization of concurrent access to shared objects/resources
-> synchronization between senders and receivers of messages
What is election in a distributed system? What is important?
Embedded systems often have redundant components (primary + backup) -> how to determine which one is the primary and which ones are backups? -> Who is the leader?
Election! -> Embedded systems are often self-organized.
Example: Nodes in a sensor network may…
-> need to elect a new sink when the previous one fails
-> need to elect a new node as cluster head (better load balancing, better coverage, mobility affects range)
Important:
New leader elected -> did everyone agree? Was everyone correctly informed about this?
Only one leader should exist!
Which nodes are called “byzantine”?
-> Those whose communication channel is unreliable
-> Those whose latency is not zero
-> Those who have a large clock drift
-> Those who can have an arbitrary behaviour
-> None of the previous choices
What are the underlying causes of the classical two generals’ problem?
-> The unreliability of the communication medium
-> The generals do not trust each other
-> The communication latency is not zero
-> The generals do not speak the same language
-> The communication between the generals is asynchronous
-> The two generals are unable to send acknowledgements
Which of these are NOT advantages of a distributed embedded architecture?
-> Easier error identification
-> Independence of failures among nodes
-> Speeding-up of the data processing
-> Simpler programming model
-> Error-containment within nodes
-> Easier node replacement
How to deal with clock drift?
Do NOT set time back-/forward! Gradual clock correction until the clock is synchronized again!
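A minimal sketch of the idea of gradual (slewed) correction, assuming a hypothetical clock that can be nudged by a small amount per tick; the window and tick lengths are made-up values:
```python
# Minimal sketch: amortize the measured offset over many small per-tick
# adjustments instead of one jump backward/forward.
def slew_corrections(offset_s, window_s=100.0, tick_s=1.0):
    ticks = int(window_s / tick_s)
    per_tick = offset_s / ticks
    for _ in range(ticks):
        yield per_tick                 # small delta applied to the local clock each tick

# Example: a clock that is 0.5 s ahead is slowed by 5 ms per tick for 100 ticks.
print(sum(slew_corrections(-0.5)))     # ≈ -0.5 (total correction over the window)
```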
What is external clock synchronization?
It is the process of synchronizing to a well-known clock external to the system, e.g. a time server or a machine with an accurate (e.g. atomic) time source.
Explain Cristian’s algorithm for external time synchronization. How is the accuracy computed?
Cristian’s algorithm: It measures the round-trip time (RTT) of the message exchange and estimates the delay in each direction. The RTT is the time it takes for a message to travel from the client to the server and back. It assumes that network delays are symmetric (which is often not the case in the real world, but often close enough).
The new time is computed as T_new = T_server + RTT/2.
The accuracy can be calculated with Tmin, the minimum message transit time: the error bound is ±(RTT/2 − Tmin).
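A minimal sketch of the computation; the function name and the timestamp values are made-up examples, not lecture code:
```python
# Minimal sketch of Cristian-style clock adjustment (made-up timestamps).
def cristian_adjust(t_request_sent, t_server, t_response_received, t_min=0.0):
    """Return (new_time, accuracy_bound), assuming symmetric network delay."""
    rtt = t_response_received - t_request_sent
    new_time = t_server + rtt / 2          # server time plus estimated one-way delay
    accuracy = rtt / 2 - t_min             # error bound: +/- (RTT/2 - Tmin)
    return new_time, accuracy

# Example: request sent at local t = 100.000 s, server reports 100.020 s,
# response received at local t = 100.010 s, minimum transit time 2 ms.
print(cristian_adjust(100.000, 100.020, 100.010, t_min=0.002))
# ≈ (100.025, 0.003)
```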
How does Berkeley algorithm work? What is it used for?
It is used for internal synchronization. It assumes that no machine has an accurate time source.
1.) The master polls each slave periodically, asking for its time
2.) When the results are in, compute the average
3.) Send the offset by which each clock needs to be adjusted
-> It also excludes faulty clocks from the time average! -> in the example, C was excluded from the averaging
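A minimal sketch of the Berkeley averaging step; the node names, the example readings, and the fault threshold are assumptions for illustration:
```python
# Minimal sketch of the Berkeley averaging step with exclusion of faulty clocks.
def berkeley_offsets(master_time, slave_times, max_deviation=10.0):
    """Compute per-node offsets; readings deviating too much are excluded as faulty."""
    all_times = {"master": master_time, **slave_times}
    good = {n: t for n, t in all_times.items()
            if abs(t - master_time) <= max_deviation}   # drop outliers from the average
    avg = sum(good.values()) / len(good)
    return {n: avg - t for n, t in all_times.items()}   # offset each node must apply

# Example: node "C" is 300 s off and is excluded from the average.
print(berkeley_offsets(3.0, {"A": 0.0, "B": 6.0, "C": 300.0}))
# -> {'master': 0.0, 'A': 3.0, 'B': -3.0, 'C': -297.0}
```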
What is the Network Time Protocol (NTP) and how does it work?
It is a protocol for external time synchronization. It is intended to synchronize all participating computers to within a few milliseconds of Coordinated Universal Time (UTC). It uses a hierarchical structure of servers and clients where each server has a stratum level that indicates its distance from a reference clock.
-> The nodes send and receive timestamps via UDP, because TCP retransmission would add variable delays.
-> messages contain slots for four timestamps
What is the difference between Christian’s algorithm and NTP?
In NTP, the server also sends the times when it received and sent the message. This helps the client to adjust its clock more accurately.
When NTP is used over a WAN it usually achieves millisecond accuracy, as it cannot mitigate significant MAC-layer delays.
When synchronizing over a LAN, one can achieve even better synchronization by performing timestamping at the MAC layer.
Calculate RTT and offset in the following problem with NTP
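The concrete numbers of this exercise are not in the text; as a sketch with made-up timestamps T1 (client send), T2 (server receive), T3 (server send), T4 (client receive), the standard NTP formulas look like this:
```python
# Standard NTP formulas, illustrated with made-up timestamps.
def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated offset of the server clock vs. the client
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay (RTT)
    return offset, delay

# Example: offset = (0.1 + (-0.3)) / 2 = -0.1 s, delay = 0.5 - 0.1 = 0.4 s,
# i.e. the client's clock is roughly 0.1 s ahead of the server's.
print(ntp_offset_delay(t1=10.0, t2=10.1, t3=10.2, t4=10.5))
```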
What is the Precision Time Protocol (PTP)?
-> designed to synchronize clocks on a local area network
-> a follower device synchronizes to a reference device called the PTP Grandmaster
-> achieves a synchronization of ±400 ns! -> NTP on a LAN achieves only 1–2 ms
Which of the cuts C’, C’’ and C’’’ in the example are consistent?
C’ is inconsistent -> because e33 precedes e31 and is not in the cut
C’’ is inconsistent -> because e34 precedes e22 and is not in the cut
C’’’ is consistent
What is a logical clock? How does it work? What is it used for?
It is used to construct a consistent global state. It assigns timestamps to events that are not absolute, but relative to each other.
Each process pi keeps a local counter (clock) LCi initialized at 0. LCi counts how many events in a distributed computation causally preceded the current event at pi.
Rules for updating LCi:
pi increments LCi when executing an instruction or a send operation
Every message sent carries its timestamp TS
Whenever pi receives a message m carrying TS, it computes LCi = max(LCi,TS)+1
Example:
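As an illustration (a hypothetical helper class, not the lecture example), a minimal sketch of the update rules:
```python
# Minimal sketch of Lamport logical clocks.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1          # increment when executing an instruction
        return self.time

    def send(self):
        self.time += 1          # increment, then attach the timestamp to the message
        return self.time        # timestamp TS carried by the message

    def receive(self, ts):
        self.time = max(self.time, ts) + 1   # LCi = max(LCi, TS) + 1
        return self.time

# Example: p1 sends to p2, which has seen fewer events so far.
p1, p2 = LamportClock(), LamportClock()
p1.local_event()                 # p1: 1
ts = p1.send()                   # p1: 2, message carries TS = 2
print(p2.receive(ts))            # p2: max(0, 2) + 1 = 3
```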
What are limitations of logical clocks? How can it be solved?
-> One cannot distinguish whether two events are concurrent or causally related using logical clocks
-> Can be solved by vector clocks!
Example: p0 is an external monitor that tries to build a consistent global state.
In one scenario e11 and e22 are concurrent (no causal path between them); in another scenario e11 and e22 are causally related -> BUT -> the timestamps observed at p0 are the same in both cases!
How do vector clocks (VC) work? What are the rules for updating the VC?
Each process pi uses a vector VCi[1…N] of integer clocks, where N is the number of processes in the system. Each entry of a vector clock corresponds to a process. The j-th element of VCi, i.e. VCi[j], holds the number of events that process i has observed from process j. Thus, VCi[j] is the logical clock value of process j’s events that process i is aware of.
Rules:
pi increments its own entry VCi[i] when executing a local event or a send operation
Every message sent carries the sender’s full vector clock as its timestamp VT
Whenever pi receives a message carrying VT, it sets VCi[j] = max(VCi[j], VT[j]) for all j and then increments VCi[i]
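A minimal sketch of these update rules (hypothetical helper functions; plain Python lists serve as vector clocks):
```python
# Minimal sketch of vector clock updates.
def new_vc(n):
    return [0] * n                      # one entry per process

def local_or_send(vc, i):
    vc[i] += 1                          # pi increments its own entry VCi[i]
    return list(vc)                     # copy of the VC attached to an outgoing message

def receive(vc, i, msg_vc):
    for j in range(len(vc)):
        vc[j] = max(vc[j], msg_vc[j])   # element-wise maximum with the message's VC
    vc[i] += 1                          # then count the receive event itself

# Example with N = 2: p0 sends a message, p1 receives it.
vc0, vc1 = new_vc(2), new_vc(2)
msg = local_or_send(vc0, 0)             # p0: [1, 0]
receive(vc1, 1, msg)                    # p1: [1, 1]
print(vc1)                              # [1, 1]
```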
How can one distinguish between causal precedence and concurrency with VC?
-> e causally precedes e' iff VC(e)[k] <= VC(e')[k] for every k and the two timestamps differ
-> e and e' are concurrent iff neither VC(e) <= VC(e') nor VC(e') <= VC(e) holds
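A minimal sketch of this comparison, with made-up vector timestamps:
```python
# Minimal sketch of comparing vector-clock timestamps.
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

def causally_precedes(a, b):
    return leq(a, b) and a != b            # a -> b

def concurrent(a, b):
    return not leq(a, b) and not leq(b, a)

# e1 causally precedes e2, while e3 is concurrent to e1.
e1, e2, e3 = [1, 0, 0], [2, 1, 0], [0, 0, 1]
print(causally_precedes(e1, e2), concurrent(e1, e3))   # True True
```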
How can one determine if a cut is consistent with VC?
A cut is consistent if there are no pairwise inconsistent events.
How do you build a consistent global state?
passive monitoring -> each process pi sends to p0 a timestamp of each event it executed (e.g. VC timestamps)
active monitoring -> when p0 wants to find out the system’s state, it asks all processes pi to send their history and builds a distributed snapshot (Chandy-Lamport algorithm) -> used to record a consistent global state for an asynchronous system
What are the steps in the Chandy-Lamport algorithm?
1.) A monitor p0 sends a “Take snapshot” message to all processes
2.) If process pi receives such a message for the first time, it
records its state and stops any distributed computation activity
relays the “take snapshot” message on all of its outgoing channels
starts recording a state of its incoming channels
3.) when pi receives a “take snapshot” message from process pj, it stops recording the state of the channel between itself and pj
4.) When pi has received a “take snapshot” message from all processes and from p0, it stops recording the snapshot and sends it to p0
This algorithm only constructs consistent snapshots!
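A very simplified sketch of the per-process marker handling in the classic Chandy-Lamport formulation; the process IDs, the peer list and the `send` callback are assumptions for illustration, and the lecture variant additionally stops the distributed computation:
```python
# Simplified sketch of Chandy-Lamport-style marker handling at one process.
class SnapshotProcess:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers              # neighbours on incoming/outgoing channels
        self.state = None               # recorded local state (None = not recorded yet)
        self.channel_rec = {}           # recorded in-flight messages per incoming channel
        self.recording_from = set()

    def on_marker(self, sender, local_state, send):
        if self.state is None:                      # first "take snapshot" message
            self.state = local_state                # 1. record own state
            for p in self.peers:                    # 2. relay the marker on all outgoing channels
                send(self.pid, p)
            # 3. start recording all incoming channels except the one the marker arrived on
            self.recording_from = set(self.peers) - {sender}
            self.channel_rec = {p: [] for p in self.recording_from}
        else:                                       # marker already seen:
            self.recording_from.discard(sender)     # stop recording that channel

    def on_message(self, sender, msg):
        if sender in self.recording_from:           # message was in flight during the snapshot
            self.channel_rec[sender].append(msg)

# Example: the monitor "p0" triggers the snapshot at p1, whose only peer is p2.
p1 = SnapshotProcess("p1", peers=["p2"])
p1.on_marker("p0", local_state={"x": 42}, send=lambda src, dst: None)
p1.on_message("p2", "in-flight msg")
print(p1.state, p1.channel_rec)   # {'x': 42} {'p2': ['in-flight msg']}
```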
Distributed mutual exclusion - what is the problem?
-> Ensure that only one process at a time executes a critical section
-> processes communicate with each other only using messages
What are requirements for the solution to the problem of distributed mutual exclusion? What makes a good solution?
-> Safety: at most 1 process executes a critical section at any time
-> Liveness: every request for a critical section is eventually granted
-> Ordering: requests are granted in the order they were made
A good solution is evaluated by:
-> number of messages sent: to acquire access, to release access
-> delay: to acquire access, to release access
-> throughput: number of operations per second
How can distributed mutual exclusion algorithms be classified?
-> permission-based: a process that wants to access a shared resource requests permission from one or more coordinators
-> token-based: each shared resource has a token. The token is circulated among all processes and only a process that holds the token can access the resource.
How does a centralized permission-based algorithm work? What are advantages and drawbacks of this method?
One process is elected as coordinator for the resource. If a process, e.g. P1, wants access to the resource, it sends a request to the coordinator. If the resource is free, the coordinator sends “grant”; otherwise the requesting process is put into a queue and the coordinator does not respond. When P1 finishes using the resource, it is removed as the current owner and the coordinator sends “grant” to the first process in the queue (see the sketch below).
Advantages:
-> guarantees mutual exclusion
-> simple to implement
Drawbacks:
-> poor fault tolerance: the centralized coordinator is a single point of failure!
-> poor performance: a centralized algorithm may become a bottleneck (e.g. if the coordinator is overwhelmed with requests)
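A minimal sketch of such a coordinator (hypothetical message interface; grants are delivered via a callback):
```python
# Minimal sketch of a centralized mutual-exclusion coordinator.
from collections import deque

class Coordinator:
    def __init__(self):
        self.owner = None
        self.queue = deque()

    def request(self, pid, send_grant):
        if self.owner is None:
            self.owner = pid
            send_grant(pid)            # resource free: grant immediately
        else:
            self.queue.append(pid)     # otherwise queue silently (no reply)

    def release(self, pid, send_grant):
        assert pid == self.owner
        self.owner = self.queue.popleft() if self.queue else None
        if self.owner is not None:
            send_grant(self.owner)     # pass the resource to the next waiter

# Example run: P1 gets the grant at once, P2 only after P1 releases.
c = Coordinator()
c.request("P1", lambda p: print("grant ->", p))
c.request("P2", lambda p: print("grant ->", p))
c.release("P1", lambda p: print("grant ->", p))
```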
How does a decentralized permission-based algorithm work? What are advantages and drawbacks of this method?
This method does not depend on a single coordinator but on n coordinators, where each has a replica of the resource. When a process wants access to a resource, it needs to get a majority vote (“permission granted”) from m > n/2 coordinators. If a coordinator has already voted, it sends a “permission denied” message. The coordinators answer concurrently with “access granted” or “denied” (see the sketch below).
-> one can also choose m > n/2 + x, with x = 1, 2, 3, … to increase fault tolerance
Advantages:
-> no central bottleneck
-> improved performance
Drawbacks:
-> overhead: needs more votes -> more messages to be sent
-> mutual exclusion cannot be deterministically guaranteed, but it can be guaranteed probabilistically (the probability of violating mutual exclusion in practice is usually very small, ~10^(-40))
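A minimal sketch of the majority-vote idea; the coordinator objects are an abstraction, and a real system would exchange vote messages and give back partial votes when no majority is reached:
```python
# Minimal sketch of quorum-based (decentralized) permission voting.
class ReplicaCoordinator:
    def __init__(self):
        self.granted_to = None

    def vote(self, pid):
        if self.granted_to is None:        # vote only if we have not voted yet
            self.granted_to = pid
            return True                    # "permission granted"
        return False                       # "permission denied"

def try_acquire(pid, coordinators):
    votes = sum(c.vote(pid) for c in coordinators)
    return votes > len(coordinators) // 2  # need m > n/2 positive votes

coords = [ReplicaCoordinator() for _ in range(5)]
print(try_acquire("P1", coords))   # True: P1 collects all 5 votes
print(try_acquire("P2", coords))   # False: every coordinator already voted for P1
```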
How does the Ricart & Agrawala’s distributed algorithm work? What are advantages and drawbacks of this method?
When a process wants to enter a critical section (CS), it broadcasts a message with (CS_name, process_id, current_time) to all processes, including itself. After sending the message it waits for an OK from all other processes -> only upon receiving all OKs does it enter the CS.
If two processes want to access the resource at the same time, the process with the smaller timestamp wins and gets access. If a process is currently in the CS, it does not reply but queues the request; after finishing the CS it sends an OK to all processes in its queue (see the sketch below).
Advantages:
-> no central bottleneck
-> fewer messages than the decentralized algorithm
Drawbacks:
-> the system is exposed to n points of failure
-> if a node fails to respond, the entire system locks up
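A minimal sketch of the reply rule (single critical section, hypothetical message layer; broadcasting the request and collecting the OKs is abstracted away):
```python
# Minimal sketch of the Ricart & Agrawala reply rule.
class RAProcess:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.state = "RELEASED"        # RELEASED, WANTED, or HELD
        self.my_request = None         # (timestamp, pid) of our pending request
        self.deferred = []             # requests answered only after leaving the CS

    def request_cs(self, broadcast_request):
        self.clock += 1
        self.state = "WANTED"
        self.my_request = (self.clock, self.pid)
        broadcast_request(*self.my_request)   # then wait for all OKs -> HELD

    def on_request(self, ts, sender, send_ok):
        self.clock = max(self.clock, ts) + 1
        if self.state == "HELD" or (
            self.state == "WANTED" and self.my_request < (ts, sender)
        ):
            self.deferred.append(sender)      # defer the OK: our own access comes first
        else:
            send_ok(sender)                   # not interested (or lower priority): OK at once

    def exit_cs(self, send_ok):
        self.state = "RELEASED"
        for p in self.deferred:               # release everyone we made wait
            send_ok(p)
        self.deferred.clear()

# Example: p1 wants the CS; a later request from process 2 is deferred.
p1 = RAProcess(1)
p1.request_cs(lambda ts, pid: None)
p1.on_request(ts=5, sender=2, send_ok=print)
print(p1.deferred)                            # [2]
```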
How does the token-ring algorithm work? What are advantages and drawbacks of this method?
Each resource is associated with a token. One builds a logical ring in software out of an unordered group of processes in the network. The token is circulated, and the process that currently holds the token can access the resource. If the process does not want to access the resource or has finished using it, the token is passed to the next process in the ring (see the sketch below).
Advantages:
-> provides deterministic mutual exclusion
-> avoids starvation (each process will receive the token!)
Drawbacks:
-> high message overhead (fast circulation of the token even when it is not needed)
-> if the token is lost it must be regenerated! -> token losses are hard to detect
-> dead processes must be purged from the ring!
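A minimal sketch of token circulation; the ring order and the "wants the resource" flags are made-up example values:
```python
# Minimal sketch of token circulation in a logical ring.
def circulate(ring, wants_resource, rounds=1):
    for _ in range(rounds):
        for pid in ring:                              # the token travels along the ring
            if wants_resource.get(pid, False):
                print(f"{pid} holds the token and enters the critical section")
            # a process that does not need the resource passes the token on immediately

circulate(ring=["P1", "P2", "P3", "P4"], wants_resource={"P3": True})
```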
Which algorithm always suffers from a very large number of messages exchanged?
-> Centralized Mutual Exclusion
-> Decentralized Mutual Exclusion
-> Distributed Mutual Exclusion
-> Token Ring for Mutual Exclusion
Which algorithm is the most robust to node failure?
Which algorithm is the most efficient? (i.e., sends the least number of messages)
Which algorithm is the easiest to implement?
Which of these problems are NOT related to the clock drift between distributed embedded nodes?
-> Determining the correct ordering of events
-> Executing several independent tasks on multiple control units
-> Detecting and finding a fault
-> Serialization of concurrent access to shared resources
-> Localization of distributed embedded systems
-> Coordination of joint activity across multiple robots
What is the consensus problem?
It is a fundamental challenge in distributed embedded systems, where a group of nodes has to agree on a common value or action despite the possibility of communication failures or node failures. It is important for many applications that require coordination, synchronization, or consistency among multiple nodes.
How can consensus be defined? Can it be achieved?
-> agreement: all nodes decide on the same value or action
-> validity: the decided value or action must be the input of at least one node
-> termination: every non-faulty node eventually comes to a decision
With unreliable communication, achieving consensus is provably impossible, as any message, including the last one, could be lost (cf. the two generals’ problem).
What can be done to increase the probability of achieving consensus?
-> one could simply send more messages -> expensive and undesirable
-> relax the assumptions and change the model
-> Consider bounded delays
BUT -> still no guarantee to achieve consensus
What types of node failures are known to you?
-> crash failure: the node abruptly stops responding to other nodes
-> byzantine failure: the node can have an arbitrary behaviour, such as sending meaningless or contradictory data
What is an f-resilient consensus algorithm?
An algorithm is f-resilient if it still solves consensus when up to f of the n nodes fail (i.e., at least n-f nodes are correct).
Assume that in a system all n nodes are synchronized and communicate reliably (within certain bounds). Does an algorithm exist in which, after f+1 rounds (where f is the number of crashed nodes out of n), all non-faulty processes have reached consensus? How does it work?
Yes, such an algorithm exists (see the sketch below).
It works as follows:
1.) each node pi starts with its initial value vi
2.) in round 1, each pi sends its vi to all other nodes
3.) in every round r > 1, each node pi sends all the new values it has received since the previous round to all other nodes
4.) in round f+1, each node pi decides on the minimum value among all the values it has received
Correctness can be proven by contradiction.
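A simplified simulation sketch under the stated assumptions (synchronous rounds, reliable channels); crashes are modelled as a node stopping before a round begins, and every alive node simply relays everything it knows each round:
```python
# Simplified simulation of the f+1-round crash-tolerant consensus algorithm.
def crash_consensus(initial, crashed_in_round, f):
    """initial: {node: value}; crashed_in_round: {round_number: set_of_nodes}."""
    known = {n: {v} for n, v in initial.items()}       # values each node has seen so far
    alive = set(initial)
    for r in range(1, f + 2):                          # rounds 1 .. f+1
        alive -= crashed_in_round.get(r, set())        # crashed nodes stop sending
        heard = set().union(*(known[n] for n in alive))
        for n in alive:                                # everyone receives what the others sent
            known[n] |= heard
    return {n: min(known[n]) for n in alive}           # decide on the minimum value seen

# Example: 4 nodes, at most f = 1 crash; node "d" crashes before round 1,
# so its value 0 is never seen and all survivors decide on the same value (1).
print(crash_consensus({"a": 3, "b": 1, "c": 2, "d": 0},
                      crashed_in_round={1: {"d"}}, f=1))
```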
How can consensus with byzantine nodes be achieved? Does such an algorithm for f >= n/3 exist? What are advantages and drawbacks of this method?
There exists no algorithm for f >= n/3. However, there exists an algorithm where consensus with byzantine nodes can be achieved if f < n/3.
How the algorithm works:
1.) exchange all information for f+1 rounds
2.) ignore all processes that provided inconsistent information
3.) let all processes decide based on the same input
Advantages:
-> it works for n > 3*f, which is optimal
-> it only takes f+1 rounds
-> it works for any input and not just binary input
Drawbacks:
-> the size of the messages increases exponentially
-> there are algorithms that solve the problem with fewer messages
Is achieving consensus in asynchronous systems possible?
FLP (Fischer-Lynch-Paterson) demonstrated in 1985 that no consensus can be guaranteed in an asynchronous communication system in the presence of any failure. This DOES NOT mean, that consensus can’t be reached!
In practice one can
-> use of reliable failure detectors
-> use of timeouts: if no progress is being made on deciding the next value, we wait until timeout, then start all steps all over again
-> Always a trade-off of safety and liveness
Consensus is impossible with byzantine failures if how many processes fail?
-> Consensus is never possible with byzantine failures
-> 1/3 of the amount of processes (or more)
-> Consensus is always possible despite byzantine failures
-> None of the previous answers is correct
A distributed consensus algorithm needs to satisfy the safety property. What does ‘safety’ refer to, exactly?
-> If the system runs long enough, all non-faulty nodes will eventually decide on the same value
-> Guarantee that two non-faulty nodes do not agree on an incorrect value
-> Guarantee that a node does not cause harm to the other nodes
-> At most one node executes a critical section at any time
-> Two non-faulty nodes will never decide on different values
What is the CAP-theorem?
CAP: Consistency Availability Partition tolerance
The theorem states:
It is impossible for a distributed system to simultaneously provide all three guarantees of CAP; only two can be provided at a time.
E.g.:
Amazon, Google -> availability is more important -> better latency -> more profit
Bank, flight booking -> consistency more important
What are the three steps in election algorithms?
1.) Initiation: one or more processes see that the leader is not responding and start sending messages to other processes to start the election
2.) Election: Each process compares its own attribute value (e.g. ID, CPU speed, battery…), decides which node is best suited, and chooses a leader
3.) Termination: all processes agree on who is the leader
How does the bully algorithm to elect a new leader work? What are its advantages and drawbacks?
Assumption: every process knows the ID of every other process, but does not know which one has crashed
A process Pi notices that the existing leader does not respond. It initiates the election algorithm as follows:
1.) Pi sends an “election” message to all processes with higher IDs than itself
2.) when a process Pj with j>i receives the message it answers with “take-over”
-> Pi does no longer contest in the election
-> Process Pj start again at 1.)
3.) if no one responds, Pi wins the election and sends a “coordinator” message to every process (see the sketch below)
Advantages:
-> simple and easy to implement
Disadvantages:
-> large message overhead of (N-1)*N/2 messages (N … number of processes), especially in the worst case when the process with the lowest ID detects the failure of the leader
-> complexity of the algorithm is O(N^2)
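A minimal sketch of one election round; the IDs are made-up examples, and the message exchange (election / take-over / coordinator) is abstracted into a single synchronous call:
```python
# Minimal sketch of one bully-election round.
def bully_election(initiator, alive_ids):
    higher = [p for p in alive_ids if p > initiator]   # "election" goes to higher IDs only
    if not higher:
        return initiator                   # nobody answers "take-over": initiator wins
    # every higher node that answers restarts the election itself; eventually
    # the highest alive ID wins and broadcasts the "coordinator" message
    return max(higher)

# Example: node 2 notices that the old leader (ID 7) has crashed.
print(bully_election(initiator=2, alive_ids=[1, 2, 4, 5]))   # -> 5
```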
How does the ring algorithm to elect a new leader work? What are its advantages and drawbacks?
Assumption: Ring topology around the processes and every process knows its predecessor and successor in the ring.
1.) Pi “builds” an election message (E), inserts its ID into it, and sends the message to the next node
2.) When process Pj receives the message, it appends its ID and forwards the message
-> if the next node is crashed, it finds the next alive node
3.) When the message gets back to the process Pi that started the election
-> Pi elects the process with the highest ID as coordinator
-> Pi sends a message type “coordinator” with the ID around the ring
Two rounds in the ring are needed to elect a new leader!
Advantages:
-> more efficient than the bully algorithm
Drawbacks:
-> requires a ring topology, which is not easy to repair and maintain
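A minimal sketch of the election round; the ring order and the crash set are made-up example values, and crashed nodes are simply skipped:
```python
# Minimal sketch of the ring election.
def ring_election(ring, initiator, crashed=()):
    ids = []                               # contents of the election message (E)
    i = ring.index(initiator)
    while True:
        node = ring[i % len(ring)]
        i += 1
        if node in crashed:                # next node crashed: find the next alive one
            continue
        if ids and node == initiator:      # message has travelled the full circle
            break
        ids.append(node)                   # append own ID and forward the message
    leader = max(ids)                      # highest ID becomes the coordinator
    return leader                          # a second round announces the "coordinator"

print(ring_election(ring=[3, 7, 1, 9, 4], initiator=1, crashed={7}))   # -> 9
```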
What happens when using the ring algorithm and the initiator of the election fails after sending the message? How can this be fixed?
If the initiator fails after sending the “election” message, the message circulates forever! -> liveness violated
Solved by:
-> Adding timeouts: restart the election after some time and fix the ring
-> the predecessor (or successor) of the would-be leader detects the failure and starts a new election run
-> use a failure-detector
What defines a smart object and differentiates it from a traditional embedded system?
-> They are able to communicate
-> They usually have very constrained energy sources
If a smart object is deprived of its ability to communicate, it is no longer able to fulfill its purpose.
What are the main challenges in interconnecting smart objects? How can they be addressed?
-> Energy constraints: even the use of low-power wireless radio transceivers is insufficient to guarantee long-lasting batteries.
Usage of e.g. radio duty cycling technology -> drawback: issues with device discovery, rendezvous, host-to-router interaction…
-> How to cope with lossy links and limited range?
Low-power wireless technologies trade communication performance for energy efficiency. They have high bit error rates and short packets.
-> How to overcome MTU limitations and header overhead?
IPv6 has a large header overhead -> efficient header compression is needed.
-> Suitable application layer protocols are needed for smart objects.
Why are TCP, HTTP or FTP not well suited on top of IPv6 on smart objects?
HTTP is based on TCP, which requires multiple packets for connection establishment and reliability. Moreover, HTTP has large headers that consume bandwidth and energy!
What does 6LoWPAN enable and what is it?
It is an adaptation layer that enables IPv6 over low-power wireless technologies. It performs fragmentation and reassembly, header compression, optimized neighbour discovery, and context dissemination.
What is RPL?
It is a routing protocol for low-power and lossy networks.
What is CoAP?
It is an application layer protocol that provides a RESTful web service for smart objects. It uses UDP as the transport layer.
What is ContikiMAC?
It is a duty-cycling MAC protocol that allows smart objects to sleep most of the time and wake up periodically to check for incoming packets.