What are advantages and drawbacks of centralized embedded systems?
Advantages:
-> high performance and security
-> suitable for small-scale systems
-> simple programming model
Drawbacks:
-> does not scale well
-> confined location of components
What are advantages and drawbacks of decentralized embedded systems?
Advantages:
-> higher flexibility
-> easier cabling than in centralized systems
Drawbacks:
-> one single control unit does not scale well
-> lower performance than centralized systems
-> reliability and bandwidth of the communication network become concerns
What are advantages and drawbacks of distributed systems? What does it look like?
Distributed systems introduce more control units, where each of them is directly connected to some components (sensors and actuators). The system is built by connecting nodes that cooperate and provide services for each other.
-> High performance
-> High scalability
-> more flexible reconfiguration
-> improved debugging (use a control unit on the network to debug and diagnose)
-> Modularity
-> physical distribution (allows placement of computing power near the occurring events)
-> easier maintainability
Drawbacks:
-> Complexity
-> network may be unreliable (bandwidth is not infinite)
-> latency of communication is not zero
-> dealing with faults gets difficult
-> mutual exclusion to a shared resource may cause problems
What are some definitions of distributed systems?
-> A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another, in order to achieve a common goal.
-> Multiple computers interconnected by a network that share some common state and that cooperate to achieve some common goal
What are some examples for distributed systems?
-> modern cars
-> underwater robots
-> city-wide air quality measurement
-> wireless home automation systems
What is a big challenge with distributed systems regarding clocks? What are clock skew and clock drift?
Each node has its own clock. Due to different oscillators with manufacturing variations, temperature changes, or other effects, the frequency of each oscillator may vary over time.
-> clocks may run at different rates than a reference clock
Clock skew: difference between the reading of two clocks
clock drift: relative difference in clock frequency
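A quick back-of-the-envelope illustration of how drift accumulates; the 50 ppm drift rate is an assumed example value, not from the lecture:
```python
# Worked example (assumed numbers): deviation of a drifting clock from a
# reference clock after one day.
drift_ppm = 50                        # assumed drift of 50 parts per million
seconds_per_day = 24 * 3600
drift_per_day = drift_ppm * 1e-6 * seconds_per_day
print(f"Drift after one day: {drift_per_day:.2f} s")   # ~4.32 s
```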
Why is time synchronization needed in distributed systems?
-> Determine the correct temporal ordering of events
-> serialization of concurrent access to shared objects/resources
-> synchronization between senders and receivers of messages
What is election in a distributed system? What is important?
Embedded systems often have redundant components (primary + backup) -> how to determine which one is the primary and which ones are backups? -> Who is the leader?
Election! -> Embedded systems are often self-organized.
Example: Nodes in a sensor network may…
-> need to elect a new sink when the previous one fails
-> need to elect a new node as cluster head (better load balancing, better coverage, mobility affects range)
Important:
New leader elected -> did everyone agree? Was everyone correctly informed about this?
Only one leader should exist!
Which nodes are called “byzantine”?
-> Those whose communication channel is unreliable
-> Those whose latency is not zero
-> Those who have a large clock drift
-> Those who can have an arbitrary behaviour
-> None of the previous choices
What are the underlying causes of the classical two generals’ problem?
-> The unreliability of the communication medium
-> The generals do not trust each other
-> The communication latency is not zero
-> The generals do not speak the same language
-> The communication between the generals is asynchronous
-> The two generals are unable to send acknowledgements
Which of these are NOT advantages of a distributed embedded architecture?
-> Easier error identification
-> Independence of failures among nodes
-> Speeding-up of the data processing
-> Simpler programming model
-> Error-containment within nodes
-> Easier node replacement
How to deal with clock drift?
Do NOT set time back-/forward! Gradual clock correction until the clock is synchronized again!
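A minimal sketch of the idea of gradual (slewed) correction, assuming a hypothetical clock that can be nudged by a small amount per tick; the window and tick lengths are made-up values:
```python
# Minimal sketch: amortize the measured offset over many small per-tick
# adjustments instead of one jump backward/forward.
def slew_corrections(offset_s, window_s=100.0, tick_s=1.0):
    ticks = int(window_s / tick_s)
    per_tick = offset_s / ticks
    for _ in range(ticks):
        yield per_tick                 # small delta applied to the local clock each tick

# Example: a clock that is 0.5 s ahead is slowed by 5 ms per tick for 100 ticks.
print(sum(slew_corrections(-0.5)))     # ≈ -0.5 (total correction over the window)
```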
What is external clock synchronization?
It is the process of synchronizing to a well-known clock external to the system, e.g. a time server or a machine with an accurate (e.g. atomic) time source.
Explain Cristian’s algorithm for external time synchronization. How is the accuracy computed?
Cristian’s algorithm: It measures the round-trip time (RTT) of the message exchange and estimates the delay in each direction. The RTT is the time it takes for a message to travel from the client to the server and back. It assumes that network delays are symmetric (which is often not the case in the real world, but often close enough).
The new time is computed as T_new = T_server + RTT/2.
The accuracy can be calculated with Tmin, the minimum message transit time: the error bound is ±(RTT/2 − Tmin).
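A minimal sketch of the computation; the function name and the timestamp values are made-up examples, not lecture code:
```python
# Minimal sketch of Cristian-style clock adjustment (made-up timestamps).
def cristian_adjust(t_request_sent, t_server, t_response_received, t_min=0.0):
    """Return (new_time, accuracy_bound), assuming symmetric network delay."""
    rtt = t_response_received - t_request_sent
    new_time = t_server + rtt / 2          # server time plus estimated one-way delay
    accuracy = rtt / 2 - t_min             # error bound: +/- (RTT/2 - Tmin)
    return new_time, accuracy

# Example: request sent at local t = 100.000 s, server reports 100.020 s,
# response received at local t = 100.010 s, minimum transit time 2 ms.
print(cristian_adjust(100.000, 100.020, 100.010, t_min=0.002))
# ≈ (100.025, 0.003)
```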
How does Berkeley algorithm work? What is it used for?
It is used for internal synchronization. It assumes that no machine has an accurate time source.
1.) The master polls each slave periodically, asking for its time
2.) When the results are in, compute the average
3.) Send the offset by which each clock needs to be adjusted
-> It also excludes faulty clocks from the time average! -> in the example, C was excluded from the averaging
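A minimal sketch of the Berkeley averaging step; the node names, the example readings, and the fault threshold are assumptions for illustration:
```python
# Minimal sketch of the Berkeley averaging step with exclusion of faulty clocks.
def berkeley_offsets(master_time, slave_times, max_deviation=10.0):
    """Compute per-node offsets; readings deviating too much are excluded as faulty."""
    all_times = {"master": master_time, **slave_times}
    good = {n: t for n, t in all_times.items()
            if abs(t - master_time) <= max_deviation}   # drop outliers from the average
    avg = sum(good.values()) / len(good)
    return {n: avg - t for n, t in all_times.items()}   # offset each node must apply

# Example: node "C" is 300 s off and is excluded from the average.
print(berkeley_offsets(3.0, {"A": 0.0, "B": 6.0, "C": 300.0}))
# -> {'master': 0.0, 'A': 3.0, 'B': -3.0, 'C': -297.0}
```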
What is the Network Time Protocol (NTP) and how does it work?
It is a protocol for external time synchronization. It is intended to synchronize all participating computers to within a few milliseconds of Coordinated Universal Time (UTC). It uses a hierarchical structure of servers and clients where each server has a stratum level that indicates its distance from a reference clock.
-> The nodes send and receive timestamps via UDP, because TCP retransmission would add variable delays.
-> messages contain slots for four timestamps
What is the difference between Christian’s algorithm and NTP?
In NTP, the server also sends the times when it received and sent the message. This helps the client to adjust its clock more accurately.
When NTP is used over a WAN it usually achieves millisecond accuracy, as it cannot mitigate significant MAC-layer delays.
When synchronizing over a LAN, one can achieve even better synchronization by performing timestamping at the MAC layer.
Calculate RTT and offset in the following problem with NTP
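The concrete numbers of this exercise are not in the text; as a sketch with made-up timestamps T1 (client send), T2 (server receive), T3 (server send), T4 (client receive), the standard NTP formulas look like this:
```python
# Standard NTP formulas, illustrated with made-up timestamps.
def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated offset of the server clock vs. the client
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay (RTT)
    return offset, delay

# Example: offset = (0.1 + (-0.3)) / 2 = -0.1 s, delay = 0.5 - 0.1 = 0.4 s,
# i.e. the client's clock is roughly 0.1 s ahead of the server's.
print(ntp_offset_delay(t1=10.0, t2=10.1, t3=10.2, t4=10.5))
```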
What is the Precision Time Protocol (PTP)?
-> designed to synchronize clocks on a local area network
-> a follower device synchronizes to a reference device called the PTP Grandmaster
-> achieves a synchronization of ±400 ns! -> NTP on a LAN achieves only 1–2 ms
Which of the cuts C’, C’’ and C’’’ in the example are consistent?
C’ is inconsistent -> because e33 precedes e31 and is not in the cut
C’’ is inconsistent -> because e34 precedes e22 and is not in the cut
C’’’ is consistent
What is a logical clock? How does it work? What is it used for?
It is used to construct a consistent global state. It assigns timestamps to events that are not absolute, but relative to each other.
Each process pi keeps a local counter (clock) LCi initialized at 0. LCi counts how many events in a distributed computation causally preceded the current event at pi.
Rules for updating LCi:
pi increments LCi when executing an instruction or a send operation
Every message sent carries its timestamp TS
Whenever pi receives a message m carrying TS, it computes LCi = max(LCi,TS)+1
Example:
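As an illustration (a hypothetical helper class, not the lecture example), a minimal sketch of the update rules:
```python
# Minimal sketch of Lamport logical clocks.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1          # increment when executing an instruction
        return self.time

    def send(self):
        self.time += 1          # increment, then attach the timestamp to the message
        return self.time        # timestamp TS carried by the message

    def receive(self, ts):
        self.time = max(self.time, ts) + 1   # LCi = max(LCi, TS) + 1
        return self.time

# Example: p1 sends to p2, which has seen fewer events so far.
p1, p2 = LamportClock(), LamportClock()
p1.local_event()                 # p1: 1
ts = p1.send()                   # p1: 2, message carries TS = 2
print(p2.receive(ts))            # p2: max(0, 2) + 1 = 3
```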
What are limitations of logical clocks? How can it be solved?
-> One cannot distinguish whether two events are concurrent or causally related using logical clocks
-> Can be solved by vector clocks!
Example: p0 is an external monitor that tries to build a consistent global state.
In one scenario e11 and e22 are concurrent (no causal path between them); in another scenario e11 and e22 are causally related -> BUT -> the timestamps observed at p0 are the same in both cases!
How do vector clocks (VC) work? What are the rules for updating the VC?
Each process pi uses a vector VCi[1…N] of integer clocks, where N is the number of processes in the system. Each entry of a vector clock corresponds to a process. The j-th element of VCi, i.e. VCi[j], holds the number of events that process i has observed from process j. Thus, VCi[j] is the logical clock value of process j’s events that process i is aware of.
Rules:
pi increments its own entry VCi[i] when executing a local event or a send operation
Every message sent carries the sender’s full vector clock as its timestamp VT
Whenever pi receives a message carrying VT, it sets VCi[j] = max(VCi[j], VT[j]) for all j and then increments VCi[i]
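A minimal sketch of these update rules (hypothetical helper functions; plain Python lists serve as vector clocks):
```python
# Minimal sketch of vector clock updates.
def new_vc(n):
    return [0] * n                      # one entry per process

def local_or_send(vc, i):
    vc[i] += 1                          # pi increments its own entry VCi[i]
    return list(vc)                     # copy of the VC attached to an outgoing message

def receive(vc, i, msg_vc):
    for j in range(len(vc)):
        vc[j] = max(vc[j], msg_vc[j])   # element-wise maximum with the message's VC
    vc[i] += 1                          # then count the receive event itself

# Example with N = 2: p0 sends a message, p1 receives it.
vc0, vc1 = new_vc(2), new_vc(2)
msg = local_or_send(vc0, 0)             # p0: [1, 0]
receive(vc1, 1, msg)                    # p1: [1, 1]
print(vc1)                              # [1, 1]
```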
How can one distinguish between causal precedence and concurrency with VC?
-> e causally precedes e' iff VC(e)[k] <= VC(e')[k] for every k and the two timestamps differ
-> e and e' are concurrent iff neither VC(e) <= VC(e') nor VC(e') <= VC(e) holds
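A minimal sketch of this comparison, with made-up vector timestamps:
```python
# Minimal sketch of comparing vector-clock timestamps.
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

def causally_precedes(a, b):
    return leq(a, b) and a != b            # a -> b

def concurrent(a, b):
    return not leq(a, b) and not leq(b, a)

# e1 causally precedes e2, while e3 is concurrent to e1.
e1, e2, e3 = [1, 0, 0], [2, 1, 0], [0, 0, 1]
print(causally_precedes(e1, e2), concurrent(e1, e3))   # True True
```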
How can one determine if a cut is consistent with VC?
A cut is consistent if there are no pairwise inconsistent events.
How do you build a consistent global state?
passive monitoring -> each process pi sends to p0 a timestamp of each event it executed (e.g. VC timestamps)
active monitoring -> when p0 wants to find out the system’s state, it asks all processes pi to send their history and builds a distributed snapshot (Chandy-Lamport algorithm) -> used to record a consistent global state for an asynchronous system
What are the steps in the Chandy-Lamport algorithm?
1.) A monitor p0 sends a “Take snapshot” message to all processes
2.) If process pi receives such a message for the first time, it
records its state and stops any distributed computation activity
relays the “take snapshot” message on all of its outgoing channels
starts recording a state of its incoming channels
3.) when pi receives a “take snapshot” message from process pj, it stops recording the state of the channel between itself and pj
4.) When pi has received a “take snapshot” message from all processes and from p0, it stops recording the snapshot and sends it to p0
This algorithm only constructs consistent snapshots!
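A very simplified sketch of the per-process marker handling in the classic Chandy-Lamport formulation; the process IDs, the peer list and the `send` callback are assumptions for illustration, and the lecture variant additionally stops the distributed computation:
```python
# Simplified sketch of Chandy-Lamport-style marker handling at one process.
class SnapshotProcess:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers              # neighbours on incoming/outgoing channels
        self.state = None               # recorded local state (None = not recorded yet)
        self.channel_rec = {}           # recorded in-flight messages per incoming channel
        self.recording_from = set()

    def on_marker(self, sender, local_state, send):
        if self.state is None:                      # first "take snapshot" message
            self.state = local_state                # 1. record own state
            for p in self.peers:                    # 2. relay the marker on all outgoing channels
                send(self.pid, p)
            # 3. start recording all incoming channels except the one the marker arrived on
            self.recording_from = set(self.peers) - {sender}
            self.channel_rec = {p: [] for p in self.recording_from}
        else:                                       # marker already seen:
            self.recording_from.discard(sender)     # stop recording that channel

    def on_message(self, sender, msg):
        if sender in self.recording_from:           # message was in flight during the snapshot
            self.channel_rec[sender].append(msg)

# Example: the monitor "p0" triggers the snapshot at p1, whose only peer is p2.
p1 = SnapshotProcess("p1", peers=["p2"])
p1.on_marker("p0", local_state={"x": 42}, send=lambda src, dst: None)
p1.on_message("p2", "in-flight msg")
print(p1.state, p1.channel_rec)   # {'x': 42} {'p2': ['in-flight msg']}
```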
Distributed mutual exclusion - what is the problem?
-> Ensure that only one process at a time executes a critical section
-> processes communicate with each other only using messages
What are requirements for the solution to the problem of distributed mutual exclusion? What makes a good solution?
-> Safety: at most 1 process executes a critical section at any time
-> Liveness: every request for a critical section is eventually granted
-> Ordering: requests are granted in the order they were made
A good solution is evaluated by:
-> number of messages sent: to acquire access, to release access
-> delay: to acquire access, to release access
-> throughput: number of operations per second
How can distributed mutual exclusion algorithms be classified?
-> permission-based: a process that wants to access a shared resource requests permission from one or more coordinators
-> token-based: each shared resource has a token. The token is circulated among all processes and only a process that holds the token can access the resource.
How does a centralized permission-based algorithm work? What are advantages and drawbacks of this method?
One process is elected as coordinator for the resource. If a process, e.g. P1, wants access to the resource, it sends a request to the coordinator. If the resource is free, the coordinator sends “grant”; otherwise the requesting process is put into a queue and the coordinator does not respond. When P1 finishes using the resource, it is removed as the current owner and the coordinator sends “grant” to the first process in the queue (see the sketch below).
Advantages:
-> guarantees mutual exclusion
-> simple to implement
Drawbacks:
-> poor fault tolerance: the centralized coordinator is a single point of failure!
-> poor performance: a centralized algorithm may become a bottleneck (e.g. if the coordinator is overwhelmed with requests)
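A minimal sketch of such a coordinator (hypothetical message interface; grants are delivered via a callback):
```python
# Minimal sketch of a centralized mutual-exclusion coordinator.
from collections import deque

class Coordinator:
    def __init__(self):
        self.owner = None
        self.queue = deque()

    def request(self, pid, send_grant):
        if self.owner is None:
            self.owner = pid
            send_grant(pid)            # resource free: grant immediately
        else:
            self.queue.append(pid)     # otherwise queue silently (no reply)

    def release(self, pid, send_grant):
        assert pid == self.owner
        self.owner = self.queue.popleft() if self.queue else None
        if self.owner is not None:
            send_grant(self.owner)     # pass the resource to the next waiter

# Example run: P1 gets the grant at once, P2 only after P1 releases.
c = Coordinator()
c.request("P1", lambda p: print("grant ->", p))
c.request("P2", lambda p: print("grant ->", p))
c.release("P1", lambda p: print("grant ->", p))
```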
How does a decentralized permission-based algorithm work? What are advantages and drawbacks of this method?
This method does not depend on a single coordinator but on n coordinators, where each has a replica of the resource. When a process wants access to a resource, it needs to get a majority vote (“permission granted”) from m > n/2 coordinators. If a coordinator has already voted, it sends a “permission denied” message. The coordinators answer concurrently with “access granted” or “denied” (see the sketch below).
-> one can also choose m > n/2 + x, with x = 1, 2, 3, … to increase fault tolerance
Advantages:
-> no central bottleneck
-> improved performance
Drawbacks:
-> overhead: needs more votes -> more messages to be sent
-> mutual exclusion cannot be deterministically guaranteed, but it can be guaranteed probabilistically (the probability of violating mutual exclusion in practice is usually very small, ~10^(-40))
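A minimal sketch of the majority-vote idea; the coordinator objects are an abstraction, and a real system would exchange vote messages and give back partial votes when no majority is reached:
```python
# Minimal sketch of quorum-based (decentralized) permission voting.
class ReplicaCoordinator:
    def __init__(self):
        self.granted_to = None

    def vote(self, pid):
        if self.granted_to is None:        # vote only if we have not voted yet
            self.granted_to = pid
            return True                    # "permission granted"
        return False                       # "permission denied"

def try_acquire(pid, coordinators):
    votes = sum(c.vote(pid) for c in coordinators)
    return votes > len(coordinators) // 2  # need m > n/2 positive votes

coords = [ReplicaCoordinator() for _ in range(5)]
print(try_acquire("P1", coords))   # True: P1 collects all 5 votes
print(try_acquire("P2", coords))   # False: every coordinator already voted for P1
```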
How does the Ricart & Agrawala’s distributed algorithm work? What are advantages and drawbacks of this method?
When a process wants to enter a critical section (CS), it broadcasts a message with (CS_name, process_id, current_time) to all processes, including itself. After sending the message it waits for an OK from all other processes -> only upon receiving all OKs does it enter the CS.
If two processes want to access the resource at the same time, the process with the smaller timestamp wins and gets access. If a process is currently in the CS, it does not reply but queues the request; after finishing the CS it sends an OK to all processes in its queue (see the sketch below).
Advantages:
-> no central bottleneck
-> fewer messages than the decentralized algorithm
Drawbacks:
-> the system is exposed to n points of failure
-> if a node fails to respond, the entire system locks up
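A minimal sketch of the reply rule (single critical section, hypothetical message layer; broadcasting the request and collecting the OKs is abstracted away):
```python
# Minimal sketch of the Ricart & Agrawala reply rule.
class RAProcess:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.state = "RELEASED"        # RELEASED, WANTED, or HELD
        self.my_request = None         # (timestamp, pid) of our pending request
        self.deferred = []             # requests answered only after leaving the CS

    def request_cs(self, broadcast_request):
        self.clock += 1
        self.state = "WANTED"
        self.my_request = (self.clock, self.pid)
        broadcast_request(*self.my_request)   # then wait for all OKs -> HELD

    def on_request(self, ts, sender, send_ok):
        self.clock = max(self.clock, ts) + 1
        if self.state == "HELD" or (
            self.state == "WANTED" and self.my_request < (ts, sender)
        ):
            self.deferred.append(sender)      # defer the OK: our own access comes first
        else:
            send_ok(sender)                   # not interested (or lower priority): OK at once

    def exit_cs(self, send_ok):
        self.state = "RELEASED"
        for p in self.deferred:               # release everyone we made wait
            send_ok(p)
        self.deferred.clear()

# Example: p1 wants the CS; a later request from process 2 is deferred.
p1 = RAProcess(1)
p1.request_cs(lambda ts, pid: None)
p1.on_request(ts=5, sender=2, send_ok=print)
print(p1.deferred)                            # [2]
```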
How does the token-ring algorithm work? What are advantages and drawbacks of this method?
Each resource is associated with a token. One builds a logical ring in software out of an unordered group of processes in the network. The token is circulated, and the process that currently holds the token can access the resource. If the process does not want to access the resource or has finished using it, the token is passed to the next process in the ring (see the sketch below).
Advantages:
-> provides deterministic mutual exclusion
-> avoids starvation (each process will receive the token!)
Drawbacks:
-> high message overhead (fast circulation of the token even when it is not needed)
-> if the token is lost it must be regenerated! -> token losses are hard to detect
-> dead processes must be purged from the ring!
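A minimal sketch of token circulation; the ring order and the "wants the resource" flags are made-up example values:
```python
# Minimal sketch of token circulation in a logical ring.
def circulate(ring, wants_resource, rounds=1):
    for _ in range(rounds):
        for pid in ring:                              # the token travels along the ring
            if wants_resource.get(pid, False):
                print(f"{pid} holds the token and enters the critical section")
            # a process that does not need the resource passes the token on immediately

circulate(ring=["P1", "P2", "P3", "P4"], wants_resource={"P3": True})
```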
Which algorithm always suffers from a very large number of messages exchanged?
-> Centralized Mutual Exclusion
-> Decentralized Mutual Exclusion
-> Distributed Mutual Exclusion
-> Token Ring for Mutual Exclusion
Which algorithm is the most robust to node failure?
Which algorithm is the most efficient? (i.e., sends the least number of messages)
Which algorithm is the easiest to implement?
Which of these problems are NOT related to the clock drift between distributed embedded nodes?
-> Determining the correct ordering of events
-> Executing several independent tasks on multiple control units
-> Detecting and finding a fault
-> Serialization of concurrent access to shared resources
-> Localization of distributed embedded systems
-> Coordination of joint activity across multiple robots
What is the consensus problem?
It is a fundamental challenge in distributed embedded systems, where a group of nodes has to agree on a common value or action despite the possibility of communication failures or node failures. It is important for many applications that require coordination, synchronization, or consistency among multiple nodes.
How can consensus be defined? Can it be achieved?
-> agreement: all nodes decide on the same value or action
-> validity: the decided value or action must be the input of at least one node
-> termination: every non-faulty node eventually comes to a decision
With unreliable communication, achieving consensus is provably impossible, as any message, including the last one, could be lost (cf. the two generals’ problem).
What can be done to increase the probability of achieving consensus?
-> one could simply send more messages -> expensive and undesirable
-> relax the assumptions and change the model
-> Consider bounded delays
BUT -> still no guarantee to achieve consensus
What types of node failures are known to you?
-> crash failure: the node abruptly stops responding to other nodes
-> byzantine failure: the node can have an arbitrary behaviour, such as sending meaningless or contradictory data
What is an f-resilient consensus algorithm?
An algorithm is f-resilient if it still solves consensus when up to f of the n nodes fail (i.e., at least n-f nodes are correct).
Assume that in a system all n nodes are synchronized and communicate reliably (within certain bounds). Does an algorithm exist in which, after f+1 rounds (where f is the number of crashed nodes out of n), all non-faulty processes have reached consensus? How does it work?
Yes, such an algorithm exists (see the sketch below).
It works as follows:
1.) each node pi starts with its initial value vi
2.) in round 1, each pi sends its vi to all other nodes
3.) in every round r > 1, each node pi sends all the new values it has received since the previous round to all other nodes
4.) in round f+1, each node pi decides on the minimum value among all the values it has received
Correctness can be proven by contradiction.
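A simplified simulation sketch under the stated assumptions (synchronous rounds, reliable channels); crashes are modelled as a node stopping before a round begins, and every alive node simply relays everything it knows each round:
```python
# Simplified simulation of the f+1-round crash-tolerant consensus algorithm.
def crash_consensus(initial, crashed_in_round, f):
    """initial: {node: value}; crashed_in_round: {round_number: set_of_nodes}."""
    known = {n: {v} for n, v in initial.items()}       # values each node has seen so far
    alive = set(initial)
    for r in range(1, f + 2):                          # rounds 1 .. f+1
        alive -= crashed_in_round.get(r, set())        # crashed nodes stop sending
        heard = set().union(*(known[n] for n in alive))
        for n in alive:                                # everyone receives what the others sent
            known[n] |= heard
    return {n: min(known[n]) for n in alive}           # decide on the minimum value seen

# Example: 4 nodes, at most f = 1 crash; node "d" crashes before round 1,
# so its value 0 is never seen and all survivors decide on the same value (1).
print(crash_consensus({"a": 3, "b": 1, "c": 2, "d": 0},
                      crashed_in_round={1: {"d"}}, f=1))
```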
How can consensus with byzantine nodes be achieved? Does such an algorithm for f >= n/3 exist? What are advantages and drawbacks of this method?
There exists no algorithm for f >= n/3. However, there exists an algorithm where consensus with byzantine nodes can be achieved if f < n/3.
How the algorithm works:
1.) exchange all information for f+1 rounds
2.) ignore all processes that provided inconsistent information
3.) let all processes decide based on the same input
Advantages:
-> it works for n > 3*f, which is optimal
-> it only takes f+1 rounds
-> it works for any input and not just binary input
Drawbacks:
-> the size of the messages increases exponentially
-> there are algorithms that solve the problem with fewer messages
Is achieving consensus in asynchronous systems possible?
FLP (Fischer-Lynch-Paterson) demonstrated in 1985 that no consensus can be guaranteed in an asynchronous communication system in the presence of any failure. This DOES NOT mean, that consensus can’t be reached!
In practice one can
-> use of reliable failure detectors
-> use of timeouts: if no progress is being made on deciding the next value, we wait until timeout, then start all steps all over again
-> Always a trade-off of safety and liveness
Consensus is impossible with byzantine failures if how many processes fail?
-> Consensus is never possible with byzantine failures
-> 1/3 of the amount of processes (or more)
-> Consensus is always possible despite byzantine failures
-> None of the previous answers is correct
A distributed consensus algorithm needs to satisfy the safety property. What does ‘safety’ refer to, exactly?
-> If the system runs long enough, all non-faulty nodes will eventually decide on the same value
-> Guarantee that two non-faulty nodes do not agree on an incorrect value
-> Guarantee that a node does not cause harm to the other nodes
-> At most one node executes a critical section at any time
-> Two non-faulty nodes will never decide on different values
What is the CAP-theorem?
CAP: Consistency Availability Partition tolerance
The theorem states:
It is impossible for a distributed system to simultaneously provide all three guarantees of CAP; only two can be provided at a time.
E.g.:
Amazon, Google -> availability is more important -> better latency -> more profit
Bank, flight booking -> consistency more important
What are the three steps in election algorithms?
1.) Initiation: one or more processes see that the leader is not responding and start sending messages to other processes to start the election
2.) Election: Each process compares its own attribute value (e.g. ID, CPU speed, battery…), decides which node is best suited, and chooses a leader
3.) Termination: all processes agree on who is the leader
How does the bully algorithm to elect a new leader work? What are its advantages and drawbacks?
Assumption: every process knows the ID of every other process, but does not know which one has crashed
A process Pi notices that the existing leader does not respond. It initiates the election algorithm as follows:
1.) Pi sends an “election” message to all processes with higher IDs than itself
2.) when a process Pj with j>i receives the message it answers with “take-over”
-> Pi does no longer contest in the election
-> Process Pj start again at 1.)
3.) if no one responds, Pi wins the election and sends a “coordinator” message to every process (see the sketch below)
Advantages:
-> simple and easy to implement
Disadvantages:
-> large message overhead of (N-1)*N/2 messages (N … number of processes), especially in the worst case when the process with the lowest ID detects the failure of the leader
-> complexity of the algorithm is O(N^2)
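A minimal sketch of one election round; the IDs are made-up examples, and the message exchange (election / take-over / coordinator) is abstracted into a single synchronous call:
```python
# Minimal sketch of one bully-election round.
def bully_election(initiator, alive_ids):
    higher = [p for p in alive_ids if p > initiator]   # "election" goes to higher IDs only
    if not higher:
        return initiator                   # nobody answers "take-over": initiator wins
    # every higher node that answers restarts the election itself; eventually
    # the highest alive ID wins and broadcasts the "coordinator" message
    return max(higher)

# Example: node 2 notices that the old leader (ID 7) has crashed.
print(bully_election(initiator=2, alive_ids=[1, 2, 4, 5]))   # -> 5
```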
How does the ring algorithm to elect a new leader work? What are its advantages and drawbacks?
Assumption: Ring topology around the processes and every process knows its predecessor and successor in the ring.
1.) Pi “builds” an election message (E), inserts its ID into it, and sends the message to the next node
2.) When process Pj receives the message, it appends its ID and forwards the message
-> if the next node is crashed, it finds the next alive node
3.) When the message gets back to the process Pi that started the election
-> Pi elects the process with the highest ID as coordinator
-> Pi sends a message type “coordinator” with the ID around the ring
Two rounds in the ring are needed to elect a new leader!
Advantages:
-> more efficient than the bully algorithm
Drawbacks:
-> requires a ring topology, which is not easy to repair and maintain
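A minimal sketch of the election round; the ring order and the crash set are made-up example values, and crashed nodes are simply skipped:
```python
# Minimal sketch of the ring election.
def ring_election(ring, initiator, crashed=()):
    ids = []                               # contents of the election message (E)
    i = ring.index(initiator)
    while True:
        node = ring[i % len(ring)]
        i += 1
        if node in crashed:                # next node crashed: find the next alive one
            continue
        if ids and node == initiator:      # message has travelled the full circle
            break
        ids.append(node)                   # append own ID and forward the message
    leader = max(ids)                      # highest ID becomes the coordinator
    return leader                          # a second round announces the "coordinator"

print(ring_election(ring=[3, 7, 1, 9, 4], initiator=1, crashed={7}))   # -> 9
```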
What happens when using the ring algorithm and the initiator of the election fails after sending the message? How can this be fixed?
If the initiator fails after sending the “election” message, the message circulates forever! -> liveness violated
Solved by:
-> Adding timeouts: restart the election after some time and fix the ring
-> the predecessor (or successor) of the would-be leader detects the failure and starts a new election run
-> use a failure-detector
What defines a smart object and differentiates it from a traditional embedded system?
-> They are able to communicate
-> They usually have very constrained energy sources
If a smart object is deprived of its ability to communicate, it is no longer able to fulfill its purpose.
What are the main challenges in interconnecting smart objects? How can they be addressed?
-> Energy constraints: even the use of low-power wireless radio transceivers is insufficient to guarantee long-lasting batteries.
Usage of e.g. radio duty cycling technology -> drawback: issues with device discovery, rendezvous, host-to-router interaction…
-> How to cope with lossy links and limited range?
Low-power wireless technologies trade communication performance for energy efficiency. They have high bit error rates and short packets.
-> How to overcome MTU limitations and header overhead?
IPv6 has a large header overhead -> efficient header compression is needed.
-> Suitable application layer protocols are needed for smart objects.
Why are TCP, HTTP or FTP not well suited on top of IPv6 on smart objects?
HTTP is based on TCP, which requires multiple packets for connection establishment and reliability. Moreover, HTTP has large headers that consume bandwidth and energy!
What does 6LoWPAN enable and what is it?
It is an adaptation layer that enables IPv6 over low-power wireless technologies. It performs fragmentation and reassembly, header compression, optimized neighbour discovery, and context dissemination.
What is RPL?
It is a routing protocol for low-power and lossy networks.
What is CoAP?
It is an application layer protocol that provides a RESTful web service for smart objects. It uses UDP as the transport layer.
What is ContikiMAC?
It is a duty-cycling MAC protocol that allows smart objects to sleep most of the time and wake up periodically to check for incoming packets.