Communications

Buffl

Distributed Systems Dependability

von Carmen F.

Layer Model

TCP/IP layers

Link Layer

Network Layer

Transport Layer

Application Layer

Link Layer

For transporting Protocol Data Units (PDUs) inside local networks

Important protocols: Ethernet, 802.3, 802.11 (WLAN), PPP

Network Layer

For transporting PDUs between local networks

Important protocols: IP, ARP, ICMP, IGMP

Transport Layer

Provides end-to-end transfer of PDUs between applications residing on end systems, possibly in a reliable way

Important protocols: TCP, UDP, SCTP

Application Layer

Contains protocols for providing network services to applications/users and for managing network communication through applications

Important protocols: HTTP, DNS, DHCP, SMTP, SNMP, SIP, RTP, …

TCP/IP responsibilities

each network device is responsible for own implementation

network services are independent from end system

TCP/IP implementations on devices

A network device implements only a subset of the TCP/IP stack and specific protocols necessary for forwarding PDUs

End systems (hosts) implement the complete TCP/IP stack, but only those protocols that are necessary for end-to-end data transpor

Switches

Implement link layer only, forward frames between end systems in the same network

Router

implement link and network layer, forward packets in the direction of the destination network, employ routing protocols to find the “best” way

Link Layer – Network Topologies

Half Duplex

Send or Receive possible, but not at the same time

Typically due to usage of shared medium

Medium access protocol necessary

Full Duplex

Send and Receive possible, at the same time

Requires dedicated bidirectional links between devices

Requires switches for linking more than two systems

Link Layer – Forwarding Modes

Unique identifier: Media Access Control (MAC) addresses (6 Byte long, written in hexadecimal notation) as most used format

three modes for forwarding frames:

Unicast: Destination = single end system,
identified by its MAC address
Broadcast: Destination = all end systems on local network, uses special MAC address as destination (all-1)
Multicast: Destination = Group of end systems, identified by special MAC addresses, group management necessary

Network Layer – Services

Routing of packets: Forwarding of PDUs between local networks, based on structured network address -> Routed protocol (e. g. IP), Establishment of routes between routers -> Routing protocol (e. g. RIP, OSPF)

Fragmentation and reassembly: In case packet size > Maximum Transmission Unit (MTU) of link

Packet hop limiting

Multicast group management

Quality of Service (QoS)

Routing Algorithms

Determination of routes and distribution of routing information follows a specific routing algorithm

the two general classes

Distance Vector: uses a “distance” metric (e. g. number of intermediate routers, i. e. “hops”), each router distributes and receives information only to/from neighbour routers

Link State: uses arbitrary metrics, each router possesses a complete view of the network topology and cooperates with all other routers to establish and maintain this view

Transport Layer – Services

Multiplexing of data streams

Segmentation and Reassembly (SAR)

Flow control: Adapting data rate to receiver capacity

Congestion control: Adapting data rate to network capacity

End-to-End Protection

Communication System

Components:

sender:
- source
- source encoder
- channel encoder
- line encoder
- channel
  - noise
  - through space or time
receiver:
- line decoder
- channel decoder
- source decoder
- desination

Source Coding

purpose: reduce amount of data that needs to be transmitted

removing redundancy
compression (lossy or lossless)

assumption: channel is noiseless

Channel Coding

purpose: detection and possible recovery of errors

assumes a noise channel

errors are acounted for by intentonally adding beneficial redundancy in the data

Line Coding

purpose: convert data into physical representation

coding und modulation techniques

Forward Error Correction

application of channel coding

compensate for errors in digital communication

original PDU is augmented with redundant information

check data allows the receiver of the PDU to detect errors (and possible correct it)

Channel Models

used to model the occurrence and manifestation of errors

choosing the right model is crucial for evaluating the effectiveness

Binary Symmetric Channel

simplest model

input bits (xi)

output bits (yi)

xi is corrupred with a certain crossover probabilty (p)

where 𝑝 = 0 (or 𝑝 = 1) represents reliable communication without errors, and 𝑝 = 0.5 represents complete uncertaint

errors are indepented events (1-p)^n that message of length n is transmitted without error

received word is calculated using input and xor-ing it with error word

error word: every location of a bit that was affected by an error during transmission is set to 1

Binary Asymmetric Channel

unequal probabiliteis for bit flip from 0 to 1 or 1 to 0

A common source of these errors are stuck-at faults, where a wire or pin of a circuit is “stuck” at a specific (bit) value

A Z-channel is a special case of a BAC, where only errors that transform a ‘1’ into a ‘0’ can occur (fiber-optic loss is possible but not the generation)

Binary Erasure Channel

modification of the BSC

input: individual bits Xi

output: set of {0, 1, ? }

? = erasure symbol

probability is determiend for each bit independently

errors modelled by a BEC are called erasures

where the location of the error s known but neither the correct nor incorrect value can be determined

typically occur due to HW failures in storage systems

Block Codes

A (binary) (𝒏, 𝑵) block code is a set 𝐶 of binary words with the following characteristics:

𝐶 is a subset of the set of all binary words of length 𝑛, with the latter called the code space of 𝐶, i. e. 𝐶 ⊆ {0, 1}𝑛

The binary words in 𝐶 all satisfy some property required by the code and are called the code words of 𝐶

All other words in the code space, that do not satisfy the property, are non-code words

The code size 𝐶 is the number of code words in 𝐶 and is equal to 𝑁

Encoding

map data words to code words

set 𝐷 of data words consists of all 𝐾 binary words of length 𝑘, with 𝑘 ≥ 1

exactly one code word to each data word,

so it follows that 𝑁 = 𝐾 = 2𝑘

In general, the length of a code word is greater than the length of a data word 𝑛 > 𝑘

(information) rate 𝑅 = 𝑘/𝑛

n-k= redundancy

𝒓-repetition code

a code word is formed by repeating the data word 𝑟 times

R = 1/r

parity code

Count number of 1s
Append 1 if count of 1 is odd 0 otherwise

R = (n-1)/n

channel capacity

A very important for the usage of codes for achieving error-free communication

channel capacity 𝑐𝑎𝑝(𝐶ℎ) of a channel 𝐶ℎ is the maximum rate of information that can be communicated over the channel without error, i. e. the (mean) proportion of bits that can be received error-free in relation to the total number of bits transmitted and possibly corrupted by noise.

binary symmetric channel with a crossover probability 𝑝:

𝑐𝑎𝑝 𝐵𝑆𝐶(𝑝) = 1 − 𝐻(𝑝)

where 𝐻(𝑝) is the (binary) entropy function:

𝐻 𝑝 = − 𝑝 ∗ log2 𝑝 + 1 − 𝑝 ∗ log2(1 − 𝑝)

which basically represents the average “uncertainty” whether a received bit is actually the same as the bit that was sent or whether it was changed by noise during transmission. Note for example, that the entropy/uncertainty is greatest when 𝑝 = 0.5, i. e. when there is equal chance that a received bit was actually sent with the same value or not.

Shannon coding theorem

that for a channel with capacity 𝑐𝑎𝑝 there always exists a code that can reduce the probability of erroneous bits after decoding to an arbitrarily small value, as long as the rate of the code does not exceed the channel capacity, i.e. as longs as 𝑅 ≤ 𝑐𝑎𝑝. In particular, this means that with increasing code word length 𝑛, the number 𝑘 of data bits and therefore the amount of actual useful data transmitted can be increased in tandem, as long as 𝑘/𝑛 ≤ 𝑐𝑎𝑝, without decreasing the probability of a correct decoding.

Hamming Distance

number of bits in which two words differ from each other

Is: non-negativ, symetric and has a triangel inequality

Hamming weight: number of bits that different from 0 in a word

Decoding

return dataword for received word in code space

may be as easy as returning data bits embedded in c

when not equal to some code word then first step is to determine the correct code word

decoding error probability

𝑝𝑒𝑟𝑟(𝒄) for a given code word 𝒄 is the probability, that a received word is decoded to a code word 𝒄′ ≠ 𝒄, assuming that 𝒄 was originally sent

decoding algorithm

decoding algorithm tries to find a code word that maximises one of the following two probabilities:

The probability 𝑃 𝒓 | 𝒄 , which is the probability that the word 𝒓 is received, given that the code word 𝒄 was originally sent
The probability 𝑃 𝒄 | 𝒓 , which is the probability that the code word 𝒄 was originally sent, given that the word 𝒓 was received

Maximum Likelihood Decoder

A MLD chooses the code word 𝒄 as correct, which is the “most likely” explanation for the received word 𝒓, assuming some channel model (and associated error probabilities)

Error Handling Properties

The error correction capability is stated as the maximum number of errors that can be corrected

The error detection capability is stated as the maximum number of errors that can be detected (but not necessarily corrected)

The erasure recovery capability is stated as the maximum number of erasures that can be recovered

Error Correction

task of discovering the locations of erroneous bits in a received word 𝒓 which can then be corrected by simply inverting them

can always correct half min distance -1 number of errors

Error Detection

decoder only indicates that one or more erroneous bits are present in a received word, without determining the exact location of errors

Erasure Recovery

process of determining the correct value for each erasure position in a received word

achieved by comparing the received word 𝒓, including the erasures, against each code word 𝐜 ∈ 𝐶

FEC Issues

hard choosing the right model

overhead is directly proportional to databits, code word length

r-1 bits for each additonal erroneous bit

only feasible when the expected number of transmission errors is low

Checksum Principle

1) The sender takes a payload 𝒑𝒍𝒐𝒂𝒅 from some upper layer constructs a suitable header 𝒉𝒅𝒓 for it and concatenates both to create a PDU as 𝑷𝑫𝑼 = 𝒉𝒅𝒓 | 𝒑𝒍𝒐𝒂𝒅

2) The sender calculates a binary word 𝒄𝒉𝒌𝒔𝒖𝒎 = 𝑓(𝑷𝑫𝑼), called a checksum, from the PDU by using a suitable checksum algorithm 𝑓

> The function 𝑓 should be nonlinear, so that small changes (e. g. single bit flips) in the PDU’s data will lead to a different value for the checksum with a high probability

3) The sender constructs a frame to send 𝒔 by appending the checksum to the PDU, i. e. 𝒔 = 𝑷𝑫𝑼 | 𝒄𝒉𝒌𝒔𝒖𝒎, and sends 𝒔 over the communication link to the receiver

> Alternatively, the checksum might be included in the PDU’s header as a separate field

4) The receiver receives the frame 𝒔′ = 𝑷𝑫𝑼′| 𝒄𝒉𝒌𝒔𝒖𝒎′, which might or might not be equal to the originally sent frame 𝒔, due to transmission errors

5) The receiver calculates its own checksum 𝒄𝒉𝒌𝒔𝒖𝒎′′ = 𝑓(𝑷𝑫𝑼′), using the same function 𝑓 as the sender, and compares it with the received checksum 𝒄𝒉𝒌𝒔𝒖𝒎′ - this can result in one of two outcomes:

I. Both checksum values are equal, i. e. 𝒄𝒉𝒌𝒔𝒖𝒎′′ = 𝒄𝒉𝒌𝒔𝒖𝒎’; in this case, the frame is accepted as correct and processed by the receiver

II. The checksum values differ, i. e. 𝒄𝒉𝒌𝒔𝒖𝒎′′ ≠ 𝒄𝒉𝒌𝒔𝒖𝒎’; in this case, the whole (!) frame is treated as erroneous and discarded by the receiver

Cyclic codes

special class of linear block codes optimised for fast encoding and decoding of large data words - for example, the class of algorithms called Cyclic-Redundancy Check (CRC)

Arithmetic checksum algorithms

checksum is calculated by byte- or word-wise summation of the PDU data modulo some integer – examples are the Internet checksum used in IP, UDP and TCP as well as Fletcher’s checksum and its improvement the Adler checksum

Hash functions

strongly nonlinear functions that generate a fixed-size hash value from arbitrarily large input data – examples are the families of standardised hash functions called MD (Message Digest) and SHA (Secure HAsh)

ARQ Mechanism

alternative to FEC

can also handle frame loss

Automatic Repeat reQuest (ARQ)

sender sends checksum-protected frame and starts timer called Retransmission Time Out (RTO)
at receiver two scenarios
1. frame arrives without error and is accepted —> receiver send ACK back
2. frame arrives corrupted / is lost —> is discarded —> no ACK is sent
two scenarios at sender
1. ACK arrives before RTO expires —> sneder sends next frame
2. RTO expires —> sender resends frame

ARQ issues

receiver must distinguish betwwn new data frames and retransmission of already recevied data frames

RTO must be chosen with suitable value

retransmission must be bounded to a maximum number of tries —> infinite loop at sender

Stop-and-Wait

only one ACK outstanding at any time

link is therefore half-duplex

sequence number in header of frame distinguishes different frames

sequence number with two values is sufficient

Stop-and-Wait Delayed ACK

bevore sending an ACK receiver waits shortly for a data frame to sender and piggybacks the ACK with it

Group Communication Definition

primitive where a message is sent to a set of entities in a single (abstract) operation

The set of entities is called a group, with the entities as the group’s members

The sender of the message can but need not always be a member, depending on the “openness” of the group

In addition to send and receipt of messages, systems using group communication must support operations for membership management

Open group

senders can be non-members, i. e. messages can be “injected” into the group from the outside

Closed group

senders must be members of the group and receive all messages, including those sent by themselves

membership management

Each member maintains a view of the group, i. e. the set of entities that are currently members of the group

Changes to the view happen due to entities joining or leaving the group (or by experiencing a failure)

Static groups are groups where the view does not change and are much simpler to handle

Group Communication Uses

Service discovery and registration by sending respective queries to a group of entities (e. g. registration server)

Data and event dissemination to a set of entities

Management and monitoring of nodes (e. g. network health monitoring)

Facilitating fault tolerance via replication of services and resources

Implementation of GC

on top of a communication service that supports multicast (MC) communication

Using network layer multicasts, i. e. IP multicast, when supported by the network devices

not widely supported
unreliable

Using application layer multicasts which basically emulate network layer MC via unicast transmission

Entities maintain an overlay network on top of the IP network
can be inefficient but reliable

Reliable Multicast

each member has two commponents

A (member) process that represents the member from an application point-of-view and that creates and processes messages
A (member) middleware that represents the receiver and sender of a multicast message and that performs the actual send/receive operations on behalf of the process

fulfils the following requirement

Integrity: For any message 𝑚, every correct member delivers message 𝑚 at most once and only if message 𝑚 was previously sent by some sender. This basically disallows any duplicated and spurious messages created by the network

Validity: If a correct member sends a message 𝑚, then the same (!) member eventually delivers the message 𝑚. This definition assumes a closed group, where self-delivery of a message by a member is required

Agreement: If a correct member delivers a message 𝑚, then all corrct members eventually deliver message 𝑚. This prevents that only a proper subset of members deliver a message

distinction between receip and delivery

A receipt of a message by a member means that a message is received by the middleware and put into a receive buffer

A delivery of a message means that the middleware delivered a received message to the member’s process

multicast emulation

A member that wants to send a reliable group message 𝑚 to a closed group 𝑔 of members 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (with 𝑖 = 1 … 𝑔 ) uses reliable point-to-point communication (e. g. Stop-and-Wait) to send 𝑚 to each member in one of two ways:

Iteratively, i. e. the next send operation is only performed when the previous send operation was acknowledged

Concurrently, i. e. a send operation is performed for each member at once, after which the acknowledgements from the members are collected

Each member that receives the message 𝑚 via point-to-point communication simply delivers it to its process

Problems: performance, does not scale well, ACK-implosion, is not reliable

reliable multicast emulation

Each member 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (with 𝑖 = 1 … 𝑔 ) of a closed group 𝑔 maintains a set of received messages 𝑅𝑒𝑐 which is initialised as the empty set

When a member wants to send a message 𝑚, it uses reliable point-to-point communication as in the emulation approach

When 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 receives a message 𝑚 it performs the following steps:

If 𝑚 is in the set 𝑅𝑒𝑐 (i. e. was already received before), the message is discarded and the member continues to wait for messages again
Otherwise, 𝑚 is added to 𝑅𝑒𝑐
(send-loop): If the sender of 𝑚 (as seen in the message’s header) is not equal to the receiver 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (i. e. it was not a self-receipt by the sender):
- Send 𝑚 to every member of 𝑔 via reliable point-to-point communication as in the emulation approach
(deliver-action): Deliver 𝑚 to the member’s process for processing

When the (original or another) sender of a message 𝑚 fails between two send operations, there is always at least one other member that reliably received 𝑚 (as guaranteed by the reliable point-to-point communication) and that can send 𝑚 to the remaining members

Even when a receiving member’s process fails right after the member’s middleware received 𝑚, this is not a problem as:

either there is at least one other correct member that can send 𝑚 to the remaining members
or 𝑚 will not be sent to any other member at all because every member that received 𝑚 so far has failed

uniform agreement

If a correct member delivers a message 𝑚 and then continues to work correctly OR fails immediately afterwards, then all correct members eventually deliver message 𝑚

Ordered Multicas

different requirements for the ordering of messages sent via multicast can be stated

Ordering requirements can be stated relative to a single group or, when an entity is a member of more than one group, for a set of overlapping groups

FIFO/Causal order and total order are orthogonal requirements, i. e. a given multicast scheme could satisfy either one or both

Any singular or hybrid ordering can be further combined with reliability requirements to create a reliable and ordered multicast scheme

First-In First-out (FIFO) order

If a sender sends a message 𝑚 before sending another message 𝑚′, then no correct member delivers message 𝑚′ unless it has previously delivered message 𝑚

This disallows that messages sent by the same sender will have their order changed during delivery to a member process

FIFO order can be achieved by an adequate sequence numbering scheme that compensates for reordering due to the network – it is therefore naturally achieved when reliable multicast emulation is used

Causal order

If the sending of a message 𝑚 causally precedes the sending of a message 𝑚′, then no correct member delivers message 𝑚′ unless it has previously delivered message 𝑚

“𝑚 causally precedes 𝑚′” hereby means, that sending of message 𝑚′ was caused by delivery of message 𝑚 in a process

Causal order is a strictly stronger property than FIFO order, as it imposes an order also on messages possibly sent by different processes

Causal order can be achieved via distributed agreement algorithms that use timestamps or similar information attached to messages to allow members to establish the causal relationships between received messages so they can be re-ordered in the middleware before they are delivered

Total order

If two correct members 𝑋 and 𝑌 both deliver messages 𝑚 and 𝑚′, then 𝑋 delivers 𝑚 before 𝑚′ if and only if 𝑌 also delivers 𝑚 before 𝑚′

In contrast to FIFO and causal order, total order imposes an order between members delivering messages from different senders, i. e. it is a form of agreement

Total order can be achieved in several ways:

By using a dedicated node, called a sequencer (which could also be a member), that assigns group-specific sequence numbers to messages received from members via unicast and multicasts them into the group on behalf of any member

By using so-called logical timestamps to enable receivers to determine the order in which messages were sent and re-order them if necessary

atomic multicast

the combination of reliable multicast with totally ordered multicast

guarantees that messages are delivered to all members in the same order or that a given message is not delivered to any member

Beitreten

Vorschau

Author

Carmen F.

Informationen

Zuletzt geändert
vor 3 Monaten

Kurs melden