Layer Model
TCP/IP layers
Link Layer
Network Layer
Transport Layer
Application Layer
For transporting Protocol Data Units (PDUs) inside local networks
Important protocols: Ethernet, 802.3, 802.11 (WLAN), PPP
For transporting PDUs between local networks
Important protocols: IP, ARP, ICMP, IGMP
Provides end-to-end transfer of PDUs between applications residing on end systems, possibly in a reliable way
Important protocols: TCP, UDP, SCTP
Contains protocols for providing network services to applications/users and for managing network communication through applications
Important protocols: HTTP, DNS, DHCP, SMTP, SNMP, SIP, RTP, …
TCP/IP responsibilities
each network device is responsible for own implementation
network services are independent from end system
TCP/IP implementations on devices
A network device implements only a subset of the TCP/IP stack and specific protocols necessary for forwarding PDUs
End systems (hosts) implement the complete TCP/IP stack, but only those protocols that are necessary for end-to-end data transpor
Switches
Implement link layer only, forward frames between end systems in the same network
Router
implement link and network layer, forward packets in the direction of the destination network, employ routing protocols to find the “best” way
Link Layer – Network Topologies
Half Duplex
Send or Receive possible, but not at the same time
Typically due to usage of shared medium
Medium access protocol necessary
Full Duplex
Send and Receive possible, at the same time
Requires dedicated bidirectional links between devices
Requires switches for linking more than two systems
Link Layer – Forwarding Modes
Unique identifier: Media Access Control (MAC) addresses (6 Byte long, written in hexadecimal notation) as most used format
three modes for forwarding frames:
Unicast: Destination = single end system,
identified by its MAC address
Broadcast: Destination = all end systems on local network, uses special MAC address as destination (all-1)
Multicast: Destination = Group of end systems, identified by special MAC addresses, group management necessary
Network Layer – Services
Routing of packets: Forwarding of PDUs between local networks, based on structured network address -> Routed protocol (e. g. IP), Establishment of routes between routers -> Routing protocol (e. g. RIP, OSPF)
Fragmentation and reassembly: In case packet size > Maximum Transmission Unit (MTU) of link
Packet hop limiting
Multicast group management
Quality of Service (QoS)
Routing Algorithms
Determination of routes and distribution of routing information follows a specific routing algorithm
the two general classes
Distance Vector: uses a “distance” metric (e. g. number of intermediate routers, i. e. “hops”), each router distributes and receives information only to/from neighbour routers
Link State: uses arbitrary metrics, each router possesses a complete view of the network topology and cooperates with all other routers to establish and maintain this view
Transport Layer – Services
Multiplexing of data streams
Segmentation and Reassembly (SAR)
Flow control: Adapting data rate to receiver capacity
Congestion control: Adapting data rate to network capacity
End-to-End Protection
Communication System
Components:
sender:
source
source encoder
channel encoder
line encoder
channel
noise
through space or time
receiver:
line decoder
channel decoder
source decoder
desination
Source Coding
purpose: reduce amount of data that needs to be transmitted
removing redundancy
compression (lossy or lossless)
assumption: channel is noiseless
Channel Coding
purpose: detection and possible recovery of errors
assumes a noise channel
errors are acounted for by intentonally adding beneficial redundancy in the data
Line Coding
purpose: convert data into physical representation
coding und modulation techniques
Forward Error Correction
application of channel coding
compensate for errors in digital communication
original PDU is augmented with redundant information
check data allows the receiver of the PDU to detect errors (and possible correct it)
Channel Models
used to model the occurrence and manifestation of errors
choosing the right model is crucial for evaluating the effectiveness
Binary Symmetric Channel
simplest model
input bits (xi)
output bits (yi)
xi is corrupred with a certain crossover probabilty (p)
where 𝑝 = 0 (or 𝑝 = 1) represents reliable communication without errors, and 𝑝 = 0.5 represents complete uncertaint
errors are indepented events (1-p)^n that message of length n is transmitted without error
received word is calculated using input and xor-ing it with error word
error word: every location of a bit that was affected by an error during transmission is set to 1
Binary Asymmetric Channel
unequal probabiliteis for bit flip from 0 to 1 or 1 to 0
A common source of these errors are stuck-at faults, where a wire or pin of a circuit is “stuck” at a specific (bit) value
A Z-channel is a special case of a BAC, where only errors that transform a ‘1’ into a ‘0’ can occur (fiber-optic loss is possible but not the generation)
Binary Erasure Channel
modification of the BSC
input: individual bits Xi
output: set of {0, 1, ? }
? = erasure symbol
probability is determiend for each bit independently
errors modelled by a BEC are called erasures
where the location of the error s known but neither the correct nor incorrect value can be determined
typically occur due to HW failures in storage systems
Block Codes
A (binary) (𝒏, 𝑵) block code is a set 𝐶 of binary words with the following characteristics:
𝐶 is a subset of the set of all binary words of length 𝑛, with the latter called the code space of 𝐶, i. e. 𝐶 ⊆ {0, 1}𝑛
The binary words in 𝐶 all satisfy some property required by the code and are called the code words of 𝐶
All other words in the code space, that do not satisfy the property, are non-code words
The code size 𝐶 is the number of code words in 𝐶 and is equal to 𝑁
Encoding
map data words to code words
set 𝐷 of data words consists of all 𝐾 binary words of length 𝑘, with 𝑘 ≥ 1
exactly one code word to each data word,
so it follows that 𝑁 = 𝐾 = 2𝑘
In general, the length of a code word is greater than the length of a data word 𝑛 > 𝑘
(information) rate 𝑅 = 𝑘/𝑛
n-k= redundancy
𝒓-repetition code
a code word is formed by repeating the data word 𝑟 times
R = 1/r
parity code
Count number of 1s
Append 1 if count of 1 is odd 0 otherwise
R = (n-1)/n
channel capacity
A very important for the usage of codes for achieving error-free communication
channel capacity 𝑐𝑎𝑝(𝐶ℎ) of a channel 𝐶ℎ is the maximum rate of information that can be communicated over the channel without error, i. e. the (mean) proportion of bits that can be received error-free in relation to the total number of bits transmitted and possibly corrupted by noise.
binary symmetric channel with a crossover probability 𝑝:
𝑐𝑎𝑝 𝐵𝑆𝐶(𝑝) = 1 − 𝐻(𝑝)
where 𝐻(𝑝) is the (binary) entropy function:
𝐻 𝑝 = − 𝑝 ∗ log2 𝑝 + 1 − 𝑝 ∗ log2(1 − 𝑝)
which basically represents the average “uncertainty” whether a received bit is actually the same as the bit that was sent or whether it was changed by noise during transmission. Note for example, that the entropy/uncertainty is greatest when 𝑝 = 0.5, i. e. when there is equal chance that a received bit was actually sent with the same value or not.
Shannon coding theorem
that for a channel with capacity 𝑐𝑎𝑝 there always exists a code that can reduce the probability of erroneous bits after decoding to an arbitrarily small value, as long as the rate of the code does not exceed the channel capacity, i.e. as longs as 𝑅 ≤ 𝑐𝑎𝑝. In particular, this means that with increasing code word length 𝑛, the number 𝑘 of data bits and therefore the amount of actual useful data transmitted can be increased in tandem, as long as 𝑘/𝑛 ≤ 𝑐𝑎𝑝, without decreasing the probability of a correct decoding.
Hamming Distance
number of bits in which two words differ from each other
Is: non-negativ, symetric and has a triangel inequality
Hamming weight: number of bits that different from 0 in a word
Decoding
return dataword for received word in code space
may be as easy as returning data bits embedded in c
when not equal to some code word then first step is to determine the correct code word
decoding error probability
𝑝𝑒𝑟𝑟(𝒄) for a given code word 𝒄 is the probability, that a received word is decoded to a code word 𝒄′ ≠ 𝒄, assuming that 𝒄 was originally sent
decoding algorithm
decoding algorithm tries to find a code word that maximises one of the following two probabilities:
The probability 𝑃 𝒓 | 𝒄 , which is the probability that the word 𝒓 is received, given that the code word 𝒄 was originally sent
The probability 𝑃 𝒄 | 𝒓 , which is the probability that the code word 𝒄 was originally sent, given that the word 𝒓 was received
Maximum Likelihood Decoder
A MLD chooses the code word 𝒄 as correct, which is the “most likely” explanation for the received word 𝒓, assuming some channel model (and associated error probabilities)
Error Handling Properties
The error correction capability is stated as the maximum number of errors that can be corrected
The error detection capability is stated as the maximum number of errors that can be detected (but not necessarily corrected)
The erasure recovery capability is stated as the maximum number of erasures that can be recovered
Error Correction
task of discovering the locations of erroneous bits in a received word 𝒓 which can then be corrected by simply inverting them
can always correct half min distance -1 number of errors
Error Detection
decoder only indicates that one or more erroneous bits are present in a received word, without determining the exact location of errors
Erasure Recovery
process of determining the correct value for each erasure position in a received word
achieved by comparing the received word 𝒓, including the erasures, against each code word 𝐜 ∈ 𝐶
FEC Issues
hard choosing the right model
overhead is directly proportional to databits, code word length
r-1 bits for each additonal erroneous bit
only feasible when the expected number of transmission errors is low
Checksum Principle
1) The sender takes a payload 𝒑𝒍𝒐𝒂𝒅 from some upper layer constructs a suitable header 𝒉𝒅𝒓 for it and concatenates both to create a PDU as 𝑷𝑫𝑼 = 𝒉𝒅𝒓 | 𝒑𝒍𝒐𝒂𝒅
2) The sender calculates a binary word 𝒄𝒉𝒌𝒔𝒖𝒎 = 𝑓(𝑷𝑫𝑼), called a checksum, from the PDU by using a suitable checksum algorithm 𝑓
> The function 𝑓 should be nonlinear, so that small changes (e. g. single bit flips) in the PDU’s data will lead to a different value for the checksum with a high probability
3) The sender constructs a frame to send 𝒔 by appending the checksum to the PDU, i. e. 𝒔 = 𝑷𝑫𝑼 | 𝒄𝒉𝒌𝒔𝒖𝒎, and sends 𝒔 over the communication link to the receiver
> Alternatively, the checksum might be included in the PDU’s header as a separate field
4) The receiver receives the frame 𝒔′ = 𝑷𝑫𝑼′| 𝒄𝒉𝒌𝒔𝒖𝒎′, which might or might not be equal to the originally sent frame 𝒔, due to transmission errors
5) The receiver calculates its own checksum 𝒄𝒉𝒌𝒔𝒖𝒎′′ = 𝑓(𝑷𝑫𝑼′), using the same function 𝑓 as the sender, and compares it with the received checksum 𝒄𝒉𝒌𝒔𝒖𝒎′ - this can result in one of two outcomes:
I. Both checksum values are equal, i. e. 𝒄𝒉𝒌𝒔𝒖𝒎′′ = 𝒄𝒉𝒌𝒔𝒖𝒎’; in this case, the frame is accepted as correct and processed by the receiver
II. The checksum values differ, i. e. 𝒄𝒉𝒌𝒔𝒖𝒎′′ ≠ 𝒄𝒉𝒌𝒔𝒖𝒎’; in this case, the whole (!) frame is treated as erroneous and discarded by the receiver
Cyclic codes
special class of linear block codes optimised for fast encoding and decoding of large data words - for example, the class of algorithms called Cyclic-Redundancy Check (CRC)
Arithmetic checksum algorithms
checksum is calculated by byte- or word-wise summation of the PDU data modulo some integer – examples are the Internet checksum used in IP, UDP and TCP as well as Fletcher’s checksum and its improvement the Adler checksum
Hash functions
strongly nonlinear functions that generate a fixed-size hash value from arbitrarily large input data – examples are the families of standardised hash functions called MD (Message Digest) and SHA (Secure HAsh)
ARQ Mechanism
alternative to FEC
can also handle frame loss
Automatic Repeat reQuest (ARQ)
sender sends checksum-protected frame and starts timer called Retransmission Time Out (RTO)
at receiver two scenarios
frame arrives without error and is accepted —> receiver send ACK back
frame arrives corrupted / is lost —> is discarded —> no ACK is sent
two scenarios at sender
ACK arrives before RTO expires —> sneder sends next frame
RTO expires —> sender resends frame
ARQ issues
receiver must distinguish betwwn new data frames and retransmission of already recevied data frames
RTO must be chosen with suitable value
retransmission must be bounded to a maximum number of tries —> infinite loop at sender
Stop-and-Wait
only one ACK outstanding at any time
link is therefore half-duplex
sequence number in header of frame distinguishes different frames
sequence number with two values is sufficient
Stop-and-Wait Delayed ACK
bevore sending an ACK receiver waits shortly for a data frame to sender and piggybacks the ACK with it
Group Communication Definition
primitive where a message is sent to a set of entities in a single (abstract) operation
The set of entities is called a group, with the entities as the group’s members
The sender of the message can but need not always be a member, depending on the “openness” of the group
In addition to send and receipt of messages, systems using group communication must support operations for membership management
Open group
senders can be non-members, i. e. messages can be “injected” into the group from the outside
Closed group
senders must be members of the group and receive all messages, including those sent by themselves
membership management
Each member maintains a view of the group, i. e. the set of entities that are currently members of the group
Changes to the view happen due to entities joining or leaving the group (or by experiencing a failure)
Static groups are groups where the view does not change and are much simpler to handle
Group Communication Uses
Service discovery and registration by sending respective queries to a group of entities (e. g. registration server)
Data and event dissemination to a set of entities
Management and monitoring of nodes (e. g. network health monitoring)
Facilitating fault tolerance via replication of services and resources
Implementation of GC
on top of a communication service that supports multicast (MC) communication
Using network layer multicasts, i. e. IP multicast, when supported by the network devices
not widely supported
unreliable
Using application layer multicasts which basically emulate network layer MC via unicast transmission
Entities maintain an overlay network on top of the IP network
can be inefficient but reliable
Reliable Multicast
each member has two commponents
A (member) process that represents the member from an application point-of-view and that creates and processes messages
A (member) middleware that represents the receiver and sender of a multicast message and that performs the actual send/receive operations on behalf of the process
fulfils the following requirement
Integrity: For any message 𝑚, every correct member delivers message 𝑚 at most once and only if message 𝑚 was previously sent by some sender. This basically disallows any duplicated and spurious messages created by the network
Validity: If a correct member sends a message 𝑚, then the same (!) member eventually delivers the message 𝑚. This definition assumes a closed group, where self-delivery of a message by a member is required
Agreement: If a correct member delivers a message 𝑚, then all corrct members eventually deliver message 𝑚. This prevents that only a proper subset of members deliver a message
distinction between receip and delivery
A receipt of a message by a member means that a message is received by the middleware and put into a receive buffer
A delivery of a message means that the middleware delivered a received message to the member’s process
multicast emulation
A member that wants to send a reliable group message 𝑚 to a closed group 𝑔 of members 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (with 𝑖 = 1 … 𝑔 ) uses reliable point-to-point communication (e. g. Stop-and-Wait) to send 𝑚 to each member in one of two ways:
Iteratively, i. e. the next send operation is only performed when the previous send operation was acknowledged
Concurrently, i. e. a send operation is performed for each member at once, after which the acknowledgements from the members are collected
Each member that receives the message 𝑚 via point-to-point communication simply delivers it to its process
Problems: performance, does not scale well, ACK-implosion, is not reliable
reliable multicast emulation
Each member 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (with 𝑖 = 1 … 𝑔 ) of a closed group 𝑔 maintains a set of received messages 𝑅𝑒𝑐 which is initialised as the empty set
When a member wants to send a message 𝑚, it uses reliable point-to-point communication as in the emulation approach
When 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 receives a message 𝑚 it performs the following steps:
If 𝑚 is in the set 𝑅𝑒𝑐 (i. e. was already received before), the message is discarded and the member continues to wait for messages again
Otherwise, 𝑚 is added to 𝑅𝑒𝑐
(send-loop): If the sender of 𝑚 (as seen in the message’s header) is not equal to the receiver 𝑚𝑒𝑚𝑏𝑒𝑟𝑖 (i. e. it was not a self-receipt by the sender):
Send 𝑚 to every member of 𝑔 via reliable point-to-point communication as in the emulation approach
(deliver-action): Deliver 𝑚 to the member’s process for processing
When the (original or another) sender of a message 𝑚 fails between two send operations, there is always at least one other member that reliably received 𝑚 (as guaranteed by the reliable point-to-point communication) and that can send 𝑚 to the remaining members
Even when a receiving member’s process fails right after the member’s middleware received 𝑚, this is not a problem as:
either there is at least one other correct member that can send 𝑚 to the remaining members
or 𝑚 will not be sent to any other member at all because every member that received 𝑚 so far has failed
uniform agreement
If a correct member delivers a message 𝑚 and then continues to work correctly OR fails immediately afterwards, then all correct members eventually deliver message 𝑚
Ordered Multicas
different requirements for the ordering of messages sent via multicast can be stated
Ordering requirements can be stated relative to a single group or, when an entity is a member of more than one group, for a set of overlapping groups
FIFO/Causal order and total order are orthogonal requirements, i. e. a given multicast scheme could satisfy either one or both
Any singular or hybrid ordering can be further combined with reliability requirements to create a reliable and ordered multicast scheme
First-In First-out (FIFO) order
If a sender sends a message 𝑚 before sending another message 𝑚′, then no correct member delivers message 𝑚′ unless it has previously delivered message 𝑚
This disallows that messages sent by the same sender will have their order changed during delivery to a member process
FIFO order can be achieved by an adequate sequence numbering scheme that compensates for reordering due to the network – it is therefore naturally achieved when reliable multicast emulation is used
Causal order
If the sending of a message 𝑚 causally precedes the sending of a message 𝑚′, then no correct member delivers message 𝑚′ unless it has previously delivered message 𝑚
“𝑚 causally precedes 𝑚′” hereby means, that sending of message 𝑚′ was caused by delivery of message 𝑚 in a process
Causal order is a strictly stronger property than FIFO order, as it imposes an order also on messages possibly sent by different processes
Causal order can be achieved via distributed agreement algorithms that use timestamps or similar information attached to messages to allow members to establish the causal relationships between received messages so they can be re-ordered in the middleware before they are delivered
Total order
If two correct members 𝑋 and 𝑌 both deliver messages 𝑚 and 𝑚′, then 𝑋 delivers 𝑚 before 𝑚′ if and only if 𝑌 also delivers 𝑚 before 𝑚′
In contrast to FIFO and causal order, total order imposes an order between members delivering messages from different senders, i. e. it is a form of agreement
Total order can be achieved in several ways:
By using a dedicated node, called a sequencer (which could also be a member), that assigns group-specific sequence numbers to messages received from members via unicast and multicasts them into the group on behalf of any member
By using so-called logical timestamps to enable receivers to determine the order in which messages were sent and re-order them if necessary
atomic multicast
the combination of reliable multicast with totally ordered multicast
guarantees that messages are delivered to all members in the same order or that a given message is not delivered to any member
Zuletzt geändertvor 3 Tagen