What are current problems in the Internet that motivate the use of content delivery networks?
websites often need to handle a lot of traffic
video on demand is replacing linear TV
videos need lots of bandwidth
connecting to a server far away introduces latency
latency is the foe of user satisfaction
DDoS attacks are common
What do CDNs aim to achieve?
distribute load over many content servers
have servers physically as near as possible to the customer site
What are the content servers close to the edge of a CDN that serve static content (e.g. videos) called?
edge-caches
Where might one find edge caches? What can be an advantage of that?
may be placed in ISP networks (e.g. Netflix does that)
distributing load can mitigate DDoS
caching can reduce amount of traffic between ASes
What are benefits of CDN load distribution and edge caches?
for customer
lower latency
higher reliability
for ISP and consumer
lower traffic
reduced cost
better service quality
for content providers
reduced costs
What are elements and stages in the CDN cache hierarchy?
from highest to lowest distance to the customer:
Full content archive (ARCHIVE)
-> connects to several
Regional Caches (e.g. Europe, Asia, America,…)
-> connect to several
Edge Caches
Customers
What is the structure of the cache hierarchy?
archive holds all the data (e.g. whole video library)
might be distributed itself…
regional caches get content from archive
edge caches (located at ISP/IXPs) get content from regional cache
if a cache does not hold the requested content -> ask its parent… (see the sketch below)
real deployments may have more levels / be more sophisticated
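A minimal sketch of that hierarchical lookup (edge asks regional, regional asks archive), assuming a simple dict-backed cache per level; the class and object names are invented for illustration:

```python
class CacheNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # next level up (None for the archive)
        self.store = {}             # object id -> content

    def get(self, object_id):
        if object_id in self.store:              # cache hit
            return self.store[object_id]
        if self.parent is None:                  # the archive holds everything
            raise KeyError(object_id)
        content = self.parent.get(object_id)     # cache miss -> ask the parent
        self.store[object_id] = content          # keep a copy for later requests
        return content

archive = CacheNode("archive")
archive.store["video-42"] = b"...video bytes..."
regional = CacheNode("regional-eu", parent=archive)
edge = CacheNode("edge-isp-1", parent=regional)

edge.get("video-42")   # miss at edge and regional, fetched from the archive
edge.get("video-42")   # now a hit at the edge
```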
What is the usual cache access architecture?
client connects to load balancer
distributing the requests to several content servers
What might be a problem with load balancers?
data plane devices forward packets
-> packets usually belong to flow
-> there may be multiple next hops for the destination of a packet
=> if different next hops used within one flow, packets might be reordered
=> messes with TCP congestion control…
What can be a solution to ensure that packets from same flow use the same next hop?
use 5-tuple hash
=> calculate hash over 5-tuple
=> same for all packets within one flow
=> choose the next hop by means of the hash…
=> packets all stay on the same path between source and destination… (see the sketch below)
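A minimal sketch of 5-tuple hashing, assuming SHA-256 as the hash and an invented list of next hops; real routers use much cheaper hardware hash functions:

```python
import hashlib

def next_hop(src_ip, dst_ip, src_port, dst_port, protocol, next_hops):
    five_tuple = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}"
    h = int.from_bytes(hashlib.sha256(five_tuple.encode()).digest()[:8], "big")
    return next_hops[h % len(next_hops)]   # same flow -> same hash -> same hop

hops = ["hop-a", "hop-b", "hop-c"]
# every packet of this flow maps to the same next hop
next_hop("198.51.100.7", "203.0.113.9", 51512, 443, "tcp", hops)
```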
What is something one has to consider in hashing and load balancing?
one has to choose a server for each flow…
What are solutions on how to choose the edge server for individual flows when using 5-tuple hashes?
modulo hashing
consistent hashing
How does modulo hashing work?
N servers
-> redirect client (with 5-tuple hash value h) to server
h mod N
What might be problems with modulo hashing?
what happens if N changes?
every client is hashed to a new location… (see the demo below)
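A small demo of that problem, assuming invented hash values and a server count that grows from 4 to 5; almost every client ends up on a different server:

```python
hashes = range(1000)                    # stand-in 5-tuple hash values
before = [h % 4 for h in hashes]        # N = 4 servers
after  = [h % 5 for h in hashes]        # N = 5 servers
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved} of {len(hashes)} clients were remapped")
```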
What is the idea behind consistent hashing?
map each client to point on the edge of a circle
e.g. [0,…,15]
walk around the “circle” until a server is reached
(-> each server also gets assigned to a point in the circle…)
What happens in consistent hashing if the number of servers N changes?
points are removed or added on circle
=> clients are remapped according to the available positions on the circle…
What are advantages of consistent hashing?
only K/N keys need to be remapped on average with K being the number of clients
How can one improve consistent hashing?
map servers not to a single position but to multiple positions on the circle (virtual nodes)
-> more even distribution of the client-to-server mapping (see the sketch below)
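A sketch of consistent hashing with virtual nodes, assuming SHA-256 for the ring positions and an arbitrary number of virtual nodes per server:

```python
import bisect
import hashlib

def point(key):
    # position on the "circle" [0, 2^64 - 1]
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # each server is mapped to several points on the circle (virtual nodes)
        self.ring = sorted((point(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def server_for(self, client_key):
        # walk clockwise from the client's point to the next server point
        i = bisect.bisect_right(self.points, point(client_key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["edge-1", "edge-2", "edge-3"])
ring.server_for("198.51.100.7|203.0.113.9|51512|443|tcp")
```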
What is the idea behind TCP load balancers?
Have front end server (for redirects)
front end knows pool of content servers
front end always answers with a “302 Moved Temporarily” redirect
browser will automatically connect to the content server given in the redirect message (see the sketch below)
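A minimal sketch of such a redirecting front end, using only the Python standard library; the content-server pool and the random selection policy are invented for illustration:

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

CONTENT_SERVERS = ["http://content-1.example.net", "http://content-2.example.net"]

class RedirectFrontend(BaseHTTPRequestHandler):
    def do_GET(self):
        target = random.choice(CONTENT_SERVERS)   # pick a content server
        self.send_response(302)                   # "Moved Temporarily" / "Found"
        self.send_header("Location", target + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectFrontend).serve_forever()
```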
Why is encryption potentially problematic in terms of CDN?
front end may want to select server based on URL, cookies
traffic contains confidential content
front end server needs to have the TLS certificate
=> what happens after the connection is decrypted?!
What are the options for encryption (where to terminate it) in a CDN?
option 1: SSL/TLS termination on a front end with special high-performance hardware
but: snooping inside the network yields clear text
very cost efficient
one central configuration
option 2: perform decryption / encryption on the content server
but: may burn a lot of CPU cycles
decryption happens as late as possible
easily deployed out of the box
What is the idea of DNS based CDNs?
remove useless RTT caused by frontend redirect
=> directly connect to correct content server
=> resolve the hostname to different IPs (of the different content servers)
What are the two possibilities for DNS based load balancing?
lots of entries
geo based
What is the naive idea of using lots of DNS entries for load balancing?
DNS may return multiple A/AAAA entries
client chooses one randomly
=> on average, each server should get an equal amount of traffic
client / DNS does the load balancing (see the sketch below)
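A sketch of the client side, assuming a placeholder hostname that resolves to several A/AAAA records; the random choice is what spreads the load:

```python
import random
import socket

infos = socket.getaddrinfo("cdn.example.com", 443, type=socket.SOCK_STREAM)
addresses = list({info[4][0] for info in infos})   # deduplicate the returned IPs
random.choice(addresses)                           # connect to this one
```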
What is a problem with naive DNS load balancing? What is a solution to it?
may lead to endpoints far away
solution:
the nameserver sees the IP address of the resolver
the resolver may pass along the /24 prefix of the client
=> the nameserver knows the rough geographical location of the resolver/client
=> provide the resolver/client with the nearest content server (toy sketch below)
=> Geo-DNS
But: may still be misled (e.g. a resolver in another country that does not pass the /24 prefix)
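A toy sketch of the Geo-DNS decision, with invented prefixes, regions and server addresses; a real implementation would use a proper geo-IP database:

```python
PREFIX_TO_REGION = {
    "203.0.113.0/24": "asia",
    "198.51.100.0/24": "europe",
}
REGION_TO_SERVER = {"asia": "203.0.113.50", "europe": "198.51.100.80"}
FALLBACK = "192.0.2.10"

def answer_for(client_prefix):
    # return the content server nearest to the resolver's / client's prefix
    region = PREFIX_TO_REGION.get(client_prefix)
    return REGION_TO_SERVER.get(region, FALLBACK)

answer_for("198.51.100.0/24")   # -> the European content server
```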
What is the DDoS protection scheme of CDNs?
set up DNS so that it points to the protection provider’s CDN (e.g. Cloudflare)
=> all traffic passes through the CDN
CDN caches static content (e.g. videos)
forwards dynamic content requests (e.g. php-websites) to the real webserver
What is a problem with the regular DDoS protection scheme?
CDN needs to terminate TLS connection
=> CDN has complete access to the connection, including secret cookies, passwords,…
=> the connection between the CDN and the actual webserver might or might not be protected…
What are problems with using DNS for load balancing?
location information may still be misleading
DNS was never intended to work that way
What is the solution to the problems of DNS-based load balancing?
route packets via BGP (anycast)
Basically:
manipulate routes via BGP
assign the same IP to lots of content servers
-> announce that IP prefix through lots of different sites / peerings
=> BGP itself finds the best (shortest…) path from each source…
What are the design goals of a cache?
decrease latency of requests from clients
increase cache-hit ratio
What architectural design decisions have to be made for CDN caches?
which objects to cache?
where to cache?
what cache size do we need?
caching strategy?
how to update the cache?
What are the essential metrics for caches?
cache-hit / cache-miss
What are problems in CDN in terms of requested resources w.r.t. caching?
many objects are only accessed once
uses disk space without benefit
more frequently accessed objects could be evicted from the cache (e.g. with LRU)
What are solutions to get an efficient caching strategy?
use cache filtering
bloom filter
use filter to decide which objects to cache
example policy: only cache objects already seen once (see the sketch below)
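A sketch of that example policy (cache only on the second request); a plain set stands in for the Bloom filter described below, and fetch is a placeholder for fetching from the parent cache or origin:

```python
seen_once = set()
cache = {}

def handle_request(object_id, fetch):
    if object_id in cache:
        return cache[object_id]        # cache hit
    content = fetch(object_id)         # fetch from parent cache / origin
    if object_id in seen_once:
        cache[object_id] = content     # second access -> admit to the cache
    else:
        seen_once.add(object_id)       # first access -> only remember it
    return content
```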
How can bloom filters be used as cache filters?
Bloom filter:
stochastic data structure
map object hashes into a table with binary entries
use multiple hash functions to decrease false positives
What are bloom filters?
data structure that uses hashing
-> space efficient and easy to test whether an element is in the set or to add an element to the set…
-> says an element is definitely not in the set or possibly in the set…
bit array of m bits -> initially set to 0
k hash functions that map to the interval [0, m-1]
-> add an element by applying all k hash functions to it and setting the resulting indices in the bit array to 1…
What are challenges with bloom filters?
speed -> use a single hash and partition it into multiple hashes
size: balance between false positives (all k indices set to 1 although the element was never added), the number of hash functions, the number of stored objects and the size of the bit array… (see the sketch below)
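A minimal Bloom filter sketch along those lines: an m-bit array and k indices derived by partitioning a single SHA-256 digest; m and k are arbitrary example values, not tuned:

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)        # one byte per bit, for simplicity

    def _indices(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        # partition one digest into k hash values instead of computing k hashes
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for i in self._indices(item):
            self.bits[i] = 1

    def __contains__(self, item):
        # False -> definitely not added; True -> possibly added (false positives)
        return all(self.bits[i] for i in self._indices(item))

bf = BloomFilter()
bf.add("video-42")
"video-42" in bf   # True: possibly in the set
"video-99" in bf   # almost certainly False: definitely not in the set
```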