What are the basic principles of MapReduce?
Inspired by functional programming (e.g., Lisp, Haskell)
Map
Generates intermediate results from input data
Input: aggregates
Output: key-value pairs
Each map task is independent → safely parallelizable
Reduce
Aggregates intermediate results
Input: multiple map outputs with the same key
Output: a combined value per key
Analogies:
Map ≈ SQL GROUP BY
Reduce ≈ SQL aggregate functions (e.g., SUM, COUNT)
Assume you have three nodes containing the following words:
a,a,b,c
c,d
a,a,c
What would the map and shuffle & sort steps look like?
What could the pseudocode look like?
What does the reduce step do after the map step?
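The three questions above can be answered with a minimal Python sketch of word count over the three nodes (function names are illustrative, not from a specific framework):

```python
from collections import defaultdict

# Input partitions, one list of words per node
nodes = [["a", "a", "b", "c"], ["c", "d"], ["a", "a", "c"]]

# MAP: each node independently emits (word, 1) pairs
def map_words(words):
    return [(w, 1) for w in words]

map_outputs = [map_words(words) for words in nodes]
# e.g., node 1 emits [("a", 1), ("a", 1), ("b", 1), ("c", 1)]

# SHUFFLE & SORT: group all intermediate pairs by key across nodes
groups = defaultdict(list)
for output in map_outputs:
    for key, value in output:
        groups[key].append(value)
# groups == {"a": [1, 1, 1, 1], "b": [1], "c": [1, 1, 1], "d": [1]}

# REDUCE: aggregate the value list of each key into one combined value
def reduce_counts(key, values):
    return key, sum(values)

counts = dict(reduce_counts(k, v) for k, v in sorted(groups.items()))
print(counts)  # {'a': 4, 'b': 1, 'c': 3, 'd': 1}
```

Because each map call touches only its own node's data, the map phase runs fully in parallel; only the shuffle & sort step moves data between nodes.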
How can MapReduce be implemented with respect to system architecture?
Not bound to a specific architecture
Can be realized in:
Distributed memory environments (e.g., clusters)
Often use centralized coordination with a single master
Examples: Google, Hadoop
Shared memory environments (e.g., multi-core machines)
Can use decentralized coordination, such as hash-based
Examples: Phoenix (Stanford), C++ with PThreads
What is the purpose of combinable reducers in MapReduce?
Address network traffic issues by reducing data locally first
Apply reduce function locally (pre-aggregation) before global reduction
Transfers less data across the network
Steps:
Local Reduce
Shuffle and Sort
Global Reduce
What properties must a reduce function have to be combinable?
Composability:
Output type of reduce must match its input type (the map output type), so reduce results can be fed back into reduce
Allows nesting: reduce(key, [C, reduce(key, [A, B])]) == reduce(key, [C, A, B])
Confluence:
Idempotency: Reapplying reduce does not change the result
Order-agnosticism: Result doesn’t depend on order of values:
reduce(key, [A, B]) == reduce(key, [B, A])
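For a simple sum reducer these properties can be checked directly (an illustrative sketch; the key parameter is kept only to mirror the reduce signature):

```python
def reduce_sum(key, values):
    # Output is a plain number, the same type as each input value
    return sum(values)

A, B, C = 1, 2, 3

# Composability: a reduce result can be fed back into reduce
assert reduce_sum("k", [C, reduce_sum("k", [A, B])]) == reduce_sum("k", [C, A, B])

# Idempotency: reapplying reduce to its own output changes nothing
assert reduce_sum("k", [reduce_sum("k", [A, B])]) == reduce_sum("k", [A, B])

# Order-agnosticism: the order of values does not matter
assert reduce_sum("k", [A, B]) == reduce_sum("k", [B, A])
```

A counter-example is an average reducer: avg([3, avg([1, 2])]) = avg([3, 1.5]) = 2.25, but avg([3, 1, 2]) = 2.0, so averaging is not composable and cannot be used as a combiner directly (one would ship (sum, count) pairs instead).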
What is decentralized MapReduce and how does it work?
No coordinating master needed
Works with consistent hashing
MAP: Map logic is sent to the nodes and applied locally; each input item is assigned to the next node clockwise from its hash position on the ring
REDUCE: Reduce logic is sent to the nodes and applied locally; each intermediate key is assigned to the next node clockwise from the hash of that key
Nodes act as workers, executing tasks based on hash values
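The node-assignment part can be sketched with a small consistent-hashing ring (a minimal sketch; node names, the hash space size, and the use of MD5 are illustrative assumptions):

```python
import hashlib
from bisect import bisect_right

# Map a string onto a fixed hash ring (illustrative 16-bit space)
def ring_position(s, space=2**16):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % space

# Hypothetical worker nodes placed on the ring by hashing their IDs
nodes = ["node-1", "node-2", "node-3"]
ring = sorted((ring_position(n), n) for n in nodes)

# An item is handled by the first node clockwise from its hash position
def responsible_node(key):
    pos = ring_position(key)
    idx = bisect_right([p for p, _ in ring], pos) % len(ring)
    return ring[idx][1]

# MAP: each input chunk is processed on responsible_node(chunk_id)
# REDUCE: each intermediate key is reduced on responsible_node(key)
for key in ["a", "b", "c", "d"]:
    print(key, "->", responsible_node(key))
```

Because every node can compute `responsible_node` on its own, no master is needed to decide where map or reduce work runs.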
What are the drawbacks and enhancements related to MapReduce?
Drawbacks:
No traditional RDBMS optimization (no indexes, no query optimizer)
Incompatible with common DB tools (e.g., BI, mining tools)
No high-level query languages, only low-level operations
Enhancements:
Sawzall (Google): scripting language for MapReduce generation
Pig (Yahoo): higher-level scripting with SQL-like constructs
These tools make MapReduce more accessible for data analysts