What are the three main hazard dimensions for consistency in distributed systems?
Semantic Integrity
Violated when atomicity fails (e.g., partial transactions)
Operational Integrity
Problems from concurrent users (e.g., lost updates, dirty reads)
Replication Integrity
Arises with multiple replicas needing synchronization
Risks: read/write discrepancies, stale (outdated) data
Ideal case: 1 user, 1 operation, 1 data copy → no inconsistency
But this is rarely realistic in distributed environments.
How is atomicity achieved in distributed databases and what are the challenges?
Atomicity ensures all parts of a transaction commit or none do.
In distributed systems, this requires a distributed commit protocol:
2 Phase Commit (2PC):
Central coordinator manages the commit
All nodes must agree to commit
Disadvantages: performance bottleneck, single point of failure
Paxos:
Fully distributed consensus protocol
More fault-tolerant, but complex and costly
Note: 2PC is not the same as 2PL (Two-Phase Locking).
Also, locks must be held for the full duration of the protocol.
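The 2PC flow above can be sketched in a few lines (a toy illustration, not any specific database's implementation; the `Participant` class and its methods are hypothetical names): the coordinator commits only if every node votes yes in the prepare phase, otherwise everyone aborts.

```python
# Minimal sketch of Two-Phase Commit: prepare (voting) phase, then
# commit/abort phase. All names here are illustrative.

def two_phase_commit(participants):
    # Phase 1 (prepare): ask every node whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only on unanimous agreement.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "active"
    def prepare(self):
        return self.can_commit
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

print(two_phase_commit([Participant(True), Participant(True)]))   # committed
print(two_phase_commit([Participant(True), Participant(False)]))  # aborted
```

Note how a single "no" vote (or an unreachable node) forces a global abort, and how everything funnels through the coordinator, which is exactly the bottleneck and single point of failure named above.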
How is atomicity handled in NoSQL systems?
Transactions in NoSQL ≈ single operation on a single key-value pair or aggregate.
No atomicity support for:
Multiple operations
Multiple aggregates
Semantic consistency can only be ensured within a single aggregate.
For operations across aggregates, atomicity must be handled at the application level.
Exception: Some Graph DBs support multi-operation atomicity.
What is the 2-Phase Locking (2PL) protocol and how does it help with concurrency in RDBMS?
Serializability = the final result is as if the operations had run one after another, even when multiple users access and modify data concurrently
2PL ensures serializability by dividing a transaction into:
Growing phase: locks are acquired, but not released.
Shrinking phase: locks are released, but no new locks can be acquired.
It prevents concurrency issues by enforcing a serial order of operations across transactions. Two types of protocols:
Pessimistic control (prevention): via locking.
Optimistic control (resolution): via timestamp ordering.
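The two-phase rule can be captured in a short sketch (illustrative class, not a real lock manager): once any lock is released, the transaction enters its shrinking phase and may not acquire new locks.

```python
# Toy sketch of the 2PL rule: growing phase (acquire only),
# then shrinking phase (release only).

class TwoPhaseLockingTx:
    def __init__(self):
        self.held = set()
        self.shrinking = False  # flips to True on the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add(item)

    def release(self, item):
        self.shrinking = True
        self.held.discard(item)

tx = TwoPhaseLockingTx()
tx.acquire("x")
tx.acquire("y")   # growing phase: allowed
tx.release("x")   # shrinking phase begins
# tx.acquire("z") would now raise RuntimeError
```

This ordering of lock points is what guarantees a serializable schedule across concurrent transactions.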
How does NoSQL handle concurrency and isolation compared to RDBMS?
NoSQL DBs:
Often do not guarantee isolation across multiple aggregates.
Avoid 2PL because it’s expensive in distributed systems.
Rely on optimistic concurrency control.
Expect the application to handle inconsistencies.
What are typical hazards caused by asynchronous update propagation in replicated systems?
Three major hazards due to asynchronous update propagation:
Read/Write Discrepancies
R and W of the same process may hit different replicas
The process might not see its own write in a later read
Non-monotonic Read (R)
Sequential reads from the same process may access different replicas
A later read might return an older value than a previous one
Non-monotonic Write (W)
Sequential writes may be sent to different replicas
A later write might be overwritten by an earlier one
Root Cause: All are caused by asynchronous update propagation in distributed/replicated storage
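The first hazard can be shown with two plain dictionaries standing in for replicas (a toy simulation, no real database involved): the write lands on one replica, and a later read routed to the other replica misses it.

```python
# Toy illustration of a read/write discrepancy under asynchronous
# update propagation: two dicts play the role of two replicas.

replica_a = {"x": 1}
replica_b = {"x": 1}

replica_a["x"] = 2          # the process writes x=2, routed to replica A
value = replica_b["x"]      # its later read is routed to replica B
print(value)                # 1 -- the process does not see its own write
```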
What is strong consistency (aka linearizability) in distributed systems?
What does the strong consistency requirement imply?
Why is strong consistency difficult to achieve in practice?
Strong consistency ensures that all processes always see the same version of a value. Once a value is written and committed, any subsequent read must return that latest value, regardless of which replica is accessed.
It implies that the system behaves as if there were only a single copy of the data, making every update instantly visible to all read operations.
Because it requires instant propagation of updates to all replicas, which is nearly impossible in real-world distributed systems due to network latency, partitions, and asynchronous replication.
What makes a system linearizable?
Ensures all operations appear to take effect in a single, global order.
A read concurrent with a write must return the new value.
All processes observe the same committed value, regardless of which replica they access.
Emulates a single copy of data across the system.
Important for preventing read/write inconsistencies in distributed databases.
What is ROWA and how does it ensure strong consistency?
ROWA = Read One, Write All strategy.
All replicas must be synchronously updated before confirming a write.
Any read from one replica returns the latest value.
Guarantees strong consistency across replicas.
Often implemented via 2PC or Paxos protocols.
Main drawback: high write latency due to full synchronization.
What is the ROWA alternative and how does it work?
Only a subset of replicas (e.g. 2 out of 4) are updated synchronously.
Reads must query a larger subset (e.g. at least 3 out of 4 replicas).
Ensures at least one up-to-date replica is included in the read.
Improves performance by reducing write coordination.
Requires use of timestamps or version numbers to identify the latest value during reads.
Balances consistency and availability in distributed systems.
What is the Quorum protocol and how does it ensure strong consistency?
A quorum is the minimum number of replicas that must acknowledge a read or write.
Condition for strong consistency:
W + R > N, where
N = total number of replicas
W = replicas required to acknowledge a write
R = replicas contacted for a read
This condition ensures overlap between write and read sets.
Guarantees that at least one replica read is up-to-date (has the latest version).
Example:
If N = 4 and W = 2, then R must be at least 3 (2 + 3 > 4).
Quorum protocol helps relax ROWA, allowing better response times while maintaining consistency.
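The overlap argument behind W + R > N can be verified by brute force over all possible write and read sets (a small sketch, not a production quorum implementation):

```python
# Check that every write quorum of size w intersects every read quorum
# of size r over n replicas -- i.e., that w + r > n guarantees overlap.

from itertools import combinations

def quorums_always_overlap(n, w, r):
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

# The example above: N = 4, W = 2 requires R >= 3.
print(quorums_always_overlap(4, 2, 3))  # True  (2 + 3 > 4)
print(quorums_always_overlap(4, 2, 2))  # False (2 + 2 = 4: sets can be disjoint)
```

With W + R > N every read quorum necessarily contains at least one replica from the last write quorum, which is why one read replica is always up to date.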
Compare Serializability vs Linearizability
Definition
Serializability: a schedule is serializable if its outcome is equivalent to some serial execution (but the order may differ from real time).
Linearizability (strong consistency): operations appear to occur in a total order that respects real-time (wall-clock) ordering.
Focus
Serializability: correctness of concurrent transactions.
Linearizability: real-time behavior of read/write operations.
Time awareness
Serializability: no real-time guarantees, only logical correctness.
Linearizability: real-time aware; newer operations must be reflected immediately.
Use case
Serializability: common in traditional RDBMS (e.g., 2PL ensures it).
Linearizability: required in strongly consistent distributed systems.
Example
Serializability: it is okay if A's write becomes visible after B's write, even if A happened before B, as long as the result matches some serial schedule.
Linearizability: if A's write happened before B's read in real time, B must see A's write.
Cost
Serializability: lower (especially in centralized systems).
Linearizability: higher (requires coordination such as quorums, 2PC, or Paxos).
Concurrency control
Serializability: often enforced with Two-Phase Locking (2PL).
Linearizability: enforced with quorum-based protocols, ROWA, Paxos, etc.
Serializability: cares about logical correctness – the order of execution doesn’t have to match real time.
Linearizability: adds a real-time constraint – the system behaves as if there’s a single global clock and a single up-to-date copy of data.
What does the CAP theorem state in the context of distributed systems?
C: Consistency – All clients see the same data at the same time.
A: Availability – System continues to operate even if some nodes fail.
P: Partition Tolerance – System continues operating despite network or message failures.
Only two out of the three can be guaranteed at any one time.
How are NoSQL systems categorized using the CAP Theorem?
CA systems: Prioritize Consistency + Availability (e.g., RDBMS like Greenplum).
CP systems: Prioritize Consistency + Partition Tolerance (e.g., MongoDB, Redis).
AP systems: Prioritize Availability + Partition Tolerance (e.g., Dynamo, Cassandra).
No system provides all three (C, A, P) at once.
Trade-offs are based on application needs and failure assumptions.
What are BASE properties, and how do they relate to CAP?
BASE = “Basically Available, Soft State, Eventually Consistent”
Basically Available: System responds to every request (partial consistency allowed).
Soft State: System state may change over time, even without input.
Eventually Consistent: System becomes consistent once updates are fully propagated.
BASE is often seen in AP systems.
Contrasts with ACID (strong consistency, transactional guarantees).
BASE emphasizes liveness over safety.
How do eventual and strong consistency differ?
Strong consistency requires read and write quorums to overlap (W + R > N); eventual consistency does not.
Eventual consistency does not guarantee that every process sees the same version of a value; replicas converge only after updates have fully propagated.
What is a vector clock (VC)?
A data structure tracking causality in distributed systems
Each node has a counter → together form a vector
A versioned value is stored as (value, VC)
Used to determine which update happened before, after, or concurrently
How does a node update its vector clock when sending or receiving an operation?
Sending:
Increment its own clock by 1
Send current VC with the operation
Receiving:
For each VC entry: take the entry-wise max of the local and received counters
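The two update rules above can be sketched with plain dicts mapping node names to counters (illustrative helper names, not a specific library's API):

```python
# Minimal vector-clock sketch: increment your own entry when sending,
# take the entry-wise max when receiving.

def vc_send(vc, node):
    vc = dict(vc)                       # copy, then bump our own counter
    vc[node] = vc.get(node, 0) + 1
    return vc

def vc_receive(local, received):
    merged = dict(local)                # entry-wise max over both clocks
    for node, count in received.items():
        merged[node] = max(merged.get(node, 0), count)
    return merged

a = vc_send({}, "A")            # {'A': 1}
b = vc_receive({"B": 2}, a)     # {'B': 2, 'A': 1}
print(a, b)
```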
How are two vector clocks compared?
Equal: VCs are identical → in sync
One VC ≥ the other in every entry (and > in at least one): → one version is more recent (syntactic conflict → auto-resolution possible)
Neither dominates the other: → concurrent/conflicting versions (semantic conflict → manual/app-level resolution needed, i.e., semantic reconciliation)
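The comparison rule reduces to two entry-wise checks (a small sketch with illustrative return labels): if one clock is ≥ the other everywhere it dominates; if neither dominates, the versions are concurrent.

```python
# Compare two vector clocks (dicts of node -> counter). Missing entries
# count as 0.

def vc_compare(vc1, vc2):
    nodes = set(vc1) | set(vc2)
    ge = all(vc1.get(n, 0) >= vc2.get(n, 0) for n in nodes)
    le = all(vc1.get(n, 0) <= vc2.get(n, 0) for n in nodes)
    if ge and le:
        return "equal"        # in sync
    if ge:
        return "vc1 newer"    # syntactic conflict: auto-resolvable
    if le:
        return "vc2 newer"
    return "concurrent"       # semantic conflict: app-level reconciliation

print(vc_compare({"A": 1, "B": 1}, {"A": 1, "B": 0}))  # vc1 newer
print(vc_compare({"A": 1, "B": 0}, {"A": 0, "B": 1}))  # concurrent
```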
What’s the difference between syntactic and semantic conflict in vector clocks?
Syntactic Conflict:
One VC dominates the other (e.g., [A:1, B:1] > [A:1, B:0])
Can be auto-resolved
Semantic Conflict:
VCs are incomparable (e.g., [A:1, B:0] vs. [A:0, B:1])
Requires manual or app-level reconciliation
What are Hash Histories and how do they differ from vector clocks?
Track versions using a hash value and a revision number
Conflict detected if same revision number but different hash
Independent of number of nodes
Grow with number of updates
Lighter than vector clocks, which grow with number of writing processes
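The detection rule can be sketched as follows (illustrative helpers; the revision numbers and payloads are made up, and `hashlib` is just used to stand in for the version hash): two versions with the same revision number but different hashes are conflicting.

```python
# Hash-history sketch: a version is a (revision, hash) pair.
# Same revision + different hash => concurrent, divergent updates.

import hashlib

def version(rev, value):
    return (rev, hashlib.sha256(value.encode()).hexdigest())

def conflicting(v1, v2):
    return v1[0] == v2[0] and v1[1] != v2[1]

a = version(3, "balance=100")
b = version(3, "balance=90")   # same revision, different content
c = version(3, "balance=100")

print(conflicting(a, b))  # True  -- divergent updates at the same revision
print(conflicting(a, c))  # False -- identical version
```

Note that the structure holds only (revision, hash) pairs regardless of how many nodes exist, which is the independence-from-node-count property named above.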
What is serializability?
A schedule is serializable if its outcome is equivalent to some serial execution of the transactions, i.e., as if they had run one after another.