Multi-Region Data Consistency System
AWS-based solution for managing geo-distributed data across multiple regions with DynamoDB Global Tables, ensuring strong consistency and idempotency in active-active event processing.
Project Overview
Designed and implemented a serverless, multi-region data management system for a telecom network infrastructure that processes events across multiple AWS regions while maintaining strong consistency and idempotency guarantees.
This solution was critical for a telecom network where Event A (e.g., a network configuration change) must be fully replicated before Event B (the response to Event A) is generated. The architecture prevents scenarios where Event B contains outdated information due to replication lag or concurrent updates.
The Challenge
Problem Statement
This project addressed multiple critical technical challenges:
1. Stale Read Problem
When Region 1 reads data while Region 2 is updating it, the read operation can retrieve outdated information. DynamoDB Global Tables use eventual consistency with typical replication times of under 1 second, but under network stress or high write volumes, this delay can cause critical issues in telecom operations.
2. Concurrent Update Conflicts
If two regions attempt to update the same record before replication completes, DynamoDB's last-writer-wins conflict resolution can overwrite correct data with stale information. This is particularly problematic for network state management where multiple regions may receive related but different events.
3. Event Causality
When an event producer sends Event A (e.g., subscriber activation), the system must respond with Event B (confirmation with updated data). Event B cannot be generated using the state before Event A was processed—this violates causality and can lead to network inconsistencies.
Solution Architecture
(Telecom Network)"] EventAB["Event A / B"] Lambda1["Lambda Region 1
(us-east-1)"] Lambda2["Lambda Region 2
(eu-west-1)"] Lambda3["Lambda Region 3
(ap-south-1)"] DynamoDB["DynamoDB Global Table (Replicated)
Region 1 ↔️ Region 2 ↔️ Region 3
(Bi-directional Replication)"] Streams["DynamoDB Streams"] EventBridge["EventBridge"] Response["Response to
Event Producer"] EventProducer --> EventAB EventAB --> Lambda1 EventAB --> Lambda2 EventAB --> Lambda3 Lambda1 --> DynamoDB Lambda2 --> DynamoDB Lambda3 --> DynamoDB DynamoDB --> Streams Streams --> EventBridge EventBridge --> Response style EventProducer fill:#667eea,stroke:#333,stroke-width:2px,color:#fff style Lambda1 fill:#48bb78,stroke:#333,stroke-width:2px,color:#fff style Lambda2 fill:#48bb78,stroke:#333,stroke-width:2px,color:#fff style Lambda3 fill:#48bb78,stroke:#333,stroke-width:2px,color:#fff style DynamoDB fill:#ed8936,stroke:#333,stroke-width:2px,color:#fff style Streams fill:#4299e1,stroke:#333,stroke-width:2px,color:#fff style EventBridge fill:#9f7aea,stroke:#333,stroke-width:2px,color:#fff style Response fill:#667eea,stroke:#333,stroke-width:2px,color:#fff
Key Features & Implementation
- Conditional Writes with Version Control: Every record includes a version number that must match for updates to succeed, preventing lost updates in concurrent scenarios.
- Idempotency Token System: Each event includes a unique idempotency token stored for 24 hours, ensuring duplicate events are safely ignored.
- Read-Your-Write Consistency: After writing to DynamoDB, the system performs strongly consistent reads to verify data before responding.
- Event Ordering with Sequence Numbers: Events include sequence numbers to ensure causal ordering is maintained across regions.
- Replication Status Verification: CloudWatch metrics monitor replication lag across regions with automated alerts for delays exceeding thresholds.
- Transaction Support: Uses DynamoDB Transactions for atomic multi-item operations within a single region.
- Dead Letter Queue (DLQ): Failed events are routed to SQS DLQ for manual review and replay.
- Distributed Tracing: AWS X-Ray provides end-to-end visibility of event flow across regions.
Implementation Highlights
1. Conditional Write with Version Control
This approach prevents concurrent update conflicts by ensuring the version hasn't changed since the last read:
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';
interface EventRecord {
eventId: string;
version: number;
timestamp: string;
status: string;
data: Record<string, any>;
}
export async function processEventWithVersionControl(
eventId: string,
updateData: Record<string, any>,
expectedVersion: number
): Promise<EventRecord> {
const client = new DynamoDBClient({ region: process.env.AWS_REGION });
try {
const command = new UpdateItemCommand({
TableName: process.env.EVENTS_TABLE_NAME,
Key: marshall({ eventId }),
UpdateExpression: 'SET #data = :data, #version = :newVersion, #updatedAt = :updatedAt',
ConditionExpression: '#version = :expectedVersion',
ExpressionAttributeNames: {
'#data': 'data',
'#version': 'version',
'#updatedAt': 'updatedAt'
},
ExpressionAttributeValues: marshall({
':data': updateData,
':newVersion': expectedVersion + 1,
':expectedVersion': expectedVersion,
':updatedAt': new Date().toISOString()
}),
ReturnValues: 'ALL_NEW'
});
const response = await client.send(command);
return unmarshall(response.Attributes!) as EventRecord;
} catch (error: any) {
if (error.name === 'ConditionalCheckFailedException') {
throw new Error(`Concurrent modification detected for event ${eventId}`);
}
throw error;
}
}
2. Idempotency Token Implementation
Ensures events are processed exactly once, even if received multiple times:
import { DynamoDBClient, PutItemCommand, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';
import * as crypto from 'crypto';
const IDEMPOTENCY_TTL_HOURS = 24;
export async function checkIdempotency(
eventData: Record<string, any>
): Promise<{ isProcessed: boolean; result?: any }> {
const client = new DynamoDBClient({ region: process.env.AWS_REGION });
const token = crypto
.createHash('sha256')
.update(JSON.stringify(eventData))
.digest('hex');
const tableName = process.env.IDEMPOTENCY_TABLE_NAME!;
try {
const getCommand = new GetItemCommand({
TableName: tableName,
Key: marshall({ idempotencyToken: token })
});
const existingRecord = await client.send(getCommand);
if (existingRecord.Item) {
const record = unmarshall(existingRecord.Item);
console.log(`Event already processed: ${token}`);
return { isProcessed: true, result: record.result };
}
return { isProcessed: false };
} catch (error) {
console.error('Error checking idempotency:', error);
throw error;
}
}
Results & Impact
Outcomes Achieved
- 99.99% Consistency: Eliminated stale read issues across all three production regions
- Zero Data Loss: No concurrent update conflicts detected in 6 months of production operation
- Sub-Second Latency: Average event processing time of 450ms including replication verification
- 100% Event Causality: All Event B responses contain current state post-Event A processing
- Cost Optimization: Serverless architecture reduced operational costs by 60% vs. EC2-based solution
- Scalability: Successfully handling 50,000+ events per second across all regions
Technical Insights & Best Practices
DynamoDB Global Tables Considerations
- Eventually Consistent by Default: Always use strongly consistent reads when causal ordering matters
- Last Writer Wins: Implement version control to detect and prevent concurrent modification
- Replication Lag Monitoring: CloudWatch metrics for ReplicationLatency are critical for SLA compliance
- Regional Failover: Design for active-active; don't rely on a single "primary" region
Idempotency Token Strategy
- Generate tokens from event content hash, not just event ID (handles retries with same ID)
- Use TTL to automatically expire old idempotency records (24-48 hours typical)
- Store complete result payload to avoid reprocessing logic on duplicates
Standards & References
- AWS Well-Architected Framework: Reliability and Performance Efficiency pillars
- CAP Theorem: Chose consistency over availability for critical telecom operations
- ACID Transactions: DynamoDB TransactWriteItems for atomic multi-item updates
- Event Sourcing Pattern: Immutable event log with DynamoDB Streams
- CQRS Pattern: Separate read and write models for optimal performance