Architecting Scalable Microservices with NestJS
When building high-load systems, the initial architecture decisions often determine the ceiling of your scalability. In this deep dive, I will break down exactly how we moved from a monolithic Express app to a distributed NestJS system handling 10k requests/second.
The Monolith Bottleneck
Our initial application was a standard modular monolith built with Express and MongoDB. It worked great for the first 10,000 users. However, as we introduced image processing and real-time analytics features, we hit a wall.
The Problem: Node.js Event Loop Blocking
Our application was becoming CPU bound. Image processing tasks (using Sharp) were blocking the main thread, causing significant latency spikes for simple API calls like login or profile fetching.
// The Bad Way: Blocking the Event Loop
app.post('/process-image', (req, res) => {
  // This synchronous heavy calculation blocks ALL other requests
  const result = heavyImageProcessing(req.body);
  res.send(result);
});
We realized that scaling the entire application just to handle the load of one specific feature (image processing) was inefficient and costly. We needed to decouple components.
The Solution: Event-Driven Microservices
We decided to split the application into domain-driven microservices. We chose NestJS for its native support for microservices and strict architectural patterns (Modules, Providers, Controllers).
Architecture Overview
- API Gateway (NestJS): The entry point. Handles authentication (JWT), rate limiting, and request validation/routing.
- Auth Service: Manages users, roles, and tokens.
- Media Service (Worker): Consumes messages to process images/video.
- Notification Service: Sends emails, push notifications, and WebSocket messages.
Asynchronous Communication with RabbitMQ
To decouple the services, we avoided direct HTTP (REST/gRPC) calls for write operations. Instead, we used RabbitMQ as a message broker.
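In NestJS, most of this wiring is declarative configuration. A minimal sketch of the gateway side, registering a RabbitMQ client that the controller below can inject (the queue name, client token, and broker URL are illustrative placeholders, not our production values):

```typescript
// API Gateway module: registers a RabbitMQ-backed client proxy.
// "MEDIA_SERVICE", the queue name, and the URL are placeholders.
import { Module } from "@nestjs/common";
import { ClientsModule, Transport } from "@nestjs/microservices";

@Module({
  imports: [
    ClientsModule.register([
      {
        name: "MEDIA_SERVICE",
        transport: Transport.RMQ,
        options: {
          urls: ["amqp://localhost:5672"],
          queue: "image_queue",
          // Durable queues survive broker restarts, so queued jobs are not lost
          queueOptions: { durable: true },
        },
      },
    ]),
  ],
})
export class GatewayModule {}
```

The controller then injects this client with `@Inject("MEDIA_SERVICE")` and calls `emit()` on it, as shown next.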
// The Better Way: Producer (API Gateway)
@Post('process')
async processImage(@Body() data: ImageDto) {
  // Fire-and-forget: publish the event and return immediately to the user
  this.client.emit('image_uploaded', data);
  return { status: 'queued', timestamp: new Date() };
}
// Consumer (Worker Service)
@EventPattern('image_uploaded')
async handleImageUpload(@Payload() data: ImageDto) {
  // Runs in the background; worker pods scale independently
  await this.imageProcessor.resize(data);
  // Emits processing_complete event when done
  this.eventBus.emit('processing_complete', { id: data.id });
}
Handling Data Consistency: The Saga Pattern
One of the biggest challenges in microservices is distributed transactions. You can't just BEGIN TRANSACTION across three different databases.
We implemented the Choreography-based Saga pattern. Each service listens for events and performs a local transaction. If a step fails, a "Compensating Transaction" (or rollback event) is triggered to undo previous changes.
For example, if the *Order Service* charges the user but the *Inventory Service* is out of stock, the Inventory Service emits an OrderFailed event, and the Order Service listens to it to refund the payment.
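The choreography above can be sketched framework-free with an in-process event bus standing in for RabbitMQ. Everything here (service names, the `refunds` array, the stock map) is a simplified illustration of the pattern, not our production code:

```typescript
// Choreography-based Saga sketch: services react to events; a failure event
// triggers a compensating transaction (the refund). In-memory bus simulates RabbitMQ.
import { EventEmitter } from "node:events";

interface Order { id: string; item: string; amount: number; }

const bus = new EventEmitter();
const refunds: string[] = [];                          // simulated refund ledger
const stock = new Map<string, number>([["widget", 0]]); // "widget" is out of stock

// Order Service: local transaction (charge the user), then announce the order.
function placeOrder(order: Order) {
  // ...charge payment here (simulated)...
  bus.emit("order_created", order);
}

// Inventory Service: reserves stock, or emits a failure event.
bus.on("order_created", (order: Order) => {
  const available = stock.get(order.item) ?? 0;
  if (available > 0) {
    stock.set(order.item, available - 1);
    bus.emit("inventory_reserved", order);
  } else {
    bus.emit("order_failed", order); // triggers compensation downstream
  }
});

// Order Service: compensating transaction — undo the charge.
bus.on("order_failed", (order: Order) => {
  refunds.push(order.id); // simulated refund of the payment
});

placeOrder({ id: "o-1", item: "widget", amount: 1999 });
console.log(refunds); // ["o-1"] — the charge was compensated
```

Note there is no central coordinator: each service only knows which events it listens to and which it emits, which is what distinguishes choreography from an orchestrated saga.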
Observability with OpenTelemetry
With 15+ services running, debugging became a nightmare. "500 Internal Server Error" could mean anything.
We integrated OpenTelemetry with Jaeger to trace requests across service boundaries. By passing a TraceID in the headers (or AMQP metadata), we could visualize the entire request lifecycle.
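A minimal sketch of what the tracing bootstrap can look like with the OpenTelemetry Node SDK, assuming an OTLP-compatible collector (Jaeger accepts OTLP) on its default port; the service name is a placeholder:

```typescript
// tracing.ts — loaded before the NestJS app starts, so auto-instrumentation
// can patch HTTP and AMQP libraries. Service name and collector are assumptions.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

const sdk = new NodeSDK({
  serviceName: "api-gateway",
  // Defaults to http://localhost:4318/v1/traces if no endpoint is given
  traceExporter: new OTLPTraceExporter(),
  // Auto-instruments http, express, amqplib, etc., propagating the trace context
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

The auto-instrumentations inject and extract the trace context on outgoing/incoming HTTP calls and AMQP messages, which is what stitches the per-service spans into one end-to-end trace in Jaeger.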
Key Takeaways
Microservices are not a silver bullet. They introduce significant complexity:
- Deployment: You need an orchestrator such as Kubernetes or Docker Swarm.
- Debugging: Distributed tracing is mandatory.
- Latency: Every network hop adds overhead.
For us, though, the benefits outweighed those costs:
- Independent Scaling: We scaled the "Media Worker" to 50 pods during Black Friday while keeping the "Auth Service" at 3 pods.
- Fault Isolation: If the Notification Service crashes, users can still log in and buy products.
- Technology Diversity: We wrote the Image Processing service in Go for performance, while keeping the rest in Node.js.