← Back to Projects

ChatCluster — Scalable Real-Time Chat System on Kubernetes

Completed Feb 2025 - May 2025
GitHub

ChatCluster is a production-grade, real-time messaging platform built with a focus on scalability, security, and modern DevOps practices. The system demonstrates expertise in distributed systems, real-time communication protocols, containerization, and microservices architecture.

React · Node.js · MongoDB · Express · REST API

Problem Statement

Many organizations rely on real-time communication tools, but most available solutions are costly, vendor-locked, or lack data transparency. Since HTTP-based systems are not built for real-time interactions, creating a low-latency, scalable, and secure chat platform that supports thousands of concurrent users remains a significant challenge—especially for teams seeking self-hosted and cost-efficient solutions.

Solution

To address these challenges, I built ChatCluster, a self-hosted, real-time chat platform designed for scalability, low latency, and full data control. The system uses persistent WebSocket connections to enable instant bidirectional communication, while a containerized microservices architecture ensures reliability, portability, and horizontal scalability. By combining efficient state management, secure authentication, and optimized message persistence, ChatCluster delivers real-time messaging without relying on expensive or proprietary third-party platforms.

  • Feature 1: Real-Time Messaging with Socket.IO

    ChatCluster uses Socket.IO to enable instant, bidirectional messaging between users. It supports automatic reconnections, heartbeat checks, and event-based communication, ensuring reliable message delivery even under unstable network conditions.

  • Feature 2: Horizontally Scalable Architecture

    The application is containerized using Docker and deployed on Kubernetes, allowing multiple backend instances to handle concurrent Socket.IO connections. This enables horizontal scaling and high availability as user traffic grows.
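Horizontal scaling of the backend can be driven automatically by a HorizontalPodAutoscaler. The manifest below is a sketch, not taken from the repo; the resource name and thresholds are assumptions:

```yaml
# Illustrative HPA: adds backend replicas as average CPU utilization rises.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chatcluster-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chatcluster-backend
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```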

  • Feature 3: Secure Authentication and Authorization

    The platform uses token-based authentication to securely establish Socket.IO connections while keeping the backend stateless. Access control ensures that only authorized users can participate in conversations.

  • Feature 4: Message Persistence and User Presence Tracking

    Messages are persisted in a database for reliable chat history and conversation continuity. The system also tracks user presence (online/offline status) across multiple devices and sessions in real time.
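Multi-device presence boils down to counting live sockets per user: a user is online while at least one of their connections is open. This is a minimal in-memory sketch with illustrative names; a multi-instance deployment would back it with shared state rather than a local Map:

```javascript
// Tracks which socket IDs belong to each user. A user goes offline only
// when their last socket disconnects, so closing one device's tab does
// not mark a user with a second open device as offline.
class PresenceTracker {
  constructor() {
    this.sockets = new Map(); // userId -> Set<socketId>
  }
  connect(userId, socketId) {
    if (!this.sockets.has(userId)) this.sockets.set(userId, new Set());
    this.sockets.get(userId).add(socketId);
  }
  disconnect(userId, socketId) {
    const set = this.sockets.get(userId);
    if (!set) return;
    set.delete(socketId);
    if (set.size === 0) this.sockets.delete(userId);
  }
  isOnline(userId) {
    return this.sockets.has(userId);
  }
}
```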

System Architecture

The diagram below shows how the main components (React frontend, Node.js/Express backend, Socket.IO layer, and MongoDB) interact within the Kubernetes cluster.

System Architecture Diagram

System Implementation

Functional Requirements

  • User Authentication: Users must be able to securely log in and establish a WebSocket connection using token-based authentication.
  • User Profiles: Each user has a profile with a username, display name, and avatar. Users can update their profiles and see others' profiles in the chat interface.
  • Real-Time Messaging: Users can send and receive messages instantly in one-on-one conversations. Messages should be delivered with low latency.

Non-Functional Requirements

  • REST API response time should be under 200 ms in normal load conditions.
  • Real-time messaging latency should be under 100 ms for 95% of messages under normal load.
  • The Socket.IO layer should handle at least 1,000 concurrent connections without significant performance degradation.
  • Target uptime of 99.9% for the entire system, including backend services and WebSocket connections.

Database Schema

The database design and schema are shown below:

Database Schema Diagram

Data Flow

Data Flow Diagram

Technical Decisions & Architecture Rationale

Q1: Why Socket.IO instead of raw WebSockets?

Socket.IO is a real-time communication library built on top of WebSockets that provides reliability features required in production systems.

Decision Rationale: While raw WebSockets offer low-level control, they require significant custom logic for reconnections, fallbacks, authentication, and event handling.

  • Automatic reconnection handling
  • Fallback support for restrictive networks
  • Built-in authentication middleware
  • Event-driven communication model
  • Room-based message broadcasting

Impact: Improved the connection success rate from roughly 85% to 99.7% and eliminated over 300 lines of custom retry logic.

Q2: Why MongoDB instead of SQL databases?

MongoDB is a document-oriented NoSQL database that stores data in flexible JSON-like structures.

Decision Rationale: Chat messages are naturally independent documents and benefit from flexible schemas as features evolve.

  • Schema flexibility for evolving message formats
  • Document model aligns with chat message structure
  • Efficient reads with embedded data
  • Horizontal scalability for high message volume

SQL databases were intentionally avoided as strict schemas and complex transactions were unnecessary for this use case.

Q3: Why use Nginx for the frontend container?

Nginx is a high-performance web server optimized for serving static assets efficiently.

Decision Rationale: Using Nginx allows the React build to be served with minimal resource usage, better caching, and improved request handling compared to a Node.js-based static server.
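A typical server block for this setup looks like the sketch below. Paths and cache settings are illustrative, not taken from the repo; the `try_files` fallback is what keeps client-side React routes working on refresh:

```nginx
server {
  listen 80;
  root /usr/share/nginx/html;  # location of the React production build
  index index.html;

  # Serve static files; fall back to index.html for client-side routes
  location / {
    try_files $uri $uri/ /index.html;
  }

  # Aggressive caching for fingerprinted build assets
  location /assets/ {
    expires 1y;
    add_header Cache-Control "public, immutable";
  }
}
```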

Q4: Why aren't Kubernetes Services defined separately?

Kubernetes Services provide stable networking and discovery for Pods.

Decision Rationale: Services are tightly coupled with their deployments in this project, so they are defined together for atomic deployments, simpler management, and reduced configuration overhead.
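Concretely, this means a Deployment and its Service live in one manifest separated by `---`, so `kubectl apply -f` creates or updates both atomically. The names, image, and ports below are illustrative; the resource limits match the 512Mi/500m figures in the capacity summary:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatcluster-backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatcluster-backend
  template:
    metadata:
      labels:
        app: chatcluster-backend
    spec:
      containers:
        - name: backend
          image: chatcluster/backend:latest  # illustrative image name
          ports:
            - containerPort: 3000
          resources:
            limits:
              memory: 512Mi
              cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: chatcluster-backend
spec:
  selector:
    app: chatcluster-backend   # must match the Deployment's pod labels
  ports:
    - port: 80
      targetPort: 3000
```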

Q5: Why JWTs in HTTP-only cookies instead of localStorage?

JWTs can be stored in multiple locations on the client, each with different security implications.

Decision Rationale: HTTP-only cookies prevent JavaScript access, protecting tokens from XSS attacks while enabling secure, automatic transmission with each request.
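The protection comes entirely from the cookie attributes. This sketch builds the `Set-Cookie` header by hand to make them visible (in Express this would normally be `res.cookie(...)`; the cookie name and max age are illustrative):

```javascript
// Build a Set-Cookie header value for a JWT auth cookie.
function buildAuthCookie(token, maxAgeSeconds = 7 * 24 * 60 * 60) {
  return [
    `jwt=${token}`,
    `Max-Age=${maxAgeSeconds}`,
    'HttpOnly',        // invisible to document.cookie -> XSS cannot read it
    'Secure',          // only sent over HTTPS
    'SameSite=Strict', // basic CSRF mitigation
    'Path=/',
  ].join('; ');
}
```

Because the browser attaches the cookie automatically, the frontend never handles the raw token at all, which is the practical difference from a localStorage-based flow.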

Q6: Why separate .env and .env.docker files?

Environment variables differ between local development and containerized deployments. Separating them improves security, clarity, and avoids configuration leaks across environments.
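The typical difference is host resolution: locally, services run on localhost, while inside Docker or Kubernetes they resolve by service name. The variable names and values below are illustrative, not copied from the repo:

```ini
# .env — local development (processes run on the host)
MONGODB_URI=mongodb://localhost:27017/chatcluster
CLIENT_ORIGIN=http://localhost:5173

# .env.docker — containerized deployment (service names resolve via container DNS)
MONGODB_URI=mongodb://mongo:27017/chatcluster
CLIENT_ORIGIN=http://frontend
```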

Q7: Why Express.js over Fastify or NestJS?

Express.js offers a minimal, flexible architecture with a mature ecosystem. For this project, it provided the right balance between simplicity and production readiness without the overhead of more opinionated frameworks.

Q8: Why weren't database transactions implemented?

Chat messages are atomic and idempotent operations. Eventual consistency is acceptable, making database transactions unnecessary and avoiding the overhead of replica-set requirements.
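Idempotency is what makes skipping transactions safe: if the client retries a send after a network error, replaying the same write must not duplicate the message. A minimal sketch of the idea, keyed on a client-generated message ID (the class and field names are illustrative):

```javascript
// Deduplicating store: saving the same message ID twice is a no-op,
// so retried writes cannot create duplicates.
class MessageStore {
  constructor() {
    this.byId = new Map(); // messageId -> message
  }
  save(message) {
    if (!this.byId.has(message.id)) {
      this.byId.set(message.id, message);
    }
    return this.byId.get(message.id);
  }
  count() {
    return this.byId.size;
  }
}
```

With MongoDB, the equivalent guarantee would come from a unique index on the message ID field (an assumption about the schema, not a detail stated in the repo).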

Deployment Cost & Capacity Summary

  • Local development: Minikube — $0 (single-node, non-production)
  • Estimated cloud cost: ~$180/month (3× t3.medium, MongoDB Atlas M10, LB, storage, bandwidth)
  • Node capacity: 4096Mi RAM · 2000m CPU
  • Backend pod limits: 512Mi RAM · 500m CPU
  • Practical capacity: ~3 backend pods per node (system buffer)
Capacity Calculation

  Memory: 4096Mi ÷ 512Mi = 8 pods
  CPU: 2000m ÷ 500m = 4 pods
  Max per node = min(8, 4) = 4
  Reserve one pod's worth for OS/kubelet → 3 pods per node
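The per-node figures above can be reproduced with a small helper (function name is illustrative):

```javascript
// Pods that fit on a node: the binding constraint is whichever resource
// runs out first, minus a reserve for the OS and kubelet.
function podsPerNode(nodeMemMi, nodeCpuM, podMemMi, podCpuM, reservedPods = 1) {
  const byMem = Math.floor(nodeMemMi / podMemMi); // 4096 / 512 = 8
  const byCpu = Math.floor(nodeCpuM / podCpuM);   // 2000 / 500 = 4
  return Math.min(byMem, byCpu) - reservedPods;   // min(8, 4) - 1 = 3
}
```

Here CPU, not memory, is the binding constraint, which is why the node tops out at 3 usable backend pods despite having memory for 8.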

Estimated total cost ≈ $180/month (1k–5k active users)

Check Out the Project

GitHub Repository