Lab Notes

The architecture of calm — building software that does not wake you up at 2 AM

11 min read · Published April 2026

[ESSAY: TBD by Mukesh]

This essay will explore the engineering decisions made at build time that determine operational calm post-launch — the difference between software that pages you on a Sunday morning and software that handles its own problems.

Topics to address:

Error budgets: defining acceptable failure rates before you ship, not after the first incident
Monitoring philosophy: what to alert on (user-facing errors, payment failures, auth breakdowns) versus what to log (slow queries, cache misses, API deprecation warnings)
The architecture choices that compound: idempotent operations, graceful degradation, circuit breakers, retry with exponential backoff
Database decisions: connection pooling, read replicas, migration safety nets
The human side: on-call rotation design, incident response templates, post-mortem culture
What we actually ship with every engagement: the monitoring stack, the alert thresholds, the runbook
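
Two of the topics above lend themselves to a short illustration: the arithmetic behind an error budget, and retry with exponential backoff. The Python below is a minimal sketch and not the stack shipped with any engagement. The function names (error_budget_minutes, backoff_delays, retry_with_backoff), the 30-day window, and the default delays are all assumptions made for illustration. Retrying is only safe when the wrapped operation is idempotent, which is why the two ideas sit together in the list.

```python
import random
import time


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window for an availability target.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    The 30-day window is an illustrative default, not a recommendation.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return (1.0 - slo_target) * window_days * 24 * 60


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list:
    """Maximum wait before each retry: base * 2**n seconds, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def retry_with_backoff(operation, max_retries=4, base=0.5, cap=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on failure, wait and try again up to max_retries times.

    Each wait is a random fraction of the exponential delay ("full jitter"),
    so many clients failing at once do not all retry at the same instant.
    The operation must be idempotent, or a retry may apply its effect twice.
    `sleep` and `rng` are injectable so the behaviour can be tested quickly.
    """
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            sleep(rng() * delays[attempt])
```

Under these assumptions, a 99.9% availability target over 30 days leaves roughly 43.2 minutes of budget, and four retries with a 0.5 second base wait at most 0.5, 1, 2, and 4 seconds respectively. A circuit breaker would sit one level above this, refusing to call the operation at all after repeated failures. It is left out here to keep the sketch short.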

