Infrastructure as Code: Stop Managing Servers Manually Before It Destroys Your Sanity

Infrastructure as Code (IaC) defines your servers, networks, and databases in version-controlled code—Terraform, CloudFormation, Pulumi—instead of clicking through a cloud console. You review it like software, reproduce identical environments, recover from disasters in minutes, and stop the "if Raj gets hit by a bus, our infra dies" single point of failure.

A practical, jargon-free guide for Indian engineering teams and founders — part of the Learn AI with Reeturaj series on InBharat AI.

I remember the disaster.

Production database crashed at 3 AM. We'd provisioned the server six months ago. Nobody documented the kernel parameters. The monitoring configuration was someone's custom Bash script on a laptop. The database replication setup existed only in Senior Engineer Raj's head.

When Raj was sleeping, we had to wake him. He logged in through memory. Fixed it. But then we realized: if Raj gets hit by a bus tomorrow, our infrastructure dies.

That panic motivated me to adopt Infrastructure as Code. Best decision we made.

IaC means describing your infrastructure in code, version-controlled, reviewed like software. Not clicking buttons in AWS console. Not logging into servers and running commands. Actual code that defines what your infrastructure should be.

Let me explain why this changed everything about reliability at InBharat.ai.

The Problem with Manual Infrastructure

Before IaC, we did this:

  • Open AWS console
  • Create EC2 instance with specific parameters
  • SSH into the instance
  • Run custom scripts to install software
  • Configure networking manually
  • Set up monitoring through GUI
  • Document everything in Confluence (documentation that quickly gets outdated)

When we needed another identical environment for staging, we couldn't easily reproduce it. Every environment differed slightly. Database configuration in production had custom kernel parameters that staging didn't. Load balancing was configured slightly differently.

This is called "infrastructure drift." Your intended configuration diverges from reality. Manual changes accumulate. Documentation lies. Nobody knows the current actual state.

At some point, we needed to scale from one production server to three. This required recreation. Different configurations got applied to different servers. One had stricter permissions than others. One had an old package version. Debugging problems meant checking each server individually.

It was chaos.

IaC Solves This

Infrastructure as Code means expressing your infrastructure requirements in code.

Instead of clicking AWS console, you write Terraform:

resource "aws_instance" "ml_inference" {
      ami           = "ami-0123456789abcdef"
      instance_type = "t3.xlarge"

      tags = {
        Name = "InBharat-inference-prod"
      }
    }
    

Now your infrastructure is in Git. You can review it. You can version control it. You can reproduce it. You can deploy it repeatedly.

Need another inference server identical to the existing one? Run Terraform again. It creates another with identical configuration. No mystery. No drift.

The Actual Benefits

Repeatability and Consistency. When your entire infrastructure is code, deploying development, staging, and production becomes trivial. Same code, different variables. Development uses smaller instances for cost efficiency. Production uses larger ones. The configuration structure is identical.

This eliminates the "works in production but not staging" problem because staging actually matches production.

Scalability Without Manual Work. When traffic spikes, you need more servers. Manually provisioning servers takes 20 minutes. Auto-scaling based on IaC can add servers in 2 minutes.

We set up auto-scaling for our NLP inference cluster. During Zomato's dinner rush, traffic spikes. Before IaC, we'd manually SSH to servers and increase connection pool sizes. Now? Auto-scaling automatically provisions servers with identical configuration.

Audit Trails. Before IaC, if something changed in production, you'd have to ask around. "Who changed the load balancer?" "When did we upgrade the database?" "Why does this security group allow traffic from 10.0.0.0/8?"

With IaC in version control, you see exactly what changed, who changed it, when, and why (the commit message). This is crucial for compliance and incident investigation.

Disaster Recovery. Your production infrastructure crashed. You need to rebuild it.

Without IaC, this takes days. You rebuild servers manually. You restore from backups, hoping backups are current.

With IaC, you run Terraform and your entire infrastructure redeploys in 30 minutes. This isn't theoretical for us. We had a database failure last year. Rebuilding from IaC took 45 minutes. Without IaC, it would have taken three days.

Rapid Experimentation. Want to test a new database configuration? Terraform out a test environment. Run experiments. Destroy it. Costs you 2 hours and ₹500 in cloud bills. Without IaC, you'd spend days on setup and would be afraid to destroy an environment.

Declarative vs Imperative

This is where Terraform really shines. The two paradigms differ in how you tell the tool what to do:

Dimension Declarative (Terraform) Imperative (Ansible)
You specify The desired end state Exact steps to run
The tool Figures out how to get there Executes your steps in order
Best for Cloud-native greenfield infra Legacy config + orchestration
Drift handling Detects + reconciles automatically Re-runs steps

Example:

Imperative: Provision instance, wait 2 minutes, install nginx, configure nginx, start nginx

Declarative: I want an instance running nginx

Terraform figures out provisioning, software installation, configuration. If you change the desired configuration to use Apache, Terraform automatically replaces the instance.

For complex enterprise scenarios with legacy integration, imperative tools like Ansible work better. But for cloud-native infrastructure, declarative is superior.

Tools in the IaC Ecosystem

Terraform (HashiCorp): Most popular. Cloud-agnostic. Uses HCL language. Mature ecosystem.

CloudFormation (AWS-native): If you're AWS-only, it's integrated. Less portable than Terraform but very powerful within AWS.

Ansible (imperative configuration management): Smaller learning curve than Terraform. Better for existing infrastructure. Can be used alongside Terraform.

Pulumi (newer, programmatic): Uses actual programming languages (Python, TypeScript, Go) instead of domain-specific languages. Easier for developers but less declarative.

We chose Terraform because we're multi-cloud and needed portability.

Secrets Management

Your Terraform code is version controlled. But some values are secrets—database passwords, API keys. You can't commit secrets to Git.

We solve this with:

  • AWS Secrets Manager for database passwords and API keys
  • Environment variables for sensitive configuration
  • IAM roles for service authentication (no keys needed)
  • Audit logging for secret access

Terraform reads these at deployment time. The code never contains actual secrets. State files are stored in S3 with versioning and locking enabled—corrupting state breaks everything, so it's protected.

Monitoring as Code

Beyond server provisioning, we describe monitoring infrastructure as code.

CloudWatch alarms, Prometheus rules, Grafana dashboards—all defined in Terraform. When we change application behavior, monitoring configuration changes alongside it.

This prevents the scenario where code changes but monitoring didn't update, leading to blind spots in production.

Cost Optimization

IaC enables aggressive cost optimization.

We track what resources each service consumes. Terraform cost estimation tools predict infrastructure expenses. Before deploying a new service, you know the monthly cloud bill.

We've used this to optimize aggressively. Switching from m5.xlarge to t3.large instances where sustained CPU isn't needed. Using spot instances for non-critical workloads. Auto-scaling policies that shut down unused resources at night.

Result: our infrastructure costs dropped 35% year-over-year while capacity increased.

For Startups in India

If you're building AI startups, adopt IaC immediately.

Cloud bills add up fast when you're training models and running inference at scale. IaC lets you measure infrastructure cost per feature. Expensive feature? IaC makes it obvious and easy to optimize.

Also, IaC enables scaling without hiring new infrastructure people. You need one senior engineer comfortable with Terraform, not a three-person Ops team.

What Changed at InBharat

We transitioned to IaC three years ago.

Before: manual infrastructure management, frequent mistakes, scary production incidents, onboarding new infrastructure engineers took months.

After: self-documenting infrastructure, consistent environments, rapid provisioning, confident scaling, infrastructure changes reviewed like code.

The transition took two months. We documented all existing infrastructure as Terraform code. We created reusable modules. We standardized on versions and configuration patterns.

Now? Provisioning a new service's infrastructure takes a developer 30 minutes. It's self-service with automated review. Changes are tracked. Disaster recovery is practiced (we test rebuilding from IaC quarterly).

Frequently Asked Questions

What is Infrastructure as Code? Defining your infrastructure—servers, networks, databases, monitoring—as version-controlled code (Terraform, CloudFormation, Pulumi) that's reviewed like software, instead of manually clicking through a cloud console or SSH-ing into boxes.

What problem does IaC solve? Infrastructure drift—manual changes accumulate, docs go stale, nobody knows the true state, and environments differ. IaC makes infra reproducible, auditable, and recoverable, removing the single engineer who holds the setup in their head.

Declarative vs imperative IaC—what's the difference? Declarative (Terraform) describes the desired end state; the tool figures out how. Imperative (Ansible) specifies exact steps. Declarative wins for cloud-native infra; imperative suits legacy configuration. Many teams use both.

How does IaC help disaster recovery? You run Terraform and rebuild the entire infrastructure in ~30–45 minutes from code, instead of days of manual reconstruction hoping backups are current. Recovery becomes a tested, repeatable operation.

How do you keep secrets out of IaC? Never commit secrets to Git. Terraform reads database passwords and API keys from a secret manager (AWS Secrets Manager) or environment variables at deploy time; IAM roles handle service auth without keys. State files are stored with versioning and locking.


Reeturaj Goswami is the founder of InBharat.ai, building AI infrastructure that is built in India, for India. He's passionate about helping Indian startups scale infrastructure efficiently through best practices and better tools. When not building, he's evangelizing infrastructure excellence.

#InBharat #DeshKaAI #AIForBharat #InfrastructureAsCode #Terraform #CloudInfrastructure #DevOps #ScalableArchitecture #CloudComputing #TechOperations