Originally published on an external platform.

In this post, I share our journey from being legitimate script kiddies to running a GitOps-enabled infrastructure. Like every other DevOps team, we had our fair share of chaos: we used a mix of tools that depended on who was working and what problem they were trying to solve.

We thought we had a well-defined process for provisioning infrastructure:

Python/Ruby/Bash for spinning up cloud components, Chef for configuration management, Jenkins to run ad hoc jobs, and GitHub for storing changes (if we remembered to do so 🤪).

However, what we ended up with was a bunch of disparate scripts, no real version control concepts, Chef cookbooks with no idempotency and conflicting versions, and often no GitHub check-ins.

If I am being honest, we were in a very chaotic situation with no confidence in our infrastructure. Outages were happening left, right, and center, and often we didn't know why.

So we thought: What should we do to fix this? How can we reach a state where we have confidence in our infrastructure and start following standards? 🤔

The Plan

We decided on a roadmap to guide us toward a stable, automated future:

  • Become 100% GitOps Compliant.
  • Become Cloud Agnostic.
  • Maintain Infrastructure with a well-defined state and only one source of truth.
  • Make Application configuration part of provisioning, with provisions to change or update them using the same process.
  • Make monitoring part of provisioning.
  • Implement Version Certifications.
  • Eliminate most manual tasks and automate them.

The Framework

We developed a framework with the following tools to achieve our goals:

Framework Diagram

Let me explain the role of each tool:

  • Terraform: We used Terraform for cloud provisioning on both Amazon and Azure. We adopted Terraform's module mechanism, where each component calls multiple modules, selected by version.
  • HashiCorp Vault: Used to store all the credentials required for provisioning, such as Cloud credentials and Git credentials.
  • Jenkins: Used to run the provisioning, modification, updates, and destruction pipelines.
  • Golang: Used to write a custom tool to knit all these technologies together.
  • GitHub: Used as our absolute source of truth.
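To make the "knitting" role of the Go tool more concrete, here is a minimal sketch of one thing such a tool might do: turn a provisioning request into Terraform command lines that a Jenkins pipeline can run. The `Request` type, its fields, and the `tfplan` file name are assumptions for illustration, not our actual tool. The Terraform flags used (`-input=false`, `-out`, `-destroy`, `-var`) are standard CLI options.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Request is a hypothetical description of one provisioning run.
type Request struct {
	Component string            // hypothetical component name, e.g. "web"
	Destroy   bool              // true when the destruction pipeline is running
	Vars      map[string]string // Terraform input variables
}

// PlanArgs builds the argument list for `terraform plan`.
// Variables are sorted so the same request always yields the same command.
func PlanArgs(r Request) []string {
	args := []string{"plan", "-input=false", "-out=tfplan"}
	if r.Destroy {
		args = append(args, "-destroy")
	}
	keys := make([]string, 0, len(r.Vars))
	for k := range r.Vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		args = append(args, "-var", k+"="+r.Vars[k])
	}
	return args
}

// ApplyArgs builds the argument list that applies the saved plan file.
func ApplyArgs() []string {
	return []string{"apply", "-input=false", "tfplan"}
}

func main() {
	r := Request{
		Component: "web",
		Vars:      map[string]string{"region": "us-east-1", "env": "dev"},
	}
	fmt.Println("terraform", strings.Join(PlanArgs(r), " "))
	fmt.Println("terraform", strings.Join(ApplyArgs(), " "))
}
```

Planning to a file and then applying that exact file keeps the reviewed plan and the executed change identical, which fits the "one source of truth" goal. Credentials would come from Vault at run time rather than from the request itself.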

The Architecture

Here is how the whole flow looks, from start to end:

GitOps Flow Diagram

With this automation, we were able to accomplish the following:

  1. GitHub is our one source of truth.
  2. A well-defined state for our infrastructure.
  3. Proper lifecycle management of resources.
  4. Security is a first-class citizen.
  5. Little to no manual work related to provisioning or updates.
  6. Proper version control and standardization.

Version Certification

We also spent a good amount of time on Version Certification, where we certify Terraform module versions against each other and maintain a version-controlled component package. A component package is simply a combination of specific Terraform modules tested together.

Figure: Component Packaging Structure
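As a rough illustration of the idea, a component package can be modeled as a named, versioned map of module versions, plus a check that flags anything outside the certified set. The type, method, module names, and version numbers below are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// ComponentPackage pins the Terraform module versions that were tested together.
// All names and versions in this sketch are hypothetical.
type ComponentPackage struct {
	Name    string
	Version string
	Modules map[string]string // module name -> certified version
}

// Mismatches returns, in sorted order, the requested modules whose version
// differs from the certified one or that are not part of the package at all.
func (p ComponentPackage) Mismatches(requested map[string]string) []string {
	var out []string
	for name, version := range requested {
		if certified, ok := p.Modules[name]; !ok || certified != version {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pkg := ComponentPackage{
		Name:    "web-service",
		Version: "3.1.0",
		Modules: map[string]string{"vpc": "1.2.0", "alb": "2.0.1"},
	}
	requested := map[string]string{"vpc": "1.2.0", "alb": "2.1.0", "rds": "0.1.0"}
	fmt.Println("uncertified modules:", pkg.Mismatches(requested))
}
```

A pipeline could refuse to provision when `Mismatches` returns a non-empty list, so only combinations that passed testing together ever reach an environment.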

The Power of Profiles

To streamline our infrastructure delivery, we introduced the concept of Infrastructure Profiles. Instead of requiring developers to understand every nuance of VPC CIDRs or subnet IDs, we abstracted these into pre-defined archetypes.

  • Standardized Blueprints: We created Profiles based on specific use-cases (e.g., 'Internal Tooling', 'High-Traffic API', 'Database Cluster').
  • Automated Injection: Most Terraform variables are now populated automatically simply by selecting a profile type. This includes complex networking configurations like VPC Type, Subnets, and Load Balancer (ELB/ALB) attributes.
  • Reduced Friction: This abstraction allows our engineering teams to focus on their application logic rather than the plumbing of the cloud provider.
  • Consistency: By using profiles, we ensure that every environment, from Dev to Prod, follows the same structural standards, eliminating the "it worked in Dev" surprises during production rollouts.
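The profile mechanism can be sketched as a lookup followed by a merge: the profile supplies defaults for most Terraform variables, and the caller supplies only what is specific to the application. The profile names, variable keys, and the rule that explicit overrides win are assumptions made for this example.

```go
package main

import "fmt"

// Profile holds pre-defined Terraform variables for one archetype.
// The keys and values used here are hypothetical.
type Profile struct {
	Name string
	Vars map[string]string
}

var profiles = map[string]Profile{
	"internal-tooling": {
		Name: "internal-tooling",
		Vars: map[string]string{"vpc_type": "private", "lb_type": "none"},
	},
	"high-traffic-api": {
		Name: "high-traffic-api",
		Vars: map[string]string{"vpc_type": "public", "lb_type": "alb"},
	},
}

// Resolve merges the defaults of the named profile with explicit overrides.
// Overrides win, which is an assumption of this sketch.
func Resolve(profileName string, overrides map[string]string) (map[string]string, error) {
	p, ok := profiles[profileName]
	if !ok {
		return nil, fmt.Errorf("unknown profile %q", profileName)
	}
	vars := make(map[string]string, len(p.Vars)+len(overrides))
	for k, v := range p.Vars {
		vars[k] = v
	}
	for k, v := range overrides {
		vars[k] = v
	}
	return vars, nil
}

func main() {
	vars, err := Resolve("high-traffic-api", map[string]string{"app_name": "checkout"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vars)
}
```

Rejecting unknown profile names early keeps a typo from silently provisioning with missing networking settings.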

The Certification Workflow

This is how the whole Certification Process looks:

graph LR
    A[Develop Module] --> B{Unit Test}
    B -->|Fail| A
    B -->|Pass| C[Tag Release]
    C --> D[Update Component Package]
    D --> E{Integration Test}
    E -->|Fail| D
    E -->|Pass| F([Certified Version])
    style A fill:#1e293b,stroke:#38bdf8,stroke-width:2px,color:#fff
    style B fill:#1e293b,stroke:#facc15,stroke-width:2px,color:#fff
    style C fill:#1e293b,stroke:#818cf8,stroke-width:2px,color:#fff
    style D fill:#1e293b,stroke:#c084fc,stroke-width:2px,color:#fff
    style E fill:#1e293b,stroke:#facc15,stroke-width:2px,color:#fff
    style F fill:#1e293b,stroke:#34d399,stroke-width:2px,color:#fff
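The graph above reads naturally as a small state machine with two test gates. The sketch below encodes exactly those transitions. The type and function names are illustrative, and what the unit and integration tests actually run is left out.

```go
package main

import "fmt"

// Stage is one node in the certification workflow.
type Stage int

const (
	Develop Stage = iota
	UnitTest
	TagRelease
	UpdatePackage
	IntegrationTest
	Certified
)

var stageNames = [...]string{
	"Develop Module", "Unit Test", "Tag Release",
	"Update Component Package", "Integration Test", "Certified Version",
}

func (s Stage) String() string { return stageNames[s] }

// Next returns the stage that follows s.
// passed only matters at the two test gates; other stages ignore it.
func Next(s Stage, passed bool) Stage {
	switch s {
	case Develop:
		return UnitTest
	case UnitTest:
		if passed {
			return TagRelease
		}
		return Develop // failed unit tests send the module back to development
	case TagRelease:
		return UpdatePackage
	case UpdatePackage:
		return IntegrationTest
	case IntegrationTest:
		if passed {
			return Certified
		}
		return UpdatePackage // failed integration tests reopen the package
	default:
		return Certified // Certified is terminal
	}
}

func main() {
	// Walk the happy path where every gate passes.
	s := Develop
	for s != Certified {
		fmt.Println(s)
		s = Next(s, true)
	}
	fmt.Println(s)
}
```

Note that a failed integration test returns to the package update step, not to module development, so a bad combination of module versions can be fixed without re-releasing the modules themselves.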

Conclusion

Overall, we achieved the goals we set out to reach. Now, all our deployments are initiated by a single Git commit. Further modifications, updates, and destruction also follow the same controlled flow. We are now far more confident in our infrastructure. Every change we introduce is thoroughly tested. No more dealing with any number of manual configuration files or scripts.

This process also made cost optimization and resource lifecycle management much easier.

Rollbacks became straightforward because each resource has a build tag and a Git commit tag attached to it, allowing us to roll back whenever necessary.
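As a simplified illustration of why those tags help, suppose the tags from each rollout of a resource are kept as an ordered history. Finding the rollback target is then just a lookup of the previous entry. The `Deployment` type, the history slice, and the sample tag values are hypothetical, and this sketch does not show how the rollback itself is executed.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment records the tags attached to a resource on one rollout.
// The field names are hypothetical.
type Deployment struct {
	BuildTag  string
	GitCommit string
}

// RollbackTarget returns the deployment that preceded the current one.
// history is ordered oldest first; the last entry is what is live now.
func RollbackTarget(history []Deployment) (Deployment, error) {
	if len(history) < 2 {
		return Deployment{}, errors.New("no earlier deployment to roll back to")
	}
	return history[len(history)-2], nil
}

func main() {
	history := []Deployment{
		{BuildTag: "build-41", GitCommit: "a1b2c3d"},
		{BuildTag: "build-42", GitCommit: "d4e5f6a"},
	}
	target, err := RollbackTarget(history)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("roll back to commit %s (%s)\n", target.GitCommit, target.BuildTag)
}
```

Because the target is a Git commit, rolling back goes through the same commit-driven pipeline as any other change.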

Success

Maybe what we did isn't extraordinary compared to what the industry giants or the community are already doing. However, when I look back at where we started, we have come a long way, and that gives us a great sense of accomplishment. 🤩

Hope this gives you the motivation to boost automation and reduce engineering toil. We can truly achieve great things with simplicity and the tools already available to us.

Happy Deployments!!
