Building a Production-Grade DevOps Platform on AWS (Terraform + EKS + GitOps + Monitoring)
Modern DevOps platforms require far more than simply deploying an application. Production systems demand infrastructure automation, container orchestration, CI/CD pipelines, GitOps deployment, and robust observability.
In this article, I walk through how I built a production-style DevOps platform on AWS using modern cloud-native tooling.
The platform integrates:
• Infrastructure as Code with Terraform
• Kubernetes orchestration using Amazon EKS
• Containerization with Docker
• CI automation with GitHub Actions
• GitOps deployments using ArgoCD
• Observability using Prometheus and Grafana
The project demonstrates a complete end-to-end DevOps workflow, from infrastructure provisioning to monitoring live application metrics.
Project Repositories
The platform is divided into three repositories, each responsible for a different layer of the system.
Infrastructure (Terraform)
https://github.com/rasika-08061998/eks-devops-platform-infra
This repository provisions the AWS infrastructure and Kubernetes cluster.
Application Code
https://github.com/rasika-08061998/three-tier-ai-app
This repository contains the frontend and backend microservices.
GitOps Deployment
https://github.com/rasika-08061998/eks-gitops-deployments
This repository contains Kubernetes manifests used by ArgoCD for deployments.
Separating the repositories this way keeps infrastructure changes, application releases, and deployment configuration on independent review and release cycles, a layout many engineering teams use in production.
Project Architecture Overview
The platform follows a GitOps-based architecture, where infrastructure, application code, and deployment configuration are managed independently.
High-level architecture:
Developer
|
v
GitHub Repositories
|
v
GitHub Actions CI Pipeline
|
v
AWS ECR (Container Registry)
|
v
ArgoCD GitOps Deployment
|
v
AWS EKS Cluster
|
+---- Frontend (React)
+---- Backend (FastAPI)
+---- PostgreSQL
|
v
Prometheus Monitoring
|
v
Grafana Dashboards
This design reflects a modern cloud-native application platform running on Kubernetes.
Infrastructure Layer (Terraform)
Infrastructure is defined using Terraform, which enables Infrastructure as Code.
This ensures that infrastructure is:
• reproducible
• version-controlled
• automated
Repository:
https://github.com/rasika-08061998/eks-devops-platform-infra
Infrastructure Components
The Terraform configuration provisions the following AWS resources:
• AWS VPC
• Public and Private Subnets
• Internet Gateway
• NAT Gateway
• Bastion Host
• AWS EKS Cluster
• Managed Node Groups
• IAM Roles
• IRSA configuration
• AWS ECR Repository
This provides the complete foundation for running Kubernetes workloads.
Terraform Folder Structure
terraform
│
├── modules
│   ├── vpc
│   ├── eks
│   ├── bastion
│   └── ecr
│
└── environments
    └── dev
Terraform modules package infrastructure into reusable components.
Examples include:
• VPC module
• EKS module
• Bastion host module
• ECR repository module
This modular layout follows a common Terraform practice: shared modules are written once, and each environment directory wires them together with its own values.
Private Kubernetes Cluster
The EKS cluster is configured as a private cluster.
cluster_endpoint_public_access = false
cluster_endpoint_private_access = true
This means the Kubernetes API server is not publicly exposed to the internet.
Access to the cluster occurs through a bastion host inside the VPC, improving the overall security posture.
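One way to confirm the endpoint settings after provisioning is to inspect the cluster's VPC configuration. The sketch below is a minimal check, assuming the input dictionary has the shape of the `resourcesVpcConfig` section returned by the EKS `DescribeCluster` API (for example, fetched with boto3 from the bastion host). The function name `is_private_only` is hypothetical and not part of the project.

```python
def is_private_only(vpc_config: dict) -> bool:
    """Return True when the API endpoint is reachable only from inside the VPC."""
    # Treat missing keys conservatively: assume public access unless told otherwise.
    public = bool(vpc_config.get("endpointPublicAccess", True))
    private = bool(vpc_config.get("endpointPrivateAccess", False))
    return private and not public


# Illustrative input mirroring the two Terraform settings shown above.
example = {"endpointPublicAccess": False, "endpointPrivateAccess": True}
print(is_private_only(example))
```

A check like this could run as a post-apply smoke test, so that an accidental change to the endpoint settings is caught early.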
Application Layer
Repository:
https://github.com/rasika-08061998/three-tier-ai-app
The platform runs a three-tier application architecture.
Frontend → Backend API → Database
Frontend
The frontend is built using:
React
It communicates with the backend using REST API calls.
Backend
The backend API is implemented using:
Python FastAPI
FastAPI is chosen because it provides:
• high performance
• async support
• automatic API documentation
Example endpoint:
@app.post("/chat")
def chat(request: schemas.MessageRequest):
    ...  # request handling omitted here
The backend also exposes a Prometheus metrics endpoint.
/metrics
This allows monitoring tools to collect application metrics.
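To make the `/metrics` endpoint less abstract, here is a small standard-library sketch of the kind of plain-text payload Prometheus expects when it scrapes an application. The metric name `http_requests_total` and the `path` and `status` labels are assumptions for illustration. In a real FastAPI service a client library would normally generate this output, and the article's backend may well do it differently.

```python
from collections import Counter

# (path, status) -> number of requests handled; label names are hypothetical.
request_counts: Counter = Counter()


def record_request(path: str, status: int) -> None:
    """Increment the counter for one handled request."""
    request_counts[(path, str(status))] += 1


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (path, status), value in sorted(request_counts.items()):
        lines.append(
            f'http_requests_total{{path="{path}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


record_request("/chat", 200)
record_request("/chat", 200)
record_request("/chat", 500)
print(render_metrics())
```

Each scrape returns the current counter values, and Prometheus derives rates and ratios from how those values change over time.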
Database
The platform uses:
PostgreSQL
PostgreSQL runs inside Kubernetes as a StatefulSet backed by persistent volumes.
This keeps the data intact when the pod restarts or is rescheduled.
Containerization with Docker
Both frontend and backend services are containerized.
Example backend Dockerfile:
FROM python:3.11
WORKDIR /app
# Copy the dependency list first so this layer stays cached when only code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Docker images are pushed to:
AWS Elastic Container Registry (ECR)
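Images in a private ECR repository are addressed by a registry hostname built from the AWS account ID and region. The helper below sketches how that reference is composed. The account ID, region, repository name, and tag are placeholders, not values from this project.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build the full image reference for a private ECR repository."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# Placeholder account ID, region, repository name, and tag.
print(ecr_image_uri("123456789012", "us-east-1", "backend", "abc1234"))
```

Tagging images with something unique per build, such as a commit SHA, makes it possible to tell exactly which build a deployment is running.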
CI Pipeline (GitHub Actions)
Continuous Integration is implemented using GitHub Actions.
Pipeline workflow:
Push to main
|
v
GitHub Actions
|
+---- Build Docker images
|
+---- Push images to AWS ECR
|
+---- Update GitOps repository
This ensures that application images are built and published automatically on every push to main, and that the GitOps repository is updated as part of the same run.
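The final pipeline step, updating the GitOps repository, is not spelled out above. A common approach, and the one assumed in this sketch, is to rewrite the image tag in the relevant deployment manifest and commit the change. The function name `set_image_tag`, the manifest contents, and the tags are all hypothetical.

```python
import re


def set_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Replace the tag on every `image:` line that references `repository`."""
    pattern = re.compile(
        r"^([ \t]*(?:-[ \t]*)?image:[ \t]*(?:\S*/)?"
        + re.escape(repository)
        + r"):\S+$",
        re.MULTILINE,
    )
    # A function replacement avoids surprises if the tag contains backslashes.
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", manifest)


# Illustrative manifest fragment with a placeholder registry and tag.
manifest = """\
    spec:
      containers:
        - name: backend
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/backend:old123
"""
print(set_image_tag(manifest, "backend", "abc1234"))
```

In a workflow, a step like this would be followed by a commit and push to the GitOps repository, which is what lets the cluster pick up the new image.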
GitOps Deployment with ArgoCD
Repository:
https://github.com/rasika-08061998/eks-gitops-deployments
Deployment is managed using ArgoCD, which follows the GitOps model.
What is GitOps?
GitOps is a deployment model where:
Git repository = source of truth
ArgoCD continuously monitors the Git repository and synchronizes the cluster so that its live state matches what is declared there.
Benefits include:
• declarative deployments
• version-controlled infrastructure
• automated synchronization
• easy rollback
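The synchronization idea can be illustrated with a tiny comparison between desired state (Git) and live state (cluster). This is a conceptual sketch only, not how ArgoCD is implemented. The resource keys and image values are made up for the example.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare desired (Git) and live (cluster) state and list the changes needed."""
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "delete": sorted(k for k in live if k not in desired),
    }


# Hypothetical resources keyed by "kind/name"; values stand in for manifests.
desired = {
    "Deployment/backend": {"image": "backend:abc1234"},
    "Deployment/frontend": {"image": "frontend:abc1234"},
}
live = {
    "Deployment/backend": {"image": "backend:old123"},
    "Deployment/legacy": {"image": "legacy:1"},
}
print(plan_sync(desired, live))
```

Rollback follows from the same model: reverting a commit changes the desired state, and the next sync moves the cluster back. Note that in ArgoCD, removing resources that no longer exist in Git depends on pruning being enabled.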
GitOps Repository Structure
eks-gitops-deployments
│
├── frontend
│   ├── deployment.yaml
│   └── service.yaml
│
├── backend
│   ├── deployment.yaml
│   ├── service.yaml
│   └── servicemonitor.yaml
│
└── postgres
    ├── statefulset.yaml
    └── service.yaml
Each component has dedicated Kubernetes manifests.
Networking and Ingress
Application traffic is routed through an AWS Application Load Balancer.
This is implemented using:
AWS Load Balancer Controller
Routing rules:
/ → frontend
/api → backend
This creates a single entry point to the application.
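The intended outcome of these two rules can be modeled as prefix matching where the most specific prefix wins. The sketch below assumes Kubernetes-style prefix semantics, where `/api` matches `/api` and `/api/...` but not `/apiary`. An actual ALB evaluates its listener rules in priority order, so this is a model of the behavior, not of the load balancer itself.

```python
ROUTES = {"/": "frontend", "/api": "backend"}


def route(path: str) -> str:
    """Return the service for the longest route prefix matching `path`."""
    matches = [
        prefix
        for prefix in ROUTES
        # "/" is the catch-all; other prefixes must match whole path segments.
        if prefix == "/" or path == prefix or path.startswith(prefix + "/")
    ]
    return ROUTES[max(matches, key=len)]


for p in ["/", "/index.html", "/api", "/api/chat", "/apiary"]:
    print(p, "->", route(p))
```

Because `/` matches everything, the more specific `/api` rule has to take precedence for backend traffic to reach the API.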
Monitoring with Prometheus
Observability is implemented using:
kube-prometheus-stack
Prometheus collects metrics from:
• Kubernetes nodes
• Kubernetes pods
• application endpoints
The backend exposes metrics through:
/metrics
Prometheus scrapes this endpoint using a ServiceMonitor resource.
Example resource:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
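The two lines above are only the header of the resource. As a fuller illustration, here is a ServiceMonitor expressed as a Python dictionary and printed as JSON, which `kubectl` accepts alongside YAML. The resource name, the `app: backend` selector, the `http` port name, the scrape interval, and the `release` label are all assumptions. In particular, whether a `release` label is needed depends on how the Prometheus instance is configured to select ServiceMonitors.

```python
import json

service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {
        "name": "backend",
        # Assumed label; some installs only pick up monitors carrying it.
        "labels": {"release": "kube-prometheus-stack"},
    },
    "spec": {
        # Must match the labels on the backend Service (assumed here).
        "selector": {"matchLabels": {"app": "backend"}},
        "endpoints": [
            # `port` is the *name* of a port on the Service, not a number.
            {"port": "http", "path": "/metrics", "interval": "30s"}
        ],
    },
}

print(json.dumps(service_monitor, indent=2))
```

The key point is the chain of selectors: the ServiceMonitor selects a Service by label, and the Service in turn selects the backend pods.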
Grafana Dashboards
Grafana provides visualization for metrics collected by Prometheus.
Example dashboards display:
• API request rate
• requests per endpoint
• HTTP status codes
• Kubernetes CPU and memory usage
This provides real-time observability into application performance.
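A panel such as "API request rate" is derived from a counter that only ever increases. The sketch below shows the underlying arithmetic in simplified form, assuming a list of `(timestamp, value)` samples. It is not the exact algorithm of PromQL's `rate()`, which also extrapolates to the edges of the query window. The sample data is invented.

```python
def request_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter over (timestamp, value) samples."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. pod restart); count from zero.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Invented samples: the counter resets between t=15 and t=30.
samples = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
print(request_rate(samples))
```

Handling resets matters on Kubernetes, because counters start again from zero every time a pod is replaced.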
Alerting
Prometheus alert rules detect abnormal system behavior.
Example alert rule:
High API error rate (5xx responses)
If the error rate exceeds a threshold, an alert is triggered.
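The logic behind such a rule is a ratio compared against a threshold. The sketch below shows that calculation in plain Python, assuming request counts grouped by status code over some time window. The 5% threshold and the sample numbers are hypothetical, since the article does not state the actual values.

```python
def error_rate(status_counts: dict[int, int]) -> float:
    """Fraction of requests that returned a 5xx status code."""
    total = sum(status_counts.values())
    errors = sum(n for code, n in status_counts.items() if 500 <= code <= 599)
    return errors / total if total else 0.0


ERROR_RATE_THRESHOLD = 0.05  # hypothetical threshold: alert above 5% errors


def should_alert(status_counts: dict[int, int]) -> bool:
    """Return True when the 5xx share exceeds the threshold."""
    return error_rate(status_counts) > ERROR_RATE_THRESHOLD


# Invented request counts for one evaluation window.
window = {200: 180, 404: 5, 500: 12, 503: 3}
print(error_rate(window), should_alert(window))
```

Using a ratio instead of a raw error count keeps the alert meaningful at both low and high traffic levels.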
Final System Architecture
Developer
|
v
GitHub
|
v
GitHub Actions
|
v
AWS ECR
|
v
ArgoCD
|
v
AWS EKS Cluster
|
+---- React Frontend
+---- FastAPI Backend
+---- PostgreSQL
|
v
Prometheus
|
v
Grafana
Key DevOps Concepts Demonstrated
This project demonstrates several real-world DevOps practices:
• Infrastructure as Code with Terraform
• Containerization with Docker
• CI pipelines using GitHub Actions
• GitOps deployment with ArgoCD
• Kubernetes orchestration using AWS EKS
• Application monitoring using Prometheus
• Observability dashboards with Grafana
Lessons Learned
Building this platform reinforced several important DevOps principles:
1️⃣ Infrastructure must be reproducible.
2️⃣ Git should be the source of truth for deployments.
3️⃣ Observability is critical for production systems.
4️⃣ Kubernetes environments require automation and strong CI/CD practices.
Future Improvements
Possible improvements for this platform include:
• Horizontal Pod Autoscaling
• Slack alert integrations
• Multi-environment deployments (dev / staging / prod)
• Distributed tracing with OpenTelemetry
Conclusion
This project demonstrates how to build a production-style DevOps platform on AWS using modern cloud-native tooling.
By combining Terraform, Kubernetes, GitOps, CI/CD pipelines, and observability tools, we can create scalable infrastructure capable of running real-world applications.
If you are learning DevOps or cloud engineering, building a complete end-to-end platform like this is one of the best ways to understand how production systems operate.
About the Author
Rasika Deshmukh is a DevOps and Cloud enthusiast focused on building cloud-native platforms using AWS, Kubernetes, Terraform, Docker, GitHub Actions, and GitOps with ArgoCD. She enjoys working on infrastructure automation, CI/CD pipelines, and observability systems that reflect real-world production environments.
She is currently exploring opportunities in DevOps, Cloud Engineering, and Platform Engineering roles.
LinkedIn: https://www.linkedin.com/in/rasika-deshmukh
GitHub: https://github.com/rasika-08061998