Phase 2 — Amazon EKS Platform with Terraform


This document covers the design and implementation of a production-grade Amazon EKS (Elastic Kubernetes Service) platform using Terraform. It explains the architecture, IAM design, node groups, networking integration, and the troubleshooting encountered during cluster creation. It is written for Hashnode, Medium, and DevOps portfolio documentation.


1. Overview of Phase 2

Objective

The goal of Phase 2 was to provision a production-ready Kubernetes platform on AWS using Amazon EKS, integrated with the custom VPC built in Phase 1.

This phase focused on:

  • Kubernetes control plane

  • Managed worker nodes

  • IAM roles and security

  • Multi-phase Terraform architecture

  • kubectl access and validation


2. Phase-Based Terraform Architecture

Folder Structure

01-infra-terraform/
└── phase-2-eks/
    ├── backend.tf
    ├── providers.tf
    ├── variables.tf
    ├── data.tf
    ├── iam-cluster.tf
    ├── iam-nodes.tf
    ├── eks-cluster.tf
    ├── eks-nodes.tf
    └── outputs.tf

Why a Separate Phase for EKS?

Separating EKS into its own Terraform phase provides:

  • Independent Terraform state

  • Safer changes and rollbacks

  • Clear ownership boundaries

  • Enterprise-grade infrastructure lifecycle management

This mirrors how large DevOps and Platform Engineering teams manage infrastructure.


3. Remote Backend for Phase 2

backend.tf

terraform {
  backend "s3" {
    bucket         = "rasika-terraform-state-bucket"
    key            = "phase-2-eks/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

Key Concepts

  • Separate Terraform state per phase

  • Centralized state storage in S3

  • DynamoDB-based state locking

  • Encrypted state for security
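
The backend block only works if the bucket and lock table already exist. The post does not show where they are created, so the following is a minimal sketch of a one-time bootstrap configuration. The resource labels (`tf_state`, `tf_locks`) and the billing mode are assumptions; the bucket and table names are the ones used in backend.tf.

```hcl
# Hypothetical bootstrap configuration, applied once before any phase uses the backend.
resource "aws_s3_bucket" "tf_state" {
  bucket = "rasika-terraform-state-bucket" # name taken from backend.tf
}

# Versioning keeps earlier state files recoverable after a bad apply.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string partition key named "LockID".
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks" # name taken from backend.tf
  billing_mode = "PAY_PER_REQUEST"       # assumption: on-demand billing

  hash_key = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids a chicken-and-egg problem where a phase would have to create the backend it stores its own state in.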


4. Provider Configuration

providers.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

Why This Matters

  • Ensures consistent AWS provider versions

  • Prevents unexpected breaking changes

  • Supports reproducible infrastructure
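
The provider block reads `var.aws_region`, and later files read `var.project_name` and `var.eks_cluster_name`, but variables.tf itself is not shown. A possible sketch follows; the descriptions and defaults are assumptions inferred from values that appear elsewhere in this post.

```hcl
# variables.tf (sketch; descriptions and defaults are assumptions)
variable "aws_region" {
  description = "AWS region for the EKS platform"
  type        = string
  default     = "ap-south-1" # same region as the S3 backend
}

variable "project_name" {
  description = "Prefix used in resource names and in the Phase 1 VPC tag lookup"
  type        = string
  # No default: the actual project name is not shown in this post.
}

variable "eks_cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "rasika-eks-cluster" # name used in the kubeconfig command in section 11
}
```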


5. Reusing Phase 1 Network (Data Sources)

Instead of recreating networking, Phase 2 uses Terraform data sources to reference existing resources.

data.tf

# Lookup VPC created in Phase 1
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["${var.project_name}-vpc"]
  }
}

# Lookup Private Subnets for EKS Nodes
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
}

# Lookup Public Subnets (for ALB later)
data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  filter {
    name   = "tag:Tier"
    values = ["public"]
  }
}

Key Terraform Concepts

  • Cross-phase resource reuse

  • Data sources vs resources

  • Tag-based infrastructure discovery

This is a common enterprise Terraform pattern.
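
Tag-based discovery fails quietly when tags drift: `aws_subnets` can return an empty list rather than an error. One way to surface that is a `check` block, available from Terraform 1.5, which matches the `required_version` above. This is a sketch; the check name is hypothetical, and the threshold of two reflects the EKS requirement for subnets in at least two Availability Zones.

```hcl
# Hypothetical guard: report a problem if tag-based discovery finds too few subnets.
check "private_subnets_discovered" {
  assert {
    condition     = length(data.aws_subnets.private.ids) >= 2
    error_message = "Expected at least two private subnets tagged Tier=private in the Phase 1 VPC."
  }
}
```

A failed `check` assertion is reported as a warning and does not block the plan or apply, so it acts as an early signal rather than a hard gate.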


6. IAM Role for EKS Control Plane

iam-cluster.tf

resource "aws_iam_role" "eks_cluster_role" {
  name = "${var.project_name}-eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "eks.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  role       = aws_iam_role.eks_cluster_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

Purpose

  • Allows the EKS control plane to manage AWS resources on the cluster's behalf

  • Must exist before the cluster and its Kubernetes API server can be created

  • Limits permissions to the AWS-managed AmazonEKSClusterPolicy
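
The trust policy can also be written with the `aws_iam_policy_document` data source instead of `jsonencode`. This is an alternative sketch, not what the original configuration uses; the data source label is hypothetical.

```hcl
# Sketch: the same trust policy expressed as a data source instead of jsonencode.
data "aws_iam_policy_document" "eks_cluster_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

# The role would then reference the rendered JSON:
#   assume_role_policy = data.aws_iam_policy_document.eks_cluster_assume_role.json
```

The data source validates the policy structure at plan time, which can catch typos that a hand-built JSON object would pass through to the IAM API.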


7. IAM Role for EKS Worker Nodes

iam-nodes.tf

resource "aws_iam_role" "eks_node_role" {
  name = "${var.project_name}-eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { Service = "ec2.amazonaws.com" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "ecr_read_only" {
  role       = aws_iam_role.eks_node_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

What This Enables

  • Nodes can join the cluster

  • The VPC CNI plugin can attach ENIs and assign VPC IP addresses to pods

  • Nodes can pull images from ECR

  • Nodes can communicate with control plane
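
The three attachments differ only in the policy ARN, so they could be collapsed into one resource with `for_each`. This is a sketch of that alternative; the local name and map keys are hypothetical, and the ARNs are the ones attached above.

```hcl
# Sketch: one attachment resource iterating over the three managed policies.
locals {
  # Hypothetical local; keys are arbitrary labels used in resource addresses.
  node_policy_arns = {
    worker = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
    cni    = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
    ecr    = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  }
}

resource "aws_iam_role_policy_attachment" "node" {
  for_each = local.node_policy_arns

  role       = aws_iam_role.eks_node_role.name
  policy_arn = each.value
}
```

Switching an existing deployment to this form changes the resource addresses, so it would need `moved` blocks or `terraform state mv` to avoid detaching and re-attaching the policies.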


8. EKS Control Plane Creation

eks-cluster.tf

resource "aws_eks_cluster" "main" {
  name     = var.eks_cluster_name
  role_arn = aws_iam_role.eks_cluster_role.arn

  vpc_config {
    subnet_ids              = data.aws_subnets.private.ids
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler"
  ]
}

Design Decisions

  • Private subnets for the cluster's network interfaces, the same subnets the worker nodes use

  • Public + private API endpoint access

  • Control plane logging for security and audit
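
Two refinements are worth considering with a public endpoint: pinning the Kubernetes version, so a new default does not arrive unannounced, and restricting which CIDR ranges can reach the API. The following is a variant of the cluster block above, not an additional resource. Both variables are hypothetical and do not appear in the original configuration.

```hcl
# Hypothetical variables; neither appears in the original configuration.
variable "kubernetes_version" {
  description = "Kubernetes minor version for the control plane"
  type        = string
  default     = "1.34" # matches the version reported in section 11
}

variable "api_allowed_cidrs" {
  description = "CIDR blocks allowed to reach the public API endpoint"
  type        = list(string)
  default     = ["0.0.0.0/0"] # open to all; narrow this to known ranges
}

# Variant of aws_eks_cluster.main with version and public_access_cidrs added.
resource "aws_eks_cluster" "main" {
  name     = var.eks_cluster_name
  role_arn = aws_iam_role.eks_cluster_role.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = data.aws_subnets.private.ids
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = var.api_allowed_cidrs
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler"
  ]
}
```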


9. Managed Node Group (Worker Nodes)

eks-nodes.tf

resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.project_name}-node-group"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = data.aws_subnets.private.ids

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  instance_types = ["t3.medium"]
  capacity_type  = "ON_DEMAND"

  ami_type = "AL2023_x86_64_STANDARD"
}

Key Design Choices

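  • Managed node groups for easier lifecycle management

  • Private subnets for security

  • Node count bounds via desired/min/max (scaling on load additionally needs an autoscaler such as Cluster Autoscaler or Karpenter)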

  • Modern Amazon Linux 2023 AMI
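
A few optional arguments tend to make managed node groups behave more predictably. The following variant of the node group above is a sketch: the `update_config`, `lifecycle`, and `depends_on` settings are additions that the original configuration does not include.

```hcl
# Variant of aws_eks_node_group.main with hypothetical additions marked below.
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.project_name}-node-group"
  node_role_arn   = aws_iam_role.eks_node_role.arn
  subnet_ids      = data.aws_subnets.private.ids

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  instance_types = ["t3.medium"]
  capacity_type  = "ON_DEMAND"
  ami_type       = "AL2023_x86_64_STANDARD"

  # Addition: replace at most one node at a time during rolling updates.
  update_config {
    max_unavailable = 1
  }

  # Addition: let an autoscaler change desired_size without Terraform reverting it.
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }

  # Addition: create node permissions before the nodes, and delete them after.
  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.ecr_read_only,
  ]
}
```

The explicit `depends_on` matters because the node group references only the role ARN, so Terraform has no implicit dependency on the policy attachments.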


10. Real-World Troubleshooting: AL2 to AL2023

Issue Encountered

Initial node group creation failed because:

  • The cluster runs a Kubernetes version newer than 1.32

  • EKS-optimized Amazon Linux 2 (AL2) AMIs are not published for versions after 1.32

Resolution

Migrated the node group to the Amazon Linux 2023 AMI type:

AL2023_x86_64_STANDARD

Why This Matters

This demonstrates:

  • Awareness of Kubernetes version compatibility

  • Ability to troubleshoot AWS/EKS errors

  • Keeping platform components up to date

This is valuable real-world experience.
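
This class of failure can be caught at plan time by moving the AMI type into a validated variable. The sketch below assumes a hypothetical `node_ami_type` variable (the original hard-codes `ami_type` in eks-nodes.tf) and deliberately accepts only AL2023 types, which would also reject other valid families such as Bottlerocket.

```hcl
# Hypothetical variable; the original hard-codes ami_type in eks-nodes.tf.
variable "node_ami_type" {
  description = "EKS-optimized AMI type for the managed node group"
  type        = string
  default     = "AL2023_x86_64_STANDARD"

  validation {
    # Reject AL2 types at plan time instead of at node group creation.
    condition     = startswith(var.node_ami_type, "AL2023_")
    error_message = "Use an Amazon Linux 2023 AMI type (AL2023_*) with this cluster version."
  }
}
```

The node group would then set `ami_type = var.node_ami_type`.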


11. kubectl Configuration & Cluster Validation

kubeconfig Update

aws eks update-kubeconfig \
  --region ap-south-1 \
  --name rasika-eks-cluster

Validation

kubectl get nodes

Result

  • 2 worker nodes in Ready state

  • Kubernetes version v1.34.x

  • Successful control plane + node connectivity
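
The folder structure lists outputs.tf, but its contents are not shown. The following sketch shows outputs that would support the validation steps above; the output names are assumptions, while the referenced attributes (`name`, `endpoint`, `version`) are exported by `aws_eks_cluster`.

```hcl
# outputs.tf (sketch; output names are assumptions)
output "cluster_name" {
  description = "EKS cluster name"
  value       = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  description = "Kubernetes API server endpoint"
  value       = aws_eks_cluster.main.endpoint
}

output "cluster_version" {
  description = "Kubernetes version of the control plane"
  value       = aws_eks_cluster.main.version
}

output "kubeconfig_command" {
  description = "Command that writes kubeconfig for this cluster"
  value       = "aws eks update-kubeconfig --region ${var.aws_region} --name ${aws_eks_cluster.main.name}"
}
```

After an apply, `terraform output kubeconfig_command` would print the same command used in the kubeconfig update step, with the region and cluster name filled in from the configuration.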


12. Key Terraform & Platform Concepts Demonstrated

  • Multi-phase Terraform architecture

  • Data sources for cross-phase dependencies

  • IAM role-based security design

  • Managed Kubernetes with EKS

  • Node group lifecycle management

  • Kubernetes version & AMI compatibility

  • Secure private subnet worker nodes


13. Outcome of Phase 2

At the end of Phase 2, the platform includes:

  • Fully functional Amazon EKS cluster

  • Managed EC2 worker nodes

  • Secure networking via private subnets

  • kubectl access from developer workstation

  • Production-grade IAM integration

This forms the foundation for:

  • GitOps with Argo CD

  • CI/CD pipelines

  • Application deployments

  • Ingress and HTTPS access


14. What’s Next (Phase 3)

Phase 3 will introduce:

  • Amazon ECR

  • Docker image builds

  • GitHub Actions CI

  • Automated container pipelines


Phase 2 demonstrates real-world Kubernetes platform engineering and cloud-native infrastructure best practices.
