Running containers is easy.
Running them securely, efficiently, and reliably on production EKS?
That’s engineering.

Over the past few weeks, I’ve been diving deep into:

🔐 Kubernetes security (especially in EKS)
⚙️ Resource management (CPU/Memory control)
🩺 Liveness & Readiness probes (health mechanisms)

This article breaks down the theory, real-world context, and YAML examples you can apply immediately.

☁️ First: What is EKS?

Amazon EKS (Elastic Kubernetes Service) is AWS’s managed Kubernetes service.

👉 AWS manages the control plane (API server, etcd, scheduler)
👉 You manage the worker nodes, workloads, security, and networking

That division of responsibility is critical.

Official Docs:
https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html

🔐 Part 1 — Kubernetes Security in EKS

Security in Kubernetes is layered. Think of it like airport security, where several independent checkpoints each guard something different:

Layer	What it Protects
IAM	Who can access AWS
RBAC	Who can access Kubernetes
Network Policies	Pod-to-pod traffic
Security Groups	Node-level traffic
Pod Security	Container privileges

Let’s go deeper.

1️⃣ IAM + RBAC (Identity & Access Control)

In EKS:

IAM controls access to AWS and authenticates you to the cluster's API server
RBAC authorizes what that authenticated identity can do with Kubernetes resources

Example: Allow a user to only view pods in the production namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

Then bind it:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
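
Kubernetes has no user database of its own, so on EKS the name jane in the RoleBinding has to be mapped from an IAM identity. One way this has commonly been done is the aws-auth ConfigMap in kube-system. The fragment below is a minimal sketch: the account ID 111122223333 and the IAM user jane are hypothetical placeholders, and newer clusters may use EKS access entries instead.

```yaml
# Fragment to MERGE into the existing aws-auth ConfigMap.
# Do not apply it as-is: the real ConfigMap also holds the node role
# mappings (mapRoles), and overwriting them can disconnect worker nodes.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/jane   # hypothetical IAM user
      username: jane                                 # must match the RoleBinding subject
```

With this mapping in place, the IAM user authenticates through AWS and Kubernetes then applies the pod-reader permissions to the name jane.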

💡 Key Learning:
Never hand out cluster-admin for day-to-day work in production.
Least privilege is non-negotiable.

2️⃣ Pod Security (Containers Should Not Be Root)

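Many container images run as root by default, which is dangerous: a compromised process starts with far more privilege than it needs.

Secure configuration (set on each container):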

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

This reduces the risk of:

Privilege escalation
Container breakout attempts
Filesystem tampering
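
To show where these fields sit, here is a minimal sketch of a full Pod manifest. The pod name and the /tmp mount are assumptions for illustration, the image is a placeholder, and the capabilities block is an extra hardening step that is not part of the snippet above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod              # hypothetical name
  namespace: production
spec:
  containers:
  - name: app
    image: myapp:1.0            # placeholder image; it must be able to run as UID 1000
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]           # extra step: drop all Linux capabilities
    volumeMounts:
    - name: tmp
      mountPath: /tmp           # writable scratch space, since the root filesystem is read-only
  volumes:
  - name: tmp
    emptyDir: {}
```

Applications that write temporary files usually need a writable mount like this once readOnlyRootFilesystem is enabled.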

AWS EKS Security Best Practices:
https://aws.github.io/aws-eks-best-practices/security/docs/

3️⃣ Network Policies (Zero Trust Inside the Cluster)

By default:
👉 Every pod can talk to every other pod.

That’s risky.

Example: Allow traffic only from frontend to backend.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend

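Now backend pods accept ingress only from frontend pods in the same namespace; all other incoming traffic is dropped.

Note that a NetworkPolicy only takes effect if the cluster's network plugin enforces it. On EKS that means enabling network policy support in the Amazon VPC CNI or installing a policy engine such as Calico.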

That’s micro-segmentation.
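
A common companion to a rule like this is a namespace-wide default-deny policy, so that pods without their own policy are not left open. A minimal sketch, assuming the same production namespace; the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # hypothetical name
  namespace: production
spec:
  podSelector: {}               # empty selector = every pod in the namespace
  policyTypes:
  - Ingress                     # no ingress rules listed, so all ingress is denied
```

Policies are additive: a pod selected by both this policy and an allow rule still accepts the traffic that the allow rule permits.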

⚙️ Part 2 — Resource Management (Why Pods Crash Randomly)

If you don’t define resources:

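A container can use as much CPU and memory as the node will give it
Noisy neighbours can starve other pods and make the node unstable
The kernel can kill containers when the node runs out of memory (OOMKilled)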

That’s when chaos begins.

Requests vs Limits (Critical Concept)

Field	Meaning
requests	Amount reserved for the container; the scheduler places the pod based on it
limits	Hard ceiling the container may not exceed

Example:

resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Explanation:

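The Kubernetes scheduler uses requests to place the pod on a node with enough unreserved capacity
If memory usage exceeds the limit → the container is killed (OOMKilled)
If CPU usage reaches the limit → the container is throttled, not killed

💡 200m CPU = 0.2 CPU core
💡 256Mi = 256 mebibytes (256 × 1024 × 1024 bytes, about 268 MB)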
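
One related pattern, sketched here as an assumption about what a critical workload might want: when every container in a pod sets requests equal to limits for both CPU and memory, Kubernetes assigns the pod the Guaranteed QoS class, which makes it one of the last candidates to be evicted or OOM-killed when the node runs short of memory.

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # equal to the request
    cpu: "500m"       # equal to the request
```

The assigned class can be checked with kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'. The trade-off is that the pod can no longer burst above its request.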

What Happens Without Limits?

Scenario:

One pod leaks memory
Node memory fills
The kubelet starts evicting pods and the Linux OOM killer terminates processes, often ones that did nothing wrong
Production outage

This is why resource governance matters.
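
One way to enforce this at the namespace level is a LimitRange, which fills in defaults for containers that omit resources. A minimal sketch, assuming the production namespace and reusing the values from the example above; the object name is hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources       # hypothetical name
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when a container sets no requests
      memory: "256Mi"
      cpu: "200m"
    default:                    # applied when a container sets no limits
      memory: "512Mi"
      cpu: "500m"
```

Defaults are applied when a pod is created, so pods that are already running keep whatever they were admitted with.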

🩺 Part 3 — Liveness vs Readiness Probes (Critical for Reliability)

This is where many teams make mistakes.

🩺 Liveness Probe

Question:

"Is the application alive?"

If it fails:
👉 Kubernetes restarts the container

Example:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
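
The same probe can be tuned so that a single slow response does not trigger a restart. A sketch with two extra fields; the values shown are assumptions to adjust per application.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2      # assumed value: each check must answer within 2 seconds
  failureThreshold: 3    # restart only after 3 consecutive failures, roughly 30 seconds
```

Keeping the /health handler cheap and free of external dependencies helps avoid restart loops when a downstream service is slow.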

🚦 Readiness Probe

Question:

"Is the application ready to receive traffic?"

If it fails:
👉 Pod is removed from Service endpoints
👉 No traffic is sent

Example:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Real Production Example

Imagine:

App starts
Needs 20 seconds to connect to DB

Without readiness probe:

Traffic hits immediately
Users get 500 errors

With readiness probe:

No traffic until DB connection is successful
No user-facing errors during startup
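
For this scenario to work, the readiness endpoint has to reflect the database connection. A sketch, assuming the application's /ready handler returns an error status until the connection is established; that handler behaviour is an assumption about the app, not something Kubernetes provides.

```yaml
readinessProbe:
  httpGet:
    path: /ready           # assumed to return a status outside 200-399 until the DB is connected
    port: 8080
  periodSeconds: 5         # checked every 5 seconds, so traffic starts within about 5 s of the DB being ready
  failureThreshold: 3      # marked unready again after 3 consecutive failures
  successThreshold: 1      # one passing check is enough to receive traffic
```

If the database connection drops later, the same probe takes the pod out of rotation without restarting it.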

Startup Probe (Advanced)

Used for slow-starting apps.

startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

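Liveness and readiness checks are held off until the startup probe succeeds. With the values above the app gets up to 30 × 10 = 300 seconds to start before the container is restarted, so the liveness probe no longer kills slow-starting apps.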
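
A sketch of how the startup and liveness probes can sit together in one container spec, assuming both reuse the /health endpoint; the image is a placeholder.

```yaml
containers:
- name: app
  image: myapp:1.0           # placeholder image
  startupProbe:
    httpGet:
      path: /health
      port: 8080
    failureThreshold: 30     # 30 attempts x 10 s = up to 300 s allowed for startup
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 10        # only begins once the startup probe has succeeded
```

This keeps the liveness settings tight for steady-state failures without needing a large initialDelaySeconds.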

🔥 Combined Production-Ready Deployment Example

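apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      containers:
      - name: app
        image: myapp:1.0
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        securityContext:
          # the image must define a numeric non-root user, or set runAsUser here
          runAsNonRoot: true
          allowPrivilegeEscalation: false
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080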

This setup ensures:

🔐 Secure container
⚙️ Controlled resources
🩺 Self-healing
🚦 Smart traffic routing
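
The traffic routing part depends on a Service sitting in front of the pods, since readiness controls which pods appear in its endpoints. A minimal sketch, assuming the pod template carries the label app: secure-app; the Service name and port 80 are hypothetical choices.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: secure-app            # hypothetical name
spec:
  selector:
    app: secure-app           # assumed pod label; must match the pod template labels
  ports:
  - port: 80                  # port exposed inside the cluster (assumption)
    targetPort: 8080          # container port used by the probes above
```

Only pods whose readiness probe is passing receive traffic through this Service.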

🎯 Final Takeaways

What I truly understood:

Kubernetes security is layered — not a single setting
Resource management prevents unpredictable crashes
Probes are not optional — they are production essentials
EKS handles the control plane, but YOU secure the workloads

🔐 Kubernetes on EKS: Security, Resource Management & Probes — What I Recently Learned (Deep Dive)
