🔐 Kubernetes on EKS: Security, Resource Management & Probes — What I Recently Learned (Deep Dive)
Running containers is easy.
Running them securely, efficiently, and reliably on production EKS?
That’s engineering.
Over the past few weeks, I’ve been diving deep into:
🔐 Kubernetes security (especially in EKS)
⚙️ Resource management (CPU/Memory control)
🩺 Liveness & Readiness probes (health mechanisms)
This article breaks down the theory, real-world context, and YAML examples you can apply immediately.
☁️ First: What is EKS?
Amazon EKS (Elastic Kubernetes Service) is AWS’s managed Kubernetes service.
👉 AWS manages the control plane (API server, etcd, scheduler)
👉 You manage the worker nodes, workloads, security, and networking
That division of responsibility is critical.
Official Docs:
https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html
🔐 Part 1 — Kubernetes Security in EKS
Security in Kubernetes is layered. Think of it like airport security:
| Layer | What it Protects |
|---|---|
| IAM | Who can access AWS |
| RBAC | Who can access Kubernetes |
| Network Policies | Pod-to-pod traffic |
| Security Groups | Node-level traffic |
| Pod Security | Container privileges |
Let’s go deeper.
1️⃣ IAM + RBAC (Identity & Access Control)
In EKS:
IAM authenticates who you are and controls access to AWS APIs
RBAC authorizes what that identity can do with Kubernetes resources
Example: Allow a user to only view pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]                      # "" is the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "watch", "list"]      # read-only verbs
Then bind it:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: User
    name: jane                           # the Kubernetes username the IAM identity maps to
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
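The RoleBinding only works if Kubernetes knows who "jane" is. On EKS, one way to make that link is the aws-auth ConfigMap, which maps an IAM identity to a Kubernetes username. Here is a minimal sketch, assuming an IAM user named jane in a placeholder account (111122223333); EKS access entries are the newer alternative, so treat this as an illustration rather than the recommended path for every cluster.

```yaml
# Fragment only: merge the mapUsers entry into the existing aws-auth ConfigMap.
# Applying this file as-is would overwrite the node role mappings already stored there.
# The account ID and IAM user below are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/jane
      username: jane
```

The username value is what has to match the subject name in the RoleBinding.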
💡 Key Learning:
Never hand out cluster-admin for day-to-day work in production.
Least privilege is non-negotiable.
2️⃣ Pod Security (Containers Should Not Be Root)
Many containers run as root by default — dangerous.
Secure configuration:
# Container-level securityContext (sits under spec.containers[])
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
This reduces the risk of:
Privilege escalation
Container breakout attempts
Filesystem tampering
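To show where these settings live in a real manifest, here is a sketch of a full Pod. The pod name and image are placeholders, and the dropped capabilities, seccomp profile, and /tmp volume are my additions on top of the four settings above: a read-only root filesystem usually needs at least one writable scratch directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app            # hypothetical name
  namespace: production
spec:
  securityContext:              # pod-level: applies to every container
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0          # placeholder image
      securityContext:          # container-level settings
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: tmp
          mountPath: /tmp       # writable scratch space
  volumes:
    - name: tmp
      emptyDir: {}
```

Note that allowPrivilegeEscalation, readOnlyRootFilesystem, and capabilities are only valid at the container level, while runAsNonRoot and runAsUser can be set at either level.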
AWS EKS Security Best Practices:
https://aws.github.io/aws-eks-best-practices/security/docs/
3️⃣ Network Policies (Zero Trust Inside the Cluster)
By default:
👉 Every pod can talk to every other pod.
That’s risky.
Example: Allow traffic only from frontend to backend.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
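Now the backend pods accept ingress only from frontend pods in the same namespace; everything else is denied. This only takes effect if the cluster’s network plugin enforces NetworkPolicy. On EKS that means enabling network policy support in the Amazon VPC CNI or running a policy engine such as Calico.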
That’s micro-segmentation.
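A common companion to a policy like this is a namespace-wide default deny, so that pods without their own policy are not left wide open. A minimal sketch, assuming the same production namespace (the policy name is my own):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress    # hypothetical name
  namespace: production
spec:
  podSelector: {}               # empty selector = every pod in the namespace
  policyTypes:
    - Ingress                   # no ingress rules listed, so all ingress is denied
```

Policies are additive, so the backend policy above still allows frontend traffic on top of this baseline.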
⚙️ Part 2 — Resource Management (Why Pods Crash Randomly)
If you don’t define resources:
Pods can consume as much memory as the node has
Nodes can get unstable
The kernel can kill containers (OOMKilled)
That’s when chaos begins.
Requests vs Limits (Critical Concept)
| Field | Meaning |
|---|---|
| requests | Amount reserved for the container; used by the scheduler |
| limits | Maximum the container is allowed to use |
Example:
resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"
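Explanation:
The Kubernetes scheduler uses requests to place the pod on a node with enough free capacity
If memory usage exceeds the limit → the container is killed (OOMKilled)
If CPU usage hits the limit → the container is throttled, not killed
💡 200m CPU = 0.2 CPU core
💡 256Mi = 256 mebibytes (256 × 1024 × 1024 bytes, about 268 MB)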
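The relationship between requests and limits also decides the pod’s QoS class. The example above (requests lower than limits) is Burstable. If every container in the pod sets requests equal to limits for both CPU and memory, the pod becomes Guaranteed, which makes it one of the last candidates to be killed under memory pressure. A sketch with illustrative values:

```yaml
# Guaranteed QoS: requests and limits are identical for cpu and memory
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

The trade-off is that the pod reserves its full limit on the node even when idle.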
What Happens Without Limits?
Scenario:
One pod leaks memory
Node memory fills
The kubelet starts evicting pods or the Linux OOM killer kills containers, and the victims are not always the pod that leaked
Production outage
This is why resource governance matters.
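One way to enforce this at the namespace level is a LimitRange, which fills in default requests and limits for any container that does not declare its own. A minimal sketch, assuming the production namespace and reusing the values from the earlier example (the object name is my own):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources       # hypothetical name
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container sets no requests
        memory: "256Mi"
        cpu: "200m"
      default:                  # applied when a container sets no limits
        memory: "512Mi"
        cpu: "500m"
```

Defaults are applied when a pod is created, so existing pods only pick them up after they are recreated.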
🩺 Part 3 — Liveness vs Readiness Probes (Critical for Reliability)
This is where many teams make mistakes.
🩺 Liveness Probe
Question:
"Is the application alive?"
If it fails:
👉 Kubernetes restarts the container
Example:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
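Fields you leave out fall back to defaults: timeoutSeconds is 1 and failureThreshold is 3. With a 10 second period, that means a hung container is restarted roughly 30 seconds after it stops answering. Here is the same probe with those knobs spelled out; the 2 second timeout is just an illustrative value, not a recommendation.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2       # a check taking longer than 2s counts as failed (default is 1)
  failureThreshold: 3     # restart after 3 consecutive failures (the default)
```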
🚦 Readiness Probe
Question:
"Is the application ready to receive traffic?"
If it fails:
👉 Pod is removed from Service endpoints
👉 No traffic is sent
Example:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
Real Production Example
Imagine:
App starts
Needs 20 seconds to connect to DB
Without readiness probe:
Traffic hits immediately
Users get 500 errors
With readiness probe (and a /ready endpoint that checks the DB connection):
No traffic until the DB connection is successful
No user-facing errors during startup
Startup Probe (Advanced)
Used for slow-starting apps.
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
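Liveness and readiness checks are paused until the startup probe succeeds, so the app gets up to 30 × 10 = 300 seconds to start before Kubernetes restarts it.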
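Here is a sketch of how the three probes can sit together on one container, reusing the paths and port from the examples above (the image is a placeholder). With a startup probe in place, the liveness probe no longer needs a long initialDelaySeconds.

```yaml
containers:
  - name: app
    image: myapp:1.0          # placeholder image
    startupProbe:             # runs first; the other probes wait for it
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10       # up to 30 x 10 = 300s to start
    livenessProbe:            # takes over once startup has succeeded
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
    readinessProbe:           # gates Service traffic, never restarts the container
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```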
🔥 Combined Production-Ready Deployment Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 3
  selector:                    # required in apps/v1; must match the template labels
    matchLabels:
      app: secure-app          # label name chosen for this example
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      containers:
        - name: app
          image: myapp:1.0
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
This setup ensures:
🔐 Secure container
⚙️ Controlled resources
🩺 Self-healing
🚦 Smart traffic routing
🎯 Final Takeaways
What I truly understood:
Kubernetes security is layered — not a single setting
Resource management prevents unpredictable crashes
Probes are not optional — they are production essentials
EKS handles the control plane, but YOU secure the workloads