A concise, hands-on DevOps refresher with AWS labs plus interview-ready study notes.
- Start with the AWS labs in order to build a working staging stack. See `aws-labs/README.md` for the suggested sequence and validation scripts.
- Use the docs for quick references and deeper dives. See `docs/README.md` for an index and `docs/overview.md` for a one-page cheat sheet per lab.
- New to the terms? See `docs/glossary.md` for a repo-specific glossary of acronyms and concepts.
- Keep labs focused: apply one lab at a time, validate, then proceed. Use the included teardown checklists when cleaning up.
This repo includes the real demo application as a Git submodule at `demo-node-app`.
- CI/CD: App deployments build from the `demo-node-app` repository directly (not via submodule). Infra labs do not require submodule init.
- Local dev: If you want to run or edit the app from this repo, initialize submodules:
git submodule update --init --recursive
To bump the submodule pointer after updating the app in its own repo:
cd demo-node-app && git pull origin main && cd -
git add demo-node-app && git commit -m "chore: bump demo app submodule" && git push
See `docs/submodules.md` for a practical guide.
- State backend → VPC → Endpoints → ECR → IAM/SGs → S3 → RDS → Redis → ALB → ECS Cluster → ECS Service → CI/CD → Observability → (optionally) EKS.
- Each lab exports outputs consumed by the next (for example, VPC subnets → ALB/ECS; RDS endpoint → Parameter Store → app env).
- The lab overview and flow: `aws-labs/README.md`.
- AZ mapping: use two AZs consistently (e.g., `ap-southeast-2a`, `ap-southeast-2b`). The VPC lab pins these via variables so subnet indexes remain deterministic.
- NAT costs: one NAT Gateway per AZ is ideal for HA; the labs default to a single NAT in staging to reduce cost. VPC Endpoints help cut NAT egress for AWS APIs.
- SES sandbox: new accounts and regions start in sandbox. You must verify identities and request production access before sending to unverified recipients. Plan for DNS propagation (SPF/DKIM) delays.
Architecture Decisions
- `docs/decisions/ADR-000-environments.md`: single staging environment defaults
- `docs/decisions/ADR-001-alb-tls-termination.md`: TLS at ALB, redirects, DNS/ACM
- `docs/decisions/ADR-002-secrets-and-config.md`: secrets vs. config strategy
- `docs/decisions/ADR-003-security-groups.md`: SG structure and rationale
- Kubernetes assets live under `kubernetes/`:
  - `aws-labs/kubernetes/helm/` for Helm values and helpers
  - `aws-labs/kubernetes/manifests/` for raw manifests
  - `kubernetes/policies/` for IAM policies used by controllers
- YAML extension: use `.yml` across the repo for consistency, except where upstream tools require specific names:
  - Helm chart files must be `Chart.yaml`. For values, we standardize on `values.yml` and always pass it explicitly with `-f` for consistency.
  - Other YAML files (templates, manifests, values you pass with `-f`) may use `.yml`.
- For each problem solved:
- Code solutions in Ruby, Go, Python, and TypeScript (to compare paradigms: Ruby for elegance, Go for concurrency, Python for simplicity, TypeScript for modern web development).
- Add time and space complexity analysis (Big O notation, with explanations of bottlenecks).
- Write a short "pattern takeaway" + real-world application (e.g., how it applies to DevOps tools like caching in Kubernetes or load balancing algorithms).
- Store in `leetcode/<category>/<problem>.md` with code snippets, test cases, and edge cases.
- Track progress in a Git repo with branches per category. Use tools like LeetCode CLI or VS Code's LeetCode extension for automation. Aim for 5-10 problems/week, reviewing patterns weekly.
- Deliverables per Problem: Solution code, complexity, takeaway, application example, 3-5 test cases (including failures), and optimizations.
Original problems plus more: Focus on efficiency in large datasets (e.g., logs in ELK stack).
- Two Pointers:
  - Two-Sum: Given an array and a target, find the indices of two elements that sum to the target.
    - Python solution (naive nested scan):

          def twoSum(nums, target):
              # Check every pair (i, j) with i < j.
              for i in range(len(nums)):
                  for j in range(i + 1, len(nums)):
                      if nums[i] + nums[j] == target:
                          return [i, j]

    - Complexity: the naive scan is O(n^2) time, O(1) space; a hashmap of seen values brings it to O(n) time, O(n) space.
    - Pattern Takeaway: Pointers for sorted arrays; hash for unsorted.
    - Application: In DevOps, used in log analysis tools (e.g., summing resource usage in Prometheus queries) or detecting pairs in monitoring alerts.
  - Three-Sum: Find triplets summing to zero.
    - Sort + two pointers: O(n^2) time.
    - Application: Resource allocation in autoscaling (e.g., balancing CPU across three pods).
  - Container with Most Water: Maximize area between heights.
    - Two pointers from ends: O(n) time.
    - Application: Optimizing storage in S3 buckets by maximizing "capacity" based on object sizes.
  - Trapping Rain Water (two pointers or stack). Application: Modeling resource leaks in memory usage graphs.
  - Remove Duplicates from Sorted Array. Application: Deduplicating logs in Splunk or ELK.
- Sliding Window:
  - Longest Substring Without Repeats.
    - Python: Use set and pointers; O(n) time.
    - Application: Session management in web apps (e.g., unique user IDs in Redis sliding windows for rate limiting).
  - Minimum Window Substring.
    - Counter + two pointers.
    - Application: Searching substrings in log files for error patterns in CloudWatch.
  - Sliding Window Maximum (deque). Application: Peak load detection in time-series metrics (e.g., CPU spikes in Grafana).
  - Longest Repeating Character Replacement. Application: Handling noisy data in ML pipelines for anomaly detection.
- Prefix Sum:
  - Subarray Sum Equals K.
    - Hashmap of prefix sums: O(n) time.
    - Application: Cumulative cost tracking in AWS Billing (e.g., subarray of daily spends equaling budget).
  - Maximum Subarray (Kadane's).
    - O(n) time.
    - Application: Identifying peak performance periods in application metrics.
  - Range Sum Query (prefix array or Fenwick Tree). Application: Querying summed logs over time in DynamoDB.
- Interval Problems:
  - Merge Intervals.
    - Sort and merge: O(n log n).
    - Application: Merging downtime intervals in incident management (e.g., in PagerDuty).
  - Insert Interval.
    - Application: Adding new maintenance windows to schedules.
  - Non-Overlapping Intervals. Application: Scheduling CI/CD jobs without overlaps.
  - Meeting Rooms II (priority queue). Application: Resource booking in Kubernetes (e.g., pod scheduling).
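
The Two-Sum entry mentions that the naive pair scan can be optimized to O(n) with a hashmap. A minimal sketch of that variant follows; the function name `two_sum` and the choice to return `None` when no pair exists are assumptions made for illustration, not conventions from this repo.

```python
def two_sum(nums, target):
    """Return indices [i, j] (i < j) with nums[i] + nums[j] == target, else None."""
    seen = {}  # value -> index of a value already visited
    for j, value in enumerate(nums):
        i = seen.get(target - value)
        if i is not None:  # index 0 is valid, so compare against None explicitly
            return [i, j]
        seen[value] = j
    return None  # assumption: signal "no pair" with None


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

Each element is visited once and every lookup is a dictionary access, which is where the O(n) time and O(n) space figures come from.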
- Group Anagrams.
- Sort keys or counter: O(n k log k) where k is word length.
- Application: Grouping similar error logs in ELK for pattern recognition.
- Subarray Sum Problems (as above).
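
The "hashmap of prefix sums" idea behind Subarray Sum Equals K can be sketched as below. This is an illustrative version only; the name `subarray_sum` is hypothetical. It counts how many contiguous subarrays sum to `k`, including arrays with negative numbers.

```python
def subarray_sum(nums, k):
    """Count contiguous subarrays of nums whose elements sum to k."""
    counts = {0: 1}  # prefix sum -> how many times it has occurred so far
    running = 0
    total = 0
    for x in nums:
        running += x
        # A subarray ending here sums to k if an earlier prefix equals running - k.
        total += counts.get(running - k, 0)
        counts[running] = counts.get(running, 0) + 1
    return total


print(subarray_sum([1, 1, 1], 2))  # 2
```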
- LRU Cache Implementation.
- Doubly linked list + hashmap: O(1) get/put.
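- Application: In AWS ElastiCache (Redis), used for caching frequently accessed data like user sessions in e-commerce apps. Redis can evict least recently used keys through its `maxmemory-policy` setting (e.g., `allkeys-lru`), which keeps hot data in memory and reduces DB load.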
- Where Applied: Web apps (e.g., Netflix recommendation cache), databases (query results), CI/CD (artifact caching).
- LFU Cache. Application: Frequency-based caching in CDNs like CloudFront.
- Valid Sudoku (hash sets). Application: Validating configs in IaC (e.g., unique IPs in Terraform).
- Longest Consecutive Sequence. Application: Detecting sequence gaps in log timestamps for outage detection.
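
As a sketch of the LRU cache entry: Python's `collections.OrderedDict` pairs a hash table with an ordering that can be updated in O(1), so it can stand in for the hand-written doubly linked list + hashmap design. The class and method names below are illustrative assumptions; an interview answer would usually be expected to build the linked list explicitly.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touching "a" makes "b" the eviction candidate
cache.put("c", 3)   # evicts "b"
print(cache.get("b"))  # None
```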
- Reverse a Linked List.
- Iterative or recursive: O(n) time.
- Application: Reversing audit logs for recent-first display in dashboards.
- Detect/Remove Cycle (Floyd's Tortoise and Hare).
- Application: Detecting infinite loops in workflows (e.g., Step Functions cycles).
- Merge Two Sorted Lists.
- Application: Merging sorted metrics from multiple sources in Prometheus.
- Merge K Sorted Lists (heap).
- O(n log k).
- Application: Aggregating logs from k microservices.
- Copy List with Random Pointer.
- Hashmap or interleave: O(n).
- Application: Deep copying configs with references in GitOps.
- Add Two Numbers (as lists). Application: Big integer ops in crypto (e.g., KMS key management).
- Flatten Multilevel Doubly Linked List. Application: Nested configs in Helm charts.
- Rotate List. Application: Rotating access keys in IAM.
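
Two of the linked-list entries above, iterative reversal and Floyd's tortoise-and-hare cycle detection, are short enough to sketch together. The `Node` class and function names are assumptions for illustration.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_list(head):
    """Reverse a singly linked list in place and return the new head. O(n) time."""
    prev = None
    node = head
    while node is not None:
        nxt = node.next   # remember the rest of the list
        node.next = prev  # point this node backwards
        prev = node
        node = nxt
    return prev


def has_cycle(head):
    """Floyd's algorithm: the fast pointer laps the slow one only if a cycle exists."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

Both run in O(n) time and O(1) extra space, which is the usual reason to prefer them over set-based or recursive alternatives.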
- Min Stack (two stacks).
- O(1) operations.
- Application: Tracking min resource usage in real-time monitoring stacks.
- Next Greater Element (monotonic stack).
- Application: Predicting next high-load event in autoscaling.
- Largest Rectangle in Histogram.
- Stack for bars: O(n).
- Application: Visualizing storage usage histograms in dashboards.
- Daily Temperatures.
- Application: Time-series forecasting in CloudWatch.
- Valid Parentheses.
- Stack matching.
- Application: Validating JSON/YAML configs in IaC.
- Implement Queue using Stacks. Application: FIFO in message queues like SQS.
- Basic Calculator (stack for ops). Application: Evaluating expressions in monitoring queries.
- Asteroid Collision. Application: Simulating resource conflicts in simulations.
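
The monotonic-stack pattern behind Next Greater Element (and Daily Temperatures) can be sketched as follows. The function name and the use of `-1` for "no greater element" are illustrative assumptions.

```python
def next_greater(values):
    """For each position, return the next value to the right that is strictly greater, or -1."""
    result = [-1] * len(values)
    stack = []  # indices still waiting for a greater value; their values are non-increasing
    for i, v in enumerate(values):
        while stack and values[stack[-1]] < v:
            result[stack.pop()] = v
        stack.append(i)
    return result


print(next_greater([2, 1, 2, 4, 3]))  # [4, 2, 4, -1, -1]
```

Every index is pushed and popped at most once, so the whole pass is O(n) despite the nested loop.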
- DFS and BFS Traversals.
- Recursive/iterative: O(n).
- Application: Traversing dependency graphs in CI/CD pipelines (DFS for depth-first builds).
- Binary Search Tree Validation.
- Inorder traversal.
- Application: Validating sorted indexes in databases like DynamoDB GSIs.
- Lowest Common Ancestor.
- Application: Finding common ancestors in org charts or VPC peering hierarchies.
- Level Order Traversal (queue).
- Application: Layered processing in ML models or network topologies.
- Serialize/Deserialize Binary Tree.
- Preorder + markers.
- Application: Storing tree structures in S3 for backups.
- Topological Sort (Course Schedule).
- Kahn's or DFS: O(V+E).
- Application: Dependency resolution in Terraform applies or Kubernetes manifests.
- Shortest Path: BFS (unweighted), Dijkstra (heap for weighted).
- Application: Network routing in VPCs or shortest path to replicas in RDS.
- Union-Find: Connected Components, Kruskal MST.
- Path compression + union by rank: amortized near-O(1) per operation (inverse Ackermann).
- Application: Detecting connected clusters in EKS nodes or merging shards in databases.
- Invert Binary Tree. Application: Mirroring data structures for backups.
- Diameter of Binary Tree. Application: Max distance in graph networks (e.g., latency in multi-region setups).
- Number of Islands (DFS/BFS). Application: Identifying isolated subnets in VPCs.
- Word Ladder (BFS). Application: Pathfinding in config transformations.
- Clone Graph. Application: Duplicating infrastructure graphs in DR planning.
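
As an illustration of the Topological Sort entry, here is a minimal sketch of Kahn's algorithm applied to a dependency map. The input shape (node mapped to the nodes it depends on) and the example lab names are assumptions chosen for this sketch.

```python
from collections import deque


def topo_sort(deps):
    """Order nodes so each appears after everything it depends on (Kahn's algorithm)."""
    nodes = set(deps)
    for reqs in deps.values():
        nodes.update(reqs)
    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, reqs in deps.items():
        for req in reqs:
            indegree[node] += 1
            dependents[req].append(node)
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical dependency map loosely modelled on a lab sequence.
print(topo_sort({"vpc": [], "alb": ["vpc"], "ecs": ["vpc", "alb"]}))  # ['vpc', 'alb', 'ecs']
```

Any node left unprocessed at the end has an unmet dependency, which is how the same pass doubles as cycle detection in O(V+E).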
- Fibonacci Variations (memoization, tabulation).
- O(n) time.
- Application: Recursive resource calculations in budgeting tools.
- Climbing Stairs.
- Application: Ways to scale resources (e.g., steps as instance sizes).
- Coin Change (Min Coins and Combinations).
- Unbounded knapsack.
- Application: Optimizing costs in AWS (min "coins" for budget).
- Longest Increasing Subsequence.
- O(n log n) with patience sorting.
- Application: Sequence of version upgrades without breaks.
- Longest Common Subsequence.
- Application: Diffing configs in Git.
- Palindromic Substrings.
- Expand around center.
- Application: Detecting symmetric patterns in logs.
- Edit Distance.
- Application: Fuzzy matching in search autocompletes.
- Word Break.
- Application: Parsing commands in CLI tools.
- Knapsack Variations (0/1, Unbounded).
- Application: Resource allocation (e.g., packing containers into EC2 instances).
- House Robber. Application: Non-adjacent resource selection (e.g., avoiding adjacent AZs for HA).
- Unique Paths (grid DP). Application: Path counting in maze-like networks.
- Burst Balloons (interval DP). Application: Optimizing burstable instances in EC2.
- Matrix Chain Multiplication. Application: Optimal query ordering in DBs.
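
The Coin Change (min coins) entry is a standard unbounded-knapsack tabulation. A minimal sketch, assuming positive integer coin values and returning `-1` when the amount cannot be formed (a common convention, not one stated in these notes):

```python
def min_coins(coins, amount):
    """Fewest coins (with repetition) that sum to amount, or -1 if impossible."""
    INF = float("inf")
    dp = [0] + [INF] * amount  # dp[t] = fewest coins needed to make total t
    for total in range(1, amount + 1):
        for c in coins:
            if c <= total and dp[total - c] + 1 < dp[total]:
                dp[total] = dp[total - c] + 1
    return dp[amount] if dp[amount] != INF else -1


print(min_coins([1, 2, 5], 11))  # 3
```

The table has `amount + 1` entries and each is filled by scanning the coins once, giving O(amount × len(coins)) time.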
- Binary Search Variations (first/last occurrence).
- O(log n).
- Application: Searching logs in S3 by timestamp.
- Search in Rotated Array.
- Application: Searching circular buffers in queues.
- Median of Two Sorted Arrays.
- O(log (m+n)).
- Application: Median latency in merged metrics.
- Kth Largest Element (Quickselect/Heap).
- Average O(n).
- Application: Top-K alerts in monitoring.
- Merge Sort (divide-conquer). Application: Sorting large datasets in Spark on EMR.
- Heap Sort. Application: Priority queues in task scheduling.
- Find Peak Element. Application: Finding local maxima in performance graphs.
- Search a 2D Matrix. Application: Querying grid-based data like heatmaps.
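
For the first/last occurrence variation of binary search, the standard library's `bisect` module already provides the two boundary searches. The wrapper below is an illustrative sketch; the name `first_last` and the `(-1, -1)` "not found" result are assumptions.

```python
from bisect import bisect_left, bisect_right


def first_last(sorted_values, target):
    """Return (first_index, last_index) of target in a sorted list, or (-1, -1)."""
    lo = bisect_left(sorted_values, target)  # leftmost insertion point
    if lo == len(sorted_values) or sorted_values[lo] != target:
        return (-1, -1)
    hi = bisect_right(sorted_values, target) - 1  # rightmost occurrence
    return (lo, hi)


print(first_last([1, 2, 2, 2, 3], 2))  # (1, 3)
```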
- Implement Trie (Prefix Tree).
- Insert/search: O(word length).
- Application: Autocomplete in search bars (e.g., Route 53 domain suggestions) or routing tables.
- Word Search (Backtracking).
- Application: Finding patterns in config files.
- Regular Expression Matching (DP).
- Application: Log parsing in Fluentd or Logstash.
- Sudoku Solver (Backtracking).
- Application: Constraint satisfaction in scheduling (e.g., pod placement in K8s).
- N-Queens. Application: Placement optimization without conflicts.
- Wildcard Matching. Application: Glob patterns in S3 access policies.
- Sliding Puzzle. Application: State space search in chaos engineering.
- Alien Dictionary (topo sort). Application: Ordering dependencies in monorepos.
- LFU Cache (as above).
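
A minimal sketch of the Trie entry, using nested dictionaries for child links. The class layout and the empty-string end marker are illustrative choices, not the only way to build a prefix tree.

```python
class Trie:
    _END = ""  # marker key; never collides with a single-character child key

    def __init__(self):
        self._root = {}

    def insert(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def _walk(self, prefix):
        """Follow prefix from the root; return the node reached, or None."""
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and self._END in node

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


t = Trie()
t.insert("route")
print(t.search("rout"), t.starts_with("rout"))  # False True
```

Insert and lookup both touch one node per character, which is the O(word length) bound quoted above.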
- Each scenario gets a Markdown doc in `system-design/<scenario>/`.
- Include:
- Assumptions: Traffic (e.g., 1M RPS), scale (global vs regional), SLAs (99.99% uptime), constraints (budget, compliance like GDPR).
- Architecture Diagram: Mermaid + ASCII alternatives for text.
- Component Choices and Tradeoffs: Justify (e.g., SQS vs Kafka: SQS for simplicity, Kafka for high throughput).
- Risks and Mitigations: e.g., Single point of failure → redundancy.
- Cost estimates (using AWS Calculator), performance metrics (latency targets), security considerations (e.g., zero trust), and deployment strategy (blue/green).
- Tools: Use Lucidchart or Draw.io for diagrams; practice verbalizing designs for interviews.
- Load Balancing: L4 (NLB: TCP/UDP, low latency for gaming/VoIP) vs L7 (ALB: HTTP routing for microservices, integrates WAF for security). Application: ALB in e-commerce for path-based routing to carts/checkout; NLB in IoT for UDP sensor data.
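- Caching Strategies: Write-through (cache and store updated together; consistent reads, slower writes), write-back (low write latency, risk of loss if the cache fails before flushing), write-around (writes go straight to the store; the cache is populated on reads), TTL (expiration). Application: ElastiCache for video metadata in a Netflix-style service, which reduces DB hits and improves load times; tradeoffs: staleness vs freshness.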
- Message Queues: SQS (simple, at-least-once), Kafka (partitioned, high throughput, replay), RabbitMQ (AMQP, routing). Application: SQS in order processing (e.g., Amazon fulfillment); Kafka in real-time analytics (e.g., fraud detection streams).
- Database Scaling: Sharding (horizontal by key, e.g., user ID), replication (primary-replica, with replicas serving reads), indexing (B-trees for queries). Application: DynamoDB partition keys for social media feeds; RDS read replicas for read-heavy reports.
- Storage Design: Object (S3: unstructured, scalable), block (EBS: VM disks, persistent), distributed file (EFS: shared access). Application: S3 for logs in compliance audits; EBS for DB volumes in EC2.
- API Design: REST (stateless, HTTP verbs), GraphQL (client-driven queries), gRPC (binary, streaming). Application: REST for public APIs (e.g., Stripe payments); GraphQL for mobile apps (e.g., Instagram feeds to reduce overfetching).
- Authentication & Authorization: SSO (single sign-on via Okta), OIDC (OpenID Connect for tokens), SAML (enterprise federation). Application: OIDC in EKS for pod auth; concepts: JWT tokens for microservices.
- TLS and Certificate Management: Encrypt in-transit; ACM for auto-renewal. Application: Securing ALB in fintech apps to prevent MITM.
- Secrets Management: Secrets Manager (rotation), SSM Parameter Store (cheaper, versioned). Application: Rotating DB creds in Lambda without downtime.
- Multi-Account AWS Organization Design: OUs for envs, SCPs for policies. Application: Separate prod/dev to isolate breaches; central billing for cost allocation.
- Multi-Region and DR Planning: Active-active for near-zero RTO. Application: Global e-commerce with Route 53 failover.
- VMware & Virtualization Basics: Concepts: Hypervisors (Type 1 bare-metal like ESXi), VMs vs containers. Trivia: vSphere for on-prem; migration to AWS via VMware Cloud. Application: Hybrid cloud setups.
- Serverless Architectures: Lambda + API Gateway. Tradeoffs: Cold starts vs scalability.
- Edge Computing: CloudFront Functions, Lambda@Edge. Application: Personalization at edge for low latency.
- Zero Trust: BeyondCorp model, verify every request. Application: In VPCs with PrivateLink.
- URL Shortener: DB for mappings (DynamoDB), caching (Redis), rate limiting. Application: TinyURL-like for marketing; risks: collisions mitigated by hashing.
- Twitter Feed / Facebook News Feed: Fanout on write/read, timelines in Cassandra. Application: Social platforms; tradeoffs: push (real-time) vs pull (scalability).
- WhatsApp / Slack Real-Time Messaging: WebSockets (ALB), pub/sub (SNS). Application: Chat apps; multi-device sync with DynamoDB.
- YouTube / Netflix Video Streaming with CDN: S3 + CloudFront, adaptive bitrate. Application: Media; tradeoffs: cost vs quality.
- Rate Limiter: Token/leaky bucket in Redis. Application: API protection in fintech.
- Search Autocomplete: Trie in Elasticsearch. Application: E-commerce search.
- Payment System: Idempotency keys, sagas for consistency. Application: Stripe-like; retries with exponential backoff.
- Metrics and Monitoring Pipeline: Prometheus + Grafana. Application: Dashboards for ops teams.
- CI/CD Pipeline at Scale: CodePipeline with parallelism. Application: Large orgs; matrix builds.
- Backup & Recovery Workflows: RPO/RTO defined. Application: Compliance in healthcare.
- AI/ML: Spam Detection or Vector Search with RAG: SageMaker + Pinecone. Application: Email filters; RAG for chatbots querying docs.
- Ride-Sharing System (Uber-like): Geospatial DB (Aurora), matching algorithms. Tradeoffs: Latency vs accuracy.
- E-Commerce Cart System: Session in Redis, transactions in RDS. Application: High concurrency during sales.
- Logging Aggregation: Fluentd to ELK. Application: Centralized ops.
- IoT Device Management: Greengrass for edge, IoT Core. Application: Smart homes.
- Blockchain Integration: Managed Blockchain for supply chain. Tradeoffs: Immutability vs speed.
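
The Rate Limiter scenario names the token bucket algorithm. The sketch below shows the core refill-and-spend logic in a single process; it is not a Redis implementation (which would need the same update performed atomically on the server side), and the class name, parameters, and injectable `clock` are assumptions for illustration.

```python
import time


class TokenBucket:
    """In-memory token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the behaviour can be tested deterministically
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The capacity sets the largest burst allowed, while the rate bounds sustained throughput; a leaky bucket differs mainly in smoothing output rather than permitting bursts.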
For a simple web app:
graph TD
User --> CF[CloudFront CDN]
CF --> LB[ALB/NLB]
LB --> ASG[Auto Scaling Group]
ASG --> EC2[EC2 Instances] & ECS[ECS Fargate]
EC2 --> RDS[(RDS Multi-AZ)]
ECS --> Dynamo[(DynamoDB)]
RDS --> Cache[ElastiCache Redis]
All --> CW[CloudWatch Monitoring]
CW --> SNS[SNS Alerts]
Tradeoffs: CF for global cache vs direct LB for low latency.
- CAP Theorem: Consistency (e.g., RDS) vs Availability (DynamoDB eventual). Application: Banking needs CP, social media AP.
- Strong vs Eventual Consistency: Strong for transactions, eventual for reads.
- SQL vs NoSQL: SQL for joins (RDS), NoSQL for scale (DynamoDB).
- Push vs Pull Models: Push for notifications (SNS), pull for queues (SQS).
- Batching vs Streaming: Batching for efficiency (S3 uploads), streaming for real-time (Kinesis).
- Monolith vs Microservices: Monolith faster dev, microservices scalable but complex.
- Sync vs Async: Sync for simple APIs, async for long-running (Lambda events).
- Each lab in `aws-labs/<lab-name>/`.
- Deliverables:
  - `README.md` with Objective, Prerequisites (e.g., AWS CLI setup), Steps (numbered, with commands), Expected Outcome, Cleanup (to avoid costs), Cost Estimate.
  - Terraform Templates: Modules for reusability.
  - Screenshots/CLI Outputs: Use AWS Console or `aws` commands.
  - Notes on Failures: What broke (e.g., IAM permissions), how fixed.
- Video recordings (optional), integration tests (e.g., with Boto3), and multi-region variants.
- Tools: AWS Free Tier, Terraform Cloud for state, Git for versioned labs.
- EC2: Launch templates + ASG, web app behind ALB/NLB.
- Application: Hosting a blog; ASG scales on CPU >70%.
- Terraform: the `aws_autoscaling_group` resource.
- Failover: Test instance termination.
- ECS Fargate: Containerized app behind ALB, scale, CloudWatch logs.
- Application: Microservice API; integrates with ECR.
- ECS EC2 + Capacity Providers: EC2 hosts, scaling.
- Tradeoff: Cheaper than Fargate for steady loads.
- EKS: Deployment, Service, Ingress, HPA.
- Application: Kubernetes app; use eksctl for setup.
- Lambda: Triggers from S3, API Gateway, DynamoDB streams.
- Application: Image resize on S3 upload; cold start mitigation with provisioned concurrency.
- Step Functions: Orchestrate Lambda workflow.
- Application: ETL pipeline; error handling with retries.
- Batch: Job queues for ML training. Application: Data processing.
- App Runner: Serverless containers. Application: Quick web apps.
- VPC: Custom with subnets, NAT, IGW.
- Application: Isolated envs; test ping between public/private.
- Security Groups vs NACLs: Block/allow, test.
- Application: Firewalling; SGs for instances, NACLs for subnets.
- PrivateLink, VPC Peering, Transit Gateway: Connectivity.
- Application: Multi-account access without internet.
- IAM: Policies, roles, boundaries.
- Application: Least privilege for CI/CD.
- KMS: Encrypt/decrypt.
- Application: Data at rest in S3.
- Secrets Manager vs SSM:
- Secrets Manager: Rotation for DB creds; application: Auto-rotate every 30 days.
- SSM: Cheaper for configs.
- Route 53: Routing types.
- Application: Failover for DR.
- Certificate Manager: Lifecycle.
- Application: HTTPS for ALB.
- GuardDuty, SecurityHub, Inspector: Concepts + trials.
- GuardDuty: ML threat detection; application: Alert on crypto mining.
- SecurityHub: Compliance hub.
- Inspector: Vulnerability scans on EC2.
- WAF: Rules for SQL injection. Application: Web app protection.
- Macie: Data classification in S3. Application: PII detection.
- S3: Versioning, lifecycles, signed URLs, replication.
- Application: Static sites; lifecycle to Glacier for archives.
- RDS: Multi-AZ, read replicas.
- Application: E-commerce DB; failover test <1min RTO.
- DynamoDB: GSIs, LSIs, streams.
- Application: User profiles; streams to Lambda for real-time updates.
- ElastiCache: Redis failover.
- Application: Session store; test master failover.
- Backup & Restore: Snapshots.
- Application: Restore RDS to point-in-time.
- Aurora Serverless: Auto-scaling relational database (MySQL/PostgreSQL-compatible). Tradeoff: Auto-scaling vs cost.
- Neptune: Graph DB. Application: Social networks.
- Timestream: Time-series. Application: IoT metrics.
- CloudWatch: Logs, metrics, dashboards, alarms.
- Application: CPU alarm → SNS → Lambda auto-remedy.
- CloudTrail: IAM events.
- Application: Audit trails for compliance.
- X-Ray: Trace apps.
- Application: Latency bottlenecks in microservices.
- External Monitoring: Pingdom.
- Application: Uptime checks.
- Prometheus + Grafana on EKS. Application: Custom metrics.
- OpenTelemetry: Distributed tracing. Application: Multi-service apps.
- CodeBuild + CodePipeline: ECS deploy.
- Application: Automated builds from GitHub.
- GitHub Actions → EKS: With kubectl/Helm.
- Application: Deploy charts.
- Blue/Green and Canary: Demos.
- Application: Zero-downtime updates.
- Terraform Basics: For above.
- Policy-as-Code: OPA/Conftest.
- Application: Validate no public buckets.
- ArgoCD for GitOps. Application: Declarative K8s deploys.
- Jenkins on EC2. Tradeoff: Self-managed vs CodePipeline.
Note: The primary Node.js demo app is implemented externally at https://github.com/loftwah/demo-node-app. On startup, it runs a self-test that exercises CRUD against S3, Postgres, and Redis and logs results to STDOUT. See `docs/demo-apps.md` for requirements and conventions.
- Each in `aws-labs/demo-apps/<name>/`.
- Includes `README.md`, Terraform, Dockerfile, app code (Python/Go/Ruby), tests (unit/integration), and security scans (e.g., Trivy).
- Helm charts for K8s, cost optimization notes, and multi-env configs.
- Rails/Go/Python API → ECS Fargate:
  - ECR push, ALB service, service auto scaling, CodePipeline CI/CD.
  - Application: TODO API; scales on requests.
- Same App → EKS:
  - Deployment/Service/Ingress, ConfigMaps/Secrets, HPA, Prometheus.
  - Application: Add auth; monitor pods.
- Extend with RDS + ElastiCache:
  - Connect Postgres/Redis, failover tests.
  - Application: Caching queries for performance.
- CI/CD for ECS/EKS:
  - CodePipeline for ECS, GitHub Actions for EKS, blue/green/canary.
  - Application: Versioned deploys.
- Monitoring and Security:
  - CloudWatch/SNS, KMS encryption, Secrets rotation, GuardDuty.
  - Application: Alert on anomalies.
- AI/ML Demo:
  - Spam detection with Lambda + OpenAI API or SageMaker.
  - Optional RAG: DynamoDB + embeddings for doc search.
  - Application: Chat app filter.
- Serverless Demo: API Gateway + Lambda + DynamoDB.
  - Application: Event-driven.
- IoT Demo: IoT Core + Lambda.
  - Application: Device telemetry.
- Deliverables: Notes in `extras/linux-networking.md`, test commands, scripts.
- netstat, lsof, tcpdump, strace:
- netstat: View ports; application: Check ALB listeners.
- lsof: Open files; debug leaks.
- tcpdump: Capture packets; application: Analyze VPC traffic.
- strace: Syscalls; debug app crashes.
- Debugging CPU/Memory/IO: top, vmstat, iostat. Application: EC2 optimization.
- Packet Analysis: tcpdump + Wireshark. Application: Troubleshoot TLS handshakes.
- VMware/Virtualization Trivia: vMotion for live migration; application: On-prem to AWS lift-shift.
- Networking Trivia: Subnets (CIDR calc), VLANs (segmentation), OSPF/BGP (routing). Application: VPC design.
- iptables/nftables: Firewalls. Application: Custom SGs.
- ss: Modern netstat. Application: Socket stats.
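
For the "Subnets (CIDR calc)" item, Python's standard `ipaddress` module can do the arithmetic. The sketch below carves a VPC range into equal subnets and assigns one per AZ; the `10.0.0.0/16` range and the helper name `plan_subnets` are assumptions for illustration, not values taken from the labs.

```python
import ipaddress


def plan_subnets(vpc_cidr, new_prefix, azs):
    """Assign the first len(azs) subnets of size /new_prefix to the given AZs, in order."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = network.subnets(new_prefix=new_prefix)  # generator, lowest address first
    return dict(zip(azs, (str(s) for s in subnets)))


plan = plan_subnets("10.0.0.0/16", 24, ["ap-southeast-2a", "ap-southeast-2b"])
print(plan)  # {'ap-southeast-2a': '10.0.0.0/24', 'ap-southeast-2b': '10.0.1.0/24'}

# A /24 holds 256 addresses; AWS reserves 5 per subnet, leaving 251 usable.
print(ipaddress.ip_network("10.0.0.0/24").num_addresses - 5)  # 251
```

Keeping the AZ list in a fixed order is what makes the resulting assignment deterministic from run to run.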
- Deliverables: Repo with demo branches.
- Rebase, Cherry-Pick, Bisect: Rebase for clean history; cherry-pick fixes; bisect bugs.
- Application: Feature branches in CI/CD.
- Submodules, Hooks: Submodules for deps; hooks for linting.
- Reflog and Recovery: Restore lost commits.
- Rewriting History: filter-branch/BFG for sensitive data removal.
- Advanced GitHub Actions: Matrix, reusables. Application: Multi-OS builds.
- Monorepo Strategies: Sparse checkout, lerna. Application: Large teams.
- GitOps Concepts: Flux/ArgoCD for declarative infra.
- Git LFS: Large files. Application: Models in ML.
- Git Worktrees: Parallel branches.
- Deliverables: Writeups in `extras/resilience.md`.
- Chaos Testing: Kill pods or terminate instances (e.g., Chaos Monkey). Application: Test ASG recovery.
- DR Strategy Document: RTO/RPO, multi-region.
- Backup/Restore Workflow Test: RDS snapshots.
- Compliance Checklist: CIS Benchmarks, scans with ScoutSuite.
- Cross-Region Failover Drills: Route 53 switch.
- Load Testing: Locust/JMeter. Application: Simulate Black Friday.
- Incident Response: Post-mortems with blameless culture.
- DevSecOps: Shift-left security; tools: Snyk, Checkov.
- GitOps: ArgoCD workflows.
- Zero Trust: Implementation in AWS (e.g., verified access).
- Edge/ML Integration: Lambda@Edge for personalization.
- Multi-Cloud: Concepts: Terraform for GCP/AWS.
This section covers 50+ questions with deeper answers and contextual details (what, why, where applied, tradeoffs, integrations), grouped by topic.
- ALB: L7, HTTP/HTTPS, path/host routing, WebSockets, WAF. Application: Microservices (e.g., routing /api to backend, /static to S3 in e-commerce). Integrates with ECS/EKS.
- NLB: L4, TCP/UDP/TLS, low latency, client IP preservation. Application: High-throughput apps like gaming servers or DNS.
- Tradeoffs: ALB more features but higher latency; NLB for performance.
- Where Applied: ALB in web tiers, NLB in network appliances.
- AWS Organizations with OUs (e.g., Security, Workloads), SCPs to deny actions. Centralize via Control Tower.
- Application: Enterprises for isolation (e.g., prod account can't delete resources). Benefits: Blast radius control, tag-based billing.
- Integrations: Cross-account roles for CI/CD.
- Block Public Access, bucket policies (explicit denies), encryption (SSE-KMS), logging (CloudTrail), MFA Delete.
- Application: Storing sensitive data like user uploads in healthcare; signed URLs for temp access.
- Tradeoffs: KMS cost vs SSE-S3 simplicity.
- GuardDuty: ML-based threat detection (e.g., unusual API calls). Application: Detect reconnaissance in VPCs.
- SecurityHub: Aggregates findings, benchmarks (CIS). Application: Compliance dashboards.
- Integrations: Lambda for auto-remediation.
- ACM: Auto-issue/renew/validate (DNS/email). Application: HTTPS for CloudFront in global sites.
- Private CA for internal. Tradeoffs: Free public vs paid private.
- AWS Control Tower: Governs multi-account setups with baselines. Application: Landing zones for new orgs; ensures guardrails.
- AWS Outposts: On-prem AWS hardware. Application: Low-latency in factories (e.g., ML inference).
- Snowball for petabytes offline. Application: Migrating on-prem data to S3 without relying on network bandwidth.
- AWS Direct Connect: Private connection to AWS. Application: Hybrid cloud for high-bandwidth, low-latency (e.g., finance trading).
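- `count`: Indexed, good for identical copies; removing an item from the middle of the list shifts later indexes and forces those resources to be re-created.
- `for_each`: Keyed by map/set, so adding or removing one key leaves the other instances untouched.
- Application: `for_each` for dynamic subnets in VPC module.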
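
To make the `count` vs `for_each` difference concrete, here is a small Python simulation of resource addressing. It is not Terraform and does not reproduce the planner; the `aws_subnet.this` address and the subnet names are hypothetical. It only shows how many addresses end up bound to a different value when one item is removed from the middle of a list.

```python
def addresses_by_index(names):
    """Mimic count-style addressing: aws_subnet.this[0], aws_subnet.this[1], ..."""
    return {f"aws_subnet.this[{i}]": name for i, name in enumerate(names)}


def addresses_by_key(names):
    """Mimic for_each-style addressing: aws_subnet.this["<name>"]."""
    return {f'aws_subnet.this["{name}"]': name for name in names}


def changed(before, after):
    """Addresses that were added, removed, or now point at a different item."""
    return sorted(a for a in before.keys() | after.keys() if before.get(a) != after.get(a))


old = ["public-a", "public-b", "private-a"]
new = ["public-a", "private-a"]  # "public-b" removed from the middle

print(changed(addresses_by_index(old), addresses_by_index(new)))
# ['aws_subnet.this[1]', 'aws_subnet.this[2]']
print(changed(addresses_by_key(old), addresses_by_key(new)))
# ['aws_subnet.this["public-b"]']
```

With index addressing, the removal disturbs every later position; with key addressing, only the removed key is affected.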
- OPA/Conftest in CI. Example: Rego rules to block public ALBs.
- Application: Prevent misconfigs in prod.
- Data sources from Secrets Manager. Application: DB creds; avoid tfvars.
- Tradeoffs: Dynamic fetch vs static (security vs speed).
- Remote state in S3/Dynamo. Application: Re-apply in secondary region with variables.
- Reusable code. Best: Versioned, tested; application: Standard VPC module across teams.
- DynamoDB table. Application: Prevent concurrent applies in teams.
- Remote exec, workspaces. Application: Collaborative IaC.
- Import resources. Tradeoffs: Terraform multi-provider vs AWS-native CFN.
- ECS: Simpler, AWS-integrated (Fargate no servers). Application: Monolithic containers.
- EKS: K8s, ecosystem (Helm, Istio). Application: Complex orchestration.
- Tradeoffs: ECS cheaper ops, EKS portable.
- ECS: CodeDeploy swaps target groups.
- EKS: Argo Rollouts. Application: Web updates without downtime.
- OIDC, secrets, scans (Snyk). Application: From code to prod with approvals.
- Token-based auth to AWS. Application: Secure workflows without keys.
- Package manager for K8s. Application: Deploying apps with charts.
- Service mesh for traffic, security. Application: mTLS, canaries.
- Multi-stage builds, distroless. Application: Reduce attack surface.
- Declarative deploys from Git. Application: Self-healing K8s.
- Vertical (resize), horizontal (replicas). Application: Read-heavy apps like analytics.
- Dynamo: Serverless, flexible schema. Application: High-write apps like gaming scores.
- RDS: ACID, joins. Application: Transactions in banking.
- Automated snapshots, test restores. Application: RPO=1hr for critical data.
- Multi-region replication. Application: Global apps with low latency.
- Data warehouse. Application: OLAP queries on petabytes.
- EFS: Shared filesystem. Application: Multi-EC2 access like CMS.
- Auto-moves objects. Application: Cost-optimize infrequent access.
- SGs: Stateful, allow rules only. Application: Instance-level.
- NACLs: Stateless, support both allow and deny rules. Application: Subnet-level.
- Build-time vs runtime config guidance is in `docs/build-vs-runtime-config.md`.
- Practical Docker BuildKit secret examples (npm, Yarn, Vite, Rails/Webpacker, Bundler) are in `docs/build-secrets-examples.md`.
- Secrets Manager Lambda. Application: DB creds, app reloads.
- Shield/WAF, ASG. Application: Protect public APIs.
- Reachability Analyzer, routes. Application: Troubleshoot peering.
- Finds unintended access. Application: Policy tightening.
- Traffic metadata. Application: Security analysis in Splunk.
- Verified access, micro-segmentation. Application: Beyond VPN.
- Isolated compute. Application: Confidential computing for sensitive workloads.
- Rebase: Clean linear. Application: Feature integration.
- Merge: History preserve.
- Reflog. Application: Accidental deletes.
- Bug hunting. Application: Regressions post-merge.
- Sparse, CODEOWNERS. Application: Google-like setups.
- Merge repos. Alternative to submodules.
- Pre-commit linting. Application: Enforce standards.
- Branches for features/releases. Application: Versioned software.
- top, strace, perf. Application: Optimize EC2 apps.
- tcpdump port 443. Application: Debug HTTPS.
- Process: Isolated. Thread: Shared. Application: Multi-threading in Go apps.
- Service manager. Application: Daemon restarts.
- iostat, iotop. Application: Bottleneck detection.
- Container foundations. Application: Docker isolation.
- Token bucket in Redis. Application: API throttling.
- S3 + CloudFront, Dynamo metadata. Application: Dropbox-like.
- RTO/RPO, pilots. Application: Business continuity.
- Collaborative filtering, SageMaker. Application: E-commerce.
- Redis cluster. Tradeoffs: Invalidation strategies.
- SNS/SQS. Application: Push alerts.
- GPT API in Lambda. Application: Filter user content.
- Embeddings store. Application: Semantic search in RAG.
- Query vectors + LLM. Application: Knowledge bases.
- SageMaker endpoints. Application: Inference at scale.
- Fine-tune: Train on data. Prompt: Zero-shot. Tradeoffs: Cost vs speed.