
AWS Three-Tier Architecture — The Design Decisions That Matter


Beyond the diagram: the specific design decisions behind a production-grade AWS three-tier deployment — why internal ALBs, why private subnets for RDS, and what actually validates that your architecture is secure.

Architecture diagrams make three-tier deployments look obvious. A box for the web tier, a box for the application tier, a box for the database. Arrows connecting them. Done.

The actual work is in the decisions behind those boxes — and most tutorials skip over the ones that matter most. This post covers the specific choices I made in the AWS three-tier deployment I completed for The CloudAdvisory Oy DevOps Micro-Internship, and why I made them.

Decision 1: Two ALBs, Not One

The most important structural decision was using two Application Load Balancers: a public-facing ALB for the web tier, and an internal ALB for the application tier.

Most tutorials show a single ALB in front of the web tier and direct connections from web EC2 to application EC2. This is architecturally wrong for anything that matters.

With a single ALB and direct EC2-to-EC2 connections:

  • Your application tier EC2 instances need Security Group rules allowing inbound traffic from the web tier
  • If the web tier is ever compromised, the attacker has a direct network path to the application tier
  • You have no centralised health checking or traffic management on the internal tier

With an internal ALB:

  • The web tier talks to an ALB endpoint — it never has a direct route to application EC2 instances
  • Health checks happen at the ALB layer — unhealthy app instances are automatically removed from rotation
  • The Security Group on application EC2 only allows traffic from the internal ALB's Security Group
  • Future scaling of the application tier is clean — add instances to the ALB target group, not to every web server's firewall rules

The internal ALB adds only a small hourly cost in a learning environment and models production architecture correctly. Use it.
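The shape of this decision can be sketched with the AWS CLI. Everything below — names, the app port, subnet and Security Group IDs — is a placeholder, not a value from the actual deployment:

```shell
# Create the internal (not internet-facing) ALB for the application tier.
# Subnet and SG IDs are hypothetical.
aws elbv2 create-load-balancer \
  --name app-internal-alb \
  --scheme internal \
  --type application \
  --subnets subnet-0app1111 subnet-0app2222 \
  --security-groups sg-0albint11111111

# Target group with a health check — unhealthy app instances drop out
# of rotation automatically. Port and path are assumptions.
aws elbv2 create-target-group \
  --name app-tg \
  --protocol HTTP \
  --port 3001 \
  --vpc-id vpc-0aaa1111 \
  --health-check-path /health
```

The web tier then talks only to the internal ALB's DNS name; it never needs a route or a firewall rule pointing at individual app instances.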

Decision 2: RDS in a Private Subnet With No Internet Gateway Route

This sounds obvious. It is often not implemented correctly.

"Private subnet" in AWS means different things depending on whether the subnet's route table has a route to an Internet Gateway. A subnet with a 0.0.0.0/0 route to an IGW is a public subnet, regardless of whether the instance in it has a public IP address assigned.

For RDS to be genuinely isolated from the internet:

  1. The subnet must have no route to an Internet Gateway in its route table
  2. The RDS Security Group must allow inbound traffic only from the application tier Security Group, not from a CIDR range
  3. The RDS instance's Publicly Accessible setting must be disabled (this is a separate setting from subnet routing)

I validated all three. The test was straightforward: attempt a direct connection to the RDS endpoint from outside the VPC. It should time out — not refuse, but time out, because there is no route to the host.
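Two of the three conditions can be read back directly with the AWS CLI. The subnet ID and DB identifier below are hypothetical:

```shell
# 1. List the gateway IDs on the DB subnet's route table.
#    There should be no igw- entry in the output.
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-0db11111 \
  --query 'RouteTables[].Routes[].GatewayId'

# 3. Confirm the Publicly Accessible flag is off — expect: false
aws rds describe-db-instances \
  --db-instance-identifier app-db \
  --query 'DBInstances[0].PubliclyAccessible'
```

Condition 2 — the Security Group rule — is covered in the next section.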

Decision 3: Security Group Rules by Security Group Reference, Not CIDR

When writing Security Group ingress rules between tiers, you have two options:

Option A: Allow inbound from 10.0.1.0/24 (the web tier subnet CIDR)
Option B: Allow inbound from sg-xxxxxxxx (the web tier Security Group ID)

Option A is fragile. If the web tier ever adds an instance in a different subnet, the rule no longer covers it. If the CIDR changes, the rule breaks.

Option B is declarative and precise. It says "allow traffic from anything that belongs to this Security Group" — which means any EC2 instance that carries that Security Group, regardless of which subnet it's in. It scales automatically as the tier grows.

All ingress rules in this deployment were written as Security Group references, never CIDRs.
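As a sketch, Option B is one CLI call — the source is a Security Group ID, not a CIDR. The group IDs and port are placeholders:

```shell
# Allow the app tier to accept traffic from anything carrying the
# web tier's Security Group, whatever subnet it lands in.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0app1111111111 \
  --protocol tcp \
  --port 3001 \
  --source-group sg-0web2222222222
```

The rule follows instances, not addresses: launch a new web-tier instance with that SG attached and it is covered with no firewall change.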

Decision 4: systemd for Service Persistence

EC2 instances reboot. In a real environment they get patched, resized, or fail and are replaced by Auto Scaling. Your application services need to come back up automatically.

Both the Next.js frontend and the Node.js API were configured as systemd services:

[Unit]
Description=Next.js Application
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/app
ExecStart=/usr/bin/npm start
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Restart=on-failure means the service restarts automatically if it crashes. After=network.target orders the unit after basic network setup (if a service must wait until the network is actually online, the stricter target is network-online.target). WantedBy=multi-user.target means it starts on every boot.

Without this, every EC2 reboot requires a manual SSH session to restart your application. That is not production behaviour.
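Installing and enabling the unit is a few commands — the unit filename and path here are assumptions:

```shell
# Install the unit and pick up the new file
sudo cp nextjs.service /etc/systemd/system/nextjs.service
sudo systemctl daemon-reload

# Enable at boot and start now, in one step
sudo systemctl enable --now nextjs

# Quick sanity checks
systemctl is-enabled nextjs   # prints: enabled
systemctl is-active nextjs    # prints: active
```

The enable --now step is what makes the service survive reboots without a manual SSH session.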

The Validation Test

The architecture is not validated until you test its failure modes:

  • Web tier EC2 terminated mid-traffic → ALB health check detects unhealthy target, stops routing to it. If Auto Scaling is configured, replacement launches automatically.
  • Application tier EC2 terminated → Internal ALB detects unhealthy target. Web tier requests return 502 until the app tier recovers.
  • Direct connection to RDS from outside VPC → Connection times out. No route to host.
  • Direct connection to app EC2 bypassing internal ALB → Blocked by Security Group (only the internal ALB's SG is allowed).

Each of these tests either passes or reveals a misconfiguration. Run them before you call the deployment done.
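The two network-isolation checks can be scripted from a machine outside the VPC. Hostnames, IPs, and ports below are placeholders; note that GNU timeout exits with 124 on a timeout, which is how the script distinguishes "timed out" (no route) from "refused" (routable but closed):

```shell
#!/usr/bin/env bash
# Failure-mode probes — endpoint, IP, and ports are hypothetical.

# RDS from outside the VPC: should TIME OUT, not be refused.
timeout 5 bash -c '</dev/tcp/app-db.example.rds.amazonaws.com/5432'
rc=$?
if [ "$rc" -eq 124 ]; then
  echo "OK: RDS connection timed out — no route from outside the VPC"
elif [ "$rc" -ne 0 ]; then
  echo "WARN: connection refused (rc=$rc) — endpoint is routable; check subnet routing"
else
  echo "FAIL: connected to RDS from outside the VPC"
fi

# App EC2 directly, bypassing the internal ALB: should be blocked by SG.
if timeout 5 bash -c '</dev/tcp/10.0.2.15/3001'; then
  echo "FAIL: app instance reachable without the internal ALB"
else
  echo "OK: direct access blocked"
fi
```

A refused connection on the RDS check is itself a finding: it means the endpoint is reachable at the network layer and only the Security Group saved you.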

What This Architecture Actually Demonstrates

The three-tier pattern is not interesting because it's complex — it's interesting because it's the baseline production pattern for web applications on AWS. Understanding why each element is there, what failure mode it prevents, and how to validate it is what separates an engineer who has deployed this from one who has read about it.