Cloud Architecture Fundamentals
Regions, availability zones, auto-scaling, and load balancers – the building blocks every cloud architect uses. Here's how to design systems that never go down.
The restaurant that couldn't handle Saturday night
Maria opens a taco restaurant. It's tiny – one cook, one cash register, ten seats. Monday through Thursday, it's perfect. Then Friday hits. A line wraps around the block. The cook can't keep up. Customers wait 45 minutes, get angry, leave bad reviews, and never come back. Saturday is worse.
Maria has a capacity problem. She has two choices: hire more cooks and expand the kitchen permanently (expensive, and wasteful on slow Tuesdays), or find a way to bring in extra cooks only when the line gets long and send them home when it's quiet.
That second option? That's cloud architecture in a nutshell.
Designing systems in the cloud is exactly like designing a restaurant chain. You need to decide where to put your locations (regions), how to handle rush hour (auto-scaling), who greets customers at the door and seats them at the right table (load balancers), and what happens if the kitchen catches fire (disaster recovery). Every decision you make about your cloud infrastructure maps to a real-world problem that restaurant owners, city planners, and logistics managers have been solving for centuries.
This module gives you the vocabulary and mental models to think like a cloud architect – even if you never write a line of infrastructure code.
The Well-Architected Framework: five pillars
AWS, Azure, and Google Cloud all publish frameworks for building systems that don't fall over. AWS calls theirs the Well-Architected Framework, and it organises everything into five pillars. Think of these as the five non-negotiable qualities of any system worth building:
| Pillar | What it means | Restaurant analogy |
|---|---|---|
| Operational excellence | Can you run the system smoothly day after day? Monitoring, automation, continuous improvement | The restaurant has checklists, opening/closing procedures, and a manager who reviews what went wrong each night |
| Security | Is the system protected from threats? Access control, encryption, auditing | Only staff can enter the kitchen. The safe has a combination. Cameras record the register |
| Reliability | Does the system keep working when things break? Redundancy, failover, recovery | If the main oven breaks, there's a backup. If the power goes out, the generator kicks in |
| Performance efficiency | Are you using the right resources for the job? Right-sizing, caching, choosing the right tech | You don't hire a head chef to wash dishes, and you don't seat two people at a table for twelve |
| Cost optimisation | Are you spending wisely? Eliminating waste, using the right pricing model | You don't leave the lights on all night. You buy ingredients in bulk when they're cheaper |
Every architectural decision you make should be evaluated against these five pillars. A system that's blazing fast but costs ten times more than it should fails the cost pillar. A system that's cheap but goes down every weekend fails the reliability pillar.
High availability vs. fault tolerance vs. disaster recovery
These three terms get used interchangeably, but they mean different things. Here's how to keep them straight:
High availability (HA) means the system stays up almost all the time. It's measured in "nines" – 99.9% uptime ("three nines") means about 8.7 hours of downtime per year. 99.99% ("four nines") means about 52 minutes per year. The goal is to minimise downtime, but you accept that brief interruptions might happen.
Restaurant analogy: The restaurant is open 7 days a week, 16 hours a day. Customers can almost always walk in and get a table.
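The "nines" arithmetic is easy to check yourself. A short sketch (the helper function is our own, not a library call) – allowed downtime is just the unavailable fraction of the minutes in a year:

```python
# Quick check of the "nines" figures. downtime_per_year is our own helper.
def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime in minutes per year at a given availability."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (1 - availability_pct / 100)

print(round(downtime_per_year(99.9)))   # three nines: ~526 minutes (~8.7 hours)
print(round(downtime_per_year(99.99)))  # four nines: ~53 minutes
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why each extra nine also roughly multiplies the engineering cost.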
Fault tolerance means the system keeps working even while something is actively broken. No interruption at all – the failure is invisible to the user.
Restaurant analogy: There are two identical kitchens. If one catches fire, the other keeps producing food without a single order being missed. Customers never know anything went wrong.
Disaster recovery (DR) means you have a plan to get the system back up after a major failure. There will be downtime, but you know exactly how to recover and how long it will take.
Restaurant analogy: The restaurant burns down. But you have insurance, a backup location, recipes stored off-site, and a plan to reopen within two weeks.
| Concept | Downtime? | Cost | Use when... |
|---|---|---|---|
| High availability | Minimal (seconds to minutes) | Medium | Most production applications |
| Fault tolerance | Zero | High | Critical systems – banking, healthcare, aviation |
| Disaster recovery | Hours to days | Lower (planning + backup costs) | Every system needs a DR plan, even if it's simple |
There Are No Dumb Questions
"Do I need all three?"
In practice, yes – but at different levels. Every system should have a disaster recovery plan. Most production systems should be highly available. Only mission-critical systems (think stock exchanges, air traffic control) need full fault tolerance, because it's expensive to eliminate every possible point of failure.
"What's the difference between RTO and RPO?"
RTO (Recovery Time Objective) is how fast you need to be back online after a failure. RPO (Recovery Point Objective) is how much data you can afford to lose. If your RPO is 1 hour, you need backups at least every hour. If your RTO is 15 minutes, your recovery process must get you back online within 15 minutes. These two numbers drive every DR decision.
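The relationship can be sketched in a few lines (the function and parameter names here are illustrative, not any provider's API): worst-case data loss is bounded by the gap between backups, and worst-case outage by how long recovery takes.

```python
# Illustrative only: function and argument names are ours, not a cloud API.
def meets_objectives(backup_interval_min, recovery_duration_min,
                     rpo_min, rto_min):
    """Worst-case data loss is the gap between backups (checked against RPO);
    worst-case outage is how long recovery takes (checked against RTO)."""
    return {
        "rpo_ok": backup_interval_min <= rpo_min,
        "rto_ok": recovery_duration_min <= rto_min,
    }

# Hourly backups and a 10-minute restore, against RPO = 60 min, RTO = 15 min:
print(meets_objectives(60, 10, rpo_min=60, rto_min=15))
# -> {'rpo_ok': True, 'rto_ok': True}
```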
Regions and availability zones: cities and buildings
Cloud providers organise their infrastructure geographically. Understanding this hierarchy is fundamental:
Regions are independent geographic areas – think of them as cities. AWS has regions like us-east-1 (Northern Virginia), eu-west-1 (Ireland), and ap-southeast-1 (Singapore). Each region is completely independent. A disaster in one region doesn't affect another.
Availability Zones (AZs) are isolated data centres within a region – think of them as buildings within a city. Each AZ has its own power, cooling, and networking. If one building floods, the others keep running. A typical region has 3 AZs.
Edge locations are smaller caches spread around the world – think of them as pop-up kiosks in airports and malls. They serve content closer to users for faster delivery (this is what CDNs like CloudFront, Akamai, and Cloudflare use).
| Level | Analogy | Purpose | Example |
|---|---|---|---|
| Region | City | Geographic isolation, data residency, latency | us-east-1 (Virginia) |
| Availability Zone | Building in a city | Fault isolation within a region | us-east-1a, us-east-1b |
| Edge location | Pop-up kiosk | Cache content close to users | 400+ locations globally (AWS CloudFront) |
When you deploy an application, you choose a region based on three factors: where your users are (lower latency), where your data is legally allowed to live (compliance), and which services are available in that region (not every service is available everywhere).
For high availability, you deploy across multiple AZs within a region. For disaster recovery, you replicate to a different region entirely.
Auto-scaling: the thermostat
Your home thermostat monitors the temperature. When it gets too cold, the heater turns on. When it's warm enough, the heater turns off. You don't manually flip a switch every time the temperature changes โ the system reacts automatically.
Auto-scaling works exactly the same way. You set rules: "If CPU usage exceeds 70% for 5 minutes, add two more servers. If it drops below 30% for 10 minutes, remove one server." The cloud watches your metrics and adjusts capacity automatically.
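The rule above can be sketched as a thermostat-style decision function. The thresholds and window sizes mirror the example in the text; the function name and sample data are our own invention, not any cloud provider's API:

```python
def scaling_decision(cpu_samples, threshold_high=70, threshold_low=30,
                     high_window=5, low_window=10):
    """cpu_samples: recent per-minute CPU percentages, newest last.
    Sustained high load adds two servers; sustained low load removes one."""
    if len(cpu_samples) >= high_window and \
            all(s > threshold_high for s in cpu_samples[-high_window:]):
        return +2   # scale out: CPU above 70% for 5 straight minutes
    if len(cpu_samples) >= low_window and \
            all(s < threshold_low for s in cpu_samples[-low_window:]):
        return -1   # scale in: CPU below 30% for 10 straight minutes
    return 0        # hold steady

print(scaling_decision([65, 72, 80, 85, 90, 88]))  # -> 2
```

Real auto-scalers add cooldown periods and minimum/maximum fleet sizes on top of this basic loop, so a traffic spike doesn't cause the fleet to oscillate.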
There are two types:
Vertical scaling (scale up/down) – make a single server bigger or smaller. Give it more CPU, more RAM, more storage. This is like replacing your restaurant's small oven with a bigger one. It has limits – you can only make one machine so large.
Horizontal scaling (scale out/in) – add or remove servers. Instead of one big oven, you install five normal ovens. This is how most cloud systems scale, because there's no upper limit – you can always add more servers.
| Type | How it works | Pros | Cons |
|---|---|---|---|
| Vertical | Bigger machine | Simple, no code changes | Has a ceiling, requires downtime to resize |
| Horizontal | More machines | No ceiling, no downtime | App must be designed for it (stateless) |
Auto-scaling is one of the cloud's superpowers. In the on-premises world, you had to buy enough servers for your peak traffic and let them sit idle the rest of the time. With auto-scaling, you pay for peak capacity only when you actually hit peak traffic.
Load balancing: the restaurant host
When you walk into a busy restaurant, the host doesn't send every customer to the same table. They look at which tables are available, which sections are busy, and seat you at the best option. If one section of the restaurant is closed for cleaning, the host routes everyone to the open sections.
A load balancer does exactly this for your servers. It sits in front of your application and distributes incoming requests across multiple servers. If one server goes down, the load balancer stops sending traffic to it. If a new server spins up (thanks to auto-scaling), the load balancer starts including it.
Common load balancing strategies:
- Round robin – requests go to each server in turn: Server 1, Server 2, Server 3, Server 1, Server 2... Simple, but blind to how loaded each server actually is.
- Least connections – send the request to whichever server is handling the fewest active connections. Smarter for uneven workloads.
- IP hash – the same user always goes to the same server. Useful when you need "sticky sessions."
- Weighted – some servers get more traffic than others (e.g., a powerful server gets 60% of traffic, a smaller one gets 40%).
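Three of these strategies fit in a few lines of Python (the server names and connection counts are made up for illustration):

```python
import itertools

servers = ["server-1", "server-2", "server-3"]

# Round robin: each request goes to the next server in turn.
rr = itertools.cycle(servers)
print([next(rr) for _ in range(4)])
# -> ['server-1', 'server-2', 'server-3', 'server-1']

# Least connections: route to the server with the fewest active connections.
active = {"server-1": 12, "server-2": 3, "server-3": 7}
print(min(active, key=active.get))  # -> server-2

# IP hash: the same client always lands on the same server ("sticky sessions").
# (Python salts str hashes per process; real balancers use a stable hash.)
def pick_by_ip(ip: str) -> str:
    return servers[hash(ip) % len(servers)]
```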
There Are No Dumb Questions
"What happens if the load balancer itself goes down?"
Cloud-managed load balancers (like AWS ALB, Azure Load Balancer, or Google Cloud Load Balancing) are themselves highly available. They run across multiple AZs and are designed to never be a single point of failure. You don't manage the load balancer's infrastructure – the cloud provider does.
"Do I always need a load balancer?"
If you have more than one server, yes. Even if you have just one server, a load balancer adds health checking – it can detect when your server is unhealthy and stop routing traffic to it while you fix the problem.
Microservices vs. monoliths
When you build an application, you have two fundamental architecture choices:
Monolith – everything lives in one big codebase, deployed as one unit. The user interface, business logic, database access, payment processing, email sending – all bundled together.
Restaurant analogy: One kitchen where a single chef handles everything – appetisers, mains, desserts, drinks. If the chef gets sick, the entire restaurant shuts down.
Microservices – the application is split into small, independent services. Each service does one thing, has its own database, and communicates with others via APIs.
Restaurant analogy: Separate stations – one for appetisers, one for grills, one for desserts, one for drinks. Each station has its own chef. If the dessert station breaks down, you can still serve mains.
| Factor | Monolith | Microservices |
|---|---|---|
| Complexity | Simple to build and deploy initially | Complex – many moving parts |
| Scaling | Scale the whole thing, even if only one part needs it | Scale individual services independently |
| Failure | One bug can crash the entire app | Failures are isolated to individual services |
| Team size | Works well for small teams | Enables large teams to work independently |
| Best for | Early-stage startups, simple apps | Large-scale systems, big organisations |
Most companies start with a monolith and evolve toward microservices as they grow. Don't start with microservices unless you have a large team and a good reason – the operational overhead is significant.
Serverless: no servers to manage (there are still servers)
"Serverless" is one of the most confusingly named concepts in tech. There are definitely servers – you just don't manage, provision, or even think about them. You write a function, upload it, and the cloud runs it whenever it's triggered.
Restaurant analogy: Instead of renting a kitchen and hiring staff, you use a ghost kitchen service. You submit your recipe, and the ghost kitchen makes the dish whenever an order comes in. You pay per dish, not per month. No orders? You pay nothing.
AWS Lambda, Azure Functions, and Google Cloud Functions are the big three serverless platforms. You write a function (a small piece of code), define a trigger (an HTTP request, a file upload, a database change), and the platform handles everything else – scaling, availability, patching, monitoring.
When serverless makes sense:
- Event-driven workloads – processing an image after upload, sending an email after a purchase
- Unpredictable traffic – a function that gets called 10 times Monday and 10,000 times Friday
- Background jobs – data processing, report generation, cleanup tasks
When serverless doesn't make sense:
- Long-running processes – most serverless platforms have execution time limits (15 minutes on Lambda)
- Consistent high traffic – if your function runs 24/7 at full capacity, a traditional server may be cheaper
- Complex stateful applications – serverless functions are stateless by design
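To make the shape concrete, here's a minimal handler sketch in the AWS Lambda Python style. `handler(event, context)` is Lambda's convention, but the event fields below are invented for illustration – this is not a real S3 or API Gateway payload:

```python
# Minimal serverless-style handler sketch. The event fields are invented
# for this example, not a real cloud event payload.
def handler(event, context=None):
    """Runs once per trigger. Stateless by design: everything the function
    needs arrives in the event, and nothing persists between invocations."""
    filename = event["filename"]
    sizes = event.get("thumbnail_sizes", [128, 256])
    # In a real function, the resize work would happen here.
    return {
        "source": filename,
        "thumbnails": [f"{filename}.thumb{s}.jpg" for s in sizes],
    }

print(handler({"filename": "taco.jpg"}))
```

The platform invokes this once per event, runs as many copies in parallel as the traffic demands, and bills only for the milliseconds actually used.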
Containers: packing your app in a box
A container packages your application along with everything it needs to run – code, libraries, system tools, settings. It's like packing a lunchbox: everything needed for the meal is inside, and it works no matter what table you eat at.
Before containers, the most dreaded phrase in software was "it works on my machine." An app would run perfectly on a developer's laptop but crash on the production server because of different software versions, missing libraries, or conflicting configurations. Containers solve this by guaranteeing that the app runs the same way everywhere.
Docker is the most popular container technology. You write a Dockerfile that describes what goes in the container, build it into an image, and run that image as a container on any machine with Docker installed.
But running one container is easy. Running hundreds or thousands of containers – starting them, stopping them, restarting crashed ones, distributing them across servers – requires an orchestrator:
| Orchestrator | Who runs it | Key feature |
|---|---|---|
| Kubernetes (K8s) | Open source (Google origin) | The industry standard. Runs anywhere – AWS, Azure, GCP, or your own servers |
| Amazon ECS | AWS | Tighter AWS integration, simpler than Kubernetes |
| Azure Kubernetes Service (AKS) | Azure | Managed Kubernetes on Azure |
| Google Kubernetes Engine (GKE) | Google Cloud | Managed Kubernetes on GCP (built by the team that created K8s) |
Restaurant analogy: Docker is the standardised lunchbox. Kubernetes is the catering company that manages hundreds of lunchboxes – making sure every table gets the right meal, replacing any lunchbox that got dropped, and adding more lunchboxes when a big event is coming.
Infrastructure as code: blueprints, not handwork
In the early days, setting up a server meant clicking through a web console – manually creating a database here, a network there, a firewall rule somewhere else. This worked until it didn't. When you needed to recreate the same setup in another region, or figure out what changed when something broke, you were hunting through console logs and hoping someone wrote it down.
Infrastructure as Code (IaC) means defining your entire infrastructure in code: plain text files that describe every server, database, network, and permission. You check these files into version control (like Git), review them in pull requests, and apply them with a single command.
Restaurant analogy: Instead of telling a contractor "build me a kitchen" and hoping they remember what you said, you give them architectural blueprints. Every detail is documented. If you want to build an identical kitchen in another city, you hand over the same blueprints.
The two dominant IaC tools:
| Tool | Created by | Works with | Language |
|---|---|---|---|
| Terraform | HashiCorp | Any cloud provider (AWS, Azure, GCP, Cloudflare, and hundreds more) | HCL (HashiCorp Configuration Language) |
| CloudFormation | AWS | AWS only | JSON or YAML |
Other notable tools include Pulumi (IaC using real programming languages like Python and TypeScript), AWS CDK (write CloudFormation in TypeScript/Python), and Bicep (Azure-specific, simpler than ARM templates).
Why IaC matters:
- Reproducibility – spin up identical environments for development, staging, and production
- Version control – track every change to your infrastructure the same way you track code changes
- Automation – deploy infrastructure changes through CI/CD pipelines, not manual clicks
- Documentation – the code is the documentation. You can always see exactly what's deployed
- Disaster recovery – if a region goes down, redeploy the entire infrastructure from code in a new region
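The core idea behind tools like Terraform is declarative: you state the desired end state, and the tool diffs it against what actually exists and computes the changes. This toy sketch illustrates the idea only – it is nothing like a real tool's implementation, and the resource names are made up:

```python
# Toy illustration of a declarative diff, loosely in the spirit of
# `terraform plan`. Resource names and configs are invented.
def plan(current: dict, desired: dict) -> dict:
    """Diff two {resource_name: config} maps into create/destroy/update sets."""
    return {
        "create": sorted(desired.keys() - current.keys()),
        "destroy": sorted(current.keys() - desired.keys()),
        "update": sorted(k for k in current.keys() & desired.keys()
                         if current[k] != desired[k]),
    }

current = {"web-server": {"size": "small"}, "old-queue": {}}
desired = {"web-server": {"size": "large"}, "database": {"engine": "postgres"}}
print(plan(current, desired))
# -> {'create': ['database'], 'destroy': ['old-queue'], 'update': ['web-server']}
```

Because the desired state lives in a text file, the same "plan" can be reviewed in a pull request before anything is touched – that review step is what ClickOps can never give you.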
There Are No Dumb Questions
"Do I need to learn Terraform?"
If you're in a technical role, yes – Terraform is the most widely adopted IaC tool and a near-universal requirement for cloud engineering roles. If you're in a non-technical role, you don't need to write Terraform, but understanding what IaC is and why your engineering team uses it will help you communicate about timelines, infrastructure costs, and change management.
"Can't I just click around in the AWS console?"
You can โ and many people do for quick experiments. This is called "ClickOps." But for anything production-grade, ClickOps is risky: there's no audit trail, no easy way to replicate the setup, and one wrong click can take down your system. IaC eliminates these risks.
Putting it all together: the restaurant chain
Let's design a cloud system the way you'd design a restaurant chain.
You want to open a taco chain that serves customers across the US, Europe, and Asia.
- Regions – You open locations in three cities: New York, London, and Tokyo. Each location operates independently. If the London location burns down, New York and Tokyo keep serving tacos.
- Availability zones – Within each city, you have 3 buildings: a main restaurant, a backup kitchen, and a prep facility. If the main restaurant floods, the backup kitchen takes over.
- Auto-scaling – Each location has a core staff of 5, but during lunch rush, you bring in 10 extra workers. After the rush, they go home. You don't pay 15 people to stand around at 3 p.m.
- Load balancing – A host at the front door seats customers at the least-busy table. If one section of the restaurant is closed, the host redirects everyone to open sections.
- Microservices – The kitchen is divided into stations: grill, prep, desserts, drinks. Each station operates independently. If the dessert station breaks, mains keep flowing.
- Containers – Every recipe is standardised in a laminated card. A new cook can pick up any recipe card and produce the exact same dish, every time, at any location.
- Infrastructure as code – You have a complete operations manual. Opening a new location means following the playbook, not reinventing everything from scratch.
- Serverless – For catering orders (unpredictable, occasional), you use a ghost kitchen. No permanent staff, no fixed costs – you only pay when an order comes in.
This is how real cloud architectures work. Replace "restaurant" with "application," "cook" with "server," and "customers" with "requests," and you have a production-grade distributed system.
Back to Maria's taco restaurant
Maria's capacity problem – a kitchen that handled Tuesday lunch but collapsed on Saturday night – is the same problem every cloud architect solves. Auto-scaling is hiring extra cooks when the line wraps around the block and sending them home when it is quiet. Load balancers are the host seating customers at the right table. Multi-AZ deployment is opening a second kitchen across town so one fire does not shut down the whole operation. Every concept in this module maps to a real-world problem that restaurant owners have been solving for decades – the cloud just makes it possible at internet scale.
Key takeaways
- The Well-Architected Framework has five pillars: operational excellence, security, reliability, performance efficiency, and cost optimisation. Every architectural decision should be evaluated against all five.
- High availability minimises downtime. Fault tolerance eliminates it. Disaster recovery plans for getting back up after a major failure. Most systems need all three at different levels.
- Regions are isolated geographic areas (cities). Availability zones are independent data centres within a region (buildings). Deploy across multiple AZs for high availability, across regions for disaster recovery.
- Auto-scaling automatically adjusts capacity based on demand – like a thermostat for your infrastructure. Prefer horizontal scaling (more servers) over vertical scaling (bigger server).
- Load balancers distribute traffic across servers, hiding failures and enabling scaling. They're the restaurant host seating customers at the right table.
- Microservices split an application into independent services. Start with a monolith unless you have a strong reason not to.
- Serverless (Lambda, Azure Functions) lets you run code without managing servers – ideal for event-driven, unpredictable workloads.
- Containers (Docker) package apps with their dependencies. Kubernetes orchestrates containers at scale.
- Infrastructure as Code (Terraform, CloudFormation) defines infrastructure in version-controlled files – reproducible, auditable, and automatable.
Knowledge Check
1. A company needs its payment processing system to continue working with zero downtime, even if an entire server fails mid-transaction. Which concept best describes this requirement?
2. An e-commerce platform experiences 10x traffic during Black Friday compared to a normal day. Which architectural approach best handles this pattern?
3. A startup wants to process user-uploaded images (resize, compress, generate thumbnails). Uploads are unpredictable – 5 per hour on quiet days, 5,000 per hour after a marketing campaign. Which compute model is the best fit?
4. A team manages their cloud infrastructure by clicking through the AWS web console. A new engineer accidentally deletes a production database. Which practice would have most likely prevented this?