Cloud Security Essentials
Misconfigured S3 buckets have leaked billions of records. Here's how to secure cloud infrastructure — IAM policies, encryption, network security, and the shared responsibility model.
One misconfigured role. 100 million records stolen.
In March 2019, a former AWS engineer named Paige Thompson exploited a misconfigured web application firewall (WAF) at Capital One; the bank discovered and disclosed the breach that July. The WAF had been assigned an IAM role with far more permissions than it needed, including access to S3 buckets containing customer data. Thompson used a server-side request forgery (SSRF) attack to make the WAF query the EC2 metadata endpoint, grabbed the role's temporary credentials, and used those credentials to list and download data from Capital One's S3 buckets.
The result: 106 million credit card applications exposed, including names, addresses, dates of birth, credit scores, Social Security numbers, and bank account numbers. Capital One paid an $80 million regulatory fine and later agreed to a $190 million class-action settlement.
Here's what's painful about this breach: none of the underlying AWS services were "hacked." S3 worked exactly as designed. IAM worked exactly as designed. The metadata endpoint worked exactly as designed. Every single component did precisely what it was configured to do. The problem was the configuration — an overprivileged role attached to a public-facing service.
The lesson: In the cloud, security is not about whether the provider's infrastructure is safe. It's about whether you configured it correctly.
The shared responsibility model: where the cloud provider stops and you start
Every cloud provider — AWS, Azure, Google Cloud — draws a clear line. They secure some things. You secure the rest. This is called the shared responsibility model, and misunderstanding it is the root cause of most cloud breaches.
| Layer | Who secures it | What it includes |
|---|---|---|
| Physical infrastructure | Cloud provider | Data centres, physical servers, networking hardware, cooling, power |
| Hypervisor & virtualisation | Cloud provider | The software that creates and isolates virtual machines |
| Network infrastructure | Cloud provider | Global backbone, DDoS protection at the edge, physical network |
| Operating system (IaaS) | You | Patching, hardening, configuration of the OS on your VMs |
| Network configuration | You | VPCs, security groups, NACLs, firewall rules, routing |
| Identity & access | You | IAM users, roles, policies, MFA, password policies |
| Data | You | Encryption, access controls, classification, backups |
| Application | You | Your code, its dependencies, its security vulnerabilities |
Think of it like renting an apartment. The landlord (AWS) makes sure the building has a solid foundation, working elevators, and fire exits. But if you leave your front door unlocked, that's on you. The landlord is not going to come lock it for you.
There Are No Dumb Questions
"If AWS secures the physical infrastructure, why do breaches still happen?"
Because the vast majority of cloud breaches aren't physical break-ins. They're misconfigurations — a storage bucket left public, an IAM role with admin access attached to a web server, an API key committed to a GitHub repo. Gartner predicted (in 2019) that through 2025, 99% of cloud security failures would be the customer's fault — a prediction that has broadly proven accurate. The infrastructure is rock-solid. The configuration is where things break.
"Does the shared responsibility model apply to compliance too?"
Yes. If you need to be HIPAA-compliant, AWS will provide HIPAA-eligible services and sign a Business Associate Agreement. But you still have to configure those services correctly, encrypt data, manage access, and maintain audit logs. AWS gives you the tools. You have to use them.
IAM in the cloud: users, roles, policies, and least privilege
IAM (Identity and Access Management) is the security system that controls who can do what on which resources. Every cloud provider has one — AWS IAM, Azure Active Directory (now Entra ID), Google Cloud IAM. The concepts are the same across all three.
Four building blocks:
| Concept | What it is | Real-world analogy |
|---|---|---|
| User | A person or service with credentials (username + password or access keys) | An employee with a building key card |
| Group | A collection of users who share the same permissions | "Engineering team" — everyone on the team gets the same access |
| Role | A set of permissions that can be assumed temporarily by users, services, or applications | A contractor badge — you wear it for the job, then hand it back |
| Policy | A JSON document that defines what actions are allowed or denied on which resources | The rules printed on the back of the key card — "Floor 3 access only, no server room" |
Here's what an overly permissive AWS IAM policy looks like — and why it's dangerous:
```json
{
  "Effect": "Allow",
  "Action": "*",
  "Resource": "*"
}
```
This says: "Allow this identity to do anything on every resource." This is the equivalent of giving a janitor the master key to every room in the building, including the vault. The Capital One breach happened partly because the WAF's role had far broader permissions than it needed.
The fix — least privilege:
```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::my-app-assets/*"
}
```
This says: "Allow this identity to read objects from one specific bucket." Nothing else. If this role's credentials are stolen, the attacker can read assets from that bucket — not dump every database in the account.
Least privilege means giving every identity the minimum permissions needed to do its job. Not more. Not "just in case." If a Lambda function only reads from one S3 bucket, its role should allow s3:GetObject on that bucket and nothing else.
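One way to make "no wildcards" checkable is a small linter over policy documents. The sketch below is illustrative only: `find_wildcards` is a hypothetical helper, not an AWS API, and it assumes the statements above are wrapped in the standard `Version` / `Statement` envelope. It flags only the obvious cases (`*`, `service:*`, and a bare `*` resource).

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts either a single value or a list for Statement, Action and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list[str]:
    """Return findings for Allow statements that use wildcard actions or resources."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            # "*" and "s3:*" both grant more than one named action.
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource")):
            # A bucket-scoped ARN ending in "/*" is not flagged; only a bare "*" is.
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource '*'")
    return findings


overly_permissive = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
least_privilege = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-app-assets/*",
        }
    ],
}

print(find_wildcards(overly_permissive))
print(find_wildcards(least_privilege))
```

A check like this could run in CI against policy files before they are deployed. It is not a substitute for a full policy analyser.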
Encryption: at rest and in transit
Encryption converts readable data into unreadable ciphertext. Without the decryption key, the data is useless. In cloud security, you need encryption in two places:
| Type | What it protects | When it matters | How to enable it |
|---|---|---|---|
| At rest | Data stored on disk — S3 objects, database records, EBS volumes | If someone gains physical access to a disk or copies raw storage | Enable server-side encryption (SSE) on S3, encrypt EBS volumes, enable TDE on databases |
| In transit | Data moving over the network — API calls, file transfers, web traffic | If someone intercepts network traffic (man-in-the-middle attack) | Use TLS/HTTPS for all connections, enforce TLS 1.2+ |
Think of encryption at rest as a locked safe in your house. Even if a burglar gets inside, they can't read your documents without the combination. Encryption in transit is like sending that document in an armoured truck instead of an open-bed pickup — even if someone stops the truck, they can't read what's inside.
Key management matters. Encryption is only as strong as who controls the keys. Cloud providers offer key management services — AWS KMS, Azure Key Vault, Google Cloud KMS — that handle key rotation, access control, and audit logging. The worst thing you can do is hardcode encryption keys in your application source code.
There Are No Dumb Questions
"If the cloud provider encrypts data at rest by default, why should I care?"
Default encryption (like S3's SSE-S3) protects against physical theft of drives — which is the provider's risk, not yours. For your own security, you want customer-managed keys (SSE-KMS) so you control who can decrypt the data. If an attacker compromises an IAM role, default encryption won't help — they can read the data because the role has permission to. Customer-managed keys let you add another layer: even with S3 read access, you need KMS decrypt permission too.
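The "second layer" can be pictured with a toy model. This is a deliberately simplified sketch, not how AWS evaluates requests: `can_read_object` is a hypothetical function, and it ignores resource scoping, key policies, and explicit denies. It only shows that with SSE-KMS a stolen role needs `kms:Decrypt` in addition to `s3:GetObject`.

```python
def can_read_object(allowed_actions: set[str], encryption: str) -> bool:
    """Toy model of the permissions needed to read an encrypted S3 object.

    encryption is "SSE-S3" (provider-managed key) or "SSE-KMS" (customer-managed key).
    """
    if "s3:GetObject" not in allowed_actions:
        return False
    if encryption == "SSE-KMS":
        # With a customer-managed key, S3 read access alone is not enough.
        return "kms:Decrypt" in allowed_actions
    return True


stolen_role = {"s3:GetObject", "s3:ListBucket"}

print(can_read_object(stolen_role, "SSE-S3"))   # role permission is sufficient
print(can_read_object(stolen_role, "SSE-KMS"))  # blocked without kms:Decrypt
```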
"Is HTTPS enough for encryption in transit?"
For web traffic, yes — HTTPS (TLS) encrypts the connection between the client and your server. But inside your VPC, traffic between services might be unencrypted by default. For sensitive workloads, enable TLS between internal services too, and use VPC endpoints to keep traffic off the public internet entirely.
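For S3 specifically, one common way to enforce encryption in transit is a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document; the function name and the bucket name `my-app-assets` are placeholders, and attaching the policy to a bucket is left out.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects any request made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("my-app-assets")
print(json.dumps(policy, indent=2))
```

The wildcard here is intentional: it sits in a Deny statement, so it removes access rather than granting it.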
Network security: VPCs, security groups, and NACLs
A VPC (Virtual Private Cloud) is your own isolated network within the cloud provider's infrastructure. Think of it as your private floor in a shared office building — other tenants can't walk in, even though you're in the same building.
Inside a VPC, you control traffic with two tools:
| Tool | Level | Stateful? | What it does | Analogy |
|---|---|---|---|---|
| Security Group | Instance/resource | Yes (return traffic automatically allowed) | Acts as a firewall around individual resources — defines which ports and IP addresses can reach them | A locked door on each room — you decide who gets a key |
| NACL (Network ACL) | Subnet | No (must explicitly allow return traffic) | Acts as a firewall at the subnet boundary — broader, rule-based filtering | A security checkpoint at the floor entrance — everyone must pass through |
Best practice: default deny. Start by blocking everything, then open only the ports and IP ranges you actually need. A common mistake is leaving SSH (port 22) open to 0.0.0.0/0 — the entire internet — for "convenience." That's like leaving the front door of your office building propped open because you don't want to carry a key.
Public vs. private subnets. Resources that need to be internet-facing (load balancers, web servers) go in public subnets. Everything else — databases, internal services, caches — goes in private subnets with no direct internet access. The database should never have a public IP address. Never.
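The "no sensitive ports open to the world" rule is easy to check mechanically. The following sketch assumes security groups are represented as dictionaries in the shape EC2's DescribeSecurityGroups returns (`IpPermissions`, `FromPort`, `ToPort`, `IpRanges`, `CidrIp`). The function names, the port list, and the sample group ID are illustrative, and only IPv4 ranges are checked.

```python
import ipaddress

# Ports that should rarely, if ever, be reachable from the whole internet.
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP", 3306: "MySQL", 5432: "PostgreSQL"}


def open_to_world(rule: dict) -> bool:
    """True if any IPv4 range on the rule covers the entire address space."""
    for ip_range in rule.get("IpRanges", []):
        network = ipaddress.ip_network(ip_range["CidrIp"], strict=False)
        if network.prefixlen == 0:
            return True
    return False


def exposed_ports(rule: dict) -> list[int]:
    """Sensitive TCP ports covered by this rule's port range."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return sorted(SENSITIVE_PORTS)
    if protocol != "tcp":
        return []
    from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
    if from_port is None or to_port is None:
        return []
    return [port for port in sorted(SENSITIVE_PORTS) if from_port <= port <= to_port]


def audit_security_group(group: dict) -> list[str]:
    """List sensitive ports that the group's inbound rules expose to 0.0.0.0/0."""
    findings = []
    for rule in group.get("IpPermissions", []):
        if not open_to_world(rule):
            continue
        for port in exposed_ports(rule):
            findings.append(
                f"{group['GroupId']}: {SENSITIVE_PORTS[port]} ({port}) open to 0.0.0.0/0"
            )
    return findings


web_sg = {
    "GroupId": "sg-0123example",  # made-up ID for illustration
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

print(audit_security_group(web_sg))
```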
Logging and monitoring: your security camera system
You can't protect what you can't see. Cloud security requires continuous logging of every action — who did what, when, and from where.
| Provider | Logging service | What it captures |
|---|---|---|
| AWS | CloudTrail | Every API call made in your account — who made it, what they did, which resource, from which IP |
| Azure | Azure Monitor + Activity Log | Resource operations, sign-in events, diagnostic data |
| Google Cloud | Cloud Audit Logs | Admin activity, data access, system events |
CloudTrail is the most critical security tool in AWS. If someone creates an IAM user, deletes an S3 bucket, or launches 500 instances to mine cryptocurrency, CloudTrail records it. Without CloudTrail enabled, you're flying blind — you won't know you've been breached until the bill arrives.
What to monitor and alert on:
- Root account usage — The root account should almost never be used. Any login triggers an alert.
- IAM policy changes — Someone granting themselves admin access is a red flag.
- Security group changes — Opening port 22 to 0.0.0.0/0 at 3 a.m. is suspicious.
- S3 bucket policy changes — Making a bucket public should trigger an immediate alert.
- Unusual API calls — A role that normally reads from DynamoDB suddenly calling ec2:RunInstances 500 times.
- Failed authentication attempts — Brute-force attacks leave a trail of failed logins.
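Most of the alerts in the list above can be expressed as simple rules over CloudTrail records. The sketch below assumes each record is already parsed into a dictionary with the usual CloudTrail fields (`userIdentity`, `eventSource`, `eventName`, `responseElements`). The alert labels and the event-name sets are illustrative and far from exhaustive, and the "unusual API calls" rule is left out because it needs a per-role baseline rather than a fixed rule.

```python
IAM_POLICY_EVENTS = {
    "AttachUserPolicy", "AttachRolePolicy", "AttachGroupPolicy",
    "PutUserPolicy", "PutRolePolicy", "PutGroupPolicy", "CreatePolicyVersion",
}
SECURITY_GROUP_EVENTS = {"AuthorizeSecurityGroupIngress", "AuthorizeSecurityGroupEgress"}
BUCKET_POLICY_EVENTS = {"PutBucketPolicy", "PutBucketAcl", "DeleteBucketPolicy"}


def classify_event(event: dict) -> list[str]:
    """Return the alert labels (possibly none) that a single CloudTrail record triggers."""
    alerts = []
    identity = event.get("userIdentity") or {}
    source = event.get("eventSource", "")
    name = event.get("eventName", "")

    if identity.get("type") == "Root":
        alerts.append("root-account-usage")
    if source == "iam.amazonaws.com" and name in IAM_POLICY_EVENTS:
        alerts.append("iam-policy-change")
    if source == "ec2.amazonaws.com" and name in SECURITY_GROUP_EVENTS:
        alerts.append("security-group-change")
    if source == "s3.amazonaws.com" and name in BUCKET_POLICY_EVENTS:
        alerts.append("bucket-policy-change")

    # responseElements can be null in real records, hence the "or {}".
    response = event.get("responseElements") or {}
    if name == "ConsoleLogin" and response.get("ConsoleLogin") == "Failure":
        alerts.append("failed-console-login")
    return alerts


failed_root_login = {
    "userIdentity": {"type": "Root"},
    "eventSource": "signin.amazonaws.com",
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Failure"},
}
print(classify_event(failed_root_login))
```

In practice these rules would usually live in the provider's own alerting service. The point of the sketch is only that each bullet maps to a concrete, testable condition.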
Compliance in the cloud: SOC 2, HIPAA, and beyond
Compliance frameworks tell organisations what security controls they must have in place. Moving to the cloud doesn't remove compliance requirements — it changes how you meet them.
| Framework | What it covers | Cloud considerations |
|---|---|---|
| SOC 2 | Trust Services Criteria: security, availability, processing integrity, confidentiality, privacy | AWS, Azure, and GCP all publish SOC 2 reports covering their infrastructure (SOC 2 is an auditor's attestation rather than a certification). You must demonstrate your configuration and processes are also compliant. |
| HIPAA | Protected Health Information (PHI) in healthcare | AWS and Azure offer HIPAA-eligible services and will sign a BAA. You must encrypt PHI at rest and in transit, restrict access via IAM, enable audit logging, and configure services per HIPAA guidance. |
| PCI DSS | Credit card data | Specific requirements for network segmentation, encryption, access control. Cloud providers provide compliant infrastructure — you ensure compliant configuration. |
| GDPR | Personal data of EU residents | Data residency matters — you may need to keep data in EU regions. Configure S3/storage replication to stay within approved regions. |
The pattern is always the same: the cloud provider gives you compliant infrastructure and the tools to be compliant. Whether you actually are compliant depends on how you configure those tools.
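The data residency point in the GDPR row is one of the easier compliance controls to verify automatically. As a minimal sketch, assume you already have an inventory mapping resource names to the regions they live in. The inventory, the function name, and the approved-region list below are all made up for illustration.

```python
# Assumption: the organisation has approved these two EU regions.
APPROVED_REGIONS = frozenset({"eu-west-1", "eu-central-1"})


def out_of_region(inventory: dict[str, str],
                  approved: frozenset[str] = APPROVED_REGIONS) -> list[str]:
    """Return the names of resources located outside the approved regions."""
    return sorted(name for name, region in inventory.items() if region not in approved)


# Hypothetical inventory: resource name -> region.
inventory = {
    "customer-records": "eu-west-1",
    "customer-records-replica": "us-east-1",
    "static-assets": "eu-central-1",
}

print(out_of_region(inventory))
```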
There Are No Dumb Questions
"If AWS is SOC 2 certified, doesn't that mean my app on AWS is automatically SOC 2 compliant?"
No. AWS's SOC 2 report covers their controls — physical security, infrastructure management, availability. Your SOC 2 audit covers your controls — how you configure IAM, how you encrypt data, how you handle incidents, how you manage employee access. You inherit the physical security controls from AWS, but you still need to demonstrate everything above the hypervisor.
"What's a BAA and why does it matter for HIPAA?"
A Business Associate Agreement (BAA) is a legal contract where the cloud provider agrees to safeguard PHI according to HIPAA rules. Without a signed BAA, storing patient data on AWS is a HIPAA violation — even if you've encrypted everything perfectly. AWS will sign a BAA, but only for specific HIPAA-eligible services (S3, RDS, Lambda, etc.), not for every AWS service.
The seven deadly sins of cloud security
These are the misconfigurations that show up in breach reports over and over again:
| Sin | What happens | Real-world example |
|---|---|---|
| 1. Public S3 buckets | Data stored in S3 is accessible to anyone on the internet | Twitch (2021) — source code and creator payout data leaked; Twitch attributed it to a server configuration error, the same class of mistake as a bucket left public |
| 2. Hardcoded credentials | API keys, database passwords, or access keys committed to source code or config files | Uber (2016) — attackers found AWS keys in a GitHub repo and accessed 57 million user records |
| 3. Overprivileged IAM roles | Roles with Action: "*" or Resource: "*" give attackers broad access when compromised | Capital One (2019) — overprivileged WAF role led to 106 million records stolen |
| 4. Disabled logging | No CloudTrail or audit logs means breaches go undetected for months | Multiple cases where crypto-mining ran for weeks before the bill revealed it |
| 5. Unpatched instances | EC2 or VM instances running outdated OS or software with known vulnerabilities | Equifax (2017) — unpatched Apache Struts led to 147 million records exposed (on-prem, but same principle applies in cloud) |
| 6. Open security groups | SSH (22), RDP (3389), or database ports (3306, 5432) open to 0.0.0.0/0 | Thousands of MongoDB instances wiped by automated scanners finding open port 27017 |
| 7. No MFA on root/admin | Attacker with a stolen password gets full account access | SIM-swap attacks on admin accounts leading to complete cloud account takeover |
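Sin 2 is the easiest to catch before it ships, because AWS access key IDs follow a recognisable pattern: a four-letter prefix such as `AKIA` (long-lived keys) or `ASIA` (temporary credentials), followed by 16 uppercase alphanumeric characters. Here is a minimal sketch of a scanner built on that pattern. The function name is hypothetical, the sample key is the placeholder used in AWS documentation, and real secret scanners cover many more credential formats.

```python
import re

# Access key IDs: known prefix plus 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_id) for every access key ID found in the text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((line_number, match.group()))
    return hits


sample_config = (
    'region = "us-east-1"\n'
    'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
)

print(find_access_key_ids(sample_config))
```

Run as a pre-commit hook, a check like this stops a key before it reaches the repository, which is far cheaper than rotating it afterwards.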
Putting it all together: the cloud security checklist
Before any workload goes to production, run through these layers:
| Layer | Control | Status |
|---|---|---|
| Identity | MFA on all human accounts, especially root/admin | |
| Identity | Least-privilege IAM policies — no wildcards | |
| Identity | No long-lived access keys; use roles and temporary credentials | |
| Network | Resources in private subnets by default | |
| Network | Security groups default-deny; only required ports open | |
| Network | No SSH/RDP open to 0.0.0.0/0 | |
| Data | Encryption at rest with customer-managed keys (KMS) | |
| Data | Encryption in transit (TLS 1.2+) for all connections | |
| Data | S3 buckets private by default; Block Public Access enabled | |
| Logging | CloudTrail / audit logs enabled in all regions | |
| Logging | Alerts on root login, IAM changes, security group changes | |
| Secrets | No credentials in source code; use secrets manager | |
| Compliance | BAA signed (if HIPAA); data residency configured (if GDPR) | |
| Patching | Automated OS patching or managed services that handle it | |
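Several rows in this checklist can be turned into automated checks. Taking the "Block Public Access enabled" row as an example: S3's public access block configuration has four boolean flags, and the control is only fully on when all four are true. The sketch below assumes the configuration has already been fetched into a dictionary. The function name is hypothetical and the fetching step is omitted.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list[str]:
    """Return the Block Public Access flags that are absent or set to False."""
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]


fully_blocked = {flag: True for flag in REQUIRED_FLAGS}
partially_blocked = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
}

print(missing_public_access_blocks(fully_blocked))
print(missing_public_access_blocks(partially_blocked))
```

An empty list means the Status column for that row can be ticked. Anything else names exactly what still needs to be enabled.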
Back to Capital One. After the breach, Capital One invested heavily in cloud security — mandatory least-privilege reviews, automated scanning for overprivileged roles, blocking the metadata endpoint on public-facing instances, and real-time alerting on unusual S3 access patterns. Every one of those controls existed before the breach. They just weren't turned on.
The tools exist. The frameworks exist. The documentation is thorough. Cloud security failures are almost never about missing technology. They're about missing configuration — the gap between what the cloud provider offers and what the customer actually enables.
Key takeaways
- The shared responsibility model divides security between provider and customer. The provider secures infrastructure; you secure configuration, identity, data, and applications.
- IAM and least privilege are the foundation. Every identity should have the minimum permissions needed — no wildcards, no "just in case" admin access.
- Encrypt everything — at rest with customer-managed keys, in transit with TLS. Encryption without key management is a false sense of security.
- Network isolation through VPCs, private subnets, and security groups keeps resources unreachable by default. Only expose what must be exposed.
- Logging (CloudTrail, audit logs) is non-negotiable. If you can't see it, you can't protect it. Alert on anomalies.
- Compliance is your responsibility. The provider gives you compliant infrastructure and tools. You must configure and operate them correctly.
- Most breaches come from misconfigurations, not sophisticated attacks. Public buckets, overprivileged roles, hardcoded credentials, disabled logging — these are solved problems with known fixes.
Knowledge Check
1. In the shared responsibility model, which of the following is the CUSTOMER's responsibility when using an IaaS service like AWS EC2?
2. A Lambda function needs to read images from a single S3 bucket called 'user-avatars' and write metadata to a DynamoDB table called 'avatar-metadata'. Which IAM policy follows least privilege?
3. Your company stores patient health records on AWS and needs to be HIPAA-compliant. AWS is SOC 2 certified and offers HIPAA-eligible services. Which statement is true?
4. During a security audit, you discover that an EC2 instance's security group allows inbound SSH (port 22) from 0.0.0.0/0, CloudTrail is disabled in two out of three regions, and the application's database has a public IP address. Which remediation should you prioritise FIRST?