HIPAA for Software Engineers
From zero to HIPAA-compliant: a builder's guide that starts ELI5 and progresses to architecture decisions. Covered entities, BAAs, ePHI, encryption, audit logs, AWS/GCP/Azure HIPAA-eligible services, breach notification, and the 2025 NPRM that turns 'addressable' into 'required.'
On February 21, 2024, attackers from the ALPHV/BlackCat ransomware group walked through an unsecured Citrix portal at Change Healthcare — a UnitedHealth Group subsidiary — that did not have multi-factor authentication enabled. The protected health information of 192.7 million Americans eventually appeared in the breach notification. UnitedHealth paid a $22 million ransom, projected total breach-related expenses of $1.6 billion, reserved $1.1 billion for settlements, and now expects the all-in cost to reach roughly $2.45 billion. Industry watchers expect HIPAA penalties alone to exceed $100 million — surpassing the previous record set by Anthem in 2018.
That breach happened because someone shipped a Citrix login page without MFA. One configuration. Two and a half billion dollars.
If you build software that touches healthcare data in the United States, this is what's at the other end of the road. Most engineers I meet — bright, careful, security-aware — have only a fuzzy idea of what HIPAA actually requires. They've heard "sign a BAA," they've heard "encrypt at rest," and the rest is vibes.
Here is the field guide that should have existed when you were starting out. We start with TL;DR. We end with implementation. Read it once and you'll know enough to design correctly, talk credibly with security and compliance, and avoid the configuration that ends up on a Federal Register breach notification.
TL;DR — HIPAA in five sentences
HIPAA applies when individually identifiable health data is held by a covered entity (a provider, health plan, or clearinghouse) or by a business associate working on its behalf, which is what your SaaS becomes the day a hospital signs up. You must sign a Business Associate Agreement with every customer whose PHI you handle and with every vendor of yours that touches it. The Security Rule requires administrative, physical, and technical safeguards: a documented risk analysis, per-user access controls, encryption, and audit logs retained for six years. If PHI is breached, affected individuals and HHS must be notified within 60 days, and civil penalties scale with culpability into the tens of millions. The 2025 NPRM turns the formerly "addressable" controls, encryption and MFA chief among them, into hard requirements, so build to that spec from day one.

That's the whole thing, compressed. Now we unpack.
When does HIPAA actually apply?
HIPAA does not apply to all health data. It does not apply because something is medical-sounding. It applies based on who is holding the data and on whose behalf. Specifically:
- The data is Protected Health Information (PHI) — health-related and individually identifying.
- It's held by a covered entity (a healthcare provider that bills electronically, a health plan, or a healthcare clearinghouse) or by a business associate serving that covered entity.
- You're operating in the United States or processing data of U.S. patients on behalf of a U.S. covered entity.
If all three are true, HIPAA applies. If even one is false, HIPAA may not — though some other regime probably does. A consumer fitness app that you log into directly, with no doctor in the loop, is generally not HIPAA-covered (though it might still face FTC Health Breach Notification rules + state privacy laws like California's CMIA). The same app, the moment it integrates with a clinic so the clinic can read your data, almost always becomes a business associate.
If you build a SaaS product and a single hospital signs up, you are most likely a business associate from that day forward. Your obligations are real. You sign a BAA with that hospital. You sign BAAs with every vendor of yours that touches the data. You comply with the Security Rule. The hospital's compliance team will ask, audit, and remember.
What counts as PHI? The 18 Safe Harbor identifiers
The Privacy Rule's Safe Harbor de-identification method (45 CFR §164.514) lists 18 categories of identifiers. If a record contains health-related information plus any one of these, treat it as PHI:
- Names
- Geographic subdivisions smaller than a state (street address, city, county, ZIP; a ZIP may keep its first 3 digits only if the area sharing those digits contains more than 20,000 people)
- All elements of dates (birth, admission, discharge, death) except year. For ages over 89, only "90 or older."
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate / license numbers
- Vehicle identifiers (incl. license plates)
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers (fingerprints, voiceprints, retinal scans)
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
That last one ("any other unique identifying code") is a deliberately broad sweep. Combined-quasi-identifiers — like a 5-digit ZIP plus DOB plus gender — have been shown to re-identify ~87% of U.S. residents in classic studies. The Safe Harbor approach is conservative for a reason.
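A scanner that flags identifier classes makes a useful guardrail in data pipelines. A minimal Python sketch; the regexes cover only three of the 18 categories and are illustrative, so treat this as a starting point, not a complete PHI detector:

```python
import re

# Illustrative patterns for three of the 18 Safe Harbor identifier classes.
# Real detectors (e.g. cloud DLP services) use far richer matching.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_identifiers(text: str) -> set:
    """Return the identifier classes detected in a free-text value."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

def looks_like_phi(record: dict) -> bool:
    """Treat a health-related record as PHI if any identifier appears."""
    return any(find_identifiers(str(v)) for v in record.values())
```

Run it against anything leaving the PHI boundary: analytics events, log lines, export jobs.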
When PHI lives in electronic systems, we call it ePHI — and that is the focus of the HIPAA Security Rule, which is where 90% of engineering work happens.
The five rules of HIPAA
HIPAA is one law with five operational rules layered on top:
- Privacy Rule (2003) — What uses and disclosures of PHI are permitted, who can see what, patients' rights to access and amend their records, the "minimum necessary" principle.
- Security Rule (2005) — How ePHI is protected. Three categories of safeguards: administrative (policies, training, risk analysis), physical (data center access, device disposal), technical (encryption, access controls, audit logs). This is the engineering rule.
- Breach Notification Rule (2009) — When PHI is breached, notify affected individuals, HHS, and the media (if > 500 individuals in a state) within 60 days.
- Enforcement Rule — Defines OCR's authority, the four-tier penalty structure, and the audit framework.
- Omnibus Rule (2013) — Closed loopholes. Made business associates directly liable. Strengthened breach notification. Updated the privacy/security rules to align with the HITECH Act of 2009.
As an engineer building software, you spend most of your time inside the Security Rule and Breach Notification Rule. The Privacy Rule mostly drives policy, contracts, and product UX (consent flows, patient access flows). The Enforcement Rule is what gets quoted to you when something breaks. The Omnibus Rule is why your cloud provider had to sign that BAA.
The Security Rule, decoded for engineers
The Security Rule (45 CFR Part 164, Subpart C) is short — about 40 pages of regulatory text — and organized into three categories of safeguards:
Administrative safeguards (§ 164.308)
Policies, processes, training. Examples: documented risk analysis (the most-cited finding in OCR enforcement), workforce security and authorization, sanction policies, incident response plans, contingency plans (DR and backup), business associate contracts, periodic technical evaluations. You write the policy. Then you actually do what it says. Then you train people on it. Then you audit yourself.
Physical safeguards (§ 164.310)
Where the servers live and who can touch them. Facility access controls, workstation security, device and media controls. If you're entirely on AWS or Google Cloud, your provider handles most of this — that's why their BAAs matter. If you ship on-prem (some hospital deployments still do), you own this fully.
Technical safeguards (§ 164.312) — the engineer's section
This is where your code actually has to do the work. Five required standards:
- Access Control — unique user identification, automatic logoff, encryption/decryption controls. Each user has their own account, no shared logins, sessions time out.
- Audit Controls — hardware, software, and procedural mechanisms that record and examine activity in systems containing ePHI. This is required, not addressable. No risk-based opt-out.
- Integrity — mechanisms to ensure ePHI is not altered or destroyed in unauthorized ways. Hashing, checksums, write-once stores, version control.
- Person or Entity Authentication — verify that someone seeking access is who they claim to be. Today: passwords plus MFA, strongly recommended. Once the 2025 NPRM is finalized: MFA required.
- Transmission Security — protect ePHI in transit. TLS 1.2+ with strong ciphers, mutual TLS for service-to-service when sensitive, no unencrypted fallback paths.
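Two of the standards above, unique user identification and automatic logoff, fit in a few lines. The timeout value and the `Session` shape here are illustrative assumptions, not anything the rule prescribes verbatim:

```python
import time
from dataclasses import dataclass, field

IDLE_TIMEOUT_SECS = 15 * 60  # illustrative clinician-facing idle limit

@dataclass
class Session:
    user_id: str  # unique user identification: one account per human
    last_activity: float = field(default_factory=time.monotonic)

    def touch(self) -> None:
        """Call on every authenticated request."""
        self.last_activity = time.monotonic()

    def expired(self, now=None) -> bool:
        """True once the idle window has elapsed."""
        now = time.monotonic() if now is None else now
        return (now - self.last_activity) > IDLE_TIMEOUT_SECS

def require_active(session: Session) -> None:
    """Gate every PHI-touching request; force re-auth after idle expiry."""
    if session.expired():
        raise PermissionError("session expired; re-authenticate")
```

The same check belongs in middleware server-side and in a screen-lock timer client-side.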
That's the entire technical safeguards section. About one page of regulation that has launched a thousand consulting practices.
Encryption: the rule that's about to harden
Today, encryption of ePHI is technically "addressable," which does not mean optional. It means you assess whether the control is reasonable and appropriate, implement it if so, and otherwise document why not and implement an equivalent alternative. In practice: just encrypt. Even today, lack of encryption is the most common contributing factor in OCR penalties. After the 2025 NPRM finalizes (expected 2026), encryption at rest and in transit becomes required, with very limited exceptions.
Concretely:
- At rest: AES-256 on every disk, every database, every object storage bucket holding ePHI. Provider-managed keys in AWS KMS, Cloud KMS, or Azure Key Vault with automatic rotation are the easy default; customer-managed keys (CMKs) sit one tier above for compliance posture.
- In transit: TLS 1.3 preferred, TLS 1.2 minimum. Disable TLS 1.0/1.1 and SSL entirely. Use HSTS. Pin certificates in mobile apps where feasible. Internal service-to-service: mTLS or service mesh-managed TLS, not plaintext on the assumption that "the VPC is private."
- Backups: backups containing ePHI must be encrypted with the same rigor as production. Snapshots in your cloud account, exports to S3 buckets, dumps you ship to a partner — all of it.
- Key management: rotate keys (KMS handles this for you on a schedule), separate the duties of who can use vs who can administer keys, log every key use to an audit trail (CloudTrail / Cloud Audit Logs).
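As a sketch of the at-rest piece, here is AES-256-GCM using the third-party `cryptography` package. In production the data key would come from your KMS (for example a GenerateDataKey call) and be stored only in encrypted form; this shows just the local encrypt/decrypt step, with the record ID bound as associated data so ciphertexts can't be swapped between records:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_record(key: bytes, plaintext: bytes, record_id: str) -> bytes:
    """AES-256-GCM with a fresh 96-bit nonce per record."""
    nonce = os.urandom(12)
    ct = AESGCM(key).encrypt(nonce, plaintext, record_id.encode())
    return nonce + ct  # the nonce is not secret; store it with the ciphertext

def decrypt_record(key: bytes, blob: bytes, record_id: str) -> bytes:
    """Raises InvalidTag if the ciphertext or its record binding was altered."""
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, record_id.encode())
```

Never reuse a nonce under the same key; generating one per call, as above, is the simplest safe pattern.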
Encryption gives you a regulatory bonus: the Breach Notification Rule's safe harbor for encrypted data. If ePHI is exposed in transit or at rest but the underlying bytes are encrypted with sufficient strength and the keys were not compromised, the disclosure is generally not a reportable breach. This single fact has saved more healthcare companies from front-page news than any other technical control.
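On the in-transit side, enforcing the TLS floor is usually one configuration line per stack. A Python stdlib sketch for outbound connections:

```python
import ssl

def strict_client_context() -> ssl.SSLContext:
    """Outbound TLS: 1.2 floor, certificate and hostname verification on."""
    ctx = ssl.create_default_context()            # CERT_REQUIRED by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuses SSL and TLS 1.0/1.1
    return ctx
```

The equivalent setting exists in every load balancer and reverse proxy; set it once at the edge and once for service-to-service clients.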
Audit logs: who, what, when, action — for six years
Audit logging under § 164.312(b) is required. Every system containing ePHI must record activity sufficient to reconstruct who did what, when, and what action they performed: view, create, modify, delete, print, export. Logs must be retained for a minimum of six years per § 164.316(b)(2).
Concretely, for a typical web/SaaS architecture:
- Application-level audit log: a separate, append-only stream that records every PHI-touching action by every user and service. User ID, action, resource ID, timestamp, IP, request ID, outcome. Don't try to derive this later from raw HTTP access logs — it's painful and incomplete.
- Database audit log: Postgres pgAudit, MySQL audit_log, MongoDB audit log. Records DB-level access for forensics when application logs are insufficient or compromised.
- Cloud audit log: CloudTrail (AWS), Cloud Audit Logs (GCP), Activity Log (Azure). Records who did what at the cloud-API level — IAM changes, KMS key uses, encryption setting modifications.
- Storage: separate from application data, write-once-read-many (S3 Object Lock, GCS retention policies), encrypted, access-restricted. Six years. Plan for ~10x cheaper cold storage tiers (S3 Glacier Deep Archive) for the long tail.
The mistake most teams make on day one: treating audit logs like operational logs (Datadog, CloudWatch Logs). Operational logs are designed to be deleted. PHI audit logs need to survive your CTO's fourth re-architecture.
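One way to make the application-level stream tamper-evident is a hash chain: each event commits to the hash of its predecessor, so any later edit or deletion breaks verification. A minimal in-memory sketch; a real implementation would append to WORM storage, not a Python list:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only, hash-chained audit trail. Each event records the hash
    of its predecessor, so edits or deletions are detectable on verify()."""

    def __init__(self) -> None:
        self.events = []
        self._head = "0" * 64  # genesis hash

    def record(self, user_id: str, action: str, resource_id: str,
               outcome: str = "ok") -> None:
        event = {
            "user_id": user_id, "action": action, "resource_id": resource_id,
            "outcome": outcome, "ts": time.time(), "prev": self._head,
        }
        self.events.append(event)
        self._head = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """Recompute the chain; False if any event was altered or removed."""
        prev = "0" * 64
        for event in self.events:
            if event["prev"] != prev:
                return False
            prev = hashlib.sha256(
                json.dumps(event, sort_keys=True).encode()).hexdigest()
        return prev == self._head
```

Pair this with S3 Object Lock (or a GCS retention policy) so even a compromised application role can't rewrite history.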
Cloud platforms and BAAs
You almost certainly run on AWS, Google Cloud, or Azure. All three sign Business Associate Agreements, but the mechanics differ:
- AWS: accept the BAA self-service in AWS Artifact, then keep ePHI confined to services on the HIPAA Eligible Services Reference.
- Google Cloud: accept the BAA through the Cloud Console; Google publishes its own list of covered products.
- Azure: the BAA is folded into the Microsoft Product Terms, so most commercial agreements include it without a separate signature.
Two universal warnings:
- A BAA is necessary, not sufficient. Signing a BAA with AWS does not make your application HIPAA-compliant. It makes AWS's services usable for ePHI. You still have to use them correctly: encryption, access controls, audit logs, configuration. There is a long history of healthcare breaches that occurred on properly BAA-covered AWS accounts because the customer left an S3 bucket public.
- Service-eligibility lists are not stable. AWS's HIPAA Eligible Services Reference is updated regularly — most recently February 10, 2026, when Amazon Bedrock and Bedrock AgentCore were added. Make a habit of checking before you adopt a new managed service. Some services have feature-level exclusions (CloudFront's Embedded POPs, Simple AD, etc) that you might miss in a quick scan.
Compliance automation vendors
Once you're past basic config, somebody has to evidence that you're doing all this — to your customers, to auditors, and eventually to OCR. The compliance-automation category has matured fast in the last three years; the names you'll hear most are Vanta, Drata, and Secureframe, each of which connects to your cloud accounts, identity provider, and code host to collect control evidence continuously.
None of these vendors make you compliant. They make you auditable — which is what your customers and auditors are actually checking. The compliance work is still your work; the platform automates the evidence collection, monitors drift, and gives you a portal to show prospective enterprise customers.
De-identification: when you can stop calling it PHI
If you remove the right identifiers, the data is no longer PHI and HIPAA stops applying to it. That's the entire point of de-identification. Two recognized methods:
Safe Harbor (§ 164.514(b)(2))
Remove all 18 identifiers we listed above. Done. Predictable, auditable, no expert needed. Conservative — you'll often lose useful signal (dates collapsed to year, ZIPs to first 3 digits).
Expert Determination (§ 164.514(b)(1))
A qualified expert (statistician, data privacy engineer with appropriate credentials) applies generally accepted statistical and scientific principles, determines that the risk of re-identification is "very small," and documents the methods and analysis. Lets you keep more useful data (full month dates, regional geography). The flexibility costs you the expert engagement, the documentation overhead, and the ongoing review when the data or context changes.
In practice: use Safe Harbor unless you have a specific reason to need more granular data and the budget for an Expert Determination engagement. If you're building analytics on top of PHI, set up your pipelines so that PHI never reaches the analytics warehouse — only Safe Harbor de-identified data does. This is one of the highest-leverage architectural decisions you can make early.
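A minimal Safe Harbor transform at that pipeline boundary might look like the sketch below. The field names are assumptions about your schema, and it skips the population check on 3-digit ZIPs, so treat it as a shape, not a compliance artifact:

```python
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "ssn", "mrn", "beneficiary_id",
    "account_number", "license_number", "vehicle_id", "device_id",
    "url", "ip", "biometric_ref", "photo_ref",
}  # illustrative field names; map these to your actual schema

def safe_harbor(record: dict) -> dict:
    """Drop direct identifiers; generalize dates, ZIP, and age."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:  # first 3 digits only (population check omitted here)
        out["zip3"] = str(out.pop("zip"))[:3]
    for key in ("birth_date", "admit_date", "discharge_date"):
        if key in out:  # collapse full dates to year
            out[key.replace("_date", "_year")] = str(out.pop(key))[:4]
    if "age" in out and out["age"] > 89:
        out["age"] = "90 or older"  # required bucket for ages over 89
    return out
```

Run it in the extract job so the warehouse side of the pipeline never holds PHI at all.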
Common engineering pitfalls
The places engineering teams routinely break HIPAA without realizing:
- Real PHI in dev/staging. Engineers copy production data to staging to debug a bug. Now staging needs the same controls as production. Solution: synthetic data generators, or robust de-identification on the way to lower environments.
- PHI in log files. A request body containing patient data ends up in your application logs, which go to Datadog, which doesn't have a BAA on the plan you're paying for. Solution: log scrubbers/redactors (Datadog has them), structured logging discipline, BAA on observability or self-host.
- PHI in error stack traces / Sentry. Same problem, different tool. Sanitize errors before they leave the app boundary.
- PHI in third-party SDKs. Google Analytics, Hotjar, FullStory, session-replay tools, Mixpanel. None of these typically sign BAAs by default. If they capture clinical workflow screens or patient names, you have a problem. Solution: privacy-conscious instrumentation (no session replay on PHI-bearing pages, masked inputs).
- PHI in URLs / query params. Patient IDs in path segments, search queries with PHI — both end up in HTTP access logs, browser history, referrer headers. Use POST + opaque IDs.
- Backups outside the BAA boundary. A nightly mysqldump shipped to a personal Dropbox is a breach waiting to happen. Backups are PHI. Treat them like production.
- Email containing PHI without a BAA. Postmark's appointment reminder with the patient's name and date. Mailchimp newsletter with member's diagnosis. Check the BAA status of your email provider before any PHI touches the body.
- No automatic logoff. A workstation left logged in at a clinic counter exposes PHI. Implement idle session expiration. Combine with screen-lock policies on managed devices.
- Forgetting subcontractors. Your transcription service uses Otter, which uses AWS, which uses... Each layer needs its BAA chain intact. Maintain a vendor inventory.
- No documented risk analysis. The single most-cited finding in OCR enforcement. The Security Rule requires it. Most teams skip it. Do the risk analysis. Write it down. Update it annually.
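For the logging pitfalls above, a redaction filter is a useful backstop (never a substitute for not logging PHI in the first place). A Python `logging` sketch covering just two identifier patterns:

```python
import logging
import re

# Illustrative patterns; a real deployment pairs this with structured
# logging discipline and the observability vendor's own scrubbers.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),
]

class PHIRedactingFilter(logging.Filter):
    """Scrub known identifier patterns before records leave the process."""

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()  # render %-args before scrubbing
        for rx, repl in REDACTIONS:
            msg = rx.sub(repl, msg)
        record.msg, record.args = msg, None
        return True
```

Attach it to the root logger with `logging.getLogger().addFilter(PHIRedactingFilter())` so nothing bypasses it.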
Breach notification
If a breach happens, the clock starts. The Breach Notification Rule (§ 164.404) requires:
- Notify affected individuals within 60 days of discovery, in writing, with specific required content.
- Notify HHS via the OCR breach reporting portal: within 60 days of discovery if 500 or more individuals are affected; for smaller breaches, annually, within 60 days of the end of the calendar year.
- Notify prominent media in the relevant state/jurisdiction if more than 500 individuals are affected in that state.
- Document everything. Even non-reportable incidents need a paper trail showing your analysis of why they didn't meet the breach threshold.
As a business associate, you typically have to notify your covered-entity client "without unreasonable delay" — most BAAs specify a number of days (often 5–30). The covered entity then runs the public notification clock. If your BAA imposes 7-day notice and you discover something on a Friday afternoon, the clock is running before Monday's standup.
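The deadlines are mechanical enough to compute up front. A simplified sketch, with the BAA notice period as a parameter since it's contract-specific (7 days here is just an example); as a business associate, the first entry is the one that bites you:

```python
from datetime import date, timedelta

def notification_deadlines(discovered: date, baa_notice_days: int = 7) -> dict:
    """Clocks that start at breach discovery. The 60-day windows come from
    the Breach Notification Rule; the BAA notice period is whatever your
    contract says. Simplified: in practice the covered entity's clocks run
    from its own discovery date."""
    return {
        "notify_covered_entity": discovered + timedelta(days=baa_notice_days),
        "notify_individuals": discovered + timedelta(days=60),
        "notify_hhs_if_500_plus": discovered + timedelta(days=60),
    }
```

Wire it into your incident-response runbook so the dates are on the ticket from minute one.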
Penalties: the math of compliance
Civil monetary penalties scale with culpability across four tiers: "did not know," "reasonable cause," "willful neglect, corrected," and "willful neglect, not corrected." Per-violation amounts and annual caps rise with each tier and are adjusted for inflation every year.
A single breach often involves many violations (per individual record, per day uncorrected, per affected control). The Anthem settlement was $16M in 2018 for the 78.8M-record breach disclosed in 2015. Premera paid $6.85M to OCR plus $10M to a multi-state coalition. The Change Healthcare-related OCR penalty has not been finalized as of mid-2026, but industry analysts expect it to exceed $100M alone — separate from the $2.45B in total breach costs UnitedHealth has projected.
Beyond OCR civil penalties, you can also face: state attorney general suits (most states have data breach laws and many can stack on HIPAA penalties), private litigation (HIPAA itself does not create a private right of action, but state laws often do), HHS criminal referrals to DOJ for willful violations, and the all-but-quantifiable cost of being on the front page of WIRED.
The economic frame is straightforward: a serious HIPAA program — appropriate cloud architecture, compliance automation, basic operational discipline, audit prep — costs a healthcare-focused startup roughly $50K–$300K/year fully loaded. A serious breach costs $1.5M–$2.5B+ depending on scale. The math does not require a strategy off-site.
What's changing in 2026: the Security Rule NPRM
On December 27, 2024, OCR published a Notice of Proposed Rulemaking — the first substantive update to the HIPAA Security Rule since 2003. The 60-day comment period closed March 7, 2025. A final rule is expected late 2025 or early-mid 2026, with a 12–24 month compliance window typical.
The headline shift: the "required" vs "addressable" distinction goes away. Controls that have been merely "strongly recommended" become mandatory, including:
- Encryption of ePHI at rest and in transit
- Multi-factor authentication
- Network segmentation
- Vulnerability scanning every six months and annual penetration testing
- A written asset inventory and network map
Translation for engineers: if you're starting a new healthcare project today, build to the NPRM-spec from day one. Encryption everywhere, MFA everywhere, vuln scans every six months, network segmentation as architectural default. The teams that designed for 2026 in 2025 will pass straight through. The teams treating encryption as "addressable" will be re-architecting in 18 months.
Adjacent compliance frameworks worth knowing
- HITECH (2009) — extended HIPAA, made business associates directly liable, beefed up breach notification, added penalty tiers.
- SOC 2 Type II — audit framework focused on security/availability/confidentiality. Not healthcare-specific, but most enterprise buyers ask for it. Vanta/Drata/Secureframe make this dramatically less painful.
- HITRUST CSF — a healthcare-specific control framework that maps to HIPAA + NIST + ISO + others. Many large healthcare buyers require HITRUST as a procurement gate. $50K–$200K to certify, 12–18 month engagements typical.
- 21 CFR Part 11 — FDA's regulation for electronic records and electronic signatures. Applies if you're building software that touches FDA-regulated processes (clinical trials, medical device software).
- GDPR — EU's privacy regime. If you have any EU patient data, GDPR + HIPAA both apply. They overlap conceptually but have different definitions, timelines, and enforcement bodies.
- State laws (CMIA, NY SHIELD, CCPA-derived health rules) — many states have additional health-data laws that stack on HIPAA. California's CMIA is the most aggressive.
- 42 CFR Part 2 — federal substance use disorder records protection, more restrictive than HIPAA in scope. OCR began enforcing under newly delegated authority on February 16, 2026.
An implementation roadmap (90 days, two engineers)
If you're starting a healthcare-adjacent product from zero today and you have two engineers, here's a realistic 90-day path to a HIPAA posture you can credibly defend in front of a hospital security review:
Days 1–14: foundation
- Pick your cloud (AWS / GCP / Azure) and accept the BAA via Artifact (AWS) or equivalent.
- Inventory every third-party service you'll use. For each one that will see PHI: confirm BAA path, sign or queue.
- Stand up a separate AWS/GCP account or project for "HIPAA workloads" — segregated from non-PHI infrastructure.
- Default to encryption everywhere: KMS-encrypted RDS/Aurora, KMS-encrypted S3, TLS 1.3 ingress.
- Set up CloudTrail / Cloud Audit Logs / Activity Logs with multi-region replication and Object Lock retention of 6 years on the storage bucket.
Days 15–45: app architecture
- Implement application-level audit logs: separate append-only stream, per-user-action event records, stored separately with 6-year retention.
- SSO + MFA for all internal users. Build an authentication layer that supports MFA for clinician-facing sign-in.
- Role-based access controls. Minimum-necessary access by default. Document the role matrix.
- Automatic session timeout (15–30 min is typical for clinician workflows).
- Set up your error tracker (Sentry / equivalent) on a BAA-eligible plan, with PHI scrubbing rules in place before any production traffic.
- Set up your observability (Datadog / equivalent) on a BAA-eligible plan, with log scrubbing rules in place.
Days 46–75: operational layer
- Adopt a compliance automation platform (Vanta / Drata / Secureframe). Connect your AWS, GitHub, Okta, etc.
- Write the policy set: information security policy, incident response plan, business continuity / DR plan, access control policy, BAA management policy. The platform will give you templates — start there, customize to fit reality.
- Conduct your first Security Rule risk analysis. Document every system that touches ePHI, every threat, every control. Update annually.
- Do a tabletop incident response exercise. Document it.
- Stand up your vulnerability scanning (e.g., AWS Inspector, Qualys, Tenable) and pentesting cadence (annually, externally).
Days 76–90: harden & evidence
- Workforce training on HIPAA + your specific policies. Documented, with attestation.
- Backup + restore testing. Run a real DR drill.
- Audit log review process — who reviews, how often, what triggers escalation.
- Vendor inventory finalized with BAA status per vendor. Annual review cadence in calendar.
- Run an external HIPAA gap assessment. The result is your road map for the next quarter and the artifact you hand to your first enterprise customer's security team.
After 90 days, you have something defensible. Not perfect, not HITRUST-certified, not auditor-blessed — but a posture that holds up to a healthcare buyer's standard security questionnaire and gives you a clear runway to deeper compliance work as you scale.
The closing frame
HIPAA is not a moat. The big cloud providers, compliance vendors, and SaaS infrastructure players have systematically lowered the cost of compliance. A two-engineer startup in 2026 can stand up a credible HIPAA posture in a quarter that would have taken a 50-person company a year in 2010. The companies that ship correctly first will win healthcare contracts not because they're more compliant, but because they're done worrying about compliance and can focus on the actual product.
HIPAA is also not a vibe. It's a specific federal law with specific requirements, specific penalties, and specific case law. The vibes-based version — "we encrypt things, we have a BAA, we're fine" — is exactly the version that produces $2.45 billion breach costs ten years later. The 2025 NPRM is OCR's signal that the era of "good enough" is ending. Encryption mandatory, MFA mandatory, segmentation mandatory, vuln scans mandatory.
If you're an engineer building healthcare-touching software, the move is the same one as for every other serious infrastructure: design for the future spec, document every decision, automate the evidence, and keep a vendor inventory that someone other than you can read. Then ship, and let your compliance posture be a quiet competitive advantage rather than the thing keeping you up at night.
The healthcare data tax is real — except the tax that compounds isn't compliance work, it's the cost of not doing it. One unsecured Citrix portal at the wrong end of a $2.45 billion bill is the lesson of our era.
Go encrypt the thing. Go sign the BAA. Go log the access. Go set the audit retention to six years. The rest is just rigor.