Protecting patient privacy in medical databases and collections of medical photos

Robert Bateman

Health data is arguably the most sensitive type of information. Failing to protect patient privacy can lead to fraud, create severe distress, and even put lives at risk.

But health data and medical images are increasingly important in treatment and research. Databases are getting bigger, and health data is becoming more diverse. The need for strong security and privacy safeguards has never been greater.

This article looks at the legal framework for protecting medical databases and collections of medical photos, the methods for keeping health data secure, and the severe consequences of failing to protect patient privacy.

Introduction to patient privacy concerns in medical databases and collections of medical photos

Healthcare puts the patient at the center of their treatment, and maintaining trust between patients, practitioners, and healthcare institutions is essential.

Since the Ancient Greeks, doctors have pledged to guard their patients’ “holy secrets” as part of the Hippocratic Oath. But keeping medical information confidential is much harder now than two millennia ago.

Case study: Cancer patient’s medical photos leaked online

In February 2023, staff at Lehigh Valley Health Network (LVHN) received an email from a ransomware group known as BlackCat (or ALPHV) stating:

“We have the data of your client base of patients, namely their passports, personal data, questionnaires, nude photos, and the like.”

BlackCat threatened to publish the information online unless LVHN paid a ransom. The healthcare provider refused to pay, and BlackCat started leaking photos it had obtained during the attack—including identifiable photos of patients naked from the waist up.

One of LVHN’s cancer patients heard about the attack in the media. She emailed her doctors to ask if her medical records were involved. Over a month later, LVHN called the patient and told her that her “nude photos”, showing both her face and chest, had been compromised.

LVHN offered the patient an apology and two years of credit monitoring as compensation for the incident. Understandably, the patient was not satisfied with the healthcare provider’s response.

The patient is currently pursuing a lawsuit against LVHN seeking “actual, consequential, and nominal” damages—both for herself and for “hundreds if not thousands” of other people affected by the breach.

Why are medical data breaches so expensive?

For 13 years running, IBM has found that health information breaches are the most expensive data breaches. The cost of healthcare data breaches increased by 53.3% between 2020 and 2023, averaging nearly $11 million per incident.

Why are data breaches involving health data so expensive?

●      Healthcare is a tightly regulated industry, and regulators can issue severe penalties. Even countries with generally weak privacy laws safeguard medical records strictly.

●      Health data is complex and includes many types of information. This variety and complexity can make health data breaches harder to detect, prevent, and mitigate.

●      When medical histories leak, people can experience distress, embarrassment, and reputational damage. The victims of health data breaches are therefore much likelier to succeed in court.

●      Health data breaches can lead to insurance fraud. When resources are misallocated, or people receive inappropriate treatment, fraud can cost lives as well as money.

Medical images, such as X-rays, MRIs, and pathology slides, are a core component of many medical records. As the LVHN case above shows, the damage can be particularly severe when medical photos are breached.

Laws covering patient privacy and health data

As noted, medical records and medical photos are subject to particularly strict rules. Here’s a look at some legal requirements for medical data in Europe and the US.


Europe

Personal data revealing an individual’s health is a type of “special category data” under the General Data Protection Regulation (GDPR), which applies in the UK and across the European Economic Area (EEA).

Here’s how some of the GDPR’s principles of data processing might apply to health information:

●      Data minimization: Only process health information where necessary for a specific purpose. Where possible, remove identifiers from medical records. If someone’s face appears in a medical photo or scan, blur the face to conceal the person’s identity.

●      Storage limitation: Don’t store health information “in a form which permits identification” for longer than necessary. Delete or anonymize health data as soon as you no longer need it (unless you’re legally required to retain it for a certain period).

●      Security: Protect health information via “appropriate technical or organizational measures” such as restricting access to medical information or blurring identifiers in medical photos.
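As an illustration of the storage-limitation principle above, a records system might periodically flag data for deletion or anonymization once it is no longer needed. The sketch below is a minimal, hypothetical example—the field names (`last_needed`, `legal_hold`) and the retention period are assumptions for illustration, not requirements drawn from the GDPR itself:

```python
from datetime import datetime, timedelta

# Hypothetical retention period -- real periods depend on the record type
# and any legal retention obligations that apply.
RETENTION = timedelta(days=36 * 30)

def enforce_storage_limitation(records, now=None):
    """Split records into those to keep and those whose retention period
    has expired (candidates for deletion or anonymization).

    Each record is a dict with a `last_needed` timestamp and a
    `legal_hold` flag -- both illustrative field names.
    """
    now = now or datetime.now()
    kept, expired = [], []
    for record in records:
        if record.get("legal_hold"):
            kept.append(record)       # legally required to retain
        elif now - record["last_needed"] > RETENTION:
            expired.append(record)    # delete or anonymize
        else:
            kept.append(record)
    return kept, expired
```

Note the `legal_hold` branch: the storage-limitation principle explicitly yields to legal retention requirements, so a real system needs a way to exempt records it is obliged to keep.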

Several EU countries have issued their largest GDPR fines against organizations that failed to comply with the rules and principles for processing health data. For example:

●      Croatia: €5.8 million issued to a debt collection agency that failed to demonstrate its “legal basis” for processing health data.

●      Portugal: €4.3 million against the national statistics agency for unlawfully transferring census data (including health information) to the US.

●      The Netherlands: €3.7 million against the tax authorities, partly due to processing health data without a legal basis.

These were all complex cases involving several GDPR infringements—but in each case, the penalty was more severe because the violations involved health data.

United States

In the US, healthcare providers and their “business associates” are regulated under the Health Insurance Portability and Accountability Act (HIPAA), which protects “protected health information” (PHI) and electronic PHI (ePHI).

HIPAA tightly controls how healthcare providers store and use health information. The law provides looser rules for “limited data sets” from which information such as names, biometric identifiers, and “full face photographic images” have been removed.
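Conceptually, producing a limited data set means stripping specific direct identifiers from each record. The sketch below shows the idea with a handful of illustrative field names; HIPAA’s actual list of identifiers is longer, and real implementations should follow the regulation’s full enumeration:

```python
# A few of the direct identifiers HIPAA requires removing for a
# limited data set (illustrative subset -- the real list is longer).
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email",
    "ssn", "biometric_identifiers", "full_face_photo",
}

def to_limited_data_set(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
```

The clinical content (diagnosis codes, dates of service, and so on) survives, which is what makes limited data sets useful for research while lowering re-identification risk.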

Here are some of the highest penalties under HIPAA:

●      Anthem Inc. paid a $16 million HIPAA penalty after attackers stole the medical data of nearly 80 million people. On top of the penalty, the company also settled a lawsuit for $115 million.

●      Premera Blue Cross paid a $6.85 million penalty after a hack compromised the PHI of over 10 million people. Again, the company settled a separate class action lawsuit, this time for $74 million.

●      Advocate Health Care paid $5.5 million after four computers were stolen from its offices, affecting over 4 million patients.

But HIPAA doesn’t cover every company handling health data, so individual states have passed privacy laws to plug HIPAA’s gaps.

●      In the past few years, around 16 states, including California, Texas, and Washington, have passed new privacy laws.

●      These new laws are inspired by the EU’s GDPR and affect businesses across almost every industry.

●      Almost all of these new laws categorize health-related information as a type of “sensitive data” and require businesses to implement “reasonable security measures” to keep such data safe.

Utilizing medical databases and photo collections in healthcare

There are countless uses for medical data, including:

●      Diagnosis

●      Treatment planning

●      Research

●      Education

Increasingly, health data is used for AI-driven research.

In November 2023, Imperial College London researchers showed how an AI model trained on “more than 1 million images from real-world screening programs” could help improve breast cancer detection.

But while AI-driven health research can create significant benefits, large collections of medical photos can also pose huge privacy and security risks.

Security issues in medical databases and collections of medical photos

Research into the state of medical image security has revealed some shocking findings.

●      In 2019, researchers from Greenbone Networks discovered 5 million patients’ images “sitting unprotected on the internet”.

●      A follow-up study in 2020 found a huge increase in the number of exposed images—over a billion medical images and patient records were available online.

●      A further investigation nearly four years later revealed that the problem has still not been solved.

The reasons for this poor security landscape are complex—hundreds of healthcare providers storing medical images on thousands of poorly secured servers.

But there’s another way to ensure that identifiable photos don’t fall into the wrong hands—anonymization.

Anonymizing medical data and protecting patient privacy

If medical data is truly anonymous, it cannot be used to identify a patient. Even if anonymous data is compromised, the breach will not affect the patient’s medical confidentiality.

But what makes a piece of medical data “anonymous”?

The EU’s anonymization standard is particularly strict. Data protection regulators endorse the following three-part test to help decide whether a given piece of data is anonymous.

●      Is it still possible to single out an individual?

●      Is it still possible to link records relating to an individual?

●      Can information be inferred concerning an individual?

Data generalization and randomization techniques—when properly performed—can achieve a high standard of anonymization in aggregate data sets.
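As a toy illustration of generalization, quasi-identifiers such as exact age and postal code can be coarsened so that individual records become harder to single out. The field names and bucket sizes below are arbitrary choices for the sketch, and real anonymization requires assessing the whole data set (e.g., against the three-part test above), not just transforming fields in isolation:

```python
def generalize(record: dict) -> dict:
    """Coarsen quasi-identifiers: exact age -> 10-year band,
    5-digit ZIP code -> first 3 digits."""
    out = dict(record)
    band = (record["age"] // 10) * 10
    out["age"] = f"{band}-{band + 9}"
    out["zip"] = record["zip"][:3] + "XX"
    return out
```

After generalization, many records share the same (age band, ZIP prefix) combination, which is the basic mechanism behind k-anonymity-style guarantees.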

Of course, not all medical data can or should be anonymized. In most contexts, medical records must be linked to a specific patient.

But where it is possible to anonymize parts of a medical record and still use the data for its intended purpose, the principle of data minimization requires you to do so.

Anonymizing medical photos

While medical photos might need to be stored within a larger, identifiable medical record, they rarely need to contain identifiable information themselves.

Earlier, we looked at a case study involving a breast cancer patient whose medical images were leaked online. This incident could have been much less severe if the healthcare provider had blurred the patient’s face so she was not identifiable.

Blurring faces and other identifiers can render an image anonymous. The European Data Protection Board (EDPB) says:

“[F]or instance blurring [a] picture with no retroactive ability to recover the personal data that the picture previously contained, the personal data are considered erased in accordance with GDPR.”

In other words: If you effectively blur all identifiers in a medical photo, the photo will no longer contain personal data. From a GDPR perspective, blurring identifiers in photos “erases” the personal data.
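The key phrase is “no retroactive ability to recover”: the blur must irreversibly destroy the detail in the face region, not merely obscure it (a light Gaussian blur can sometimes be partially reversed). A minimal sketch of irreversible pixelation over a known bounding box—the box coordinates would come from a face detector, which is outside the scope of this sketch:

```python
import numpy as np

def pixelate_region(image, x, y, w, h, block=16):
    """Irreversibly pixelate image[y:y+h, x:x+w] (grayscale array)
    by replacing each block-by-block tile with its mean pixel value.
    The original detail inside the region cannot be reconstructed."""
    out = image.copy().astype(float)
    for row in range(y, y + h, block):
        for col in range(x, x + w, block):
            r2 = min(row + block, y + h)
            c2 = min(col + block, x + w)
            out[row:r2, col:c2] = out[row:r2, col:c2].mean()
    return out.astype(image.dtype)
```

Because each tile is collapsed to a single mean value, the operation discards information rather than hiding it—which is what lets the result count as “erased” in the EDPB’s sense, provided no unblurred copy is retained.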

Full-face photographic images under HIPAA

In the US, HIPAA includes a similar principle. The law allows healthcare providers to render health information “not individually identifiable” by removing certain types of data from a medical record, including “full-face photographic images”.

As such, blurring faces means the photo can be used for a broader variety of purposes—even without the patient’s consent (as long as the photo is not linked to other information that could identify the patient).

Even organizations not subject to HIPAA should anonymize medical data and photos where possible.

New US state privacy laws, like the California Consumer Privacy Act (CCPA) and Colorado Privacy Act (CPA), exclude de-identified and anonymous information from their rules. This could include medical photos where people’s faces have been blurred.

Biometric data protection and security measures for patient privacy

We’ve explored how blurring faces and other identifiers turns personal data (images of identifiable individuals) into anonymous data.

Nonetheless, there are privacy considerations even during the anonymization process.

Biometric tools vs Gallio PRO

Healthcare providers and other organizations are directly responsible for any third-party software they use and must be diligent when choosing a provider.

Some anonymization software uses biometric identification to analyze a person’s face and detect whenever that individual appears in a photo. Biometric data used for identification purposes is “special category data” under the GDPR and is “sensitive data” under many other data protection and privacy laws.

Using biometrics to identify an individual is considered high-risk data processing subject to strict controls, and it carries a risk of severe penalties if done improperly.

Face anonymization for medical photos

Using Gallio PRO anonymization software is less risky than using anonymization tools that rely on biometric techniques.

●      Gallio PRO’s anonymization model uses computer vision techniques to detect patterns across vast data sets.

●      A dataset derived from real photos is used to train the model, which learns to recognize faces across a variety of lighting conditions, image resolutions, and environments.

●      This training method leads to higher accuracy rates, reducing the risk of missing people’s faces when anonymizing photos.

●      Facial images are transformed into unidentifiable data during the training process. No identifiable images of people’s faces remain in the AI system. This method helps preserve the privacy of anyone whose face appeared in the training photos.

Gallio PRO software never learns to recognize individuals—just what “a human face” looks like. Privacy-protecting training techniques mean the software can achieve high accuracy rates without any significant risk to individuals.

Protecting patient privacy in medical databases and medical photos

●      Health data and medical images are among the most sensitive types of personal data.

●      Cyberattacks against healthcare providers are extremely costly and can cause significant injury to patients.

●      Researchers have repeatedly shown that a lot of medical data is poorly secured. Many high-profile cybersecurity incidents have involved leaked medical images and health data.

●      When people’s medical photos are leaked, regulators can issue steep fines, and courts can award significant damages to affected patients.

●      Anonymizing patient photos is a vital means of protecting patient privacy and meeting legal obligations.

●      To minimize privacy and data protection risks, use anonymization software that can achieve high accuracy rates without relying on biometric identification.