Metadata Under the Microscope: EXIF/XMP, Fingerprinting, and Forensics. How to Prevent Re-Identification After Anonymization?

Łukasz Bonczol
8/4/2025

Today, every photo or video file carries not only visual content but also a hidden package of information that can lead to the unauthorized disclosure of personal data. EXIF/XMP metadata, fingerprinting techniques, and forensic tools can enable the identification of people and places even after seemingly thorough visual anonymization. This issue affects public institutions, companies, and individual users alike.

As a data protection expert, I observe a growing number of cases where a lack of awareness of the existence and significance of metadata in visual materials leads to serious GDPR violations. Imagine this scenario: a police unit uploads an intervention video to its YouTube channel and carefully blurs faces and license plates, but forgets to remove geolocation metadata. As a result, despite visual anonymization, re-identification of participants remains possible - a potential violation of data protection regulations with both financial and reputational consequences.

In this article, we will examine what image and video metadata are, what privacy risks they pose, and how to comprehensively anonymize visual materials in compliance with GDPR, eliminating the risk of re-identification through hidden data invisible to the human eye.

Black and white image of hands typing on a laptop keyboard, with technical diagrams visible on the screen.

What Exactly Are EXIF/XMP Metadata in Images and Recordings?

EXIF (Exchangeable Image File Format) and XMP (Extensible Metadata Platform) metadata are digital "labels" attached to image and video files, containing a broad range of technical and contextual information. EXIF is a standard mainly used in digital photography, while XMP is a newer format developed by Adobe, applied across a wider spectrum of multimedia files.

What do these metadata sets actually contain? The list is surprisingly long: geolocation data (GPS), date and time of creation, device information (camera/camcorder model, serial number), technical parameters (aperture, exposure), and even a photo thumbnail. In some cases, metadata may also contain author information, descriptions, keywords, or even biometric data if the device supports face recognition.
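
To see this for yourself, the short sketch below lists the EXIF tags embedded in an image using the Pillow library. The file name sample.jpg is illustrative, not from the original article.

```python
# Sketch: list the EXIF tags embedded in an image with Pillow.
# "sample.jpg" is an illustrative file name. Install with: pip install Pillow
from PIL import Image
from PIL.ExifTags import TAGS

with Image.open("sample.jpg") as img:
    exif = img.getexif()

for tag_id, value in exif.items():
    # Translate numeric tag IDs into readable names where known
    name = TAGS.get(tag_id, tag_id)
    print(f"{name}: {value}")
```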

From the GDPR perspective, such information often constitutes personal data or can enable indirect identification of individuals, making it a critical aspect to consider when processing visual materials.

Close-up of abstract patterns with circular shapes and textured surfaces, resembling liquid or condensation on a window. Black and white.

How Can Metadata Lead to Privacy and GDPR Violations?

Metadata in visual materials create multiple paths for potential privacy breaches, even if the images themselves have undergone visual anonymization. Geolocation data can reveal the exact coordinates of where a photo was taken, which, combined with date and time, allows deductions about who may have been present. Device information can link back to the owner, especially in the case of professional equipment with unique serial numbers.
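
To illustrate how directly geolocation metadata translates into a real-world location, the sketch below converts the GPS tags stored in EXIF (degrees, minutes, seconds) into decimal coordinates that can be pasted straight into a map. It assumes the hypothetical file sample.jpg contains GPS tags and a reasonably recent Pillow.

```python
# Sketch: extract GPS tags from EXIF and convert them to decimal degrees.
# Assumes the illustrative file "sample.jpg" carries GPS data.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def to_decimal(dms, ref):
    # EXIF stores coordinates as (degrees, minutes, seconds) rationals
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation
    return -value if ref in ("S", "W") else value

with Image.open("sample.jpg") as img:
    gps_ifd = img.getexif().get_ifd(0x8825)  # 0x8825 = GPSInfo IFD pointer

gps = {GPSTAGS.get(tag, tag): value for tag, value in gps_ifd.items()}
if gps:
    lat = to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    print(f"Photo location: {lat:.6f}, {lon:.6f}")
```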

From my consulting experience, the most common violation scenario is the publication of photos containing EXIF metadata by public institutions or companies. For example, a municipality publishing local event photos may inadvertently disclose participants’ home addresses if GPS data is embedded. Similarly, sharing materials with the media without cleaning the metadata first can unintentionally expose sensitive information.

Under Article 5(1)(f) of the GDPR, data controllers are required to process personal data in a manner that ensures appropriate security, including protection against unauthorized disclosure. Failure to remove metadata can therefore be interpreted as a violation of this obligation.

Hand holding a magnifying glass over a laptop keyboard in black and white.

Image Fingerprinting - What Is It and Why Does It Threaten Anonymization?

Image fingerprinting is a technique for creating a unique "fingerprint" of a visual file based on its intrinsic characteristics, patterns, and properties. Unlike metadata, which are added to files, fingerprinting relies on the content itself, making it far more difficult to remove.

Fingerprinting techniques analyze elements such as pixel characteristics, compression patterns, camera sensor noise (unique to each device), and color or contrast structures. These "digital fingerprints" can survive even after blurring faces or license plates, enabling the correlation of anonymized material with other images from the same source.
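
Sensor-noise fingerprinting requires specialized tooling, but a simpler relative of the idea, perceptual hashing, is easy to demonstrate. In the sketch below (assuming the open-source imagehash library and an illustrative sample.jpg), the hash computed before and after heavy blurring often remains close, which is exactly what makes correlation of "anonymized" material with its source possible.

```python
# Illustration of content-based fingerprinting: a perceptual hash
# computed before and after blurring often stays close, so the two
# files can still be correlated. Requires: pip install Pillow imagehash
from PIL import Image, ImageFilter
import imagehash

original = Image.open("sample.jpg")
blurred = original.filter(ImageFilter.GaussianBlur(radius=8))

h_original = imagehash.phash(original)
h_blurred = imagehash.phash(blurred)

# Hamming distance between the 64-bit hashes; small values mean the
# images are still recognizably "the same" to the algorithm
print(f"Hash distance: {h_original - h_blurred}")
```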

This threat is particularly significant in the context of machine learning and advanced AI algorithms, which can process massive amounts of visual data to identify patterns invisible to the human eye.

What Forensic Techniques Can Re-Identify Anonymized Materials?

Digital forensics is rapidly advancing with tools that may undermine traditional anonymization methods. Techniques such as image reconstruction, shadow and reflection analysis, or deblurring can sometimes restore information thought to be permanently removed.

Especially concerning are advances in so-called "de-anonymization algorithms," which, through contextual analysis, motion pattern comparison (in video), and correlation with publicly available data, can re-identify individuals. For example, even if a face is blurred, body posture, gait, or clothing may still serve as identifiers.

Another technique to note is "super-resolution," which uses AI to upscale images, sometimes revealing details that seemed unreadable after standard blurring.

A laptop with binary code on the screen is placed on a large padlock with a key inserted, symbolizing cybersecurity.

Comprehensive Anonymization - How to Effectively Remove Metadata from Image and Video Files?

Effective anonymization of visual materials requires a holistic approach addressing both visible content and hidden metadata. The first step is always to remove or edit EXIF/XMP metadata. This can be done using specialized software such as ExifTool, or through professional editing applications.
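
As a minimal sketch of the ExifTool route, the Python wrapper below shells out to the exiftool binary (assumed to be installed and on PATH) and uses its -all= option to clear every writable tag. The file name is illustrative.

```python
# Minimal wrapper around ExifTool to strip all writable metadata.
# Assumes the exiftool binary is installed and available on PATH.
import subprocess

def strip_metadata(path: str) -> None:
    # "-all=" clears every writable tag; by default ExifTool keeps a
    # backup copy named "<file>_original", which must itself be
    # handled securely or deleted.
    subprocess.run(["exiftool", "-all=", path], check=True)

strip_metadata("event_photo.jpg")
```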

For materials intended for external release (e.g., to media or social platforms), a good practice is to generate new files instead of editing the originals. Exporting to a fresh file often strips sensitive metadata automatically, though this should be verified rather than assumed.
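
One way to implement this "fresh file" practice in Python with Pillow is to copy only the pixel data into a new image object, so nothing from the original's metadata travels along. File names here are illustrative.

```python
# Sketch: produce a clean copy by re-exporting only the pixel data,
# leaving EXIF/XMP behind in the original file.
from PIL import Image

with Image.open("original.jpg") as img:
    # Copy raw pixels into a fresh image object with no metadata attached
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    clean.save("published_version.jpg", quality=90)
```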

For organizations handling large volumes of visual materials, the recommended solution is implementing automated on-premise anonymization tools that combine visual anonymization (e.g., blurring faces, plates) with metadata cleaning. This minimizes human error and ensures consistent anonymization standards.

Close-up of a dark, textured fabric with diagonal lines and a sheen, creating a pattern of intersecting threads and subtle highlights.

Why Blurring Faces and License Plates May Be Insufficient

Traditional anonymization methods, such as blurring or pixelating faces and license plates, may prove insufficient. First, as noted, they do not address metadata, which may still contain identifying information. Second, modern deblurring and reconstruction algorithms can sometimes partially reverse the process.

A key challenge is also "contextual anonymization" - even if a person’s face is blurred, other visual elements such as distinctive clothing, tattoos, scenery, or companions can enable identification. In the age of social media, where vast numbers of photos are voluntarily shared, correlating such details with public images creates an increasingly real risk.

Additionally, in video materials, traditional approaches often fail to account for motion continuity. Inconsistent blurring across frames or lagging behind fast-moving objects can result in moments where anonymization fails.
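
A simple mitigation is to pad each detected region and additionally mask the previous frame's regions, so a single missed detection does not leave a subject exposed for even one frame. The OpenCV sketch below illustrates the idea under those assumptions; detect_faces() is a hypothetical stand-in for whatever detector is actually used, and the file name is illustrative.

```python
# Sketch: reduce "lag" failures in video anonymization by padding each
# detected box and also blurring the previous frame's regions.
# Assumes OpenCV (pip install opencv-python); detect_faces() is a
# hypothetical detector returning (x, y, w, h) boxes.
import cv2

PAD = 20  # safety margin in pixels around each detection

def blur_regions(frame, boxes):
    for (x, y, w, h) in boxes:
        x0, y0 = max(x - PAD, 0), max(y - PAD, 0)
        x1 = min(x + w + PAD, frame.shape[1])
        y1 = min(y + h + PAD, frame.shape[0])
        frame[y0:y1, x0:x1] = cv2.GaussianBlur(frame[y0:y1, x0:x1], (51, 51), 0)
    return frame

cap = cv2.VideoCapture("intervention.mp4")
prev_boxes = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes = detect_faces(frame)
    # Blur current detections plus last frame's, so a missed detection
    # on a fast-moving subject does not expose a single frame
    frame = blur_regions(frame, boxes + prev_boxes)
    prev_boxes = boxes
cap.release()
```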

Close-up of a camera lens aperture with overlapping blades forming a circular pattern, creating a geometric design in black and white.

Best Practices for GDPR-Compliant Anonymization of Visual Materials

Based on guidance from the European Data Protection Board and GDPR audit experience, I recommend the following best practices for visual data anonymization:

  • Apply a layered approach addressing both visible elements (faces, license plates, distinctive identifiers) and metadata
  • Use advanced methods such as complete replacement of sensitive elements instead of simple blurring (see the sketch after this list)
  • Implement verification procedures to test anonymization reliability before release
  • Prefer on-premise anonymization software over cloud solutions to minimize unauthorized access risks
  • Keep formal records of anonymization operations as part of GDPR documentation
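
As a sketch of the "complete replacement" point above, the OpenCV snippet below overwrites a sensitive region with random noise rather than blurring it, so no trace of the original pixels survives to be reconstructed. The bounding-box coordinates and file names are illustrative.

```python
# Sketch: irreversible masking -- overwrite the sensitive region with
# uniform noise instead of blurring it. Unlike blur, nothing of the
# original pixels remains to be reconstructed.
import cv2
import numpy as np

img = cv2.imread("event_photo.jpg")
x, y, w, h = 120, 80, 60, 60  # hypothetical face bounding box

# Replace the region entirely with random noise (or a solid color)
img[y:y + h, x:x + w] = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
cv2.imwrite("event_photo_masked.jpg", img)
```

Blur and pixelation, by contrast, are transformations of the original pixels, which is precisely what deblurring and reconstruction research targets.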

Equally important is implementing the principle of privacy by design - considering privacy requirements at the data acquisition and processing design stage, not as an afterthought.

A gray security camera mounted on a tiled wall, facing slightly downward, with a sleek, modern design.

How to Automate Metadata Removal across Large File Sets?

For organizations processing large volumes of visual material, manual metadata removal is impractical and error-prone. Automation is essential for both efficiency and data security. There are several approaches:

The first involves scripts that leverage command-line tools such as ExifTool, which can be integrated into workflows. These scripts can batch-process folders, removing targeted metadata fields or replacing them with neutral values.
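
A minimal sketch of such a batch script, assuming ExifTool is installed and using illustrative folder names: it mirrors an input tree into a "cleaned" tree, stripping all writable metadata along the way.

```python
# Sketch of a batch workflow: walk a folder tree and strip metadata
# from every JPEG with ExifTool, writing results to a mirror folder.
# Assumes exiftool is on PATH; folder names and the extension filter
# are illustrative.
import subprocess
from pathlib import Path

SRC = Path("incoming")
DST = Path("cleaned")

for src_file in SRC.rglob("*.jpg"):
    dst_file = DST / src_file.relative_to(SRC)
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    # "-o" writes a cleaned copy, leaving the original untouched
    subprocess.run(
        ["exiftool", "-all=", "-o", str(dst_file), str(src_file)],
        check=True,
    )
```

The same effect can be achieved with ExifTool's own recursion (exiftool -all= -r <folder>); a Python loop simply makes it easier to add logging, error handling, or parallelism.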

A more advanced approach involves implementing dedicated Digital Asset Management (DAM) systems with built-in anonymization features. On-premise DAM solutions enable not only automated metadata removal but also full control over workflows and processing logs.

For organizations with the highest security needs, specialized AI-powered anonymization platforms are recommended. These can identify and anonymize both visual elements and metadata automatically. One such example is Gallio Pro, which combines advanced image recognition with metadata management.

Special Cases - Anonymization for Law Enforcement and Media

Law enforcement agencies and media organizations operate in distinct legal and operational contexts that affect anonymization. A police unit publishing on YouTube or sharing footage with the media must balance GDPR obligations against the public interest and operational needs.

For law enforcement, a critical distinction is between internal investigative materials (where anonymization may be limited) and materials intended for public release, which require full GDPR compliance. A practical approach is maintaining two versions - full for internal use, and comprehensively anonymized for publication.

Media, on the other hand, often demand rapid access, which can lead to insufficient anonymization. Therefore, institutions sharing materials with journalists should apply a standardized "pre-release" anonymization procedure covering both the visual and metadata layers. Geolocation and timestamp data are especially important to remove, as they can enable identification of individuals.

In both cases, on-premise solutions are recommended to maintain full control over anonymization without the security risks of cloud services.

Person in a hoodie working on a laptop in a dimly lit server room, surrounded by racks of equipment.

How AI and Machine Learning Affect Anonymization and Re-Identification

Artificial intelligence and machine learning represent both a challenge and an opportunity for anonymization. On one hand, advanced forensic AI tools dramatically increase re-identification capabilities, analyzing gait patterns and body shapes or reconstructing blurred images with surprising accuracy.

On the other hand, AI also provides new ways to anonymize more effectively. Modern image recognition algorithms can identify faces, plates, and identifiers more reliably than humans and can go beyond simple blurring by generating synthetic replacements (e.g., realistic but fictitious substitute faces).

Particularly promising are solutions using GANs (Generative Adversarial Networks), which generate convincing synthetic substitutes for sensitive visual elements while maintaining naturalness and visual coherence. This makes re-identification significantly harder: instead of merely masking data (which could be reversed), sensitive elements are replaced entirely with new, non-existent data.

What Does GDPR Require for Anonymization to Be Effective?

GDPR does not specify strict technical standards for anonymization, focusing instead on outcomes: data are anonymized only if identification of a person becomes impossible or disproportionately difficult. This approach grants flexibility but also creates uncertainty over the minimum acceptable technical safeguards.

The European Data Protection Board and national regulators have, however, published guidelines that can serve as de facto standards. According to these, effective anonymization requires permanent, irreversible removal of all potential identifiers, including metadata. Importantly, effectiveness must be assessed not only against current technologies but also in consideration of foreseeable advancements.

In practice, this means data controllers should take a conservative approach and assume re-identification methods will continue to evolve. Comprehensive anonymization of both visual and metadata layers is thus not only best practice but also the most reliable way to ensure GDPR compliance and minimize legal risk.

It should also be noted that in the event of a personal data breach resulting from inadequate anonymization, supervisory authorities will assess whether the controller took all reasonable precautions, considering available technology and industry best practices.

Close-up of a face with binary code projected onto it, creating a pattern of light and shadow across the skin and eye. Black and white.

How to Verify the Effectiveness of Metadata Anonymization?

Verification of anonymization should be a standard step before publishing or sharing visual materials. This process can be divided into several key stages:

  1. Technical metadata inspection - using tools such as ExifTool to thoroughly analyze remaining metadata and confirm that all potential identifiers have been removed (see the sketch after this list)
  2. Forensic resilience testing - attempting re-identification using available forensic techniques
  3. Contextual review - analyzing whether contextual elements (surroundings, clothing, distinctive objects) still allow identification despite identifier removal
  4. Process documentation - formally documenting verification steps as part of GDPR compliance evidence
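
Step 1 can be automated. The sketch below (using Pillow; the file name is illustrative) fails loudly if any EXIF tags survive in a file queued for publication. Note that Pillow's getexif() does not cover XMP, so a second pass with a tool such as ExifTool remains advisable.

```python
# Sketch of an automated post-anonymization check: raise an error if
# any EXIF tags survive in the file that is about to be published.
from PIL import Image
from PIL.ExifTags import TAGS

def assert_no_exif(path: str) -> None:
    with Image.open(path) as img:
        leftover = {TAGS.get(k, k): v for k, v in img.getexif().items()}
    if leftover:
        raise ValueError(f"{path} still contains metadata: {leftover}")

assert_no_exif("published_version.jpg")
```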

For highly sensitive or widely published materials, an independent anonymization audit by external experts is also advisable. Such an audit can both reveal vulnerabilities and provide legal protection in the case of GDPR claims.

If you want to explore advanced solutions for comprehensive anonymization, including automated metadata cleaning, check Gallio Pro - an on-premise platform designed with GDPR compliance in mind.

Hands holding a camera over a lightbox displaying film negatives, with a dark background.

FAQ - Frequently Asked Questions on Metadata and Anonymization

Is removing EXIF metadata enough to anonymize photos in line with GDPR?

No, removing EXIF metadata alone is insufficient. Comprehensive anonymization also requires removal or anonymization of visible identifying features such as faces and plates, as well as contextual identifiers within the image.

How can I check what metadata my photo or video contains?

You can use free tools such as ExifTool, metadata viewers, or functions in professional graphics software. In Windows, basic metadata can also be viewed via file properties.

Does converting file formats (e.g., JPG to PNG) remove metadata?

Not always. Some conversions preserve metadata between formats. Best practice is to purposefully remove metadata with dedicated tools before converting formats.

Does publishing photos with geolocation metadata always breach GDPR?

It depends on context and content. If the image contains identifiable individuals and the geolocation metadata contributes to identification, the risk of GDPR violation is high. In contrast, a landscape photo without people generally poses a lower risk.

What are the penalties for improper anonymization of visual materials under GDPR?

Improper anonymization leading to personal data breaches may result in fines of up to €20 million or 4% of the company’s total worldwide annual turnover, whichever is higher. Penalties depend on factors such as the scale of the breach, the categories of data involved, and the degree of negligence.

Monochrome image of embossed question marks surrounded by wavy contour lines on a smooth background.

References

  1. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (GDPR)
  2. Article 29 Working Party, “Opinion 05/2014 on Anonymisation Techniques” (WP216)
  3. European Data Protection Board, “Guidelines 3/2019 on processing of personal data through video devices”
  4. Information Commissioner’s Office (UK), “Anonymisation: managing data protection risk code of practice” (2012)
  5. Polish Data Protection Authority (UODO), “Personal data protection in video and photographic materials” (2019)
  6. ISO/IEC 19794-5:2011, Information technology - Biometric data interchange formats - Part 5: Face image data
  7. Narayanan A., Shmatikov V. (2010), “Privacy and Security: Myths and Fallacies of ‘Personally Identifiable Information’,” Communications of the ACM, Vol. 53 No. 6