What Is Re-identification (Data Re-identification)?

Re-identification, or data re-identification, is the process of linking data that was not intended to identify a person directly back to a specific individual. In practice, this means reversing the effects of pseudonymization or ineffective anonymization, or combining multiple data sets in a way that makes it possible to determine the identity of a person visible in a photo or video recording. In the context of visual materials, this risk primarily concerns faces, license plates, and indirect identifiers such as location, recording time, clothing, event context, or unique vehicle characteristics.

From a GDPR perspective, re-identification is critically important because whether material has been effectively anonymized depends on whether a person can still be identified using means reasonably likely to be used, either by the controller or by another person. This test comes from Recital 26 of the GDPR (Regulation (EU) 2016/679). If, after a face or license plate has been blurred, identity can still realistically be established from other elements in the frame or from metadata, the material should not be treated as anonymous.

Re-identification (Data Re-identification) - Definition

In operational terms, data re-identification in images and video means the ability to assign an anonymized or partially anonymized visual record to the same person or the same vehicle as a previously known reference record. In technical literature, this concept also appears as person re-identification, vehicle re-identification, or identity linkage. It does not always mean recovering a full name. It is enough to reliably determine that the person in material A is the same person as in material B, and then link that result with additional external information.

In practice, anonymized photos and videos are most often exposed to re-identification in three situations: first, when face or license plate blurring is too weak and can be bypassed; second, when other indirect identifiers remain visible; and third, when the material contains metadata or context that enables correlation with other data sources.

Element | Importance for Re-identification | Example in Video Material
--- | --- | ---
Face | Direct or biometric identifier | Incompletely blurred face in a side shot
License plate | Vehicle identifier, sometimes indirectly identifying the owner or user | Partially readable number after export compression
Clothing and body shape | Indirect identifier | The same coat, backpack, and walking route
Metadata | Source of correlation with other data sets | Date, time, GPS, device name
Scene context | Makes identification easier when only a few people are involved | Entrance to a specific company or private property

The Role of Data Re-identification in Photo and Video Anonymization

Assessing re-identification risk is one of the core tests of anonymization quality. Simply applying a blur effect, mask, or pixelation does not in itself prove that privacy protection is effective. What matters is the final result and the material’s resistance to being linked back to a person using reasonably available technical and organizational means.

For photos and video recordings, it is especially important to distinguish between anonymization and pseudonymization. If the data controller or a recipient of the material can still reconstruct identity because they hold the original file, a linkage key, other reference recordings, or detailed metadata, the material usually does not qualify as anonymized in the strict sense. This distinction matters to the Data Protection Officer when assessing the legal basis, retention, disclosure of materials, and information obligations.

  • Anonymization should reduce the possibility of identification to a level that is practically irreversible.
  • Pseudonymization reduces risk but still leaves open the possibility of linking the data back to a person.
  • Re-identification is an indicator that the protection method used was insufficient in the specific context of use.
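The difference can be illustrated with a minimal sketch of keyed pseudonymization, assuming license plates are replaced with an HMAC-SHA256 token. The key `SECRET_KEY`, the helpers `pseudonymize_plate` and `relink`, and the plate strings are hypothetical. Whoever holds the key can recompute tokens for plates they already know and link records back, which is why this is pseudonymization rather than anonymization.

```python
import hashlib
import hmac
from typing import Iterable, Optional

# Hypothetical key held by the controller. Anyone with this key can re-link tokens.
SECRET_KEY = b"example-controller-key"


def pseudonymize_plate(plate: str, key: bytes = SECRET_KEY) -> str:
    """Replace a license plate with a keyed token (pseudonymization, not anonymization)."""
    normalized = plate.replace(" ", "").upper()  # "wx 12345" and "WX12345" give the same token
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def relink(token: str, known_plates: Iterable[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """Re-identify a token by recomputing tokens for plates the key holder already knows."""
    for plate in known_plates:
        if hmac.compare_digest(pseudonymize_plate(plate, key), token):
            return plate
    return None


token = pseudonymize_plate("WX 12345")
print(relink(token, ["KR 99999", "WX 12345"]))       # prints WX 12345
print(relink(token, ["WX 12345"], key=b"other-key"))  # prints None
```

Even without the key, the token is deterministic, so the same plate in two recordings yields the same token and the recordings remain linkable to each other.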

Data Re-identification Technologies and Mechanisms

In video surveillance and visual analysis systems, re-identification may rely on manual analysis as well as machine learning models. In particular, deep learning is used to build models that recognize faces, people, or vehicles by comparing feature vectors. The same family of techniques is used on the privacy-protection side to train models that automatically detect faces and license plates so that they can be blurred. However, detecting and blurring these objects does not eliminate the risk if other features of the scene remain unchanged.

Typical re-identification mechanisms include:

  • comparing facial features if the blurring was incomplete or ineffective,
  • person re-identification based on clothing, body shape, gait, and movement trajectory,
  • vehicle re-identification based on make, model, color, damage, and surroundings,
  • correlation of EXIF metadata, timestamp, location, and sequence of events,
  • linking the material with publicly available data, such as media coverage of an event.
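The feature-vector approach behind person and vehicle re-identification can be sketched as a nearest-neighbour search: each detection becomes a vector, and a query is ranked against a gallery by cosine similarity. This is a minimal plain-Python sketch; the four-dimensional vectors and identity labels are made up, and in a real system the vectors would come from a trained re-ID model, which is assumed here rather than shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return gallery identities sorted from most to least similar to the query."""
    scores = [(identity, cosine_similarity(query, vector)) for identity, vector in gallery.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical appearance vectors (e.g. clothing colour, body shape, gait) from face-blurred clips.
gallery = {
    "person_A": [0.9, 0.1, 0.3, 0.0],
    "person_B": [0.1, 0.8, 0.1, 0.5],
}
query = [0.85, 0.15, 0.25, 0.05]
best_identity, best_score = rank_gallery(query, gallery)[0]
print(best_identity, round(best_score, 3))  # person_A ranks first
```

None of the hypothetical features in this sketch depend on the face, which is why face blurring alone does not prevent this kind of linkage.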

In practice, Gallio PRO is used for automatic face blurring and license plate blurring in photos and video processed outside real-time mode. The software does not anonymize live video streams and does not automatically detect logos, tattoos, name badges, documents, or content displayed on monitor screens. These elements may require manual editing precisely because, if left in the material, they can increase the risk of re-identification.

Key Parameters and Metrics for Data Re-identification

Re-identification risk should be assessed in measurable terms. In research environments, record-matching quality metrics are used, while in compliance environments the focus is on the probability of identification given specific adversary resources. For video and photo materials, both the quality of detecting objects to be blurred and the resistance of the final image to reconstruction or mask bypass are important.

Metric / Parameter | Meaning | Practical Notes
--- | --- | ---
Detection recall | Percentage of faces or license plates present that are detected for anonymization | Low recall leaves more identifiers unblurred
Detection precision | Percentage of detections that are actual faces or license plates | Low precision reduces operational quality, but usually affects privacy less than low recall
mAP | Mean Average Precision for object detection | A common metric for evaluating detection models
Rank-1 / Recall@k | Share of queries for which the correct identity appears in the top 1 or top k results | Used in person re-identification research
mAP for re-ID | Quality of retrieving all recordings of the same person or vehicle within a data set | When measured on anonymized material, a higher value means a greater risk of linking recordings
Masking level | Degree to which the face or license plate is unreadable after export | Should be evaluated after final compression, not only in the working preview

A simple model can be helpful when assessing risk:

Re-identification risk = probability of matching × availability of auxiliary data × impact of anonymization failure

This is not a normative formula, but a useful analytical simplification for DPIAs and internal testing.
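The multiplicative model can be turned into a small scoring helper for internal comparisons. This is a sketch under stated assumptions: each factor is scored on a hypothetical 0 to 1 scale chosen by the assessor, the function name is illustrative, and the example values are invented to compare the same blurring quality in internal training material and in an internet publication. The result is a relative ranking aid, not a probability with legal meaning.

```python
def reidentification_risk(p_match: float, aux_data: float, impact: float) -> float:
    """Multiply the three factors of the simplified model; each must be scored in [0, 1]."""
    for name, value in (("p_match", p_match), ("aux_data", aux_data), ("impact", impact)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return p_match * aux_data * impact


# Hypothetical scores: same blurring quality, different amounts of auxiliary data.
internal_training = reidentification_risk(p_match=0.3, aux_data=0.2, impact=0.5)
internet_publication = reidentification_risk(p_match=0.3, aux_data=0.9, impact=0.5)
print(f"internal: {internal_training:.3f}, public: {internet_publication:.3f}")
```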

Challenges and Limitations of Data Re-identification

The biggest issue is usually not the presence of a face alone, but the sum of the information left in the material. Even correct face blurring may not be enough if the recording shows a rare event, a precise location, and an exact time. In a small community or workplace environment, that combination may be enough to identify a person.
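The point about combinations can be checked with a k-anonymity style count: group the people visible in the material by the indirect identifiers left in it, and flag any combination shared by fewer than k people, since those people can be singled out. A minimal sketch, assuming a hypothetical per-person inventory of residual attributes compiled during review; the field names, values, and k = 2 are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def singled_out(records: Sequence[Record], quasi_identifiers: Sequence[str], k: int = 2) -> List[Record]:
    """Return records whose combination of quasi-identifiers is shared by fewer than k records."""

    def combo(record: Record) -> Tuple[str, ...]:
        return tuple(record[name] for name in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] < k]


# Hypothetical inventory of people visible in face-blurred clips.
people = [
    {"clip": "c1", "location": "Entrance B", "hour": "08", "event": "delivery"},
    {"clip": "c2", "location": "Entrance B", "hour": "08", "event": "routine entry"},
    {"clip": "c3", "location": "Entrance B", "hour": "08", "event": "routine entry"},
    {"clip": "c4", "location": "Parking", "hour": "22", "event": "routine entry"},
]
at_risk = singled_out(people, ["location", "hour", "event"], k=2)
print([r["clip"] for r in at_risk])  # prints ['c1', 'c4']
```

Dropping the `event` attribute from the check, for instance because the delivery context was cropped out, leaves only `c4` singled out.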

The main limitations and sources of error include:

  • a false sense of security after applying a simple blur,
  • leaving unblurred license plates or faces in individual frames,
  • failing to account for reflections in glass, mirrors, or screens,
  • exporting material with metadata that facilitates correlation,
  • misapplying legal exceptions for publishing someone’s image; such exceptions do not remove the need for a case-by-case risk assessment.
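Unblurred faces or plates in individual frames can be caught by comparing where each object appears with where a mask was actually applied. The sketch assumes two hypothetical inputs: a reference annotation of which tracked objects are visible in each frame (for example from a manual review) and a log of which of them were masked in the export. Neither is the export format of any particular tool, and the function name is illustrative.

```python
from typing import Dict, List, Set


def unmasked_frames(present: Dict[int, Set[str]], masked: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """For each tracked face or plate, list the frames where it is visible but was not masked."""
    gaps: Dict[str, List[int]] = {}
    for frame in sorted(present):
        missing = present[frame] - masked.get(frame, set())
        for track_id in sorted(missing):
            gaps.setdefault(track_id, []).append(frame)
    return gaps


# Hypothetical annotations for a three-frame excerpt.
present = {0: {"face_1", "plate_1"}, 1: {"face_1", "plate_1"}, 2: {"face_1"}}
masked = {0: {"face_1", "plate_1"}, 1: {"plate_1"}, 2: {"face_1"}}
print(unmasked_frames(present, masked))  # prints {'face_1': [1]}
```

A single entry in the result is enough to justify manual correction, because one clear frame can expose a face that every other frame hides.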

In Poland, the status of license plates as personal data depends on the context. In practice, data protection authorities and legal scholarship emphasize the need for caution, while court decisions sometimes support the view that a license plate alone does not always constitute personal data. From a compliance perspective, it is safer to take contextual re-identification risk into account rather than relying solely on an abstract classification of a single identifier.

Normative and Source References for Data Re-identification

The concept of re-identification should be interpreted in light of both legal and technical sources. The most important are the acts and documents that define the identifiability of a person and the criteria for assessing the means reasonably likely to be used.

  • GDPR - Regulation (EU) 2016/679, Recital 26 and Article 4(1) and 4(5) - identifiability of a person and pseudonymization.
  • Article 29 Working Party Opinion 05/2014 on Anonymisation Techniques - discussion of the risks of singling out, linkability, and inference, 2014.
  • EDPB Guidelines 4/2019 on Article 25 Data Protection by Design and by Default, version adopted on 20 October 2020.
  • ISO/IEC 20889:2018 - Privacy enhancing data de-identification terminology and classification of techniques.
  • NISTIR 8053 - De-Identification of Personal Information, National Institute of Standards and Technology, 2015.

These documents do not deal exclusively with images and video, but their criteria can be directly applied to visual materials. The concepts of linkability and singling out are particularly useful because they accurately reflect the risk of linking several recordings to the same person despite face blurring.

Examples of Use Cases and Re-identification Risk Assessment

In practice, the assessment should focus on the specific use case rather than on the technology alone. The same level of blurring may be sufficient for internal training material but insufficient for publication on the internet, where the amount of auxiliary data available is incomparably greater.

  • Parking lot recording - faces were blurred, but license plates and the time of the event were left visible. The re-identification risk is high.
  • Reception area footage - faces blurred, but an employee ID badge remains visible. The risk is still significant.
  • Publication from a public event - an exception may apply where a person’s image is only part of the overall scene, but the assessment must still consider the nature of the shot and whether a specific person can be singled out.
  • Evidence archive - even after blurring, the material may still be personal data if the controller keeps the original and can restore the link.
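The scenarios above can be encoded as a rough triage checklist for internal testing. This is a hypothetical sketch, not a legal test: the identifier groupings, the rules (a linkable identifier plus context gives "high", a linkable identifier alone gives "significant", two contextual identifiers give "elevated"), and the labels are assumptions made for illustration. The public-event exception is deliberately left out because it needs a case-by-case assessment.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical groupings; adapt them to your own DPIA methodology.
LINKABLE_IDENTIFIERS = frozenset({"face", "license_plate", "id_badge"})
CONTEXTUAL_IDENTIFIERS = frozenset({"timestamp", "location", "rare_event", "clothing"})


@dataclass(frozen=True)
class Scenario:
    name: str
    visible: FrozenSet[str]
    controller_keeps_original: bool = False


def triage(scenario: Scenario) -> str:
    """Assign a rough residual-risk label based on what remains after blurring."""
    if scenario.controller_keeps_original:
        return "still personal data: the original can restore the link"
    linkable = scenario.visible & LINKABLE_IDENTIFIERS
    contextual = scenario.visible & CONTEXTUAL_IDENTIFIERS
    if linkable and contextual:
        return "high: identifier visible together with context"
    if linkable:
        return "significant: an identifier is still visible"
    if len(contextual) >= 2:
        return "elevated: combination of contextual identifiers"
    return "lower: document the residual-risk assessment"


scenarios = [
    Scenario("Parking lot", frozenset({"license_plate", "timestamp"})),
    Scenario("Reception area", frozenset({"id_badge"})),
    Scenario("Evidence archive", frozenset(), controller_keeps_original=True),
]
for s in scenarios:
    print(f"{s.name}: {triage(s)}")
```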