What Is a Re-identification Risk Assessment?

Re-identification Risk Assessment - Definition
The Role of Re-identification Risk Assessment in Image and Video Anonymization
Technologies and Re-identification Risk Assessment Procedure
Key Parameters and Metrics
Challenges and Limitations
Use Cases
Normative References and Sources

Re-identification Risk Assessment - Definition

A Re-identification Risk Assessment is a structured process for estimating the likelihood that individuals remain identifiable after anonymization techniques have been applied to images and video recordings. From a legal perspective, the key reference point is the GDPR (Regulation (EU) 2016/679). Recital 26 states that, to determine whether a person is identifiable, account should be taken of all the means reasonably likely to be used by the controller or by another person; data can be treated as anonymous only when identification by such means is not reasonably likely. From a technical standpoint, risk assessment frameworks are described, among others, in ISO/IEC 20889:2018 and ISO/IEC 27559:2022, which cover classes of de-identification techniques and risk assessment processes in the context of data, including visual data.

In the context of image and video anonymization, a Re-identification Risk Assessment involves verifying, empirically and in context, how well face blurring and license plate blurring prevent re-identification by modern face recognition methods and license plate OCR. This includes testing the deep learning models on which automated redaction systems rely (face and plate detection), as well as simulating attacks with comparable or more advanced recognition models.

The Role of Re-identification Risk Assessment in Image and Video Anonymization

A Re-identification Risk Assessment defines blurring parameters prior to deployment and then verifies the effectiveness of anonymization on sample materials. In practice, this means determining filter strength, mask margins, and frame sequence processing methods to ensure that the risk of face recognition or license plate reading remains low under realistic attack scenarios.
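
As an illustration of how filter strength and mask margins can be tied to object size, the following minimal sketch expands a detected bounding box by a relative margin and derives a Gaussian sigma from the size of the masked region. The `Box` type, the function names, and the example values (a 20% margin and a strength of 0.25) are assumptions made for this example, not recommended settings or Gallio PRO parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper type)."""
    x1: int
    y1: int
    x2: int
    y2: int


def expand_box(box: Box, margin: float, width: int, height: int) -> Box:
    """Grow the box by `margin` (a fraction of its size) on every side, clipped to the frame."""
    dx = round((box.x2 - box.x1) * margin)
    dy = round((box.y2 - box.y1) * margin)
    return Box(
        max(0, box.x1 - dx),
        max(0, box.y1 - dy),
        min(width, box.x2 + dx),
        min(height, box.y2 + dy),
    )


def blur_sigma(box: Box, strength: float) -> float:
    """Gaussian sigma in pixels, expressed as a fraction of the shorter box side."""
    return strength * min(box.x2 - box.x1, box.y2 - box.y1)


# Illustrative values only: a 100x120 px face in a 1920x1080 frame.
face = Box(100, 80, 200, 200)
masked = expand_box(face, margin=0.2, width=1920, height=1080)
sigma = blur_sigma(masked, strength=0.25)
```

Scaling both the margin and the sigma with object size keeps the blur comparably strong for near and distant objects, which a fixed pixel value would not.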

In many Western European countries, license plate blurring is required in specific use cases (e.g., Street View publications). In Poland, there is no universal obligation to blur license plates in every situation. However, guidance from data protection bodies (including the EDPB and its predecessor, the Article 29 Working Party) emphasizes the need to minimize identification risk depending on the context.

Within the Gallio PRO environment, the re-identification risk assessment process focuses on faces and license plates. Gallio PRO operates on-premise, does not perform real-time anonymization, and automates only face and license plate blurring. Other potentially identifying elements, such as logos or tattoos, can be manually masked in the built-in editor and should also be considered as part of the risk assessment.

Technologies and Re-identification Risk Assessment Procedure

A comprehensive re-identification risk assessment combines detection, anonymization, and attack tools. In practice, deep neural networks are used for detecting faces and license plates, blurring algorithms are applied for anonymization, and independent recognition systems are employed to measure residual identification risk.

Detection and masking: face detectors (e.g., convolutional neural networks such as RetinaFace) and license plate detectors, followed by Gaussian blur or pixelation with parameters adjusted to object size.
Attack model: face recognition based on embeddings (e.g., ArcFace) and OCR for license plates. These models reflect reasonably accessible means available to a potential attacker.
Procedure: first, estimate identification performance on non-anonymized material (baseline level); then repeat the tests after anonymization and measure the decrease in identification probability.
Contextual evaluation: analyze additional factors such as distinctive clothing, unique accessories, EXIF metadata, and audio. When necessary, apply manual masking beyond faces and license plates.
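
The baseline-versus-anonymized comparison described in the procedure can be sketched as follows. The example assumes that face embeddings have already been extracted by some recognition model (for instance an ArcFace-style network) and are available as plain lists of floats. The function names and the toy two-dimensional vectors are hypothetical and serve only to show the calculation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_of_true_identity(probe, true_id, gallery):
    """1-based rank of `true_id` when gallery identities are sorted by similarity to the probe."""
    ranked = sorted(gallery, key=lambda gid: cosine(probe, gallery[gid]), reverse=True)
    return ranked.index(true_id) + 1


def recall_at_k(probes, gallery, k=1):
    """Fraction of (embedding, true_id) probes whose true identity is within the top k."""
    hits = sum(rank_of_true_identity(emb, true_id, gallery) <= k for emb, true_id in probes)
    return hits / len(probes)


# Toy data: real embeddings would have hundreds of dimensions.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
baseline = [([0.9, 0.1], "alice"), ([0.2, 0.8], "bob")]    # probes from original frames
anonymized = [([0.5, 0.6], "alice"), ([0.3, 0.7], "bob")]  # probes from blurred frames

baseline_rate = recall_at_k(baseline, gallery, k=1)
residual_rate = recall_at_k(anonymized, gallery, k=1)
risk_reduction = baseline_rate - residual_rate
```

With `k=1`, the residual rate corresponds to an empirical re-identification probability for this particular attack model and gallery. A stronger attacker or a smaller gallery would generally yield a higher value.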

Key Parameters and Metrics

Metrics used in a Re-identification Risk Assessment should be measurable, reproducible, and reported with uncertainty. It is recommended to use 95% confidence intervals for binomial measures.
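
One common way to obtain a 95% confidence interval for a binomial proportion is the Wilson score interval, sketched below. The choice of the Wilson method is an assumption of this example; the text above does not prescribe a particular interval, and other methods (such as Clopper-Pearson) are equally legitimate.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Example: 3 successful re-identifications in 500 attack attempts.
low, high = wilson_interval(3, 500)
```

For 3 successes in 500 attempts, the point estimate is 0.6% and the interval spans roughly 0.2% to 1.7%. Unlike the simple normal approximation, the Wilson interval remains usable when the number of successes is zero or very small, which is the typical situation after effective anonymization.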

Metric	Definition	Measurement Notes
p_reid	Empirical probability of re-identification after anonymization = number of correct identifications / number of attempts	Report with 95% confidence interval for the binomial distribution
Recall@k	Percentage of cases where the correct identity appears within the top k search results	Test on a reference gallery; compare before and after anonymization
FNR_det	Miss rate for faces/plates = number of missed detections / number of ground truth objects	Threshold IoU, e.g., 0.5 relative to ground truth annotations
Mask coverage	Proportion of the face/license plate area covered by the mask	Mask IoU relative to ground truth; margin control required
Blur strength (s)	Gaussian sigma or pixelation block size normalized by interpupillary distance or plate height	Report as a fraction of object size
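
The detection miss rate from the table can be computed as in the sketch below, which matches each ground truth box to at most one detection using an IoU threshold of 0.5. The greedy matching strategy and the function names are assumptions of this example; evaluation toolkits may use a different matching rule.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fnr_det(ground_truth, detections, iou_threshold=0.5):
    """Missed ground truth objects divided by all ground truth objects (greedy 1:1 matching)."""
    unused = list(detections)
    missed = 0
    for gt in ground_truth:
        best = max(unused, key=lambda det: iou(gt, det), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            unused.remove(best)  # each detection may cover only one object
        else:
            missed += 1
    return missed / len(ground_truth) if ground_truth else 0.0


def normalized_blur_strength(sigma_px, reference_px):
    """Blur strength s as a fraction of object size (e.g. plate height or interpupillary distance)."""
    return sigma_px / reference_px


# Two annotated objects, one detection: the second object is missed.
annotated = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(1, 1, 10, 10)]
miss_rate = fnr_det(annotated, detected)
```

Because every missed detection leaves an object completely unblurred, FNR_det is best read together with p_reid rather than in isolation.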

In the literature (particularly in the context of health data), acceptable re-identification risk thresholds around 0.09 are sometimes referenced in expert opinions (El Emam et al., 2013). The GDPR does not define a specific numerical threshold. In the context of images and video, acceptance criteria should be defined based on testing with a realistic attack model and material representative of the intended use case.

Challenges and Limitations

Even after effective face or license plate blurring, re-identification may still be possible through contextual information. A proper Re-identification Risk Assessment must consider both technical and organizational factors.

Auxiliary information: clothing, body shape, location, time, and unique accessories. In such cases, manual masking in Gallio PRO is recommended.
Detection errors: partial occlusions, motion, and motion blur increase FNR_det. Sequence-level quality control is required.
Reconstruction-based attacks: super-resolution and deblurring may partially restore detail in weakly blurred regions; blur strength should therefore be chosen conservatively relative to object size.
Metadata: EXIF data and embedded thumbnails may disclose personal information. They should be removed during the publication process.
Legal discrepancies: the absence of a single numerical threshold across the EU requires documenting assumptions and threat models for each project.
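
Sequence-level quality control for detection errors can be approximated with a simple gap check, as in the sketch below. It assumes that detections have already been grouped into per-object tracks and that the object stays visible between its first and last detection. The function name and the `max_gap` parameter are hypothetical.

```python
def detection_gaps(frames_with_detection, max_gap=0):
    """Return (first, last) frame ranges in which a tracked object was not detected.

    Gaps of at most `max_gap` frames are tolerated and not reported.
    """
    frames = sorted(set(frames_with_detection))
    gaps = []
    for prev, cur in zip(frames, frames[1:]):
        if cur - prev - 1 > max_gap:
            gaps.append((prev + 1, cur - 1))
    return gaps


# A face track detected in frames 10-12, 15-16 and 20.
track_frames = [10, 11, 12, 15, 16, 20]
to_review = detection_gaps(track_frames)
```

Frame ranges flagged this way are candidates for manual review or for interpolating the mask from neighboring frames, since a single unblurred frame can be enough for recognition.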

Use Cases

Re-identification Risk Assessment is applied in practical scenarios involving visual data processing by controllers and processors.

Publishing training and promotional materials with prior face and license plate blurring.
Sharing CCTV footage upon request of authorized entities while minimizing the risk of disclosing identities of bystanders.
Anonymizing research materials and AI datasets, including reporting p_reid and FNR_det metrics.
Meeting legal obligations in EU Member States where license plate blurring may be required depending on context, while documenting the assessment methodology.

Normative References and Sources

The following documents and publications form the foundation for defining and conducting a Re-identification Risk Assessment for images and video:

GDPR, Regulation (EU) 2016/679, Recital 26 and Article 4. Available via EUR-Lex.
ISO/IEC 20889:2018 - Privacy enhancing data de-identification terminology and classification of techniques. ISO, 2018.
ISO/IEC 27559:2022 - Privacy enhancing data de-identification framework. ISO, 2022.
Article 29 Working Party, Opinion 05/2014 on Anonymisation Techniques, 2014.
CNIL, Guide to Anonymisation, 2019. https://www.cnil.fr
NISTIR 8053, De-Identification of Personal Information, NIST, 2015.
Deng J. et al., ArcFace: Additive Angular Margin Loss for Deep Face Recognition, CVPR 2019 (99.83% on LFW).
El Emam K., Arbuckle L., Anonymizing Health Data, Morgan Kaufmann, 2013 (discussion of approximately 0.09 risk thresholds in expert assessments).