Artificial Intelligence in Anonymization - Definition
Artificial Intelligence in anonymization refers to the use of AI-driven methods to detect and conceal personal identifiers in images and video recordings - most notably human faces and license plates. The objective is to minimize the risk of identifying a natural person in line with the GDPR definition of anonymization. According to Recital 26 of the GDPR: “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable” is not considered personal data (Regulation (EU) 2016/679).
In the context of images and video, AI-based anonymization typically involves a processing pipeline that includes: detection of sensitive objects, temporal tracking, quality verification, mask application (e.g., Gaussian blur, pixelation), and export of the processed material. Deep learning methods are widely used to train models capable of detecting faces and license plates under diverse real-world conditions, supporting effective visual anonymization and GDPR compliance.
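The stages of such a pipeline can be sketched as plain function composition. This is a minimal illustrative skeleton, not a real product API: the function names, the list-of-lists frame format, and the tuple detection format are all assumptions, and the masking stage simply blanks pixels as a stand-in for blur or pixelation.

```python
# Minimal sketch of the detect -> track -> mask pipeline described above.
# All names and data formats are illustrative assumptions.

def detect(frame):
    """Return hypothetical detections as (x, y, w, h, label, score) tuples."""
    return [(40, 40, 32, 32, "face", 0.93)]

def track(detections, history):
    """Stub tracker: record detections per frame to keep masks consistent."""
    history.append(detections)
    return detections

def mask(frame, detections):
    """Blank each detected region (stand-in for blur or pixelation)."""
    for x, y, w, h, _, _ in detections:
        for row in frame[y:y + h]:
            row[x:x + w] = [0] * w
    return frame

def run_pipeline(frames):
    history = []
    return [mask(f, track(detect(f), history)) for f in frames]

# Toy 100x100 grayscale "frame" as a list of lists.
frames = [[[255] * 100 for _ in range(100)]]
out = run_pipeline(frames)
print(out[0][50][50])  # inside the masked 32x32 region -> 0
```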
The Role of AI in Image and Video Anonymization
AI enables automatic, repeatable, and scalable masking of sensitive areas while preserving as much background detail as possible. This is particularly important for long-duration recordings, where manual redaction would be disproportionately expensive, time-consuming, and error-prone.
- Detection: The model classifies and localizes faces and license plates in individual frames.
- Tracking: Multi-object tracking (MOT) algorithms maintain object consistency across frames, stabilizing masks and reducing flicker.
- Masking: Operators are applied to prevent content reconstruction in typical use cases (e.g., Gaussian blur with sufficiently high sigma or pixelation with large block size).
- Export: Output is saved using lossless or lossy codecs while preserving the integrity of masked regions.
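The masking step above can be made concrete with pixelation, where block size is the key privacy parameter: larger blocks destroy more recoverable detail. A minimal numpy sketch, with illustrative (not normative) parameter values:

```python
import numpy as np

def pixelate(region: np.ndarray, block: int) -> np.ndarray:
    """Replace each block x block tile with its mean value."""
    h, w = region.shape[:2]
    out = region.copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = region[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = tile.mean(axis=(0, 1))
    return out

# Illustrative 64x64 grayscale region standing in for a detected face crop.
frame = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
masked = pixelate(frame, block=16)
# Every 16x16 tile is now constant, so at most 16 distinct values remain.
print(len(np.unique(masked)) <= 16)
```

A Gaussian blur operator would follow the same pattern, with sigma playing the role that block size plays here.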
Gallio PRO context: The system automatically blurs only faces and license plates. It does not anonymize full body silhouettes, does not perform real-time anonymization, and does not store detection logs containing personal or sensitive data. Other elements (e.g., logos, tattoos, name badges, documents, screens) can be masked manually in the editor.
AI Technologies Used in Anonymization
The foundational layer consists of convolutional and single-stage object detectors trained on large datasets, combined with efficient tracking algorithms. The choice of architecture depends on the trade-off between sensitivity, false positives, and processing throughput.
- Face detection: RetinaFace (Deng et al., 2020), with facial landmark regression, supports stable masking under tilt and partial occlusion. Performance is commonly evaluated on the WIDER FACE dataset (Yang et al., 2016).
- License plate detection: YOLOv5/YOLOv8 (Ultralytics, 2020-) or EfficientDet models trained on domain-specific datasets (e.g., CCPD, 2018) enable detection of small objects under varying lighting conditions.
- Object tracking: DeepSORT (Wojke et al., 2017) and ByteTrack (Zhang et al., 2022) improve mask continuity in dynamic scenes.
- Deployment: On-premise implementations using ONNX Runtime or NVIDIA TensorRT facilitate compliance with GDPR Article 5 principles of data minimization and purpose limitation through local processing.
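The sensitivity versus false-positive trade-off mentioned above is typically tuned at post-processing, via a confidence threshold and non-maximum suppression (NMS) over the detector's candidate boxes. A self-contained sketch with illustrative thresholds:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, conf_thr=0.5, iou_thr=0.5):
    """Greedy non-maximum suppression after confidence filtering.

    Lowering conf_thr raises sensitivity (recall) at the cost of more
    false positives; the defaults here are illustrative, not tuned values.
    """
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thr),
        key=lambda i: scores[i], reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # second box overlaps the first -> [0, 2]
```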
Parameter selection (e.g., Gaussian blur sigma, pixelation block size, bounding box margin) should reflect the re-identification risk in a specific use case, in line with ISO/IEC 20889:2018 guidance on de-identification technique classification.
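One such parameter, the bounding box margin, can be applied as a simple geometric expansion before masking. The 15% default below is an illustrative value chosen for the sketch; neither ISO/IEC 20889 nor the sources above prescribe a specific margin.

```python
def expand_box(x, y, w, h, frame_w, frame_h, margin=0.15):
    """Grow a detection box by a relative margin and clamp it to the frame.

    Masking the expanded box reduces re-identification risk from edge
    details left just outside a tight detection box.
    """
    dx, dy = int(w * margin), int(h * margin)
    x1, y1 = max(0, x - dx), max(0, y - dy)
    x2, y2 = min(frame_w, x + w + dx), min(frame_h, y + h + dy)
    return x1, y1, x2 - x1, y2 - y1

print(expand_box(100, 100, 40, 40, 1920, 1080))  # -> (94, 94, 52, 52)
```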
Key Parameters and Metrics in AI-Based Anonymization
Quality assessment should combine detection metrics with masking effectiveness metrics. Below are the most important measures and their operational significance in AI video anonymization systems.
| Metric | Definition / Notes | Unit |
|---|---|---|
| Precision (P) | P = TP / (TP + FP) - proportion of correct detections; limits masking of irrelevant areas | 0-1 |
| Recall (R) | R = TP / (TP + FN) - proportion of detected objects; high recall minimizes identity disclosure risk | 0-1 |
| F1 Score | F1 = 2PR / (P + R) - balance between precision and recall | 0-1 |
| IoU | IoU = \|B ∩ B̂\| / \|B ∪ B̂\| - overlap between ground truth and detection; affects mAP | 0-1 |
| mAP@[τ] | Mean Average Precision at IoU threshold τ (e.g., 0.5, 0.5:0.95) - standard in object detection | 0-1 |
| Latency | Frame processing time (including detection and masking) | ms/frame |
| Throughput | Frames per second at a given hardware configuration and resolution | fps |
| FPH / FN | False positives per hour and number of missed objects - critical for risk audits | count |
| Coverage | Percentage of face/license plate area covered by the mask after stabilization | % |
For compliance processes, high recall and an appropriate mask margin beyond object contours are essential to reduce re-identification risk based on edge details or compression artifacts.
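The confusion-count metrics from the table can be computed directly from audit tallies. A short sketch with illustrative counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from confusion counts, as defined above."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Illustrative audit counts: 90 correct detections, 5 spurious, 10 missed.
p, r, f1 = detection_metrics(tp=90, fp=5, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.947 0.9 0.923
```

For anonymization audits, recall is usually the metric to maximize first, since a missed face or plate is a direct disclosure, while a false positive merely blurs an irrelevant region.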
Challenges and Limitations
The effectiveness of Artificial Intelligence in anonymization depends on scene conditions and how closely operational data matches the training distribution. Below are key technical and regulatory risks.
- Image conditions: Strong motion, motion blur, low contrast, and occlusions reduce recall, particularly for small license plates.
- Domain variability: Unusual fonts and plate layouts, face coverings, glasses, and extreme viewing angles require domain adaptation or training on domain-specific data.
- Mask stability: Lack of tracking leads to mask flickering; MOT algorithms and trajectory smoothing mitigate this issue.
- Legal aspects: According to EDPB Guidelines 3/2019, a person’s image may constitute personal data, and a license plate number may qualify as personal data depending on context and identifiability. Before publishing or sharing material, organizations should assess risk and legal basis. In practice, faces and license plates are often blurred. In Poland, regulatory guidance and case law indicate that the classification of license plates as personal data depends on circumstances and is not always unequivocal.
The selection of techniques and operational thresholds should result from a documented risk assessment and transparent data processing policies, with reference to ISO/IEC 27001:2022 (information security management) and ISO/IEC 20889:2018 (de-identification classification).
Normative References and Sources
The following bibliography includes legal acts, standards, and technical publications supporting the definitions and practices described above.
- GDPR: Regulation (EU) 2016/679, Recital 26 and Article 4 - EUR-Lex, 2016.
- EDPB: Guidelines 3/2019 on processing of personal data through video devices, Version 2.0, 2020.
- WP29/EDPB: Opinion 05/2014 on Anonymisation Techniques (WP216), 2014.
- ISO/IEC 20889:2018 - Privacy enhancing data de-identification - Terminology and classification, ISO, 2018.
- ISO/IEC 27001:2022 - Information security, cybersecurity and privacy protection - ISMS requirements, ISO, 2022.
- ENISA: Recommendations on shaping technology according to GDPR provisions, 2019.
- RetinaFace: Jiankang Deng et al., “RetinaFace: Single-shot Multi-Level Face Localisation in the Wild,” CVPR Workshops, 2020.
- WIDER FACE: Shuo Yang et al., “WIDER FACE: A Face Detection Benchmark,” CVPR, 2016.
- YOLOv5/YOLOv8: Ultralytics Documentation and Model Cards, 2020-2023.
- CCPD: X. Xu et al., “Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline,” ECCV Workshops, 2018.
- DeepSORT: N. Wojke, A. Bewley, D. Paulus, “Simple Online and Realtime Tracking with a Deep Association Metric,” ICIP, 2017.
- ByteTrack: Y. Zhang et al., “ByteTrack: Multi-Object Tracking by Associating Every Detection Box,” ECCV, 2022.
- UODO: Materials and guidance on video surveillance and image publication - uodo.gov.pl, review 2018-2023.