Definition
Artificial Neural Networks (ANN) are a family of machine learning models composed of layers of interconnected computational units that transform input data into desired outputs by learning weights. Image and video processing relies primarily on Deep Neural Networks (DNN), including Convolutional Neural Networks (CNN), trained with backpropagation, often on labeled datasets. This definition is consistent with ISO/IEC 22989:2022, which standardizes core AI and ML concepts, and ISO/IEC 23053:2022, which describes a framework for AI systems based on machine learning.
In image and video anonymization, neural networks play a critical role as detectors and/or segmenters of sensitive objects. An ANN model locates faces and license plates, after which the system applies post-processing operations such as blurring, pixelation, or masking. The quality and safety of the anonymization process depend on the model’s ability to detect all instances of sensitive objects while maintaining an acceptable false positive rate.
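The detect-then-mask flow described above can be illustrated with a minimal sketch. It assumes a grayscale frame stored as nested lists and detections given as `(x1, y1, x2, y2)` boxes; the names `pixelate_region` and `anonymize` are hypothetical and do not come from any particular product or library.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive
Image = List[List[int]]          # grayscale pixels, row-major (assumption)


def pixelate_region(image: Image, box: Box, block: int = 8) -> None:
    """Replace each block x block tile inside the box with its mean value, in place."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    # Clamp the box to the frame so detections at the border are still masked.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    for ty in range(y1, y2, block):
        for tx in range(x1, x2, block):
            ys = range(ty, min(ty + block, y2))
            xs = range(tx, min(tx + block, x2))
            tile = [image[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    image[y][x] = mean


def anonymize(image: Image, detections: List[Box], block: int = 8) -> Image:
    """Return a copy of the frame with every detected region pixelated."""
    result = [row[:] for row in image]
    for box in detections:
        pixelate_region(result, box, block)
    return result
```

A larger `block` removes more detail. In practice the block size or blur strength would be chosen relative to the size of the detected object, since a fixed value that hides a small face may leave a large one recognizable.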
Role in Image and Video Anonymization
- Face detection - CNN-based detectors return bounding boxes or masks that define areas to be blurred. Common approaches include single-stage object detectors and specialized face detection models.
- License plate detection - Object detection models locate license plates under varying lighting conditions and viewing angles. Segmentation can further refine the shape of the applied mask.
- Video tracking - Object association algorithms across frames stabilize masks and reduce flickering. This works in tandem with ANN-based detection.
- Post-processing - After detection, the system applies a blur filter of defined strength, pixelation, or a solid mask. Post-processing parameters are selected to ensure that individuals are not (or are no longer) identifiable, in line with Recital 26 of the GDPR.
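One way to picture the mask stabilization mentioned under video tracking is a simple association step between frames. The sketch below is an assumption for illustration, not a description of any specific tracker: it greedily matches each existing track to the current detection with the highest IoU, smooths the box coordinates exponentially, and keeps a track alive for a few frames when the detector misses it. The names `Track` and `update_tracks` and the defaults `match_iou`, `alpha`, and `max_missed` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


@dataclass
class Track:
    box: Box
    missed: int = 0  # consecutive frames without a matching detection


def update_tracks(tracks: List[Track], detections: List[Box],
                  match_iou: float = 0.3, alpha: float = 0.6,
                  max_missed: int = 5) -> List[Track]:
    """Associate detections with tracks and return the tracks to mask in this frame."""
    unmatched = list(detections)
    kept: List[Track] = []
    for track in tracks:
        best = max(unmatched, key=lambda d: iou(track.box, d), default=None)
        if best is not None and iou(track.box, best) >= match_iou:
            unmatched.remove(best)
            # Exponential smoothing damps frame-to-frame jitter of the mask.
            track.box = tuple(alpha * new + (1 - alpha) * old
                              for new, old in zip(best, track.box))
            track.missed = 0
        else:
            # Keep masking the last known position for a few frames.
            track.missed += 1
        if track.missed <= max_missed:
            kept.append(track)
    # Detections that matched no track start new tracks.
    kept.extend(Track(box=d) for d in unmatched)
    return kept
```

Holding a track through short detection gaps favors privacy: a mask that lingers for a few frames is usually preferable to a face that becomes visible when the detector drops it momentarily.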
ANN Technologies Used for Blurring
- Detection architectures - YOLOv3, YOLOv4, and newer single-stage variants offer fast inference with strong accuracy (Redmon and Farhadi, 2018; Bochkovskiy et al., 2020). Two-stage models such as Faster R-CNN can deliver higher accuracy, typically at the cost of increased latency.
- Face detectors - RetinaFace combines detection with facial landmark estimation, improving mask localization for non-frontal poses (Deng et al., 2020).
- Segmentation - U-Net and its derivatives precisely delineate contours when irregularly shaped masks are required (Ronneberger et al., 2015).
- Frameworks and deployment - PyTorch or TensorFlow for training, with conversion to ONNX or TensorRT for on-premise deployment. Optimizations include INT8 quantization, pruning, and layer fusion.
- Acceleration - CUDA- and cuDNN-enabled GPUs, alternatively CPUs with AVX2 or dedicated NPU accelerators. Performance depends on input resolution, batch size, and network complexity.
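To make the INT8 quantization mentioned above more concrete, the following sketch shows only the underlying arithmetic: symmetric, per-tensor quantization of a list of float weights. It is a simplified assumption for illustration. Production toolchains typically add calibration data and per-channel scales, and the function names here are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]
```

The rounding error per weight is bounded by half the scale, which is why quantized detectors should be re-evaluated on recall before deployment: a small accuracy loss can translate into missed faces or plates.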
Key ANN Parameters and Metrics for Anonymization
Parameter | Definition | Practical relevance | Source |
|---|---|---|---|
IoU | Intersection over Union - the overlap area divided by the union area of the detection box and ground truth. | Determines whether a detection matches an object. A common evaluation threshold is 0.5. | Pascal VOC |
Precision | TP / (TP + FP) | Higher precision means fewer non-face or non-plate areas are blurred. | COCO, VOC |
Recall | TP / (TP + FN) | Critical for privacy protection - minimizes missed faces. | COCO, VOC |
F1 Score | 2 × Precision × Recall / (Precision + Recall) | A balanced metric for selecting confidence thresholds. | COCO, VOC |
mAP@0.5 | Mean Average Precision at IoU = 0.5 | Classic object detection metric under the VOC methodology. | Pascal VOC |
mAP@0.5:0.95 | Mean mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 | A demanding COCO metric that better reflects overall quality. | COCO |
Latency | Inference time per frame [ms] | Important for smooth video processing, including batch workflows. | NIST AI RMF 2023 |
Throughput | Frames per second [fps] | Supports on-premise capacity planning. | NIST AI RMF 2023 |
Confidence threshold | Minimum model confidence required to report a detection | Higher thresholds reduce false positives but may lower recall. | COCO |
NMS IoU | IoU threshold for Non-Maximum Suppression | Controls how strongly overlapping boxes for the same object are suppressed as duplicates. | COCO |
Metric sources: Pascal VOC (Everingham et al., 2010), COCO (Lin et al., 2014). The NIST AI Risk Management Framework 1.0 (2023) recommends selecting and monitoring metrics related to performance and risk throughout the AI system lifecycle.
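The relationship between several rows of the table (IoU, confidence threshold, NMS IoU, precision, recall, and F1) can be seen in a short sketch. It assumes boxes as `(x1, y1, x2, y2)` tuples and detections as `(box, confidence)` pairs; the function names and default thresholds are illustrative assumptions, and the greedy matching is a simplification of the full VOC and COCO evaluation protocols.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Overlap area divided by union area of two boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(detections: List[Detection], conf_threshold: float = 0.5,
        nms_iou: float = 0.45) -> List[Detection]:
    """Drop low-confidence boxes, then suppress overlapping duplicates."""
    candidates = sorted((d for d in detections if d[1] >= conf_threshold),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Keep a box only if it does not overlap a stronger kept box too much.
        if all(iou(det[0], k[0]) <= nms_iou for k in kept):
            kept.append(det)
    return kept


def evaluate(detections: List[Detection], ground_truth: List[Box],
             match_iou: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching, returning (precision, recall, F1)."""
    unmatched_gt = list(ground_truth)
    tp = 0
    for box, _ in sorted(detections, key=lambda d: d[1], reverse=True):
        best = max(unmatched_gt, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= match_iou:
            unmatched_gt.remove(best)
            tp += 1
    fp = len(detections) - tp
    fn = len(unmatched_gt)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Sweeping `conf_threshold` over a validation set and recording the three values at each step is a straightforward way to choose an operating point. For anonymization, a threshold that favors recall over precision is usually the safer choice, because an unnecessary blur costs less than a missed face.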
Challenges and Limitations
- Domain shift - performance degradation outside the training data distribution, such as with different cameras, lighting, or weather conditions.
- Occlusions and motion blur - more difficult detections, including profiles and partially visible faces.
- Data bias - insufficient representation of certain groups can lead to recall disparities. Fairness and performance parity testing is required.
- Adversarial effects - unusual patterns or reflections may interfere with detection.
- Legal requirements - models process personal data at the input stage. A valid legal basis and data minimization principles under the GDPR are required.
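The performance parity testing mentioned under data bias can be approximated by comparing recall across subsets of an annotated test set. The sketch below assumes each ground-truth object has already been labeled with a group tag and a flag indicating whether the detector found it. The grouping itself, the function names, and any acceptable gap are assumptions to be defined by the evaluation team.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute recall per group from (group, was_detected) records."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    return {group: hits[group] / totals[group] for group in totals}


def recall_gap(recalls: Dict[str, float]) -> float:
    """Difference between the best and worst per-group recall."""
    return max(recalls.values()) - min(recalls.values())
```

The same breakdown works for capture conditions such as camera type, lighting, or weather, which makes it a simple check for domain shift as well.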
Example Use Cases in Gallio PRO
- Automatic blurring of faces and license plates in images and videos using CNN-based detectors. The software does not blur full human silhouettes.
- No real-time processing - batch processing of files rather than live streams.
- Manual mode in the editor for other elements such as logos, tattoos, documents, or screens, without automatic detection of these classes.
- On-premise deployment - full control over data flows within the organization and no data sent to the cloud. The system does not store logs containing face or license plate detection data.
Standards and References
- ISO/IEC 22989:2022 - Artificial intelligence - Concepts and terminology. ISO, 2022.
- ISO/IEC 23053:2022 - Framework for AI systems using machine learning. ISO, 2022.
- Regulation (EU) 2016/679 (GDPR) - Recital 26 and Article 4(1). Official Journal of the EU, 2016.
- EDPB, Guidelines 3/2019 on processing of personal data through video devices, version 2.0, 29 January 2020.
- Goodfellow, Bengio, Courville, Deep Learning, MIT Press, 2016.
- Everingham et al., The Pascal Visual Object Classes Challenge, IJCV, 2010.
- Lin et al., Microsoft COCO, ECCV 2014.
- Redmon, Farhadi, YOLOv3, arXiv:1804.02767, 2018; Bochkovskiy et al., YOLOv4, arXiv:2004.10934, 2020.
- Deng et al., RetinaFace, arXiv:1905.00641, 2020.
- Ronneberger et al., U-Net, MICCAI 2015.
- NIST, AI Risk Management Framework 1.0, 2023.