YOLO - definition
YOLO (You Only Look Once) is a modern object detection algorithm that enables real-time recognition of multiple objects in images and videos. The name reflects the fact that the model “looks” at the entire image only once, which makes it significantly faster than traditional methods that analyze an image patch by patch.
This technology is widely used in AI-powered tools for anonymizing visual data, such as blurring faces and license plates in images and videos.
How does YOLO work?
The algorithm divides an image into a grid of cells, each responsible for detecting objects whose centers fall within that cell. YOLO predicts bounding boxes and assigns class probabilities. All computations are performed simultaneously, allowing analysis of dozens of frames per second.
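The grid assignment described above can be sketched in a few lines of pure Python. This is an illustrative simplification (the function name and the 7×7 grid, matching YOLOv1's default, are assumptions for the example), not an excerpt from any YOLO implementation:

```python
# Sketch of YOLO-style grid assignment: each object's center determines
# which grid cell is responsible for predicting it.

def responsible_cell(center_x, center_y, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell containing the object's center."""
    col = min(int(center_x / img_w * grid), grid - 1)
    row = min(int(center_y / img_h * grid), grid - 1)
    return row, col

# An object centered at (320, 240) in a 448x448 image (YOLOv1's input size)
print(responsible_cell(320, 240, 448, 448))  # -> (3, 5)
```

In the real network this mapping is implicit in the output tensor layout; each cell additionally predicts several bounding boxes with confidence scores and a shared set of class probabilities.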
Importance of YOLO for image and video anonymization
YOLO is a key component in anonymization tools, enabling fast and accurate detection and localization of faces and license plates for automatic blurring or masking. Its speed and accuracy make it ideal for live applications and large-scale data processing.
Practical applications of YOLO in anonymization
- Automatic detection and blurring of faces in surveillance footage.
- Detection and masking of license plates in drone or dashcam recordings.
- Processing data in media asset management systems requiring GDPR compliance.
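Once a detector returns a bounding box, the masking step itself is simple. Below is a minimal pure-Python sketch of block pixelation applied to a detected region of a grayscale image (represented as a list of lists); the function name and block size are illustrative assumptions, and production systems would use an image library instead:

```python
def pixelate_region(img, x0, y0, x1, y1, block=2):
    """Pixelate img[y0:y1][x0:x1] in place by replacing each block x block
    tile with the average of its pixels (img is a list of lists of ints)."""
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            vals = [img[y][x] for y in ys for x in xs]
            avg = sum(vals) // len(vals)
            for y in ys:
                for x in xs:
                    img[y][x] = avg
    return img
```

Gaussian blurring works analogously: the detector supplies the region, and the transform destroys the identifying detail inside it.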
Challenges and limitations of YOLO
YOLO's object detection can be less effective in dense scenes, under challenging lighting, or with overlapping objects. Real-time processing also demands powerful GPU infrastructure, typically deployed in on-premise environments.
See also
- Object detection
- Neural networks
- Masking and blurring
- Deep learning
1) Definition (precise and verifiable)
YOLO is a family of object detection algorithms that formulates detection as a single-pass (single-shot) process — the model predicts bounding boxes and class probabilities directly from the entire image in a single evaluation, enabling real-time operation. The first version (YOLOv1) was introduced in 2016; the baseline achieved ~45 FPS, while “Fast YOLO” reached ~155 FPS on contemporary GPUs [Redmon et al., 2016].
Newer versions (YOLOv4–YOLOv10) extend the speed–accuracy trade-off and modify training and post-processing pipelines (e.g., YOLOv10 introduces NMS-free end-to-end detection) [Bochkovskiy et al., 2020; Wang et al., 2024].
2) Relevance in image and video anonymization
In anonymization pipelines, sensitive objects (faces, license plates) must be detected before being transformed (blurred, pixelated, etc.). YOLO is often used as the detection component due to:
- Low latency (frame-by-frame inference in real time)
- Scalability to edge devices (lightweight model variants)
- High COCO benchmark performance (mAP@[.5:.95], the industry-standard metric)
Legal note: Effective anonymization under GDPR Recital 26 must ensure that individuals are no longer identifiable, considering time, cost, and technology. Detection accuracy—especially false negatives—directly affects residual re-identification risk. Relevant technical terminology is provided in ISO/IEC 20889:2018 (De-identification techniques classification).
3) Core metrics and formulas
IoU (Intersection over Union)
IoU(A, B) = |A ∩ B| / |A ∪ B|
Used to assess overlap between predicted and ground-truth bounding boxes.
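The IoU computation for axis-aligned boxes is short enough to write out directly. A minimal sketch (box format `(x0, y0, x1, y1)` is an assumption for the example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # -> 0.142857... (1 / 7)
```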
AP and mAP (COCO standard)
Average Precision (AP) is the area under the precision–recall curve; COCO AP@[.5:.95] averages AP across IoU thresholds from 0.50 to 0.95 in steps of 0.05.
mAP = (1/C) ∑_{c=1}^{C} AP_c   (C = number of classes)
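Both averaging steps are plain arithmetic means, as this small sketch shows (function names are illustrative; real evaluators such as COCO's also handle matching and interpolation):

```python
def mean_ap(per_class_ap):
    """mAP = (1/C) * sum of AP_c over C classes."""
    return sum(per_class_ap) / len(per_class_ap)

def coco_style_ap(ap_per_iou):
    """AP@[.5:.95]: average AP over the 10 IoU thresholds 0.50, 0.55, ..., 0.95."""
    assert len(ap_per_iou) == 10
    return sum(ap_per_iou) / 10
```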
Latency and FPS benchmarks
| Model | Dataset | Hardware | Reported performance |
|---|---|---|---|
| YOLOv1 | VOC 2007 | Titan X | ~45 FPS (base), ~155 FPS (Fast YOLO) |
| YOLOv4 | COCO | Tesla V100 | 43.5% AP (COCO), ~65 FPS |
| YOLOv10 | COCO | RTX 4090 | up to 46% lower latency vs YOLOv9-C at similar accuracy |
4) Version overview
| Version | Year | Authors / Paper | Key features | Reported metrics* |
|---|---|---|---|---|
| YOLOv1 | 2016 | Redmon et al. | Unified single-shot detector | 45/155 FPS |
| YOLOv4 | 2020 | Bochkovskiy et al. | CSP backbone, CIoU, Mosaic | 43.5% AP (COCO), ~65 FPS |
| YOLOv7 | 2022 | Wang et al. | “Trainable bag-of-freebies” | SOTA real-time detector |
| YOLOv8 | 2023 | Ultralytics | Simplified architecture (det/seg/pose) | High mAP, low parameter count |
| YOLOv9 | 2024 | WongKinYiu | Programmable Gradient Information (PGI) | Improved accuracy |
| YOLOv10 | 2024 | Wang et al. | End-to-end, NMS-free detection | Lower latency, higher efficiency |
| YOLOv11 | 2024 | Ultralytics | Optimized mAP-to-params ratio | ~22% fewer params vs v8m |
* Values depend on variant (n/s/m/l/x), resolution, and hardware setup.
5) Architecture and processing pipeline
- Backbone – feature extraction (e.g., CSPNet, ELAN).
- Neck – multi-scale feature fusion (FPN/PAN).
- Head – predicts bounding boxes, classes, confidence; newer versions integrate detection without post-NMS.
- Post-processing – traditional Non-Maximum Suppression (NMS) or NMS-free in end-to-end training (YOLOv10).
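For reference, traditional greedy NMS can be sketched in pure Python. This is a simplified single-class version (box format and function name are assumptions for the example); framework implementations are vectorized and handle classes separately:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    above iou_thresh, repeat. Boxes are (x0, y0, x1, y1); returns kept indices."""
    def iou(a, b):
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

NMS-free models such as YOLOv10 are trained so that duplicate predictions are suppressed by the network itself, removing this step from the deployment pipeline.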
6) Integration in anonymization systems
Objective: minimize false negatives (missed detections of faces/plates), accepting moderate false positives (extra blur regions).
Recommended operational setup (1080p video, 25–30 FPS, GPU T4/A10):
- Model variant: s or m (speed–accuracy balance)
- Input resolution: 640–960 px on the longest side
- Confidence threshold: 0.2–0.35
- IoU threshold (for NMS): 0.5–0.7
- MOT tracking: combine with multi-object tracker to ensure mask stability
- Validation: measure Recall@IoU=0.5 for critical classes (face, plate); operational target Recall ≥ 0.98
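The validation step above (Recall@IoU=0.5 for critical classes) reduces to counting how many ground-truth boxes are matched by at least one prediction. A minimal sketch, with box format and function name assumed for the example:

```python
def recall_at_iou(gt_boxes, pred_boxes, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one prediction
    with IoU >= iou_thresh. Boxes are (x0, y0, x1, y1)."""
    def iou(a, b):
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    matched = sum(
        1 for g in gt_boxes
        if any(iou(g, p) >= iou_thresh for p in pred_boxes)
    )
    return matched / len(gt_boxes) if gt_boxes else 1.0
```

For anonymization, every unmatched ground-truth box is a face or plate left visible, which is why the recall target (≥ 0.98) is stricter than typical detection benchmarks.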
Risks and mitigations
| Risk | Mitigation |
|---|---|
| Occlusion or low light | Increase input resolution; apply brightness augmentations |
| Fast motion / motion blur | Use stabilization or a higher shutter speed |
| Domain mismatch (non-COCO objects) | Apply transfer learning on custom domain data |
Legal / standard context
- GDPR Recital 26 – defines anonymous data scope.
- ISO/IEC 20889:2018 – taxonomy of de-identification methods.
- WP29/EDPB 05/2014 – guidelines on anonymization limits and residual risk.
7) Acceptance testing checklist (for DPOs or QA teams)
| Metric | Requirement | Comment |
|---|---|---|
| Recall (critical classes) | ≥ 0.98 @ IoU = 0.5 | Prevents under-anonymization |
| Precision | Report jointly with Recall | Avoids excessive blurring |
| Latency (p95) | ≤ 40 ms/frame (edge) or ≤ 20 ms/frame (GPU) | Real-time threshold |
| Temporal stability | ≥ 95% of frames maintain a consistent mask | Avoids flickering |
| Robustness | Tests in night/rain/reflection scenarios | Domain coverage |
8) Implementation attributes summary
| Attribute | Description | Source |
|---|---|---|
| mAP@[.5:.95] | AP averaged over IoU thresholds 0.50–0.95 | COCO metric (Lin et al., 2014) |
| FPS / latency | Frame processing speed | YOLOv1: 45/155 FPS; YOLOv4: ~65 FPS |
| NMS vs E2E | Non-Maximum Suppression vs end-to-end | YOLOv10: NMS-free |
| Model size | Parameters and FLOPs | YOLOv11: ~22% fewer params vs v8m |
9) References
- Redmon J. et al., You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640 v5, 2016.
- Bochkovskiy A., Wang C.Y., Liao H.Y.M., YOLOv4: Optimal Speed and Accuracy of Object Detection, 2020.
- Wang C.Y. et al., YOLOv7: Trainable Bag-of-Freebies, 2022.
- Ultralytics Documentation, YOLOv8 and YOLOv11 Model Zoo, 2023–2024.
- WongKinYiu, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, 2024.
- Wang A. et al., YOLOv10: Real-Time End-to-End Object Detection, 2024.
- Lin T.Y. et al., Microsoft COCO: Common Objects in Context, 2014.
- GDPR (EU 2016/679), Recital 26 – definition of anonymous data.
- ISO/IEC 20889:2018 – Privacy enhancing data de-identification terminology and classification of techniques.
- WP29 / EDPB, Opinion 05/2014 on Anonymisation Techniques.