What is YOLO (You Only Look Once)?

YOLO - definition

YOLO (You Only Look Once) is a modern object detection algorithm that enables real-time recognition of multiple objects in images and videos simultaneously. The name reflects the fact that the model “looks” at the entire image only once, which significantly speeds up processing compared to traditional methods that analyze an image patch by patch.

This technology is widely used in AI-powered tools for anonymous processing of visual data, such as blurring faces and license plates in images and videos.

How does YOLO work?

The algorithm divides an image into a grid of cells, each responsible for detecting objects whose centers fall within that cell. YOLO predicts bounding boxes and assigns class probabilities. All computations are performed simultaneously, allowing analysis of dozens of frames per second.
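The grid logic described above can be illustrated with a minimal sketch. This is not YOLO itself, only the cell-assignment rule: the grid size (7×7, as in YOLOv1) and the image size are example values.

```python
GRID = 7  # YOLOv1 divides the image into a 7x7 grid

def responsible_cell(cx, cy, img_w, img_h, grid=GRID):
    """Return (row, col) of the grid cell whose region contains the
    object center (cx, cy); that cell is responsible for detecting it."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col

# An object centered at (320, 240) in a 448x448 image is owned by one cell,
# which predicts its bounding box and class probabilities.
```

Each cell then predicts a fixed number of bounding boxes with confidence scores, so the whole image is processed in one forward pass.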

Importance of YOLO for image and video anonymization

YOLO is a key component in anonymization tools, enabling fast and accurate detection and localization of faces and license plates for automatic blurring or masking. Its speed and accuracy make it ideal for live applications and large-scale data processing.

Practical applications of YOLO in anonymization

  • Automatic detection and blurring of faces in surveillance footage.
  • Detection and masking of license plates in drone or dashcam recordings.
  • Processing data in media asset management systems requiring GDPR compliance.
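All three applications above share the same pattern: detect sensitive regions, then destroy the detail inside them. A minimal, dependency-free sketch of the masking step (grayscale image as nested lists; the box coordinates stand in for hypothetical detector output):

```python
def mask_regions(img, boxes):
    """Replace each (x1, y1, x2, y2) region of a grayscale image
    (list of row lists) with the region's mean value, removing detail.
    Real pipelines use Gaussian blur or pixelation; the principle is
    the same: the masked region must no longer be identifiable."""
    out = [row[:] for row in img]
    for x1, y1, x2, y2 in boxes:
        vals = [out[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        mean = sum(vals) // len(vals)
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = mean
    return out
```

Pixels outside the detected boxes are left untouched, which keeps the rest of the footage usable.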

Challenges and limitations of YOLO

YOLO's object detection can be less effective with dense scenes, challenging lighting, or overlapping objects. Real-time processing demands powerful infrastructure typically available in on-premise environments.

See also

  • Object detection
  • Neural networks
  • Masking and blurring
  • Deep learning

1) Definition (precise and verifiable)

YOLO is a family of object detection algorithms that formulates detection as a single-pass (single-shot) process — the model predicts bounding boxes and class probabilities directly from the entire image in a single evaluation, enabling real-time operation. The first version (YOLOv1) was introduced in 2016; the baseline achieved ~45 FPS, while “Fast YOLO” reached ~155 FPS on contemporary GPUs [Redmon et al., 2016].

Newer versions (YOLOv4–YOLOv10) extend the speed–accuracy trade-off and modify training and post-processing pipelines (e.g., YOLOv10 introduces NMS-free end-to-end detection) [Bochkovskiy et al., 2020; Wang et al., 2024].

2) Relevance in image and video anonymization

In anonymization pipelines, sensitive objects (faces, license plates) must be detected before being transformed (blurred, pixelated, etc.). YOLO is often used as the detection component due to:

  • Low latency (frame-by-frame inference in real time)
  • Scalability to edge devices (lightweight model variants)
  • High COCO benchmark performance (mAP@[.5:.95], the industry-standard metric)

Legal note: Effective anonymization under GDPR Recital 26 must ensure that individuals are no longer identifiable, considering time, cost, and technology. Detection accuracy—especially false negatives—directly affects residual re-identification risk. Relevant technical terminology is provided in ISO/IEC 20889:2018 (De-identification techniques classification).

3) Core metrics and formulas

IoU (Intersection over Union)

IoU(A, B) = |A ∩ B| / |A ∪ B|

Used to assess overlap between predicted and ground-truth bounding boxes.
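For axis-aligned bounding boxes the metric can be computed directly; a minimal sketch, with boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Identical boxes give 1.0, disjoint boxes give 0.0, and partial overlaps fall in between.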

AP and mAP (COCO standard)

Average Precision (AP) is the area under the precision–recall curve; COCO AP@[.5:.95] averages AP across IoU thresholds from 0.50 to 0.95 in steps of 0.05.

mAP = (1/C) ∑_{c=1}^{C} AP_c
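A minimal sketch of the COCO-style averaging: per-class AP values (hypothetical numbers here; real pipelines derive them from precision–recall curves) are averaged first over the ten IoU thresholds, then over classes.

```python
def map_at_5095(ap_per_class):
    """COCO-style mAP@[.5:.95].
    ap_per_class: {class_name: {iou_threshold: AP}} with the ten
    thresholds 0.50, 0.55, ..., 0.95 as keys."""
    thrs = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    per_class = [
        sum(ap_by_thr[t] for t in thrs) / len(thrs)
        for ap_by_thr in ap_per_class.values()
    ]
    return sum(per_class) / len(per_class)
```

For anonymization, per-class values for the critical classes (face, plate) matter more than the overall mean.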

Latency and FPS benchmarks

| Model   | Dataset  | Hardware   | Throughput                                              |
|---------|----------|------------|---------------------------------------------------------|
| YOLOv1  | VOC 2007 | Titan X    | ~45 FPS (base), ~155 FPS (Fast YOLO)                    |
| YOLOv4  | COCO     | Tesla V100 | 43.5% AP (COCO), ~65 FPS                                |
| YOLOv10 | COCO     | RTX 4090   | up to 46% lower latency vs YOLOv9-C at similar accuracy |

4) Version overview

| Version | Year | Authors / Paper    | Key features                            | Reported metrics*         |
|---------|------|--------------------|-----------------------------------------|---------------------------|
| YOLOv1  | 2016 | Redmon et al.      | Unified single-shot detector            | 45/155 FPS                |
| YOLOv4  | 2020 | Bochkovskiy et al. | CSP backbone, CIoU, Mosaic              | 43.5% AP (COCO), ~65 FPS  |
| YOLOv7  | 2022 | Wang et al.        | “Trainable bag-of-freebies”             | SOTA real-time detector   |
| YOLOv8  | 2023 | Ultralytics        | Simplified architecture (det/seg/pose)  | High mAP, low params      |
| YOLOv9  | 2024 | WongKinYiu         | Programmable Gradient Information (PGI) | Improved accuracy         |
| YOLOv10 | 2024 | Wang et al.        | End-to-end, NMS-free detection          | Lower latency, higher efficiency |
| YOLOv11 | 2024 | Ultralytics        | Optimized mAP-to-params ratio           | ~22% fewer params vs v8m  |

* Values depend on variant (n/s/m/l/x), resolution, and hardware setup.

5) Architecture and processing pipeline

  1. Backbone – feature extraction (e.g., CSPNet, ELAN).
  2. Neck – multi-scale feature fusion (FPN/PAN).
  3. Head – predicts bounding boxes, classes, confidence; newer versions integrate detection without post-NMS.
  4. Post-processing – traditional Non-Maximum Suppression (NMS) or NMS-free in end-to-end training (YOLOv10).
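Step 4 (classic NMS) is a greedy filter: keep the highest-scoring box, discard every remaining box that overlaps it above an IoU threshold, and repeat. A minimal self-contained sketch:

```python
def nms(boxes, scores, iou_thr=0.5):
    """Greedy Non-Maximum Suppression.
    boxes: list of (x1, y1, x2, y2); scores: parallel confidence list.
    Returns indices of kept boxes, highest score first."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)          # best remaining box survives
        keep.append(i)
        # drop everything that overlaps it too strongly (likely duplicates)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep
```

NMS-free variants such as YOLOv10 avoid this step by training the head to emit one box per object directly.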

6) Integration in anonymization systems

Objective: minimize false negatives (missed detections of faces/plates), accepting moderate false positives (extra blur regions).

Recommended operational setup (1080p video, 25–30 FPS, GPU T4/A10):

  • Model variant: s or m (speed–accuracy balance)
  • Input resolution: 640–960 px on the longest side
  • Confidence threshold: 0.2–0.35
  • IoU threshold (for NMS): 0.5–0.7
  • MOT tracking: combine with multi-object tracker to ensure mask stability
  • Validation: measure Recall@IoU=0.5 for critical classes (face, plate); operational target Recall ≥ 0.98
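The validation step in the list above (Recall@IoU=0.5 on critical classes) can be measured with a simple matching rule: a ground-truth box counts as found if at least one prediction overlaps it sufficiently. A minimal sketch, with all boxes as hypothetical example data:

```python
def recall_at_iou(preds, gts, thr=0.5):
    """Fraction of ground-truth boxes matched by at least one prediction
    with IoU >= thr. Boxes are (x1, y1, x2, y2). For anonymization this
    is the key number: every missed box is an unblurred face or plate."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    if not gts:
        return 1.0  # nothing to find
    hit = sum(1 for g in gts if any(iou(p, g) >= thr for p in preds))
    return hit / len(gts)
```

Run per class (face, plate) on an annotated validation set and compare against the ≥ 0.98 operational target.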

Risks and mitigations

| Risk                               | Mitigation                                           |
|------------------------------------|------------------------------------------------------|
| Occlusion or low light             | Increase input resolution, apply brightness augmentations |
| Fast motion / blur                 | Use stabilization or higher shutter speed            |
| Domain mismatch (non-COCO objects) | Apply transfer learning on custom domain data        |

Legal / standard context

  • GDPR Recital 26 – defines anonymous data scope.
  • ISO/IEC 20889:2018 – taxonomy of de-identification methods.
  • WP29/EDPB 05/2014 – guidelines on anonymization limits and residual risk.

7) Acceptance testing checklist (for DPOs or QA teams)

| Metric                   | Requirement                               | Comment                     |
|--------------------------|-------------------------------------------|-----------------------------|
| Recall (critical classes)| ≥ 0.98 @ IoU=0.5                          | Prevent under-anonymization |
| Precision                | Report jointly with Recall                | Avoid excessive blurring    |
| Latency (p95)            | ≤ 40 ms/frame (edge) or ≤ 20 ms/frame (GPU) | Real-time threshold       |
| Temporal stability       | ≥ 95% of frames maintain a consistent mask | Avoid flickering           |
| Robustness               | Tests in night/rain/reflection scenarios  | Domain coverage             |

8) Implementation attributes summary

| Attribute     | Description                                  | Source                            |
|---------------|----------------------------------------------|-----------------------------------|
| mAP@[.5:.95]  | AP averaged over IoU thresholds 0.50–0.95    | COCO metric (Lin et al., 2014)    |
| FPS / latency | Frame processing speed                       | YOLOv1: 45/155 FPS; YOLOv4: ~65 FPS |
| NMS vs E2E    | Non-Maximum Suppression vs end-to-end decoding | YOLOv10: NMS-free               |
| Model size    | Parameters and FLOPs                         | YOLOv11: ~22% fewer params vs v8m |

9) References

  1. Redmon J. et al., You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640 v5, 2016.
  2. Bochkovskiy A., Wang C.Y., Liao H.Y.M., YOLOv4: Optimal Speed and Accuracy of Object Detection, 2020.
  3. Wang C.Y. et al., YOLOv7: Trainable Bag-of-Freebies, 2022.
  4. Ultralytics Documentation, YOLOv8 and YOLOv11 Model Zoo, 2023–2024.
  5. WongKinYiu, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, 2024.
  6. Wang A. et al., YOLOv10: Real-Time End-to-End Object Detection, 2024.
  7. Lin T.Y. et al., Microsoft COCO: Common Objects in Context, 2014.
  8. GDPR (EU 2016/679), Recital 26 – Definition of anonymous data.
  9. ISO/IEC 20889:2018 – Privacy enhancing data de-identification terminology and classification of techniques.
  10. WP29 / EDPB, Opinion 05/2014 on Anonymisation Techniques.