What is Computer Vision?

Definition
Application domains & relevance
Core technologies and methods
Challenges and limitations
Specific considerations for anonymisation of visual data
Normative and technical references
Implementation guidance

Definition

Computer vision is a sub‑discipline of artificial intelligence (AI) and machine learning (ML) focused on enabling computer systems to automatically acquire, process, analyse and interpret visual data, such as still images, video streams and multi‑dimensional sensor inputs (for example point clouds or depth maps), in order to derive meaningful information or drive autonomous decision‑making (Wikipedia, IBM).

From a theoretical perspective, computer vision “seeks to automate tasks that the human visual system can do” (Wikipedia). In technology‑driven contexts, it aims to enable machines to “see, observe and understand” visual input much as humans do, but using cameras, sensors and algorithms (IBM).

In the context of image and video anonymisation, computer vision is the technical foundation: it detects, localises and tracks personally identifiable elements (faces, licence plates, etc.) so that downstream anonymisation operations (masking, blurring, redaction) can be applied to them.

Application domains & relevance

Domain	Use‑case example	Relevance to visual data handling / anonymisation
Public safety / surveillance	Crowd analysis, trespass detection	Requires anonymisation of non‑consenting individuals in video feeds
Automotive (ADAS / autonomous vehicles)	Pedestrian/vehicle/lane detection	Visual feeds captured by vehicles must respect privacy regulations
Healthcare and medical imaging	Automated diagnosis from scans (X‑ray/MRI)	Patient imagery is sensitive and often requires de‑identification
Industrial and manufacturing	Visual inspection of production lines	Cameras may capture workers or sensitive items - anonymisation can be needed
Retail & customer analytics	Customer behaviour tracking, product recognition	Visual analytics must consider privacy and data protection when persons are visible

Core technologies and methods

Key technical components

Technology	Purpose	Notes
Convolutional Neural Networks (CNNs)	Feature extraction from image data, classification & detection	Foundational for many computer‑vision models
Semantic & instance segmentation	Pixel‑level labelling of objects/regions	Enables fine‑grained masking beyond bounding boxes
Object detection	Locating and classifying objects in images or frames	Yields bounding boxes / masks - essential for anonymisation
Object tracking	Following objects across sequential frames (video)	Ensures consistency of anonymisation across time
Optical Character Recognition (OCR)	Extracting text from images/video (e.g., licence plates)	Supports anonymising textual PII in vision feeds
Depth estimation / 3D reconstruction	Recovering 3D structure or depth from visual data	Helps in scene understanding when multi‑sensor data available
Attention / transformer models in vision	Modelling spatial/temporal dependencies in visual data	Increasingly used in advanced CV systems for robust performance (arXiv)
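The object‑tracking row can be illustrated with a minimal greedy IoU (intersection over union) tracker. This is a deliberately simple stand‑in for production trackers such as SORT‑style ones, which add motion prediction and optimal assignment. It assumes boxes arrive as (x1, y1, x2, y2) pixel tuples from any detector; the class name `IouTracker` and its parameters `iou_threshold` and `max_missed` are hypothetical. Each detection is matched to the existing track it overlaps most, so a person keeps the same ID, and therefore the same mask, from frame to frame.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IouTracker:
    """Hypothetical greedy IoU tracker that keeps a stable ID per object across frames."""
    iou_threshold: float = 0.3
    max_missed: int = 5  # frames a track may go undetected before it is dropped
    _tracks: dict[int, Box] = field(default_factory=dict)
    _missed: dict[int, int] = field(default_factory=dict)
    _next_id: int = 0

    def update(self, detections: list[Box]) -> dict[int, Box]:
        # Score every (track, detection) pair and match greedily, best overlap first.
        pairs = sorted(
            ((iou(t_box, d_box), tid, di)
             for tid, t_box in self._tracks.items()
             for di, d_box in enumerate(detections)),
            reverse=True,
        )
        matched_tracks: set[int] = set()
        matched_dets: set[int] = set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same object
            if tid in matched_tracks or di in matched_dets:
                continue
            self._tracks[tid] = detections[di]
            self._missed[tid] = 0
            matched_tracks.add(tid)
            matched_dets.add(di)
        # Unmatched tracks keep their last box (so the mask persists) until they time out.
        for tid in list(self._tracks):
            if tid not in matched_tracks:
                self._missed[tid] += 1
                if self._missed[tid] > self.max_missed:
                    del self._tracks[tid], self._missed[tid]
        # Unmatched detections start new tracks with fresh IDs.
        for di, d_box in enumerate(detections):
            if di not in matched_dets:
                self._tracks[self._next_id] = d_box
                self._missed[self._next_id] = 0
                self._next_id += 1
        return dict(self._tracks)
```

Keeping a track's last box for up to `max_missed` frames is a privacy‑leaning choice: a face the detector misses for a frame or two stays masked, at the cost of occasionally masking empty background.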

Quality metrics and performance targets

Metric	Typical benchmark / target range	Importance in real‑time vision / anonymisation
Accuracy (classification)	e.g., ≥ 90% in controlled settings	Indicates correctness of classification subsystems
mAP (mean Average Precision) for detection	~0.5-0.9 depending on dataset, IoU threshold and scene complexity	Measures how well objects are detected/localised
Frame‑rate (FPS)	≥ 25-30 fps for real‐time video	Needed to maintain fluid processing and timely anonymisation
Latency (response time)	≤ 100‑200 ms (real‐time systems)	Critical so anonymisation occurs promptly, avoiding exposure
False Positive / False Negative rates	Ideally < 5‑10% in high‑risk use cases	Balancing FP/FN is essential in anonymisation workflows
Hardware/inference resources	GPU/TPU/edge ASICs required for high throughput	Infrastructure impacts feasibility and cost
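The detection metrics in the table can be computed from per‑frame matches between predictions and ground truth. The sketch below assumes boxes as (x1, y1, x2, y2) tuples, predictions sorted by descending confidence, and the common IoU ≥ 0.5 matching rule; all function names are illustrative. Because object detection has no well‑defined true negatives, the false‑positive share is expressed here as the false discovery rate (1 − precision).

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_counts(predicted: list[Box], ground_truth: list[Box],
                 iou_threshold: float = 0.5) -> tuple[int, int, int]:
    """Greedily match each prediction to its best unmatched ground-truth box.

    Returns (true positives, false positives, false negatives).
    """
    matched: set[int] = set()
    for p in predicted:
        best_j, best_score = None, iou_threshold
        for j, g in enumerate(ground_truth):
            score = iou(p, g)
            if j not in matched and score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matched.add(best_j)
    tp = len(matched)
    return tp, len(predicted) - tp, len(ground_truth) - tp


def detection_rates(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall and the two error shares relevant to anonymisation."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_discovery_rate": fp / (tp + fp) if tp + fp else 0.0,  # over-masking share
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,   # identifiers left exposed
    }


def frame_budget_ms(fps: float) -> float:
    """Processing time available per frame to sustain a given frame rate."""
    return 1000.0 / fps
```

Full mAP additionally averages precision over recall levels (and, for COCO, over IoU thresholds from 0.50 to 0.95 and over object classes), which is normally left to benchmark tooling such as the COCO evaluation API. Note also that the per‑frame budget (40 ms at 25 fps) constrains throughput, whereas the 100-200 ms target applies to end‑to‑end latency: a pipelined system can take longer than 40 ms per frame from capture to output and still sustain 25 fps, provided each stage keeps up.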

Challenges and limitations

Challenge	Description	Impact on anonymisation or operational use
Variable lighting, reflections, weather	Poor or changing illumination degrades detection accuracy	May increase false negatives (e.g., faces not detected)
Complex backgrounds, occlusion, crowding	Objects may be partially hidden or overlap	Harder to reliably detect and mask sensitive elements
Limited or biased training data	Insufficient coverage of real‑world variation reduces model robustness	May produce errors or propagate bias in detection
Real‑time processing constraints	High resolution or multiple streams raise computational demands	May force trade‑offs - lower accuracy, slower processing
Privacy, legal and ethical issues	Visual data often contains PII; regulatory compliance required	Systems must integrate anonymisation, auditing, DPIA
Inverse problem / 3D from 2D ambiguity	Recovering scene geometry from an image alone is ill‑posed (Wikipedia)	May impair localisation precision for anonymisation tasks
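The 3D‑from‑2D ambiguity in the last row follows from the standard pinhole camera model. Assuming square pixels (a single focal length f) and principal point (c_x, c_y), a point (X, Y, Z) in camera coordinates projects to pixel (u, v) as below. Scaling the point by any k > 0 leaves the pixel unchanged, so depth cannot be recovered from a single image without additional cues or sensors.

```latex
u = f\,\frac{X}{Z} + c_x, \qquad v = f\,\frac{Y}{Z} + c_y
\qquad\Longrightarrow\qquad
f\,\frac{kX}{kZ} + c_x = u, \quad f\,\frac{kY}{kZ} + c_y = v \qquad (k > 0)
```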

Specific considerations for anonymisation of visual data

In applications where computer vision supports anonymisation of images and video, the following operational aspects are particularly relevant:

Systems must reliably detect personal identifiers (faces, bodies, objects, licence plates) across frames and modalities.
Localisation (bounding boxes or segmentation masks) must be sufficiently accurate to cover the sensitive region without excessive non‑sensitive coverage.
For video/live streams, detection, tracking and masking must be synchronised with minimal latency and drift to avoid exposure or artefacts.
False negatives (missed identifiers) pose privacy and regulatory risk; false positives (over‑masking) reduce utility of material.
Detailed logging and audit trails (which object was detected, when, what mask applied) support compliance and enable oversight by data protection officers.
Infrastructure & operations must handle the scale (high resolution, multiple streams, edge/cloud hybrid), while maintaining data security (encryption in transit & at rest), access controls and retention policies.
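The localisation requirement (cover the sensitive region without excessive over‑masking) is often met by growing each detected box by a small margin before masking. The NumPy sketch below assumes an H×W or H×W×C `uint8` image and boxes as (x1, y1, x2, y2) pixel coordinates; the function names, the margin and the block size are illustrative choices, not recommended values.

```python
import math

import numpy as np


def expand_box(box, margin, width, height):
    """Grow (x1, y1, x2, y2) by `margin` times the box size on every side, clipped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * margin, (y2 - y1) * margin
    return (
        max(0, math.floor(x1 - dx)),
        max(0, math.floor(y1 - dy)),
        min(width, math.ceil(x2 + dx)),
        min(height, math.ceil(y2 + dy)),
    )


def pixelate_region(image: np.ndarray, box, block: int = 16) -> np.ndarray:
    """Mosaic a region in place: each block x block cell is replaced by its mean colour."""
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]  # a view, so writes modify `image`
    for by in range(0, region.shape[0], block):
        for bx in range(0, region.shape[1], block):
            cell = region[by:by + block, bx:bx + block]
            cell[...] = cell.mean(axis=(0, 1), keepdims=True).astype(image.dtype)
    return image
```

Mosaic cells should be coarse relative to the size of the face or plate: small cells or a light blur can leave identifiers recognisable, so a solid fill (`image[y1:y2, x1:x2] = 0`) is the conservative option when the material does not need to look natural.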

Normative and technical references

ISO/IEC 22989:2022 - Information technology - Artificial intelligence - Artificial intelligence concepts and terminology (covers computer vision concepts).
ISO/IEC TR 24029‑1:2021 - Artificial intelligence (AI) - Assessment of the robustness of neural networks - Part 1: Overview (relevant for vision systems).
European Data Protection Board (EDPB) Guidelines 03/2019 on processing of personal data through video devices - emphasises appropriate technical measures and risk assessment in video systems.
Industry definitions:
- IBM: “Computer vision is a field of artificial intelligence (AI) that uses machine learning and neural networks to teach computers and systems to derive meaningful information from digital images, videos and other visual inputs.”
- Microsoft Azure: “Computer vision enables machines to interpret, analyse, and pull meaningful data from images and videos, replicating human sight and cognitive abilities.”
Standard datasets and benchmarks: COCO (Common Objects in Context), ImageNet, OpenImages - used widely to validate vision model performance.

Implementation guidance

Select suitable models for the anonymisation objective (for example face detection → MTCNN or RetinaFace; general object detection → YOLOv8).
Prepare representative datasets for training or validation that reflect operational conditions (camera angles, lighting, crowd density).
Measure baseline detection and localisation metrics (e.g., mAP, latency, false‑negative rate) in the actual operational environment.
Deploy the pipeline: image capture → object detection → tracking (if video) → localisation → mask/blur/redaction → output. Ensure that end‑to‑end latency stays within acceptable bounds.
Provide audit and traceability: record detection events, applied anonymisation actions and timestamps; this enables oversight by data protection officers (DPOs) and provides evidence of compliance.
Secure the deployment: encrypt input and transmission streams, control access to model outputs, retain anonymised data only as long as needed, and maintain documentation such as the data protection impact assessment (DPIA).
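The deployment, latency and audit steps can be combined per frame roughly as follows. This sketch assumes hypothetical `detect` and `anonymise` callables standing in for the chosen model and masking routine, omits tracking for brevity, and uses JSON Lines as an example audit format; `process_frame`, `AuditRecord` and `to_jsonl` are illustrative names. It also fails closed: if detection or masking raises an error, the frame is withheld rather than released unmasked, which is one design option rather than a requirement of the guidance above.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional

Box = tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class AuditRecord:
    frame_index: int
    timestamp: float    # UNIX time at which the frame was processed
    label: str          # e.g. "face" or "licence_plate"
    box: Optional[Box]  # location only; no pixel data is logged
    action: str         # anonymisation applied, or "withheld"
    latency_ms: float   # detection + masking time for the whole frame


def process_frame(
    index: int,
    frame: Any,
    detect: Callable[[Any], list[tuple[str, Box]]],
    anonymise: Callable[[Any, Box], Any],
    action: str = "pixelate",
    latency_budget_ms: float = 100.0,
) -> tuple[Optional[Any], list[AuditRecord], bool]:
    """Detect and mask one frame; return (output or None, audit records, within_budget)."""
    start = time.perf_counter()
    try:
        detections = detect(frame)
        for _, box in detections:
            frame = anonymise(frame, box)
    except Exception:
        # Fail closed: never release a frame whose anonymisation did not complete.
        latency_ms = (time.perf_counter() - start) * 1000.0
        record = AuditRecord(index, time.time(), "unknown", None, "withheld", latency_ms)
        return None, [record], latency_ms <= latency_budget_ms
    latency_ms = (time.perf_counter() - start) * 1000.0
    now = time.time()
    records = [AuditRecord(index, now, label, box, action, latency_ms)
               for label, box in detections]
    return frame, records, latency_ms <= latency_budget_ms


def to_jsonl(records: list[AuditRecord]) -> str:
    """Serialise audit records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)
```

In production, the records would typically be written to an append‑only, access‑controlled store, and frames that exceed the latency budget would feed monitoring.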
