What is Computer Vision?

Computer Vision - definition

Computer vision is an interdisciplinary field within artificial intelligence and machine learning focused on enabling computers to automatically analyze, interpret, and understand images and video sequences, extracting meaningful and actionable information from visual data.

The goal of computer vision systems is to allow machines to perceive and make sense of visual inputs with a level of understanding similar to that of humans, enabling tasks such as object recognition, scene understanding, and activity interpretation in diverse real-world environments.

How Computer Vision Works

The computer vision process generally involves three key stages:

  • Image Acquisition: Capturing images or video through cameras, sensors, or scanners.
  • Preprocessing: Improving image quality through noise reduction, normalization, and correction of defects such as uneven exposure.
  • Analysis and Interpretation: Applying advanced algorithms and machine learning models to detect, classify, segment, and understand objects or motions within the images or video streams.
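
The three stages above can be sketched as a toy pipeline. This is a minimal illustration in plain Python, not a production design: the function names (acquire, preprocess, analyse), the synthetic 6x6 image, and the simple thresholding step are all assumptions made for the example. A real system would read from a camera and use a trained model in the last stage.

```python
# Toy three-stage pipeline; every name here is illustrative, not a real API.

def acquire():
    """Stage 1: stand-in for a camera read; returns a 6x6 grayscale image (0-255)."""
    image = [[20] * 6 for _ in range(6)]
    for y in range(2, 5):
        for x in range(1, 4):
            image[y][x] = 220  # a bright 3x3 "object" on a dark background
    return image

def preprocess(image):
    """Stage 2: min-max normalisation of pixel values to the range [0.0, 1.0]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1  # avoid division by zero on a uniform image
    return [[(p - lo) / scale for p in row] for row in image]

def analyse(image, threshold=0.5):
    """Stage 3: threshold the image and return the bounding box
    (x_min, y_min, x_max, y_max) of the bright pixels, or None if there are none."""
    coords = [(x, y)
              for y, row in enumerate(image)
              for x, p in enumerate(row)
              if p > threshold]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

box = analyse(preprocess(acquire()))
print(box)  # (1, 2, 3, 4)
```

Each stage takes the previous stage's output as its only input, which is why the three steps are usually described as a pipeline.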

Applications of Computer Vision

  • Security and Surveillance: face recognition and monitoring public spaces.
  • Automotive: autonomous driving systems for obstacle detection and navigation.
  • Healthcare: medical image analysis and diagnostics.
  • Manufacturing: quality control and defect detection.
  • Retail: product recognition, inventory management, and customer behavior analysis.

Technologies Used in Computer Vision

Core technologies include convolutional neural networks (CNNs), which enable deep feature extraction from images, image segmentation techniques, object detection algorithms, 3D reconstruction, and synthetic data generation.
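
To make "feature extraction" concrete, the sketch below applies a single convolution filter followed by a ReLU, which is the basic building block of a CNN layer. It is a simplified illustration in plain Python: the helper names (conv2d, relu) and the hand-picked kernel are assumptions for the example, whereas a real CNN learns its kernel weights from data and stacks many such layers.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers call 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def relu(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# 4x4 image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))
print(features)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks where the edge sits in the input, which is the sense in which a filter "extracts" a feature.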

Challenges and Limitations of Computer Vision

Computer vision systems face challenges such as variable lighting conditions, complex backgrounds, fast motion, limited image quality, and high computational demands. Ethical and privacy concerns, especially regarding the protection of personal data, are also critical considerations.

See Also

  • Machine Learning
  • Artificial Intelligence (AI)
  • Deep Learning
  • Image Segmentation

Corrected version

Computer Vision

Definition

Computer vision is a sub-discipline of artificial intelligence (AI) and machine learning (ML) focused on enabling computer systems to automatically acquire, process, analyse and interpret visual data, such as still images, video streams and multi-dimensional sensor inputs (for example point clouds or depth maps), in order to derive meaningful information or drive autonomous decision-making (Wikipedia; IBM).

From a theoretical perspective, computer vision “seeks to automate tasks that the human visual system can do” (Wikipedia). In technology-driven contexts, it aims to empower machines to “see, observe and understand” visual input, akin to human vision, but using cameras, sensors and algorithms (IBM).

In the context of image and video anonymisation, computer vision serves as the technical foundation: it enables the detection, localisation and tracking of personally identifiable elements (faces, licence plates, etc.), which in turn allows downstream anonymisation operations (masking, blurring, redaction).

Application domains & relevance

| Domain | Use-case example | Relevance to visual data handling / anonymisation |
|---|---|---|
| Public safety / surveillance | Crowd analysis, trespass detection | Requires anonymisation of non-consenting individuals in video feeds |
| Automotive (ADAS / autonomous vehicles) | Pedestrian/vehicle/lane detection | Visual feeds captured by vehicles must respect privacy regulations |
| Healthcare and medical imaging | Automated diagnosis from scans (X-ray/MRI) | Patient imagery is sensitive and often requires de-identification |
| Industrial and manufacturing | Visual inspection of production lines | Cameras may capture workers or sensitive items, so anonymisation may be needed |
| Retail & customer analytics | Customer behaviour tracking, product recognition | Visual analytics must consider privacy and data protection when persons are visible |

Core technologies and methods

Key technical components

| Technology | Purpose | Notes |
|---|---|---|
| Convolutional Neural Networks (CNNs) | Feature extraction from image data, classification & detection | Foundational for many computer-vision models |
| Semantic & instance segmentation | Pixel-level labelling of objects/regions | Enables fine-grained masking beyond bounding boxes |
| Object detection | Locating and classifying objects in images or frames | Yields bounding boxes / masks, essential for anonymisation |
| Object tracking | Following objects across sequential frames (video) | Ensures consistency of anonymisation across time |
| Optical Character Recognition (OCR) | Extracting text from images/video (e.g., licence plates) | Supports anonymising textual PII in vision feeds |
| Depth estimation / 3D reconstruction | Recovering 3D structure or depth from visual data | Helps in scene understanding when multi-sensor data is available |
| Attention / transformer models in vision | Modelling spatial/temporal dependencies in visual data | Emerging in advanced CV systems for robust performance (arXiv) |
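
To illustrate how object tracking keeps anonymisation consistent across frames, the sketch below links detections in a new frame to existing tracks by bounding-box overlap (intersection over union, IoU). It is a deliberately simplified, hypothetical example: the function names, the greedy matching strategy and the 0.3 threshold are assumptions, and production trackers typically add motion models and handle temporarily lost objects.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, detections, next_id, min_iou=0.3):
    """Greedy frame-to-frame association.

    tracks: dict mapping track id -> box from the previous frame.
    detections: list of boxes found in the current frame.
    Returns (updated_tracks, next_id). Tracks with no match are dropped,
    and unmatched detections start new tracks.
    """
    updated = {}
    unclaimed = list(range(len(detections)))
    for track_id, box in tracks.items():
        best = max(unclaimed, key=lambda i: iou(box, detections[i]), default=None)
        if best is not None and iou(box, detections[best]) >= min_iou:
            updated[track_id] = detections[best]
            unclaimed.remove(best)
    for i in unclaimed:
        updated[next_id] = detections[i]
        next_id += 1
    return updated, next_id

# Previous frame: two tracked faces. Current frame: both moved slightly, one new face.
tracks = {0: (10, 10, 50, 50), 1: (100, 100, 140, 140)}
detections = [(102, 101, 142, 141), (12, 11, 52, 51), (200, 200, 230, 230)]
tracks, next_id = associate(tracks, detections, next_id=2)
print(tracks)
```

Because each identity keeps its id from frame to frame, the same mask can follow the same person instead of being re-decided independently per frame.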

Quality metrics and performance targets

| Metric | Typical benchmark / target range | Importance in real-time vision / anonymisation |
|---|---|---|
| Accuracy (classification) | e.g., ≥ 90% in controlled settings | Indicates correctness of classification subsystems |
| mAP (mean Average Precision) for detection | ~0.5-0.9 depending on dataset/complexity | Measures how well objects are detected/localised |
| Frame rate (FPS) | ≥ 25-30 fps for real-time video | Needed to maintain fluid processing and timely anonymisation |
| Latency (response time) | ≤ 100-200 ms (real-time systems) | Critical so anonymisation occurs promptly, avoiding exposure |
| False positive / false negative rates | Ideally < 5-10% in high-risk use cases | Balancing FP/FN is essential in anonymisation workflows |
| Hardware/inference resources | GPU/TPU/edge ASICs required for high throughput | Infrastructure impacts feasibility and cost |
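
Several of the metrics above reduce to simple arithmetic once detections have been matched to ground truth. The sketch below shows, under stated assumptions, how precision, recall and the false-negative rate follow from true-positive, false-positive and false-negative counts, and how a frame-rate target translates into a per-frame time budget. The function names and the example numbers are illustrative only, and the budget check assumes frames are processed one at a time with no pipelining.

```python
def detection_rates(tp, fp, fn):
    """Precision, recall and false-negative rate from matched detection counts.

    tp: detections matching a ground-truth object
    fp: detections with no matching object (would cause over-masking)
    fn: ground-truth objects that were missed (would cause exposure)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fn_rate = fn / (tp + fn) if tp + fn else 0.0  # equals 1 - recall
    return {"precision": precision, "recall": recall, "false_negative_rate": fn_rate}

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def fits_frame_budget(stage_latencies_ms, fps):
    """True if the summed per-stage latencies fit within one frame interval."""
    return sum(stage_latencies_ms) <= frame_budget_ms(fps)

rates = detection_rates(tp=90, fp=5, fn=10)
print(rates)

# Hypothetical per-stage timings: decode, detect, track, mask (milliseconds).
stages = [8, 18, 4, 6]
print(frame_budget_ms(25), fits_frame_budget(stages, 25))
print(fits_frame_budget(stages, 30))
```

In this example 36 ms of processing fits the 40 ms budget at 25 fps but not the roughly 33 ms budget at 30 fps, which is the kind of trade-off the table's frame-rate and latency rows describe.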

Challenges and limitations

| Challenge | Description | Impact on anonymisation or operational use |
|---|---|---|
| Variable lighting, reflections, weather | Poor or changing illumination degrades detection accuracy | May increase false negatives (e.g., faces not detected) |
| Complex backgrounds, occlusion, crowding | Objects may be partially hidden or overlap | Harder to reliably detect and mask sensitive elements |
| Limited or biased training data | Insufficient coverage of real-world variation reduces model robustness | May produce errors or propagate bias in detection |
| Real-time processing constraints | High resolution or multiple streams raise computational demands | May force trade-offs: lower accuracy or slower processing |
| Privacy, legal and ethical issues | Visual data often contains PII; regulatory compliance required | Systems must integrate anonymisation, auditing, DPIA |
| Inverse problem / 3D from 2D ambiguity | Recovering scene geometry from an image alone is ill-posed (Wikipedia) | May impair localisation precision for anonymisation tasks |

Specific considerations for anonymisation of visual data

In applications where computer vision supports anonymisation of images and video, the following operational aspects are particularly relevant:

  • Systems must reliably detect personal identifiers (faces, bodies, objects, licence plates) across frames and modalities.
  • Localisation (bounding boxes or segmentation masks) must be sufficiently accurate to cover the sensitive region without excessive non‑sensitive coverage.
  • For video/live streams, detection, tracking and masking must be synchronised with minimal latency and drift to avoid exposure or artefacts.
  • False negatives (missed identifiers) pose privacy and regulatory risk; false positives (over‑masking) reduce utility of material.
  • Detailed logging and audit trails (which object was detected, when, what mask applied) support compliance and enable oversight by data protection officers.
  • Infrastructure & operations must handle the scale (high resolution, multiple streams, edge/cloud hybrid), while maintaining data security (encryption in transit & at rest), access controls and retention policies.
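
The trade-off between missed regions and over-masking can be illustrated with a small sketch that pads a detected bounding box by a safety margin and then overwrites the region. This is an assumption-laden example in plain Python: the names pad_box and redact, the margin value, and the choice of a solid fill (rather than blur or pixelation) are all illustrative, not a prescribed method.

```python
def pad_box(box, margin, width, height):
    """Expand (x1, y1, x2, y2) by `margin` pixels on each side, clipped to the image.

    A larger margin lowers the risk of leaving part of a face or plate visible,
    at the cost of masking more non-sensitive pixels.
    """
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))

def redact(image, box, fill=0):
    """Return a copy of `image` with the region [x1, x2) x [y1, y2) set to `fill`."""
    x1, y1, x2, y2 = box
    out = [row[:] for row in image]  # copy so the original frame is left untouched
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = fill
    return out

# 6x6 grayscale frame with a hypothetical detection covering a 2x2 region.
frame = [[9] * 6 for _ in range(6)]
detected = (2, 2, 4, 4)

padded = pad_box(detected, margin=1, width=6, height=6)
masked = redact(frame, padded)
masked_pixels = sum(row.count(0) for row in masked)
print(padded, masked_pixels)  # (1, 1, 5, 5) 16
```

Here the margin grows the masked area from 4 to 16 pixels, which shows how quickly padding reduces the usable part of a frame when detections are small.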

Normative and technical references

  • ISO/IEC 22989:2022 - Information technology - Artificial intelligence - Artificial intelligence concepts and terminology (covers computer vision concepts).
  • ISO/IEC TR 24029-1:2021 - Artificial Intelligence (AI) - Assessment of the robustness of neural networks - Part 1: Overview (relevant for vision systems).
  • European Data Protection Board (EDPB) Guidelines 3/2019 on processing of personal data through video devices - emphasises appropriate technical measures and risk assessment in video systems.
  • Industry definitions:
    • IBM: “Computer vision is a field of artificial intelligence (AI) that uses machine learning and neural networks to teach computers and systems to derive meaningful information from digital images, videos and other visual inputs.”
    • Microsoft Azure: “Computer vision enables machines to interpret, analyse, and pull meaningful data from images and videos, replicating human sight and cognitive abilities.”
  • Standard datasets and benchmarks: COCO (Common Objects in Context), ImageNet, OpenImages - used widely to validate vision model performance.

Implementation guidance

  • Select suitable models depending on anonymisation objective (for example face detection → MTCNN or RetinaFace; general object detection → YOLOv8).
  • Prepare representative datasets for training or validation that reflect operational conditions (camera angles, lighting, crowd density).
  • Measure baseline detection and localisation metrics (e.g., mAP, latency, false-negative rate) in the actual operational environment.
  • Deploy pipeline: image capture → object detection → tracking (if video) → localisation → mask/blur/redaction → output. Ensure end‑to‑end latency is within acceptable bounds.
  • Provide audit/traceability: record detection events, applied anonymisation actions, timestamps - enables oversight by DPOs and evidence of compliance.
  • Secure deployment: ensure input/transmission streams are encrypted, access to model outputs is controlled, anonymised data is retained only as needed, and documentation (DPIA) is maintained.
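
The pipeline and audit steps above might be wired together roughly as follows. This is a sketch under explicit assumptions: detect_stub is a hypothetical placeholder for a real detector such as those named above, the audit record fields are illustrative rather than a required schema, the tracking step is omitted for brevity, and the 200 ms figure is only an example latency bound.

```python
import json
import time
from datetime import datetime, timezone

def detect_stub(frame):
    """Hypothetical detector standing in for a real model.

    Returns a list of (label, box, score) with box as (x1, y1, x2, y2).
    """
    return [("face", (1, 1, 3, 3), 0.92)]

def mask(frame, box, fill=0):
    """Overwrite the box region of `frame` in place with a solid fill."""
    x1, y1, x2, y2 = box
    for y in range(y1, y2):
        for x in range(x1, x2):
            frame[y][x] = fill

def process_frame(frame, frame_index, audit_log, detector=detect_stub):
    """Detect, mask and log one frame; returns (frame, latency in milliseconds)."""
    started = time.perf_counter()
    for label, box, score in detector(frame):
        mask(frame, box)
        audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "frame": frame_index,
            "label": label,
            "box": list(box),
            "score": score,
            "action": "solid_mask",
        })
    latency_ms = (time.perf_counter() - started) * 1000
    return frame, latency_ms

audit_log = []
frame = [[255] * 4 for _ in range(4)]
frame, latency_ms = process_frame(frame, frame_index=0, audit_log=audit_log)

print(json.dumps(audit_log[0]))
print("within example bound:", latency_ms <= 200)
```

Keeping the audit record free of pixel data means the log itself does not become a second store of personal data, while still showing what was detected, when, and which action was applied.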