What is Video Frame Sampling?

Video Frame Sampling - definition

Video Frame Sampling is the controlled selection of a subset of frames from a video sequence in order to reduce computational cost or data volume, or to tailor processing to specific task requirements. In practice, it involves selecting every n-th frame, keyframes, frames at scene changes, or frames chosen according to an adaptive rule. Sampling differs from frame rate conversion (changing the FPS of the output): it selects frames for analysis and does not necessarily require re-encoding the video stream.
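The simplest of these strategies, uniform every-n-th-frame selection, can be sketched in a few lines (the function name and signature are illustrative, not part of any particular library):

```python
def uniform_sample_indices(total_frames: int, stride: int) -> list[int]:
    """Select every `stride`-th frame index from a video of `total_frames` frames."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, total_frames, stride))

# A 10-second clip at 25 fps (250 frames) sampled with stride 5
# yields 50 analyzed frames, i.e., an effective rate of 5 fps.
indices = uniform_sample_indices(total_frames=250, stride=5)
```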

In the context of image and video anonymization, Video Frame Sampling determines how many frames, and which ones, are analyzed by face detection, license plate detection, tracking, and masking algorithms. The choice of sampling strategy and density directly affects anonymization completeness as well as computational cost and processing time. Sampling decisions must therefore account for scene dynamics, the source frame rate (typically 25 or 29.97 fps), and GDPR compliance requirements.

The role of Video Frame Sampling in anonymization

Anonymizing faces and license plates requires detecting every instance where an object appears in the footage. Sampling too sparsely may miss short-lived exposures, rapid head turns, or objects visible for only a few frames. Sampling too densely increases computational cost and processing time, often without a meaningful gain in coverage, especially in low-motion scenes.

In practice, anonymization pipelines combine detection on selected frames with inter-frame tracking to interpolate masks on frames that are not fully analyzed. Tracking methods (e.g., SORT, DeepSORT) reduce the number of detector invocations while maintaining continuous blurring of objects between sampled frames. The obligation to implement appropriate technical and organizational measures stems from the GDPR (Articles 5 and 32) and EDPB guidance on processing data from video devices, which emphasizes reducing the identifiability of individuals (source: EDPB, Guidelines 3/2019, version 2.1, 20 January 2022).
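The interpolation step above can be illustrated with a minimal sketch. Production pipelines use trackers such as SORT or DeepSORT; here, linear interpolation of bounding boxes between two detected frames stands in for tracking (all names are illustrative):

```python
def interpolate_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes for t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

def propagate_masks(frame_a, frame_b, box_a, box_b):
    """Yield (frame_index, box) for every frame between two sampled detections,
    so masking stays continuous on frames the detector never saw."""
    span = frame_b - frame_a
    for i in range(frame_a, frame_b + 1):
        yield i, interpolate_box(box_a, box_b, (i - frame_a) / span)
```

A real tracker additionally handles identity association, occlusion, and non-linear motion; linear interpolation is only a reasonable fallback over short gaps.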

Sampling technologies and strategies

The choice of a frame sampling strategy depends on the type of footage, the codec, and the target detection performance. Below is an overview of the most commonly used approaches and their implications for video anonymization.

| Strategy | Description | Use in anonymization | Risk of missed detections | Computational complexity |
| --- | --- | --- | --- | --- |
| Uniform every n frames | Fixed temporal step, e.g., every 2nd or 5th frame | Simple cost control, predictable behavior | Medium: short exposures may be missed | Low |
| Keyframe-based sampling | Analysis of I-frames from the GOP in H.264/H.265 | Efficient for footage with a regular GOP structure | Medium to high with long GOPs | Low to medium |
| Scene change detection | Frames selected at abrupt content changes | Focus on moments with the highest variability | Lower in dynamic scenes, higher in static ones | Medium |
| Motion-adaptive sampling | Denser sampling during high motion, sparser when static | Good balance between cost and event coverage | Low to medium | Medium |
| Keyframe + tracking | Detection on base frames, mask interpolation via tracking | Common in video detection, effective for anonymization | Low with stable tracking | Medium |

At the codec level, GOP structures and I/P/B frames are defined in ITU-T H.264 | ISO/IEC 14496-10 (AVC) and ISO/IEC 23008-2 (HEVC). Using I-frames as samples is an engineering practice that can reduce decoding and analysis cost compared to processing every frame, although in many workflows the sequence (or parts of it) is decoded anyway depending on tools and formats.
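A back-of-the-envelope calculation shows why long GOPs make I-frame-only sampling risky: with one I-frame per GOP, the gap between analyzed frames grows linearly with GOP length (the helper below is illustrative):

```python
def iframe_gap_seconds(gop_size: int, fps: float) -> float:
    """Temporal gap between consecutive analyzed frames when sampling
    only I-frames, assuming one I-frame opens each closed GOP."""
    return gop_size / fps

# A short GOP of 50 frames at 25 fps leaves a 2 s window between samples;
# a long GOP of 250 frames leaves 10 s -- far too sparse for anonymization
# unless masks are propagated by tracking in between.
```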

Key parameters and metrics in anonymization

Evaluating sampling effectiveness should combine temporal parameters with detection and compliance metrics. The most important attributes are summarized below.

| Parameter / metric | Description and relevance |
| --- | --- |
| Stride k | Fixed sampling step in frames. The larger k, the lower the cost and the higher the risk of missed detections. |
| Effective FPS f_eff | f_eff = f_src / k, where f_src is the source frame rate. Determines the temporal density of masking. |
| Maximum temporal gap | Δt_max ≈ 1 / f_eff: the approximate upper bound on the window between two analyzed frames under uniform sampling. In practice, unmasked gaps should not occur if masks are propagated via tracking. |
| Recall_video | Percentage of all face/license plate occurrences in the entire video that were masked. Critical for compliance. |
| Precision_video | Percentage of applied masks that correspond to real objects. Affects post-processing visual quality. |
| F1_video | Harmonic mean of precision and recall, enabling comparison of sampling variants. |
| Processing latency | Time from start to completion of anonymization. Important in batch processing. Gallio PRO does not perform real-time anonymization. |
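These parameters translate directly into a few lines of code. A minimal sketch of the effective-rate and quality metrics from the table (function names are illustrative):

```python
def effective_fps(f_src: float, k: int) -> float:
    """Effective analysis rate for uniform stride k: f_eff = f_src / k."""
    return f_src / k

def video_metrics(masked_occurrences: int, total_occurrences: int, total_masks: int):
    """Recall_video, Precision_video, and F1_video from occurrence counts.

    masked_occurrences: occurrences correctly masked (true positives)
    total_occurrences:  all face/plate occurrences in the video (ground truth)
    total_masks:        all masks applied (correct + spurious)
    """
    recall = masked_occurrences / total_occurrences
    precision = masked_occurrences / total_masks
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1
```

For example, at f_src = 25 fps and k = 5, f_eff is 5 fps and Δt_max ≈ 0.2 s.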

In practice, sampling is combined with CNN-based detectors and inter-frame tracking, as demonstrated in research on video object detection with temporal aggregation (FGFA) and DeepSORT tracking. Reducing detector calls while maintaining result continuity is key to balancing cost and coverage.
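One way to reduce detector calls is the motion-adaptive strategy from the table above. The sketch below uses mean absolute frame difference as a crude motion proxy; the threshold, stride, and flat-pixel-list representation are simplifying assumptions for illustration:

```python
def motion_adaptive_indices(frames, base_stride=5, motion_threshold=10.0):
    """Select frame indices: every frame in high-motion segments,
    every `base_stride`-th frame otherwise.

    `frames` is a list of equally sized flat pixel lists (one per frame).
    """
    def motion(a, b):
        # Mean absolute pixel difference between two consecutive frames.
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    selected = [0]
    for i in range(1, len(frames)):
        if motion(frames[i - 1], frames[i]) >= motion_threshold:
            selected.append(i)          # dense sampling while objects move
        elif i - selected[-1] >= base_stride:
            selected.append(i)          # fall back to uniform stride when static
    return selected
```

A production system would compute motion on downscaled frames or reuse codec motion vectors instead of raw pixel differences.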

Challenges and limitations

Sampling choices are constrained by the technical properties of the footage and by legal requirements. Missed detections are more frequent with motion blur, low exposure, rolling shutter artifacts, and in footage with intense motion.

  • Compliance risk - any missed face or license plate weakens anonymization effectiveness. The EDPB highlights the need for appropriate technical and organizational measures to reduce identifiability in published materials (source: EDPB Guidelines 3/2019).
  • GOP structure - long GOPs in H.264/H.265 make sampling based solely on I-frames less effective.
  • FPS variability - standard frame rates defined by ITU-R and SMPTE require adapting sampling parameters to the source to limit Δt_max.
  • AI models - effectiveness depends on well-trained face and license plate detectors. Deep learning models trained on representative data are essential for automated blurring.
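The FPS-variability point can be handled by deriving the stride from the source rate and a target gap, rather than hard-coding k. A minimal sketch (the function name is illustrative):

```python
import math

def stride_for_max_gap(f_src: float, max_gap_seconds: float) -> int:
    """Largest uniform stride k that keeps the analysis gap k / f_src
    at or below `max_gap_seconds` for a source running at `f_src` fps."""
    return max(1, math.floor(f_src * max_gap_seconds))

# For a 0.2 s target gap: 25 fps and 29.97 fps both yield k = 5,
# so the same Δt_max budget adapts automatically to the source rate.
```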

Use cases and implementation practice

In on-premise batch processing, a keyframe + tracking strategy is commonly used: detection on base frames, mask propagation via tracks, followed by selective densification in segments with high uncertainty. This approach reduces cost while maintaining high anonymization coverage.
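The selective-densification step described above can be expressed as a simple per-segment policy: segments where tracking is uncertain get a denser re-analysis stride. The sketch below assumes a per-segment confidence score in [0, 1]; all names and defaults are illustrative:

```python
def densify_segments(segment_confidence, base_stride=10, dense_stride=2,
                     threshold=0.8):
    """Map each segment's tracking confidence to a sampling stride:
    low-confidence segments are re-analyzed with the denser stride."""
    return [dense_stride if c < threshold else base_stride
            for c in segment_confidence]
```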

  • Gallio PRO automatically blurs faces and license plates. It does not support automatic detection of logos, tattoos, documents, or screen content; these elements can be blurred manually in the editor.
  • Gallio PRO does not perform real-time anonymization or process live streams. Frame sampling applies to offline processing of video files and images.
  • Gallio PRO operates on-premise and does not store logs containing face or license plate detection results.
  • In the EU, blurring license plates is often recommended when publishing footage, depending on context and legal basis. Supervisory authority practices vary by country. In Poland, interpretations can be ambiguous, although guidance from the DPA (UODO) and the EDPB points to data minimization.

Standards and references

The following standards and technical publications are relevant to frame sampling, video codecs, and GDPR compliance.

  • ITU-R BT.709-6 - Parameter values for the HDTV standards for production and international programme exchange, 2015. https://www.itu.int/rec/R-REC-BT.709
  • ITU-T H.264 | ISO/IEC 14496-10 - Advanced Video Coding, 2019 edition. https://www.itu.int/rec/T-REC-H.264 and https://www.iso.org/standard/76682.html
  • ISO/IEC 23008-2:2020 - High efficiency coding and media delivery in heterogeneous environments - Part 2: HEVC. https://www.iso.org/standard/79388.html
  • IEC 62676-4:2014 - Video surveillance systems for use in security applications - Part 4: Application guidelines. https://webstore.iec.ch/publication/6027
  • EDPB, Guidelines 3/2019 on the processing of personal data through video devices, version 2.1, 20 January 2022. https://edpb.europa.eu
  • X. Zhu et al., Flow-Guided Feature Aggregation for Video Object Detection, ICCV 2017. https://openaccess.thecvf.com/content_iccv_2017/html/Zhu_Flow-Guided_Feature_Aggregation_ICCV_2017_paper.html
  • N. Wojke et al., Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT), 2017. https://arxiv.org/abs/1703.07402