Video Frame Sampling - definition
Video Frame Sampling is the controlled selection of a subset of frames from a video sequence in order to reduce computational cost, data volume, or to tailor processing to specific task requirements. In practice, it involves selecting every n-th frame, keyframes, frames at scene changes, or frames chosen according to an adaptive rule. This concept differs from frame rate conversion (changing FPS): sampling selects frames for analysis rather than altering the playback rate, and it does not necessarily require re-encoding the video stream.
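The simplest of these rules, selecting every n-th frame, can be sketched as pure index arithmetic; this minimal example assumes nothing about any decoding library and operates on frame indices only:

```python
def sample_every_n(num_frames: int, stride: int) -> list[int]:
    """Return indices of frames selected with a fixed temporal stride."""
    return list(range(0, num_frames, stride))

# A 10-second clip at 25 fps (250 frames), sampled every 5th frame:
selected = sample_every_n(250, 5)
print(len(selected))   # 50 frames analyzed instead of 250
print(selected[:4])    # [0, 5, 10, 15]
```

In a real pipeline these indices would be passed to a decoder or detector; only the selection logic is shown here.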
In the context of image and video anonymization, Video Frame Sampling defines how many and which frames are analyzed by algorithms for face detection, license plate detection, tracking, and masking operations. The choice of sampling strategy and density has a direct impact on anonymization completeness as well as on computational cost and processing time. Sampling decisions must therefore account for scene dynamics, the source FPS of the material (typically 25 fps for PAL-derived sources or 29.97 fps for NTSC-derived sources), and GDPR compliance requirements.
The role of Video Frame Sampling in anonymization
Anonymizing faces and license plates requires detecting every instance where an object appears in the footage. Sampling too sparsely may miss short-lived exposures, rapid head turns, or objects visible for only a few frames. Sampling too densely increases computational cost and processing time without a meaningful gain in effectiveness, particularly in static or slow-changing scenes.
In practice, anonymization pipelines combine detection on selected frames with inter-frame tracking to interpolate masks on frames that are not fully analyzed. Tracking methods (e.g., SORT, DeepSORT) reduce the number of detector invocations while maintaining continuous blurring of objects between sampled frames. The obligation to implement appropriate technical and organizational measures stems from the GDPR (Articles 5 and 32) and EDPB guidance on processing data from video devices, which emphasizes reducing the identifiability of individuals (source: EDPB, Guidelines 3/2019, version 2.1, 20 January 2022).
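A minimal sketch of the mask-interpolation idea, assuming axis-aligned (x, y, w, h) boxes and a single tracked object. Note this is a deliberate simplification: real trackers such as SORT/DeepSORT use Kalman-filter motion models and (for DeepSORT) appearance matching, not plain linear interpolation:

```python
def interpolate_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes for t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))

def masks_between(frame_a, box_a, frame_b, box_b):
    """Approximate mask boxes for the unsampled frames between two
    detections of the same tracked object (a stand-in for full tracking)."""
    span = frame_b - frame_a
    return {
        f: interpolate_box(box_a, box_b, (f - frame_a) / span)
        for f in range(frame_a + 1, frame_b)
    }

# Detections on sampled frames 10 and 15; fill the gap on frames 11-14
masks = masks_between(10, (100, 50, 40, 40), 15, (120, 50, 40, 40))
```

Here `masks[12]` is the box shifted 40% of the way between the two detections; the blur is then applied to these interpolated regions on the unanalyzed frames.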
Sampling technologies and strategies
The choice of a frame sampling strategy depends on the type of footage, the codec, and the target detection performance. Below is an overview of the most commonly used approaches and their implications for video anonymization.
| Strategy | Description | Use in anonymization | Risk of missed detections | Computational complexity |
|---|---|---|---|---|
| Uniform every n frames | Fixed temporal step, e.g., every 2nd or 5th frame | Simple cost control, predictable behavior | Medium - short exposures may be missed | Low |
| Keyframe-based sampling | Analysis of I-frames from the GOP in H.264/H.265 | Efficient for footage with a regular GOP structure | Medium to high with long GOPs | Low to medium |
| Scene change detection | Frames selected at abrupt content changes | Focus on moments with the highest variability | Lower in dynamic scenes, higher in static ones | Medium |
| Motion-adaptive sampling | Denser sampling during high motion, sparser when static | Good balance between cost and event coverage | Low to medium | Medium |
| Keyframe + tracking | Detection on base frames, mask interpolation via tracking | Common in video detection, effective for anonymization | Low with stable tracking | Medium |
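The motion-adaptive row of the table can be sketched as follows. This is a hedged illustration, assuming a precomputed per-frame motion score in [0, 1] (e.g., normalized mean absolute frame difference); the thresholds and strides are arbitrary examples, not recommended values:

```python
def motion_adaptive_indices(motion, base_stride=5, dense_stride=1, threshold=0.5):
    """Select frame indices: dense sampling while the motion score is at or
    above the threshold, sparse uniform sampling otherwise."""
    selected = []
    next_allowed = 0
    for i, m in enumerate(motion):
        if i >= next_allowed:
            selected.append(i)
            # Stride until the next analyzed frame depends on current motion
            next_allowed = i + (dense_stride if m >= threshold else base_stride)
    return selected

# Low motion for 10 frames, a burst of motion for 5, then low motion again
scores = [0.1] * 10 + [0.9] * 5 + [0.1] * 10
print(motion_adaptive_indices(scores))
# [0, 5, 10, 11, 12, 13, 14, 15, 20]
```

The sampler falls back to the base stride in static segments and analyzes every frame during the motion burst, which is where short-lived exposures are most likely.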
At the codec level, GOP structures and I/P/B frames are defined in ITU-T H.264 | ISO/IEC 14496-10 (AVC) and ISO/IEC 23008-2 (HEVC). Using I-frames as samples is an engineering practice that can reduce decoding and analysis cost compared to processing every frame, although in many workflows the sequence (or parts of it) is decoded anyway depending on tools and formats.
Key parameters and metrics in anonymization
Evaluating sampling effectiveness should combine temporal parameters with detection and compliance metrics. The most important attributes are summarized below.
| Parameter / metric | Description and relevance |
|---|---|
| Stride k | Fixed sampling step in frames. The larger k is, the lower the cost and the higher the risk of missed detections. |
| Effective FPS f_eff | f_eff = f_src / k, where f_src is the source frame rate. Determines the temporal density of masking. |
| Maximum temporal gap | Δt_max ≈ 1 / f_eff. Approximate upper bound of the window between two analyzed frames (for uniform sampling); in practice, unmasked gaps should not occur if masks are propagated via tracking. |
| Recall_video | Percentage of all face/license plate occurrences in the entire video that were masked. Critical for compliance. |
| Precision_video | Percentage of applied masks that correspond to real objects. Affects post-processing visual quality. |
| F1_video | Harmonic mean of precision and recall, enabling comparison of sampling variants. |
| Processing latency | Time from start to completion of anonymization. Important in batch processing. Gallio PRO does not perform real-time anonymization. |
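The quantities above can be computed together from a handful of counts. The helper below is illustrative (the function and parameter names are ours, not from any specific tool) and follows the definitions f_eff = f_src / k and Δt_max ≈ 1 / f_eff:

```python
def sampling_metrics(f_src, stride, true_occurrences, masked_true, masks_applied):
    """Derived sampling/detection metrics.
    true_occurrences: ground-truth face/plate instances in the video
    masked_true:      of those, how many received a mask (true positives)
    masks_applied:    total masks applied (true + false positives)"""
    f_eff = f_src / stride
    dt_max = 1.0 / f_eff  # seconds between analyzed frames (uniform sampling)
    recall = masked_true / true_occurrences
    precision = masked_true / masks_applied
    f1 = 2 * precision * recall / (precision + recall)
    return {"f_eff": f_eff, "dt_max": dt_max,
            "recall": recall, "precision": precision, "f1": f1}

m = sampling_metrics(f_src=25, stride=5, true_occurrences=200,
                     masked_true=190, masks_applied=210)
# f_eff = 5.0 fps, dt_max = 0.2 s, recall = 0.95
```

With a stride of 5 on 25 fps material, the detector sees an effective 5 fps and a face missing from all sampled frames can stay unmasked for up to 0.2 s unless tracking fills the gap.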
In practice, sampling is combined with CNN-based detectors and inter-frame tracking, as demonstrated in research on video object detection with temporal aggregation (FGFA) and DeepSORT tracking. Reducing detector calls while maintaining result continuity is key to balancing cost and coverage.
Challenges and limitations
Sampling choices are constrained by the technical properties of the footage and by legal requirements. Missed detections are more frequent with motion blur, low exposure, rolling shutter artifacts, and in footage with intense motion.
- Compliance risk - any missed face or license plate weakens anonymization effectiveness. The EDPB highlights the need for appropriate technical and organizational measures to reduce identifiability in published materials (source: EDPB Guidelines 3/2019).
- GOP structure - long GOPs in H.264/H.265 make sampling based solely on I-frames less effective.
- FPS variability - sources arrive at different standard frame rates defined by ITU-R and SMPTE, so sampling parameters must be adapted to the source rate to keep Δt_max within bounds.
- AI models - effectiveness depends on well-trained face and license plate detectors. Deep learning models trained on representative data are essential for automated blurring.
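Adapting the stride to the source rate follows directly from k / f_src ≤ Δt_max: the largest admissible stride is the floor of f_src times the target gap. A small illustrative helper, assuming uniform sampling (the 0.4 s target below is an example, not a recommendation):

```python
import math

def max_stride(f_src: float, dt_target: float) -> int:
    """Largest stride k such that the gap between analyzed frames,
    k / f_src, does not exceed dt_target (in seconds)."""
    return max(1, math.floor(f_src * dt_target))

# Roughly ensure an object visible for >= 0.4 s falls on a sampled frame
print(max_stride(25.0, 0.4))    # 10
print(max_stride(29.97, 0.4))   # 11
```

Note the two source rates yield different strides for the same target gap, which is exactly why a fixed stride tuned for one standard can under-sample material in another.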
Use cases and implementation practice
In on-premise batch processing, a keyframe + tracking strategy is commonly used: detection on base frames, mask propagation via tracks, followed by selective densification in segments with high uncertainty. This approach reduces cost while maintaining high anonymization coverage.
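The "selective densification" step can be sketched as a simple heuristic: re-analyze every frame inside gaps whose bounding detections both carry high tracking uncertainty. This is an illustrative simplification; production pipelines may instead use tracker confidence, IoU drift, or detector scores, and the threshold here is arbitrary:

```python
def densify_segments(base_indices, uncertainty, threshold=0.6):
    """Given sampled frame indices and a per-sampled-frame uncertainty
    score, return extra frame indices to analyze: all frames in gaps
    whose endpoints are both uncertain."""
    extra = []
    for i in range(len(base_indices) - 1):
        a, b = base_indices[i], base_indices[i + 1]
        if uncertainty[i] >= threshold and uncertainty[i + 1] >= threshold:
            extra.extend(range(a + 1, b))
    return extra

base = [0, 5, 10, 15]
unc = [0.2, 0.8, 0.9, 0.3]
print(densify_segments(base, unc))   # [6, 7, 8, 9]
```

Only the gap bounded by two uncertain detections (frames 5 and 10) is densified; confident segments keep the cheap base stride.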
- Gallio PRO automatically blurs faces and license plates. It does not support automatic detection of logos, tattoos, documents, or screen content; these elements can be blurred manually in the editor.
- Gallio PRO does not perform real-time anonymization or process live streams. Frame sampling applies to offline processing of video files and images.
- Gallio PRO operates on-premise and does not store logs containing face or license plate detection results.
- In the EU, blurring license plates is often recommended when publishing footage, depending on context and legal basis. Supervisory authority practices vary by country. In Poland, interpretations can be ambiguous, although guidance from the DPA (UODO) and the EDPB points to data minimization.
Standards and references
The following standards and technical publications are relevant to frame sampling, video codecs, and GDPR compliance.
- ITU-R BT.709-6 - Parameter values for the HDTV standards for production and international programme exchange, 2015. https://www.itu.int/rec/R-REC-BT.709
- ITU-T H.264 | ISO/IEC 14496-10 - Advanced Video Coding, 2019 edition. https://www.itu.int/rec/T-REC-H.264 and https://www.iso.org/standard/76682.html
- ISO/IEC 23008-2:2020 - High efficiency coding and media delivery in heterogeneous environments - Part 2: HEVC. https://www.iso.org/standard/79388.html
- IEC 62676-4:2014 - Video surveillance systems for use in security applications - Part 4: Application guidelines. https://webstore.iec.ch/publication/6027
- EDPB, Guidelines 3/2019 on the processing of personal data through video devices, version 2.1, 20 January 2022. https://edpb.europa.eu
- X. Zhu et al., Flow-Guided Feature Aggregation for Video Object Detection, ICCV 2017. https://openaccess.thecvf.com/content_iccv_2017/html/Zhu_Flow-Guided_Feature_Aggregation_ICCV_2017_paper.html
- N. Wojke et al., Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT), 2017. https://arxiv.org/abs/1703.07402