What is a Privacy Budget?

Privacy Budget - definition

Privacy Budget is a measure of the total allowable privacy loss in a system that uses differential privacy mechanisms. Formally, it is expressed through the parameters of (ε, δ)-differential privacy, where ε (epsilon) and δ (delta) set an upper bound on the risk of disclosing information about any single record. The privacy budget accumulates with each operation performed on the data (a property known as composition) and caps the total amount of privacy “spent” across multiple queries or training iterations.

Differential privacy definition: a mechanism M provides (ε, δ)-DP if, for any neighboring datasets D and D′ and for any set of outputs S, the following holds: P[M(D) ∈ S] ≤ e^ε · P[M(D′) ∈ S] + δ (Dwork et al., 2006; Dwork & Roth, 2014). In practice, the privacy budget describes how much total “ε” (and the corresponding “δ”) can be spent across all data operations.
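The definition above can be illustrated with the classic Laplace mechanism. The following is a minimal sketch, assuming a sensitivity-1 counting query (function names here are illustrative, not from any particular library); it checks numerically that the density ratio between outputs on neighboring datasets never exceeds e^ε, which is exactly the (ε, 0)-DP guarantee:

```python
import math
import random

def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to sensitivity/epsilon.

    For a counting query (sensitivity 1) this satisfies pure epsilon-DP,
    i.e. (epsilon, 0)-DP in the notation above.
    """
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_count + noise

def laplace_density(x, mu, scale):
    """Density of the Laplace(mu, scale) distribution at x."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

# Neighboring datasets differ in one record: true counts 100 vs 101.
epsilon = 0.5
scale = 1.0 / epsilon
# The DP inequality bounds the output-density ratio at every point x:
for x in (99.0, 100.5, 103.0):
    ratio = laplace_density(x, 100, scale) / laplace_density(x, 101, scale)
    assert ratio <= math.exp(epsilon) + 1e-9
```

Each call to `laplace_mechanism` “spends” ε from the budget; repeated releases compose as described below.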

Translations: EN: Privacy Budget; DE: Datenschutzbudget / Privatsphärenbudget; FR: Budget de confidentialité; ES: Presupuesto de privacidad.

Role in image and video anonymization

In image and video anonymization, the privacy budget becomes critical when machine learning models are trained with explicit control over how much information can leak about individuals appearing in the source material. This is particularly relevant when training models for face detection and license plate detection, which are then used for automated blurring. The use of DP-SGD or privacy-preserving label aggregation methods means that each training step “consumes” part of the privacy budget, while a Data Protection Officer (DPO) can oversee the overall risk level by setting limits for ε and δ.

The blurring process itself (e.g., face blurring in the output video) does not require a privacy budget, provided that no additional data or logs are collected that would allow personal data to be reconstructed. The privacy budget matters at the stage of building AI models from datasets containing images of people and when publishing statistics derived from video datasets. This distinction aligns with the view that differential privacy is a protection mechanism applied during data processing and model training, rather than merely a visual editing technique (WP29, 2014; ISO/IEC 20889:2018; ISO/IEC 27559:2022).

In practice, the privacy budget is tied to specific techniques for adding controlled noise and for privacy accounting. Below is a summary of the most important approaches used when training models for face and license plate blurring.

  • DP-SGD: stochastic gradient descent with gradient clipping and noise addition, providing (ε, δ)-DP at the level of training runs. It offers formal privacy guarantees at the cost of reduced model accuracy (Abadi et al., CCS 2016).
  • PATE: aggregation of labels from multiple “teacher” models with added noise, limiting the information revealed about individual training examples (Papernot et al., ICLR 2017/2018).
  • RDP and accounting: privacy accounting using Rényi Differential Privacy and the moments accountant enables tighter composition bounds and more accurate estimation of the total privacy budget (Mironov, S&P 2017; Abadi et al., 2016).
  • Tools: TensorFlow Privacy and Opacus (PyTorch) libraries implement DP-SGD and privacy accounting, supporting practical control of ε and δ during detector training (TF Privacy and Opacus documentation).
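To make the RDP accounting mentioned above concrete, here is a minimal sketch assuming the non-subsampled Gaussian mechanism on sensitivity-1 queries (`rdp_gaussian` and `eps_from_rdp` are hypothetical names, not a library API). Production accountants in Opacus and TensorFlow Privacy use the much tighter subsampled-Gaussian analysis, so the ε they report for real DP-SGD runs is far smaller:

```python
import math

def rdp_gaussian(alpha, sigma):
    # Renyi DP of one Gaussian-mechanism release of a sensitivity-1 query
    # with noise N(0, sigma^2): eps_alpha = alpha / (2 * sigma^2).
    return alpha / (2 * sigma ** 2)

def eps_from_rdp(steps, sigma, delta, alphas=range(2, 128)):
    # RDP composes additively across steps; then the best order alpha
    # is converted to an (eps, delta)-DP statement.
    best = float("inf")
    for alpha in alphas:
        rdp = steps * rdp_gaussian(alpha, sigma)
        eps = rdp + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

# 100 full-batch releases with noise multiplier sigma = 4 at delta = 1e-5:
eps = eps_from_rdp(steps=100, sigma=4.0, delta=1e-5)
print(f"eps ≈ {eps:.2f}")  # ≈ 15.13 without subsampling
```

The key point survives the simplification: per-step losses add up in RDP space, and the final (ε, δ) is obtained by optimizing the conversion over the Rényi order α (Mironov, 2017).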

Key parameters and metrics (Privacy Budget)

Differential privacy parameters are precisely defined in the literature and standards. In Gallio PRO processes, the most important parameters are those that determine the strength of privacy protection during training of face and license plate detection models.

  • ε (epsilon): strength of the DP guarantee. Smaller ε means stronger privacy protection and usually lower model utility; ε ≥ 0 (Dwork, Roth, 2014).
  • δ (delta): probability of violating the DP guarantee. It should be “negligible” relative to dataset size (Dwork, Roth, 2014).
  • Composition: the total privacy budget grows with the number of queries or epochs. Advanced composition and RDP allow tighter estimation of the cumulative (ε, δ) (Dwork, Roth, 2014; Mironov, 2017).
  • Accountant: moments accountant and RDP accountant methods are used to accurately track budget consumption during training (Abadi et al., 2016; Mironov, 2017).
  • Model utility: mAP / precision / recall for face and license plate detection. Quality degradation depends on ε, δ, and noise calibration (Abadi et al., 2016).
  • Public-sector example: the US Census Bureau used a privacy budget of ε = 12.2 for 2020 Census redistricting data, illustrating large-scale budget allocation (US Census Bureau, 2021).
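The gap between naive and advanced composition can be checked numerically. This sketch implements the advanced composition theorem from Dwork & Roth (2014) for k mechanisms that are each ε-DP (the resulting guarantee is (ε′, kδ + δ′)-DP; function names are illustrative):

```python
import math

def basic_composition(eps, k):
    """k-fold basic composition: per-query budgets add up linearly."""
    return k * eps

def advanced_composition(eps, k, delta_prime):
    """Advanced composition (Dwork & Roth, 2014): k eps-DP mechanisms
    jointly satisfy (eps_prime, k*delta + delta_prime)-DP for this eps_prime."""
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
            + k * eps * (math.exp(eps) - 1))

k, eps, delta_prime = 100, 0.05, 1e-6
print(basic_composition(eps, k))                          # 5.0
print(round(advanced_composition(eps, k, delta_prime), 2))  # ≈ 2.88
```

For many small queries, advanced composition (and, even more so, RDP accounting) yields a noticeably smaller cumulative ε than simple summation, which is why accountants matter in practice.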

Challenges and limitations

Planning a privacy budget requires balancing privacy protection against detection quality. The interpretation of “what ε means” is unintuitive for business users and requires education. It is also important to distinguish between two different meanings of “privacy budget”: in differential privacy it refers to (ε, δ) parameters, while in the browser advertising ecosystem a separate concept was proposed to limit fingerprinting via API query budgets. The latter is not equivalent to DP (see WICG/Chromium discussions under the Privacy Sandbox).

  • No regulatory threshold: EU law does not define acceptable ε/δ values. A risk-based approach and adequacy of anonymization effects are required (GDPR, Recital 26; WP29, 2014).
  • Continuous composition: repeated experiments and model retraining accumulate privacy loss. Proper accounting and “reset” policies on new data are necessary.
  • Quality trade-off: overly restrictive ε values may reduce face or license plate detector mAP to operationally unacceptable levels.

Practical use cases in Gallio PRO

In Gallio PRO deployments, the privacy budget is relevant when building or fine-tuning models that detect faces and license plates for automated blurring. A Data Protection Officer can define (ε, δ) limits for the training process and composition rules.

  • Face detector training with DP-SGD: the number of epochs, sample size, and noise level determine the final (ε, δ). Once the limit is reached, training must stop or be redesigned.
  • Label aggregation: in video labeling projects, PATE mechanisms can be used to control the privacy budget during teacher vote aggregation.
  • Metadata export: publishing dataset statistics (e.g., distribution of faces per frame) should be covered by the same privacy budget.
  • Operational use: Gallio PRO does not perform real-time anonymization and does not collect logs of face or license plate detections, reducing secondary risks related to personal data leakage.
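The DPO-defined limits described above can be enforced with a simple ledger. The following is a hypothetical sketch under basic composition, not a Gallio PRO API; class and operation names are invented for illustration:

```python
class PrivacyBudgetLedger:
    """Hypothetical budget ledger: tracks cumulative (eps, delta) under
    basic composition and refuses operations that would exceed
    DPO-approved limits."""

    def __init__(self, eps_limit, delta_limit):
        self.eps_limit, self.delta_limit = eps_limit, delta_limit
        self.eps_spent, self.delta_spent = 0.0, 0.0

    def charge(self, eps, delta, operation):
        # Reject the operation if it would push either parameter past its limit.
        if (self.eps_spent + eps > self.eps_limit
                or self.delta_spent + delta > self.delta_limit):
            raise RuntimeError(f"budget exceeded: cannot run {operation!r}")
        self.eps_spent += eps
        self.delta_spent += delta

ledger = PrivacyBudgetLedger(eps_limit=3.0, delta_limit=1e-5)
ledger.charge(1.2, 1e-6, "face-detector fine-tuning (DP-SGD)")
ledger.charge(0.8, 1e-6, "dataset statistics export")
# A third expensive run is now rejected rather than silently overspending:
try:
    ledger.charge(1.5, 1e-6, "license-plate detector retraining")
except RuntimeError as e:
    print(e)
```

A real deployment would charge the ledger from an RDP or moments accountant rather than fixed per-run values, but the stop-or-redesign behavior once the limit is reached is the same.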

Normative references and standards

Regulations and standards define terminology and best practices, even though they do not set hard thresholds for ε and δ.

  • GDPR (EU) 2016/679, Recital 26 - definition of anonymous information and the requirement to consider “all means reasonably likely” to identify individuals.
  • WP29 (now EDPB), Opinion 05/2014 on Anonymisation Techniques - classification of techniques and risk assessment criteria.
  • ISO/IEC 20889:2018 - Privacy enhancing data de-identification - terminology and classification of techniques.
  • ISO/IEC 27559:2022 - Privacy enhancing data de-identification framework.
  • NISTIR 8053:2015 - De-Identification of Personal Information - frameworks for evaluating techniques and risks.

Sources and further reading

The following materials provide definitions, composition accounting, and real-world examples in deep learning and public-sector policy.

  • C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating Noise to Sensitivity in Private Data Analysis, TCC 2006 - definition of ε-DP and the Laplace mechanism.
  • C. Dwork, A. Roth, The Algorithmic Foundations of Differential Privacy, FnT TCS, 2014 - theoretical foundations and composition.
  • M. Abadi et al., Deep Learning with Differential Privacy, CCS 2016 - DP-SGD and moments accountant.
  • I. Mironov, Rényi Differential Privacy, IEEE S&P 2017 - RDP and accounting.
  • N. Papernot et al., Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (PATE), ICLR 2017/2018.
  • US Census Bureau, Disclosure Avoidance System for the 2020 Census, parameter selection (ε = 12.2 for redistricting data), 2021 - technical documentation.
  • WICG/Chromium, Privacy Sandbox - discussions on “privacy budget” in browsers (a concept distinct from DP).