What is Differential Privacy?

Definition

Differential Privacy (DP) is a mathematical privacy framework that bounds how much the output of a statistical query can reveal about whether any specific individual is present in the input dataset. DP achieves this by adding calibrated random noise to query results, so that the inclusion or removal of any one record has only a negligible effect on the output distribution.

Formal definition: a randomized mechanism M provides ε-differential privacy if, for all pairs of datasets D₁ and D₂ that differ in a single record and for all measurable subsets S of outputs:

\[ \Pr[M(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[M(D_2) \in S]. \]

The extended form, (ε, δ)-DP, allows a small additive slack δ in this bound:
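
\[ \Pr[M(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[M(D_2) \in S] + \delta. \]

In practice, δ is commonly chosen to be much smaller than 1/n, where n is the number of records in the dataset.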

Parameters and mechanisms

  • ε (epsilon) - quantifies privacy loss; lower ε means stronger privacy but greater distortion.
  • δ (delta) - slack parameter for approximate DP; allows low-probability deviations from the pure ε bound.
  • Sensitivity (Δf) - the maximum influence a single record can have on a query result.
  • Noising mechanisms - Laplace and Gaussian, the fundamental noise-generating methods.
  • Composition - defines how privacy loss accumulates across multiple queries (see the bound after this list).
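
When k mechanisms with privacy parameters ε₁, …, ε_k (and δ₁, …, δ_k) are run on the same data, basic sequential composition bounds the total privacy loss:

\[ \varepsilon_{\text{total}} \le \sum_{i=1}^{k} \varepsilon_i, \qquad \delta_{\text{total}} \le \sum_{i=1}^{k} \delta_i. \]

This cumulative sum is the "privacy budget" referred to under Limitations below.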

Noise scaling

In the Laplace mechanism, noise is sampled from:

\[ \text{Laplace}(0, \frac{\Delta f}{\varepsilon}) \]

where the scale parameter Δf/ε grows with the query's sensitivity and shrinks as ε increases, making the privacy-accuracy trade-off explicit.
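
A minimal Python sketch of this mechanism, assuming only NumPy (the helper name laplace_mechanism is illustrative, not a standard API):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return an ε-differentially private estimate of true_value."""
    scale = sensitivity / epsilon  # b = Δf / ε
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query. Adding or removing one person changes the
# count by at most 1, so Δf = 1.
noisy_count = laplace_mechanism(1000.0, sensitivity=1.0, epsilon=0.5)
```

The scale Δf/ε makes the trade-off concrete: halving ε doubles the expected noise magnitude.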

Advantages

  • Provides auditable, mathematically defined privacy guarantees.
  • Resistant to adversaries with auxiliary background knowledge.
  • Enables safe release of aggregated statistics.
  • Integrates with ML, federated learning and large‑scale analytics.

Limitations

  • Accuracy decreases with stronger privacy (lower ε).
  • Repeated queries accumulate privacy loss (privacy budget).
  • DP protects query outputs, not infrastructure (logs, metadata).
  • Less suitable for applications requiring precise or deterministic values.

Applications in image and video anonymization

DP is not used to blur faces or obfuscate pixels directly. Instead, its value lies in protecting metadata and aggregated outputs derived from visual analytics:

  • CCTV statistics - counts of events or detected objects released with privacy guarantees (see the sketch after this list).
  • Video analytics - aggregated behavioural metrics without revealing identifiable traces.
  • Research datasets - sharing anonymized labels, counts or metadata extracted from images.
  • Federated ML systems - training models on visual data with differential privacy noise applied.
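
As an illustration of the CCTV-statistics case, here is a sketch assuming each individual appears in at most one hourly bucket, so every count can be noised with the full per-release ε (parallel composition); the counts themselves are invented:

```python
import numpy as np

# Hypothetical hourly person counts from a video-analytics pipeline.
hourly_counts = np.array([12.0, 7.0, 30.0, 25.0, 18.0])

# One person changes a single hourly count by at most 1, so Δf = 1.
epsilon = 0.5
noisy_counts = hourly_counts + np.random.laplace(0.0, 1.0 / epsilon, size=hourly_counts.shape)
```

Noised counts may come out negative or fractional; rounding or clipping them afterwards is pure post-processing and does not weaken the DP guarantee.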

Relevance for Data Protection Officers

Differential Privacy complements visual anonymization by protecting aggregated insights derived from image and video data. It ensures that statistical reporting or analytics does not reintroduce identifiable information even when underlying datasets originate from sensitive imagery.

Variants and standards

  • ε‑DP - the canonical ("pure") definition.
  • (ε, δ)-DP - approximate differential privacy.
  • Local Differential Privacy (LDP) - noise is applied on each user's device before data is collected (see the sketch below).
  • Distributed / Federated DP - the guarantee is maintained across multiple data holders or training participants.
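
For intuition on LDP, a sketch of classic randomized response for one binary attribute (the function name is illustrative):

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report true_bit with probability e^ε / (1 + e^ε), otherwise flip it.

    The likelihood ratio between the two possible inputs is at most e^ε,
    so the report satisfies ε-local differential privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit
```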