What Is L-Diversity?

L-Diversity Definition

L-diversity is a privacy model proposed as an extension of k-anonymity. It was described by A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam in 2007 in a scientific paper on protecting against attribute disclosure in datasets. In the simplest terms, the model requires that each group of records that is indistinguishable with respect to so-called quasi-identifiers, known as an equivalence class, contains at least l well-represented values of a sensitive attribute. The goal is to limit situations in which a person cannot easily be singled out, yet a confidential characteristic linked to their record can still be inferred with high probability.
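
As a minimal sketch, the distinct variant of this condition can be expressed in a few lines of Python. The equivalence class below is hypothetical, and a real check would be applied to every such group in a dataset:

```python
# Minimal sketch of the distinct l-diversity condition.
# "sensitive_values" holds the sensitive attribute of every record
# in one equivalence class, i.e. one group of records that share
# the same quasi-identifier values.

def is_distinct_l_diverse(sensitive_values, l):
    """True if the class contains at least l distinct sensitive values."""
    return len(set(sensitive_values)) >= l

# Hypothetical class: five records with identical quasi-identifiers.
equivalence_class = ["medical", "medical", "retail", "retail", "transport"]

print(is_distinct_l_diverse(equivalence_class, l=3))  # True: 3 distinct values
print(is_distinct_l_diverse(equivalence_class, l=4))  # False: only 3 distinct values
```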

In the context of photo and video anonymization, this concept does not refer directly to the act of blurring faces or license plates. L-diversity is instead a model for assessing the risk of information disclosure in datasets, metadata, or structured feature sets derived from images and video. It becomes relevant when an organization creates, stores, or shares organized collections of information originating from visual materials, such as scene descriptions, timestamps, locations, object classes, detection results, or event statistics.

If photo or video material has undergone face blurring, but detailed metadata has been retained, the risk of identification or inference may still exist. For example, a combination of location, time, event type, and other features may narrow down the group of possible individuals. In this setting, l-diversity can be used as a supporting criterion when designing secure secondary datasets, but it does not replace image anonymization techniques. In practice, Gallio PRO automatically blurs faces and license plates, while l-diversity applies to the accompanying data layer or analytics datasets built from material after anonymization.

How to Understand L-Diversity in Image and Video Processing

In systems that process images and video, personal data may exist in several layers at the same time. The first layer is the visual content itself, where identifiers include a face, a vehicle license plate, or other features that make identification possible. The second layer consists of metadata and derived features, such as the recording date, geolocation, camera number, event type, number of people in frame, or activity classification.

L-diversity applies mainly to the second layer. If an organization exports anonymized recordings together with a descriptive table, anonymity does not depend solely on the quality of face blurring. It also depends on whether the published groups of records reveal overly uniform information about people or events.

| Data Layer | Example | Does L-Diversity Apply? | Practical Notes |
| --- | --- | --- | --- |
| Pixel data | A face visible in the frame | Not directly | This is where face detection and face blurring are used |
| Pixel data | A vehicle license plate | Not directly | This is where license plate detection and blurring are used |
| Metadata | Time, location, event type | Yes | Inference risk remains even after image anonymization |
| Analytical features | Number of people, object classes, scene tags | Yes | Requires an assessment of quasi-identifiers and sensitive attributes |

The Relationship Between L-Diversity and Face and License Plate Anonymization

Image anonymization involves removing or significantly limiting the ability to identify a person or vehicle within the visual material itself. In practice, this means detecting faces and license plates and then blurring them. Automatic detection is most commonly based on machine learning models, including deep learning, because traditional methods relying on simple image features are usually less robust to changes in lighting, angle, occlusion, and recording quality.

This distinction is important. Deep learning is often used to build AI models that detect faces and license plates, which can then be used to anonymize visual content. L-diversity does not describe the quality of the detection model. It also does not specify how strongly a face should be blurred or how large an area of a license plate should be covered. Instead, this model is used to assess the privacy of tabular or structured data that may be generated alongside the photo and video anonymization process.

In practice, this means two separate levels of protection:

  • the visual content level: detecting and blurring faces and license plates,
  • the secondary data level: reducing the risk of identification or inference from metadata and analytical reports, including through k-anonymity, l-diversity, or more advanced privacy models.

Key Parameters and Conditions of L-Diversity

To apply l-diversity, you must first identify the quasi-identifiers and the sensitive attribute. Quasi-identifiers are features that may not identify a person on their own, but when combined with other data can significantly narrow the pool of possible individuals. In data derived from video, these may include camera location, time range, venue category, or event type.
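
As a sketch of this first step, the grouping into equivalence classes can be done directly on exported records. The column names below (camera_zone, time_range, event_type) are illustrative, with event_type playing the role of the sensitive attribute:

```python
# Sketch: grouping video-derived records into equivalence classes
# by their quasi-identifiers. All field names are illustrative.
from collections import defaultdict

QUASI_IDENTIFIERS = ("camera_zone", "time_range")
SENSITIVE = "event_type"

records = [
    {"camera_zone": "entrance", "time_range": "08-10", "event_type": "delivery"},
    {"camera_zone": "entrance", "time_range": "08-10", "event_type": "visit"},
    {"camera_zone": "entrance", "time_range": "08-10", "event_type": "delivery"},
    {"camera_zone": "parking",  "time_range": "08-10", "event_type": "visit"},
]

# Records sharing the same quasi-identifier values form one class.
classes = defaultdict(list)
for record in records:
    key = tuple(record[q] for q in QUASI_IDENTIFIERS)
    classes[key].append(record[SENSITIVE])

for key, values in classes.items():
    print(key, "->", len(set(values)), "distinct sensitive value(s)")
```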

The literature most commonly describes three interpretive variants:

  • distinct l-diversity: each equivalence class contains at least l distinct values of the sensitive attribute,
  • entropy l-diversity: the distribution of sensitive attribute values has sufficiently high entropy,
  • recursive (c, l)-diversity: additionally limits the dominance of the most frequent values, so that the diversity is not merely apparent.

A simplified condition for entropy l-diversity can be written as follows:

H(S) = - Σ_s p(s) log p(s) ≥ log(l)

where H(S) is the entropy of the sensitive attribute distribution in a given equivalence class, and p(s) is the fraction of records in that class whose sensitive attribute takes the value s.
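
Both the entropy and the recursive variants can be checked per equivalence class. The sketch below follows the definitions above; the sample values are hypothetical, and the choice of c is a policy decision:

```python
# Hedged sketches of the entropy and recursive (c, l)-diversity
# checks for a single equivalence class.
import math
from collections import Counter

def is_entropy_l_diverse(sensitive_values, l):
    """Entropy l-diversity: H(S) >= log(l) within the class."""
    n = len(sensitive_values)
    entropy = -sum((c / n) * math.log(c / n)
                   for c in Counter(sensitive_values).values())
    return entropy >= math.log(l)

def is_recursive_cl_diverse(sensitive_values, c, l):
    """Recursive (c, l)-diversity: with frequencies r1 >= r2 >= ... >= rm,
    require r1 < c * (r_l + r_(l+1) + ... + r_m)."""
    freqs = sorted(Counter(sensitive_values).values(), reverse=True)
    if len(freqs) < l:
        return False
    return freqs[0] < c * sum(freqs[l - 1:])

# A skewed class: one value dominates despite four distinct values.
values = ["medical"] * 6 + ["retail", "transport", "education"]
print(is_entropy_l_diverse(values, l=3))          # False: H(S) ~ 1.00 < log(3) ~ 1.10
print(is_recursive_cl_diverse(values, c=2, l=3))  # False: 6 >= 2 * (1 + 1)
```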

| Parameter | Meaning | Practical Meaning for Video Data |
| --- | --- | --- |
| k | Size of the equivalence class | Minimum number of records with the same quasi-identifiers |
| l | Minimum diversity of the sensitive attribute | Limits the possibility of guessing a confidential characteristic of an event or person |
| Entropy | A measure of distribution diversity | Protects against classes dominated by a single value |

Limitations of L-Diversity in Protecting Visual Materials

L-diversity is not a sufficient model for the entire process of photo and video anonymization. The literature points out that it may fail when the data distribution is highly skewed and when sensitive attribute values are semantically similar. This issue was discussed, among other places, in the work introducing the t-closeness model, presented by N. Li, T. Li, and S. Venkatasubramanian in 2007.

In practice, for visual materials the limitations are as follows:

  • the model does not protect the image itself if a face or license plate remains visible,
  • the model does not solve identification through scene context, such as a distinctive location or a unique vehicle,
  • the model is difficult to apply to raw unstructured material without first transforming it into tabular form,
  • the condition based solely on the number of distinct values may be too weak if those values are semantically very similar, as the sketch below illustrates.
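
The last limitation can be shown concretely. In the hypothetical class below, the distinct condition is formally met, yet every value discloses the same underlying fact, which is the kind of gap the t-closeness model targets:

```python
# Illustration of the semantic-similarity weakness. The values
# below are hypothetical scene tags derived from video analytics.
equivalence_class = [
    "cardiology visit",
    "cardiology consultation",
    "cardiology follow-up",
]

# Formally satisfies distinct 3-diversity...
print(len(set(equivalence_class)) >= 3)  # True
# ...yet every value still reveals a cardiology-related event.
```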

Practical Application in an On-Premise Environment

In environments aligned with the principle of data minimization, a sensible approach is to combine several layers of protection. First, the visual material should be anonymized by blurring faces and license plates. Next, the scope of metadata should be limited, and the risk of re-identification in derived datasets should be assessed.
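
One way to limit the metadata scope, sketched below with illustrative field names and rounding levels, is to generalize timestamps and coordinates before any derived dataset is exported:

```python
# Hedged sketch of metadata generalization before export. Field
# names and granularity choices are illustrative, not prescriptive.
from datetime import datetime

def generalize(record):
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        "time_bucket": ts.strftime("%Y-%m-%d %H:00"),  # drop minutes and seconds
        "lat": round(record["lat"], 2),                # roughly 1 km granularity
        "lon": round(record["lon"], 2),
        "event_type": record["event_type"],            # sensitive attribute kept
    }

raw = {
    "timestamp": "2024-05-17T08:42:13",
    "lat": 52.22977,
    "lon": 21.01178,
    "event_type": "delivery",
}
print(generalize(raw))
# {'time_bucket': '2024-05-17 08:00', 'lat': 52.23, 'lon': 21.01, 'event_type': 'delivery'}
```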

For on-premise solutions, an additional benefit is greater control over data flows, retention, and access policies. However, this does not change the fact that data security also depends on which export datasets are created after processing is complete. L-diversity can be used as an audit criterion for reports, statistics, and event logs built from processed recordings.

Standards and Sources

L-diversity is not a legal standard or an ISO standard. It is a scientific model used in privacy engineering. When assessing compliance in image and video processing, it should be treated as a supporting tool, not as a substitute for obligations arising under data protection law. For visual material processing, the key framework is the GDPR, especially the principles of data minimization, privacy by design, and risk assessment for the rights and freedoms of data subjects.

  • Machanavajjhala A., Kifer D., Gehrke J., Venkitasubramaniam M., "l-Diversity: Privacy Beyond k-Anonymity", ACM Transactions on Knowledge Discovery from Data, 1(1), 2007.
  • Li N., Li T., Venkatasubramanian S., "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity", ICDE 2007, IEEE.
  • Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 - GDPR.