Data privacy risk assessment and mitigation strategies in modern IT environments

Anurag Chaudhary
1/2/2025

Data privacy is often hailed as a fundamental right. Protecting it requires preventing the harmful actions that can lead an organization to expose its customers’, employees’, intellectual-property, or financial data. When organizations process consumers’ personal data, whether to enhance the consumer experience or to optimize sales, they create risks to the privacy of individuals.

Regulatory bodies’ and consumers’ expectations around data privacy call for a management and governance structure that balances data privacy with data utility. Risk management plays a vital role in data protection: it aligns organizational practices with the requirements of privacy laws and mitigates risks arising from unwarranted data processing.

As per Article 4 of the GDPR, data processing refers to any operation or set of operations performed on personal data, including but not limited to the collection, storage, alteration, disclosure, and disposal of personal data. Privacy risks can affect any or all stages of the data life cycle, resulting in legal penalties, loss of consumer trust, and reputational damage for a company.


Challenges in managing data privacy in modern IT environments

An organization uses personal data internally in many ways, typically allowing data engineers to build datasets, data scientists to train ML models, and data analysts to generate insights and reports. Simply handing critical information over to these data wranglers without scrutiny is risky. Ideally, the characteristics and potential risks of a dataset should first be explored to identify issues that could result in data exposure or a breach.

Applying data science to datasets can reveal sensitive data

Data science involves analyzing and processing large amounts of data. Techniques used in data science, such as record linkage and combining multiple datasets, can uncover patterns that reveal sensitive information.

  • Data analytics or machine learning techniques can reveal correlations between seemingly unrelated data points, which may be indicative of sensitive information. For instance, while analyzing a dataset that contains information like individuals’ web browsing history or medical data, a data scientist may discover that individuals who browse for specific medical conditions are likely to have a particular health condition. Such correlations can potentially reveal sensitive health information.
  • Even after anonymization, data can be re-identified or de-anonymized. The presence of enough unique identifiers, or the combination of a dataset with other sources of information, enables re-identification; a minimal linkage sketch follows this list. For instance, in 2008, two researchers used statistical techniques to match the supposedly anonymous Netflix Prize dataset of movie ratings against publicly available IMDb ratings, re-identifying individual users and inferring sensitive information about their movie preferences.
  • A key aspect of data privacy is minimizing the risk of re-identifying an individual from seemingly anonymous information, even when anonymized data is combined with other datasets. This objective is achievable with the implementation of effective anonymization techniques. Gallio Pro specializes in providing innovative solutions for photo and video anonymization. Its irreversible anonymization solutions help organizations comply with data protection regulations while still leveraging data for its intended purposes.
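To make the linkage risk concrete, the sketch below joins a hypothetical “anonymized” table with a public profile table on shared quasi-identifiers. All column names and values are invented for illustration; real attacks, like the Netflix Prize study, rely on far more sophisticated statistical matching.

```python
import pandas as pd

# Hypothetical "anonymized" dataset: direct identifiers removed,
# but quasi-identifiers (zip, birth_year, gender) remain.
ratings = pd.DataFrame({
    "zip": ["30301", "30301", "10002"],
    "birth_year": [1985, 1991, 1978],
    "gender": ["F", "M", "F"],
    "condition_searched": ["diabetes", "none", "cardiology"],
})

# Hypothetical public dataset that still carries names.
profiles = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "zip": ["30301", "30301", "10002"],
    "birth_year": [1985, 1991, 1978],
    "gender": ["F", "M", "F"],
})

# A plain join on quasi-identifiers re-attaches names to "anonymous"
# records wherever the identifier combination is unique.
linked = ratings.merge(profiles, on=["zip", "birth_year", "gender"])
print(linked[["name", "condition_searched"]])
```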

Data aggregated from different sources makes it challenging to attribute ownership of specific data points to individual users

Data gathered from distinct sources for business analytics is hard to trace back to its origin, making it challenging to attribute ownership of specific data points to individual users. For example, a healthcare organization may aggregate electronic health records, bills, and patient feedback to gain insights into patient satisfaction, but enabling the exercise of data subject rights over the aggregated data may not be practical. Such instances require keeping tags on anonymized data or making use of AI-powered Data Subject Access Request (DSAR) solutions to provide users with greater transparency and control over their data.
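One way to keep tags on aggregated data is consistent pseudonymization: each source replaces the raw user identifier with a keyed hash at ingestion, so records stay linkable to a data subject for DSAR handling without storing the identifier itself. The sketch below is a minimal illustration under that assumption; the key handling and record layout are invented.

```python
import hmac
import hashlib

SECRET_KEY = b"rotate-me"  # assumption: in practice, kept in a KMS/vault

def pseudonym(user_id: str) -> str:
    """Derive a stable, non-reversible tag for a user across sources."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()

# Records aggregated from different sources, tagged at ingestion time.
records = [
    {"source": "ehr", "subject_tag": pseudonym("patient-42"), "payload": "..."},
    {"source": "billing", "subject_tag": pseudonym("patient-42"), "payload": "..."},
    {"source": "feedback", "subject_tag": pseudonym("patient-7"), "payload": "..."},
]

# Answering a DSAR: re-derive the tag and collect matching records.
requested = pseudonym("patient-42")
print([r for r in records if r["subject_tag"] == requested])
```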

Data sharing across organizations for computational or research purposes risks data loss or exposure

Data sharing across different organizations becomes necessary to access larger or more diverse datasets for collaborative research, business partnerships, industry analysis, data mining, and emergency response purposes. Such scenarios give rise to the risk of unauthorized access, data breaches, or loss of data control or ownership, especially when the parties do not trust each other or when the sensitivity of the data doesn’t allow for free access or sharing. In such scenarios, a lack of appropriate security measures, combined with inadequate data-sharing agreements, aggravates the challenges of data privacy and regulatory compliance.


Key responsibilities of an IT manager in data privacy

As noted above, organizations put personal data in the hands of data engineers, data scientists, and data analysts, and handing over critical information without first exploring a dataset’s characteristics and potential risks can lead to data exposure or a breach. An IT manager should consider a range of technical, legal, and ethical requirements to protect the personal data of the individuals represented in the data.

Conducting privacy risk assessment and developing mitigation strategies

In small- to medium-sized enterprises, IT managers are often solely responsible for tasks that, as organizational needs evolve, become distributed among CISOs, DPOs, and privacy managers. Carrying out a privacy risk assessment is one key task where IT managers play an important role, assisting the other privacy stakeholders in an organization to effectively identify and rectify privacy risks.

Article 35 of the GDPR requires organizations to conduct a data protection impact assessment (DPIA), also known as a privacy impact assessment (PIA), especially when the processing of sensitive information is involved and is likely to result in a high risk to the rights and freedoms of data subjects. A PIA provides a framework for identifying, assessing, and mitigating privacy risks associated with an organization's products, operations, or services.


Conducting a PIA helps IT managers find privacy risks early in the planning process, prompting proactive privacy and security considerations in their operations. Based on the findings of the risk assessment, they mark each processing activity as a “go” or “no go.” They analyze the level of each risk, evaluate whether to retain, modify, transfer, or avoid it, and communicate the severity of the risk to the other privacy stakeholders for informed decision-making.
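The “go / no go” triage described above is often implemented as a simple likelihood-times-impact matrix. The sketch below is one possible scoring scheme; the scales, thresholds, and treatment labels are assumptions, not values prescribed by the GDPR or any standard.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Both inputs on a 1-5 scale; higher means worse."""
    return likelihood * impact

def treatment(score: int) -> str:
    # Illustrative thresholds -- tune to the organization's risk appetite.
    if score >= 20:
        return "avoid (no go)"
    if score >= 12:
        return "modify (mitigate, then reassess)"
    if score >= 6:
        return "transfer (insure or contract out)"
    return "retain (go, with monitoring)"

activities = {
    "marketing analytics": (3, 2),
    "health-data model training": (4, 5),
}
for activity, (likelihood, impact) in activities.items():
    print(activity, "->", treatment(risk_score(likelihood, impact)))
```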

Identifying data sources and types

The IT manager should focus on identifying all sources of personal data, irrespective of their whereabouts, whether in silos, structured, or unstructured formats. For example, unstructured data is typically found in emails, social media feeds, documents, and multimedia files. Techniques like machine learning, natural language processing (NLP), named entity recognition (NER), data discovery tools, and keyword matching help extract sensitive data from unstructured datasets.
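As a lightweight stand-in for full NLP/NER pipelines, pattern and keyword matching can already flag obvious identifiers in free text. The sketch below is a minimal illustration; the two patterns (e-mail addresses and US-style Social Security numbers) are assumptions chosen for brevity, and production systems combine such rules with trained entity recognizers.

```python
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-style SSN, illustrative
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, match) pairs found in unstructured text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m) for m in pattern.findall(text))
    return hits

email_body = "Please bill jane.doe@example.com, SSN 123-45-6789, for the visit."
print(scan_text(email_body))
# [('email', 'jane.doe@example.com'), ('ssn', '123-45-6789')]
```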

Structured data includes financial data, sales data, user data, etc., and is normally stored in databases, data lakes, data warehouses, application logs, and cloud services. Extracting sensitive data from structured sources encompasses identifying the scope of the sensitive data elements as per the applicable laws or regulations, developing extraction rules like regular expressions and string matching, and securely storing the data in encrypted or masked formats.
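For structured sources, extraction rules can be applied column by column: sample a column's values, measure how often a rule fires, and mask the column if the hit rate crosses a threshold. The pattern, threshold, and masking style below are illustrative assumptions.

```python
import re

CARD = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")  # card-like numbers

def column_is_sensitive(values: list[str], threshold: float = 0.5) -> bool:
    """Flag a column if more than `threshold` of sampled values match."""
    hits = sum(bool(CARD.search(v)) for v in values)
    return hits / max(len(values), 1) > threshold

def mask(value: str) -> str:
    """Keep the last four digits, mask the rest."""
    return re.sub(r"\d(?=.*\d{4})", "*", value)

rows = ["4111-1111-1111-1111", "5500 0000 0000 0004", "n/a"]
if column_is_sensitive(rows):
    print([mask(v) if CARD.search(v) else v for v in rows])
# ['****-****-****-1111', '**** **** **** 0004', 'n/a']
```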

Charting a data flow map

Data mapping is a recognized way of tracing the digital trail of data used within a business's IT infrastructure. It catalogues sensitive information from its origin, through its transit within and beyond the organization, to where it is stored. The map captures all stages in the life cycle of sensitive information, including transfer protocols, encryption status, retention policies, access controls, etc.
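In practice, a data flow map is often maintained as structured records rather than a diagram alone. The sketch below shows one possible record shape covering the attributes mentioned above; every field name is an assumption, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataFlow:
    """One entry in a data flow map for a category of personal data."""
    data_category: str          # e.g. "customer contact details"
    origin: str                 # where the data enters the business
    systems_in_transit: list[str] = field(default_factory=list)
    storage_location: str = ""
    encrypted_at_rest: bool = False
    retention_days: int = 0
    authorized_roles: list[str] = field(default_factory=list)

flow = DataFlow(
    data_category="customer contact details",
    origin="online signup form",
    systems_in_transit=["web frontend", "CRM sync job"],
    storage_location="CRM database (EU region)",
    encrypted_at_rest=True,
    retention_days=730,
    authorized_roles=["support", "marketing"],
)
print(flow)
```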


Data flow maps also help businesses become GDPR compliant. They prove beneficial in keeping records of processing activities (Article 30), performing DPIAs (Article 35), demonstrating privacy by design (Article 25), establishing a lawful basis for processing (Article 6), detailing data practices (Article 12), and managing data subject access requests (Articles 15–18, 20–21).

  • Determine the data flow
    • Identify the types of personal data the business collects, including customer data, employee data, vendor data, partner data, or any individual who interacts with the business.
    • Identify the methods of collecting data. Data may originate through cookies, social media accounts, online forms, paper forms, in-person interactions, phone calls, etc.
    • Identify data processing operations and activities. The data may be sorted, analyzed, filtered, transformed, or mixed with other data through the use of automated tools or manually.
    • Identify where personal data is stored. Data storage systems comprise CRM systems, company servers, local machines, paper records, databases, cloud-based storage systems, etc., including backups and archives.
  • Determine the data access
    • Identify individuals with access to personal data, such as IT administrators, data processors, customer care representatives, marketing personnel, contractors, and third-party service providers.
    • Identify their roles and responsibilities. This assists in determining the job functions and tasks performed by specific individuals, ensuring that access privileges are granted only to those with legitimate needs.
  • Implement access-based controls
    • Develop governing guidelines and rules: Policies establish workflows for approving access requests, empowering businesses to identify who has access privileges, monitor access, and revoke access when no longer required.
    • Access control models: Diversify access controls as role-based (RBAC) and attribute-based (ABAC). RBAC ties access privileges to jobs and responsibilities, whereas ABAC focuses on attributes associated with the user (department, security clearance level, role, group), the resources being accessed (sensitivity, type, ownership), and the environment in which access is requested (device type, geographic location of the user, network security level). A minimal sketch contrasting the two models follows this list.
    • Authentication and authorization mechanisms: Authorization mechanisms involve setting up access permissions and configuring systems to restrict permissions based on users’ roles and attributes of the resources. Authentication mechanisms involve passwords, biometric authentication, or multi-factor authentication to verify the identity of individuals attempting to access personal data.
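As referenced above, the sketch below contrasts the two access control models: an RBAC check that consults only the user's role, and an ABAC check that also weighs attributes of the resource and the environment. The roles, attributes, and policy logic are invented for illustration.

```python
# RBAC: permission follows the role alone.
ROLE_PERMISSIONS = {
    "customer_care": {"read_customer_record"},
    "it_admin": {"read_customer_record", "export_customer_record"},
}

def rbac_allows(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())

# ABAC: permission follows attributes of user, resource, and environment.
def abac_allows(user: dict, resource: dict, env: dict) -> bool:
    return (
        user["department"] == resource["owning_department"]
        and user["clearance"] >= resource["sensitivity"]
        and env["network"] == "corporate"  # no access from untrusted networks
    )

print(rbac_allows("customer_care", "export_customer_record"))  # False
print(abac_allows(
    {"department": "billing", "clearance": 3},
    {"owning_department": "billing", "sensitivity": 2},
    {"network": "corporate"},
))  # True
```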

Implementing data governance policies and procedures is a top priority. It encompasses the following measures:

  • Data classification: Data classification helps organizations identify and protect their most sensitive data. Data can be classified based on various factors, including the level of sensitivity of the data, the legal requirements governing the data, and the impact on the organization’s operations and reputation in case of a breach. IT managers should label each data type with designations like top secret, secret, confidential, or restricted. 
  • Data encryption: Data encryption converts sensitive information into an unreadable format. However, encryption doesn’t modify the underlying data itself, so authorized users with the appropriate keys can still identify individuals from the dataset. IT managers should therefore ensure that datasets are properly anonymized or de-identified before allowing data wranglers to run any processing on them. Masking, generalization, and suppression techniques anonymize personal data while still allowing for meaningful analysis; a minimal sketch of these techniques follows this list.
  • Data retention, disposal, and archiving: Data usage policies put in place by IT managers encourage lawful, fair, and transparent data management, ensuring that data is not kept longer than necessary and is properly disposed of after use. Data disposal involves permanently deleting electronic or physical records in such a way that they can’t be retrieved or reconstructed. Data archiving involves securely retaining data that may be needed for auditing, business continuity planning, or regulatory requirements.
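As referenced above, here is a minimal sketch of the three anonymization techniques named in the list (masking, generalization, and suppression) applied to a single record. Field names and rules are assumptions; real deployments should measure residual re-identification risk (e.g., k-anonymity) rather than rely on fixed rules.

```python
def mask_email(email: str) -> str:
    """Masking: hide most of the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def generalize_age(age: int) -> str:
    """Generalization: replace exact age with a 10-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def anonymize(record: dict) -> dict:
    return {
        "email": mask_email(record["email"]),
        "age": generalize_age(record["age"]),
        # Suppression: the "name" field is intentionally dropped.
    }

print(anonymize({"name": "Jane Doe", "email": "jane@example.com", "age": 34}))
# {'email': 'j***@example.com', 'age': '30-39'}
```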

Moving towards a future of responsible data handling

Increasing expectations for privacy from both individuals and regulatory frameworks have amplified the need for effective management of data privacy. While data fuels growth and innovation for organizations, they are required to strike a balance between data utility and individual privacy. This requires weighing technical, legal, and ethical considerations to enable safe data handling and processing.

The role of IT managers, especially in small- and medium-sized organizations, is paramount in establishing a data privacy risk management framework and fostering a culture of best privacy practices within an organization. Their active participation in setting high-level goals, carrying out privacy risk assessments, and implementing robust security measures leads to responsible, ethical, and compliant data processing.

About Gallio PRO

Gallio PRO is an on-premise software solution for automated anonymization of photos and videos (face and license plate blurring) to ensure privacy protection and GDPR/DSGVO compliance. Unlike cloud-based online tools, it runs securely as a desktop application on Windows and Mac, with a Linux version available for industrial use and workflow integration. It allows users to retain full control over data, as third-party data transfers are not required. Gallio PRO features an intuitive interface and supports selective anonymization, allowing users to choose which objects to blur. Its advanced AI ensures maximum accuracy without using biometric data, providing irreversible blurring. It is trusted by corporations, governments, and NGOs for tasks like Data Subject Access Requests and ADAS preparation.