Italy’s Public Data Puzzle: How to Reuse Information Without Re-Identifying People
Public-sector data can power analytics and AI, but the real security question is whether privacy controls survive linkage, reuse, and inference.
Governments often hold some of the most useful data in society, but usefulness creates pressure: once information is reused across systems, the risk of exposing citizens rises with it. The central challenge is not simply hiding names. It is deciding whether a dataset should be anonymised, pseudonymised, protected with Privacy Enhancing Technologies, or transformed into synthetic records before it is shared or analysed.
Fast Facts
- Public administrations can reuse data for services, analytics, and AI only if privacy protections match the intended use.
- Pseudonymisation lowers exposure, but the data may still be personal data while a re-linking key exists.
- Anonymisation sets a much higher bar: a person must not be identifiable by any means reasonably likely to be used.
- K-anonymity can reduce singling out, but it does not eliminate every re-identification path.
- Synthetic data may support sharing and testing, but its privacy value depends heavily on how it is generated.
The real risk is not just theft
The interesting cyber problem here is inferential disclosure. Even when direct identifiers are removed, a person may still be singled out by combining quasi-identifiers, external datasets, or repeated releases over time. That is why privacy engineering treats data reuse as a controlled process, not a one-time cleanup step.
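To make the inferential-disclosure risk concrete, the following minimal sketch joins a hypothetical "de-identified" release with a hypothetical external register on three quasi-identifiers (postcode, birth year, sex). All records, names, and field names are invented for illustration; the point is only that a combination that is unique in the release can be linked back to a named person.

```python
from collections import Counter

# Invented "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "00184", "birth_year": 1971, "sex": "F", "diagnosis": "asthma"},
    {"zip": "00184", "birth_year": 1971, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "20121", "birth_year": 1985, "sex": "F", "diagnosis": "none"},
    {"zip": "20121", "birth_year": 1985, "sex": "F", "diagnosis": "hypertension"},
]

# Invented external dataset (e.g. a public register) that still carries names.
external = [
    {"name": "Maria Rossi", "zip": "00184", "birth_year": 1971, "sex": "F"},
    {"name": "Giulia Bianchi", "zip": "20121", "birth_year": 1985, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def link_unique_matches(released, external):
    """Return (name, diagnosis) for people whose quasi-identifiers match exactly one released record."""
    counts = Counter(qi_key(r) for r in released)
    by_key = {qi_key(r): r for r in released}
    matches = []
    for person in external:
        key = qi_key(person)
        if counts[key] == 1:  # unique combination: the person is singled out
            matches.append((person["name"], by_key[key]["diagnosis"]))
    return matches


print(link_unique_matches(released, external))
```

Here the Rome record is unique on its quasi-identifiers and is re-identified, while the two Milan records share the same combination and are not singled out by this join alone.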
Pseudonymisation is useful when a public body needs continuity for legitimate processing, for example linking the same person's records across services or over time, but it does not end the privacy analysis. The mapping between a code and a real person must be kept separate and protected; otherwise, anyone who obtains it can reverse the protection. By contrast, anonymisation aims for a stronger outcome: the person should no longer be identifiable by any means reasonably likely to be used in that context.
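One common way to keep the mapping separate is keyed hashing. This minimal sketch assumes HMAC-SHA256 from Python's standard library and a hypothetical `citizen_id` field; the secret key plays the role of the re-linking information, so it must be stored and governed apart from the pseudonymised data. A plain, unkeyed hash would be weaker, because anyone could recompute it for a list of candidate identifiers.

```python
import hashlib
import hmac
import secrets

# Illustration only: in practice the key would be held by a separate custodian
# (for example a key-management service), never stored next to the dataset.
PSEUDONYMISATION_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Map a direct identifier to a stable code using HMAC-SHA256."""
    normalised = identifier.strip().upper()  # same person, same code despite formatting noise
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record with an invented identifier.
record = {"citizen_id": "IT-0001234", "service": "school_enrolment"}
pseudonymised = {"pid": pseudonymise(record["citizen_id"]), "service": record["service"]}
print(pseudonymised)
```

The same identifier always yields the same code under the same key, which provides the continuity described above. Whoever holds the key can also recompute the code for a known person, which is exactly why such data can remain personal data.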
This is also where PETs matter. In practice, they are less a single product than a toolbox for privacy-preserving processing. Depending on the use case and threat model, they can help public administrations analyse data, exchange information, or support AI workflows while reducing exposure of raw records. The catch is that suitability cannot be assumed. It has to be tested against the actual release, the actual users, and the actual data linkages that exist in the ecosystem.
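Differential privacy is not named in the article; it appears here only as one familiar example of a PET. This minimal sketch releases a count with Laplace noise so that the output reveals little about any single resident. The dataset, the `epsilon` value, and the helper names are assumptions, and the standard-library `random` module is used for readability only: a production system would need a vetted differential-privacy library and a cryptographically sound noise source.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count using the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so noise with scale 1 / epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


# Invented example: how many residents are 65 or older?
residents = [{"age": a} for a in (17, 34, 68, 71, 45, 80, 29)]
rng = random.Random(2024)
print(round(noisy_count(residents, lambda r: r["age"] >= 65, epsilon=0.5, rng=rng), 2))
```

Smaller `epsilon` values add more noise and give stronger protection. Choosing `epsilon`, and tracking how much of it is spent across repeated queries, is part of the testing against actual releases that the paragraph calls for.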
K-anonymity and synthetic data fit into the same risk-management frame. K-anonymity can make records harder to distinguish, but it is not a blanket guarantee against inference. Synthetic data can be safer than direct sharing, but only if the generation method is designed to prevent leakage and preserve utility in a measured way.
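Both properties can be measured rather than assumed. This minimal sketch computes k for a table over chosen quasi-identifiers, applies a hypothetical generalisation rule (three-digit postcode prefix, age in decades), and flags synthetic rows that copy a real row exactly. The data and rules are invented; an exact-copy check is only a basic leakage signal and does not rule out near-copies or membership inference.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "age")


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same quasi-identifier values."""
    classes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0


def generalise(record):
    """Hypothetical rule: keep a 3-digit postcode prefix and bucket age into decades."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    decade = record["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"
    return out


def copied_records(real, synthetic):
    """Return synthetic rows identical to some real row."""
    real_rows = {tuple(sorted(r.items())) for r in real}
    return [s for s in synthetic if tuple(sorted(s.items())) in real_rows]


# Invented example data.
real = [
    {"zip": "00184", "age": 34, "diagnosis": "asthma"},
    {"zip": "00185", "age": 37, "diagnosis": "diabetes"},
    {"zip": "00186", "age": 31, "diagnosis": "asthma"},
    {"zip": "20121", "age": 52, "diagnosis": "none"},
    {"zip": "20122", "age": 58, "diagnosis": "hypertension"},
]
generalised = [generalise(r) for r in real]
print(k_anonymity(real, QUASI_IDENTIFIERS))         # 1
print(k_anonymity(generalised, QUASI_IDENTIFIERS))  # 2

synthetic = [
    {"zip": "00184", "age": 34, "diagnosis": "asthma"},  # exact copy of a real row
    {"zip": "00187", "age": 33, "diagnosis": "asthma"},
]
print(copied_records(real, synthetic))
```

Generalisation lifts k from 1 to 2 here, yet if every record in a group shared the same diagnosis, an attacker who placed someone in that group would still learn it. That is why k-anonymity alone is not a guarantee against inference.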
The broader lesson is straightforward: public data can be valuable without becoming a privacy liability, but only if the control chosen matches the threat. For a public administration, that means treating privacy as an engineering discipline, with governance, testing, and review built into the data pipeline before anything is published, exchanged, or fed into AI.
Conclusion
The article’s core message is that innovation in the public sector does not have to come at the expense of citizens’ rights. The harder part is technical discipline: choosing the right privacy control, understanding its limits, and assuming that reuse always creates new ways to re-identify people. In modern government data work, privacy is not a box to tick. It is the condition that makes reuse defensible.
WIKICROOK
- Anonymisation: A process that aims to make a person no longer identifiable by means reasonably likely to be used.
- Pseudonymisation: Replacing direct identifiers with codes while keeping the re-linking information separately protected.
- Privacy Enhancing Technologies (PETs): Techniques that reduce exposure of sensitive data during storage, sharing, or analysis.
- K-anonymity: A model that makes each record indistinguishable from at least k-1 others on the chosen quasi-identifier attributes.
- Synthetic data: Artificially generated data that imitates real records for testing, analysis, or AI use.