
De-identified data and AI: what the I-MED decision means for privacy compliance

A recent OAIC decision provides useful guidance for businesses on techniques for de-identifying personal information, particularly for the purpose of training AI models. It also underlines that good privacy governance and planning are vital when commencing a new initiative involving new technology.

The Office of the Australian Information Commissioner (OAIC) recently concluded, without taking regulatory action, its preliminary inquiries into I-MED Radiology Network Limited's (I-MED) disclosure of de-identified patient data to Annalise.ai. I-MED had shared the de-identified patient data, without obtaining patient consent or notifying patients, for the purpose of training an AI model for diagnostic imaging.

The OAIC found that the data was sufficiently de-identified and no longer constituted 'personal information' under the Privacy Act 1988 (Cth) (Privacy Act).

Key takeaways for de-identifying personal information

Businesses should consider the following key takeaways from the OAIC's report when proposing to use de-identified personal information to train AI models:

  1. Develop a robust de-identification methodology: Use recognised standards and techniques (e.g. hashing, redaction, aggregation) to ensure data is no longer reasonably identifiable. Ensure this methodology is documented and reviewed regularly.

  2. Mitigate risk of re-identification: Impose contractual obligations on data recipients to prevent re-identification, including prohibiting data merging and AI-based re-identification. Use technical controls to prevent linkage with other datasets.

  3. Strengthen data governance: Establish clear internal policies and procedures for de-identification and data sharing, aligned with frameworks like the 5-Safes Principles. Include prescriptive guidance and ensure staff are trained on compliance requirements.

  4. Transparency and reputation risks: To mitigate reputational risks, inform customers about how their de-identified data may be used.

In addition to any Privacy Act considerations, businesses should also be mindful of consumer law risks including:

  1. False, misleading or deceptive conduct: Ensure that any representations made to consumers about the use of their data, including how the business uses and trains AI models, are not false, misleading or deceptive.

  2. Unfair contract terms and unfair trading practices: Requiring consumers to allow their data to be used to train an AI model as a prerequisite or condition of receiving services could potentially amount to an ‘unfair contract term’ if included in a contract, or constitute an ‘unfair trading practice’ under the Federal Government’s proposed unfair trading practices regime. For more information, see ‘Extending an unfair trading practices prohibition to commercial arrangements with small businesses: a potential chilling effect’.

Background to the OAIC’s preliminary inquiries

Between 2020 and 2022, I-MED shared de-identified patient data (including clinical scans and reports) with Annalise.ai to train an AI model for diagnostic imaging. Patients were not notified, nor was consent obtained.

Following media coverage in September 2024, the OAIC launched a preliminary inquiry into I-MED’s data practices in response to growing public concern over the use of personal information in AI development. The OAIC has emphasised that training AI models with personal information is a high-risk activity under the Privacy Act and will be a regulatory focus going forward.

Application of the APPs

As part of its preliminary inquiry, the OAIC examined whether I-MED’s disclosure of patient data to Annalise.ai involved personal information, including health information, or whether the information had been sufficiently de-identified so that the Australian Privacy Principles (APPs) did not apply.

The OAIC’s assessment focused on the following key concepts:

Personal Information

The APPs apply to information about an identified or reasonably identifiable individual. Health information is treated as ‘sensitive information’ under the Privacy Act and is subject to more stringent requirements for how it is collected, used and disclosed by organisations.

As one of the Federal Government’s many proposed privacy reforms, the definition of personal information may also soon be expanded to cover information or an opinion that “relates to” an identified individual.

De-identification

Information that has been de-identified so that it no longer reasonably identifies an individual will not be subject to the APPs. De-identification involves removing or altering identifiers (e.g. names, addresses or rare traits) to prevent re-identification.
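As a rough illustration of "removing or altering identifiers", the sketch below drops direct identifiers from a record and coarsens two quasi-identifiers. The field names, and the choice to reduce a date of birth to a year and a postcode to its first two digits, are hypothetical assumptions for illustration; on their own, steps like these would not necessarily make data "not reasonably identifiable".

```python
from datetime import date

# Hypothetical field names, used for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "patient_id"}


def deidentify_record(record: dict) -> dict:
    """Remove direct identifiers and coarsen two quasi-identifiers.

    A sketch only: real de-identification must also consider rare traits,
    free text and what other data a recipient could link against.
    """
    # Drop fields that identify the patient on their own.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalise a full date of birth to a birth year.
    if "date_of_birth" in out:
        out["birth_year"] = out.pop("date_of_birth").year
    # Keep only the first two postcode digits to widen the geographic area.
    if "postcode" in out:
        out["postcode"] = out["postcode"][:2] + "XX"
    return out


example = {
    "name": "Jane Citizen",
    "patient_id": "P-1001",
    "date_of_birth": date(1970, 5, 17),
    "postcode": "3000",
    "modality": "CT",
}
print(deidentify_record(example))
# {'postcode': '30XX', 'modality': 'CT', 'birth_year': 1970}
```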

Re-identification risk

Data will only be considered de-identified, and exempt from the application of the APPs, in circumstances where the process to re-identify an individual is so impractical that there is almost no likelihood of it occurring.

The risk of ‘re-identification’ is heightened when the de-identified information is combined with other datasets or processed by AI systems that are trained on very large datasets. A 2015 MIT study found that just four fairly vague pieces of information were enough to uniquely identify 90% of individuals in a dataset of 1.1 million users’ credit-card transactions.
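One common way to gauge this risk is to count how many records share each combination of quasi-identifiers (the idea behind k-anonymity). The sketch below is not the method used in the MIT study or in the OAIC's inquiry; the attribute names and sample records are hypothetical. It shows how adding attributes quickly makes records unique.

```python
from collections import Counter


def _group_counts(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return keys, Counter(keys)


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of indistinguishable records.

    k == 1 means at least one record is unique on these attributes.
    """
    if not records:
        raise ValueError("records must not be empty")
    _, counts = _group_counts(records, quasi_identifiers)
    return min(counts.values())


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that no other record matches on these attributes."""
    if not records:
        raise ValueError("records must not be empty")
    keys, counts = _group_counts(records, quasi_identifiers)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


records = [
    {"birth_year": 1970, "postcode": "30XX", "sex": "F"},
    {"birth_year": 1970, "postcode": "30XX", "sex": "F"},
    {"birth_year": 1985, "postcode": "30XX", "sex": "M"},
    {"birth_year": 1985, "postcode": "31XX", "sex": "M"},
]
print(smallest_group_size(records, ["birth_year"]), unique_fraction(records, ["birth_year"]))
print(smallest_group_size(records, ["birth_year", "postcode", "sex"]),
      unique_fraction(records, ["birth_year", "postcode", "sex"]))
```

With birth year alone, every record shares its value with at least one other (k = 2, no unique records). Adding postcode and sex makes half of the records unique (k = 1), which is the kind of linkage risk the study illustrates.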

The broader privacy risks from re-identification are also being addressed as part of upcoming privacy reforms, with the Federal Government agreeing to further consultation on introducing a criminal offence for malicious re-identification of de-identified information, and considering how de-identified information can be protected from unauthorised re-identification under the Privacy Act.

I-MED’s de-identification process

After reviewing I-MED’s data practices, the OAIC was satisfied that the following measures implemented by I-MED were sufficient to de-identify the personal information, such that the APPs did not apply.

Technical measures

I-MED de-identified the patient records by:

  • segregating the patient data from the underlying dataset;

  • scanning the records with text recognition software;

  • using two hashing techniques (one for unique identifiers such as patient ID numbers, and another for names, addresses and phone numbers);

  • time-shifting dates (to a random date within a specified number of years);

  • aggregating certain fields into large cohorts to avoid identification of outliers; and

  • redacting any text that appears within an image scan, or within 10% of its boundary.
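The report as summarised here does not say which hashing or date-shifting methods I-MED used. The sketch below shows one plausible approach under stated assumptions: a keyed hash (HMAC-SHA256) with a secret key held only by the disclosing organisation, and a per-patient date offset within a configurable number of years, derived from the same key so that intervals between one patient's scans are preserved. The key, function names and the two-year default are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret key: held only by the disclosing organisation, never shared.
SECRET_KEY = b"replace-with-a-securely-stored-random-key"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier such as a patient ID or name.

    Unlike an unkeyed hash, a recipient cannot confirm a guess (for example by
    hashing a known name) without the key.
    """
    normalised = value.strip().lower()  # so " P-1001" and "p-1001" map to the same token
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def shift_date(d: date, patient_token: str, max_years: int = 2, key: bytes = SECRET_KEY) -> date:
    """Shift a date by a pseudo-random offset of up to +/- max_years (in days).

    The offset is derived from the patient's token, so every date for the same
    patient moves by the same amount and intervals between scans are preserved.
    """
    digest = hmac.new(key, ("shift:" + patient_token).encode("utf-8"), hashlib.sha256).digest()
    max_days = 365 * max_years
    offset = int.from_bytes(digest[:8], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


token = pseudonymise("P-1001")
first_scan = shift_date(date(2021, 3, 1), token)
second_scan = shift_date(date(2021, 3, 11), token)
print(token[:16], first_scan, second_scan - first_scan)
```

Deriving the offset from the key keeps the shift reproducible for the disclosing party while remaining unpredictable to a recipient; a truly random offset stored in a lookup table would be an equally reasonable design.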

Contractual measures

I-MED imposed the following obligations on Annalise.ai in relation to how it handled the de-identified patient data:

  • prohibiting it from doing any act, or engaging in any practice, that would result in the patient data becoming 'reasonably identifiable';

  • prohibiting it from disclosing or publishing the patient data for any purpose (to prevent wider dissemination of the dataset and so reduce the risk that the patient data may become re-identifiable in the hands of other third parties or in the public domain);

  • requiring it to store the patient data in a secure environment; and

  • requiring it to notify I-MED if it inadvertently received any patient personal information.

These contractual obligations were crucial in addressing rare instances where personal information was mistakenly shared with Annalise.ai due to de-identification errors. In line with its contractual duties, Annalise.ai promptly identified and reported these issues to I-MED, allowing the data to be deleted or properly de-identified.
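The article does not describe how Annalise.ai detected the residual personal information. As a hypothetical example of the kind of recipient-side check that supports a notification obligation, the sketch below screens report text for a few identifier patterns before ingestion and flags any matches. The patterns are illustrative assumptions only (an email address, an Australian-style 10-digit phone number, a "DOB" date) and are far from exhaustive.

```python
import re

# Illustrative patterns only; a real screen would need much broader coverage.
RESIDUAL_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b0[2-478](?:[ -]?\d){8}\b"),
    "date_of_birth": re.compile(r"\bDOB[:\s]*\d{1,2}/\d{1,2}/\d{2,4}\b", re.IGNORECASE),
}


def find_residual_identifiers(text: str) -> list[str]:
    """Return the labels of identifier patterns found in supposedly de-identified text."""
    return [label for label, pattern in RESIDUAL_PATTERNS.items() if pattern.search(text)]


def screen_batch(reports: dict[str, str]) -> dict[str, list[str]]:
    """Map each report ID to the identifier types found, omitting clean reports.

    Flagged reports would be quarantined and reported back to the discloser.
    """
    flagged = {}
    for report_id, text in reports.items():
        hits = find_residual_identifiers(text)
        if hits:
            flagged[report_id] = hits
    return flagged


batch = {
    "r1": "No acute findings.",
    "r2": "Call 0412 345 678 re results",
    "r3": "DOB: 17/05/1970, email jane@example.com",
}
print(screen_batch(batch))
```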

Governance measures

I-MED developed a Data De-identification Policy and Approach to guide how de-identified patient data was shared, which reflected many of the practices endorsed by the National Institute of Standards and Technology. This included:

  • utilising the 5-Safes Principles (safe people, projects, settings, data and outputs);

  • ensuring separation of the Annalise.ai and I-MED environments;

  • utilising a ‘Data Use Agreement Model’;

  • imposing prescriptive de-identification standards;

  • removing or transforming all direct identifiers; and

  • utilising top and bottom coding and aggregation of outliers.
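Top and bottom coding collapse extreme values into open-ended categories, and aggregation folds rare values into larger groups, so that outliers (such as a very elderly patient or an unusual procedure) do not stand out. The sketch below applies both ideas; the thresholds (ages under 18 and 90 or over, 10-year bands, a minimum group size of 5) are hypothetical assumptions, not the thresholds I-MED used.

```python
from collections import Counter


def code_age(age: int, floor: int = 18, ceiling: int = 90, band: int = 10) -> str:
    """Top/bottom-code an age and report the rest only as a band.

    Ages at or above `ceiling` become one open-ended category (top coding),
    ages below `floor` become another (bottom coding), and everything in
    between is reported as a band aligned to multiples of `band`, e.g. "40-49".
    """
    if age >= ceiling:
        return f"{ceiling}+"
    if age < floor:
        return f"<{floor}"
    aligned = (age // band) * band
    start = max(aligned, floor)
    end = min(aligned + band, ceiling) - 1
    return f"{start}-{end}"


def merge_rare_categories(values, min_count: int = 5, other: str = "Other"):
    """Replace categories that occur fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


print([code_age(a) for a in (16, 18, 45, 89, 97)])
# ['<18', '18-19', '40-49', '80-89', '90+']
print(merge_rare_categories(["CT"] * 5 + ["MRI"] * 5 + ["PET"])[-1])
# Other
```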

Guidance for using de-identified personal information

The OAIC’s report serves as a valuable reference for businesses seeking to use de-identified personal information responsibly. By adopting rigorous technical, contractual, and governance measures, organisations can reduce privacy risks and comply with legal obligations, while still leveraging data for innovation.


Authors

Philip Catania

Consultant

Theonie Scott

Special Counsel

Jake Fava

Associate

Mark Salamy

Associate



This publication is introductory in nature. Its content is current at the date of publication. It does not constitute legal advice and should not be relied upon as such. You should always obtain legal advice based on your specific circumstances before taking any action relating to matters covered by this publication. Some information may have been obtained from external sources, and we cannot guarantee the accuracy or currency of any such information.
