
Erased but remembered – the power of AI to summarize the past

December 2025
 by Malgorzata Peterman-Krawczuk

Generative AI learns from data in many forms – text, images, video, even sound. Broadly, the larger and more diverse the training data, the better the model’s performance. Large Language Models (LLMs) are trained on enormous datasets of text, gathered from publicly available sources over weeks or months. These sources include websites (both content and code), books, digital historical records and academic research papers, open-access databases, news articles, and social media. The models learn by scanning billions of data points to absorb patterns, grammar, and everything in between, and use what they learn to make predictions, solve problems, or create new data.

Because LLMs are trained on publicly available data, some of which may include sensitive or personal information, models may inadvertently memorize that information. This raises questions about data retention and deletion: can AI ‘forget’ data it has been trained on?

Deleted ≠ Erased

AI’s ‘memory’ is more permanent than any archive. Once a model is trained, the influence of a single paragraph cannot be deleted from it without retraining the entire model. People forget naturally; AI does not – unless someone makes it ‘forget’.

Machine unlearning is an extremely complex process, especially when it comes to LLMs. Retraining from scratch, or surgically adjusting a trained model, can be resource-intensive and costly, as LLMs have billions of parameters. As the Institute of Electrical and Electronics Engineers (IEEE) reports, “Their intricate architectures and nonlinear interactions between components make it hard to interpret the model and locate the specific parameters most relevant to a given data point.” It is technically difficult to completely ‘erase’ data (including personal data) that may be embedded across many of a model’s parameters, and it is also unclear how ‘erasing’ it would affect the model’s predictions.

Even if the raw data is removed from the training set, traces such as a name or a quote can survive in the model itself, and the model may continue to reflect that data long after a deletion request.

Removing data from the training input does not make it disappear. Just because something was deleted online does not mean the relevant AI model has erased it. AI does not memorize training data the way people do – it learns patterns. Those learned patterns can retain traces of the data, and the model can sometimes generate content that resembles parts of the training set, effectively recreating ‘lost’ posts or behaviours.
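The point about learned patterns retaining traces can be made concrete with a toy sketch – a tiny character-level Markov model, not a real LLM. The corpus, the name, and the address below are entirely invented for illustration. Even after the raw training text is deleted, the model’s learned statistics can regenerate the ‘private’ detail verbatim:

```python
from collections import defaultdict

# Toy illustration (not a real LLM): a character-level Markov model
# that absorbs n-gram statistics from its training text.
def train(text, order=4):
    model = defaultdict(list)
    for i in range(len(text) - order):
        # Record which character follows each `order`-character context.
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length=60, order=4):
    out = seed
    for _ in range(length):
        continuations = model.get(out[-order:])
        if not continuations:
            break  # no known continuation: stop
        out += continuations[0]  # deterministic: first observed continuation
    return out

# Fictional training text containing a 'private' detail.
corpus = "weather report: sunny. Jane Doe lives at 12 Elm Street. end of file."
model = train(corpus)
del corpus  # the raw training data is gone...

# ...but the model's learned statistics still reproduce the deleted detail.
print(generate(model, "Jane"))
# prints: Jane Doe lives at 12 Elm Street. end of file.
```

Real LLMs are vastly larger and probabilistic rather than deterministic, but the underlying issue is the same: once information is folded into a model’s parameters, deleting the source text does not remove it.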

Beyond this, copies of deleted AI models may still exist on other servers, in web archives, or on third-party platforms. Microsoft’s AI model WizardLM-2 was taken down a few hours after its launch; although the model was deleted, it had already spread widely across the internet because several people had downloaded it and re-uploaded it to other platforms.

Privacy concerns in the age of Generative AI

Many AI tools are trained on personal data to improve their algorithms. Since some of these systems process, store, and analyse vast amounts of sensitive data, concerns arise around the Right to be Forgotten (RTBF) and the protection of personal data and privacy – especially when users lack full control over their own data, or when it is used to train AI models without their consent or knowledge.

Tech giants including Meta, Google, Amazon, and LinkedIn have faced increasing criticism and legal action over the use of personal data to train AI models.

The New York Times lawsuit against OpenAI over the use of its content to train AI systems involves, among other things, the ability of LLMs to memorize information. The court ordered OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court”. The order raises concerns about user privacy and the indefinite retention of conversations: chats may be stored and used in court even after users have deleted them.

Are users able to fully erase personal or sensitive data?

There is a growing expectation that AI systems should be able to respond to data deletion requests; in practice, however, once training is complete, personal content cannot be fully removed from some AI models (such as LLMs).

Many companies are working to improve data privacy, comply with legal requirements such as the RTBF, and provide safeguards that give users more control over their data. However, the complete and permanent erasure of sensitive or personal data from all AI systems cannot be guaranteed and, in practice, appears nearly impossible.

As a user, the best one can do in the short term is to be mindful of how much information one shares online or with AI tools. Before using any AI services, check the privacy policy and terms of service to understand what data is collected and on what terms, and stay informed about any updates to those policies. In the absence of better technical solutions, legal frameworks, and improved protection for users’ data, being careful not to share any sensitive or private information remains a fundamental defence against data misuse.


© Digitalis Media Ltd.