Generative AI learns from data in many forms: text, images, videos, or even sound. In general, the larger and more varied the training data, the better the model tends to perform. Large Language Models (LLMs) are trained, often over weeks or months, on enormous text datasets gathered largely from publicly available sources. These include websites (both content and code), books, digital historical records, academic research papers, open-access databases, news articles, and social media. During training, a model processes billions of data points from these sources and absorbs patterns of language, grammar, and everything in between. It then uses those patterns to make predictions, solve problems, or generate new content.
Because LLMs are trained on publicly available data, some of which may include sensitive or personal information, that information may be inadvertently memorized by the model. This raises questions about data retention and deletion: can AI ‘forget’ data it has been trained on?
Deleted ≠ Erased
AI’s ‘memory’ can be more persistent than an archive. Once a model has been trained, a single paragraph can be removed from the training data, but its influence cannot simply be deleted from the model. Removing it reliably generally requires retraining the model without it. People forget naturally; AI does not, unless someone makes it ‘forget’.
Machine unlearning is a very complex process, especially for LLMs. Retraining from scratch, or adjusting an existing model, can require significant computing resources and be very costly, since LLMs have billions of parameters. As the Institute of Electrical and Electronics Engineers (IEEE) reports, “Their intricate architectures and nonlinear interactions between components make it hard to interpret the model and locate the specific parameters most relevant to a given data point.” Because data (including personal data) may be embedded across many of the model’s parameters, it is technically difficult to ‘erase’ it completely. It is also unclear how such erasure affects the model’s predictions.
Even if the raw data is removed from the training set, traces of it, such as a name or a quote, can survive in the model itself. The model may continue to reflect that data long after a deletion request has been made.
Removing data from the training set does not make it disappear from the model. Likewise, something deleted online has not necessarily been erased from an AI model trained on it. AI does not memorize training data the way people do; it learns patterns. Those learned patterns can retain traces of the data, and a model can sometimes generate content that closely resembles part of its training data or recreates ‘lost’ posts or behaviours.
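The point about traces can be illustrated with a deliberately tiny sketch. It uses a toy bigram model, which simply counts which word follows which, not a real LLM. The dataset, the phone number, and the function names `train_bigram_model` and `complete` are all hypothetical. The sketch shows that deleting a post from the dataset leaves an already-trained model unchanged. Only retraining on the filtered data removes the learned pattern.

```python
from collections import Counter, defaultdict


def train_bigram_model(documents):
    """Toy 'model': count which word follows which across all documents."""
    model = defaultdict(Counter)
    for doc in documents:
        words = doc.split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model


def complete(model, prompt, max_words=5):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the model has learned nothing after this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data; the second post contains personal data.
dataset = [
    "the weather today is sunny",
    "call alice on 555 0100 tonight",
]
model = train_bigram_model(dataset)

# The personal post is deleted from the dataset...
dataset = [doc for doc in dataset if "alice" not in doc]

# ...but the model trained earlier still reproduces it.
print(complete(model, "call alice"))      # call alice on 555 0100 tonight

# Only retraining on the filtered dataset removes the learned pattern.
retrained = train_bigram_model(dataset)
print(complete(retrained, "call alice"))  # call alice
```

In a real LLM, the learned patterns are spread across billions of numeric parameters rather than stored as readable word counts. That is why, as the IEEE notes, locating and removing the influence of a single data point is far harder than in this toy example.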
Beyond this, copies of deleted AI models may still exist on other servers, in web archives, or on third-party platforms. Microsoft’s AI model WizardLM-2, for example, was taken down a few hours after its launch. Although the model was deleted, it had already spread widely on the internet, because several people had downloaded it and re-uploaded it to other platforms.
Privacy concerns in the age of Generative AI
Many AI tools are trained on personal data to improve their performance. Because some of these systems process, store, and analyse vast amounts of sensitive data, they raise concerns about the Right to be Forgotten (RTBF) and the protection of personal data and privacy. These concerns are greatest when users do not have full control over their own data, or when their data is used to train AI models without their consent or knowledge.
Tech giants including Meta, Google, Amazon, and LinkedIn have faced increasing criticism and legal action over the use of personal data to train AI models.
The New York Times lawsuit against OpenAI over the use of its content to train AI systems concerns, among other things, the ability of LLMs to memorize information. The court ordered OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court”. This indefinite retention of conversations raises user privacy concerns: conversations may be stored and used in court even though users have deleted them.
Are users able to fully erase personal or sensitive data?
There is a growing expectation that AI systems should be able to respond to data deletion requests. In practice, however, personal content may not be completely removable from some AI models (such as LLMs) once they have been trained on it.
Many companies are working to improve data privacy, comply with legal requirements such as the RTBF, and provide safeguards that give users more control over their data. However, the complete and permanent erasure of sensitive or personal data from all AI systems is not guaranteed, and in practice may be nearly impossible.
As a user, the best one can do in the short term is to be mindful of how much information one shares online or with AI tools. Before using any AI services, check the privacy policy and terms of service to understand what data is collected and on what terms, and stay informed about any updates to those policies. In the absence of better technical solutions, legal frameworks, and improved protection for users’ data, being careful not to share any sensitive or private information remains a fundamental defence against data misuse.
© Digitalis Media Ltd.