One of the many, many talking points to have arisen since AI became a dominant feature of the tech landscape is where Large Language Models (LLMs) draw their data from. It seems like ancient history now, but OpenAI’s early ChatGPT models only had access to information published before September 2021. Even without full access to the internet, there was always a question as to which works were used to train various models, and to what extent the use of third-party content is justifiable.
With the searchable internet now a regular option for LLMs, there have been struggles to determine what content is fair game, or even fair use. News publishers, authors, and other creatives are increasingly seeking to establish rules of engagement that strike the balance between visibility and protection.
Subscribe to view this content
Since the invention of the paywall in the mid-1990s, journalism has had a long history of setting boundaries with the online world. While some forms of publishing have flourished on sharing models, ad revenue, and working with data brokers, other publishers have used alternative methods to receive returns on their output, with varying levels of success. Limited free trials, purchasable articles, and other subscription models are now commonplace to encourage readers to purchase content and to prevent articles on news and magazine sites from being read by those who haven’t paid.
Taking a further step, since 2021 the Australian government’s News Media Bargaining Code has required tech platforms like Google, Microsoft, and Meta to strike commercial bargains with news publishers for their content, to “address bargaining power imbalances between […] news media businesses and digital platforms”. That said, taking a firm position on access to content is not universally popular. Following mixed reactions from Big Tech platforms, including Meta’s 2024 announcement that it would not renew any deals, a new scheme is in development but has yet to be introduced.
Naturally, some users will still avoid or immediately click away from content they can’t read, affecting ad revenue and the perceived ‘usefulness’ of the content. The common advice is that paywalls do not negatively impact Search Engine Optimisation (SEO) to a huge degree, as search engines’ web crawlers can usually still access most of the content to index and return in results for relevant user queries. Indeed, there are paywall software options designed to allow for flexibility for precisely this reason, but the system is not without vulnerabilities. In July 2025, the News/Media Alliance announced it had taken down 12ft.io, a site which disguised its users as web crawlers, allowing them to avoid ads, trackers, and paywalls.
However, search rankings and social media are only part of the story now. There is a balance to be struck between safeguarding work from unauthorised transformative use and avoiding the snub of omission from the lucrative list of links presented in an AI Overview (AIO). Much as publishers must consider the pros and cons of free articles, trial subscriptions, paywalls, and cookie-based revenue models, so too must they draw their own lines on how AI may or may not peruse their articles.
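One common way of drawing such a line, though not one discussed above, is a robots.txt file that names individual crawlers. The sketch below is illustrative only: the robots.txt content and the news.example.com address are hypothetical, and it assumes the publisher wants to turn away OpenAI's GPTBot and Google's Google-Extended token while leaving ordinary search indexing open. It uses Python's standard-library robots.txt parser to check which crawlers would be permitted to fetch an article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher (an assumption,
# not any real site's policy): AI-related crawlers are refused everywhere,
# all other crawlers are only kept out of account pages.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /account/
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each crawler name, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical article URL used purely for illustration.
article = "https://news.example.com/articles/ai-and-paywalls"
result = crawler_access(ROBOTS_TXT, ["GPTBot", "Google-Extended", "Googlebot"], article)

for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these assumed rules, GPTBot and Google-Extended are blocked while Googlebot is allowed. It is worth remembering that robots.txt is a voluntary convention: it states a publisher's wishes but does not technically enforce them, which is partly why licensing deals and litigation feature so heavily in this debate.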
Just browsing
For some publishers, there is an obvious solution to the problem of models returning their content without having paid for the privilege: let them pay for the access. While the exact terms are not publicly available for all licensing deals, some serious journalistic clout has signed up with the biggest players in the AI field to take advantage of the benefits of collaboration. OpenAI, the company behind ChatGPT, has reached agreements with an impressively wide spread of titles, including news heavyweights like the Financial Times, the Associated Press and News Corp, household names like Condé Nast, Vox Media, The Atlantic and Time, lifestyle magazines through Dotdash Meredith, and even user-generated platforms like Reddit, Stack Overflow, and Automattic. Tech giants like Google, Microsoft, Meta, and Amazon have also sought out similar partnerships to supplement their offerings. ProRata.ai operates Gist.ai, which claims to be “the first ethical AI search engine”; offering a 50% revenue split and full attribution, it claims over 500 publications as sources.
For others, the path forward has been less clear. Anthropic’s approach of purchasing a library of online material and even physical books to train its LLM Claude was ruled to be legal by a federal judge in June 2025 under fair use, but was complicated by the appearance of pirated materials in its collection. A copyright infringement trial later this year will determine what Anthropic owes in damages.
In the same month, another case saw a judge back Meta, as he felt the plaintiffs, a group of authors including Sarah Silverman and Ta-Nehisi Coates, had made the wrong case. The judge ruled that the tech platform’s use of their work was transformative under fair use and did not risk “market dilution”, but Judge Chhabria emphasised: “This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful […] It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
If trends continue and users carry on engaging with content like non-committal shoppers, striking a good deal currently seems the most straightforward route to financial returns, though only time will tell. Not everyone is on board: multiple publishers, including several news coalitions, are taking legal action against AI platforms, and Reach has so far held out from entering any deals.
Looking ahead
Data suggests that users are increasingly turning to chatbots rather than search engines, and when users do use traditional search engines, 60% of searches result in zero clicks due to AIOs. A study by Press Gazette demonstrated that AIOs broadly contributed to drops in traffic to publishers. While no-click searches are not a new phenomenon, research suggests AIOs make users less likely to click on links. Google has recently launched a new tool in the UK offering “AI Mode” to generate conversational answers in the vein of these overviews rather than the established “ten blue links”. While the ability of AIOs to accurately parse and summarise sources is not without flaws, there is a clear benefit to being compensated directly for content if the chance of clicks is reduced.
SEO and communications experts have emphasised that optimising content for chatbots and AIOs will continue to place value on factors which are familiar today, such as content written by humans, trusted domains, and websites with clear, navigable design. Digitalis’s research published in June 2024 indicated that ChatGPT 4.0, Perplexity, and Gemini drew a significant amount of information from page one of Search Engine Results Pages; the “ten blue links” might look different in future but haven’t yet lost their influence. However, there is a notable thread of commentary that, intellectual property aside, the search engine model is cannibalising itself by allowing AI elements to threaten the traffic and revenue of sites that feed into its knowledge base. For many, there’s a sense that the symbiotic balance of the search ecosystem is not functioning as intended. The decision of multiple publishers to invest in agreements with tech platforms to secure the way ahead is one solution, but we are likely to keep talking about this creative frontline for quite some time.
© Digitalis Media Ltd.