
Talk AI to me: a linguistic insight into human patterns in GenAI content (and its impact on us)

September 2025
By Chiara Cuccu


It has been less than three years since ChatGPT, currently the most widely used generative artificial intelligence chatbot, was released in its first publicly available version. Since then, the use of generative AI (GenAI) and Large Language Models (LLMs) has become ubiquitous, with chatbot functionalities and automatic prompts embedded in our daily tasks, our online searches, our workloads, and even our personal messaging apps. In the past year alone, major search engines have observed that human search traffic worldwide is steadily decreasing: users are increasingly taking their queries directly to an LLM chatbot that presents a fully formed answer, rather than piecing together information from various online results. When users do search, the convenience of AI overview functions (ever more omnipresent as they are rolled out to new jurisdictions and across continents this year) is also proving an obstacle to website traffic and click-through rates (CTRs).

Such a drastic change in our online behaviour over a relatively short period is driven in part by the continuous improvements that make generative AI systems increasingly accurate and increasingly attuned to human behaviour and speech patterns. To achieve this, LLMs consume vast amounts of written content, from novels and academic papers to online forums, identifying patterns that capture the way humans communicate with each other. In 2023, OpenAI’s GPT-3 was found to have been trained on content crawled from online fan communities and volunteer-run fan-fiction portals, after AI-powered writing tools began returning niche terminology specific to particular fandoms. This raised wider questions about whether training data needs to account for copyright even in non-commercial content that is readily available online, a source that may supply a growing amount of less polished text, closer to natural speech than published works. More recently, in early summer, authors accused the makers of generative AI models of using their copyrighted works as training data without consent.

Understanding human language

Access to more genuine, unedited training data may nevertheless have been the crucial turning point that enabled LLMs to correctly interpret nuanced speech and authentic linguistic patterns. Corpus linguistics studies have identified ChatGPT’s ability to coin neologisms, understand cultural references and specialised vocabulary, handle complex sentence structures, and deploy recognisable patterns for expressing positive sentiment, such as emphasis and punctuation.

The latest models of these chatbots, currently being tested, have also been partially successful in identifying and correctly parsing ambiguous grammatical structures. While the publicly available ChatGPT 3.5 and 4 models failed to detect such ambiguity, OpenAI’s o1 model showed an ability to understand recursive clauses: phrases embedded within other phrases, with pronoun references that are not necessarily delineated by punctuation but can be resolved through context or overall tone. (A classic illustration is “The trophy doesn’t fit in the suitcase because it is too big”, where only world knowledge tells us what “it” refers to.) Recursion has been deemed by the likes of Noam Chomsky to be one of the defining features of human language, differentiating us from other animals. This kind of development could shift the debate on whether generative AI understands language fully, or merely mimics it and its frequent patterns.

Human speech imitating AI

While this pipeline from human content into generative AI training is essential for the development and improvement of LLMs, the widespread use of AI chatbots in everyday tasks is also producing the reverse effect. Paradoxically, human speech now mirrors language patterns common in AI-generated responses, both in written content, such as academic papers, and in the spoken word, such as lectures and video essays. Researchers at the Max Planck Institute for Human Development analysed hundreds of thousands of YouTube videos published before and after ChatGPT’s public release, and found that the linguistic cues AI content tends to showcase (certain vocabulary, sentence structures, and overall tone) were becoming more prominent in human speech too.

Words like “delve”, “realm”, and “adept”, which tend to be flagged as indicators of GenAI usage in academic essays, have seen their frequency in speech increase by over 50% since 2022, as people who regularly use LLM chatbots internalise them in their daily vocabulary.
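
As a rough illustration of how such frequency shifts can be measured, the Python sketch below tallies marker-word rates in two hypothetical mini-corpora of transcripts. The word list comes from this article; everything else (the sample data, function name, and normalisation per 10,000 tokens) is a placeholder, and studies like the Max Planck one work with far larger, timestamped corpora and statistical controls.

    import re
    from collections import Counter

    # Marker words named in the article as common GenAI tells.
    MARKER_WORDS = {"delve", "realm", "adept"}

    def marker_rate(transcripts):
        """Occurrences of each marker word per 10,000 tokens."""
        counts = Counter()
        total_tokens = 0
        for text in transcripts:
            tokens = re.findall(r"[a-z']+", text.lower())
            total_tokens += len(tokens)
            counts.update(t for t in tokens if t in MARKER_WORDS)
        return {w: 10_000 * counts[w] / max(total_tokens, 1) for w in MARKER_WORDS}

    # Hypothetical mini-corpora split around ChatGPT's release (late 2022).
    pre_2022 = ["we looked at the data and talked through the results"]
    post_2022 = ["let us delve into the realm of prompting and become adept at it"]
    print(marker_rate(pre_2022))   # all rates 0.0
    print(marker_rate(post_2022))  # nonzero rates for delve, realm, adept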

As writers will know, the beloved em dash (not to be confused with the en dash or hyphen) has also been identified as a very common “GPT-ism”; editors now regularly recommend editing these out, or text will be erroneously flagged as AI-generated, meaning we may see this punctuation fall out of popularity if this guidance persists. Anecdotally, users who have tried to make ChatGPT not use em dashes — using detailed prompts and providing alternative sentence structures — found that eventually, the LLM kept sneaking its favourite punctuation mark into its answers.
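
To show how crude punctuation-based flagging can be, here is a toy Python sketch that simply tallies dash variants in a text. It is an illustration under stated assumptions, not how any real AI-text detector works; genuine detectors rely on much richer statistical signals than a single punctuation mark.

    # Toy heuristic only: counts dash variants as a naive "GPT-ism" signal.
    DASHES = {"\u2014": "em dash", "\u2013": "en dash", "-": "hyphen"}

    def dash_profile(text: str) -> dict:
        """Count each dash variant in a piece of text."""
        return {name: text.count(ch) for ch, name in DASHES.items()}

    sample = "The model kept sneaking the mark back in \u2014 prompt after prompt."
    print(dash_profile(sample))  # {'em dash': 1, 'en dash': 0, 'hyphen': 0}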

Tone can also be a common marker of GenAI use, and users who rely on LLM tools to help them craft messages in text chains and online conversations have been found to mimic the patterns presented by the chatbot (e.g. muted emotional responses, rigidly structured sentences, an absence of the cues of imperfect, spontaneous human language). Even when such messages come from a genuine place, perceiving these patterns and assuming AI involvement in a personal conversation can provoke more negative responses from the interlocutor, owing to the anti-social connotations of LLMs.

Effects of GenAI writing

In professional settings, the consequences can be much more severe. While companies may try to save on their copywriting budget by using generative AI tools to write their website copy, for example, expert writers report increased demand for fixing inaccurate, irrelevant, and at times inappropriate chatbot output, which leaves an unprofessional impression and risks reputational damage. This can ultimately cost companies more than hiring a human copywriter in the first place, as it requires further expert oversight and rework. Content on professional platforms (such as a company website) that is perceived as AI-sounding can reportedly affect whether that content is cited by LLMs, and may provoke distrust or wariness from potential customers and stakeholders, threatening the company’s overall reputation even when the material is accurate.

While some recent studies offer early indications that extensive reliance on GenAI chatbots may even impact cognitive abilities, one context in which using an LLM as a writing tool appears to have a significantly positive impact on learning is the academic performance of students who have English as a Second Language (ESL). Using these chatbots as a dialogic tool that provides feedback can be a powerful way to remove the language barrier and level the playing field with native-speaking peers. Of course, this is once again a context in which GenAI patterns are likely to translate readily into genuine human speech.

As GenAI chatbot usage grows across fields and touches a widening range of human skills, from digital content creation to problem-solving, and as these tools are deployed across sectors to assess human candidates, the line between purely human communication and adaptation to LLM patterns may become thinner still.
