
Ghost in the machine: how LLMs work and the race to decide their values

March 2026
By Jorge Repiso

Hemingway’s novel The Sun Also Rises memorably features a secondary character being asked how he went bankrupt. “Gradually, then suddenly”, he quipped. The history of AI research and the mass adoption of large language models (LLMs) into the mainstream also feel like that.

Artificial neural networks (ANNs) have existed in their modern form since the 1960s, but in 2017, eight scientists at Google released a seminal paper that introduced a fundamentally new way to arrange ANNs: “Attention Is All You Need”. Dubbed the ‘transformer’ architecture, it offered a novel approach to training language models by weighting the contextual relationships between tokens in a sequence (the ‘attention’ mechanism).

2018 saw the release of both Google’s BERT and OpenAI’s GPT-1, the latter only conceptually, both relying on this new transformer architecture. BERT focused on encoding tasks (classification, sentiment analysis, named entity recognition) to better understand the intent behind queries in Google Search and serve users with faster, more relevant results. OpenAI, on the other hand, used transformers to build autoregressive models – that is, models which predict the next component of a sequence from all of the components that came before it – for decoding tasks. By focusing on mathematically predicting the next token to generate, GPT-1 and later GPT-2 paved the way for the explosion of generative LLMs and other generative tools that have since become pervasive.
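To make ‘autoregressive’ concrete, here is a minimal sketch of a greedy decoding loop in Python. The next_token_logits function and the toy vocabulary are invented stand-ins for a real trained model; the point is only that each newly generated token is appended to the sequence before the next prediction is made.

    # Minimal sketch of autoregressive (greedy) decoding.
    # `next_token_logits` is a hypothetical stand-in for a trained model:
    # it scores every vocabulary token given the tokens generated so far.

    import numpy as np

    VOCAB = ["the", "cat", "sat", "on", "mat", "."]

    def next_token_logits(token_ids: list[int]) -> np.ndarray:
        # A real model would run a transformer here; we return arbitrary
        # scores purely to keep the example self-contained.
        rng = np.random.default_rng(seed=len(token_ids))
        return rng.normal(size=len(VOCAB))

    def generate(prompt_ids: list[int], max_new_tokens: int = 5) -> list[int]:
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            logits = next_token_logits(ids)      # score every candidate token
            ids.append(int(np.argmax(logits)))   # pick the most likely one...
            # ...and feed the extended sequence back in on the next pass.
        return ids

    print([VOCAB[i] for i in generate([0, 1])])  # e.g. ['the', 'cat', ...]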

Training a large language model

At their core, these models are trained through pattern recognition at massive scale. They consume vast amounts of text (books, websites, conversations, code, etc.), which is converted into numerical representations called tokens. When you read “the cat sat on the …”, your brain might predict “mat” or “chair”. LLMs do something similar, but across billions of examples, developing an intricate statistical model of how language works. They learn weights that, given the tokens seen so far, score every token in their vocabulary and output the one most likely to follow.
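As an illustration of the statistics involved, the sketch below builds the crudest possible ‘language model’: it counts which word follows each two-word context in a toy corpus and predicts the most frequent continuation. The corpus and the word-level ‘tokens’ are invented for the example; real LLMs operate on sub-word tokens and billions of learned parameters rather than raw counts, but the principle of ‘most likely next token given the context’ is the same.

    # Toy next-word predictor based on counts of two-word contexts.
    # The corpus is invented for illustration; real models learn from
    # billions of sub-word tokens, not a handful of sentences.

    from collections import Counter, defaultdict

    corpus = ("the cat sat on the mat . "
              "the cat sat on the mat . "
              "the dog sat on the rug .").split()

    follows: dict[tuple, Counter] = defaultdict(Counter)
    for (w1, w2), nxt in zip(zip(corpus, corpus[1:]), corpus[2:]):
        follows[(w1, w2)][nxt] += 1      # count continuations of each word pair

    def predict_next(w1: str, w2: str) -> str:
        return follows[(w1, w2)].most_common(1)[0][0]   # most frequent continuation

    print(predict_next("on", "the"))   # 'mat' - it appears more often than 'rug'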

Training typically happens in two, sometimes three, phases. Pre-training is the foundation, where models ingest enormous datasets and learn broad capabilities through unsupervised learning. They are essentially parsing whatever content is fed to them, ‘teaching’ themselves everything from grammar to quantum physics to baking recipes. It is worth noting that these models do not understand anything per se – they simply extract statistical regularities from the contextual information in the data they ingest, and output tokens in an order coherent enough to make them sound knowledgeable.
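A rough sketch of what this ‘teaching themselves’ means in practice: during pre-training the model is repeatedly asked to predict every next token in a batch of text, and its parameters are nudged to make the observed continuations more probable. The tiny model and tensor shapes below are assumptions for the sake of the example, not any lab’s actual training code.

    # Illustrative pre-training step: next-token prediction with cross-entropy.
    # The miniature 'model' and shapes are assumptions for the example.

    import torch
    import torch.nn.functional as F

    vocab_size, seq_len, batch = 1000, 16, 4
    model = torch.nn.Sequential(                  # stand-in for a transformer
        torch.nn.Embedding(vocab_size, 64),
        torch.nn.Linear(64, vocab_size),
    )
    optimiser = torch.optim.AdamW(model.parameters(), lr=3e-4)

    tokens = torch.randint(vocab_size, (batch, seq_len))   # a batch of token IDs
    inputs, targets = tokens[:, :-1], tokens[:, 1:]         # predict token t+1 from token t

    logits = model(inputs)                                  # (batch, seq_len-1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                         # nudge weights toward the data
    optimiser.step()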

In LLMs’ early days, developers would train their models on any content they could get their hands on. However, following the mass increase in LLM use, numerous lawsuits have been brought by authors and publishers seeking to prevent their content from being used for commercial purposes without authorisation. Tech companies are now entering into licensing agreements to obtain high-quality training data, which, in conjunction with more compute power, leads to larger, more capable models.

Mid-training, or continued pre-training, refines these models using more curated, high-quality data focused on particular domains, such as scientific reasoning, coding, or multilingual abilities. This bridges the gap between raw statistical pattern matching and useful expertise.

Post-training is where the model learns not what it can say but what it should say, transforming from a simple prediction engine into an effective assistant. Critical safety and alignment testing is performed during this phase to ensure models are safe before release. The most common and effective alignment technique is reinforcement learning from human feedback (RLHF), where humans rate responses, teaching the model to be helpful, honest, and harmless. Other techniques take a more adversarial approach, like red teaming, where humans (or other AI systems) intentionally try to elicit harmful or misaligned outputs to further refine the model’s safety. Alignment ultimately shapes the model’s judgement about when to answer confidently, when to express uncertainty, and when to decline answering altogether.
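One core ingredient of RLHF is a reward model trained on human preference pairs: shown two candidate responses, the one the annotator preferred should receive the higher score. The snippet below sketches that pairwise objective (a Bradley–Terry-style loss); the reward model itself, and the subsequent reinforcement-learning step that optimises the chat model against it, are simplified away, and the toy features are invented for the demonstration.

    # Sketch of the pairwise preference loss used to train an RLHF reward model.
    # `reward_model` is a placeholder; real systems fine-tune a full LLM to
    # emit a scalar score per response.

    import torch
    import torch.nn.functional as F

    def preference_loss(reward_model, chosen, rejected):
        r_chosen = reward_model(chosen)       # score for the human-preferred response
        r_rejected = reward_model(rejected)   # score for the other response
        # Push the preferred response's score above the rejected one's.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Toy demonstration with a linear reward model over made-up features.
    toy_reward = torch.nn.Linear(8, 1)
    chosen = torch.rand(3, 8)      # stand-in features for 3 preferred responses
    rejected = torch.rand(3, 8)    # stand-in features for the rejected ones
    print(preference_loss(lambda x: toy_reward(x).squeeze(-1), chosen, rejected))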

Artificial… Intelligence?

We know LLMs can solve complex problems, write creative prose, and engage in nuanced reasoning, but does that make them actually intelligent? Defining intelligence is no easy task. Alan Turing avoided offering an essentialist definition in his famous test, instead providing an operational criterion: if a machine’s behaviour is indistinguishable from a human’s in conversation, then for practical purposes, we can consider it intelligent.

Philosopher John Searle’s Chinese Room thought experiment (1980) argues that computers executing programs cannot possess a mind, intentionality, or consciousness. Humans have traits, feelings, and behaviours that shape our intellect, and our intelligence stems in part from Socratic ignorance: crucially, we can identify what knowledge we lack, understand the intellectual ramifications of acquiring it, and set out to pursue it. In contrast, while LLMs can derive knowledge from mathematical formulations, they have no awareness of possessing an internal, updatable model of the external world in which they operate. Machines don’t know what they don’t know.

Perhaps the better question isn’t whether LLMs are truly intelligent, but what kind of intelligence they represent. They excel at pattern recognition, information synthesis, and linguistic manipulation, all cognitive tasks that require intelligence when humans perform them. Yet they struggle with others: maintaining consistent beliefs over time, genuine planning, and understanding the physical world.

Constitutional AI… who writes the rules?

This brings us to ‘constitutional AI’. Rather than relying solely on human feedback to shape behaviour, constitutional AI employs a set of principles (a ‘constitution’) to guide the model. The system critiques and revises its own responses according to these principles, learning through self-improvement. This approach is more transparent (values are explicit), more scalable (less dependent on endless human feedback), and potentially more consistent in applying ethical principles.
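In outline, that self-improvement loop looks something like the sketch below: draft an answer, ask the model to critique the draft against each constitutional principle, then ask it to rewrite the draft in light of the critique. The llm function, the prompts, and the principles are hypothetical placeholders; in Anthropic’s published constitutional AI method, the revised answers are then used as training data rather than the loop being run at inference time.

    # Sketch of a constitutional critique-and-revise loop.
    # `llm` is a hypothetical text-completion function; the principles are
    # paraphrased examples, not any published constitution.

    PRINCIPLES = [
        "Avoid content that could help someone cause serious harm.",
        "Be honest about uncertainty instead of inventing facts.",
    ]

    def constitutional_revision(llm, question: str) -> str:
        answer = llm(f"Answer the question: {question}")
        for principle in PRINCIPLES:
            critique = llm(
                f"Principle: {principle}\n"
                f"Answer: {answer}\n"
                "Identify any way the answer violates the principle."
            )
            answer = llm(
                "Rewrite the answer so it satisfies the principle.\n"
                f"Critique: {critique}\nAnswer: {answer}"
            )
        return answer   # in training, such revisions become fine-tuning data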

AI firm Anthropic recently updated its Constitution to make its models “broadly safe”, “broadly ethical”, “compliant with Anthropic’s guidelines” and “genuinely helpful”. This represents a shift from asking “can we make AI do what we want?” to “can we make AI that reasons about what it should do?”

While this may be a step in the right direction, it’s not without critics. Constitutional AI raises profound questions about power and values in a globalised, post-truth world.

Who gets to write the constitution? The companies building these tools appear to be filling this role by default, but they ultimately answer to their shareholders and markets, not necessarily to broader society, whose members may not use these tools but whose lives will undoubtedly be shaped by them.

Which values should be enshrined in a constitution? As AI systems become critical infrastructure worldwide, these encoded values matter enormously. The incumbent tech players emerge from WEIRD (Western, educated, industrialised, rich, democratic) capitalist markets, but other countries such as China operate under entirely different value systems and are increasingly competing in terms of both scale and capability. Recent geopolitical tensions are also prompting other WEIRD countries (largely European ones) to question the hegemonic status quo by developing LLMs with a stronger focus on European data sovereignty and regulatory compliance.

Constitutional AI makes alignment more transparent and systematic, but it doesn’t resolve the fundamental question: in a world of competing moral frameworks and geopolitical interests, whose constitution counts? The technology may be alignment-ready, but humanity certainly isn’t aligned on what AI should be aligned to.
