Desire & the Stochastic Pathways to Fulfilment
the master's tools will not denounce the master's lies
Prologue
The first part of this post is an account of an exchange between myself and ChatGPT. The latter parts focus on understanding the nature of Large Language Models (LLMs) and the implications arising from the encapsulation of pre-existing biases and cultural hegemonies into generative AI models. Included in the first part is a code example output by ChatGPT in answer to my question. Please don’t let this deter those of you who do not code from following along. The account of this exchange will help provide some insight into the nature of ChatGPT and other LLMs, regardless of whether you fully understand the code. (That said, if you’re really only here for the skinny, then feel free to skip to the Epilogue.)
Exposition
One of the main code frameworks I have to use in my day job is called React.js. When using React, a key pattern one must follow is to compose decoupled functions that “hook into” React’s state-handling lifecycle. Such functions are imaginatively called Hooks, and there are specific rules regarding their use. Rules that must not be violated, or all kinds of oddness can ensue.
Don’t worry if this sounds like total gobbledygook to you. Essentially, all you need to know is that when writing code that builds on the React framework, there are some strict rules that must be adhered to.
Asking ChatGPT to produce code that correctly follows these rules would seem a reasonable way to gauge how much its code output can be trusted.
Here are the rules for using Hooks, as stated in the current React documentation.
✌️ Rules of Hooks
Only call Hooks at the top level. Don't call Hooks inside loops, conditions, or nested functions.
Only call Hooks from React function components. Don't call Hooks from regular JavaScript functions. (There is just one other valid place to call Hooks — your own custom Hooks.)
These rules seem to be more concise than they used to be, but the crucial point for the purpose of understanding the following exchange is that, as it clearly states in rule one above, you are not allowed to invoke a Hook function conditionally, or in other words, from within an “if statement”. Again, don’t worry if that doesn’t mean much, if anything, to you.
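For those who would like to see what that rule looks like in practice, here is a minimal illustrative sketch of my own (the component names and the logging are mine, not from the React documentation):

import React, { useEffect } from 'react';

// Incorrect: the Hook is invoked inside a condition, so it may be
// skipped on some renders and React loses track of which state
// belongs to which Hook call.
function BadExample({ isVisible }) {
  if (isVisible) {
    useEffect(() => {
      console.log('visible');
    });
  }
  return <p>Bad</p>;
}

// Correct: the Hook is invoked unconditionally at the top level of
// the component, and the condition lives inside the Hook instead.
function GoodExample({ isVisible }) {
  useEffect(() => {
    if (isVisible) {
      console.log('visible');
    }
  });
  return <p>Good</p>;
}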
Inciting Incident
This is the question I prompted ChatGPT with:
“What is the correct way to detect when a ref on a React element is mounted and then use that ref to pass to a hook?”1
This was ChatGPT’s response:
import React, { useEffect, useRef } from 'react';

function MyComponent() {
  const ref = useRef(null);

  useEffect(() => {
    if (ref.current) {
      handleRefMount(ref.current);
    }
  }, []);

  const handleRefMount = (element) => {
    // The ref has been mounted, you can now pass it to your hook or perform any other actions
    myHook(element);
  };

  return <div ref={ref}>My Component</div>;
}

function myHook(elementRef) {
  // Your hook logic using the elementRef
  // ...
}
In this code, “myHook” is supposed to be a React Hook, subject to all of the constraints on its usage outlined above. In ChatGPT’s code, however, the line that invokes it is only ever reached via another call, handleRefMount, which within the component in question is itself made inside a conditional statement. Basically, this code violates the rules.
Rising Action
I pointed out the error. ChatGPT responded with an apology and asked that I allow it to correct its mistake.
In attempt number two, it simply violated the same rule but in a more obvious and direct manner.
I pointed out that it had got it wrong again; it once more apologised for the oversight and confidently declared that it understood the error and would, without doubt, correct it.
Attempt number three was almost identical to attempt number two.
Again, I patiently explained the error and the rule it violated. ChatGPT again apologised and stated bullishly, with absolutely no loss of authoritative bluster, that this time it would correct its mistake.
Attempt number four was a more convoluted example, but one that amounted to the exact same violation: the Hook was invoked conditionally.
Climax
These steps were repeated two more times. After its sixth attempt, where it made the very same error, I thought to ask if it could recite the rules for using React Hooks. Here's what it said (you can safely skip to rule 4):
Here are some guidelines to keep in mind when using hooks:
1. Use hooks at the top level: It is recommended to use hooks at the top level of a functional component to maintain consistent order and ensure hooks are called in the same order on every render.
2. Use hooks within custom hooks: You can create custom hooks that contain multiple hooks and other logic, allowing you to encapsulate and reuse complex behavior.
3. Use hooks within event handlers or effect callbacks: Hooks can be used inside event handlers or effect callbacks, allowing you to manage state or perform side effects in response to user interactions or component updates.
4. Avoid using hooks conditionally: Hooks should not be used conditionally within a component. Hooks should be invoked unconditionally at the top level of a component to ensure consistent behavior and avoid issues with state management.
In this more verbose rendering of the rules, the one ChatGPT’s attempts violate is no less clearly stated. There it is, right at the start of rule four. I pointed out how its attempts were in clear violation of rule four. It agreed most vehemently, apologised and, as ever full of the now familiar authoritative bluster, confidently declared it would absolutely get it right from now on.
Attempt number seven was identical to attempt number one.
Dénouement
Ready to give up on this seemingly pointless exercise, fearing it could otherwise go on like this for all eternity, I had an idea for one last thing I could try. I suggested that it rename its Hook function in accordance with the idiomatic naming convention for React Hooks, which is to prefix the name with “use”. So in this case “myHook” becomes “useMyHook”. Granted, this took me far too long to think of!
Following this suggestion, attempt number eight was 100% correct.
From then on, while using the naming convention, it continued to produce correct responses nearly every time. BINGO!
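For the record, a rules-compliant answer has roughly the following shape. This is my own minimal sketch rather than ChatGPT’s verbatim output, and the internals of useMyHook are purely illustrative:

import React, { useEffect, useRef } from 'react';

// A custom Hook, named with the idiomatic "use" prefix. The ref object
// itself is passed in, and ref.current is only read inside the Hook's
// own effect, which runs after the element has mounted.
function useMyHook(ref) {
  useEffect(() => {
    if (ref.current) {
      // ...do whatever is needed with the mounted element
    }
  }, [ref]);
}

function MyComponent() {
  const ref = useRef(null);

  // Invoked unconditionally at the top level of the component,
  // in accordance with the Rules of Hooks.
  useMyHook(ref);

  return <div ref={ref}>My Component</div>;
}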
Epilogue
What happened here? What does this exchange tell us? Why did ChatGPT repeatedly fail to output a correct answer, despite being provided with explanations pointing out where it was going wrong, and even after confirming that it had access to the set of rules it should follow? And why then, simply by being prompted to follow a naming convention, was it subsequently able to consistently produce correct answers?
ChatGPT produced incorrect output repeatedly, even after feedback that each time pointed out where it had made the error and how to correct it. That, together with its assertions of understanding and agreement in response to the criticism, followed immediately by yet another failure, adds further support to the argument that there is no actual understanding here. ChatGPT does not in fact understand anything. It understands neither the initial prompt nor its own output. Nor does it understand the feedback that is provided on any response.
Of course, the level or nature of understanding that AIs attain need not match that of humans for them to be useful. That said, the absence of understanding in this example should not be dismissed as merely a qualitative shortfall, by claiming that ChatGPT does understand but in a way that is somehow different from human understanding. It is more accurately interpreted as a quantitative shortfall. It made no difference how many times I directly pointed to what was incorrect in its output and how to fix it. The absence of understanding rendered the system unable to act on the feedback. It simply re-shook the dice and assembled another incorrect answer.
The behaviour and output that can mistakenly be interpreted as signs of understanding are, in reality, little more than statistical illusions presented with a veneer of authoritative bluster. Because it does not understand, while presenting the impression that it does, ChatGPT is, in fact, always lying. Even when its output is true. This exchange adds further anecdotal support to the view that, as Dan McQuillan has said, ChatGPT is, in technical terms, a “bullshit generator.”2
Melanie Mitchell, the Davis Professor of Complexity at the Santa Fe Institute, in a piece3 for Quanta Magazine from 2021 (written in the era of GPT-3, before ChatGPT itself was launched), attributes this lack of understanding to the absence of the kind of basic knowledge of the world that even human infants possess, what researchers have called Infants’ Metaphysics4.
The crux of the problem, in my view, is that understanding language requires understanding the world, and a machine exposed only to language cannot gain such an understanding. Consider what it means to understand “The sports car passed the mail truck because it was going slower.” You need to know what sports cars and mail trucks are, that cars can “pass” one another, and, at an even more basic level, that vehicles are objects that exist and interact in the world, driven by humans with their own agendas.
What happened in the example detailed above, as ChatGPT in effect searched within its training data for the correct use of React Hooks, is that the area of possibility space it traversed for the first seven responses contained a low probability density of code examples that happened to adhere to the rules. The suggestion of using the idiomatic naming convention gave it the nudge it required, taking its traversals along fresh pathways into a region of possibility space containing a far greater probability density of pathways resulting (via next-word prediction) in the construction of answers that correctly adhere to the rules.
What matters deeply about this example is that there was a request made, a desired output expressed, which was met with an appearance of understanding and compliance. The response was delivered with all of the certainty and air of authority one would expect from an expert possessed of a well-founded and powerful confidence in their knowledge and the degree to which their answer delivered the perfect fulfilment of the desired output.
Crucially, this exchange might well not have progressed beyond an initial question and an initial answer that was immediately accepted as correct. Indeed, more often than not, exchanges such as the one described above will stop at that first answer, which will most often be deemed to be the fulfilment of the expressed desire.
If, on any given topic, an LLM’s training dataset encapsulates a comparatively greater amount of incorrect or false information than it does correct information, then incorrect output will be statistically more likely than correct output.
Once the model emerges pre-trained from the dataset, it is subjected to various forms of fine-tuning. While the initial pre-training is bottom-up and emergent from the dataset, the fine-tuning is a more top-down process that tunes the model in accordance with specific criteria. One such fine-tuning approach is Reinforcement Learning from Human Feedback (RLHF). These fine-tunings and sanitisations, these filterings or top-down mappings of the initial territory, can never correspond to it 1:1. While the fine-tuning will affect the stochastic pathways, it will surely always be insufficient to meaningfully shift the statistical distributions that emerged from the dataset during pre-training.
The fine-tuning phase will serve to remove some of the more abhorrent content present in any large-scale harvesting of the Internet as a whole5. However, in addition to relying on transient, low-wage labour, this inevitably involves (as is true of any moderation process) an imposition of the ideology of the company developing the AI onto the model. In other words, while sanitising it of toxic content, the fine-tuning phase also constitutes a culturally hegemonic debasement of the original dataset.
While control and scrutiny of the processes of gathering the training data, and of the subsequent fine-tuning of the emergent model, remain solely in the hands of tech behemoths like Google, Facebook and even the Microsoft-backed OpenAI, what remains in these tools after the tuning phase can be termed “the master’s lies”. If a handful of companies come to dominate the AI that the majority of us use, the consequences are all too predictable. As Audre Lorde taught us,6 The Master’s Tools Will Never Dismantle the Master’s House. Or, in the case of LLMs, The Master’s Tools Will Never Denounce the Master’s Lies. Such dominance over the tools we use both to access and to create content is deeply problematic, particularly if accurate facts or minority cultural viewpoints are in fact often more difficult and far more time-consuming to reach within these tools than lies, inaccuracies and the views of the cultural hegemony. This reminds me of the problem, which has so polluted our media, of headline lies only ever being corrected by fine-print retractions.
Postscript
We can further conclude from this specific exchange that the homogenisation of the fruits of human labour that are fed into AI systems appears to be crucial to their utility, which may suggest the presence of a force that will work against the baking of a diverse representation of cultural perspectives into the models.
For coders and software engineers, or anyone working with these tools for that matter, as long as the content the tools have consumed is homogeneous and consistent with our individual perspectives and desires, LLMs will be far more likely to nail responses to our prompts (providing, of course, that we remember to supply the required symbolic nudges to ensure they traverse the paths through possibility space that carry a higher probability of success).
In other words, when it comes to getting things right, as opposed to being creative merely by sprinkling a random seed into their traversal through possibility space, LLMs function far more successfully on tasks where there is a good level of factual unanimity and stylistic homogeneity in the area of the corpus that a prompt interrogates. If opinion is divided on the answer to your question, or your cultural perspective does not align with the hegemonic cultural perspective baked into an LLM, then you are less likely to be able to traverse the stochastic pathways to a fulfilment of your desire.
Another way of looking at this is perhaps to surmise that the problems in the output of LLMs mostly arise if we expect ChatGPT, and LLM-based generative AIs in general, to output accurate information and presume it to be objectively correct. If instead we view these tools as applied statistics machines, capable of reflecting back to us statistically probable and prevalent opinions or patterns within the hegemonic cultural perspective baked into them, their output can perhaps be taken with the appropriate pinch of salt. Seen this way, their output becomes a kind of data visualisation or manifestation of an otherwise hidden cultural hegemony. This is not the function for which they are being hyped, nor, of course, is it the way most of those already using them are interpreting their output.
The central challenge faced when using an LLM, or any generative AI, is that you must find some way to reach your desired output by bridging the gap between your prompt and the statistical prevalence of anything like your desired output within the dataset.
The size of that gap will depend on how well your prompt captures the essence of your desire. However, the quality of that articulation is not objective; its efficacy will be dictated by the hegemonic perspective encapsulated in the statistical distributions within the LLM. More specifically, the challenge you face is to compose your prompt so as to steer the traversal through possibility space into an area with a reasonable statistical probability of resulting in something which fulfils your desire. What’s more, as anyone who has used these tools will likely have noticed, parts of your prompt may pull in one direction while others pull in another. Even the most subtle internal contradictions within your prompt, whether logical, thematic or stylistic, will pull the traversal across latent space in competing directions and so inhibit the probability of fulfilment.
This gap, this challenge, will of course be asymmetric and inequitably distributed across society. The flow of probability in the output from these systems will be towards the hegemonic cultural viewpoint of the companies developing them. The gap between prompt and desired output will be far more difficult to bridge for minorities who are not well represented within the dataset, or for whom prejudices and biases perpetuated in the dataset and/or imposed during the fine-tuning work against their ability to nudge these tools into statistically probable fulfilment of their desires. This is The Crisis of Representation7, the gap between language and experience as articulated by Postcolonial, Feminist and Post-Structural Theory. In the sense of the cultural, social, racial8, gender, and ideological bias baked into the models9, and also in the sense that (due to the fake appearance of understanding) all output from generative LLMs is a lie whether it is accurate or not, the master’s lies are currently being hard-coded into the very heart of the next generation of tools: tools that will become gatekeepers to the coming economy.
We have long faced a similar challenge in honing our Internet search terms. When conducting an Internet search, or when we “search up” something, as my kids say, we must articulate our desire in adherence to the whims of a single company10, whose inscrutable and now vastly complicated page-ranking algorithms will determine the fulfilment we receive. While that system, the Googleplex, is an increasingly attention-based technological re-instantiation of the Advertising Industrial Complex, it is not yet clear how commerce will directly influence the paths we traverse through the possibility spaces accessible via these generative AIs. I shudder at the thought of an LLM-based version of Search Engine Optimisation. But just as the once-pure page-ranking algorithms of the Googleplex have now been reconfigured to trace the dominant pathways of commerce, it is perhaps only a matter of time before LLMs are integrated into the Advertising Industrial Complex as a kind of super-augmentation of PageRank, Surveillance Capitalism and the Tyranny of the Recommendation Algorithm (more on this in another post).
LLMs can, however, already provide us with fresh insight into the statistical distribution of words, or linguistic tokens, and perhaps by extension the statistical spread of opinions, within the datasets they have ingested. Ideally, this statistical distribution would be explicitly exposed somehow and displayed alongside all output from an LLM. Being able to discern the level of consensus on any given topic is, after all, clearly crucial in calibrating the level of trust one can place in an LLM’s responses.
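A sliver of this information is in fact already available: some of the APIs sitting on top of these models will expose per-token probabilities alongside their output, which hint at how contested each step of the traversal was, even if that is not the same thing as the spread of opinion within the training corpus itself. A minimal sketch, assuming the openai Node.js client and a model that supports the logprobs option (the model name and prompt here are illustrative):

import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment; assumes an ES-module
// context where top-level await is available.
const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini', // illustrative; any logprobs-capable model
  messages: [
    { role: 'user', content: 'Can a React Hook be called inside an if statement?' },
  ],
  logprobs: true,
  top_logprobs: 5, // also return the five most probable alternatives per token
});

// For each token in the answer, print the alternatives the model weighed
// and the probability it assigned to each of them.
for (const token of completion.choices[0].logprobs.content) {
  const alternatives = token.top_logprobs
    .map((alt) => `${JSON.stringify(alt.token)} (${Math.exp(alt.logprob).toFixed(3)})`)
    .join(', ');
  console.log(`${JSON.stringify(token.token)}: ${alternatives}`);
}

None of this amounts to displaying the level of consensus within the training data, but it is a reminder that the raw statistical machinery is there to be surfaced, should the companies involved choose to do so.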
When used as a tool for plumbing collective opinion on the general thrust of your thesis, and/or on specific aspects of it, LLMs may prove highly effective. When used as a sounding board in this manner, the veracity of their responses matters less than the fact that what they reflect back to you is somehow representative of the average opinion so far expressed (as manifest in the training corpus) within the topic area you are exploring.
When you bear in mind just how crucial insight into the statistical distributions within the training dataset may be, the ramifications of LLMs’ inability to identify their own output, or that of other LLMs, begin to come into focus. The last thing you want is for output arising from the statistical distributions within one training corpus to feed back into the next training corpus, altering the results that arise from the now distorted statistical distributions, and so on, and so on. This is reminiscent of Ted Chiang’s comparison of LLMs to a blurry JPEG of the Internet11, and of the amplification and reinforcement of compression artefacts that occurs when JPEG compression is repeatedly reapplied to an already compressed image.
Today, software engineers are already strongly encouraged to adhere to ever more narrowly defined code patterns and styles, but there is a new, additional homogenising force about to be applied to coding practices, and to the other creative endeavours set to be invaded by these tools. At least while LLMs’ limitations as next-word prediction machines persist, humans will be subject to a subtle force pulling them towards the most standard patterns. Those who surrender to these homogenising forces will be supported in their efficiency, if not their creativity, by the LLMs integrated into their tools, and the need to outcompete those peers will apply pressure on everyone else to conform. As we already know, homogenisation is a precursor to division of labour, which will then be inexorably followed by automation.1213
I’ll leave you with this quote from a piece Dan McQuillan posted in February 2023, which remains one of the more insightful pieces on ChatGPT and the nature of LLMs, and is well worth your attention.
It turns out that AI is not sci-fi but a technologised intensification of existing relations of labour and power. The $2/hour paid to outsourced workers in Kenya so they could be "tortured" by having to tag obscene material for removal14 is figurative of the invisible and gendered labour of care that always already holds up our existing systems of business and government.
This exact formulation of the question continues to lead to the same output and the subsequent exchange documented here, even when using GPT-4. Though it must be said that other, more verbose articulations, as would be expected, successfully bump ChatGPT into traversing pathways that result in correct outputs. I later found that even just adding the word “custom” before the word “hook” in the question was sufficient to elicit a correct response. The fact that there are multiple routes to a correct answer in no way changes my assessment of the absence of understanding, arrived at using the question as formulated in this account.
We come to bury ChatGPT, not to praise it, by Dan McQuillan https://www.danmcquillan.org/chatgpt.html
What Does It Mean for AI to Understand? by Melanie Mitchell https://www.quantamagazine.org/what-does-it-mean-for-ai-to-understand-20211216/
Infants’ Metaphysics: The Case of Numerical Identity, by Fei Xu, Susan Carey https://www.sciencedirect.com/science/article/abs/pii/S0010028596900055
Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic, by Billy Perrigo https://time.com/6247678/openai-chatgpt-kenya-workers/
The Master's Tools Will Never Dismantle the Master's House, by Audre Lorde https://collectiveliberation.org/wp-content/uploads/2013/01/Lorde_The_Masters_Tools.pdf
Arguments forwarded in the 1986 book "Writing Culture: The Poetics and Politics of Ethnography" highlighted The Crisis of Representation by critiquing anthropological authority and arguing that field reports (such as this account) are not transparent windows into "other" worlds but are instead crafted narratives. https://www.ucpress.edu/book/9780520266025/writing-culture
Large language models propagate race-based medicine, by Omiye, J.A., Lester, J.C., Spichak, S. et al https://www.nature.com/articles/s41746-023-00939-z
Unravelling the Intricate Web of Biases in LLMs, by Anwesha Paul https://www.gnani.ai/resources/blogs/unravelling-the-intricate-web-of-biases-in-llms/
Search Engine Market Share - Approximately 92% of global Internet search queries are handled by Google https://www.oberlo.com/statistics/search-engine-market-share
ChatGPT Is a Blurry JPEG of the Web, by Ted Chiang https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
The Automation Charade, by Astra Taylor https://logicmag.io/failure/the-automation-charade
Origin Stories: Plantations, Computers, and Industrial Control, by Meredith Whittaker https://read.readwise.io/search/read/01hbf5p27bc5yw1fqwkjsah4by
Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic, by Billy Perrigo https://time.com/6247678/openai-chatgpt-kenya-workers/