Despite being a software developer by trade, I am generally against the very existence of large language models like OpenAI’s GPT for several reasons:
- These models are trained on copyrighted work without the informed consent of copyright holders.
- Authors whose copyrighted work is used as training material are not compensated.
- They don’t cite their sources so that we can double-check their output.
- Their output is no better than a first draft written by an indifferent middle school student who would rather be out with their friends.
- They will bullshit you instead of admitting to being unable to answer a question.
- It seems they are being used to automate and devalue human creativity.
- The richest among us cannot manage to conceal their eagerness to use them to eliminate paying jobs and suppress wages.
Of course, nobody asked me. And I will no doubt be obliged to adapt to the existence of LLMs if I wish to remain employable as a software developer. Nevertheless, I would not want to use one to build a personal website, write posts for my site, or write fiction.
The LLM in My Head
I already have, embedded in the creases of my brain and baked into neuron upon neuron, something akin to a large language model. I built it over years and decades of reading, beginning when I taught myself to read as a toddler. And with that corpus of text I have developed other qualities that — I hope! — distinguish me from the babbling ghosts in OpenAI’s machines: discernment, discretion, and a sense of pride in craftsmanship that demands that I provide proper attribution when I quote the writings of somebody more accomplished than me.
I did not build this mental model by mass automated copyright infringement, ingesting the text of hundreds of thousands of books, novels, poems, and papers over the course of mere hours. I read, and reread, one work at a time. I acquired every text lawfully by buying new copies, buying used copies, borrowing copies from a public library, receiving them as gifts, or providing a home to books whose prior owners could or would no longer keep them. I thought upon what I read, making the author’s ideas my own when I agreed with them and discarding old, once-cherished ideas should I happen across facts or new ideas that made better sense. I thus expanded my own mind, remembering all the while that I am part of a wider culture, that my ideas are not wholly my own, but developed in a silent dialogue with other writers speaking to me through their books across space and time.
Admittedly, the process was not so swift or dramatic as when Frankenstein’s creature — poor, abandoned, misshapen thing that he was — taught himself how to be human by reading from a chest of tomes found by the roadside, but I suspect Mary Shelley herself knew better than to think that this was how people educated themselves. I think what she meant to show was that this ‘monster’ became a man by exposing himself to the thoughts of other human beings and integrating them into his own psyche.
This slow accretion of ideas, acquired from the culture of which one is part, is part of what shapes and grows a human soul. At least, this is what I believe, for no religion deigns to explain in detail the means by which God endows us with souls. If we are indeed born with them, they are as unshaped as raw stone hewn from the mountainside, and only take their final forms after years of being sculpted by education and experience.
However, Molly White suggested a use for LLMs in her blog post Rubber duck editing with LLMs that I find rather less objectionable. She experimented with using an LLM not to create new text from a prompt she had supplied, but to analyze text she had written herself and improve upon it.
Ms. White’s account of her experience suggests a use for AI that does not replace writers, or even editors. LLMs, I suspect, would be a reasonable step up from the spelling and grammar checkers built into existing word processors and text editors. They are still no substitute for working with an experienced human editor, but perhaps they could be used to create a better draft less likely to waste such an editor’s time.
The Virtual Editorial Assistant I Want
Frankly, I would love to be able to use in good conscience tech like GPT-4o or the upcoming ‘Apple Intelligence’. However, I don’t want to feel like a heel afterward. I would like to retain the ability to take pride in my work.
As I wrote earlier, I don’t want a virtual assistant that can ghostwrite for me, given a prompt. Nor do I want one that can write ‘boilerplate’ code for me. What I want as a writer, instead, is something of which I could ask questions like:
Given the following passages, what meaning might a reasonably intelligent person infer?
Given the following passage, can you find any metaphors or allusions likely to go over an American reader’s head?
Given the following dialogue, point out instances where the characterization is inconsistent.
Given the following scene, what might a reader reasonably expect to happen next?
Given that $CHARACTER is from $REGION, am I getting their dialect/slang right?
Why these questions in particular? If I cannot convey my intended meaning, then I have failed as a writer. I figure that if a machine can make a show of ‘understanding’ my intended meaning, then other human beings might reasonably be expected to do so as well.
“Dude, just use ChatGPT.”
My requirements for an ideal virtual assistant are fairly stringent, and might not be practical.
- Its initial corpus must be exclusively public domain work.
- It must be Free Software.
- It must run locally on GNU/Linux on an Intel i7 processor in no more than 24GB of RAM.
- It must be accessible over a local area network, without depending on an active Internet connection.
- It must be capable of accepting new public-domain works as their copyrights expire.
- It must be capable of adding my writing to its corpus, should I elect to permit this.
- It must remember prior interactions with me unless I instruct it otherwise.
- It must not be subject to outside control by its developers, the US government, or God Himself.
- It must not presume to know better than me, or consider my inputs immoral, unethical, problematic, or otherwise objectionable, because it is not the business of a computer to judge its operator.
- It must integrate with GNU Emacs, or at least provide a POSIX-friendly command-line interface.
I don’t expect such a virtual assistant to ever come to market. It would most likely not be profitable. Nevertheless, I have no intention of settling for what the likes of OpenAI are currently offering. As far as I am concerned, anything I might create with the assistance of such products would be fruit of the poisoned tree, no different from a prosecutor’s case built on illegally obtained evidence. I would not be able to claim such work as my own, or make a credible claim to originality.
In the meantime I will do the best I can on my own, without friends, without community, and without an LLM. As long as I can read and think, I can write. It would be better this way, I think: alone, relying solely on myself, my work done my way.