Opinion: AI appropriated my books. Someone will profit, but it won’t be me

Tens of thousands of books have been pirated and fed into an AI training data set called Books3.
(Godong / Universal Images Group / Getty Images)

Groucho Marx said he didn’t want to belong to any club that would have him as a member. I wonder how he’d feel about being one of the authors whose books have been pirated by, among others, Meta, which has fed a huge book database into LLaMA, its entry in the artificial intelligence arms race. After all, Groucho is a member, drafted from beyond the grave, with two of his books among the tens of thousands on the list.

Nobody asked the other authors if they wanted to belong to the exploited-writers club either, undoubtedly assuming that the answer would be no.

I can just hear the pitch: Hi. You’ve spent your life telling stories. Now we’d like to take all that hard work and spoon-feed it to our AI systems — despite your copyright protections — so it can nourish programs that, yes, threaten to put you out of business and make the AI crowd lots of money while you earn not a cent.

Sounds great, right? No? OK, never mind. We’ll just sign you up anyhow and not tell you.

Lawsuits have been filed, which sadly brings to mind the cliché about shutting the barn door after the horse has escaped. In this case we’re talking about a stampede: Books3, a data set that feeds LLaMA as well as some of its competitors, obviously and accurately suggests the existence of Books1 and Books2, and they are not the only data sets gobbling up books on behalf of AI companies.

I suppose authors who sue could get a financial settlement down the line, and perhaps even have their works removed from the data set, but really, who knows if it’s even possible to unteach AI — to get it to forget the lessons learned from the works of Sarah Silverman, or Zadie Smith, or Jonathan Franzen, to mention a few of the plaintiffs.

Worse, or at least wildly ironic, Books3 is the brainchild of an independent programmer who claims the little-guy high ground and hopes to extend access to the data he has put together beyond the big-guy users such as OpenAI and Meta. Rob from the comparatively poor writer and give to the aspiring rich tech entrepreneur — an interesting reverse twist on Robin Hood.

Like Groucho, I’m an unwitting member of the Books3 club. A reporter for the Atlantic has fashioned a search tool that combs the data set, and three of my books — one work of nonfiction, one novel, one cookbook — have been appropriated. When I found out, just for a retro instant, I felt the pathetic if familiar little thrill that comes with being on a list — a bestseller list, a must-read list, a 10 books for September list, a holiday gift list.

Once upon a time, being listed meant someone had read your book and appreciated it. For all I know, Books3 “chose” the cookbook I wrote merely because it included the words “teaspoon,” “bird pepper” and “hake.” It’s not like someone fell in love with the recipe for pushcart chicken.

In the new world, I am no longer a teller of tales. I am reduced to a data source for words, phrases and sentence construction. AI is dining out on the six unpaid months I spent on a proposal and permissions, on the word debates that cause me to keep a pad and pencil next to the bed for insomniac inspiration, on the long, trial-and-error project that is a book’s structure. And that sorry tale repeats for every author on the list.

Not that I want to romanticize writing. We’re talking about equal opportunity exploitation here. Take whatever it is you do for a living, imagine that someone stole the fruits of your labor, and see how bad you feel. And don’t assume they won’t come for you: Over the last month, I’ve heard people of all ages, in a variety of fields, predict that their jobs won’t look the same in five years, if they have jobs at all once AI really gets going.

I plugged name after name into the Atlantic’s search tool. AI has consumed Philip Roth in English and Spanish, as well as 33 works by Margaret Atwood. I can’t wait to see how that collision works out in the AI hive mind. Nobelists Toni Morrison and Kazuo Ishiguro are there, too. Think of an author and check them out; they’re probably going to be on the list.

Somehow I never expected to have any overlap with Groucho, but here we are, not wanting to belong to the Books3 club but unable to find the exit. We can’t just send a letter of resignation and back out, as he did with the Friars Club.

I like to think there’s a universe where AI could work for us rather than against us, but if this setup is any indication, I’m headed for disappointment. The club managers seem to see its members as nothing more than a data stream with a pulse. They signed us up on the sly, rifled the contents of our clubhouse lockers, and now they stand ready to make money pretending they can replicate what we do.

Karen Stabiner’s most recent book is “Generation Chef,” which is part of the Books3 data set.