Phil vs. LLMs

Last year, my friend Ed made a post about the nascent proliferation of ChatGPT and its competitors into various search engines and other products. After a moment’s contemplation, I realized how spectacularly bad this could go if, for example, you searched for a chemical’s Material Safety Data Sheet (MSDS) and a Large Language Model (LLM) gave you back some bullshit advice to take in the event of hazmat exposure or fire.

NOTE: I refuse to use the term AI or even generative AI to describe LLMs. They are glorified versions of Dr. Sbaitso at best.

Your vanilla search for a normal MSDS will return several of varying quality, which you then read and glean information from. Because an MSDS is primary information, it is authoritative. LLM-generated instruction is secondary, theoretically derived from those primary sources, but prone to fabrication wherever the model doesn’t know enough or doesn’t recognize the presentation. The format of an MSDS has a regulatory mandate behind it, though that varies by jurisdiction. The varying quality of MSDSs usually comes from sins of omission, which ChatGPT abhors, not fabrication, which ChatGPT does as a feature. An MSDS may not tell you which respirator to use; ChatGPT will hallucinate a plausible-looking filter specification that is blatantly incorrect. So, an LLM answer to that same search will give you advice on how to work with that material that may be very, very wrong.

It’s nice that I have a new thing to add to safety training now: people should absolutely not follow any conversational LLM-generated advice unless they are actively seeking a Darwin Award. What happens when you turn this loose on budding makers starting to tinker in their garages, trying to figure things out, who then get handed some complete LLM garbage in their search results? Sure, they could already get human-generated garbage on forums and Reddit, but those may actually be more reliable now by comparison. I shared the mere concept of this with my favorite industrial hygienist. She said “I have enough nightmares already” and closed the Zoom on me.

As it was topical at the time, I fed “how to respond to a vinyl chloride fire” into ChatGPT, and it told responders to use a water fog on the water-reactive chemical. That would have changed a train derailment/hazmat spill/fire emergency into a detonation/mass casualty/hazmat emergency. A+ performance, ChatGPT, you would have obliterated a town. In fairness, enough water fixes most any firefighting problem, but at that point you’ve flooded what remains of a town that has been levelled by explosion and fire.

Human brains melt at the question of what to do with chemical incompatibles and water-reactive substances during a fire. An LLM has no concept of chemical incompatibility, just how to make an answer that is MSDS-shaped. Machine learning trains on the typical response, what to do 99% of the time, except sometimes the sound of approaching hoofbeats is not a horse; zebras are more common than you think, and they will kill you. However, an LLM trained to expect zebras is going to return garbage most of the time, because it has no way to know better; it is not thinking. The very first thing we teach students to do before an experiment is “check the literature”, and step one is almost always to hit up Google. I grimly await a lab blowing up due to LLM advice, courtesy of Google’s garbage automatically generated and promoted output.

I’m sure this seems a bit extreme, but I want you to think for a minute about something much more mundane, something that happens thousands upon thousands of times a day: a Poison Control call. Except people don’t use the phone much any more, do they? So they search for what to do when their kid has swallowed some sort of chemical. If they’re lucky, “CALL POISON CONTROL AT THIS NUMBER” will be the first result. If they’re not, and they get some prime health care advice from an automatically generated answer, lives are in the hands of an LLM. I can also absolutely foresee an LLM product being sold to emergency dispatch centers to generate fast answers for what to do while waiting for paramedics to arrive.

Anyway, this is what I think about as various companies hitch their wagons to LLMs for no goddamned good reason. Okay, I’m done ranting for now.