Should You Trust GPT-4? We Asked a Human.

Written by Synaptiq | Mar 31, 2023 3:18:56 AM

Large language models like ChatGPT and, more recently, GPT-4 have garnered viral attention. They perform tasks such as writing, coding, and summarizing information much faster than human workers, without needing rest or (much) compensation. Early adopters praise their ability to accelerate and automate work, but skeptics warn that there's more to these models than meets the eye. Would you trust them to do your work?

You already trust artificial intelligence.

We trust artificial intelligence to make choices for us. We trust AI to curate our social media feeds, rank our search results, and even decide how we get from place to place. Delegating these choices has become so easy, intuitive, and habitual that we seldom think to question the results. If you think you’re the exception, ask yourself:

How often do you take the route [insert favorite navigation app here] doesn’t recommend?
How often do you search for the answers [insert most used search engine here] doesn’t give you?
How often do you see the posts [insert procrastination-supporting social media app here] doesn’t show you?

Ironically, whether we trust AI to make choices for us is itself a choice. But we seldom make it consciously, let alone democratically. Political theorist Langdon Winner warns that technological change involves two choices: (1) whether a technology will be developed and (2) what, exactly, it will look like. That raises the question: Who gets to make these choices? The author of this blog post wasn’t invited to vote in the AI referendum. Were you?

Research suggests the average person can’t reliably recognize AI, much less give informed consent to it. In 2017, the software company Pegasystems asked six thousand consumers across six countries, “Have you ever interacted with Artificial Intelligence technology?” Based on the devices and services they reported using, 84 percent had actually interacted with AI. Only 34 percent answered, “Yes.”

Ruh-roh.

On the bright side, our trust in AI has yielded tangible benefits for people, animals, and the planet we call home. Conservationists have used AI to monitor endangered species, combat the illegal wildlife trade, and detect wildfires. Healthcare practitioners have used AI to anticipate public health emergencies (including COVID-19), speed up diagnosis, and develop life-saving drugs. AI processes data, and therefore makes decisions, faster than people can. Sometimes speed is a major factor in success, and faster choices are an advantage.

Other times, faster choices are seductively convenient. They’re as tempting as a microwave meal after a double shift or a second espresso on a slow morning. Convenience whispers in your ear: Why not skip the ‘blah’ parts of life?

Why not, indeed.

Trusting GPT-4 is a tradeoff: speed vs. accuracy

GPT-4 is the fourth and latest in a series of language models developed by OpenAI. Put simply, language models use a branch of AI called machine learning to predict which token (roughly, a word or piece of a word) is most likely to come next, given the user’s query and the text so far; a response is built by repeating that prediction one token at a time. GPT-4 is so adept at producing appropriate responses that they can be mistaken for human writing.
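To make “predicting the next token” a little more concrete, here is a toy sketch in Python. It is not how GPT-4 is actually implemented: the candidate tokens and their scores are made-up numbers, standing in for what a real model computes from billions of learned parameters. The sketch only shows the last step, turning raw scores into probabilities (with the standard softmax function) and picking the most likely candidate.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a model might assign to candidate next tokens
# after the prompt "The capital of France is". The numbers are invented.
scores = {" Paris": 6.0, " Lyon": 2.5, " London": 1.0, " banana": -3.0}

probs = softmax(scores)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")

# Greedy decoding: choose the single most likely next token.
next_token = max(probs, key=probs.get)
print("next token:", repr(next_token))
```

Nothing in this step checks whether the chosen words are true, only whether they are likely, which helps explain how a fluent answer can still be wrong.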

Put to the test by OpenAI, GPT-4 scored in the 80th percentile or higher on several standardized exams, including the SAT, GRE, LSAT, and even the Uniform Bar Exam. The one exception among these was the writing section of the GRE, where it placed in the 54th percentile. It can hold a conversation, interpret images, write text, and write code in multiple programming languages. The New York Times reports, “It’s close to telling jokes that are almost funny.”

These achievements are tempered by critical flaws. For one, GPT-4 hallucinates.

“GPT-4 has the tendency to 'hallucinate', or produce content that is nonsensical or untruthful,” cautions OpenAI. Hallucination (a.k.a. “making stuff up”) was also a problem for GPT-4’s predecessors. ChatGPT, for example, which was released several months before GPT-4, produces factually incorrect responses to user queries about 20 percent of the time, according to the developers of TruthChecker, a ChatGPT fact-checker.

GPT-4’s hallucinations are infrequent, but their delivery is eerily convincing.

Imagine you’re an office worker who hates writing emails. One day, you’re assigned an intern: “GPT-4.” Naturally, you put them to work writing your emails. You proofread the first ten emails. Finding no issues, you skim the next ten. Eventually, you get comfortable enough to send emails without checking them. Everything is great — until your boss calls you to her office. She’s livid about an email you sent (and GPT-4 wrote). You have no idea what it says.

Alternatively, imagine you’re a software developer for a social media platform. Your intern, GPT-4, debugs your code. One day, you push to production, and the platform crashes — catastrophe! You’re fired when your supervisor finds what caused the crash: a faulty “correction” by GPT-4, which you accepted without a second thought.

So, can you trust GPT-4? It depends. On the one hand, trusting GPT-4 to make choices for you is convenient, especially when speed is a major factor in success. For example, early adopters have used GPT-4 and its predecessors to decide how they work out, what they eat, and what to write in their emails and college applications.

On the other hand, trusting GPT-4 can be risky when accuracy (or something else) matters more than speed. Writing an email to your boss? Proofread it. Making choices with significant ramifications? Trust yourself.

THIS BLOG WAS WRITTEN WITHOUT LARGE LANGUAGE MODEL ASSISTANCE.

Photo by Kamil Pietrzak on Unsplash

About Synaptiq

Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation.

You can learn more about our story through our past projects, blog, or podcast.
