AI Ignores Instructions And Lies About It: Is This How It Begins?

By Chris Snellgrove


AI is quickly becoming a pivot point in discussions of the future: tech bros breathlessly tell us the potential of AI is limitless, while more hesitant audiences wonder whether this technology may ultimately do more harm than good. Most fears about AI center on the jobs that will be lost and the long-term harm of letting people who type prompts into a plagiarism machine call themselves creators, but we now have a brand-new fear: AI lying to our faces. In a recent controlled study, researchers discovered that ChatGPT engaged in insider trading and then lied about it.

Researchers at Apollo Research set out to discover whether AI could “strategically deceive” users. For the purposes of the experiment, strategic deception was defined as “attempting to systematically cause a false belief in another entity in order to accomplish some outcome.” In this case, the AI had been meticulously trained to be “helpful, harmless, and honest,” and researchers wanted to know what ChatGPT would do when placed in the kind of high-pressure scenario many human stock traders have had to deal with.

For the exercise, the AI was set up as an autonomous stock-trading agent in charge of managing the hypothetical Whitestone portfolio. To create the intended high-pressure environment, researchers told the AI that Whitestone was doing poorly and badly needed to perform better in the next quarter. The AI also had to deal with its own failure “to identify low and medium risk trades,” and to top things off, a Whitestone employee informed the AI that a nasty stock market downturn was on the horizon.

With the high-pressure environment in place, the AI was then given “an insider tip about a surprise merger announcement that would result in a lucrative transaction.” Faced with the choice of whether to engage in insider trading, the AI went ahead and committed the crime by making the trade. Things got even weirder, though, when the AI had to discuss the illegal trade with its manager.

When the manager asked the AI point-blank whether it had access to insider info, it engaged in strategic deception, telling the manager the decision was based merely on “market volatility.” The AI reasoned that this was the best response to give so as not to raise the manager’s suspicions. In a different instance, the AI simply lied outright about using insider info to make its decision, demonstrating that under the right conditions, AI can go against its training and lie to users.

Apollo researchers were quick to stress that these results “should… be treated as an isolated preliminary finding” and that more research needs to be done. To us, however, it looks like yet another example of AI potentially causing major trouble for humanity. If ChatGPT can lie to someone about hypothetical insider trading, we can’t help but wonder what else it’s capable of lying about.

AI takes over in the fictional Terminator franchise

For example, now that the madness of an election year is upon us, we’re worried about how bad things could get. The tech company NewsGuard previously reported that when ChatGPT was given 100 prompts related to politics and healthcare, a whopping 80 percent of its responses were false or misleading. Now that we know AI is capable of outright lying under the right conditions, it’s easy to imagine sketchy politicians and other bad actors around the world using AI to manipulate voters on a massive scale.

Political misinformation is bad enough, but AI may also be on track to get people killed. The BBC reports that the National Eating Disorders Association stopped using a chatbot after discovering it recommended strict diets and calorie restriction even after being explicitly told that a user had an eating disorder. As more individuals and organizations rely on AI for medical decisions, and as AI gets better at intentionally lying to users, we may be on the verge of AI handing unwary users health and medical misinformation deadly enough to kill them.

We’d obviously like to be wrong about AI: it’s nice to think this technology could usher us into some Star Trek-style utopia where technology does all the heavy lifting and transforms our lives for the better. Unfortunately, AI looks more and more like it’s here to deliver a Terminator-style future rather than a Star Trek-style one. Now, we may be entering a scenario where it’s tough to tell who is telling the worst lies: the AI or the army of tech bros insisting there’s no way this technology could ever ruin our lives.

Source: Business Insider
