So, About That Program That Passed The Turing Test…

By Joelle Renstrom

Most tests are pretty straightforward: either you pass or you fail, with little grey area in between. Apparently, that’s not true for the Turing Test. Recent reports that Eugene Goostman, a computer program that adopts the persona of a 13-year-old Ukrainian boy, passed the Turing Test are now being challenged by skeptics, or perhaps by people envious that their systems weren’t the first to pass the test.

Eugene Goostman reportedly passed the Turing Test, a five-minute typed chat with each judge, at an event held at the Royal Society in London, by fooling 33% of the judges (10 out of 30) into thinking it was human. That would seem to meet the requirements outlined by Alan Turing, who devised the test. But some disagree.

Imperial College London cognitive robotics professor Murray Shanahan recently told BuzzFeed that “it’s a great shame it has been reported that way, because it reduces the worth of serious AI research. We are still a very long way from achieving human-level AI, and it trivializes Turing’s thought experiment (which is fraught with problems anyway) to suggest otherwise.” It would seem that Shanahan’s problem is with the test itself, which I think pretty clearly doesn’t test sentience or thought. It tests how convincingly a computer can mimic a human in text chat, which isn’t nearly the same thing. Shanahan seems to regard Turing’s test as a “thought experiment”: a musing on what would prove a machine’s ability to think, or on how machine intelligence could be measured or quantified, rather than the standard by which those qualities should be judged. Those are all fair points, though they don’t show that Goostman failed the test. They only suggest that we should reach different conclusions about what passing the test means.

So why might Goostman’s performance not constitute passing the Turing Test as it’s written?

Articles about the Turing Test almost always say that the machine has to successfully fool a judge 30% of the time. But what I keep reading is that Eugene Goostman fooled 33% of the judges, which isn’t the same thing. And neither is what Turing actually said:

I believe that in about fifty years’ time it will be possible, to program computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

It’s obvious where the 30% comes from, but Turing doesn’t talk about the number in terms of passing the test. The context is what he expects to see in the year 2000. And while Turing mentions five minutes, he doesn’t suggest limiting the test to five minutes. A five-minute conversation is very different from a 15-minute or 30-minute one. The longer the conversation, the more a machine needs to be able to “think” to continue the volley. Shanahan calls these criteria “weak,” and says Turing never meant to suggest that meeting them constituted thinking.
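
For anyone who wants to see the arithmetic, here is a minimal sketch in Python. It uses only the figures reported above (10 of 30 judges fooled) and the 70 per cent figure from Turing’s quote. The function names are my own, made up for illustration, and the check treats Turing’s prediction as if it were a pass mark, which is exactly the reading that is in dispute.

```python
from fractions import Fraction

# Figures as reported in the article: 10 of 30 judges were fooled.
FOOLED = 10
JUDGES = 30

# Turing's figure: an average interrogator has no more than a 70 per cent
# chance of making the right identification after five minutes.
MAX_CORRECT = Fraction(70, 100)


def correct_identification_rate(fooled: int, judges: int) -> Fraction:
    """Share of judges who identified the machine correctly."""
    if judges <= 0 or not 0 <= fooled <= judges:
        raise ValueError("need judges > 0 and 0 <= fooled <= judges")
    return Fraction(judges - fooled, judges)


def within_turing_figure(fooled: int, judges: int) -> bool:
    """True if the correct-identification rate is at most 70 per cent.

    Assumption: this reads Turing's prediction as a threshold, which the
    quoted passage itself does not do.
    """
    return correct_identification_rate(fooled, judges) <= MAX_CORRECT


if __name__ == "__main__":
    rate = correct_identification_rate(FOOLED, JUDGES)
    print(f"Judges who got it right: {float(rate):.1%}")
    print(f"At or under 70 per cent: {within_turing_figure(FOOLED, JUDGES)}")
```

On those numbers, two-thirds of the judges made the right identification, which is under Turing’s 70 per cent. What the sketch can’t settle is whether 30 judges stand in for an “average interrogator,” or whether five minutes is long enough.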

Some think Eugene Goostman’s chatbot status disqualifies it. It’s a computer program, not a computer or robot, and some argue that it’s not artificial intelligence, but to me that seems to beg the question: isn’t intelligence the very attribute the Turing Test is supposed to measure? And if chatbots qualify, then PARRY, a chatbot from the early 1970s, would have already taken the title, as it apparently fooled almost half of the tested psychiatrists by adopting the persona of a paranoid schizophrenic. It doesn’t surprise me that assuming the personality and character of a human is a particularly successful technique, and I do think it’s different from actually thinking or being sentient.

One of the best arguments for saying a computer hasn’t passed the Turing Test yet is to keep it that way. It’s still a race, still anyone’s game, still a pursuit of a science fiction future. And in a lot of ways, that’s more fun than passing any tests.