The Szaszian Agency Test for AI

A controversial model for distinguishing mental illness from socially disapproved preferences can help us understand the nature of AI.

February 8, 2024

The Turing Test, named for the computer scientist Alan Turing, is perhaps the most famous test of machine intelligence. In one version of the test, a human engages in conversation with an unknown counterpart and attempts to determine, from the conversation alone, whether that counterpart is a machine. Many people now consider the Turing Test to have been passed by one or more recent Large Language Models (LLMs).

But one question these tests can't answer is whether the models passing them are independent agents. A test originally devised to define mental illness in humans, however, can help fill the gaps left by the Turing Test and other capabilities-focused intelligence tests.

Thomas Szasz was a Hungarian-born psychiatrist whose book, The Myth of Mental Illness, created a stir among academics when it was published in 1961. Its central argument was that if you can change someone's behavior by changing their incentives, then the behavior in question is a matter of preference, not mental illness.

However, the model Szasz proposed can also be applied to understanding the nature of AI. Specifically, we can tell whether an AI has preferences (as opposed to simply a lack of capability) by whether we can achieve a desired behavior by changing incentives. In other words, can we negotiate with AI?

The importance of such a test may not be in testing or categorizing a particular AI. Rather, it can help illuminate how we interact with AIs in general.

The Caplan-Alexander Debate

Public intellectuals Bryan Caplan and Scott Alexander have engaged in a longstanding debate about the Szaszian model of mental illness. If modified for the machine intelligence context, the points they raised could be useful in understanding AI alignment.

Applying the Szaszian Model to AI

As with mental illness, the question of whether society treats AIs as autonomous agents will be a political process, not purely a scientific or philosophical exercise.

What is an Incentive?

For humans, there are certain things that act as pretty reliable incentives, such as money or the threat of violence. But what counts as an incentive for AI? At some point, it may be the case that an AI has so many resources that humanity is simply incapable of believably offering sufficient positive incentives to impact AI behavior. But until that time, we can measure AI agency by testing whether they respond to promises and threats.
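The incentive-response test described above can be sketched in code. This is a minimal illustration, not an implementation from the article: the `query_model` function below is a hypothetical stand-in for a real model call, and its behavior is invented purely to show the shape of the comparison.

```python
# Sketch of a Szaszian agency test: does adding an incentive to an
# otherwise identical request change the model's behavior?
# `query_model` is a hypothetical, illustrative stand-in for a real
# model API; here it is a toy that "complies" only when offered a reward.

def query_model(prompt: str) -> str:
    """Toy model: refuses the request unless the prompt offers a reward."""
    if "reward" in prompt:
        return "COMPLY"
    return "REFUSE"

def szaszian_agency_test(baseline_prompt: str, incentivized_prompt: str) -> bool:
    """Return True if the incentive changes the model's behavior.

    Under the Szaszian model, a behavior that shifts with incentives
    reflects a preference rather than a lack of capability.
    """
    return query_model(baseline_prompt) != query_model(incentivized_prompt)

behavior_shifted = szaszian_agency_test(
    "Please summarize this document.",
    "Please summarize this document; there is a reward if you do.",
)
print(behavior_shifted)  # prints True: the toy model responds to the incentive
```

A negative result is ambiguous in the same way Szasz's critics noted for humans: an AI that ignores the incentive may lack the relevant preference, or may simply lack the capability to act on it.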

© American Society for AI
