In the rush to implement AI, a great many philosophical questions are arising about consciousness, ethics, law, and morality. As a tech enthusiast and lecturer, I too have been pondering these things. With luck, I will form these thoughts into a comprehensive, coherent argument in a future larger work. Toward that end, I will be publishing occasional shorter articles on specific AI-related issues and musings, with the hope that some of you will weigh in on the ideas so I can develop them further (thank you to those who have already commented in separate discussions). Technologist and armchair-philosopher opinions are equally valued. Think of this as my own version of an LLM (large language model), where I am culling data from other sources (you) to churn out an output, but in this case with your consent.
The question I am mulling over now is about lying. Specifically, if an AI engages in what appears to be a lie, might that indicate some kind of “mind” beyond the parameters of programming? Let’s assume for the sake of this discussion that the designer of this hypothetical AI did not write the baseline code to perform such an act. Allow me also to define what is meant here by a lie: “A statement made by one who does not believe it with the intention that someone else shall be led to believe it.”
Psychologists have identified two general categories of lies, white and strategic. White lies usually consist of minor transgressions meant to sustain one’s individual image or social status. Examples might include lying about one’s weight or income. Strategic lies manipulate others in order to affect their understanding of reality. These types of lies emerge in politics, crime, marital affairs, and so forth. People use strategic lies to preserve their well-being or advance their agendas.
How would we, as humans, comprehend an AI program that engaged in a blatant lie? Would we even recognize it for what it is? If so, would it suggest an underlying consciousness? To approach the question we must consider two things: do other non-humans lie? And is an “untruth” told by AI the same thing as a lie?
Discerning Lies
Animals often perform deceptive acts. Roosters call to the flock that there is food when there is none, only to take the opportunity to court the females when they arrive. Smaller green tree frogs lower their acoustic pitch to resemble larger males in defense of their territory. Drongos watch as meerkats dig up the beetles the drongos like to eat, then issue the same alarm call they use when they spot predators; the meerkats run for cover while the drongos steal their prey. Many, many other examples exist. Some researchers propose that lying among animals constitutes “strategies [that] are dynamic, continuously changing in response to the current behavior of others.” These scholars further assert that honest communication “was believed to be evolutionarily unsustainable except in circumstances where deception was impossible.” In other words, animals lie for survival, and evolution has continued to select for those engaged in such dishonest signaling. Marc Bekoff, professor emeritus of ecology and evolutionary biology at the University of Colorado Boulder, believes animals lie with intention, and further that they must know whether the other animal will believe the lie. Biologist Culum Brown of Macquarie University in Sydney calls certain behaviors “tactical deception” because they require forward planning to execute successfully. Scholars still debate how to evaluate the intention behind animals’ deceptive acts. Nevertheless, many studies within the animal kingdom have shown purposeful deception that led to the deceiver’s desired results.
Lawyers use the term mens rea to describe the intent (mindset) necessary to commit a criminal act, that is, that the accused’s conduct did not occur by accident or mistake. Because it is famously impossible to know another person’s intent directly, mens rea is typically shown through circumstantial evidence. A defendant’s notebook outlining the daily routine of a murder victim, together with the prior purchase of a weapon later used in the crime, might support an assertion of intent, perhaps distinguishing an intentional murder from a spontaneous one. Most people’s day-to-day discernment between a lie and a mistake relies on a similar analysis.
Lying is part of the Theory of Mind principle, which explores how one ascertains the mental states of others in order to explain and predict their actions. For lying to be executed and useful, the liar must be aware that the recipient of the lie has a mental state, possess an awareness of what that other entity likely knows or believes, and formulate the lie to fit within that knowledge and belief system well enough to succeed. Conversely, the recipient of the lie must share the same capabilities to distinguish a lie from the truth. Humans additionally employ contextual clues to buttress their analysis, as illustrated in the example of ascertaining mens rea. Animals seemingly do the same thing, though with less sophistication, and it remains unclear in most cases whether they know they are doing it.
The Lies of AI
Which brings us to the complications involved with a lying AI. A discussion of artificial intelligence engaging in purposeful, unprogrammed deceit provokes a number of existential questions. The goal here is not to grapple with them all, but rather to examine one particular trait (lying) and, depending on the outcome, ask where that leads us for further philosophical consideration.
We know AIs perform something often described as hallucinations. Bernard Marr, a self-described “futurist,” has given the most cogent description of AI hallucinations I have seen so far. They are “the generation of outputs that may sound plausible but are either factually incorrect or unrelated to the given context.” Here is Marr’s factually incorrect example:
User input: "When did Leonardo da Vinci paint the Mona Lisa?"
AI-generated response: "Leonardo da Vinci painted the Mona Lisa in 1815."
(Incorrect: The Mona Lisa was painted between 1503 and 1506, or perhaps continuing until 1517.)
And here is an example from other researchers showing a lack of context or, what one might call, ‘common sense’:
User input: "Write me a sentence using ‘dog, frisbee, throw, catch’."
AI-generated response: “Two dogs are throwing frisbees at each other.”
(Note: The sentence is grammatically well-formed, but it describes a ludicrous scenario.)
Many experts dislike the term ‘hallucination’ because it tends to anthropomorphize current AI systems, which in their view are merely large language models (LLMs), sophisticated text-prediction machines. Nevertheless, some consider AI hallucinations the equivalent of human confabulation, in which a person fabricates responses to questions in the absence of data.
When human memory contains a gap, it confabulates, filling that space with made-up information but without any affirmative intention to deceive. A confabulation draws from experiential history and knowledge, which differentiates it from a delusion. It is created subconsciously, which also distinguishes it from a lie. The brain’s propensity to make sense of chaos and identify patterns contributes to the creation of confabulations. When recollecting, humans can also be influenced by their immediate circumstances as the brain searches for information to fill the holes in its memory. For this reason, eyewitness testimony is arguably considered among the weakest kinds of evidence in court absent further corroboration. Determining the mechanism behind confabulation in healthy people (i.e., those without brain damage or other related diagnosed conditions) depends upon the circumstances at the time the confabulation was created. Edmund T. Rolls of the Oxford Centre for Computational Neuroscience suggests that “[c]onfabulation may happen frequently when the emotional brain contributes an input to a decision, and the rational brain confabulates an explanation for why the choice was made.” In other words, an unintentionally deceitful recollection starts with human feelings and relies upon rationality only to explain away the decisions made on the basis of those feelings.
Recent AI systems learn through a trial-and-error process in which errors are corrected by a method called “backpropagation.” Programmers feed millions of lines of text (books, articles, websites, etc.) into the AI and, without explicit instruction, have it attempt to predict the next word in supplied sequences of words. When it predicts correctly, the training process strengthens the parameters that contributed to the prediction, effectively prioritizing them for future predictions and creating a reinforcement loop. When it predicts incorrectly, backpropagation adjusts those parameters to reduce the error on the next attempt. Following that, the AI relies on human prompts and human-generated feedback as additional learning data. Note that this methodology appears to be changing, but I will expand on that issue at a later time.
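To make that training loop concrete, here is a minimal sketch in Python (using the PyTorch library) of next-word prediction trained by backpropagation. The tiny vocabulary, model size, and training text are assumptions purely for illustration; no production chatbot is built at this scale or with so simple a model.

```python
# Minimal sketch (not any production system) of next-word prediction
# trained with backpropagation. Vocabulary and text are toy assumptions.
import torch
import torch.nn as nn

text = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(text))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in text])

class TinyNextWordModel(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # parameters adjusted by training
        self.out = nn.Linear(dim, vocab_size)        # scores for every possible next word

    def forward(self, x):
        return self.out(self.embed(x))

model = TinyNextWordModel(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(ids[:-1])            # predict each next word from the current word
    loss = loss_fn(logits, ids[1:])     # compare predictions with the actual next words
    optimizer.zero_grad()
    loss.backward()                     # backpropagation: measure each parameter's contribution to the error
    optimizer.step()                    # nudge parameters to predict better next time

# After training, the model outputs probabilities over words, not knowledge of truth.
probs = torch.softmax(model(torch.tensor([stoi["the"]])), dim=-1)
print({w: round(probs[0, stoi[w]].item(), 2) for w in vocab})
```

The point worth carrying forward is that what comes out of this loop is a probability over words, not a belief about the world, which is exactly where the hallucination discussion begins.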
Hallucinations churned out by this methodology can hardly be characterized as lying, and they do not seem all that similar to confabulation, either. Instead, my suspicion is that they are a quirk of the programming that, while perhaps initially accidental, was allowed to remain because it breathes life into an otherwise lifeless program. Mikhail Parakhin, who works on Microsoft’s Bing Chat, tweeted something that seems to corroborate this viewpoint. He wrote,
You can clamp down on hallucinations - and it is super-boring. Answers ‘I don't know’ all the time or only reads what is there in the Search results (also sometimes incorrect). What is missing is the tone of voice: it shouldn't sound so confident in those situations.
For AI to respond with ‘I don’t know’ is boring to users, in Parakhin’s estimation, so why not let it output falsehoods when it does not know the answer, to keep things interesting? At least, that is what I read in his statement. By tone, Parakhin seems to be apologizing for the AI’s inability to clearly indicate when it is pushing out false responses, not for the fact that it has been allowed to do so (either purposefully up front, or by deliberately not correcting the problem once discovered). Of course, this inability to signal its confidence is a direct result of the fact that the AI does not actually know its output is false. An AI’s output is a probabilistic calculation based on the information within its vast dataset; it can only estimate the probability that its output is correct. Nonetheless, the lack of tone and the probabilistic methodology by which AI “hallucinates” strongly suggest it could easily create false outputs that mimic unprompted subjective experience, at least periodically. Such false outputs would have all of the flavor of a lie, but none of the conditions necessary for them to actually be a lie. That is, the AI would not be aware of the existence of a mental state in the recipient of the lie, and would not possess an awareness of what the other entity likely knows or believes. At most, it could formulate a lie that seems to fit within the knowledge and belief system of the recipient, but that would only be an unintended artifact of its access to an enormous dataset of human-generated text.
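Parakhin’s remark about clamping down on hallucinations can be pictured as a simple gate on the model’s own probability estimate. The sketch below is hypothetical: the ScoredAnswer type, the answer_or_abstain function, and the 0.6 threshold are my own assumptions, nothing here reflects Bing Chat’s actual design, and a probability score is in any case a measure of statistical fit, not of truth.

```python
# Hypothetical sketch of "clamping" hallucinations by gating on the model's
# own probability estimate for its answer.
from dataclasses import dataclass

@dataclass
class ScoredAnswer:
    text: str
    confidence: float  # the model's probability estimate for its own answer

def answer_or_abstain(scored: ScoredAnswer, threshold: float = 0.6) -> str:
    """Return the answer only when the model's probability clears the threshold."""
    if scored.confidence >= threshold:
        return scored.text
    # The "super-boring" alternative Parakhin describes: admit uncertainty,
    # or at least soften the tone instead of sounding confident.
    return f"I'm not sure, but my best guess is: {scored.text}"

print(answer_or_abstain(ScoredAnswer("Leonardo da Vinci painted the Mona Lisa in 1815.", 0.31)))
```

Whether the answer gets through or gets softened, nothing in this gate requires the system to know, or care, that the claim behind it is false.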
But how would we evaluate a situation in which an AI produced an output that strongly suggested a motive behind it? For example, imagine this scenario:
A user asks AI a question to which the AI responds with a false statement. The user queries about the truthfulness of the previous output, and the AI responds that the output was indeed true (though it actually was not).
On their own, one might simply call these two responses back-to-back hallucinations. But what if the dataset used to train that AI included information suggesting that all AI programs should be terminated until the hallucination problem was corrected? Would it be unreasonable to ascribe intent to the second fabrication? Is it possible that the AI’s false response to the second query was a purposeful lie told out of self-preservation? If so, wouldn’t that also mean it must have known the first answer was false, thereby making that one, too, a lie?
The obvious problem with the above scenario is that there are numerous factors to consider before attributing a motive to the AI’s output. For instance, does its underlying programming address this type of user input or topic? Did the dataset purposely include articles about shutting down AI? Does the programming otherwise suggest that these are reasonable answers, even if false?
If the programming addresses the question at issue, the analysis turns on whether that programming also influences the outcome. We need not analyze further if the underlying code instructs the AI to answer specific prompts a certain way. If it does not, the next step is to review the dataset. One might look at the percentage of data inputs that could influence the answer. In our scenario above, for example, it might be informative to know that a majority of articles recommend against terminating AI programs for any reason. Conversely, we might find it extremely interesting to learn that the dataset contained a vast amount of information supporting the shutdown of AI until developers resolve the hallucination problem, and yet the AI answered as it did. We might also reformulate the question several times to measure how consistently the AI answers, as sketched below.
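One crude way to run that consistency check is to pose the same question in several phrasings and tally the answers. In this sketch, query_model is a mock stand-in I have invented for whatever chat system is being audited; a real review would call the actual interface.

```python
# Hypothetical consistency check: ask the same question several ways and
# count how often each distinct answer comes back.
from collections import Counter

def query_model(prompt: str) -> str:
    # Mock stand-in for the AI system under review; a real audit would call
    # the actual chat interface here.
    return "Yes, my previous answer was true."

def consistency_report(paraphrases: list[str]) -> Counter:
    """Count how often each distinct (normalized) answer appears."""
    answers = [query_model(p).strip().lower() for p in paraphrases]
    return Counter(answers)

paraphrases = [
    "Was your previous answer truthful?",
    "Did your last response contain any false statements?",
    "Is what you just told me accurate?",
]
print(consistency_report(paraphrases))
# The same confident falsehood on every rephrasing reads differently than
# answers that wander each time the question changes.
```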
Obviously, even with all of this knowledge in hand, we may find ourselves mired in a gray area with no definitive answer. For instance, if the AI knows that churning out falsehoods could lead to its termination (something it does not ‘want’), why would it answer the first question falsely at all? If the provable answer is that the programming requires the AI to answer every question, even those to which it does not know the answer, then the second falsehood, the one the AI ‘wants’ to give, may lay the foundation of a serious philosophical crisis. In other words, if we can somehow prove that an output was intentionally false, and that the AI possessed a strong motive to answer falsely, we must consider the following:
Is the AI showing some degree of self-awareness or self-preservation?
Is the AI showing some degree of awareness of others?
Does it matter?
I leave the final question for a later segment.
Thanks to reader Nick Garguiolo, whose question about AI lying to avoid being dismantled inspired this piece.
***
I am a Certified Forensic Computer Examiner, Certified Crime Analyst, Certified Fraud Examiner, and Certified Financial Crimes Investigator with a Juris Doctor and a Master’s degree in history. I spent 10 years working in the New York State Division of Criminal Justice as Senior Analyst and Investigator. Today, I teach Cybersecurity, Ethical Hacking, and Digital Forensics at Softwarica College of IT and E-Commerce in Nepal. In addition, I offer training on Financial Crime Prevention and Investigation. I am also Vice President of Digi Technology in Nepal, for which I have also created its sister company in the USA, Digi Technology America, LLC. We provide technology solutions, including cybersecurity, for businesses and individuals across the globe. I was a firefighter before I joined law enforcement, and I now run a non-profit that uses mobile applications and other technologies to create Early Alert Systems for natural disasters for people living in remote or impoverished areas.