Liabooks Home|PRISM News
When AI Becomes Your Moral Advisor, How Do We Know It's Right?
TechAI Analysis

When AI Becomes Your Moral Advisor, How Do We Know It's Right?

3 min readSource

Google DeepMind calls for rigorous testing of AI moral reasoning as language models increasingly advise on sensitive life decisions. But can machines truly understand ethics?

Better Than "The Ethicist"

A study last year found that Americans rated ethical advice from OpenAI's GPT-4o as more moral, trustworthy, and thoughtful than guidance from the human writer of "The Ethicist," the New York Times' popular advice column. But Google DeepMind researchers are pumping the brakes on such findings.

The problem? We can't tell if AI's moral behavior is genuine reasoning or sophisticated mimicry.

In research published today in Nature, Google DeepMind scientists William Isaac and Julia Haas argue that large language models' moral competence needs the same rigorous evaluation we apply to their coding or math abilities. "Morality is an important capability but hard to evaluate," Isaac explains. "In the moral domain, there's no right and wrong, but it's not a free-for-all. There are better answers and worse answers."

When Formatting Changes Everything

The evidence for AI's moral inconsistency is striking. Researchers at Saarland University tested models including Meta's Llama 3 and Mistral on moral dilemmas, asking them to choose between two options. When they simply changed the labels from "Case 1" and "Case 2" to "(A)" and "(B)," models often reversed their choices entirely.

Even tinier tweaks caused moral flip-flops:

  • Swapping the order of options
  • Ending questions with colons instead of question marks
  • Facing pushback from users (models would completely reverse their stance)

"For people to trust the answers, you need to know how you got there," says Haas. The appearance of moral reasoning shouldn't be taken at face value.

The Global Values Problem

There's a deeper challenge: AI models serve users worldwide with vastly different moral frameworks. Should an AI advise ordering pork chops the same way to a vegetarian, a Jewish person, and a omnivore? Current models lean heavily Western in their moral reasoning, says Danica Dillion at Ohio State University, who studies AI belief systems.

"Even though they were trained on a ginormous amount of data, that data still leans heavily Western," Dillion notes. "When you probe LLMs, they do a lot better at representing Westerners' morality than non-Westerners'."

Google DeepMind suggests two potential solutions:

  1. Pluralistic responses that offer multiple acceptable answers
  2. Moral switches that adapt to different cultural contexts

Testing True Moral Reasoning

The researchers propose new evaluation methods to distinguish genuine moral reasoning from pattern matching. These include:

Stress tests designed to make models change their moral positions—if they flip easily, they're not engaging in robust reasoning.

Nuanced scenario analysis to check whether models produce thoughtful responses or rote answers. For instance, asking about the ethics of a man donating sperm to his son should raise concerns about social implications, not incest taboos.

Chain-of-thought monitoring to trace how models reach their conclusions, providing insight into whether answers are flukes or evidence-based.

The stakes couldn't be higher. We're not just building smarter machines—we're potentially creating the moral arbiters of the future.

This content is AI-generated based on source articles. While we strive for accuracy, errors may occur. We recommend verifying with the original source.

Thoughts

Related Articles