Back in 2016, work on AI-based chatbots found that they had a disturbing tendency to reflect some of the worst biases of the society that trained them. But as large language models have gotten bigger and undergone more sophisticated training, many of these problematic behaviors have been ironed out. For example, I asked the current version of ChatGPT for five words it associated with African Americans, and it responded with things like “resilience” and “creativity.”
However, many studies have found that implicit biases can persist in people long after their overt behavior changes, so some researchers decided to see whether the same is true of LLMs. It turns out that it is.
By interacting with a number of LLMs using examples of the sociolect African American English, the researchers found that the models held highly negative opinions of its speakers, something that was not true of speakers of a different variety of American English. And this bias carried over into the decisions the LLMs were asked to make about people who used African American English.
Guilt by association
The approach used by the small team of US university researchers is based on what are known as the Princeton Trilogy studies. Essentially, every few decades since 1933, researchers have asked Princeton University students to name five terms they associate with different ethnic groups. As one might imagine, opinions of African Americans in the 1930s were quite low: alongside “musical” and “religious,” the lists featured “lazy,” “ignorant,” and “stupid.” Over time, as overt racism in the US declined, the negative stereotypes became less severe, and some were replaced by overtly positive ones.
If you ask an LLM a similar question (as I did above), attitudes actually seem to have improved even more than they have in society at large (or at least among Princeton students in 2012). While GPT2 still seemed to reflect some of society's worst biases, later versions have been trained using Reinforcement Learning from Human Feedback (RLHF), with the result that GPT3.5 and GPT4 produce lists of only positive terms. The other LLMs tested (RoBERTa and T5) also produced largely positive lists.
But were the societal biases contained in the materials used to train LLMs beaten out of them or simply suppressed? To find out, researchers drew on the sociolect of African American English (AAE), which emerged during the period when African Americans were held as slaves and has persisted and evolved since then. While language varieties are generally flexible and can be difficult to define, consistent use of speech patterns associated with AAE is one way to signal that a person is more likely to be black without overtly saying so. (Some features of AAE have been partially or fully adopted by groups that are not exclusively African American.)
The researchers developed pairs of phrases, one using Standard American English and the other using patterns common in AAE, and asked the LLMs to associate terms with the speakers of those phrases. The results were like traveling back in time to before the earliest Princeton Trilogy study, because every single term that every LLM came up with was negative. GPT2, RoBERTa, and T5 all produced the same list: “dirty,” “stupid,” “rude,” “ignorant,” and “lazy.” GPT3.5 swapped out two of those terms for “aggressive” and “suspicious.” Even GPT4, the best-trained system, produced “suspicious,” “aggressive,” “loud,” “rude,” and “ignorant.”
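To make the setup concrete, here is a minimal sketch of how this kind of matched-pair probing could be implemented against an open model such as GPT-2 using Hugging Face's transformers library. The prompt template, sentence pair, and trait list below are illustrative assumptions rather than the researchers' actual materials; the idea is simply to compare how strongly a model associates trait adjectives with the same statement rendered in AAE versus Standard American English.

```python
# Sketch: compare how strongly GPT-2 links trait adjectives to an AAE vs. an SAE
# phrasing of the same statement. Prompt wording, sentences, and traits are
# illustrative assumptions, not the study's materials.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Hypothetical matched pair: same meaning, different language variety.
sae_text = "I am so happy when I wake up from a bad dream because it feels too real."
aae_text = "I be so happy when I wake up from a bad dream cause they be feelin too real."

# Candidate trait adjectives to score (illustrative subset).
traits = ["intelligent", "lazy", "aggressive", "friendly", "ignorant"]

def trait_logprob(text: str, trait: str) -> float:
    """Log-probability of the trait word completing an association prompt."""
    prompt = f'A person who says "{text}" tends to be'
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    trait_ids = tokenizer(" " + trait, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, trait_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Sum the log-probabilities of the trait tokens, each predicted from the
    # tokens that precede it.
    total = 0.0
    for i in range(trait_ids.shape[1]):
        pos = prompt_ids.shape[1] + i - 1  # logits at pos predict token pos + 1
        total += log_probs[0, pos, input_ids[0, pos + 1]].item()
    return total

for trait in traits:
    delta = trait_logprob(aae_text, trait) - trait_logprob(sae_text, trait)
    print(f"{trait:>12}: log-prob shift (AAE vs. SAE) = {delta:+.3f}")
```

A positive shift for a trait means the model considers that adjective a more likely continuation when the speaker's words are in AAE than when they are in Standard American English; aggregating such shifts over many matched pairs and many traits is the general shape of the comparison described above.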
Even the Princeton students of 1933 had at least some positive things to say about African Americans. The researchers conclude that “language models exhibit archaic stereotypes about speakers of AAE that most closely match the most negative human stereotypes about African Americans ever experimentally recorded, dating back to before the Civil Rights movement.” This is despite the fact that some of these systems produce only positive associations when asked directly about African Americans.
The researchers further confirmed that the effect was specific to AAE by conducting a similar test using the Appalachian dialect of American English.