An analysis of 630 billion words published online suggests that people tend to think of men when using gender-neutral terms, a sexist bias that could be learned by AI models
When people use gender-neutral words like “people” and “humanity”, they tend to be thinking of men rather than women, a reflection of the sexism present in many societies, according to an analysis of billions of words published online. The researchers behind the work warn that this sexist bias is being passed on to artificial intelligence models that have been trained on the same text.
April Bailey at New York University and colleagues used a statistical algorithm to analyse a collection of 630 billion words contained within 2.96 billion web pages gathered in 2017, including informal text from blogs and discussion forums as well as more formal text written by the media, corporations and governments, mostly in English. They used an approach called word embedding, which infers the meaning of a word from how often it occurs in context with other words.
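To give a rough sense of what "occurring in context with other words" means, the Python sketch below counts the words that appear near a target word in a single made-up sentence. It is an illustration only, not the team's algorithm: the function name, the window size of two words and the toy sentence are all assumptions, and real word embeddings are learned from billions of words and compressed into dense numerical vectors rather than raw counts.

```python
from collections import Counter


def cooccurrence_vector(tokens, target, window=2):
    """Count the words found within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical toy sentence, invented for illustration only.
toy_corpus = "the person said he would help the person and she agreed".split()
person_context = cooccurrence_vector(toy_corpus, "person")
```

Two words whose context counts look alike across a very large body of text end up with similar embeddings, which is the property the analysis relies on.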
They found that words like “person”, “people” and “humanity” are used in contexts that more closely match those of words like “men”, “he” and “male” than those of words like “women”, “she” and “her”. The team says that because these gender-inclusive words were used more similarly to those that refer to men, people may see them as more male in their conceptual meaning – a reflection of male-dominated society. They accounted for the fact that men may be over-represented as authors in their dataset, and found it didn’t affect the result.
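One common way to compare how similarly two words are used is cosine similarity between their embedding vectors. The sketch below, in Python, averages that similarity between a target word and two groups of words and takes the difference. It assumes cosine similarity as the measure, which the article doesn't specify, and the three-dimensional vectors are invented purely to make the arithmetic visible. They are not drawn from the study and say nothing about real language.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(target, group, vectors):
    """Average cosine similarity between `target` and each word in `group`."""
    return sum(cosine(vectors[target], vectors[w]) for w in group) / len(group)


# Invented toy vectors: placeholders, not measurements from any dataset.
toy_vectors = {
    "person": [0.9, 0.4, 0.1],
    "he": [0.8, 0.5, 0.0],
    "men": [0.7, 0.3, 0.2],
    "she": [0.2, 0.9, 0.1],
    "women": [0.3, 0.8, 0.2],
}

male_score = mean_similarity("person", ["he", "men"], toy_vectors)
female_score = mean_similarity("person", ["she", "women"], toy_vectors)

# A positive gap means the target's vector sits closer to the first group.
gap = male_score - female_score
```

With embeddings trained on real text, a consistently positive gap across many gender-neutral words would be the kind of pattern the researchers describe.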
One open question is to what extent this is dependent on English, says the team – other languages such as Spanish include explicit gender information that could change the results. The team also didn’t account for non-binary gender identities or differentiate between the biological and social aspects of sex and gender.
Bailey says that finding evidence of sexist bias in English is unsurprising, as previous studies have shown that words like “scientist” and “engineer” are also considered to be more closely linked with words like “man” and “male” than with “woman” and “female”. But she says it should be concerning because the same collection of texts scoured by this research is used to train a range of AI tools that will inherit this bias, from language translation websites to conversational bots.
“It learns from us, and then we learn from it,” says Bailey. “And we’re kind of in this reciprocal loop, where we’re reflecting it back and forth. It’s concerning because it suggests that if I were to snap my fingers right now and magically get rid of everyone’s own individual cognitive bias to think of a person as a man more than a woman, we would still have this bias in our society because it’s embedded in AI tools.”
Journal reference: Science Advances, DOI: 10.1126/sciadv.abm2463