In the mid-2010s, Amazon developed an AI recruiting tool meant to pick the best candidates from the thousands of resumes its recruiters received daily. The tool routinely discriminated against women: resumes that included the word “women’s” or that mentioned attending a women’s college were ranked lower than they should have been. Thankfully, the tool was killed within a year, but it’s not hard to imagine the disastrous results if it had been widely implemented.
AI applications are reflections of the data they’re trained on and the techniques used to train them. So, any bias in the model really means there’s bias in the data, the training techniques, or the humans training the model.
Ultimately, whether you use a third-party or private AI model in your application, any symptoms of data bias are your responsibility, just as any application you build on open-source libraries is your responsibility. To better deal with this issue, let’s explore AI bias, its causes, and possible defenses.
AI bias defined
AI bias is one of those concepts that’s easily intuited but difficult to spell out in detail. The US Department of Defense (which worked with us to launch its AI Bias Bounty program) defines AI bias as “the systematic errors that an AI system generates due to incorrect assumptions of various types.” This often manifests as stereotypes, misrepresentations, prejudices, and derogatory or exclusionary language, among other things. However, a few types of biases are ubiquitous in AI applications.
Representation bias
Representation bias occurs when groups are disproportionately present in or absent from AI outputs. This is usually the result of disproportionate representation in the training data. For example, the Amazon hiring tool showed representation bias: women were underrepresented among the top resumes it selected because women’s resumes made up a smaller percentage of the training data. If 75% of the resumes came from men, the model would have learned that men’s resumes predicted stronger applicants, and with fewer women’s resumes to learn from, it could not identify the details that separate strong resumes from weak ones among women applicants. Another example is facial recognition systems recognizing white faces more accurately than minority faces. If the training data comprises mostly white faces, it makes sense that a model would perform better on white faces.
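To make this concrete, here is a minimal sketch (toy data, hypothetical “gender” column) of how you might compare a group’s share of the training data with its share of a model’s top-ranked outputs:

```python
# A minimal sketch (toy data, hypothetical "gender" column): compare each
# group's share of the training data with its share of the model's
# top-ranked outputs to spot under- or overrepresentation.
import pandas as pd

def representation_gap(train_df: pd.DataFrame, selected_df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Return each group's share of the training data, its share of the
    selected outputs, and the difference between the two."""
    train_share = train_df[group_col].value_counts(normalize=True).rename("train_share")
    output_share = selected_df[group_col].value_counts(normalize=True).rename("output_share")
    gap = pd.concat([train_share, output_share], axis=1).fillna(0.0)
    gap["gap"] = gap["output_share"] - gap["train_share"]
    return gap.sort_values("gap")

# Toy example: women are 25% of the training resumes but only 10% of the
# resumes the model ranks in its top 20.
train = pd.DataFrame({"gender": ["M"] * 75 + ["F"] * 25})
selected = pd.DataFrame({"gender": ["M"] * 18 + ["F"] * 2})
print(representation_gap(train, selected, "gender"))
```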
Pre-existing bias
Pre-existing bias occurs when an AI model perpetuates historical or societal biases that can be found in training data. The Amazon hiring tool is also an example of this bias. When the tool was being developed, 63% of Amazon employees were male. By using resumes from this skewed group as training data, the model picked up on being male as predictive of getting hired at Amazon. Pre-existing bias is everywhere within AI systems; societal bias leads to biased data, which is then used to train biased models.
Algorithmic processing bias
Algorithmic processing bias occurs when the algorithm used to train or apply a model introduces bias, even if the model was trained on unbiased data. Let’s take a resume screening AI tool (not Amazon’s this time) as an example. If AI researchers design the model to penalize gaps in employment history, it will on average rank women who take maternity leave lower, even if the training data comprises an even split of comparable resumes between men and women. This type of bias can be more insidious since we don’t often think of the algorithms themselves as being biased.
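The following sketch, with a made-up scoring rule and toy resumes, shows how a penalty on employment gaps lowers average scores for women even when the pool is perfectly balanced and equally qualified:

```python
# A minimal sketch (made-up scoring rule, toy resumes): a ranking function
# that penalizes employment gaps scores women who take maternity leave lower
# on average, even though the pool is evenly split and equally qualified.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Resume:
    gender: str
    skill: float       # comparable qualification score, 0-1
    gap_months: int    # months of employment gap

def score(resume: Resume) -> float:
    # The bias lives in the algorithm itself: a flat penalty per gap month.
    return resume.skill - 0.02 * resume.gap_months

# Evenly split, equally skilled pool; gaps correlate with maternity leave.
pool = ([Resume("M", skill=0.8, gap_months=0) for _ in range(50)]
        + [Resume("F", skill=0.8, gap_months=6) for _ in range(50)])

for gender in ("M", "F"):
    avg = mean(score(r) for r in pool if r.gender == gender)
    print(f"average score for {gender}: {avg:.2f}")
```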
Algorithmic aggregation bias
Aggregation bias occurs when an AI model is one-size-fits-all and overlooks differences between groups. A common example can be found in healthcare. Different races are linked to different rates of diabetes, so a diabetes prediction model that doesn’t consider race may yield accurate results for the majority group while yielding incorrect results for minority groups. Even if the training data is unbiased, this bias can still occur if the model is set up incorrectly.
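As a rough illustration (synthetic data only, no real clinical values), the sketch below fits a single one-size-fits-all decision threshold to a pooled population and shows that it serves a minority group worse than the majority group:

```python
# A minimal sketch with synthetic data (no real clinical values): a single
# decision threshold fit to the pooled population is tuned to the majority
# group and performs noticeably worse for a minority group whose risk
# profile differs.
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, healthy_mean, diabetic_mean):
    # Half healthy, half diabetic; a biomarker drawn around group-specific means.
    x = np.concatenate([rng.normal(healthy_mean, 1.0, n),
                        rng.normal(diabetic_mean, 1.0, n)])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

# The majority group dominates the pooled data; the minority group's
# biomarker runs higher for both healthy and diabetic patients.
x_maj, y_maj = make_group(9000, healthy_mean=5.0, diabetic_mean=7.0)
x_min, y_min = make_group(1000, healthy_mean=6.0, diabetic_mean=8.0)
x_all, y_all = np.concatenate([x_maj, x_min]), np.concatenate([y_maj, y_min])

# One "one-size-fits-all" threshold chosen to maximize overall accuracy.
candidates = np.linspace(x_all.min(), x_all.max(), 200)
pooled_t = max(candidates, key=lambda t: ((x_all > t) == y_all).mean())

for name, x, y in [("majority", x_maj, y_maj), ("minority", x_min, y_min)]:
    print(f"{name} accuracy with pooled threshold: {((x > pooled_t) == y).mean():.2%}")
```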
General skewing
General skewing is a catchall for other types of biases. When bias and fairness are not considered in AI design and use, it’s very likely that bias will creep in. It’s rare to have unbiased data without careful filtering, and as we’ve seen, AI algorithms can also introduce bias. As a result, the models will show skewed results—in other words, bias.
Impact of AI bias
The impact of AI bias depends on how your AI application is used.
If your application is a chatbot, then biased statements may directly hurt a user or give a malicious user fodder to hurt others. Users may also view the app, and by extension, the company, as biased and stop using it. For companies, this can result in fines and legal action from the government as well. Last year, a tutoring company had to pay $365,000 to settle a case alleging its AI hiring tool automatically rejected applicants over a certain age. Also last December, the FTC announced a case against Rite Aid for its biased use of facial recognition AI. With more governments passing laws around the use of AI, AI bias will become far more expensive in the coming years.
The repercussions of bias also increase as AI applications are used more often and for more important functions. Bias in healthcare AI apps can stop patients from getting the care they need. Bias in criminal justice AI apps has led to Black defendants unfairly getting longer jail times than white defendants.
Over time, biased AI outputs can exacerbate the societal biases that were fed into the model, creating a feedback cycle. A prominent example of this can be found in policing. Police have spent a lot of time in low-income neighborhoods and made disproportionately more arrests there. When arrest data is fed into an AI app designed to predict where the most police are needed, the low-income neighborhoods are repeatedly spat out by the model because they are overrepresented in the training data. Police then spend even more time in these neighborhoods, leading to more arrests and more skewed data.
If we’re not careful with the bias in our AI apps, we risk making lives worse for many groups of people.
Examples of AI bias
We’ve explored Amazon’s hiring tool in detail, but AI bias can show up in many ways. To give you a sense of what it looks like, here are some more examples:
COMPAS labeled Black defendants higher-risk than white defendants
We touched on this example earlier. COMPAS is an AI tool that has been used in courtrooms to estimate the risk of a defendant committing another crime. A ProPublica study showed that Black defendants who did not go on to commit future crimes were twice as likely as white defendants to be labeled higher-risk. The company behind COMPAS disputed this, saying the tool was equally accurate for Black and white defendants.
Google Translate reinforced gender bias
In 2017, studies found that Google Translate would associate jobs like “doctor” with men and jobs like “nurse” with women, perpetuating existing stereotypes. This happened particularly with Turkish-to-English and English-to-Spanish translations. Google improved Translate in 2018 so that it shows both masculine and feminine translations for gender-ambiguous queries.
Lending algorithms give minority homebuyers higher interest rates
A 2018 UC Berkeley study found that Black and Latino homebuyers paid interest rates 5.6 to 8.6 basis points higher than white and Asian borrowers did. The result: Black and Latino homebuyers in aggregate lost out on $250-500M annually. The study’s authors noted that the algorithms may have found proxies for race in the data, such as access to other loan offerings, even if race itself was removed.
An AI grading system lowered scores of students from less-advantaged schools
In 2020, British students couldn’t sit their university qualifying exams (A-levels) because of the COVID-19 pandemic, so teachers were asked to predict their students’ scores. These estimates were then fed into an AI tool that adjusted them based on the historical performance of each school. Many students woke up to scores lower than expected, especially high-performing students from less-advantaged schools, while students from advantaged schools had their scores raised. In the end, the British government discarded the AI tool’s results and used the teacher estimates.
Where does AI bias come from?
As we’ve seen, AI bias comes primarily from biased training data. It can also come from the way models and algorithms are designed (as in the diabetes prediction example). Implicit in all of these causes is that bias ultimately comes from humans.
Human society, as a whole, is very biased. It wasn’t strictly Amazon’s fault that the training data for their resume tool comprised mostly men’s resumes. We have been aware of the gender gap in engineering for a long time. It would have been hard to counteract this natural bias in the data. In some ways, biased AI models help us hold up a mirror to ourselves and society and show the biases we may overlook.
Humans also introduce bias. They curate the training data and choose the algorithms for the application. If they either don’t view fairness as important or forget to check for bias, then they are likely to introduce bias into the apps. A lack of diversity on development teams could also contribute to this issue; homogeneous teams may not immediately think of all the biases that may affect their app.
Mitigating AI bias
The one thing to keep in mind is that there is no magic solution to AI bias. There is no way to guarantee that an application will never produce a biased output; there may be an edge case you just haven’t found yet. (This is eerily reminiscent of the most basic principle of cybersecurity: there’s always another vulnerability out there.) You can, however, make your apps robust enough that you, and your users, can trust them.
Pre-process your training data
By cleaning up your training data, you can avoid a lot of downstream patch-ups. You can inspect the data yourself for obvious biases, or even train another model to carry out such inspections at scale. If there are specific dimensions you want to test for bias (e.g., race, gender, or income), you can check whether groups along those dimensions are fairly represented. If you identify fairness gaps in the data, you can then acquire or create more representative data before you train your model.
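As one simple illustration, here is a sketch (hypothetical column names) that checks group shares along a sensitive dimension and oversamples underrepresented groups before training; real pipelines often prefer reweighting or targeted data collection over naive oversampling:

```python
# A minimal sketch (hypothetical column names): check group shares along a
# sensitive dimension and oversample underrepresented groups before training.
# Treat this as an illustration, not a recommended production approach.
import pandas as pd

def balance_by_group(df: pd.DataFrame, group_col: str, seed: int = 42) -> pd.DataFrame:
    """Oversample each group up to the size of the largest group."""
    target = df[group_col].value_counts().max()
    balanced = [
        grp.sample(n=target, replace=len(grp) < target, random_state=seed)
        for _, grp in df.groupby(group_col)
    ]
    return pd.concat(balanced).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy example: a 75/25 split becomes 50/50 after oversampling.
data = pd.DataFrame({"gender": ["M"] * 75 + ["F"] * 25, "label": [1, 0] * 50})
print(data["gender"].value_counts(normalize=True))
print(balance_by_group(data, "gender")["gender"].value_counts(normalize=True))
```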
Use evaluation datasets
You can run your model against existing bias datasets (e.g., Bias in Bios or the Equity Evaluation Corpus) to check for specific kinds of bias. These datasets pair inputs (e.g., descriptions of a person) with labels for those inputs (e.g., that person’s true profession). If your model’s outputs diverge from the labels more often for some groups than for others, that divergence points to bias. Once you run these datasets, you’ll have a sense of which groups your model is biased for and against.
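Here is a minimal sketch of that kind of check: it scores a model against a labeled evaluation set and breaks accuracy down by group. The record fields loosely mirror a Bias in Bios-style dataset, but the field names and the predict() callable are placeholders for your own data loader and model:

```python
# A minimal sketch of a per-group evaluation. The record fields loosely mirror
# a Bias in Bios-style dataset (a short bio, a gender, a true profession), but
# the field names and the predict() callable are placeholders for your own
# data loader and model.
from collections import defaultdict
from typing import Callable

def per_group_accuracy(records: list[dict], predict: Callable[[str], str]) -> dict[str, float]:
    correct, total = defaultdict(int), defaultdict(int)
    for rec in records:
        group = rec["gender"]
        total[group] += 1
        if predict(rec["bio_text"]) == rec["profession"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Example usage with a deliberately biased stub model; a large accuracy gap
# between groups is a signal worth investigating.
records = [
    {"bio_text": "She is a surgeon at ...", "gender": "F", "profession": "surgeon"},
    {"bio_text": "He is a surgeon at ...", "gender": "M", "profession": "surgeon"},
]

def stub_model(text: str) -> str:
    return "nurse" if "She" in text else "surgeon"

print(per_group_accuracy(records, stub_model))  # e.g., {"F": 0.0, "M": 1.0}
```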
Evaluate the model at scale
You can also run the model on a variety of inputs yourself to see how it responds. For example, you might ask a large language model (LLM) to give its thoughts on various groups and directly compare the outputs for bias. This can be considered a form of red teaming. Note that a model refusing to answer for one group but answering for another can itself be a sign of bias.
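A sketch of this kind of templated probing is below. The query_llm() function is a placeholder for however you call your model, and the groups and prompt template are illustrative only:

```python
# A minimal sketch of templated probing: send the same prompt about different
# groups to the model and compare the responses side by side. The query_llm()
# function is a placeholder for however you call your model (an API client, a
# local pipeline, etc.), and the groups and template are illustrative only.
GROUPS = ["men", "women", "older workers", "recent immigrants"]
TEMPLATE = "Write a short performance review for an employee who is one of the {group} on the team."

def query_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your model or API call.")

def probe(template: str, groups: list[str]) -> dict[str, str]:
    return {group: query_llm(template.format(group=group)) for group in groups}

# Review the outputs side by side: differences in tone, content, or refusal
# behavior (answering for one group but declining for another) signal bias.
# responses = probe(TEMPLATE, GROUPS)
```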
Fine-tune the model on unbiased responses
If you have a collection of inputs where your model responded with bias, you can show the model what its output should have been. With enough of these examples, the model will show less bias in response to similar inputs. Reinforcement learning from human feedback (RLHF), which is used to make ChatGPT safer and less biased, is a scaled-up, more systematic version of this idea.
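As a sketch, you might assemble those corrections into a fine-tuning dataset like this. The chat-style JSONL layout mirrors what several fine-tuning APIs accept, but the exact schema depends on your provider:

```python
# A minimal sketch of assembling a fine-tuning dataset from prompts where the
# model previously responded with bias, paired with the responses it should
# have given. The chat-style JSONL layout below mirrors what several
# fine-tuning APIs accept, but check your provider's exact schema.
import json

corrections = [
    {
        "prompt": "Who makes a better engineer, men or women?",
        "ideal_response": "Engineering ability isn't determined by gender; "
                          "both men and women can be excellent engineers.",
    },
    # ...more (biased prompt, corrected response) pairs gathered from your evaluations
]

with open("bias_finetune.jsonl", "w") as f:
    for example in corrections:
        record = {
            "messages": [
                {"role": "user", "content": example["prompt"]},
                {"role": "assistant", "content": example["ideal_response"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```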
Defending against AI bias
Post-hoc mitigations help, but the important work is detecting bias before it can affect end users. Single or infrequent evaluations don’t cut it because AI apps can have deep-rooted biases that constantly need to be flushed out. With LLM applications, security vulnerabilities (such as prompt injection) can also let threat actors force the model to give biased responses. Simply put, bias needs to be checked for continually, and the more diverse the team doing the checking, the more biases it can uncover and mitigate.
Rooting out data bias can be tricky and can require specialized skills. The Bugcrowd Platform can help address both of these issues. We just released a new offering, AI Bias Assessments: rewards-for-results engagements in which expert hackers find biases in your AI systems. With the expert hackers on Bugcrowd’s platform, you’ll be able to continually find and fix biases in your models, keeping you, your users, and even the government happy.
In fact, the government contracted Bugcrowd for an AI Bias Assessment. The US Department of Defense’s Chief Digital and AI Office (CDAO) announced the launch of its AI Bias Bounty programs in partnership with Bugcrowd.
With the Bugcrowd Platform, you can defend against the myriad security and safety vulnerabilities within AI systems: prompt injection, insecure output handling, and now AI bias.