Imagine being trapped in a dark room where the light sensors do not acknowledge your presence and the door refuses to open, just because you’re invisible to them. Sounds like a major inconvenience or the beginning of a low-budget horror movie? Sadly, it’s a machine-operated reality nowadays. Now imagine you are immediately filtered out of an application process because your name sounds odd, you are from a poor neighbourhood, or you simply happen to be female. Now you are frightened, aren’t you? I surely am, because I am all of the above.

While the first example is caused by mechanical components, the second one is based on the decision of an intelligence, an artificial one. To put it simply: AI is a group of algorithms with the ability to change, adapt and grow based on new unstructured data. It was created to make precise and accurate decisions faster and more efficiently than the human brain ever could. AI is the pride of human ingenuity, but when it goes wrong it can cause scenarios scarier than a discriminatory application process.

The box remains opaque

What exactly is causing artificial intelligence to be biased? To understand this issue, we have to understand the way AI works. And that’s the thing: AI’s data processing is often a black box. Sometimes it is a decipherable one, where scientists can reverse engineer the decision-making pattern from the input and the output. But most of the time the “thinking” process is not at all obvious, even to the designer and creator. The box remains opaque.

How to deal with bias

At this point, the best way to control, reduce and eliminate bias is through the input. Data sets are created and selected by humans and are therefore not free from bias. We live by our biases. Most of us are aware of them and fight against them: in the times of the #MeToo movement, nobody wants to be called out as a sexist. However, some human biases are deeply rooted in our society and are perceived as part of our culture. For instance, we tend to believe that rhymed statements are more truthful and accurate than non-rhyming versions with identical content, as if rhyme and reason were interchangeable: “What sobriety conceals, alcohol reveals” was judged more accurate on average than “What sobriety conceals, alcohol unmasks”. And there are almost 200 more such cognitive biases, as these firmly held beliefs are called in behavioural economics.

So, by applying the following techniques we can avoid and mitigate unconscious bias in our data: 

  • Identify human biases (we grew up with them and carry them in us)
  • Make sure your team is diverse and follow up with diversity and anti-bias training
  • Make sure your data is diverse and create inclusive training sets
  • Be transparent about your algorithms and your data sources
  • Monitor and audit your output closely and adapt your algorithm accordingly
  • Develop a guideline on how to deal with a lack of diversity in data
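
To make the “monitor and audit your output” step more concrete, here is a minimal sketch in Python. It is only an illustration, not a method prescribed by this article: the function names and the sample data are made up. It compares how often each group passes an automated filter and computes the ratio between the lowest and the highest selection rate. A common rule of thumb, the “four-fifths rule” from US employment guidelines, treats a ratio below 0.8 as a warning sign worth investigating.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, selected) pairs -> {group: share selected}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so there is no gap
    return min(rates.values()) / highest


# Made-up screening outcomes: (group, passed the filter?)
sample = [
    ("female", True), ("female", False), ("female", False), ("female", False),
    ("male", True), ("male", True), ("male", False), ("male", False),
]

rates = selection_rates(sample)
ratio = disparate_impact_ratio(rates)
needs_review = ratio < 0.8  # assumed threshold, adjust to your own guideline

print(rates, ratio, needs_review)
```

A low ratio does not prove discrimination on its own, but it tells you where to look first.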

To sum it up: In order to get rid of AI’s biases, we need to investigate ourselves and ask the simple question: How can we be better humans and represent the diversity we are surrounded with? The solution is as simple as it can be: by understanding our flaws and applying what we learn to our products.

 “I learned that algorithmic bias can travel as quickly as it takes to download some files off the internet.” TED talk from MIT developer Joy Buolamwini on fighting bias

Speaking of products, this is what we aim to do:

The SELMA Project – How to ensure that our work is bias-free?

SELMA is a part of EU-funded Human Language Technologies research. It is a big data project processing media content. One of the central tasks will be to minimize bias in its research and innovation activities and to create a bias-aware, non-discriminatory and inclusive ecosystem.  

The SELMA platform will bring very large content streams from various sources together and create aggregated data. Investigating bias in the data and mitigating it will play a vital role in the selection of the data streams. The accuracy, completeness, and uncertainty of the data will be documented in deliverables.

A balanced and diverse dataset is important for an unbiased model. SELMA takes care of that when selecting and curating the training datasets. The project aims to address all diversity dimensions, including sex and gender balance and the representation of BIPoC and people with disabilities. It will take them into account in market trend analysis and in developing user scenarios, and it will ensure a balance of diversity during testing and user evaluation, when selecting people and when setting up questionnaires. It also ensures that workshops, conferences, evaluation sessions, and hack events represent diverse groups of people.
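
As an illustration of what checking a dataset for balance can look like, here is a small Python sketch. It is not SELMA’s actual procedure: the attribute name, the threshold and the records are hypothetical. It counts how large a share of the records each value of an attribute has and flags values that fall below a chosen minimum share.

```python
from collections import Counter


def representation_report(records, attribute, min_share=0.1):
    """Return {value: (share, is_underrepresented)} for one attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {
        value: (count / total, count / total < min_share)
        for value, count in counts.items()
    }


# Hypothetical metadata for ten speakers in a training set
speakers = [{"gender": "female"}] * 1 + [{"gender": "male"}] * 9

report = representation_report(speakers, "gender", min_share=0.2)
for value, (share, flagged) in report.items():
    print(value, share, "underrepresented" if flagged else "ok")
```

What counts as a sufficient share depends on the use case, so the threshold is a decision for the team and not for the code.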

Audit for bias

As with any system that uses machine learning technologies or learns statistics from data, error and bias are inherent. We will carefully assess validity and accuracy, and audit for any bias that might arise.
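
One simple form such an audit can take is to compare a model’s error rate across groups. The Python sketch below is an assumption about how this might be done, not a description of the project’s tooling, and the groups, labels and predictions are invented for the example.

```python
def error_rate_by_group(examples):
    """examples: iterable of (group, true_label, predicted_label) triples."""
    totals, errors = {}, {}
    for group, truth, predicted in examples:
        totals[group] = totals.get(group, 0) + 1
        errors[group] = errors.get(group, 0) + (truth != predicted)
    return {group: errors[group] / totals[group] for group in totals}


def max_error_gap(rates):
    """Difference between the worst and the best served group."""
    return max(rates.values()) - min(rates.values())


# Invented evaluation results: (group, true label, model prediction)
results = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 1),
]

group_errors = error_rate_by_group(results)
gap = max_error_gap(group_errors)
print(group_errors, gap)
```

A model can look accurate overall and still fail one group far more often than another, which is why the numbers are split by group before they are compared.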

Overall, we aim to ensure that our data sets are bias-free. Nevertheless, users of the technologies should be aware of the nature of algorithms and the ethical implications that arise from their use. That’s why raising awareness and finding ways to mitigate biases play a key role!

Ksenia Skriptchenko

Read (and watch) more

Mehrabi, Ninareh; Morstatter, Fred; Saxena, Nripsuta; Lerman, Kristina; Galstyan, Aram (2019): “A Survey on Bias and Fairness in Machine Learning”, an in-depth survey with many insights, definitions and examples

Harvard Business Review “What Do We Do About the Biases in AI?” – Techniques to avoid biased AI

“Coded Bias”, a Netflix documentary that investigates the bias in algorithms after M.I.T. Media Lab researcher Joy Buolamwini uncovered flaws in facial recognition technology.

EU Ethics guidelines for trustworthy AI

Tools to reduce bias in AI: AI Fairness 360 and Watson OpenScale (IBM), What-If Tool (Google)