Systemic Bias in Algorithms as Well as Data

By David Magerman, PhD

Facebook recently announced a new program called Casual Conversations, which offers an open-source data set of over 40,000 unscripted videos from a diverse set of Facebook users. The announcement is significant because it acknowledges the reality that the vast majority of data sets available to AI researchers for facial recognition and other video-processing algorithms underrepresent women, people of color, and other minorities. The data set aims to right the wrong done to these groups by the introduction and use of AI systems that are far less accurate for people who are not white males and that draw less reliable conclusions about women and people of color.

This program is a welcome step in reversing the damage done to the reputation of AI systems, but unfortunately it addresses only one piece of the problem. Diversified training data is an important ingredient in ML-based AI systems, but it is only part of the puzzle. Training models on biased data is likely to lead to biased predictions and biased behaviors; however, training models on unbiased, diversified data may not produce results that are much less biased. The reason has to do with how ML-based algorithms are developed.

If machine learning were simply a matter of math, there would be no issue. Parts of the algorithm are just math. When you are inverting a matrix or maximizing or minimizing an algebraic function, it doesn't matter what the training data looks like. There are many ways of implementing those computations, but they will all give you the same answer.
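To make that point concrete, here is a minimal sketch in Python, with invented data used purely for illustration: two different implementations of the same least-squares fit agree to numerical precision, no matter what the data looks like.

```python
# Minimal sketch with made-up data: the purely mathematical steps of machine
# learning are deterministic, so any correct implementation of the same
# computation returns the same answer regardless of the data's composition.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # hypothetical feature matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Implementation 1: closed-form normal equations (explicit matrix inversion).
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Implementation 2: a library least-squares solver (SVD under the hood).
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))  # True: same answer either way
```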

However, machine learning is not just math. It is also, in fact mostly, heuristics. Most of the optimization problems solved in machine learning algorithms are intractable, and their solutions can only be approximated. Artificial intelligence systems combine different machine learning processes and models with implementations of (sometimes biased) real-world assumptions, knowledge, and reasoning. AI research involves iterating on initial training and test sets to evaluate different combinations of these components, with design decisions made along the way, to yield a software system that can then be applied to larger data sets to train the overall system to perform some real-world task.

Once that process is complete, the structure of the machine learning system is largely set in stone. Applying the system to new data sets may yield different results, but the biases encoded in the training and test sets used during ML and AI software development will carry through to all future uses of those systems.
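As a hypothetical illustration of how this happens (the feature names, threshold, and data below are invented, not drawn from any real system), consider a resume-screening pipeline in which a feature set and a decision cutoff were frozen while iterating on the original, biased development data. Retraining refits the model weights, but the frozen choices carry through unchanged.

```python
# Hypothetical sketch: the design decisions below were hard-coded during
# development on the original data. Retraining only refits the weights.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Design decisions frozen after iterating on the original (biased) dev set:
SELECTED_FEATURES = ["years_experience", "employment_gap_months", "school_rank"]
DECISION_THRESHOLD = 0.72   # cutoff tuned on the original test set

def train(applicants: pd.DataFrame) -> LogisticRegression:
    """Refits the model weights -- but not the design decisions above."""
    model = LogisticRegression(max_iter=1000)
    model.fit(applicants[SELECTED_FEATURES], applicants["hired"])
    return model

def screen(model: LogisticRegression, candidate: pd.DataFrame) -> np.ndarray:
    """Scores a candidate with the original feature set and cutoff,
    even if the weights were retrained on diversified data."""
    prob = model.predict_proba(candidate[SELECTED_FEATURES])[:, 1]
    return prob >= DECISION_THRESHOLD

# Synthetic stand-in for a new, diversified retraining set.
rng = np.random.default_rng(0)
new_data = pd.DataFrame({
    "years_experience": rng.integers(0, 20, size=200),
    "employment_gap_months": rng.integers(0, 36, size=200),
    "school_rank": rng.integers(1, 100, size=200),
    "hired": rng.integers(0, 2, size=200),
})
retrained = train(new_data)
print(screen(retrained, new_data.head(1)))
```

However many times train() is rerun on better data, screen() still funnels every candidate through feature choices and a threshold that were tuned on the original data set.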

A good example of this phenomenon in action is human resources technology. Over the past few years, software companies have developed AI-driven algorithms for a range of HR tech problems: triaging resumes, rating interview videos, evaluating candidate fit for teams, managing professional development, and so on. In the past year, many of these systems have been found to be biased in the ways discussed above, largely because of biases in the data they were trained on. The problem is that retraining these same algorithms on diversified data has not made them much less biased. The likely reason is that the choices made in the design of these software systems encoded the bias of the original data sets into the training algorithms and into the ways the trained models are deployed to make decisions. The only way to truly fix those systems and remove their performance biases is to wipe the slate clean and rebuild the algorithms from scratch. Simply updating model coefficients and weights with new data won't undo the systemic bias in the software itself.


In contrast, consider one of our portfolio companies, Knockri. Knockri's three co-founders experienced racism and bias in their own job searches and set out to build an AI-driven system for automating the interview process that was designed from the start to be unbiased. They began with diversified training data containing objective information that represented the universe of job-relevant behaviors, supplemented with colloquial behavioral statements drawn from a diverse set of applicants. Only once they had this diverse data set did they start making the heuristic design decisions that hard-code the characteristics of the initial training and test data into their algorithms. As a result, Knockri's human resources solution performs exceptionally well on evaluations that measure bias in real-world applications. Their solution shows how important it is to understand the way bias baked into training data sets can influence the training process.

Facebook's Casual Conversations data set is a huge step toward helping AI researchers build unbiased, fair software systems that solve real-world problems without disenfranchising large swaths of society. However, it isn't enough. AI researchers and engineers need to go back to first principles, wipe the slate clean, and build new algorithms without the intuitions gleaned from the past decade of machine learning and AI research. In fact, the best thing we can do is find brilliant engineers and scientists who don't know anything about how to do AI and ML on human behavioral data and throw them at the problem. Without the knowledge of the biased decisions their predecessors made, they can go where the unbiased data takes them and build much fairer algorithms.

Until the AI/ML community takes this radical step, we will be hamstrung by the design decisions we made on biased data, and we will continue to produce flawed software that reinforces the systemic racism that brought us to where we are now.

