Systemic Bias in Algorithms as well as Data
By David Magerman, PhD
Facebook recently announced a new program called Casual Conversations, which offers an open source data set of over 40,000 unscripted videos from a diverse set of Facebook users. This announcement is significant, because it acknowledges that the vast majority of data sets available to AI researchers for facial recognition and other video processing algorithms underrepresent women, people of color, and other minorities. The data set aims to right the wrong done to these groups by the introduction and use of AI systems that are far less accurate for, and draw less reliable conclusions about, women and people of color.
This program is a welcome step in reversing the damage done to the reputation of AI systems, but, unfortunately, it addresses only one piece of the problem. Diversified training data is an important part of building ML-based AI systems, but it is not sufficient. Training models on biased data is likely to lead to biased predictions and biased behaviors. However, training models on unbiased and diversified data may not make their performance much less biased. The reason has to do with how ML-based algorithms are developed.
If machine learning were simply a matter of math, there would be no issue. Parts of the algorithm are just math. When you are inverting a matrix or maximizing or minimizing an algebraic function, it doesn’t matter what the training data looks like. The mathematical algorithms for solving those problems are just math. There are lots of ways of implementing those algorithms, but they will all give you the same answer.
However, machine learning is not just math. It’s also, in fact mostly, heuristics. Most of the optimization problems being solved in machine learning algorithms are intractable, and their solutions can only be approximated. Artificial intelligence systems, in turn, combine different machine learning processes and models with implementations of (sometimes biased) real-world assumptions, knowledge, and reasoning. AI research involves iterating on initial training and test sets to try different combinations of these components, with different decisions made along the way, to yield a software system that can be applied to larger data sets to train the overall system to perform some real-world task.
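The difference between "just math" and heuristics can be sketched with a toy example. This is a minimal illustration, not a description of any particular system: the two functions and the `descend` helper are hypothetical, chosen only because one problem has a single minimum and the other has two. A local search gives the same answer on the first problem no matter where it starts, but on the second the answer depends on the starting point, which is a design choice.

```python
def descend(grad, x, lr=0.01, steps=5000):
    """Plain gradient descent: a local search, not a global solver."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def convex_grad(x):
    # Gradient of f(x) = (x - 3)^2, which has a single minimum at x = 3.
    return 2 * (x - 3)


def nonconvex_grad(x):
    # Gradient of g(x) = x^4 - 4x^2 + x, which has two separate local minima.
    return 4 * x**3 - 8 * x + 1


starts = (-2.0, 2.0)  # the "design decision": where the search begins
convex_results = [descend(convex_grad, s) for s in starts]
nonconvex_results = [descend(nonconvex_grad, s) for s in starts]

print("convex:    ", convex_results)     # both starts agree
print("non-convex:", nonconvex_results)  # each start lands in a different minimum
```

Real ML systems make many such choices (initialization, architecture, feature sets, stopping rules), and each one is typically tuned against whatever data was on hand during development.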
Once that process is completed, the structure of the machine learning system is largely written in stone. Applications of that system to new data sets may yield different results, but the biases that were encoded in the training and test sets that were used for the ML and AI software development will carry through to all future uses of those systems.
A great example of this phenomenon in action is in human resources technology. Over the past few years, software companies have developed different AI-driven algorithms for solving problems in HR tech: triaging resumes, rating interview videos, evaluating candidate fit for teams, managing professional development, etc. In the past year, it has been discovered that many of these systems are biased in the ways discussed above, largely because of biases in the data these algorithms are trained on. The problem is, retraining these same algorithms on diversified data hasn’t led to much better performance in terms of bias. The reason, likely, is that the choices made in the design of these software systems encoded the bias of the original data sets in the training algorithms and in the ways the trained models are deployed to make decisions. The only way to truly fix those systems and remove their performance biases would be to wipe the slate clean and rebuild the algorithms from scratch. Simply updating model coefficients and weights using new data won’t undo the systemic bias in the software itself.
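The claim that retraining alone may not remove bias can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the two groups, the two features, and the `design` and `fit_threshold` helpers are hypothetical and do not describe any real HR product. Feature 0 is a strong signal for group A but noise for group B, while feature 1 is a moderate signal for both. The "design" step picks which feature to build on, and "training" only tunes a threshold for that fixed feature.

```python
import random


def make_data(n_a, n_b, seed):
    """Synthetic records of (group, features, label). Purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for group, n in (("A", n_a), ("B", n_b)):
        for _ in range(n):
            label = rng.random() < 0.5
            # Feature 0: strong signal for group A, pure noise for group B.
            f0 = label + rng.gauss(0, 0.3) if group == "A" else rng.gauss(0.5, 0.6)
            # Feature 1: moderately strong signal for both groups.
            f1 = label + rng.gauss(0, 0.5)
            rows.append((group, (f0, f1), label))
    return rows


def accuracy(rows, feature, threshold):
    hits = sum((x[feature] > threshold) == y for _, x, y in rows)
    return hits / len(rows)


def fit_threshold(rows, feature):
    """'Training': tune the one free parameter for an already chosen feature."""
    candidates = [i / 20 for i in range(21)]
    return max(candidates, key=lambda t: accuracy(rows, feature, t))


def design(rows):
    """'Development': the heuristic choice of which feature to build on."""
    return max((0, 1), key=lambda f: accuracy(rows, f, fit_threshold(rows, f)))


skewed = make_data(1800, 200, seed=1)     # development data: 90% group A
balanced = make_data(1000, 1000, seed=2)  # diversified data
test_b = make_data(0, 2000, seed=3)       # held-out records from group B only

# Design decision made on skewed data, then only the threshold is retrained.
frozen_feature = design(skewed)
retrained_threshold = fit_threshold(balanced, frozen_feature)
acc_retrained_b = accuracy(test_b, frozen_feature, retrained_threshold)

# Rebuilding from scratch: the design decision itself is redone on balanced data.
rebuilt_feature = design(balanced)
rebuilt_threshold = fit_threshold(balanced, rebuilt_feature)
acc_rebuilt_b = accuracy(test_b, rebuilt_feature, rebuilt_threshold)

print("frozen design uses feature", frozen_feature, "-> group B accuracy", acc_retrained_b)
print("rebuilt design uses feature", rebuilt_feature, "-> group B accuracy", acc_rebuilt_b)
```

In this toy setup, refitting the threshold on balanced data cannot help group B, because the feature chosen during development carries no information about that group. Only redoing the design step changes the outcome.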
In contrast, consider one of our portfolio companies, Knockri. Knockri was founded by three people who experienced racism and bias in their job searches. They set out to build an AI-driven system for automating the interview process that was designed to be unbiased. They started with diversified training data that contained objective information representing the universe of job-relevant behaviors, and supplemented it with colloquial behavioral statements taken from a diverse set of applicants. Only once they had this diverse data set did they start making the heuristic design decisions that hard-coded the characteristics of their initial training and test data into their algorithms. As a result, Knockri’s human resources solution performs exceptionally well on evaluations that measure bias in real-world applications. This solution shows the importance of understanding how bias baked into training data sets can influence the training process.
Facebook’s Casual Conversations data set is a huge step in helping AI researchers build unbiased, fair software systems that solve real-world problems without disenfranchising large swaths of society. However, it isn’t enough. AI researchers and engineers need to go back to first principles, wipe the slate clean, and build new algorithms developed without the intuitions gleaned from the past decade of machine learning and AI research. In fact, the best thing we can do is find brilliant engineers and scientists who don’t know anything about how to do AI and ML on human behavioral data and throw them at the problem. Without the knowledge of the biased decisions their predecessors made, they can go where the unbiased data takes them and build much fairer algorithms.
Until the AI/ML community takes this radical step, we will be hamstrung by the design decisions we made on biased data, and we will continue to produce flawed software that reinforces the systemic racism that brought us to where we are now.