“There’s no gender bias in our process for extending credit,” Goldman Sachs CEO David Solomon insisted in a recent TV interview. “We don’t ask, when someone applies, if they’re a man or a woman.”
The company’s role in the Apple Card’s launch triggered an investigation by Wall Street regulators, following viral tweets from men alleging that they were offered far greater credit limits than their spouses.
Apple Co-Founder Steve Wozniak himself reported having this experience, and the tweets in question have since been shared thousands of times.
“The Apple case is a good one,” says Dan Power, MD, Data Governance at financial services firm State Street. “You have to treat the algorithm in just the same way that you would debug it in order to debias it.”
Solomon says the bank worked with a third-party consultant to ensure there were no unintended biases on its credit platforms. But for many data and analytics leaders, this story serves as a timely reminder that data bias can have serious consequences.
Now, many are looking again at the impact the phenomenon may be having on their own data models and algorithms.
The Unintended Consequences of Data Bias
Data bias can be easily misunderstood, as it can be caused by several phenomena and means different things in different contexts.
Sampling bias occurs when the dataset used to generate a particular insight isn’t representative of the population it’s supposed to describe. Building feedback loops into models, or omitting key variables that would influence a model’s outcome, can also introduce bias.
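To make sampling bias concrete, here is a minimal Python sketch using entirely hypothetical data (not drawn from the report): a population contains two groups with different average incomes, and a sample that over-represents one group produces a skewed estimate of the population mean.

```python
import random

random.seed(0)

# Hypothetical population: two groups with different average incomes.
population = [("A", random.gauss(60_000, 5_000)) for _ in range(5_000)] + \
             [("B", random.gauss(40_000, 5_000)) for _ in range(5_000)]

true_mean = sum(income for _, income in population) / len(population)

# Biased sample: members of group A are far more likely to be included.
biased_sample = [income for group, income in population
                 if random.random() < (0.9 if group == "A" else 0.1)]
biased_mean = sum(biased_sample) / len(biased_sample)

print(f"population mean:    {true_mean:,.0f}")
print(f"biased-sample mean: {biased_mean:,.0f}")  # skews toward group A
```

Any insight computed from the biased sample inherits this skew, which is why checking a dataset’s composition against the population it claims to represent is usually the first diagnostic step.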
Importantly though, the issue often doesn’t lie with the models themselves. Algorithms frequently function as mirrors, reflecting back societal biases that are baked into the data that underpins them.
“The problem with some of these models is, they’re complicated enough that no one really knows how they work,” Power quips. “Just as we say, ‘Garbage in, garbage out’, it’s, ‘Bias in, bias out’.”
The Apple Card example shows how these biases can lead to gender discrimination, and it’s easy to see how similar issues could lead to racial profiling, age discrimination and more besides.
The impact of these hidden biases could have implications for virtually any machine-assisted decision that’s made in the financial services industry.
Power concludes: “With Apple and Goldman Sachs, if they’d built a ‘gender’ field that they didn’t allow the model to see and used it at the end to confirm that their decisions weren’t biased, that would have been helpful.”
How to Combat Bias in Data Science
Many data scientists believe it’s just a matter of time until governments around the world pass new regulations to ensure the ethical use of data. Some countries and states are already legislating to ensure transparency about how certain algorithms arrive at their decisions.
However, cases like the Apple Card launch illustrate why enterprises shouldn’t wait for new laws before they act. It’s becoming increasingly clear that regulatory compliance and ethical conduct are two different things, and unethical behavior can hurt a company’s reputation.
“Because we can build all these things, we will build all these things,” says Dan Costanza, Chief Data Scientist in the investment bank at Citi. “How do we get the regulatory environment in step with that?”
He adds: “There are a number of cases where debiasing is extremely important, but the regulatory framework actually blocks you from being allowed to do it.”
Practically all big datasets will contain some kind of bias. But how problematic those biases are will depend on what the data is used for.
“You must treat the algorithm in just the same way that you would debug it to debias it” – Dan Power, MD, Data Governance, State Street
The first step towards eliminating significant biases is often to identify potential ‘gaps’ in a given dataset that can be balanced out to make it more representative of the population it’s supposed to depict.
“In the cases of, for example, facial recognition algorithms, they’re pretty good at white males,” Power explains. “They’re not so good at women of color. So, in that case, the answer is actually that you need more data.”
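As Power notes, the real fix for a gap like this is collecting more data. A common stopgap, sketched below with hypothetical group labels, is to oversample the underrepresented group until the dataset is balanced:

```python
import random

random.seed(1)

# Hypothetical training examples tagged with a demographic group:
# group B is heavily underrepresented.
examples = [{"group": "A"}] * 900 + [{"group": "B"}] * 100

by_group = {}
for ex in examples:
    by_group.setdefault(ex["group"], []).append(ex)

target = max(len(rows) for rows in by_group.values())
balanced = []
for group, rows in by_group.items():
    balanced.extend(rows)
    # Oversample smaller groups with replacement up to the largest group.
    balanced.extend(random.choices(rows, k=target - len(rows)))

counts = {g: sum(1 for ex in balanced if ex["group"] == g) for g in by_group}
print(counts)  # each group now contributes the same number of examples
```

Resampling only reweights the examples you already have, so it cannot add the variety that genuinely new data would; it simply stops the majority group from dominating training.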
It’s also important to consider a plurality of perspectives and views while developing a predictive model and take steps to search for and eliminate biases that manage to creep in. This can help staff to identify their blind spots and avoid building their own biases into the software.
“The team needs to be diversified,” says Connie Zhang, US Data and Analytics Officer, Agricultural Bank of China. “Unfortunately, a lot of data people are [of] a similar type.”
Finally, teams must test and retrain data-driven models regularly to ensure their continued accuracy over time.
“We’re going to reach a place where it won’t matter if it was deliberate or not,” warns Power. “There will be regulations that say before you put a new product out on the marketplace, if it’s of ‘this’ type and it’s affecting ‘these’ constituencies, then you have to have a debiasing step.”
This is an extract from our brand new report, Financial Services Data 2020. Claim your copy today to discover how the world’s top financial services data leaders are advancing their organizations’ analytics capabilities.