Why Data Scientists Should Make a Commitment to Diversity

A fairer and safer society, as well as better products and services, can follow if the data science industry commits to hiring and cultivating diverse talent.

Guest Commentary

December 7, 2018

Scandals at Facebook, Google, and other multinationals have raised hard questions about the ethics of data use. It's easy to understand the concern. A staggering and growing amount of our personal data is being fed into algorithms that determine which campaign ads we're shown, who picks us up in their cars, and which potential mates we're matched with on dating apps. As a result, the potential to misuse data is increasing by the day.

While politicians point fingers and business executives issue mea culpas, one group of people sits at the true center of this issue: data scientists, whose job is to build algorithms that turn data into recommendations, products, and services. As a data scientist who also helps train new data scientists, I've seen how the next generation of data science talent is set to tackle our society's greatest challenges. I've also seen how our industry's incredible potential for good can be twisted, often unintentionally, by the lack of representation in our workforce.

Given the critical role data scientists play in how data is used (or misused), it's imperative that our workforce fairly represent everyone affected by our work, which is to say, all of us. To ensure that this happens, the data science industry must commit fully to hiring and developing diverse talent. Let's look at the reasons why.

1. Diversity enables better representation of the broader population. Gender doesn't determine whether someone is a better programmer, and ethnicity should have no bearing on whether someone is a better hedge fund manager, so the disproportionate skew in who gets access to these high-income careers reflects a societal bias. Here's a stark example: when Black Americans make up 13% of the US population but represent just 1% of hires at our leading tech companies (look no further than Google and Amazon), something's wrong.

Of course, tech companies aren't responsible for geographic segregation, poor school systems, gaps in generational wealth accumulation, or the influential stereotypes that steer children toward or away from academic subjects, all of which feed into this lack of diversity. But tech culture is famous for fighting (and winning) against structural hurdles such as decades-old regulations and social expectations. If we as an industry can "move fast and break things" when it comes to new features and our bottom line, we should try to do the same when it comes to diverse representation.

As for data science, our profession has been named the "best job in America" three years in a row, so we have a special obligation to lead the way. An ideal to aim for would be a population of data scientists that is roughly proportional to the broader US population in gender, ethnicity, and other demographic measures.

2. Diversity improves the products and services we create. What data scientists create inevitably becomes embedded in our everyday social lives, from Uber and Lyft pick-up routes to the algorithms that respond when you ask Alexa for help with a voice command. Producing useful social products and services clearly requires a social understanding of the people who use them. As a rule, this understanding improves when people representing diverse perspectives are brought into the development process.

There have been more than a few high-profile screw-ups showing how a lack of diversity damages not only products and services but also the reputations of the companies that created them. Google Maps, for example, used to pronounce Malcolm X Street "Malcolm the Tenth," as if he were a British monarch, reflecting the Eurocentric worldview of those behind the platform. Facebook's name policy locked the accounts of trans and gender non-conforming users, including at least one Facebook employee. And racial and gender bias in face recognition technology, which has a long history, persists today. These are just a few highlights (or rather lowlights), but they illustrate how greater diversity could have reduced the social harm of each.

Let's zoom out for a second and consider the bigger picture. It's in any organization's interest to prioritize diversity. Products and services that better reflect a broader population reach more people, which means a better reputation, more revenue, and greater impact. And as our products and services become even more social, ensuring that our data science workforce is diverse at every level (from the people writing the code to those deciding which new features to develop and beyond) will only become more important.

3. Diversity helps counter algorithmic violence. There’s a hidden danger lurking within the work data scientists do: algorithmic violence. The term, coined by Mimi Onuoha, refers to the ways that algorithms or automated decision-making systems inflict harm by preventing people from meeting their needs. Civil and mechanical engineers commonly take courses on the ethical challenges involved in designing physical things, so why shouldn’t data scientists learn to think critically about how our work can lead to harmful consequences too?

It can be hard to conceptualize algorithmic violence, so let's look at an example. Ex-Googler Guillaume Chaslot, who worked on YouTube's massive recommendation engine, explains how the video platform's seemingly objective automated recommendations can harm vulnerable populations. Because it is tasked with maximizing users' viewing time, the recommendation engine ignores the effects that disturbing videos, such as inappropriate content targeted at children or content created to promote conspiracy theories, have on vulnerable people. These videos can do damage at the micro level (messing with kids' brains) and at the macro level (influencing elections).

Because minority populations often include high concentrations of vulnerable people, algorithmic violence can disproportionately affect them. In fact, one of the most insidious things about algorithmic violence is that its effects are often invisible to everyone except those on the receiving end, so companies without a diverse workforce may be unable to recognize and prevent it. Harm caused by "objective" automated systems is all but certain to grow as data science becomes more powerful, and prioritizing diversity is one of the many steps we need to take to counter it.

A fairer society, better products and services, and the prevention of harm to the most vulnerable among us: All of these and more can be brought about if we as an industry make a commitment to hiring and cultivating diverse talent in data science. With the eyes of the world on us, there’s no better time to start than now.

Sophie Searcy is a Senior Data Scientist at Metis, which provides full-time immersive bootcamps, part-time professional development courses, online learning, and corporate programs to accelerate the careers of data scientists.

About the Author(s)

Guest Commentary

The InformationWeek community brings together IT practitioners and industry experts with IT advice, education, and opinions. We strive to highlight technology executives and subject matter experts and use their knowledge and experiences to help our audience of IT professionals in a meaningful way. We publish Guest Commentaries from IT practitioners, industry analysts, technology evangelists, and researchers in the field. We are focusing on four main topics: cloud computing; DevOps; data and analytics; and IT leadership and career development. We aim to offer objective, practical advice to our audience on those topics from people who have deep experience in these topics and know the ropes. Guest Commentaries must be vendor neutral. We don't publish articles that promote the writer's company or product.