“Models do not just predict, but they can make things happen,” data scientist Rachel Schutt told The New York Times technology reporter Steve Lohr recently, adding “That’s not discussed generally in our field.”
Schutt has since taken a senior research scientist position at Johnson Research Labs, a start-up focused on data science research, consulting and teaching “with an aim to have a positive impact on the world.” She was a senior statistician at Google Research when CIMS IMR interviewed her, as well as an adjunct professor at Columbia University, where she taught an introductory course in data science last fall. She is co-authoring Doing Data Science, scheduled for Spring 2013 publication, and holds a Ph.D. in statistics.
In her Dec. 30, 2012 interview with Lohr, Schutt asserted that model makers would better serve society if they considered the ethical dimensions of their work and not only the math. That led CIMS IMR to ask her to elaborate on the ethical dimensions of work in Big Data:
First of all, the models may either end up being built into public policies, or key decisions might be made based on mathematical models. Second, sometimes they’re built as part of a product that’s data driven so it could be a credit scoring model or a model that determines whether or not someone gets health care or a mortgage, or an online recommendation system.
These models are actually determining parts of human lives, and I think anytime you’re doing anything that impacts human beings you should be considering the ethical implications, like what impact is this going to have on people’s lives, what impact is it going to have on the world?
Could you give an example of a specific model that made things happen but society would have been better served if the people making the model had taken such ethical dimensions into account?
I’m not an expert on the financial crisis, but it’s clear that a lot of mathematical modeling is involved in trading and its underlying algorithms. Often certain metrics are being optimized for, and those metrics may be profit, for example, and that becomes the goal, without consideration for other types of metrics that in some cases are hard to measure, like the impact on people’s lives, jobs lost, foreclosures, or any number of other dimensions you might want to minimize.
It’s a complicated story so I can’t say there’s an exact cause and effect, but it’s pretty clear that ethics were not at the forefront of the decision-making process. It’s not what people are optimizing for, that’s for sure.
What exactly would you like a modeler in one of those areas to do? How might she bring ethics into a financial model, for example?
It may be that the ethical aspects can’t be included in the model but need to exist outside. In that case, the outcome of the model and the ethical dimensions need to be weighed, and this would be at the discretion of the decision-maker, who may not be the modeler. But it would be the responsibility of the modeler to point out any ethical dimensions he or she feels are not being captured by the model.
Also, it seems plausible to me that a modeler could create a variable or metric that captured some sort of ethical rating (much like bond ratings if done properly) for companies. Of course, this would be at human discretion. There may be some measurable aspects to this—giving to charity comes to mind—but those would be hard to measure and could be potentially gamed if not done carefully.
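One way to picture the kind of metric Schutt describes (a sketch only; the indicator names, values, and weights below are invented for illustration, not taken from the interview) is a weighted composite of a few measurable signals, with the weighting left to human discretion:

```python
# Hypothetical sketch of a composite "ethical rating" for a company.
# Every indicator and weight here is an assumption for illustration;
# a real rating would need audited data and anti-gaming safeguards,
# as Schutt notes.

def ethics_score(indicators, weights):
    """Weighted average of indicators, each normalized to [0, 1]."""
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight

# Invented example inputs (each value already normalized to [0, 1]).
company = {
    "charitable_giving": 0.8,  # e.g. donations as a share of profit
    "labor_practices": 0.6,    # e.g. audited compliance score
    "transparency": 0.9,       # e.g. completeness of disclosures
}
weights = {
    "charitable_giving": 1.0,
    "labor_practices": 2.0,    # weighted higher, at human discretion
    "transparency": 1.0,
}

score = ethics_score(company, weights)  # (0.8 + 1.2 + 0.9) / 4 = 0.725
```

The design choice that matters is the weighting: it encodes the human judgment Schutt says can't be removed from the process, and any single indicator (charitable giving, say) can be gamed if relied on alone.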
I presume you discussed issues like this in the course you taught last semester. How did your students respond?
Students are usually fairly idealistic and so it resonates with them. I think in a lot of cases what bothered them was that people who study data science or statistics or math, or who have these quantitative abilities, end up going into fields they don’t think are necessarily doing good for the world. Why, for instance, aren’t they going off and curing cancer rather than going into finance or calculating click-through rates for online advertising?
We all need to pay the rent. Students were concerned about the trade-off between using data to make the world better versus jobs that pay the bills. There is a perception that right now the jobs that are out there and use data are not necessarily jobs that do good for the world.
Do the modelers at Google take those ethical dimensions into account?
It’s a big part of the Google philosophy, “don’t be evil,” and I honestly do feel that it goes all the way from the upper-rank executives down to how we make decisions. I can’t give specific examples, but there are cases where I’m sure you could make more money but it’s not good for the user.
In your final semester blog you urged the future data scientists in your class to “embrace the practical and the profound.” What did you mean by “the profound”?
I mean some of those philosophical issues we’re discussing—for example, the ethical dimensions. But also that data science itself could be a more profound scientific discipline, where data is the basic unit of analysis, in the sense that atoms or particles are basic units in other disciplines.
Data science has been this new emerging field and in some ways it was easy to define it by a set of practical skills you need to know, like programming, but there are much deeper issues around the philosophy and the science of data. Statistics and math and computer science have very deep aspects to them; those deep aspects could be considered part of data science as well.
You also said you wanted your course to cultivate the more artistic side of data science.
Yes, because when you’re analyzing data there’s an infinite number of choices you end up having to make, and there’s an art as much as a science to what you’re doing: decisions about the model, decisions about how you’re going to evaluate your model, decisions about how your model is going to impact the world. Sometimes it’s not a rigorous science; it involves a human being making decisions based on some type of intuition. It’s a creative process, much like an artist’s.
“Research at Google” talked with you about the field of data science and what motivates you to educate the next generation of problem solvers (research.google.com, Oct. 25, 2012). In the interview you said, “We need a new generation of problem solvers and scientists who know how to handle and find meaning in massive datasets, and do so ethically and with integrity.” You added that you were proud to be educating them.
Was the course you taught part of Columbia’s new Institute for Data Sciences and Engineering?
Not directly because there’s no academic program yet offered by the Institute, so you could think of mine as a pilot class. But I’m affiliated with the Institute as part of the education committee.
In that case I would expect ethics to become part of the Institute’s curriculum.
Oh, yes. We’d like to see that happen; I’d like to see that happen.
Final question, where does your passion for ethics and integrity in this field come from? Was there something in your background or your experience?
I’m not sure. My mother is a social worker; her father was a social worker; my grandmother was a teacher, so maybe the values they had were passed down to me, to try to make the world better. Also, I’m Jewish, and one of our philosophies is Tikun Olam: to heal the world. Maybe that’s part of it.
But I never studied ethics specifically. I don’t consider myself an expert in ethics by any means. It’s just more an approach to life, or how I would like to live.