Bracing for the data deluge

Rice’s data science initiative includes faculty hires, new minor, programs

As Tropical Storm Harvey was finishing its 13 trillion-gallon downpour, Rice University leaders made a crucial decision to survey faculty, staff and students to find out whether they had been flooded and if they needed help. In the wake of Harvey’s devastation, the simple, seven-question survey didn’t seem momentous, but it was in two ways: It marked the first time the university had used data science to respond to a crisis, and it came while Rice was in the midst of a $43 million strategic initiative for data science.

In welcoming more than 400 people to Rice’s first Data Science Conference five weeks after Harvey, Provost Marie Lynn Miranda explained how the needs-assessment survey allowed Rice to “live its values” as the university responded to the storm.

“It’s a funny thing that you can collect seven small pieces of information on most of the community, and it can shape a whole series of different programs and policies, all of which were designed to deliver, what we call at Rice, our culture of care,” Miranda said. “The questions were: What’s your name? What’s your address? Do you have children? Do you have power? Do you have internet connection? Has your house been damaged by the hurricane? Has your car been damaged by the hurricane?”

By design, it took about 90 seconds to complete the survey on mobile devices, tablets and PCs. Within three days, 88 percent of the off-campus Rice community had responded. As a direct result, the administration created a housing matching program, a carpooling service, a temporary on-campus child care program that operated until schools reopened and a financial assistance program for students, faculty and staff.

Marie Lynn Miranda

“We are increasingly in a world where there is so much data that’s relevant to the kinds of questions that we have to answer, to the kinds of decisions we have to make,” Miranda said. “But we also, at the same time, need to remember what our core values are, especially in a mission-driven institution like Rice.”

Rice’s data science initiative, which began in 2015, includes hiring new faculty, establishing a new undergraduate minor and fostering programs like the Data Science Conference, which was organized and sponsored by the Ken Kennedy Institute for Information Technology. Another such program is the Dec. 12 Summit on Technology and Jobs, which Kennedy Institute Director and computer scientist Moshe Vardi is organizing in Washington, D.C., “to put the issue of technology and jobs on the national agenda in an informed and deliberate manner.”

Jan Odegard

Kennedy Institute Executive Director Jan Odegard said data science is not a new term, and it isn’t easily defined. “The definition I like is ‘Data science is what you can do with data.’ And it doesn’t have to be a lot of data. You can do many different things if you have a lot of data, but you can still do data science if you have just a little bit of data. The key is having the right data, and bringing it together in a way that yields new knowledge. Having the right data is often the key to unlocking other, richer data.”

By that reckoning, a few dozen Rice faculty members in engineering, social sciences and other schools have conducted data science research for decades. The urgency around data science — the reason that Rice, the city of Houston and virtually every Fortune 500 company are tooling up for data science — is due to the deluge of data that’s expected within the next decade.

Market intelligence firm IDC has predicted that by 2025 more than 152,000 internet-enabled devices will go online each minute. In that year alone, all the world’s devices will create a staggering 180 trillion gigabytes of new data, and of that, 44 trillion gigabytes — a greater amount than all the data created by all the world’s devices this year — will be analyzed and acted on in real time.

Rice’s use of data science helped shape a series of programs and policies that allowed university employees to recover and help one another after Tropical Storm Harvey. (Air National Guard photo by Staff Sgt. Daniel J. Martinez)

IDC and other experts point out that the only way companies and organizations will be able to manage is by ceding the task to intelligent machines, self-trained computer programs that learn to sort, process and act on data simply by being exposed to it. Creating those systems requires training in the latest techniques from machine learning and deep learning, and one result of the hoopla over data science is that companies are hiring anyone they can find with those skills.

Psychology Professor Fred Oswald, an organizational psychologist and big data expert who frequently collaborates on research with industry, said, “Businesses I work with often express a strong interest in data science yet still do not know what it is. Clearly, they see an opportunity for a competitive edge. They know they need to participate in some way and figure it out as they go.”

The management consulting firm McKinsey and Co. projected that by 2018, the United States will have a shortage of data science talent that includes a need for 140,000 people with deep analytical skills and 1.5 million managers and analysts with the know-how to use data science in their decision-making.

That kind of demand has students rushing to take classes that can prepare them for data science careers. Ankit Patel’s course, Introduction to Deep Learning, which had just 40 students in fall 2016, has 140 this year. The course has attracted graduate students and undergrads from many schools at Rice as well as several dozen students from institutions across the Texas Medical Center.

Ankit Patel’s Introduction to Deep Learning course had 40 students in fall 2016 and has 140 this year. (Photo by Jeff Fitlow/Rice University)

“I definitely didn’t expect a threefold increase,” said Patel, an assistant professor with joint appointments in Rice’s Department of Electrical and Computer Engineering and Baylor College of Medicine’s Department of Neuroscience. “The course is cross-listed with computer science this year. That’s new and many of the students are in computer science, but bottom line is it’s one of the only deep learning courses at Rice, and (artificial intelligence) and deep learning are in extremely high demand now.”

Oswald, who provided input for a recent report from the National Academy of Sciences about data science education, and Computer Science’s Devika Subramanian are co-chairs of the committee that is developing Rice’s undergraduate minor in data science.

“It will likely require new courses that Rice doesn’t currently offer,” he said. “These and other details have to be firmed up and go through the undergraduate curriculum committee before the Faculty Senate can then review and approve the minor. We also have to have people lined up to teach those courses on a consistent basis. Basically, we’re getting our ducks in a row to get all of this solidified and ensure high quality.”

He said ironing out the structure of the minor was complicated by the fact that data science “is not one thing.”

Fred Oswald

“In some disciplines, data science can be more about working with data more systematically and intensively,” Oswald said. “In other disciplines, like statistics and computer science, data science can be more about developing new algorithms to handle complicated and messy data structures.

“As far as the undergraduate minor goes, our committee had to answer a key question related to this disciplinary difference: Do we have separate minors for people who are technically focused and want to create new algorithms versus people who want to use the tools and incorporate data science into whatever they’re doing substantively? We went back and forth — quite literally, on paper — before deciding on a single track. We want it to be integrated, and we want the standards to be high.”

Miranda recently announced the first four faculty hires under the data science initiative: Assistant professors Lydia Beaudrot of BioSciences, Anastasios Kyrillidis of Computer Science, Meng Li of Statistics and Akane Sano of Electrical and Computer Engineering.

Identifying and interviewing candidates fell to a data science faculty search committee co-chaired by Sociology’s Rachel Kimbro and Computer Science’s Keith Cooper. In the past academic year, the committee reviewed more than 450 applications, sent more than 150 dossiers to 17 departments and interviewed more than 30 candidates.

Rachel Kimbro

More hires are expected in the coming year, and Kimbro and Cooper said they expect another heavy workload for the committee. But they and Miranda pointed out that new hires in data science aren’t limited solely to positions created with money from the universitywide initiative. Thanks to hires for open lines in departments across campus, Miranda said, since 2015 Rice has hired “more than a dozen people who are real, card-carrying data scientists.”

Kimbro said, “I think departments have realized that going in the data science direction is going to be strategic for them. They see this as an area of growth for Rice.”

The data science initiative can also help Rice enhance research achievement and reputation – one of the goals of the Vision for the Second Century, part two.

Data science and analyses are key components of Kimbro’s own research into poverty and children’s health, and she said data science involves more than number crunching.

“It’s easy to do bad data science,” she said. “There can be lots and lots of information, but if you don’t know how to parse it out meaningfully, then it’s worthless. We want to make sure that our students are graduating with the tools and the basic data literacy to do good, high-quality, replicable data science. But another part of that is thinking about the ethical implications of the decisions that you’re making.”

For example, she said, banks are increasingly using data science in mortgage lending, and their algorithms can inadvertently discriminate against people if they’re poorly designed. “Should race be used as a predictor for loan defaults?” Kimbro said. “What about gender or family background? It’s important to think through what you are doing with your data.”

Kimbro said Rice’s data science initiative has the potential to offer something unique in this regard.

“A lot of the data science initiatives that we’ve seen that are springing up around the country have no social science or humanities involvement at all,” she said. “By taking a more interdisciplinary approach, Rice is well-positioned to prepare its students to do good data science, both from an ethical and a quality perspective.”

About Jade Boyd

Jade Boyd is science editor and associate director of news and media relations in Rice University's Office of Public Affairs.