Rice University computer scientists receive NSF grant to develop cloud-computing tools
Cloud computing is at the core of a new grant to Rice University computer scientist Christopher Jermaine, who plans to develop tools that will allow researchers and industry to make better use of massive data sets without having to rely on supercomputers.
The National Science Foundation (NSF) will award $1.2 million to Jermaine’s Rice group over three years to help optimize statistical machine learning techniques for distributed computing systems.
Machine learning is a branch of computer science concerned with building computational systems that can learn from data rather than follow explicit instructions. Applications include natural language processing, computer vision, social network analysis and online advertising. Jermaine is particularly interested in applications in the medical domain, such as analyzing records to build models for negative events like hospital readmissions.
“A lot of people are interested in ‘big data,’ and in the computer science area, we’re looking for the correct programming model and implementation platform,” said Jermaine, an associate professor of computer science who joined Rice five years ago. “These days, machine learning and statistical models power a lot of what goes on on the Internet and beyond.”
Jermaine argues that machine learning systems need not be tied to supercomputers. For many – and perhaps most – applications, hundreds or even thousands of small, connected systems can do the job more economically.
“The thesis behind this grant is that there’s really not a need to go in and invent radical new programming models and platforms,” he said. “A lot of what people want to do with statistical processing can be done with these classical relational database systems. The advantage is that a lot of the world’s data actually sits in these systems already.
“It also turns out that what the research computer scientists have been doing for the past 30 years is immediately applicable, because people know how to make these systems scale. They know how to make them handle large data sets, so we don’t have to reinvent the wheel.”
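The point in the quote above, that a lot of statistical processing can be expressed in a classical relational database, can be illustrated with a small sketch. The table, column names, and data below are hypothetical, chosen only to show the idea; they are not taken from Jermaine's systems. The statistics are written as plain SQL aggregates, and the database engine, not the programmer, decides how to scan and combine the data.

```python
import sqlite3

# Hypothetical toy data set held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (patient_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 6.0), (2, 8.0)],
)

# Mean and (population) variance expressed declaratively as SQL aggregates:
# the query says WHAT statistic is wanted, not HOW to compute it.
n, mean, variance = conn.execute(
    "SELECT COUNT(*), AVG(value), "
    "AVG(value * value) - AVG(value) * AVG(value) "
    "FROM readings"
).fetchone()
print(n, mean, variance)  # 4 5.0 5.0
```

The same query text would run unchanged on a parallel relational engine, which is exactly the scaling machinery the quote says should not be reinvented.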
The Jermaine group’s work will focus on extending Rice’s SimSQL platform for stochastic analytics and will be open-source. In his presentations, Jermaine explains that programmers shouldn’t have to worry about how a result is computed; they should specify only what the result should be, and the system should handle the rest. The same code, he said, should run efficiently on one computer or on a cluster of 1,000.
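One way to see how the same code can run on one machine or a thousand is to write an aggregate as a per-partition step plus an associative combine step. The sketch below is a hypothetical illustration of that pattern, not SimSQL's actual API: the mean is written once, and only the number of partitions changes.

```python
from functools import reduce

def partial_sum_count(rows):
    """Per-partition work: a (sum, count) pair over one chunk of the data."""
    return (sum(rows), len(rows))

def combine(a, b):
    """Merge two partial aggregates; associative, so any grouping works."""
    return (a[0] + b[0], a[1] + b[1])

def mean(data, partitions):
    """Split data into `partitions` chunks, aggregate each, then combine.

    The caller's code is identical whether `partitions` is 1 or 1000;
    a real system would run the chunks on separate machines.
    """
    chunks = [data[i::partitions] for i in range(partitions)]
    partials = [partial_sum_count(c) for c in chunks if c]
    total, count = reduce(combine, partials)
    return total / count

data = list(range(1, 101))  # 1..100
print(mean(data, 1))        # one "machine" -> 50.5
print(mean(data, 1000))     # simulated 1,000-way split -> 50.5, same answer
```

The programmer specified what the result is (a mean); how the work is split across partitions is the runtime's concern.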
“What the NSF is interested in is having prototype software that we can release, a proof of concept, and papers that argue this is the way people should be building their big processing systems. That’s the end goal,” he said.
Jermaine sees the move toward cloud computing as inevitable for many applications, such as the medical-records analysis he has been doing with Texas Medical Center colleagues using Amazon’s Web-based Elastic Compute Cloud.
“There are a lot of attractions to this cloud-based model,” he said. “In the startup world, these cloud-based computing platforms are revolutionary, because you don’t have to go and ask a venture capitalist for $10 million to buy a big machine that’s going to be obsolete in a few years. You can ask for money just to pay for what you’re going to use. If you’re done for the day with your computations, you just turn it off and stop paying.
“It’s opened the door to a lot of things people could never have done before because of the high startup cost.”
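The pay-for-what-you-use arithmetic behind that quote can be sketched in a few lines. Every dollar figure below is hypothetical, chosen only to illustrate the comparison, not drawn from actual hardware or cloud pricing.

```python
# Hypothetical back-of-envelope comparison of the two funding models
# Jermaine describes: buy a big machine up front, or rent capacity hourly.
UPFRONT_MACHINE_COST = 10_000_000  # one-time purchase (illustrative figure)
HOURLY_NODE_RATE = 2.50            # rented cost per node-hour (illustrative)
NODES = 1000
HOURS_PER_DAY = 8                  # turn it off when the day's work is done
WORK_DAYS = 250

cloud_cost = HOURLY_NODE_RATE * NODES * HOURS_PER_DAY * WORK_DAYS
print(f"Rented cluster, one year of daytime use: ${cloud_cost:,.0f}")
print(f"Upfront machine purchase:                ${UPFRONT_MACHINE_COST:,.0f}")
```

Under these made-up numbers the rented cluster costs half as much for a year of use, and the bill stops the moment the computation does.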