Rice University leads $11 million effort in big data software analytics
Writing computer programs could become as easy as searching the Internet. A Rice University-led team of software experts has launched an $11 million effort to create a sophisticated tool called PLINY that will both “autocomplete” and “autocorrect” code for programmers, much like the software that completes search queries and corrects spelling on today’s Web browsers and smartphones.
“Imagine the power of having all the code that has ever been written in the past available to programmers at their fingertips as they write new code or fix old code,” said Vivek Sarkar, Rice’s E.D. Butcher Chair in Engineering, chair of the Department of Computer Science and the principal investigator (PI) on the PLINY project. “You can think of this as autocomplete for code, but in a far more sophisticated way.”
Sarkar said the four-year effort is funded by the Defense Advanced Research Projects Agency (DARPA). PLINY, which draws its name from the Roman naturalist who authored the first encyclopedia, will involve more than two dozen computer scientists from Rice, the University of Texas-Austin, the University of Wisconsin-Madison and the company GrammaTech.
PLINY is part of DARPA’s Mining and Understanding Software Enclaves (MUSE) program, an initiative that seeks to gather hundreds of billions of lines of publicly available open-source computer code and to mine that code to create a searchable database of properties, behaviors and vulnerabilities.
Rice team members say the effort will represent a significant advance in the way software is created, verified and debugged.
“Software today is far more complex than it was 20 years ago, yet it is still largely created by hand, one line of code at a time,” said co-PI Swarat Chaudhuri, assistant professor of computer science at Rice. “We envision a system where the programmer writes a few lines of code, hits a button and the rest of the code appears. And not only that, the rest of the code should work seamlessly with the code that’s already been written.”
He said PLINY will need to be sophisticated enough to recognize and match similar patterns regardless of differences in programming languages and code specifications. The system will have to explore different ways of interweaving code retrieved through search into a programmer’s partially completed draft program and analyze the resulting code to make sure that it does not have bugs or security flaws.
The core of the system will be a data-mining engine that continuously scans the massive repository of open-source code. The engine will leverage the latest techniques in deep program analyses and big-data analytics to populate and refine a database that can be queried whenever a programmer needs help finishing or debugging a piece of code.
“The engine will formulate answers using Bayesian statistics,” said co-PI Chris Jermaine, associate professor of computer science at Rice. “Much like today’s spell-correction algorithms, it will deliver the most probable solution first, but programmers will be able to cycle through possible solutions if the first answer is incorrect.”
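The ranking idea Jermaine describes can be illustrated with a toy sketch. The article does not say how PLINY's engine is actually built, so everything below is an assumption made for illustration: the naive Bayes model, the hypothetical `rank_completions` function, the made-up `CORPUS` of context tokens paired with completions, and the smoothing parameter `alpha`. It only shows the general shape of "most probable answer first, with the rest available to cycle through."

```python
import math
from collections import Counter, defaultdict


def rank_completions(corpus, context, alpha=1.0):
    """Rank candidate completions by posterior probability given the context.

    Illustrative naive Bayes sketch, not PLINY's method.
    corpus:  list of (context_tokens, completion) pairs (hypothetical toy data)
    context: tokens describing the code the programmer has written so far
    alpha:   Laplace smoothing constant (assumed value)
    """
    prior = Counter()                    # how often each completion was seen
    token_counts = defaultdict(Counter)  # token frequencies per completion
    vocab = set(context)
    for tokens, completion in corpus:
        prior[completion] += 1
        token_counts[completion].update(tokens)
        vocab.update(tokens)
    if not prior:
        return []

    total = sum(prior.values())
    log_scores = {}
    for completion, seen in prior.items():
        # log P(completion) + sum of log P(token | completion)
        score = math.log(seen / total)
        denom = sum(token_counts[completion].values()) + alpha * len(vocab)
        for token in context:
            score += math.log((token_counts[completion][token] + alpha) / denom)
        log_scores[completion] = score

    # Normalize so the scores form a probability distribution over candidates.
    peak = max(log_scores.values())
    norm = sum(math.exp(s - peak) for s in log_scores.values())
    ranked = [(c, math.exp(s - peak) / norm) for c, s in log_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical mined examples: tokens seen in a snippet, and the line that followed.
CORPUS = [
    (["open", "file", "read"], "f.close()"),
    (["open", "file", "write"], "f.close()"),
    (["open", "socket"], "sock.close()"),
    (["lock", "acquire"], "lock.release()"),
]

ranked = rank_completions(CORPUS, ["open", "file"])
for candidate, probability in ranked:
    print(f"{probability:.3f}  {candidate}")
```

The list comes back sorted from most to least probable, so a tool built on it could show the first entry by default and step through the remaining entries when the programmer rejects a suggestion.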
“This is a dream team that combines Rice’s traditional strengths in programming language research with our new capabilities in big-data analytics,” Sarkar said. “Add to that our world-class experts from U. Wisconsin, UT-Austin and GrammaTech and we have an exciting four years ahead of us as we embark on addressing this DARPA hard challenge.”
DARPA Distribution Statement “A” (Approved for Public Release, Distribution Unlimited)