Evolution software looks beyond the branches

Rice University program models more detailed evolutionary networks from genetic data

The tree has been an effective model of evolution for 150 years, but a Rice University computer scientist believes it’s far too simple to illustrate the breadth of current knowledge.

Rice researcher Luay Nakhleh and his group have developed PhyloNet, an open-source software package that accounts for horizontal as well as vertical inheritance of genetic material among genomes. His “maximum likelihood” method, detailed this month in the Proceedings of the National Academy of Sciences, allows PhyloNet to infer network models that better describe the evolution of certain groups of species than do tree models.

Phylogenetic networks depict the movement of genetic sequences from one species to another as a means of showing where horizontal gene transfer may have taken place. Software by scientists at Rice University aims to reveal far more about species' evolutionary histories than traditional tree models are able to. Courtesy of Luay Nakhleh

“Inferring” in this case means analyzing genes to find the evolutionary history, including the connections between species, under which the observed genetic data are most probable – the maximum likelihood. Nakhleh and Rice colleague Christopher Jermaine recently won a $1.1 million National Science Foundation grant to analyze evolutionary patterns using Bayesian inference, a statistical technique for estimating probabilities from a data set.

To build networks that account for all of the genetic connections between species, the software infers the probability of variations that phylogenetic trees can’t illustrate, such as horizontal gene transfers. These transfers circumvent simple parent-to-offspring evolution and allow genetic variations to move from one species to another by means other than reproduction.

Biologists want to know when and how these transfers happened, but tree structures conceal such information. “When horizontal transfer occurs, as with the hybridization of two species, the tree model becomes inadequate to describe the evolutionary history, and networks that incorporate horizontal gene transfer become the more appropriate model,” Nakhleh said.
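To make the difference between the two models concrete, here is a minimal sketch, not taken from PhyloNet itself. It assumes an evolutionary history stored as a list of (parent, child) edges. In a tree every node has at most one parent; a node with two parents marks a point where genetic material arrived from a second lineage. The function name, node labels and both example histories are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def reticulation_nodes(edges):
    """Return the nodes that have more than one parent.

    edges: iterable of (parent, child) pairs describing a directed graph.
    A tree yields an empty list; a network yields its reticulation nodes.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


# Hypothetical three-species tree: ((A, B), C). Labels are illustrative only.
tree_edges = [("root", "x"), ("x", "A"), ("x", "B"), ("root", "C")]

# Hypothetical network: the ancestor "h" of species B inherits from two
# lineages, "x" (shared with A) and "y" (shared with C).
network_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]

print(reticulation_nodes(tree_edges))
print(reticulation_nodes(network_edges))
```

The tree has no node with two parents, while the network reports "h", the one place where a tree drawing would have to hide part of the history.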

Nakhleh’s Java-based software also accounts for incomplete lineage sorting, in which the genetic record holds clues to a gene’s evolution that don’t match the established lineage of the species.

“We are the first group to develop a general model that will allow biologists to estimate hybridization while accounting for all these complexities in evolution,” Nakhleh said.

Most existing programs for phylogenetics (the study of evolutionary relationships) ignore such complexities. “They end up overestimating the amount of hybridization,” Nakhleh said. “They start seeing lots of complexities in the data and say, ‘Oh, it’s complex here; it must be hybridization,’ and end up inferring too much. Our method acknowledges that part of the complexity has nothing to do with hybridization; it has to do with other random processes that happened during evolution.”
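A toy calculation shows why random processes alone can make gene histories disagree with the species history. The sketch below is not PhyloNet's method; it uses the standard textbook result for three species under the multispecies coalescent, in which a gene tree matches the species tree ((A,B),C) with probability 1 - (2/3)e^(-t) and each of the two mismatching topologies has probability (1/3)e^(-t), where t is the internal branch length in coalescent units. The function names, the grid search and the gene-tree counts are assumptions made up for illustration.

```python
import math


def gene_tree_probs(t):
    """Gene-tree topology probabilities for species tree ((A,B),C).

    t: internal branch length in coalescent units (must be > 0).
    Mismatches here come from incomplete lineage sorting only.
    """
    discordant = math.exp(-t) / 3.0
    return {"AB|C": 1.0 - 2.0 * discordant, "AC|B": discordant, "BC|A": discordant}


def log_likelihood(counts, t):
    """Log-probability of the observed topology counts given branch length t."""
    probs = gene_tree_probs(t)
    return sum(n * math.log(probs[topo]) for topo, n in counts.items())


def ml_branch_length(counts, grid=None):
    """Pick the grid value of t that maximizes the likelihood (simple grid search)."""
    grid = grid or [i / 100 for i in range(1, 501)]
    return max(grid, key=lambda t: log_likelihood(counts, t))


# Made-up counts of gene-tree topologies across 100 hypothetical genes.
counts = {"AB|C": 70, "AC|B": 15, "BC|A": 15}
t_hat = ml_branch_length(counts)
print(t_hat, gene_tree_probs(t_hat))
```

In this toy model, 30 of 100 genes disagree with the species tree without any hybridization at all. Because the model predicts the two mismatching topologies in equal proportion, a strong excess of one over the other is the kind of pattern a tree alone would struggle to explain.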

Luay Nakhleh

The Rice researchers used two data sets to test the new program. One, a computer-generated set of data that mimics a realistic model of evolution, allowed them to evaluate the accuracy of the program. The second involved multiple genomes of mice found across Europe and Asia. “There have been stories about mice hybridizing,” Nakhleh said. “Now that we have the first method to allow for systematic analysis, we ran it on a very large amount of data from five mouse samples and we detected hybridization” – most notably in the presence of a genetic signal from a mouse in Kazakhstan that found its way to mice in France and Germany, he said.

Nakhleh hopes evolutionary biologists will use PhyloNet to take a fresh look at the massive amount of genomic data collected over the past few decades. “The exciting thing for me about this is that biologists can now systematically go through lots of data they have generated and check to see if there has been hybridization.”

Co-authors of the paper are Rice postdoctoral researchers Yun Yu and Jianrong Dong, and Kevin Liu, a former Rice postdoctoral researcher who is now an assistant professor of computer science and engineering at Michigan State University. Jermaine is an associate professor of computer science at Rice.

The National Science Foundation, the National Institutes of Health’s National Library of Medicine and the Keck Center of the Gulf Coast Consortia supported the research.

About Mike Williams

Mike Williams is a senior media relations specialist in Rice University's Office of Public Affairs.