Rice shares grant for AI-driven COVID-19 research

Computer scientist Todd Treangen to help determine how virus evolves

Todd Treangen

By Heather Ferreyra
Special to the Rice News

Brown School of Engineering computer scientist Todd Treangen has received a C3.ai Digital Transformation Institute Award for computational biology research applying artificial intelligence (AI) models to COVID-19 mitigation.

Treangen is developing novel bioinformatics algorithms and driving comparative genomic analyses to determine how SARS-CoV-2 is changing over time.

The C3.ai Digital Transformation Institute is a research consortium of universities and technology companies funding scientists in a coordinated effort to curb current and future pandemics. After rigorous peer review of more than 200 proposals from the world’s leading scientists, 26 projects were awarded a total of more than $5.4 million in cash. The recipients included multidisciplinary and multi-institution projects taking a novel approach to their research. Scientists will also be given access to massive sets of unified coronavirus data and to cloud, software and supercomputing resources from the National Center for Supercomputing Applications (NCSA).

Treangen’s accepted proposal, titled “Mining Diagnostic Sequences for SARS-CoV-2 Using Variation-Aware, Graph-Based Machine Learning Approaches Applied to SARS-CoV-1, SARS-CoV-2, and MERS Datasets,” is a collaboration with researchers from the University of Illinois at Urbana-Champaign. They include Nancy Amato, computer science department head and professor of engineering, as well as Lawrence Rauchwerger, a professor of computer science. The $225,000 award is to be used over a 12-month period. Rice will receive $75,000 of that money.

Treangen’s research will focus on studying viral mutations within a single patient to see what they reveal.

“When a person is infected with a virion, infected cells are coerced into allowing the virus to rapidly begin copying itself within the host,” he said. “A single infected cell can produce several hundred thousand copies of itself. From there, SARS-CoV-2 population is established containing a vast number of SARS-CoV-2 infected cells.”

The larger the population of infected cells, the higher the host’s viral load, Treangen said. Furthermore, as the virus replicates within these cells, it can also mutate. “We want to understand what’s going on behind the scenes within a single COVID-19 positive patient,” he said.

To that end, his research approach is distinctive because it focuses on the entire population of SARS-CoV-2 viruses within a patient rather than just the “consensus genome,” which can be thought of as an amalgamation of all of the SARS-CoV-2 copies within a single person. Treangen describes this as “an underexplored avenue for what might be driving biological differences of COVID-19 across different hosts.”
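For readers who want to see the distinction concretely, the short Python sketch below contrasts a consensus genome with the minority variants it hides. It is only an illustration under stated assumptions: the sequences are made-up toy data, they are assumed to be pre-aligned and of equal length, and the function names `consensus` and `minority_variants` are hypothetical. It is not the Treangen team's method.

```python
from collections import Counter


def consensus(copies):
    """Return the most common base at each position of pre-aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def minority_variants(copies):
    """Map position -> {base: frequency} for bases that differ from the consensus."""
    cons = consensus(copies)
    variants = {}
    for pos, column in enumerate(zip(*copies)):
        counts = Counter(column)
        minor = {base: n / len(copies) for base, n in counts.items() if base != cons[pos]}
        if minor:
            variants[pos] = minor
    return variants


# Toy, made-up data: five aligned viral genome copies from one hypothetical patient.
copies = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC"]

print(consensus(copies))          # the single "amalgamated" sequence
print(minority_variants(copies))  # within-host variation the consensus leaves out
```

In this toy case the consensus reports one sequence, while the second function shows that two positions carry a variant in a minority of copies. That within-host variation is the kind of signal a population-level analysis keeps and a consensus-only analysis discards.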

Treangen wants to know what combination of host and viral factors drives the biology behind COVID-19. After completing intrahost comparative genomic analyses of SARS-CoV-2, he and his team will run the same analyses on SARS-CoV-1 and MERS-CoV and compare the findings to determine how the viruses differ.

While viruses change over time, most COVID-19 testing relies on exact matches of common signatures of SARS-CoV-2 for detection. Unfortunately, patients sometimes get false negative results when changes in SARS-CoV-2 mean the virus no longer exactly matches the regions targeted by the tests. If two patients share unique genome mutations, that new information could help determine whether direct transmission occurred between them and help track and prevent additional spread. Research to expand the basic understanding of the virus’ biology could address high false negative rates, improve testing and potentially have an impact on the development of vaccines.
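The false negative problem described above can be pictured with a toy string-matching analogy in Python. This is a deliberate simplification and not how a laboratory assay works: the sequences are invented, and `exact_match` and `tolerant_match` are hypothetical helper names chosen for this sketch.

```python
def exact_match(genome, signature):
    """Detect the signature only if it appears in the genome unchanged."""
    return signature in genome


def tolerant_match(genome, signature, max_mismatches=1):
    """Detect the signature if some window differs from it in at most max_mismatches bases."""
    k = len(signature)
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, signature))
        if mismatches <= max_mismatches:
            return True
    return False


# Toy, made-up sequences; the "mutated" sample differs by one base in the target region.
signature = "GATTACA"
original = "CCGATTACAGG"
mutated = "CCGATCACAGG"

print(exact_match(original, signature))    # target region unchanged
print(exact_match(mutated, signature))     # one mutation breaks the exact match
print(tolerant_match(mutated, signature))  # a variation-aware check still finds it
```

The exact check misses the mutated sample even though the virus is present, which is the analogue of a false negative. A check that tolerates a small amount of variation still flags it, which hints at why variation-aware approaches are of interest.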

One of the computational challenges of the project, according to Treangen, is the vast amount of data involved. In March, there were only a few hundred SARS-CoV-2 genomes, but soon, there will be over 100,000 genomes and sequencing datasets to investigate. His research team is looking to combine machine learning with novel bioinformatic methods that scale up to the vast number of incoming SARS-CoV-2 genomes and provide efficient intrahost and interhost comparisons. These tools will allow researchers to do a deep dive into the intrahost population and track interhost transmission.

Given the urgency of the pandemic, all results, science and findings from C3.ai-awarded projects will be open-source. That way, other scientists can also build upon the Treangen team’s discoveries.

– Heather Ferreyra is a publicist and marketing specialist in Rice's Department of Computer Science.