SYDNEY: Over 160,000 previously unknown viruses have been identified using a specialized artificial intelligence (AI) program in what is the largest study of its kind.
This research underscores the vastness of the virosphere, the multitude of viruses inhabiting various environments on Earth. The study used an AI program named LucaProt, which uncovered previously unrecognized RNA viruses in databases containing genetic material gathered from ecosystems worldwide. RNA viruses, including coronaviruses, carry their genetic information as ribonucleic acid (RNA), which is often single-stranded, unlike DNA viruses such as herpesviruses, which have double-stranded DNA genomes.
According to virologist Eddie Holmes from the University of Sydney, who co-led the research, this study demonstrates how “transformational” AI has become for scientists aiming to identify protein structures and discover diverse viruses. The LucaProt algorithm operates similarly to the AlphaFold system, whose developers shared this year’s Nobel Prize in Chemistry; AI advances were also recognized in the Nobel Prize in Physics.
AI Helping Scientists to Go Viral Like Never Before
Holmes and his team described LucaProt as a tool that navigates through the “dark matter” of genetic information. They began their analysis with “metagenomic” samples, each a mix of genetic data from plants, animals, fungi, bacteria, and non-living entities such as viruses. Alongside known genetic sequences in these samples, they found stretches of unknown code that did not match anything in existing databases, which they termed “dark matter.”
The researchers trained the LucaProt algorithm to predict which segments of this dark matter came from RNA viruses. From a single 50-gram metagenomic sample collected from an agricultural site south of Sydney, they discovered over 1,600 new viruses. In total, they analyzed more than 10,000 similar samples, leading to the identification of 161,979 potential RNA virus species and 180 RNA virus supergroups.
However, these roughly 160,000 viruses represent only a small fraction of the total; the authors suggest that they may amount to less than 0.1% of all viruses, most of which are yet to be discovered, hinting at the true enormity of the world’s virosphere.
Ben Longdon, an evolutionary biologist at the University of Exeter, UK, noted that LucaProt is an invaluable tool for identifying viruses and that he is already using it in his research on emerging viral diseases. Longdon emphasized that AI is enabling researchers to uncover “tons of information” about viruses, often outpacing our ability to catalog and name them.
Thousands of New Viruses, But Humans Likely Safe
Regarding potential threats to humans, Holmes stated that the study likely does not reveal new viral risks, as the identified viruses are probably incapable of infecting humans. “Of the 160,000 new viruses, none are closely related to mammalian viruses, and I doubt any would infect humans,” he said. Furthermore, even if some could infect humans, there is no evidence that they would be harmful.
Knowing that these viruses exist is crucial, according to Longdon. “To grasp emerging infectious diseases, we need to know what viruses exist, how they are transmitted, and what factors influence their ability to jump between species,” he explained. He added that these findings are a step toward understanding viral diversity and how viruses can evolve to become more or less infectious.