Many are bizarre and live in salt lakes, hydrothermal vents and other extreme environments.
They developed a model, called LucaProt, using the ‘transformer’ architecture that underpins ChatGPT, and fed it sequencing data along with protein-structure predictions from ESMFold. They then trained the model to recognize viral RdRps and used it to search the large tranche of genomic data for sequences that encoded these enzymes, which is evidence that those sequences belonged to a virus. Using this method, they identified some 160,000 RNA viruses, including some that were exceptionally long and found in extreme environments such as hot springs, salt lakes and air. Just under half of them had not been described before. They found “little pockets of RNA virus biodiversity that are really far off in the boonies of evolutionary space”, says Babaian.