Many are bizarre and live in salt lakes, hydrothermal vents and other extreme environments.
The AI model incorporates a protein-prediction tool, called ESMFold, that was developed by researchers at Meta. A similar AI system, AlphaFold, was developed by researchers at Google DeepMind in London, who won the Nobel Prize in Chemistry this week.
They developed a model, called LucaProt, using the ‘transformer’ architecture that underpins ChatGPT, and fed it sequencing and ESMFold protein-prediction data. They then trained their model to recognize viral RdRps and used it to find sequences that encoded these enzymes — evidence that those sequences belonged to a virus — in the large tranche of genomic data. Using this method, they identified some 160,000 RNA viruses, including some that were exceptionally long and found in extreme environments such as hot springs, salt lakes and air. Just under half of them had not been described before. They found “little pockets of RNA virus biodiversity that are really far off in the boonies of evolutionary space”, says Babaian.
AI output is not always correct, especially if the model has been fed the wrong data to extrapolate from.