At the DOE Joint Genome Institute (JGI), a national user facility at Berkeley Lab, researchers are working closely with Science IT and using high-performance computing (HPC) resources to develop tools for analyzing the vast amounts of data generated by genomic sequencing. With a single study often producing terabytes of data, scalable computing is needed to keep up with the rapid pace of discovery.
One initiative is GenomeOcean, an AI-powered model designed to learn the “natural language” of genomes. The model is the product of a collaboration between researchers at Berkeley Lab and Northwestern University. It was originally developed on NERSC’s Perlmutter supercomputer, whose large-scale GPU capabilities enabled pretraining. Once pretraining was complete, the Science IT team built the infrastructure needed to support AI-powered genomic predictions and deployed the model on Lawrencium, LBNL’s institutional cluster.
As more genomes are sequenced, the demand for advanced computational tools continues to grow. The ability to catalog, analyze, and predict genomic functions will be essential for future discoveries in medicine, agriculture, and environmental science. By combining AI, big data, and high-performance computing, scientists are unlocking new possibilities in understanding life itself. Learn more in this IT News story.