Unlocking the intricate details of life’s genetic blueprint has long been a significant challenge in genomics. Enter GenomeOcean, a powerful generative model now available to the global research community that can not only analyze DNA sequences but also create new ones mimicking an array of microbial life. Drawing on massive user-generated datasets at the JGI and other publicly available data, GenomeOcean applies artificial intelligence to cutting-edge genomic research and is capable of discovering and modeling complex biological sequences such as biosynthetic gene cluster structures.

Zhong Wang, a computational biologist at the Joint Genome Institute, led the project. He explained that large language models (artificial intelligence trained to understand, generate and manipulate human language) are well positioned to enhance our understanding of life’s genomic code, advancing DOE efforts to achieve a predictive understanding of biological systems.

“If we could model the genome as if it were a language, then we can take advantage of existing methodologies developed for natural languages to study genomes,” Wang explained. 

Read more on the JGI’s website.