Biosciences Area

  • About Biosciences
    • Leadership
    • Area Operations Centers
    • Strategic Plan and Reports
    • Strategic Programs Development Group
    • Contact Information
  • Our Science
    • Area Programs
    • Strategic Initiatives
    • Biological Systems and Engineering
    • Environmental Genomics and Systems Biology
    • Molecular Biophysics and Integrated Bioimaging
    • DOE Joint Genome Institute
  • Media and Events
    • News
    • Announcements
    • Behind the Breakthroughs
    • Events Calendar
    • Seminar Series
  • Staff Resources
    • Commonly Used Acronyms
    • Communications
    • Hiring and Recruitment
    • Hybrid & Telework Resources
    • IDEA
    • Intellectual Property, Industry Engagement, and Entrepreneurship
    • LDRD Information
    • Logos and Templates
    • Mentoring Program
  • Search

JGI Helps Develop Metagenomic Clustering Algorithm Powered by HPC

March 12, 2018

Proteins from metagenomes clustered into families according to their taxonomic classification. (Georgios Pavlopoulos and Nikos Kyrpides, JGI)

On a social network like Facebook, each user (person or organization) is represented as a node and the connections (relationships and interactions) between them are called edges. By analyzing these connections, researchers can learn a lot about each user. In biology, similar graph-clustering algorithms can be used to understand the proteins that perform most of life’s functions. Today, advanced high-throughput technologies allow researchers to capture hundreds of millions of proteins, genes and other cellular components at once and in a range of environmental conditions. Clustering algorithms are then applied to these datasets to identify patterns and relationships that may point to structural and functional similarities. Though these techniques have been widely used for more than a decade, they cannot keep up with the torrent of biological data being generated by next-generation sequencers and microarrays. In fact, very few existing algorithms can cluster a biological network containing millions of nodes (proteins) and edges (connections). That’s why a team of Berkeley Lab researchers, including scientists at the Joint Genome Institute (JGI), took one of the most popular clustering approaches in modern biology—the Markov Clustering (MCL) algorithm—and modified it to run quickly, efficiently and at scale on distributed-memory supercomputers. In a test case, their high-performance algorithm—called HipMCL—achieved a previously impossible feat: clustering a large biological network containing about 70 million nodes and 68 billion edges in a couple of hours, using approximately 140,000 processor cores on the National Energy Research Scientific Computing Center’s (NERSC) Cori supercomputer. A paper describing this work was recently published in the journal Nucleic Acids Research. Read the whole story on the Berkeley Lab Computational Research Division site.

Was this page useful?

Send
like not like

About Biosciences

  • Leadership
  • Area Operations Centers
  • Inclusion, Diversity, Equity, and Accountability (IDEA)
  • Contact

Divisions & User Facility

  • Biological Systems and Engineering
  • Environmental Genomics and Systems Biology
  • Molecular Biophysics and Integrated Bioimaging
  • DOE Joint Genome Institute

Resources

  • A-Z Index
  • Phonebook
  • Logos
  • Acronyms
  • Integrated Safety Management
Questions & Comments
Follow us: Mastodon LinkedIn YouTube