Computer code co-developed by a scientist from Lawrence Berkeley National Laboratory (Berkeley Lab) and embraced by the global science community over two decades has been hailed by Nature as one of “ten computer codes that transformed science.”
While he was a graduate student, Fernando Pérez developed a way to use Python, a programming language, to interactively iterate on and explore scientific data. Pérez is currently a faculty scientist in Berkeley Lab’s Computational Research Division and an associate professor in statistics at the University of California, Berkeley (UC Berkeley). Now known as IPython, his interactive Python interpreter laid the foundation for Project Jupyter and is used today by the Department of Energy (DOE) Systems Biology Knowledgebase (KBase).
Launched in 2011, KBase aims to accelerate the discovery, prediction, and design of biological functions. It provides a collaborative, open-source environment for the scientific community to access integrative data analysis and modeling tools supported by the DOE’s world-class computing resources. KBase was one of the first big scientific platforms to have Jupyter notebooks at the center of its design, and the impact of this collaboration has reverberated across the field of systems biology research.
“KBase is meant to be a little disruptive in the way that it operates,” said Adam Arkin, KBase co-principal investigator and senior faculty scientist in the Environmental Genomics and Systems Biology Division. “We want to make the field of biological systems research as open, transparent, reusable, and interoperable as possible.”
With Pérez’s help, the KBase team was able to realize its vision of collaborative science. By leveraging Jupyter notebooks, the team built a system that packages scientific data and automatically documents all of the code, and the order of operations, that scientists used to achieve their results. And it’s all backed by DOE supercomputing resources. With one click, scientists can publish their notebooks and then request a digital object identifier (DOI).
Going forward, Pérez sees opportunities for open-source tools like Project Jupyter, combined with cloud computing and supercomputing, to empower large-scale “communities of practice” where distributed research collaborations gather and grow to tackle big common problems like climate change or the COVID-19 pandemic.
Read more on the Computing Sciences website.