The use of artificial intelligence and machine learning in the biological sciences is increasing every day. But according to Harshi Mukundan, a microbiologist conducting infectious disease research in the Biological Systems and Engineering (BSE) Division and Lead for the Chemical and Biological Technologies group in the Lab’s Office of National and Homeland Security, the field lacks comprehensive standards that address variability in how such methods are used. In a paper published in Nature’s Scientific Reports, Mukundan and her collaborators identify the principal determinants of how interpretable, reproducible, and relevant artificial intelligence and machine learning can be in biological research, and call out major factors that influence outcomes in the absence of appropriate standardization.
In this audio interview, listen as Mukundan and Carrie Manore, a theoretical and computational biologist at Los Alamos National Laboratory who is a principal collaborator on this work, suggest key guardrails for helping biological researchers take full advantage of applying artificial intelligence to biological problems and derive relevant, reproducible results. “Artificial intelligence and machine learning bring a unique opportunity for us to address variability in biological systems and derive more crosscutting solutions,” Mukundan said. “That’s a huge opportunity space, and if we do right by it, I think it can really change the game in terms of what we can do for health and for biotechnology growth.”
Harshi Mukundan is the director of Convergence, the Lab’s strategic venture to bridge disparate scientific disciplines, technologies, and capabilities to drive toward solutions expediently. Mukundan also currently heads the Department for Biomedical Sciences and Bioengineering within the Biosciences Area, as well as several research projects. Before coming to Berkeley Lab, she was a scientist and group leader at Los Alamos for over 17 years. Carrie Manore is a group leader in the Earth and Environmental Sciences Division of Los Alamos National Laboratory. She is a mathematician by training and works on computational modeling of biological systems.
Mukundan, Manore, and their collaborators at both national labs have teamed up to develop computer models for representing the body’s innate immune response, with the ultimate goal of developing strategies for quickly responding to biological threats like pandemics. While working on this modeling project, they recognized the severe lack of standardization for using artificial intelligence in biological research, and turned their attention to addressing the gap. “We’re interested in building broader synergies between Los Alamos and Berkeley Lab, because we have distinctive capabilities and expertise, from clinical relevance to complex data integration. Bringing those together adds a new dimension to the kind of science we can achieve,” said Mukundan.
Expert Interview: Harshi Mukundan and Carrie Manore on Standards for Using Artificial Intelligence in Biological Research


AI Standards for Biologists
Interview of Harshini Mukundan and Carrie Manore
Conducted by Maritte O’Gallagher
Maritte O’Gallagher:
Hi, this is Maritte O’Gallagher with Berkeley Lab’s Biosciences Area. I’m here with Harshi Mukundan, who leads a biosecurity research and development program in Berkeley Lab’s Office of National and Homeland Security, as well as a research group in the Biosciences Area. She’s here with her collaborator, Carrie Manore, who leads a research group at Los Alamos National Lab. Welcome, Harshi and Carrie. I’m looking forward to hearing more about your recent work devising standards for biologists working with artificial intelligence. What was your motivation for exploring this topic?
Harshini Mukundan:
I’m actually an infectious disease microbiologist by trade. This effort spun out of decades of work that I was doing at Los Alamos with the goal of developing pathogen-agnostic strategies to counter the next emerging threat, whether it’s a pandemic or an outbreak or a biological threat or what have you. How we develop that early warning or early action system has been the goal of a lot of the science that we’ve been doing at Los Alamos, and now here at Berkeley Lab as well. And the inspiration for a lot of that work has been our own innate immune response, which is a pathogen-agnostic system.
Every time you get sick with a pathogen, you are coughing, you’re sneezing, you’re developing a reaction to it. The innate immune system is capable of recognizing all pathogens, bacterial or viral or whatever they may be. Even those that are not in existence today, even those that may only evolve in the future, we are capable of recognizing. How does it do that? It does this by a very elegant pattern recognition framework, looking at different signatures that are evolutionarily conserved and going after them. So early on we started asking how we could mimic this type of pattern recognition framework in the laboratory to create universal diagnostics. That complexity is too challenging to be solved manually. And in that process we started thinking, can we actually combine the theoretical expertise that Carrie and her team have with our microbiology and infectious disease expertise to model innate immunity?
MO:
Carrie, can you clarify what your area of expertise is as a theoretical and computational biologist?
Carrie Manore:
So, the theoretical side asks: how do we translate all this knowledge that’s coming from field data, experiments, and subject matter expertise into actionable models in the computer that can predict scenarios or answer the questions that we want to answer?
MO:
I see. So this collaboration combined theory and computation with infectious disease biology to model innate immune response and develop flexible strategies for countering the next emerging biological threat.
CM:
That’s right.
HM:
And when we embarked on that journey, we started generating data. We started looking at patents. We started looking at physiological relevance and the translational value of everything that we were generating. And we realized the lack of standardization and validation in the use of machine learning. Because of that, it’s an open game in terms of how we employ these tools to drive toward the outcomes or the results that we want. So this paper between Los Alamos and us is an attempt to demonstrate that there is a need for this type of standardization, because currently, even with the small existing data set that we generated in house, it’s an open game in terms of what type of methods I would use. And sometimes I have the luxury of picking a combination of methods that gives me the kind of results that I want to see, rather than what is actually accurate or relevant. That, I think, is going to be a very important component in looking at machine learning in biology, because biological systems are intrinsically variable and there are a lot of factors that govern the outcomes that you get. And it’s important for us to keep that domain-specific knowledge in place as we apply these types of complex computational tools to drive biological predictions or biological outcomes. And that’s the goal of our work.
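To illustrate the kind of latitude Mukundan describes, here is a minimal sketch, not drawn from the paper’s analysis: on one small, noisy synthetic dataset, simply swapping among common classifiers can shift the reported cross-validated accuracy, which is exactly the unconstrained choice that standards would rein in. The dataset sizes and model choices below are hypothetical.

```python
# Illustrative sketch only: three common classifiers evaluated on the same
# small synthetic dataset. The spread in scores shows how much the method
# choice alone can shape the reported result.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A small, high-dimensional dataset standing in for a typical in-house assay
X, y = make_classification(n_samples=60, n_features=200, n_informative=10,
                           random_state=0)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(random_state=0),
    "RBF SVM": make_pipeline(StandardScaler(), SVC()),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f} (std {scores.std():.2f})")
```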
MO:
So you mean as biologists who understand how biology works, you need to tap into that understanding and that context in order to inform the decisions that you make when you’re designing these experiments with AI and machine learning?
HM:
That’s exactly right. Because of the kinds of data that we’ve used together and the kinds of systems or models that we use to evaluate the data, there’s variability, and there are a lot of factors that determine the outcomes. They cannot all be seamlessly integrated to create larger data sets all the time. So having that knowledge of the problem that we are addressing is going to be important, at least in the early stages of using artificial intelligence and machine learning in biology.
MO:
I’m curious if you can talk a little bit about the moment you think we’re in with AI and machine learning and from your perspective—why it’s important to tackle the questions that are addressed in this paper.
HM:
I think we are in a huge opportunity space, though there are obviously some risks associated with it, as is true of any new disruptive technology that comes into the world. With the advent of AI and machine learning, and with collaborations with people like Carrie and the team at Los Alamos, we actually have the opportunity to see if these advanced computing modalities can get to the bottom of the uncertainty that we see in this type of data set in a way that can guide decision making. We don’t know exactly how specific we will be able to get. We don’t know what the discriminative power will be, but it gives us the potential to dissect these types of complex questions. The risk that we have there, in terms of what I call translational risk, is the fact that we need to understand the data. We need to have the data, and we need to be able to have ensemble data in an AI-ready manner and in a reproducible manner for us to be able to arrive at physiologically relevant conclusions. And we need to have ways of validating those conclusions. And again, as with any new technology, there’s always another risk of people using it for nefarious activities. The more we understand the boundary conditions of how these tools work, the better we’ll be able to protect or secure ourselves against those.
CM:
These kinds of questions that we’re asking are really high-consequence questions, so we need a lot more rigorous analysis around our certainty in what the algorithms output. We’re talking about questions like: do you have a deadly pathogen or not? How should we treat it? We want to be sure, so we need machinery around the model to actually understand why it’s working, to be able to interpret it if we need to, and to be able to quantify how certain we are in the output of that model, so we can use it for these situations where we’re making important decisions.
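One common way to put numbers on that certainty, offered here as a minimal sketch rather than anything taken from the paper, is to train an ensemble on bootstrap resamples of the data and report the spread of predicted probabilities for a new sample instead of a single yes/no call. The dataset, held-out sample, and ensemble size below are hypothetical.

```python
# Minimal uncertainty-quantification sketch: a bootstrap ensemble of simple
# classifiers, reporting the spread of predicted probabilities for one sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=101, n_features=20, random_state=1)
x_new, X_train, y_train = X[:1], X[1:], y[1:]   # hold out one sample for the "call"

probs = []
for i in range(50):
    Xb, yb = resample(X_train, y_train, random_state=i)   # bootstrap replicate
    model = LogisticRegression(max_iter=1000).fit(Xb, yb)
    probs.append(model.predict_proba(x_new)[0, 1])         # P(positive class)

probs = np.array(probs)
print(f"P(positive): {probs.mean():.2f} "
      f"(range {probs.min():.2f}-{probs.max():.2f} across the ensemble)")
```

A wide range across the ensemble is a signal to gather more data or interpret the model before acting, rather than treating a single prediction as settled.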
MO:
Part of the power of these tools is that they look at the data and don’t need to understand the system in order to pull out things in the data that are relevant. But then we need to come back around and bring the understanding of the system back in, in order to really make sense of what they’re spitting out and to inform the decisions that we make on the front end about how we design the experiments and which tools we choose to use.
CM:
That’s right. You have it exactly right. Yep.
HM:
I think that second part is very important, and a lot of the work is happening without that second part, which could be dangerous, because the tools will definitely spit out some conclusions. But are they relevant? That is the billion-dollar question that we’re asking. The other part of this is that each one of us is working on different aspects of these capabilities, but bringing them all together can actually create a unique landscape for us to avail ourselves of the power of biology. And if I can also just throw in a pitch for the national labs, the whole concept of Lawrence’s team science, actually enabling multidisciplinary folks to come together to solve a problem, is at the core of this project, and all of us have something new to learn and by extension something significant to contribute.
MO:
In the paper, you talk about how the complexity of biological systems makes them potentially more vulnerable. Could you speak a little bit more about that?
HM:
Right. I mean, in mathematics two plus two is always four, right? But if you take COVID-19 and throw it at a human population, one person does not get sick, a couple of others are asymptomatic, a few are coughing and sneezing, some are hooked up to a ventilator, and some succumb to it.
Everything is going to impact outcomes. So you cannot just say that I’m pulling the data associated with, say, influenza infection and expect it all to kind of fall into place together. The complexity of biology is what makes this type of process more challenging. And that’s why we said earlier that inclusion of that domain expertise is going to be important in order to derive relevance from the use of these types of tools, which clearly present a huge opportunity, and we are all for that. But making sure that it is relevant is the key question that we’re asking.
CM:
There’s an opportunity and a challenge there that go basically hand in hand with these complex systems. It’s kind of funny, because they explicitly lend themselves to this kind of approach, and at the same time, that makes it challenging to quantify the certainty we have in whatever the algorithm tells us is going on.
MO:
And the challenge is essentially that you need to be careful about what you’re feeding into the algorithm and make sure you’re comparing apples to apples and not apples to oranges. You have to be able to say, okay, well, this study looked at this and that study looked at this other thing, so actually you can’t pool that data together.
CM:
So we think about how we fuse data sets in a way that’s not going to bias the outcome of the model or the algorithms. There are ways to do that, but it takes careful thought. And this is where Harshi and I both argue for the importance of having different experts working together actively, as opposed to handing things off. If I don’t understand how they do these measurements and how that might change the output, I can’t understand how to put that data in such a way that it’s not going to screw up our algorithm or screw up the output. If you’re not careful about how you put in the data, or about how you test how well your model does on the data, you can get things that sound really great but that aren’t that great once you start digging under the surface. So I think that’s a huge argument for folks from different disciplines working together.
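A classic example of the failure mode Manore describes, again as an illustrative sketch rather than anything drawn from the paper, is data leakage: if you select features using the whole dataset before cross-validation, a model fit to pure noise can look impressively accurate, while doing the selection inside each training fold gives the honest, near-chance answer. The sample and feature counts below are hypothetical.

```python
# Data-leakage sketch: pure-noise data, so an honest accuracy estimate should
# sit near chance (0.5). Selecting features before cross-validation leaks label
# information and inflates the score; selecting inside the CV pipeline does not.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5000))    # many noise features, few samples
y = rng.randint(0, 2, size=50)     # random labels: there is no real signal

# Leaky: feature selection sees every label before the folds are split
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Honest: selection happens only on each fold's training data
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5)

print(f"leaky CV accuracy:  {leaky.mean():.2f}")   # typically well above chance
print(f"honest CV accuracy: {honest.mean():.2f}")  # near 0.5, as it should be
```

The same logic applies when fusing data sets collected under different protocols: any step that uses information pooled across studies has to live inside the evaluation loop, not before it.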
HM:
So given that degree of complexity in the system, what is important is what questions you are asking of the data that you have, and how you can actually validate or be assured of the outcomes that you get from that data set. As for how this project began: there was the early work on innate immunity-inspired diagnostics, and I was working on some modeling of comorbidities with our mutual colleague Ben McMann at Los Alamos, and Carrie was also an integral part of that project. We actually used a lot of our modeling to ask questions like, are diagnostics relevant? Where should we put our costs? And things of that sort. That’s when we started thinking about modeling innate immunity to drive select decisions, diagnostics decisions. If we can get so many reproducibility questions, interpretability questions, and generalizability questions from one small data set that we generated in the laboratory, run against these different machine learning classifiers, then maybe we should be very cognizant of the fact that everything people are using AI to generate may not be an absolute answer, but more of a probabilistic answer. And this is true for people in the public realm who see news about AI-generated information: it helps to understand that not all of it might be accurate, not all of it may stand the test of time, and we need verification and validation before we completely buy into something that comes out of the use of these tools at this point.
MO:
It’s almost like researchers have been working with the scientific method for so long, learning how to meet the burden of proof, how to get there with experiments, and how to design them. But this is a different kind of experiment.
HM:
We need a scientific method for the use of AI in biology.
MO:
Yes, exactly.
HM:
The other aspect is understanding the opportunity space. I, for one, am very excited, because we are limited to one bug, one drug, one kind of application for certain things in the health and biology space, since it’s impossible for experimentalists to address the entire gamut of diversity that we have in this space. AI and ML bring a unique opportunity for us to address this type of variability in biological systems in order to derive more crosscutting solutions. That’s a huge opportunity space, and if we do right by it, I think it can really change the game in terms of what we might be able to do for health and for biotechnology growth, especially here in the United States.
MO:
Thank you so much for being here today.
HM:
Thank you for having us.