The scientific and national security communities have long shared an unmet need for a tool capable of quickly and reliably distinguishing genetically modified organisms from naturally occurring ones. Over the course of a six-year program funded by the United States Intelligence Advanced Research Projects Activity (IARPA), several techniques were developed and refined. To evaluate the work accomplished by its research teams, IARPA leverages national laboratories to perform testing and evaluation.
A team of Biosciences Area researchers overseen by the Biological Systems and Engineering (BSE) Division’s Susan Celniker was chosen to lead the testing and evaluation phase of the program, called Finding Engineering-Linked Indicators (FELIX). She and her colleagues, including project co-lead Ben Brown of the Environmental Genomics and Systems Biology (EGSB) Division, designed and produced biological samples of increasing complexity to assess how well the tools performed. To ensure the technologies would be as useful as possible for national security applications, the samples were based on current and potential real-world biothreat scenarios.
In total, the scientists at Berkeley Lab, Pacific Northwest National Laboratory, and the United States Department of Agriculture produced nearly 200 unique sample organisms with modifications ranging from large DNA sequence deletions or insertions to subtle single-nucleotide alterations made using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). The samples included virus particles and cells from bacteria, mammals, and fungi; they represented potential human pathogens (such as HIV and E. coli), plant-infecting pathogens, and engineered complex species. To ensure health and security for everyone, none of the microbial or viral samples created for testing were infectious, and all were controlled under strict biosafety procedures.
IARPA program leaders set an ambitious goal for the screening technologies: 99% specificity (no more than 1% of wild type organisms misidentified as genetically modified) and 90% sensitivity (no more than 10% of modified organisms misidentified as wild type). Four techniques passed through to the final phase of testing and will be useful for identifying biological threats: a lab-based test from the company Draper and computational models from Raytheon, Ginkgo Bioworks, and Noblis. These techniques were shown to be excellent at identifying wild type organisms, and a Berkeley Lab-developed ensemble of the computational models achieved 99% specificity. Overall performance of the individual models and the ensemble demonstrated considerable improvement over existing state-of-the-art capabilities.
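For readers who want to see how the specificity and sensitivity targets translate into numbers, the short Python sketch below computes both metrics from counts of correct and incorrect calls. It is an illustration only: the function names and the sample counts are hypothetical and are not taken from the FELIX evaluation.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of wild type samples correctly called wild type."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("at least one wild type sample is required")
    return true_negatives / total


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of modified samples correctly called modified."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one modified sample is required")
    return true_positives / total


def meets_targets(
    true_negatives: int,
    false_positives: int,
    true_positives: int,
    false_negatives: int,
    min_specificity: float = 0.99,  # program goal stated in the article
    min_sensitivity: float = 0.90,  # program goal stated in the article
) -> bool:
    """Return True when both metrics reach the stated goals."""
    return (
        specificity(true_negatives, false_positives) >= min_specificity
        and sensitivity(true_positives, false_negatives) >= min_sensitivity
    )


if __name__ == "__main__":
    # Hypothetical counts: 100 wild type samples with 1 false alarm,
    # and 100 modified samples with 10 missed detections.
    print(specificity(99, 1))
    print(sensitivity(90, 10))
    print(meets_targets(99, 1, 90, 10))
```

With these made-up counts, a tool that raises one false alarm in 100 wild type samples and misses 10 of 100 modified samples sits exactly on both thresholds; a second false alarm would drop it below the specificity goal.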
One reason it’s so difficult to differentiate engineered organisms from naturally occurring ones is that scientists use many different databases and programs to review and store genome sequence data, as well as disparate names and terms to describe genes and their predicted functions. To remedy this issue, EGSB’s Chris Mungall led the development of an open-access software program and database, Synbio Schema, to catalog the annotated genomes of national security-relevant engineered and wild type organisms using standardized language.