Serial crystallography at modern light sources, especially X-ray free-electron lasers (XFELs), has allowed us to examine the time evolution of biomolecules while avoiding the radiation damage commonly experienced with earlier single-crystal techniques. My group is developing the computational techniques needed to process XFEL data. One important target is photosystem II, where an extremely detailed structural description of the sunlight-driven water-splitting process is emerging, based on work with collaborators in MBIB Division and elsewhere. Beyond this, we hope to understand the sequential transfer of single electrons, using special diffraction experiments performed at the X-ray absorption edge of the Mn cofactor atoms. This entirely new analysis technique for metalloproteins will be enabled by cctbx.xfel, our open-source data processing package.
These challenging crystallography problems rely on ultrafast X-ray imaging detectors that produce massive datasets (100 TB/day), which in turn demand radically scaled-up computing resources. Within the Exascale Computing Project (ECP), we have implemented a processing pipeline that uses GPU nodes at national supercomputing centers such as NERSC, turning around large datasets within minutes so that experimental decisions can be made during data collection.
Highly detailed algorithms are needed to analyze every pixel of the diffraction pattern. We use a Bayesian framework to “solve the inverse problem,” that is, to infer the physics parameters that best describe the data. For situations where no deterministic model is known, we are experimenting with machine learning approaches.
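
As a toy illustration of what per-pixel Bayesian inference can look like, the sketch below estimates a single parameter, the intensity scale of one diffraction spot, from a handful of pixel counts. It is not the cctbx.xfel implementation. The forward model (a fixed spot `profile` plus a flat `background`), the Poisson noise model, the Gaussian prior, and the function names `log_posterior` and `map_scale` are all assumptions made for this example.

```python
import math

def log_posterior(scale, counts, profile, background, prior_mean, prior_sigma):
    """Log posterior of the spot scale, up to an additive constant.

    Assumed forward model: expected count in pixel i is
    scale * profile[i] + background, with Poisson counting noise
    and a Gaussian prior on scale.
    """
    logp = -0.5 * ((scale - prior_mean) / prior_sigma) ** 2  # Gaussian log-prior
    for n, p in zip(counts, profile):
        lam = scale * p + background       # expected photons in this pixel
        logp += n * math.log(lam) - lam    # Poisson log-likelihood term
    return logp

def map_scale(counts, profile, background, prior_mean, prior_sigma,
              lo=0.0, hi=1000.0, tol=1e-6):
    """Maximum a posteriori scale found by golden-section search.

    The log posterior is concave in scale, so it has a single maximum
    on [lo, hi]. background must be positive so the log is defined at lo.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        fc = log_posterior(c, counts, profile, background, prior_mean, prior_sigma)
        fd = log_posterior(d, counts, profile, background, prior_mean, prior_sigma)
        if fc > fd:
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 0.5 * (a + b)

if __name__ == "__main__":
    profile = [0.1, 0.5, 1.0, 0.5, 0.1]   # hypothetical spot shape across 5 pixels
    counts = [6, 29, 50, 25, 8]           # hypothetical measured photon counts
    print(map_scale(counts, profile, background=2.0,
                    prior_mean=40.0, prior_sigma=1000.0))
```

A realistic analysis would fit many coupled parameters (crystal orientation, unit cell, spectrum, detector geometry) across millions of pixels, which is where GPU resources become necessary. The one-parameter case only shows how a likelihood and a prior combine into a single objective to be maximized.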