Jana B. Thayer
Research Technical Manager, SLAC National Accelerator Laboratory
Bio
Dr. Jana Thayer is the Division Director for the Linac Coherent Light Source (LCLS) Data Systems at SLAC National Accelerator Laboratory, responsible for data acquisition, data management, the data analysis framework, and computing for the LCLS facility, as well as for development of the next-generation data system to support the LCLS-II upgrade. She is the PI for the Intelligent Learning for Light Source and Neutron Source User Measurements Including Navigation and Experiment Steering (ILLUMINE) project, which develops rapid data analysis and autonomous experiment steering capabilities to support cutting-edge research, tightly coupling high-throughput experiments, advanced computing architectures, and novel AI/ML algorithms to significantly reduce the time to science at light and neutron sources. She is also the PI for the Actionable Information from Sensor to Data Center project, which uses AI/ML to produce actionable information from instruments at the edge and provides the computing workflows needed to train and retrain the models using HPC. Jana started at SLAC in 2004 working on the Fermi Gamma-Ray Space Telescope and led the Large Area Telescope Flight Software Team from 2006 - 2009. She holds a Ph.D. in elementary particle physics from The Ohio State University and has long nurtured an interest in data acquisition systems and high-performance software in the fields of HEP, astrophysics, and photon science.
Current Role at Stanford
LCLS Experimental Data Systems Division Director
Honors & Awards
-
2023 SLAC Director's Award for the LCLS-II Data System, SLAC National Accelerator Laboratory (April 2024)
-
NASA Public Service Group Achievement Award for the Flight Software for the LAT Instrument, NASA (November 2009)
-
Hazel Brown Outstanding Teaching Assistant Award, The Ohio State University (May 1997)
-
Laura Eisenstein Award, University of Illinois at Urbana-Champaign (May 1996)
-
Undergraduate Outreach Achievement Award, University of Illinois at Urbana-Champaign (May 1995)
Education & Certifications
-
Project Management Professional, Project Management Institute (2018)
-
Certified Scrum Master, Scrum Alliance, Inc (2017)
-
Postdoctoral Research Associate, University of Rochester, High Energy Physics (2004)
-
Ph.D., The Ohio State University, High Energy Physics (2002)
-
B.S., University of Illinois Urbana-Champaign, Engineering Physics (1996)
Projects
-
ILLUMINE - Intelligent Learning for Light Source and Neutron Source User Measurements Including Navigation and Experiment Steering, SLAC National Accelerator Laboratory (9/1/2023 - Present)
Experiments at X-ray light sources and neutron sources enable the direct observation and characterization of materials and molecular assemblies critical for energy research. Ongoing facility enhancements are exponentially increasing the rate and volume of data collected, opening up new frontiers of scientific research but also necessitating advancements in computing, algorithms, and analysis to exploit these data effectively. As data rates surge, accelerated processing workflows are needed that can mine continuously streamed data to select interesting events, reject poor data, and adapt to changing experimental conditions. Real-time data analysis can offer immediate feedback to users or direct instrument controls for self-driving experiments. Autonomous experiment steering in turn is poised to maximize the efficiency and quality of data collection by connecting the user's intent in collecting data, data analysis results, and algorithms capable of driving intelligent data collection and guiding the instrument to optimal operating regimes. ILLUMINE will facilitate rapid data analysis and autonomous experiment steering capabilities to support cutting-edge research driven by unprecedented data production rates, tightly coupling high-throughput experiments, advanced computing architectures, and novel AI/ML algorithms to significantly reduce the time needed to optimize instrument configurations, leverage large datasets, and make the best use of oversubscribed beam times. To deliver these pivotal capabilities — rapid data analysis and autonomous experiment steering — for diverse experiments across the facilities, we will develop algorithms to perform real-time compression and ML inference at the experiment edge and expand on current edge-to-HPC analysis pipelines. We will also create advanced workflow monitoring and decision support systems, including reinforcement learning for data optimization, handling uncertainty, and high-dimensional search algorithms for experiments.
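The real-time vetoing described above — selecting interesting events and rejecting poor data from a continuous stream — can be illustrated with a minimal sketch. All thresholds, criteria, and function names here are hypothetical, chosen only to show the shape of a streaming filter, not the facility's actual pipeline.

```python
import numpy as np

def veto_stream(events, intensity_threshold=50.0, max_saturated=10):
    """Toy streaming filter: keep 'interesting' detector frames and
    reject poor data. The criteria (total intensity, saturated-pixel
    count) are illustrative stand-ins for real veto logic."""
    for frame in events:
        total = frame.sum()
        saturated = int((frame >= 65535).sum())  # 16-bit saturation
        if total < intensity_threshold:          # likely an empty shot
            continue
        if saturated > max_saturated:            # likely corrupted data
            continue
        yield frame

# Usage: a stream with an empty shot, a good shot, and a saturated shot.
frames = [np.zeros((4, 4), dtype=np.uint16),
          np.full((4, 4), 100, dtype=np.uint16),
          np.full((4, 4), 65535, dtype=np.uint16)]
kept = list(veto_stream(iter(frames)))  # only the good shot survives
```

Because the filter is a generator, it processes one frame at a time and never holds the full stream in memory — the property that matters when data arrive faster than they can be stored.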
Connecting these two elements is the development of a multi-facility framework, built on a common interoperability layer for autonomous experiment workflows and on the widely used Bluesky data collection platform, that provides an accessible toolbox of reusable off-the-shelf components which can be assembled into tailored workflows catering to specific scientific needs. Collectively these advances are poised to unlock the transformative potential of the facility upgrades by delivering rapid analysis and workflow monitoring algorithms built on a common, interoperable framework to ensure their broad transferability across facilities, instruments, and experiments. Ultimately, these capabilities will significantly enhance experimental output and enable groundbreaking scientific exploration, shedding light on some of the most challenging and novel scientific questions facing the nation.
Location
Menlo Park, CA
-
Actionable Information from Sensor to Data Center, SLAC National Accelerator Laboratory (9/1/2023 - 9/30/2023)
The exponential increase in the scale and speed of data production at light sources is prohibitive to traditional analysis workflows that rely on scientists painstakingly tuning parameters during live experiments to adapt data collection and analysis. User facilities will increasingly rely on the automated delivery of actionable information in real time for rapid experiment adaptation.
Machine learning (ML) techniques coupled with advanced computing infrastructure provide a solution to this challenge. Specifically, ML permits embedding domain expertise into pre-trained but adaptable models that can rapidly perform analysis tasks at scale. During our previous work period, we developed and benchmarked ML algorithms that perform electron spectra reconstruction, anomaly detection in high-energy diffraction microscopy (HEDM) data, and feature extraction for serial crystallography (SX) and single-particle imaging (SPI) experiments. We also built the hardware and software ecosystem required to make these algorithms sufficiently adaptable. Real-time feedback is achieved by performing artificial intelligence (AI) inference at the experiment edge – on dedicated compute infrastructure at the sensor – using the SLAC Neural-network Library (SNL), a framework designed for high-bandwidth, low-latency data processing. Data are seamlessly streamed to high-performance computing where AI accelerators rapidly perform the compute-intensive task of model training. By returning the updated model parameters to the sensor, near real-time feedback can be used to steer experiments or optimize data acquisition within minutes of changes to the experimental configuration. Collectively, this work has demonstrated that deploying ML-based workflows at the experiment edge that are directly coupled to remote computing facilities can provide critical feedback needed to guide experiments on-the-fly.
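The feedback loop described above — inference at the sensor, retraining at a data center, and updated parameters returned to the edge — can be sketched in miniature. This is not the SNL API; the class, function names, and the trivial linear "model" are all illustrative assumptions meant only to show the closed loop.

```python
import numpy as np

class EdgeInference:
    """Minimal sketch of edge-side inference with remotely
    retrained weights. The linear model is a stand-in for a
    real neural network; none of these names come from SNL."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def infer(self, frame):
        # Edge-side inference: score a flattened frame.
        return float(self.weights @ frame.ravel())

    def update_weights(self, new_weights):
        # Apply parameters retrained at the remote data center.
        self.weights = np.asarray(new_weights, dtype=float)

def retrain_at_hpc(frames, labels):
    # Stand-in for remote training: a least-squares fit
    # over the frames streamed from the edge.
    X = np.stack([f.ravel() for f in frames])
    w, *_ = np.linalg.lstsq(X, np.asarray(labels, dtype=float), rcond=None)
    return w

# Closed loop: stream frames out, pull updated weights back to the edge.
rng = np.random.default_rng(1)
true_w = rng.normal(size=16)
frames = [rng.normal(size=(4, 4)) for _ in range(64)]
labels = [true_w @ f.ravel() for f in frames]

edge = EdgeInference(np.zeros(16))               # untrained at startup
edge.update_weights(retrain_at_hpc(frames, labels))
```

The point of the split is latency: `infer` must run at sensor data rates, while `retrain_at_hpc` can take minutes on remote hardware, with only the small weight vector crossing back to the edge.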
We build on this foundation by taking the next critical steps needed to render the information generated by ML routines from sensors to data centers fully actionable. Importantly, these advances meet the challenge of high data volumes by closing the automated feedback loop during active experiments. We will pursue five specific avenues of research to achieve this goal – two devoted to general infrastructure and three related to specific experiments. On the infrastructure side, we will expand SNL to generalize our AI inference capabilities. Complementing these efforts, we will develop an optimized framework for rapid ML model re-training at remote data centers and integrate automated experiment steering capabilities. On the experiment side, we will first explore different representations of multi-channel time-series data to optimize their delivery as actionable information – in a form that can be used directly for control of attosecond science experiments. Second, we will leverage SNL to chain calibration, assembly, and inference tasks to overcome the bottleneck to providing near real-time feedback for imaging experiments on tiled detectors with high repetition rates, with impact in heterogeneous catalysis and biochemical science. Finally, we will refine the ML algorithms we deployed at both the experiment edge and remote data centers for HEDM experiments to reliably handle new materials. The infrastructure and experiment-driven aims outlined above are highly synergistic, as the actionable feedback required to steer diverse experiments will motivate the development of more adaptable ML frameworks and vice versa.
These advances will be instrumental for handling the massive data throughput generated by LCLS-II and APS-U. The edge-to-HPC pipeline we build will provide an integrated approach to data analytics, one that leverages the full palette of resources across the DOE complex to support novel experiments enabled by the coming upgrades at these ultra-bright light sources.
Location
Menlo Park, CA
-
Diaspora, Argonne National Laboratory
The Diaspora project is working to create resilient scientific applications across integrated computing infrastructures. The project is developing a system that will allow scientists to quickly and accurately share information about data, application, and resource status to meet a broad set of resilience needs so that researchers can better manage and overcome potential disruptions in the future. To accomplish this, Diaspora is creating a hierarchical event fabric, developing resilience services, and evaluating these new capabilities in scientific applications.
Location
Lemont, IL
-
LCLStream, SLAC National Accelerator Laboratory with Oak Ridge National Laboratory (11/1/2023 - 10/31/2024)
Experiments at X-ray light sources and neutron sources enable the direct observation and characterization of materials and molecular assemblies critical for energy research. Ongoing facility enhancements are exponentially increasing the rate and volume of data collected, opening up new frontiers of scientific research but also necessitating advancements in computing, algorithms and analysis to exploit this data effectively. As data rates surge, accelerated processing workflows are needed that can mine continuously streamed data to select interesting events, reject poor data, and adapt to changing experimental conditions. Real-time data analysis can offer immediate feedback to users or direct instrument controls for self-driving experiments. Autonomous experiment steering in turn is poised to maximize the efficiency and quality of data collection by connecting the user's intent in collecting data, data analysis results, and algorithms capable of driving intelligent data collection and guiding the instrument to optimal operating regimes. ILLUMINE will facilitate rapid data analysis and autonomous experiment steering capabilities to support cutting-edge research tightly coupling high-throughput experiments, advanced computing architectures, and novel AI/ML algorithms to significantly reduce the time to optimize instrument configurations, leverage large datasets, and optimize the use of oversubscribed beam times. To deliver these pivotal capabilities — rapid data analysis and autonomous experiment steering — for diverse experiments across the facilities, we will develop algorithms to perform real-time compression and ML inference at the experiment edge and expand on current edge-to-HPC analysis pipelines. As part of the Integrated Research Infrastructure vision, this project is developing cross facility workflows. 
We will also create advanced workflow monitoring and decision support systems, including reinforcement learning for data optimization, handling uncertainty, and high-dimensional search algorithms for experiments. Connecting these two elements is the development of a multi-facility framework, built on a common interoperability layer for autonomous experiment workflows and on the widely used Bluesky data collection platform, that provides an accessible toolbox of reusable off-the-shelf components which can be assembled into tailored workflows catering to specific scientific needs. Collectively these advances are poised to unlock the transformative potential of the facility upgrades by delivering rapid analysis and workflow monitoring algorithms built on a common, interoperable framework to ensure their broad transferability across facilities, instruments, and experiments. Ultimately, these capabilities will significantly enhance experimental output and enable groundbreaking scientific exploration. Facilitating rapid data analysis and autonomous experiment steering capabilities will shed light on some of the most challenging scientific research areas facing the nation including structural biology, materials science, quantum materials, environmental science, nanoscience, nanotechnology, additive manufacturing, and condensed matter physics.
Location
Oak Ridge, TN
Service, Volunteer and Community Work
-
Girl Scout Leader for Troop 61781 in Redwood City, Girl Scouts
Location
Redwood City, CA
Skills and Expertise
All Publications
-
Massive Scale Data Analytics at LCLS-II
EDP Sciences. 2024
View details for DOI 10.1051/epjconf/202429513002
View details for Web of Science ID 001244151903006
-
SpeckleNN: a unified embedding for real-time speckle pattern classification in X-ray single-particle imaging with limited labeled examples.
IUCrJ
2023; 10 (Pt 5): 568-578
Abstract
With X-ray free-electron lasers (XFELs), it is possible to determine the three-dimensional structure of noncrystalline nanoscale particles using X-ray single-particle imaging (SPI) techniques at room temperature. Classifying SPI scattering patterns, or 'speckles', to extract single-hits that are needed for real-time vetoing and three-dimensional reconstruction poses a challenge for high-data-rate facilities like the European XFEL and LCLS-II-HE. Here, we introduce SpeckleNN, a unified embedding model for real-time speckle pattern classification with limited labeled examples that can scale linearly with dataset size. Trained with twin neural networks, SpeckleNN maps speckle patterns to a unified embedding vector space, where similarity is measured by Euclidean distance. We highlight its few-shot classification capability on new never-seen samples and its robust performance despite having only tens of labels per classification category even in the presence of substantial missing detector areas. Without the need for excessive manual labeling or even a full detector image, our classification method offers a great solution for real-time high-throughput SPI experiments.
View details for DOI 10.1107/S2052252523006115
View details for PubMedID 37458190
View details for PubMedCentralID PMC10478515
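The few-shot scheme the abstract describes — embed each pattern, then classify by Euclidean distance in the embedding space — can be sketched with a toy encoder. This is not SpeckleNN itself: the random linear projection stands in for the trained twin-network encoder, and the nearest-centroid rule, class names, and data are all illustrative assumptions.

```python
import numpy as np

def few_shot_classify(embed, support, query):
    """Classify a query pattern by Euclidean distance to per-class
    centroids in embedding space, illustrating the few-shot idea.
    `embed` stands in for a trained encoder such as SpeckleNN's."""
    centroids = {label: np.mean([embed(x) for x in examples], axis=0)
                 for label, examples in support.items()}
    z = embed(query)
    return min(centroids, key=lambda lbl: np.linalg.norm(z - centroids[lbl]))

# Toy 'encoder': a fixed random linear projection of flattened patterns.
rng = np.random.default_rng(2)
P = rng.normal(size=(8, 64))
embed = lambda x: P @ np.asarray(x, dtype=float).ravel()

# A handful of labeled examples per class (hypothetical categories).
support = {
    "single-hit": [rng.normal(loc=0.0, size=(8, 8)) for _ in range(5)],
    "multi-hit":  [rng.normal(loc=3.0, size=(8, 8)) for _ in range(5)],
}
query = rng.normal(loc=3.0, size=(8, 8))
label = few_shot_classify(embed, support, query)
```

Because only a few labeled examples per class are needed to form the centroids, new sample types can be added without retraining the encoder — the property the paper emphasizes for never-seen samples.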
-
Testing the data framework for an AI algorithm in preparation for high data rate X-ray facilities
IEEE. 2022: 1-9
View details for DOI 10.1109/XLOOP56614.2022.00006
View details for Web of Science ID 000968746500001
-
fairDMS: Rapid Model Training by Data and Model Reuse
IEEE COMPUTER SOC. 2022: 394-405
View details for DOI 10.1109/CLUSTER51413.2022.00050
View details for Web of Science ID 000920273100035
-
Bridging Data Center AI Systems with Edge Computing for Actionable Information Retrieval
IEEE COMPUTER SOC. 2021: 15-23
View details for DOI 10.1109/XLOOP54565.2021.00008
View details for Web of Science ID 000758608400003
-
Enabling scientific discovery at next-generation light sources with advanced AI and HPC
17th Smoky Mountains Computational Sciences and Engineering Conference
2020: 145-146
View details for DOI 10.1007/978-3-030-63393-6_10
-
Data systems for the Linac coherent light source.
Advanced structural and chemical imaging
2017; 3 (1): 3
Abstract
The data systems for X-ray free-electron laser (FEL) experiments at the Linac coherent light source (LCLS) are described. These systems are designed to acquire and to reliably transport shot-by-shot data at a peak throughput of 5 GB/s to the offline data storage where experimental data and the relevant metadata are archived and made available for user analysis. The analysis and monitoring implementation (AMI) and Photon Science ANAlysis (psana) software packages are described. Psana is open source and freely available.
View details for DOI 10.1186/s40679-016-0037-7
View details for PubMedID 28261541
View details for PubMedCentralID PMC5313569
-
Data systems for the Linac Coherent Light Source
JOURNAL OF APPLIED CRYSTALLOGRAPHY
2016; 49: 1363-1369
View details for DOI 10.1107/S1600576716011055
View details for Web of Science ID 000382755900028
-
THE LARGE AREA TELESCOPE ON THE FERMI GAMMA-RAY SPACE TELESCOPE MISSION
ASTROPHYSICAL JOURNAL
2009; 697 (2): 1071-1102
View details for DOI 10.1088/0004-637X/697/2/1071
View details for Web of Science ID 000266159500012
-
Search for Baryons in the Radiative Penguin Decay b → sγ
Physical Review D
2003; 68 (011102)
-
Branching Fraction and Photon Energy Spectrum for b → sγ
Physical Review Letters
2001; 87 (251807)