In this deck from DataTech19, Debbie Bard from NERSC presents: "Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science."
"Debbie Bard leads the Data Science Engagement Group NERSC. NERSC is the mission supercomputing center for the USA Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
2. NERSC is the mission HPC and data facility for the U.S. Department of Energy Office of Science, the largest funder of physical science research in the U.S.
[Slide graphic: Office of Science research areas: Biology, Energy, Environment; Computing; Materials, Chemistry, Geophysics; Particle Physics, Astrophysics; Nuclear Physics; Fusion Energy, Plasma Physics]
3. NERSC is the mission HPC and data facility for the U.S. Department of Energy Office of Science, the largest funder of physical science research in the U.S.
• Simulations at scale
• Data analysis support for DOE's experimental and observational facilities
7,000 users, 800 projects, 700 codes, ~2,000 publications per year
4. NERSC Systems: present and future
• 2013: NERSC-7 (Edison), multicore CPU
• 2016: NERSC-8 (Cori), manycore CPU; NESAP launched to transition applications to advanced architectures
• 2020: NERSC-9, CPU and GPU nodes; continued transition of applications and support for complex workflows
• 2024: NERSC-10, Exa system
• 2028: NERSC-11, Beyond Moore
6. Supercomputers have super-fast interconnect between nodes
[Slide images: a commodity cluster cabinet vs. a Cori cabinet]
7. Supercomputers have specialist storage systems
• Scale out the file system to 100s of storage servers
• Access the file system over the high-speed interconnect: high aggregate bandwidth
• Global, coherent namespace
– Easy for scientists to use
– Hard to scale up metadata operations! (see the I/O sketch below)
[Diagram: compute nodes → IO nodes → storage servers]
How do you distribute PBs of data and millions of files to hundreds of thousands of compute cores, with no latency?
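One answer is parallel I/O against a single shared file, which keeps the metadata load to a handful of opens instead of millions of small files. Below is a minimal sketch of that pattern using mpi4py and h5py (assuming an h5py build against parallel HDF5; the file name and sizes are invented for illustration):

```python
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, nranks = comm.Get_rank(), comm.Get_size()

n_local = 1_000_000  # elements produced by each rank (illustrative)
data = np.random.default_rng(rank).random(n_local)

# Every rank writes its own slice of one dataset in one shared file.
# The global namespace means a single create/open, not one file per
# process, which is exactly the metadata pattern that scales badly.
with h5py.File("results.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("samples", shape=(nranks * n_local,), dtype="f8")
    dset[rank * n_local : (rank + 1) * n_local] = data
```

Launched with something like `srun -n 64 python write_shared.py`, the striped file system aggregates the bandwidth of many storage servers behind that one file.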
8. Supercomputing usability for experimental science
Easier to:
• Write and run large-scale parallelized code over 10,000s of nodes
• Read in and write out huge data files
Harder to:
• Port code directly from your laptop/cluster
• Read and write lots of small files
• Get fast turnaround on your compute jobs
• Stream data from an external source
9. Cori: Pre-Exascale System for DOE Science
• Cray XC system
• >9,600 68-core Intel KNL compute nodes; >2,800 32-core Intel Haswell nodes
• Cray Aries interconnect
• NVRAM Burst Buffer: 1.6 PB of SSDs, 1.7 TB/sec I/O
• Lustre file system: 28 PB of disk, >700 GB/sec I/O
• Investments to support large-scale data analysis
– High-bandwidth direct connection between experimental facilities and compute nodes
– Virtualization capabilities (Shifter/Docker)
– More login nodes for managing advanced workflows
– Support for real-time and high-throughput queues
#5 most powerful computer on the planet in Nov 2016; #12 today.
11. NERSC-9: A System Optimized for Science
• Cray Shasta system providing 3-4x the capability of the Cori system
• First NERSC system designed to meet the needs of both large-scale simulation and data analysis from experimental facilities
– Includes both NVIDIA GPU-accelerated and AMD CPU-only nodes
– Cray Slingshot network for Terabit-rate connections to the system
– Optimised data software stack enabling analytics and machine learning at scale
– All-flash file system for accelerated I/O
12. The needs of experimental facilities drive the design of our supercomputers
[Map: DOE experimental facilities, showing experiments operating now and future experiments such as BioEPIC]
15. How Computing impacts experimental science
• Inform Experiments
– Simulations guide instrument design
– Simulations guide experimental methodology
– Real-time feedback guides experimental operations
• Analyze Data
– Convert measured phenomena into meaningful statistics
– Compare theory to measurement
• Replace Hardware
– Why solve a problem in hardware if you can solve it in software?
[Slide images: GEANT4 ATLAS model; visualization of mouse brain ions; LSST CCD "tree ring" effects]
16. Enabling new discoveries by coupling experimental science with large-scale data analysis and simulations
17. Supercomputing for real-time experiments
• How does photosynthesis happen?
• How do drugs dock with proteins in our cells?
• Why do jet engines fail?
Super-intense femtosecond x-ray pulses; >10 PB of data; up to 100 PF required for analysis
18. Supercomputing for data analysis
• What is the relationship between fundamental particles?
• What is the mechanism that gives matter mass?
A billion proton-proton collisions per second, and multiple GB of data per second
19. Supercomputing for sequencing
• How does the soil microbiome impact crop success?
• How did viruses evolve?
• Can we engineer enzymes for more effective carbon fixation?
>170 trillion bases sequenced per year; >7 PB of archived data; >100,000 users
20. Supercomputing to enable radical new detectors
• How does the structure of batteries impact their performance?
• Can nanocrystals be used to store carbon dioxide?
4D scanning transmission electron microscope with an FPGA-based readout system: >1 TB/sec of data
21. Custom computing for scientific data
• Enable experimentation with data reduction and analysis techniques (see the sketch below)
– Enable higher frame rates
– Real-time data quality feedback
– New analysis algorithms
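As a toy illustration of the kind of reduction such custom readout systems make possible, here is a sketch (entirely hypothetical, with a made-up threshold) that sparsifies detector frames and computes a cheap per-frame quality metric of the sort that could be fed back to operators in real time:

```python
import numpy as np

THRESHOLD = 50.0  # hypothetical ADC cut; real values are detector-specific

def reduce_frame(frame: np.ndarray):
    """Keep only above-threshold pixels from one detector frame."""
    mask = frame > THRESHOLD
    coords = np.argwhere(mask)    # (row, col) of surviving pixels
    values = frame[mask]
    quality = float(mask.mean())  # fraction of live pixels: crude QA metric
    return coords, values, quality

# Toy stream of random 512x512 "frames"
rng = np.random.default_rng(0)
for i in range(3):
    frame = rng.normal(loc=40.0, scale=10.0, size=(512, 512))
    coords, values, q = reduce_frame(frame)
    print(f"frame {i}: kept {len(values)} px ({q:.1%}), "
          f"reduction x{frame.size / max(len(values), 1):.0f}")
```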
22. Supercomputing and Machine Learning
• Scientific data is typically large and complex
– Harder to find optimal hyperparameters
– Need lots of prototyping and model evaluation
• Key metric: time to scientific insight
– Don't want to wait for days to train a single model
– Fast turnaround of ideas and exploration
→ use supercomputers to scale machine learning algorithms to multiple nodes (see the sketch below)
Physics papers on the arXiv with abstracts containing the phrase "deep learning": 36 in 2016, 133 in 2017, 335 in 2018, 109 so far in 2019
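Scaling to multiple nodes here usually means synchronous data-parallel training: every worker holds a copy of the model, processes its own shard of the data, and gradients are averaged across workers each step. A minimal sketch with Horovod and TensorFlow/Keras (one common framework for this pattern; not necessarily the exact stack used in the work below, and the model and data are stand-ins):

```python
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU/CPU node, launched via mpirun or srun

# Pin each process to its own GPU, if any are visible.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1),
])

# Scale the learning rate by the worker count and wrap the optimizer so
# gradients are allreduced synchronously across all workers each step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="mse")

# Make sure every worker starts from identical weights.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

# Stand-in data; in practice each worker reads its own shard.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, epochs=2, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```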
24. Determining the fundamental constants of cosmology
https://arxiv.org/abs/1808.04728
25. Determining the fundamental constants of cosmology
• Achieved unprecedented accuracy in cosmological parameter estimation.
• Scaled out to 8,192 CPU nodes; 20 min training time; 3.5 PF sustained performance.
• Largest application of TensorFlow on a CPU-based system with fully-synchronous updates.
https://arxiv.org/abs/1808.04728
27. Characterising Extreme Weather in a Changing Climate
• High-quality segmentation results obtained for climate data.
• Network scaled out to 4,560 Summit nodes (27,360 Volta GPUs).
• 60 min training time; 0.99 EF sustained performance in 16-bit precision.
• Largest application of TensorFlow on a GPU-based system, and the first exascale deep learning application.
https://arxiv.org/abs/1810.01993
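The 16-bit arithmetic behind that exaflop figure corresponds to what TensorFlow now exposes as a mixed-precision policy: compute in float16 on tensor cores while keeping variables and the final loss in float32. A minimal sketch of that setting in current Keras, purely illustrative and not the paper's actual code (the model and data here are toys):

```python
import numpy as np
import tensorflow as tf

# Compute in float16 where safe; keep variables in float32 for stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Output layer forced to float32 so softmax/loss run at full precision.
    tf.keras.layers.Dense(3, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(32, 64, 64, 3).astype("float32")
y = np.random.randint(0, 3, size=(32,))
model.fit(x, y, epochs=1)
```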
29. Exascale Deep Learning is driving innovation in supercomputing
Open questions:
• How much of the future supercomputer workload will be machine learning?
– How far will scientists be able to use machine learning?
– Will the interpretability problem ever be solved?
• Can specialist devices be used for other algorithms?
• What does the ideal storage system look like for AI?
30. • Supercomputers play an increasingly important role in experimental science.
• High performance computing can change the way scientists form their questions, and open new possibilities in detector design, experiment operations and data analysis.
• We need a co-evolution of experimental and computing techniques to leverage bleeding-edge technologies for scientific insight.