A Tectonic Shift in Analytics and Computing Is Coming

Artificial intelligence combined with high-performance computing could trigger a fundamental change in how geoscientists extract knowledge from large volumes of data.

More than 50 years ago, a fundamental scientific revolution occurred, sparked by the concurrent emergence of a huge amount of new data on seafloor bathymetry and profound intellectual insights from researchers rethinking conventional wisdom. Data and insight combined to produce the paradigm of plate tectonics (https://eos.org/features/meeting-gave-birth-idea-global-tectonics). Similarly, in the coming decade, a [...]
Already today, geoscientists must understand modern tools of data analytics and the hardware on which these tools run. Now artificial intelligence (AI) and high-performance computing (HPC), along with cloud computing (https://eos.org/science-updates/putting-the-cloud-to-work-for-seismology) and interactive programming languages, are becoming essential tools for geoscientists.
Here we discuss the current state of AI and HPC in Earth science and anticipate future trends that will shape applications of these developing technologies in the field. We also propose that it is time to rethink graduate and professional education to account for and capitalize on these quickly emerging tools.

Work in Progress
Great strides in AI capabilities, including speech and facial recognition, have been made over the past [...], and it was generally believed at the time that artificial speech recognition was just around the corner. We know now that this was not the case, as today's speech and writing recognition capabilities emerged only as a result of both vastly increased computing power and conceptual breakthroughs such as the use of multilayered neural networks, which mimic the biological structure of the brain.
These and other advances are striking, yet AI and many other artificial computing tools are still in their infancy.
We cannot predict what AI will be able to do 20-30 years from now, but a survey of existing AI applications recently showed that computing power is the key (https://www.wired.com/story/prepare-artificial-intelligence-produce-less-wizardry/) when targeting practical applications today. The fact that AI is still in its [...]
[...] convolutional neural networks ([...]networks-the-eli5-way-3bd2b1164a53) (CNNs), a kind of neural network that adaptively learns which features to look at in a data set. In seismology (Figure 1), pattern recognition is the most common application of [...]
New AI applications and technologies are also emerging; these involve, for example, the self-ordering of [...] shown impressive capabilities in modeling complex natural signals, with the most promising applications in autoencoders and generative adversarial networks (GANs; e.g., for generating images from data).

Data-Centric Geosciences
CNNs are a form of supervised machine learning (https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/) (SML), meaning that before they are applied for their intended use, they are first trained to find prespecified patterns in labeled data sets and to check their accuracy against an answer key. Training a neural network (https://eos.org/opinions/artificial-intelligence-may-be-key-to-better-weather-forecasts) using SML requires large, well-labeled data sets as well as massive computing power. Massive computing [...] change-problem/?sh=34caaed66b43) and causing a large and growing carbon footprint (https://eos.org/opinions/earth-system-modeling-must-become-more-energy-efficient).
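
The train-then-check cycle of supervised learning described above can be sketched in a few lines. This is only a toy illustration, not a CNN and not a method from any cited study: it fits a simple logistic-regression classifier to synthetic labeled points and then scores it against a held-out answer key. The function names, the two made-up features, and the "event" versus "noise" labels are all assumptions introduced for the example.

```python
import math
import random

def make_labeled_data(n, rng):
    """Synthetic (x1, x2, label) samples; label 1 = 'event', 0 = 'noise' (hypothetical)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumed geometry: events cluster around (2, 2), noise around (0, 0).
        x1 = rng.gauss(2.0 * label, 1.0)
        x2 = rng.gauss(2.0 * label, 1.0)
        data.append((x1, x2, label))
    return data

def train_logistic(data, steps=500, lr=0.1):
    """Fit (w1, w2, bias) by gradient descent on the mean log loss."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for x1, x2, label in data:
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            err = p - label          # prediction error for this labeled sample
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def predict(weights, x1, x2):
    w1, w2, b = weights
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def accuracy(weights, data):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(weights, x1, x2) == label for x1, x2, label in data)
    return hits / len(data)

rng = random.Random(0)
training_set = make_labeled_data(400, rng)   # labeled data used for training
answer_key = make_labeled_data(200, rng)     # held-out labeled data used only for checking

weights = train_logistic(training_set)
test_accuracy = accuracy(weights, answer_key)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Even this small model needs hundreds of labeled samples and repeated passes over them, which hints at why training large networks on real geophysical data demands so much labeling effort and computing power.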
In the future, the trend in geoscientific applications of AI might shift from using bigger CNNs to using more scalable algorithms that can improve performance with less training data and fewer computing resources. Alternative strategies will likely involve less energy-intensive neural networks, such as spiking neural networks (https://towardsdatascience.com/spiking-neural-networks-the-next-generation-of-machine-learning-84e167f4eb2b), which reduce data inputs by analyzing discrete events rather than continuous data streams.
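
To make the idea of discrete events versus continuous streams concrete, the sketch below converts a sampled signal into sparse up/down events with a simple send-on-delta rule. It is not a spiking neural network, only an illustration of the kind of event encoding such networks can consume; the function name, the threshold value, and the synthetic signal are assumptions chosen for the example.

```python
import math

def delta_encode(samples, threshold):
    """Emit (index, +1 or -1) events each time the signal moves by
    `threshold` away from the level recorded at the last event."""
    events = []
    level = samples[0]
    for i, value in enumerate(samples[1:], start=1):
        while value - level >= threshold:
            level += threshold
            events.append((i, +1))
        while level - value >= threshold:
            level -= threshold
            events.append((i, -1))
    return events

# Synthetic record: quiet, then two cycles of a sine wave, then quiet again.
signal = (
    [0.0] * 200
    + [math.sin(2 * math.pi * k / 50) for k in range(100)]
    + [0.0] * 200
)

events = delta_encode(signal, threshold=0.25)
print(f"{len(signal)} samples reduced to {len(events)} events")
```

The quiet stretches produce no events at all, so downstream processing touches only the part of the record where the signal actually changes.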
Unsupervised ML (UML), in which an algorithm identifies patterns (https://eos.org/editors-vox/deep-learning-a-next-generation-big-data-approach-for-hydrology) on its own rather than searching for a user-specified pattern, is another alternative to data-hungry SML. One type of UML identifies unique features in a data set to allow users to discover anomalies of interest (e.g., evidence of hidden geothermal resources in seismic data) and to distinguish trends of interest (e.g., rapidly versus slowly declining production from oil and gas wells based on production rate transients (https://ihsmarkit.com/research-analysis/a-study-of-rate-transient- [...]
AI is also starting to improve the efficiency of geophysical sensors. Data storage limitations require instruments such as seismic stations, acoustic sensors, infrared cameras, and remote sensors to record and save data sets that are much smaller than the total amount of data they measure. Some sensors use AI to detect when "interesting" data are recorded, and these data are selectively stored. Sensor-based AI algorithms also help minimize the energy consumption and prolong the life of sensors located in remote regions, which are difficult to service and often powered by a single solar panel. These techniques include [...]
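
Selective storage can be illustrated with a much older and simpler idea than AI: a short-term-average over long-term-average (STA/LTA) energy trigger, a classical detector long used in seismology. The sketch below uses it as a stand-in for a learned detector, purely to show how a sensor might keep only the "interesting" part of a stream. The window lengths, the threshold, and the synthetic data are assumptions, not values from any real instrument.

```python
import random

def sta_lta_trigger(samples, short_win=5, long_win=100, threshold=6.0):
    """Return indices i where the mean energy of the short trailing window
    exceeds `threshold` times the mean energy of the long trailing window.
    Both windows end just before sample i."""
    energy = [s * s for s in samples]
    triggered = []
    for i in range(long_win, len(samples)):
        sta = sum(energy[i - short_win:i]) / short_win
        lta = sum(energy[i - long_win:i]) / long_win
        if lta > 0 and sta / lta > threshold:
            triggered.append(i)
    return triggered

rng = random.Random(1)
stream = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # background noise
for i in range(600, 620):                             # synthetic burst
    stream[i] += 10.0 if i % 2 == 0 else -10.0

kept = sta_lta_trigger(stream)
print(f"flagged {len(kept)} of {len(stream)} samples for storage")
```

A sensor running a rule like this writes to storage only around the flagged samples, which is the same economy that sensor-based AI aims for with more flexible, learned definitions of "interesting."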

Advances in Computing Architectures
Powerful, efficient algorithms and software represent only one part of the data revolution; the hardware and networks that we use to process and store data have evolved significantly as well.
Since about 2004, when the increase in frequencies at which processors operate stalled at about 3 gigahertz (often described as the end of Moore's law, although strictly it marks the breakdown of Dennard scaling), computing power has been augmented by increasing the number of cores per CPU and by the parallel work of cores in multiple CPUs, as in computing clusters [...] 10ff41f50e78), which were developed specifically for matrix-based operations, excel at the most demanding tasks of most neural network algorithms. In the future, computers will likely become increasingly heterogeneous, with a single system combining several types of processors, including specialized ML coprocessors (e.g., Cerebras (https://cerebras.net/product/)) and quantum computing processors.
Computational systems that are physically distributed across remote locations and used on demand, usually called cloud computing, are also becoming more common, although these systems impose limitations on the code that can be run on them. For example, cloud infrastructures, in contrast to centralized HPC clusters and supercomputers, are not designed for performing large-scale parallel simulations. Cloud infrastructures face limitations on high-throughput interconnectivity, and the synchronization needed to help multiple computing nodes coordinate tasks is substantially more difficult to achieve for physically remote clusters. Although several cloud-based computing providers are now investing in high-throughput interconnectivity, the problem of synchronization will likely remain for the foreseeable future.

6/22/2021
A Tectonic Shift in Analytics and Computing Is Coming - Eos, https://eos.org/science-updates/a-tectonic-shift-in-analytics-and-computing-is-coming
Artificial intelligence has proven invaluable in discovering and analyzing patterns in large, real-world data sets. It could also become a source of realistic artificial data sets, generated through models and simulations. Artificial data sets enable geophysicists to examine problems that are unwieldy or intractable using real-world data (because these data may be too costly or technically demanding to obtain) and to explore what-if scenarios or interconnected physical phenomena in isolation. For example, simulations could generate artificial data to help study seismic wave propagation; large-scale geodynamics; or flows of water, oil, and carbon dioxide through rock formations to assist in energy extraction and storage.
HPC and cloud computing will help produce and run 3D models, not only assisting in improved visualization of natural processes but also allowing for investigation of processes that can't be adequately studied with 2D modeling. In geodynamics, for example, using 2D modeling makes it difficult to [...]
Adding a dimension to a model can require a significant increase in the amount of data processed. For example, in exploration seismology, going from a 2D to a 3D simulation involves a transition from requiring three-dimensional data (i.e., source, receiver, time) to five-dimensional data (source x, source y, receiver x, receiver y, and time [e.g., [...]
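
A back-of-the-envelope count shows how quickly the jump from three-dimensional to five-dimensional seismic data grows. The survey sizes below (100 sources and 200 receivers per spatial axis, 2,000 time samples per trace, 4 bytes per sample) are hypothetical numbers chosen only for illustration, not figures from any real survey.

```python
def trace_samples(sources_per_axis, receivers_per_axis, time_samples, spatial_axes):
    """Total number of recorded samples for a survey.

    spatial_axes = 1: sources and receivers along a line (2D simulation),
                      giving data indexed by (source, receiver, time).
    spatial_axes = 2: sources and receivers on a surface grid (3D simulation),
                      giving data indexed by (source x, source y,
                      receiver x, receiver y, time).
    """
    n_sources = sources_per_axis ** spatial_axes
    n_receivers = receivers_per_axis ** spatial_axes
    return n_sources * n_receivers * time_samples

BYTES_PER_SAMPLE = 4  # assuming 32-bit floating-point samples

samples_2d = trace_samples(100, 200, 2000, spatial_axes=1)
samples_3d = trace_samples(100, 200, 2000, spatial_axes=2)

print(f"2D survey: {samples_2d:,} samples, {samples_2d * BYTES_PER_SAMPLE / 1e6:,.0f} MB")
print(f"3D survey: {samples_3d:,} samples, {samples_3d * BYTES_PER_SAMPLE / 1e12:,.1f} TB")
print(f"growth factor: {samples_3d // samples_2d:,}x")
```

Under these assumptions the data volume grows by a factor equal to the number of sources per axis times the number of receivers per axis, which is why 3D work tends to move from a workstation to HPC or cloud resources.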

Emerging Methods and Enhancing Education
As far as we've come in developing AI for uses in geoscientific research, there is plenty of room for growth in the algorithms and computing infrastructure already mentioned, as well as in other developing technologies. For example, interactive programming, in which the programmer develops new code while a program is active, and language-agnostic programming environments that can run code in a variety of languages are young techniques that will facilitate introducing computing to geoscientists. [...] may become darlings of major funding opportunities (https://fcw.com/articles/2020/02/28/energy-randd-ostp-congress.aspx), offering the means for ambitious geophysicists to pursue fundamental research.
Taking advantage of these new capabilities will, of course, require geoscientists who know how to use them. Today, many geoscientists face enormous pressure to requalify themselves for a rapidly changing job market and to keep pace with the growing complexity of computational technologies. Academia, meanwhile, faces the demanding task of designing innovative training to help students and others adapt to market conditions, although finding professionals who can teach these courses is challenging because they are in high demand in the private sector. However, such teaching opportunities could provide a [...]
The coming decade will see a rapid revolution in data analytics that will significantly affect the processing and flow of information in the geosciences. Artificial intelligence and high-performance computing are the two central elements shaping this new landscape. Students and professionals in the geosciences will need new forms of education enabling them to rapidly learn the modern tools of data analytics and predictive modeling. If done well, the concurrence of these new tools and a workforce primed to capitalize on them could lead to new paradigm-shifting insights that, much as the plate tectonic revolution did, help us address major geoscientific questions in the future.