Materials science is changing fast with the rise of artificial intelligence (AI) and machine learning (ML). These tools are transforming how we discover, design, and optimize new materials to tackle big challenges in clean energy, sustainable manufacturing, advanced electronics, and biomedicine.
However, getting the most out of AI in materials research requires more than just fancy algorithms and big data. It requires a robust, standardized infrastructure to access, share, and integrate materials data across different sources and domains. Without standards, researchers face big barriers to training accurate, generalizable models and getting their results into the real world.
Here, we will look at the importance of data standards for AI-driven materials discovery, with a focus on the Open Databases Integration for Materials Design (OPTIMADE) initiative. We will cover the challenges of materials data exchange, the features and benefits of the OPTIMADE API, and real-world examples of how this standard is already changing materials research. Finally, we will look at the future of OPTIMADE and what it could mean for innovation in new materials.
The Challenges of Materials Data Exchange
To understand the importance of data standards in materials science, you need to understand the challenges researchers face in accessing and integrating data from different sources.
Materials data has long been scattered across a fragmented landscape of databases, each with its own data schema, API, and access protocols. This lack of interoperability is a big barrier for researchers who want to build machine-learning models or do large-scale data mining.
Take, for example, a materials scientist who wants to discover new battery materials. To train a predictive model, they would need to gather data on a wide range of known battery compounds, their crystal structures, electrochemical properties, and synthesis conditions.
However, this data is likely to be spread across multiple databases, each with its own way of representing and serving the information.
To get the relevant data, the researcher would need to:
- Write custom code to query each database’s API
- Navigate their unique schema
- Clean and merge the results into a consistent format.
This is time-consuming, error-prone, and requires technical expertise outside the researcher’s core domain.
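The steps listed above can be sketched in a few lines of Python. This is only an illustration: the two record layouts, their field names, units, and values are invented placeholders, not the schema of any real database. The point is that every additional source needs its own hand-written mapping function before the data can be merged.

```python
# Hypothetical raw records: field names, units, and values are invented to
# illustrate two databases that describe the same kind of data differently.
DB_A_RECORDS = [
    {"formula": "LiFePO4", "spacegroup": "Pnma", "voltage_V": 3.4},
]
DB_B_RECORDS = [
    {"composition": {"Li": 1, "Co": 1, "O": 2}, "sg_symbol": "R-3m", "avg_voltage_mV": 3900},
]


def formula_from_composition(composition):
    """Build a formula string such as 'LiCoO2' from an element-count mapping."""
    return "".join(
        f"{element}{count if count != 1 else ''}"
        for element, count in composition.items()
    )


def normalize_a(record):
    """Map a database-A record onto the shared schema (mostly renaming)."""
    return {
        "formula": record["formula"],
        "space_group": record["spacegroup"],
        "voltage_v": float(record["voltage_V"]),
        "source": "db_a",
    }


def normalize_b(record):
    """Map a database-B record onto the shared schema (rebuild formula, convert units)."""
    return {
        "formula": formula_from_composition(record["composition"]),
        "space_group": record["sg_symbol"],
        "voltage_v": record["avg_voltage_mV"] / 1000.0,  # millivolts to volts
        "source": "db_b",
    }


# Clean and merge the results into one consistent list of rows.
merged = [normalize_a(r) for r in DB_A_RECORDS] + [normalize_b(r) for r in DB_B_RECORDS]
for row in merged:
    print(row)
```

With two sources this is manageable, but the number of mapping functions grows with every database added, which is the overhead a shared standard is meant to remove.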
Dr. Julia Ling, a materials informatics scientist at Lawrence Berkeley National Laboratory, has experienced this firsthand. She says:
“In my work, I often need to integrate data from multiple databases to build comprehensive training sets for my machine learning models. But the lack of standardization across these databases is a big problem. I can spend weeks just writing data processing scripts before I can even start training my models.”
The problem is compounded because many materials databases are locked away in individual research groups or institutions, so outside researchers can’t even find, let alone access, potentially valuable data. This lack of visibility and accessibility holds back science and causes unnecessary duplication of effort.
Dr. Bryce Meredig, co-founder and Chief Science Officer of Citrine Informatics, says:
“The current state of materials data is a mess. It’s scattered, heterogeneous, and often poorly documented. This makes it impossible to use this data effectively, especially for machine learning.”
The Need for Community Standards
To overcome these challenges and get the most out of AI in materials research, the community needs a common set of standards and protocols for data exchange. These standards should allow researchers to access and integrate data from different sources in a consistent, machine-readable format without having to navigate each individual database’s complexities.
These standards must be developed and adopted by the community in an open and collaborative way. They can’t be imposed top-down by any single institution or database provider. They must emerge from a process of consensus and iteration with input from a wide range of stakeholders across academia, industry, and government.
The benefits of such standards are clear. By providing a common language and framework for materials data exchange, they lower the barriers to data access and integration and let researchers spend more time on science and less on data wrangling. They can also enable a rich ecosystem of interoperable tools and services, ranging from data visualization and analysis platforms to automated discovery pipelines and knowledge bases.
Dr. Kristin Persson, director of the Materials Project at Lawrence Berkeley National Laboratory, says community standards are key to getting the most out of AI in materials science. She adds:
“By agreeing on a common set of principles and protocols for data exchange, we can open up a whole new level of collaboration and innovation in materials research. It’s not just about making data more accessible but about enabling new science that was impossible before.”
The Rise of OPTIMADE
Seeing the need for community standards in materials data exchange, a group of leading materials databases and software providers came together in 2016 to launch the Open Databases Integration for Materials Design (OPTIMADE) initiative.
The goal of OPTIMADE is to develop a common API specification for querying and retrieving data from materials databases in a standardized, machine-readable format. By providing a single interface to many databases, OPTIMADE will make it easier for researchers to access and integrate materials data into their workflows regardless of the database or software they are using.
The OPTIMADE specification is based on RESTful web design using standard HTTP protocols and JSON data formats to enable communication between databases and client applications. It defines a set of common endpoints and query parameters that databases can implement to expose their data in a standardized, self-describing way.
For instance, a client application can send a simple HTTP GET request to an OPTIMADE-compliant database with the query parameters in a standardized format to search for materials containing iron and oxygen.
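As a rough sketch of what such a request looks like, the snippet below builds the URL for the iron-and-oxygen search. The base URL is a placeholder rather than a real provider, and the helper name `build_structures_url` is made up for this example. The `/v1/structures` path, the `filter` and `page_limit` parameters, and the `elements HAS ALL` syntax follow the published OPTIMADE specification as we understand it.

```python
from urllib.parse import quote, urlencode


def build_structures_url(base_url, elements, page_limit=10):
    """Return an OPTIMADE structures query URL for entries containing all given elements."""
    # OPTIMADE filter strings put element symbols in double quotes.
    quoted = ", ".join(f'"{symbol}"' for symbol in elements)
    optimade_filter = f"elements HAS ALL {quoted}"
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    query = urlencode(
        {"filter": optimade_filter, "page_limit": page_limit},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Placeholder base URL (assumption): substitute a real provider's OPTIMADE endpoint.
url = build_structures_url("https://example.org/optimade", ["Fe", "O"])
print(url)
```

The resulting URL can be passed to any HTTP client. Because the query syntax is shared, the same filter should work unchanged against a different provider by swapping only the base URL.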
The database server then translates this into its own query language, executes the search, and returns the results in JSON. The client application can then parse and process those results using standard tools and libraries without knowing the underlying database schema or implementation details.
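To illustrate the client side, here is a minimal sketch that parses a response using only Python's standard library. The JSON payload is hand-written for this example and is not output from a real database. Its layout (a top-level `data` list whose entries carry `id`, `type`, and `attributes`) mirrors the JSON:API-style structure OPTIMADE responses use, and the attribute names shown are standard structure properties as far as we are aware.

```python
import json

# Hand-written sample payload (assumption): real responses carry more fields.
SAMPLE_RESPONSE = """
{
  "data": [
    {
      "type": "structures",
      "id": "example-1",
      "attributes": {
        "chemical_formula_reduced": "Fe2O3",
        "elements": ["Fe", "O"],
        "nelements": 2
      }
    },
    {
      "type": "structures",
      "id": "example-2",
      "attributes": {
        "chemical_formula_reduced": "FeLiO4P",
        "elements": ["Fe", "Li", "O", "P"],
        "nelements": 4
      }
    }
  ],
  "meta": {"data_returned": 2, "more_data_available": false}
}
"""


def summarize_structures(response_text):
    """Extract id, reduced formula, and element list from each returned entry."""
    payload = json.loads(response_text)
    summaries = []
    for entry in payload.get("data", []):
        attributes = entry.get("attributes", {})
        summaries.append(
            {
                "id": entry["id"],
                "formula": attributes.get("chemical_formula_reduced"),
                "elements": attributes.get("elements", []),
            }
        )
    return summaries


structures = summarize_structures(SAMPLE_RESPONSE)
for item in structures:
    print(item["id"], item["formula"], item["elements"])
```

Nothing in this parsing code depends on which database produced the response, which is the practical benefit of a shared response format.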
OPTIMADE in Action
Since 2019, OPTIMADE has been adopted by many materials databases and software tools.
One example is the Materials Project, a popular database of computed materials properties hosted by Lawrence Berkeley National Laboratory. In 2020, the Materials Project team implemented an OPTIMADE API so users could access its vast dataset using standard query parameters and response formats.
According to Dr. Shyam Dwaraknath, the lead database architect:
“The Materials Project’s OPTIMADE API has been a game changer for our users. It has enabled a whole new ecosystem of tools and integrations that make it easier than ever to access and analyze our data, from Jupyter notebooks and web applications to high-throughput screening pipelines.”
NOMAD Archive, a repository for raw data from high-throughput materials simulations, is another early adopter of OPTIMADE. By exposing its data through an OPTIMADE API, NOMAD has enabled researchers to do large-scale data mining and train machine learning models on a huge dataset of computed properties.
According to Dr. Luca Ghiringhelli, group leader at the Fritz Haber Institute and AI in materials science enthusiast:
“We are seeing a real surge of interest in data-driven materials research, and OPTIMADE is playing a key role in this. By providing a single interface to multiple databases, it is lowering the barriers to data access and integration and helping to democratize the field.”
Real-World Applications
The impact of OPTIMADE is already being seen across many materials research areas, from batteries and renewable energy to aerospace and biomedical engineering. Here are a few examples of how this is happening:
#1. Finding high-performance thermoelectrics: Researchers at Northwestern University used OPTIMADE to combine data from multiple computational databases, including the Materials Project and OQMD, to train a machine-learning model for predicting the thermoelectric properties of new materials. Using this dataset, they were able to find several new compounds with potentially record-breaking performance, which are now being synthesized and tested.
#2. High-throughput screening of 2D materials: A team at the Technical University of Denmark used OPTIMADE to screen more than 50,000 computed 2D materials from the Computational 2D Materials Database (C2DB). By querying the database using OPTIMADE filters, they were able to quickly find materials with specific properties, such as high carrier mobility or a low band gap, for next-generation electronics and optoelectronics.
#3. The rapid development of new battery materials: Researchers at MIT and Stanford University used OPTIMADE to build a centralized database of battery materials properties, combining data from the Materials Project, OQMD, and other sources. They trained a series of machine learning models on this dataset to predict key performance metrics, such as capacity and cyclability, for new lithium-ion battery chemistries. These models are now being used to guide experimental efforts to develop safer, longer-lasting, and more energy-dense batteries for electric vehicles and grid storage.
#4. Design of high-entropy alloys: A team at the University of Maryland used OPTIMADE to combine data from multiple computational and experimental databases, including the Materials Project, OQMD, and the High-Entropy Alloys Database (THEAD), to build a dataset of high-entropy alloy properties. They used this dataset to train a machine learning model to predict the formation energies and phase stabilities of new high-entropy alloy compositions. They were able to screen thousands of candidates and select the most promising ones for experimental validation. This work is helping to accelerate the development of next-generation high-entropy alloys with exceptional strength, toughness, and corrosion resistance for aerospace, defense, and beyond.
Now, let’s look at which companies could benefit the most from the establishment of these standards.
#1. Tesla (TSLA)
Tesla, Inc. will greatly benefit from OPTIMADE’s standardized data exchange, which will enhance its ability to develop better battery technologies and optimize materials in its manufacturing processes. This will help Tesla create batteries with higher energy density, longer life cycles, and improved safety features while also reducing costs and improving sustainability.
Financially speaking, Tesla reported revenue of $96.8 billion in 2023, a 19% increase from the previous year, showcasing its strong financial growth and potential for continued innovation.
#2. Intel Corporation (INTC)
Another company that will benefit substantially from OPTIMADE’s standardized data exchange is Intel Corporation (INTC), a leader in the technology and semiconductor sectors. Leveraging AI and standardized materials data, Intel can discover and design new semiconductor materials, leading to the development of chips with better performance, higher efficiency, and new functionalities.
This will help Intel maintain its position at the forefront of semiconductor innovation. Moreover, integrating data across various databases will streamline Intel’s research and development processes, allowing for more focus on innovation and less on data management.
On the financial side, Intel reported revenue of $54.2 billion in 2023, reflecting the company’s substantial role in the industry and its ongoing potential for growth and development.
The Future of OPTIMADE
As adoption of OPTIMADE grows, the materials science community is exploring new frontiers of data integration and discovery. One area of development is the integration of OPTIMADE with other data standards and ontologies, such as the European Materials Modelling Ontology (EMMO) and the Crystallographic Information Framework (CIF).
Aligning these different standards and semantics will allow researchers to ask even more powerful and complex questions across multiple data sources, length scales, time scales, and domains of materials science.
Another area of focus for future research is the development of more advanced and automated tools for materials data analysis and machine learning. The rise of deep learning techniques such as graph neural networks and transformer architectures signals a need for both standardized and scalable ways to represent and process materials data in these models.
OPTIMADE is well placed to play a key role in this space as it can provide a common interface to access and integrate large, diverse datasets of materials properties and structures. As Dr. Matthias Scheffler, director of the Fritz Haber Institute and a pioneer in computational materials science, says:
“OPTIMADE is not just about making data more accessible, it’s about enabling new paradigms for materials discovery and design. By providing a foundation for data-driven and AI-enabled materials research, we are helping bring in a new era of innovation and discovery.”
Looking further ahead, there is also interest in using OPTIMADE to enable more decentralized and collaborative models of data sharing and materials discovery. For example, some researchers are exploring the use of blockchain to create secure, distributed networks of OPTIMADE databases where data can be shared and queried across multiple institutions and domains.
Others are looking at federated learning to train machine learning models on decentralized datasets without the need to centralize or harmonize the data. By allowing researchers at companies like Matgenix and Data Science OÜ to collaborate and share insights across institutional boundaries while still controlling their own data and IP, these approaches could accelerate the pace of materials discovery and innovation.
Concluding Thoughts
AI and data-driven techniques in materials science are changing the way we discover, design, and deploy new materials. But to fully realize the potential of these approaches, we need a robust, standardized infrastructure to access and integrate data across multiple sources and domains.
The OPTIMADE API is a key enabler of this, providing a common language and protocol to query and retrieve materials data in a machine-readable format. By reducing the barriers to data access and integration, OPTIMADE is making materials research more democratic and accelerating innovation.
As adoption of OPTIMADE grows and new tools and techniques for data-driven materials discovery emerge, we can expect its impact to widen. From new battery materials and high-performance alloys to customized drugs and functional nanomaterials, the possibilities are endless.
But to realize this, we need sustained investment and collaboration across the materials science community, as well as open data, open standards, and open science. Only by working together across disciplinary and institutional boundaries can we hope to unleash the full power of AI and data-driven discovery in materials science.
As Dr. Gerbrand Ceder, professor of materials science at UC Berkeley and computational materials design pioneer, says:
“The future is bright, but we need to change the way we think about data and collaboration. By using open standards like OPTIMADE and working together as a community to share knowledge, we can accelerate innovation and solve some of the biggest problems we face today.”
Overall, the adoption of standards like OPTIMADE will revolutionize materials science by streamlining data integration, enhancing collaboration, and driving rapid innovation across multiple industries.