Balanced Data Aggregation for Materials Informatics via Acquisition Functions

Semester: Summer 2024


Presentation description

The success of a learning model depends heavily on the quality of the data used to train it. In materials informatics, individual datasets are often too small on their own, so they frequently need to be aggregated. In practice, however, models trained on larger, combined datasets have underperformed their single-dataset counterparts. Although this seems counterintuitive, directly combining datasets results in severe imbalance, model overfitting, and general noise. To overcome this saturation effect, we will employ an acquisition function to evaluate individual data points and incorporate only those that provide new information.
This study is a follow-up to the data aggregation research conducted by Ottomano et al., in which three machine learning aggregation techniques were tested: simple concatenation, element-focused concatenation, and the DiSCoVeR algorithm. A Bayesian optimization approach, which uses acquisition functions to balance the exploration and exploitation of data points, should return a more balanced aggregation that can be used to train more accurate models in the future. This is especially relevant to materials informatics, as most datasets are high-dimensional and limited in size. We expect that including an acquisition function will greatly enhance the aggregation model's performance by yielding cleaner, more balanced data.
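
To make the idea of acquisition-based aggregation concrete, here is a minimal sketch in Python. It is not the method used in this study or in the work of Ottomano et al.; it only illustrates one possible choice. It assumes each material is already represented as a numeric feature vector, and it uses the posterior variance of a Gaussian process with an RBF kernel as a purely exploratory acquisition function. The function names, `threshold`, and `length_scale` are hypothetical and chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between every row of a and every row of b.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def posterior_variance(x_train, x_cand, length_scale=1.0, noise=1e-6):
    # Gaussian process posterior variance at each candidate point,
    # given only the inputs already in the aggregated set.
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt = k_tt + noise * np.eye(len(x_train))
    k_ct = rbf_kernel(x_cand, x_train, length_scale)
    solved = np.linalg.solve(k_tt, k_ct.T)  # shape (n_train, n_cand)
    return 1.0 - np.einsum("ij,ji->i", k_ct, solved)


def aggregate(base, candidates, threshold=0.5, length_scale=1.0):
    # Greedy aggregation (illustrative assumption, not the study's method):
    # repeatedly add the candidate with the highest acquisition score and
    # stop once no remaining candidate scores above the threshold.
    selected = base.copy()
    remaining = candidates.copy()
    while len(remaining) > 0:
        scores = posterior_variance(selected, remaining, length_scale)
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            break
        selected = np.vstack([selected, remaining[best]])
        remaining = np.delete(remaining, best, axis=0)
    return selected


if __name__ == "__main__":
    base = np.array([[0.0, 0.0], [1.0, 0.0]])
    # One near-duplicate of the base set and two nearly identical novel points.
    candidates = np.array([[0.01, 0.0], [5.0, 5.0], [5.01, 5.0]])
    print(aggregate(base, candidates))
```

In this toy example, the candidate that nearly duplicates an existing point is skipped, and only one of the two nearly identical novel points is added. A fuller Bayesian optimization approach would typically also use the target property values so that the acquisition function trades off exploration against exploitation; this sketch covers only the exploration side.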

Presenter Name: Layla Purdy
Presentation Type: Poster
Presentation Format: In Person
Presentation #1
College: Engineering
School / Department: Materials Science and Engineering
Research Mentor: Taylor Sparks
Time: 11:00 AM
Physical Location or Zoom link: Dumke