
Plant fibers contain valuable chemicals that can be used to make biofuels, plastics, medicines, and other products, but separating and purifying them is challenging, especially without using toxic solvents.
Now University of Wisconsin–Madison scientists have used machine learning to streamline the process of finding the best solvents for the job, balancing selectivity, efficiency, and environmental impact.

Developed by scientists with the Great Lakes Bioenergy Research Center (GLBRC) and the Wisconsin Energy Institute, the process uses Bayesian experimental design, a framework for making experiments more efficient and informative when outcomes are uncertain.
The Bayesian framework uses statistical models to guess what a design space looks like based on existing knowledge and to decide which areas of the space to explore next. By balancing exploration of unknown areas and exploitation of promising ones, the framework can be used to improve predictive models.
That means instead of testing thousands of mixtures, researchers can focus on dozens of the most promising candidates, said Shannon Stahl, a professor of chemistry who led the project with Reid Van Lehn, a professor of chemical and biological engineering.
Van Lehn said the process, which combines computer modeling with lab experimentation, could speed innovation of bioproducts as well as pharmaceuticals.
“We think of this as a flexible technique that could be applied in multiple different contexts,” Van Lehn said. “And this is really just a proof of concept.”
Green solvents for plant-based products
Lignin, one of three main parts of plant cell walls, contains valuable aromatic compounds (molecules with rings of carbon atoms) typically produced from petroleum.
These bioproducts can be used to make plant-based plastics, fibers, pharmaceuticals, and other products, but that requires separating them from other byproducts in the broths used to break down the biomass.
Liquid-liquid extraction is a way to separate and purify chemicals using two liquids that don't mix, like water and oil. When the liquids are stirred or shaken together, the target substance will dissolve in its preferred solvent. Once the liquids separate into layers, the heavier liquid can be drained from the bottom.
The key is finding the right solvent, which is challenging when there’s more than one product.
“The mixture of the bioproducts contains a lot of different chemicals,” said Amy Qin, a senior research specialist with Dow Chemical who collaborated on the project while earning her PhD at UW–Madison. “Some of the chemical properties are very similar to each other, which makes it difficult to separate one from the other.”
Chlorinated solvents are effective at dissolving these products, but they are also toxic, which limits their use in industrial biorefineries.
So the researchers set out to find something less hazardous.

Knowing they would need a blend of green solvents to match the performance versatility of chlorinated solvents, the team identified eight candidates, including water, alcohols, and ethers.
But with a nearly infinite number of possible combinations, traditional trial-and-error methods weren't practical.
The researchers used Bayesian experimental design to train a computer model to predict a certain property, in this case a measure of a substance’s preference between two solvents.
The framework has three stages.
The first step (design) is to identify a set of solvent mixtures, in this case some combination of eight green solvents. Then researchers test a selection of solvent mixtures the model suggests to get the actual values (observe). Those values are then used to train the model to improve its predictions (learn).
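The three stages can be sketched as a short loop. This is only an illustration under stated assumptions, not the researchers' code: the candidate mixtures are random, the "experiment" is a made-up linear response with noise, and the model is a simple Bayesian linear regression chosen because its uncertainty has a closed form. All names here (run_experiment, learn, predict) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Design space: hypothetical candidate mixtures. Each row holds the
# fractions of eight solvents and sums to 1.
candidates = rng.dirichlet(np.ones(8), size=200)

# Stand-in for the lab measurement: a made-up linear response plus noise.
true_w = rng.normal(size=8)
NOISE_SD = 0.05

def run_experiment(X):
    """Observe: return noisy 'measured' values for the mixtures in X."""
    return X @ true_w + rng.normal(scale=NOISE_SD, size=len(X))

def learn(X, y, prior_var=10.0):
    """Learn: posterior mean and covariance of Bayesian linear regression weights."""
    A = X.T @ X / NOISE_SD**2 + np.eye(X.shape[1]) / prior_var
    cov = np.linalg.inv(A)
    mean = cov @ X.T @ y / NOISE_SD**2
    return mean, cov

def predict(X, w_mean, w_cov):
    """Predicted value and predictive uncertainty for each mixture in X."""
    mu = X @ w_mean
    sd = np.sqrt(np.einsum("ij,jk,ik->i", X, w_cov, X))
    return mu, sd

# Start from five randomly chosen mixtures.
tested = [int(i) for i in rng.choice(len(candidates), size=5, replace=False)]
measured = list(run_experiment(candidates[tested]))
errors = []

for _ in range(5):
    # Learn: refit the model on everything measured so far.
    w_mean, w_cov = learn(candidates[tested], np.array(measured))
    mu, sd = predict(candidates, w_mean, w_cov)
    errors.append(float(np.sqrt(np.mean((mu - candidates @ true_w) ** 2))))
    # Design: pick the four untested mixtures the model is least sure about.
    sd[tested] = -np.inf
    batch = [int(i) for i in np.argsort(sd)[-4:]]
    # Observe: run the (simulated) experiments and record the results.
    tested.extend(batch)
    measured.extend(run_experiment(candidates[batch]))
```

In this toy setup the list errors tracks how far the model's predictions are from the simulated truth at the start of each round, so it gives a rough picture of how each pass through the loop improves the model.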
In selecting batches for experimental tests, researchers have to balance two principles: exploration and exploitation.
In the initial exploration rounds, they selected mixtures with high uncertainty.
“I just know very little about how that mixture will perform,” Van Lehn said. “If I do an experiment for that mixture, I’ll learn a lot, and my model will improve substantially.”

As the model becomes more accurate, researchers shift to exploitation, focusing on the mixtures predicted to have the best performance.
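One common way to express this trade-off, though not necessarily the rule used in the study, is an upper-confidence-bound score: the predicted performance plus a weight times the model's uncertainty. A large weight favors exploration and a weight of zero is pure exploitation. The numbers below are invented for illustration.

```python
import numpy as np

def acquisition(mean, std, kappa):
    """Upper-confidence-bound score: prediction plus kappa times uncertainty."""
    return mean + kappa * std

# Hypothetical model output for five candidate mixtures.
mean = np.array([0.9, 0.5, 0.7, 0.2, 0.6])     # predicted performance
std = np.array([0.05, 0.10, 0.10, 0.80, 0.30])  # model uncertainty

# Early rounds: a large kappa rewards uncertainty (exploration).
explore_pick = int(np.argmax(acquisition(mean, std, kappa=3.0)))
# Later rounds: kappa = 0 ranks by predicted performance alone (exploitation).
exploit_pick = int(np.argmax(acquisition(mean, std, kappa=0.0)))

print(explore_pick, exploit_pick)  # prints "3 0"
```

With the large weight the pick is mixture 3, the one the model knows least about; with no weight it is mixture 0, the one predicted to perform best.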
For the observation phase, the researchers used a liquid-handling robot that could test 40 samples at a time. But taking advantage of that batch automation presented another challenge: how to choose 40 different mixtures that result in the biggest improvements to the model?
Normally with this type of machine learning, you choose a data point that is expected to maximize improvement. Once the model updates, you choose another point, which is likely very different.
Choosing mixtures one at a time isn’t very efficient, but neither is picking the top 40.
“All 40 points would probably be mixtures that are similar because the model is uncertain of its predictions for some region of the space of possible mixtures,” Van Lehn said. “(So) it’s unlikely to add much information compared to one mixture.”

To get around this problem, the researchers created a loop in the design phase using a physics-based model known as COSMO-RS to sequentially generate 40 “fantasy samples” that temporarily update the model. COSMO-RS predictions aren’t as accurate as experiments, but they are good enough to identify the mixtures most likely to improve performance.
The inner loop is used only to decide which mixtures to test next; the model itself is trained only on experimental data.
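A minimal sketch of that batch-selection idea follows, assuming a small Gaussian process model and an upper-confidence-bound score, neither of which is confirmed by the article. COSMO-RS itself is not called here: cheap_model is a hypothetical stand-in for it, and the three-solvent mixtures and response functions are invented for illustration.

```python
import numpy as np

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X_train, y_train, X_query, noise=1e-4):
    """Zero-mean Gaussian process posterior mean and standard deviation."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_query, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_batch(X_train, y_train, candidates, cheap_model, batch_size, kappa=2.0):
    """Pick a batch one point at a time, adding a fantasy sample after each pick."""
    X_tmp, y_tmp = X_train.copy(), y_train.copy()  # temporary copies only
    chosen = []
    for _ in range(batch_size):
        mean, std = gp_posterior(X_tmp, y_tmp, candidates)
        score = mean + kappa * std
        if chosen:
            score[chosen] = -np.inf  # never pick the same mixture twice
        i = int(np.argmax(score))
        chosen.append(i)
        # Fantasy sample: pretend the cheap model's value was measured, so the
        # uncertainty near this mixture drops before the next pick.
        X_tmp = np.vstack([X_tmp, candidates[i]])
        y_tmp = np.append(y_tmp, cheap_model(candidates[i]))
    return chosen  # X_train and y_train are left untouched

rng = np.random.default_rng(0)
X_train = rng.dirichlet(np.ones(3), size=5)        # mixtures already measured
candidates = rng.dirichlet(np.ones(3), size=300)   # untested mixtures
y_train = np.sin(3 * X_train[:, 0]) + X_train[:, 1] + 0.2 * X_train[:, 2]

def cheap_model(x):
    """Hypothetical approximate predictor standing in for COSMO-RS."""
    return float(np.sin(3 * x[0]) + x[1])

batch = select_batch(X_train, y_train, candidates, cheap_model, batch_size=8)
```

Because the fantasy values live only in the temporary copies, the real training data is unchanged after the batch is chosen, which mirrors the point that the model is trained only on experimental results.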
A side-by-side comparison showed the new model is far more accurate than COSMO-RS on its own, according to results published in the journal ACS Sustainable Chemistry & Engineering.
GLBRC researchers are now working to apply the framework to other reaction conditions, such as temperature and pressure, and to consider practical parameters like cost and whether a solvent can be easily recovered for reuse.
“Bayesian optimization is not a new concept in academia, but I feel like, especially in those experimental labs or even in industry, not a lot of people use Bayesian optimization to guide either optimization or experimental design process,” Qin said. “We’re just showing this application as a showcase on this complex problem.”