Process development of novel transformations is often a laborious and complex task. This is mostly due to the difficulty of identifying the underlying chemical and physical parameters that affect the process objective(s) and of quantifying the nonlinear interactions between them, compounded by a lack of data.
When optimising continuous variables only, both conventional and Bayesian optimisation-based DoE have been demonstrated. However, when extended to discrete variables such as solvents, reagents, or ligands, these approaches often struggle with the curse of dimensionality and with the limited predictive accuracy of black-box optimisation algorithms.
A common approach to overcome this problem is to use molecular descriptors to map the discrete variables onto a continuous space. However, existing approaches often: (1) use a large number of variables / dimensions (15+) to parameterise solvents, (2) start the optimisation from scratch (i.e., the chosen descriptors are not validated as the most relevant), and (3) do not demonstrate a holistic approach (i.e., they keep other reaction variables fixed).
Parameterising discrete variables with relevant descriptors could increase the predictive accuracy of surrogate models, which in turn could unlock new mechanistic insights and enhance robust and green process development. We address these challenges with a generalisable workflow, demonstrated on three case studies, comprising (1) extraction of relevant literature data, (2) identification and quantification of relevant reaction parameters using supervised ML to generate an a priori model, and (3) an efficient DoE for sampling from the discrete space using unsupervised ML. Using (4) an automated flow setup, this approach is then integrated into (5) a Bayesian optimisation algorithm to demonstrate a holistic workflow for optimising discrete and continuous variables against multiple objectives in a pharmaceutically relevant photoredox amine synthesis.
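The idea of mapping discrete variables onto a low-dimensional descriptor space and then sampling that space efficiently can be illustrated with a minimal sketch. This is not the workflow's actual implementation: the solvent names and the two descriptor values are hypothetical placeholders rather than measured data, and a greedy max-min (farthest-point) selection is used here as a simple stand-in for the unsupervised ML step, whose details are not given above.

```python
from math import sqrt

# Hypothetical descriptor table: each discrete choice (a solvent) is mapped to
# a short vector of continuous descriptors. Names and values are illustrative
# placeholders only, not measured data.
DESCRIPTORS = {
    "solvent_A": (0.10, 0.20),
    "solvent_B": (0.15, 0.25),
    "solvent_C": (0.90, 0.10),
    "solvent_D": (0.85, 0.95),
    "solvent_E": (0.50, 0.50),
    "solvent_F": (0.05, 0.90),
}


def standardise(table):
    """Scale every descriptor column to zero mean and unit variance."""
    names = list(table)
    columns = list(zip(*(table[name] for name in names)))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [
        sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return {
        name: tuple((x - m) / s for x, m, s in zip(table[name], means, stds))
        for name in names
    }


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin_selection(table, k, start):
    """Greedily pick k entries that are spread out in descriptor space.

    Each step adds the candidate whose nearest already-chosen neighbour is
    farthest away, so near-duplicate solvents are not sampled together.
    """
    scaled = standardise(table)
    chosen = [start]
    while len(chosen) < k:
        remaining = [name for name in scaled if name not in chosen]
        best = max(
            remaining,
            key=lambda name: min(distance(scaled[name], scaled[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Select three diverse solvents for a hypothetical initial set of experiments.
initial_design = maximin_selection(DESCRIPTORS, k=3, start="solvent_A")
print(initial_design)
```

Because the selection operates on descriptor vectors rather than on solvent labels, the same continuous coordinates could also be passed to a surrogate model alongside continuous reaction variables; the choice and number of descriptors would in practice come from the validated a priori model rather than from an arbitrary table like the one assumed here.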