An increasingly popular concept in lab automation is that of a “self-driving lab.” This is a term for a platform that can run many rounds of experiments while using the data it generates during each round to inform the next experiment, essentially a closed-loop feedback control system for doing science. In such a closed-loop platform, the effort to set up each new round of experiments is significantly reduced or eliminated by using robots or other automated machines, and experimental parameters are determined for each successive iteration using statistical algorithms that learn from the previously generated data.
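To make the loop concrete, here is a minimal sketch of the plan, execute, learn cycle in Python. Everything in it is an assumption made for illustration: `run_experiment` is a hypothetical stand-in for the robots and instruments (it just returns a noisy simulated yield), the single "concentration" parameter and its optimum are made up, and `propose_next` is a deliberately naive local search rather than the kind of statistical model a real platform would use.

```python
import random

def run_experiment(concentration):
    """Hypothetical stand-in for the robot + instrument: returns a noisy yield."""
    true_optimum = 0.62  # assumed for the simulation, unknown to the planner
    return 1.0 - (concentration - true_optimum) ** 2 + random.gauss(0, 0.01)

def propose_next(history, step=0.1):
    """Pick the next condition by perturbing the best condition seen so far."""
    if not history:
        return random.random()  # no data yet: start somewhere in [0, 1)
    best_x, _ = max(history, key=lambda record: record[1])
    candidate = best_x + random.uniform(-step, step)
    return min(1.0, max(0.0, candidate))  # clamp to the allowed range

def closed_loop(rounds, seed=0):
    random.seed(seed)
    history = []  # (parameter, measurement) pairs from every round
    for _ in range(rounds):
        x = propose_next(history)   # plan the next experiment from past data
        y = run_experiment(x)       # execute and measure
        history.append((x, y))      # learn: the new data feeds the next proposal
    return history

history = closed_loop(rounds=50)
best_x, best_y = max(history, key=lambda record: record[1])
print(f"best concentration {best_x:.2f}, measured yield {best_y:.3f}")
```

The point of the sketch is the shape of the loop: no human touches anything between rounds, and the only thing carried from one round to the next is the accumulated `history`.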
Closed-loop automation is a valuable design pattern because in many scientific fields the parameter space of possible experiments is extremely large relative to the number of parameter combinations that yield desirable results. A platform that can automate a single round of an experiment may still suffer from high operational costs if humans are needed to set up each new round. This can make or break an experimental pipeline in cases where a lab is burning limited resources to achieve a “hit” over many experimental cycles. Closed-loop platforms increase the number of experiments a lab can run by minimizing the operational cost of human effort, in addition to other classic advantages of automation such as scalability, precision, consistency, and uptime.
Anyone building a self-driving lab should keep a few principles in mind. The first is that closed-loop automation trades higher capital costs for lower operational costs. Creating a platform where all of the necessary reagents and materials can be handled autonomously often involves a sophisticated array of electromechanical components that have to be integrated programmatically into the lab control system. Most lab instruments, like centrifuges, are manual by default, and automated versions of these are usually much more expensive. Encapsulating the experiments required for a pipeline into a set of automated hardware can be far more expensive or technically challenging than one might suppose at first glance. Think of designing a self-driving lab as solving for the tradeoff between hardware cost and experimental breadth.
Another consideration is that the range of a self-driving lab will be determined by its finite capacity for storing and handling different reagents. For example, if you want to build a self-driving lab for doing drug discovery, you either have to load it with a predetermined set of drug candidates or integrate some apparatus to do synthesis and purification. Even in the latter case, you probably won’t be able to synthesize every possible molecule, so the search space is going to be fundamentally limited with respect to reagent inputs. A limited set of parameters or reagent inputs that are easily adjusted (such as component concentrations in a mixture) can be a much easier way to address a large search space than relying on a high diversity of reagents themselves.
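A quick back-of-the-envelope calculation shows why adjustable parameters go so far. The numbers below are hypothetical and chosen only for illustration: eight stock reagents mixed at ten concentration levels each, compared with picking pairs of reagents from a 96-reagent library.

```python
from math import comb

def mixture_conditions(n_components, n_levels):
    """Distinct mixtures when each component takes one of n_levels concentrations."""
    return n_levels ** n_components

def reagent_subsets(library_size, picks):
    """Distinct ways to choose `picks` reagents from a library (order ignored)."""
    return comb(library_size, picks)

# Hypothetical numbers, chosen only for illustration.
print(mixture_conditions(n_components=8, n_levels=10))  # 8 stock bottles on deck
print(reagent_subsets(library_size=96, picks=2))        # 96 stock bottles on deck
```

Under these assumptions, the eight-bottle platform can reach 100,000,000 distinct mixtures, while the 96-bottle platform offers 4,560 distinct pairs. The comparison is crude (it ignores whether the mixtures are meaningfully different), but it shows how quickly a few tunable concentrations outgrow a much larger reagent library.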
Just as it is constrained by a finite set of reagents, a closed-loop platform is also fundamentally limited to the finite set of experiments that its hardware can run. Whereas a human scientist running a manual pipeline might take note of an interesting data point and run a follow-up experiment outside the scope of the pipeline, this is less feasible in the context of a closed-loop platform. The sample of interest may have already been discarded or subjected to downstream processing in a way that makes the desired information impossible to recover. It would also be a challenge to incorporate any data produced by off-platform experiments into the algorithms that are responsible for iterative experimental design. The closed-loop nature of a self-driving lab means that much control is taken away from human scientists, which may have unforeseen and regrettable consequences.
The last challenge I’ll point out for self-driving labs is the nature of the algorithms responsible for iteratively designing new experiments. This is probably where a self-driving lab differs the most from manual experiments or traditional automated platforms. One of the biggest differences from manual experimentation is that self-driving labs are usually optimizing for a pre-defined reward function. There isn’t a magic bullet algorithm or model that works in every case, although Bayesian optimization is a very common choice. Choosing the right model is inherently difficult because model design itself is often done by trial and error in other areas of machine learning. For a platform running time-consuming experiments that require expensive reagents, it may be costly to burn many rounds of experiments on meta-optimization of the model itself. Using simulation tools to model the state space of the domain being explored can probably mitigate some of the risk that a model does not converge within a feasible number of iterations.
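As a rough illustration of checking a planner in simulation before spending reagents, here is a minimal sketch. It is not Bayesian optimization with a surrogate model; it swaps in a much simpler Bayesian method, Thompson sampling over a handful of discrete conditions with pass/fail outcomes. The hit rates, the budgets, and the function names are all hypothetical. The simulation asks one question: for a given experimental budget, how often does the planner end up identifying the truly best condition?

```python
import random

def simulate_campaign(hit_rates, budget, rng):
    """Run one simulated campaign with Beta-Bernoulli Thompson sampling.

    hit_rates: assumed true success probability of each candidate condition.
    Returns the index of the condition the planner believes is best.
    """
    n = len(hit_rates)
    successes = [0] * n
    failures = [0] * n
    for _ in range(budget):
        # Sample a plausible hit rate for every condition from its posterior,
        # then run the condition whose sample is highest.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(n)]
        choice = max(range(n), key=samples.__getitem__)
        if rng.random() < hit_rates[choice]:  # simulated experiment outcome
            successes[choice] += 1
        else:
            failures[choice] += 1
    # Posterior mean of each condition under a uniform Beta(1, 1) prior.
    means = [(1 + successes[i]) / (2 + successes[i] + failures[i]) for i in range(n)]
    return max(range(n), key=means.__getitem__)

def recovery_rate(hit_rates, budget, trials=200, seed=0):
    """Fraction of simulated campaigns that identify the true best condition."""
    rng = random.Random(seed)
    best = max(range(len(hit_rates)), key=hit_rates.__getitem__)
    wins = sum(simulate_campaign(hit_rates, budget, rng) == best for _ in range(trials))
    return wins / trials

hit_rates = [0.05, 0.10, 0.15, 0.60]  # hypothetical ground truth for the simulation
for budget in (5, 20, 80):
    print(budget, recovery_rate(hit_rates, budget))
```

If the recovery rate at the budget you can afford is poor in a simulation that is kinder than reality, that is a cheap warning that the planner or the problem framing needs work before any real experiments are run.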
Overall, the domains that make the most sense for a self-driving lab are ones where you can explore a very broad search space with a limited set of equipment and relatively inexpensive reagents, and where the domain of interest is feasible to model computationally. Experiments that can use liquid-handling robots are highly suitable because these robots can work with literally thousands of different samples and reagents at a time to create combinatorially vast sets of mixtures with varying compositions. Other devices that have been used for such diverse combinatorial mixing are powder dispensers and microfluidic pumps. Imaging, spectrometry, flow cytometry, and sequencing have all been used to obtain data in closed-loop systems, and each has its own tradeoffs in data richness, flexibility, and feasibility of automation.
A lot of the design space for self-driving labs is related to what problem to apply them to in the first place. Some of the most promising areas that have been explored using self-driving labs have been in the field of materials engineering, especially around technologies like photovoltaics, batteries, and nanoparticles. I think this is because of the essentially combinatorial nature of many materials engineering problems coupled with our ability to use first-principles models like density functional theory to estimate material properties. Spectroscopy and microscopic imaging are also very well suited to automatically measuring properties of novel materials. Another field where there has been success in using self-driving labs has been organic synthesis, especially relating to the modular Chemputer platform invented by Lee Cronin. There have been fewer examples of self-driving labs in biology, although there have been cases for protein engineering and cell culture. I think this is largely due to the much greater complexity of biological systems and the difficulty of obtaining measurements that give us a complete picture of what’s happening inside a cell. It’s possible to take automated measurements that give us a pretty complete picture of the structure and properties of a perovskite crystal or organic molecule, but measurements of cells can only illuminate a tiny fraction of the dynamics occurring at any moment. This makes it far more challenging for the optimization algorithms determining experimental parameters to make decisions about what conditions are likely to yield good results.
Purely biochemical platforms, such as those for protein engineering, have a much smaller state space to model than platforms exploring cell biology, so they are probably more viable for autonomous exploration using self-driving labs at the moment. A self-driving lab for cell biology would probably have to either massively scale up experimental throughput, settle for a relatively low number of parameter inputs over which to optimize, or build on some qualitative design improvement like leveraging LLM-based literature and database searches to inform Bayesian priors. Still, the tremendous possible benefits of creating such a platform make me optimistic that it could eventually be achieved.
Further reading
If you want to read more about self-driving labs and closed-loop automation, check out this paper (which heavily informed this post) and other work from Sergei Kalinin’s or Mahshid Ahmadi’s lab. This paper by Maxim Ziatdinov from Sergei’s lab is a great example of a self-driving lab for materials engineering, and this paper by Jacob Rapp from Philip Romero’s lab on a self-driving lab for protein engineering is a great biochemistry-specific example. Lee Cronin’s work on building a modular system for autonomous organic synthesis is also very exciting. There have been a ton of other awesome papers published in the last few years on this subject, so it was hard for me to narrow down what to include, but these are the ones that I mostly had in mind when writing this post.
References
Hypothesis Learning in Automated Experiment: Application to Combinatorial Materials Libraries
Self-driving laboratories to autonomously navigate the protein fitness landscape
Robotic search for optimal cell culture in regenerative medicine
From Sunlight to Solutions: Closing the Loop on Halide Perovskites
Closed-loop optimization of nanoparticle synthesis enabled by robotics and machine learning
Organic synthesis in a modular robotic system driven by a chemical programming language
Nice write-up, an exciting space for sure!
I agree: on the biology side, biopolymer (protein/RNA/DNA) engineering should see traction in this first, since the methods don't change when building highly diverse molecules.
I could see traction on the cell biology side with the new types of CRISPR and related perturbations becoming more routine (e.g. perturb-seq), but it will definitely be challenging.
Nice article!
There is a company called SES that recently unveiled batteries using electrolyte optimized by AI. While they don’t seem to be using self-driving laboratories for these discoveries, it’s another example of increased automation in research.
I see such automation as feasible for repetitive data-acquisition tasks; however, a large aspect of innovation is serendipity. Think of the discovery of the microwave or penicillin. It is such accidental discoveries that make giant leaps in innovation. Do you think that automating laboratories would eliminate such accidental discoveries?