
SciPost Submission Page

Bayesian Illumination: Inference and Quality-Diversity Accelerate Generative Molecular Models

by Jonas Verhellen

Submission summary

Authors (as registered SciPost users): Jonas Verhellen
Submission information
Preprint Link: scipost_202504_00051v1  (pdf)
Code repository: https://github.com/Jonas-Verhellen/Bayesian-Illumination
Data repository: https://zenodo.org/records/13899605
Date accepted: May 12, 2025
Date submitted: April 30, 2025, 3:04 p.m.
Submitted by: Verhellen, Jonas
Submitted to: SciPost Chemistry
Ontological classification
Academic field: Chemistry
Specialties:
  • Artificial Intelligence
  • Theoretical and Computational Chemistry
Approaches: Theoretical, Computational

Abstract

In recent years, there have been considerable academic and industrial research efforts to develop novel generative models for high-performing small molecules. Traditional, rules-based algorithms such as genetic algorithms [Jensen, Chem. Sci., 2019, 12, 3567-3572] have, however, been shown to rival deep learning approaches in terms of both efficiency and potency. In previous work, we showed that the addition of a quality-diversity archive to a genetic algorithm resolves stagnation issues and substantially increases search efficiency [Verhellen, Chem. Sci., 2020, 42, 11485-11491]. In this work, we expand on these insights and leverage the availability of bespoke kernels for small molecules [Griffiths, Adv. Neural. Inf. Process. Syst., 2024, 36] to integrate Bayesian optimisation into the quality-diversity process. This novel generative model, which we call Bayesian Illumination, produces a larger diversity of high-performing molecules than standard quality-diversity optimisation methods. In addition, we show that Bayesian Illumination further improves search efficiency compared to previous generative models for small molecules, including deep learning approaches, genetic algorithms, and standard quality-diversity methods.
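As a high-level illustration of the approach summarised above, the sketch below shows one way a Gaussian-process surrogate with a molecular kernel can be folded into a quality-diversity loop: mutated candidates are scored with an acquisition function, and only the most promising one is evaluated and inserted into a niche archive. This is a schematic sketch, not the implementation in the repository; all helper names (fit_gp, expected_improvement, mutate, descriptor, evaluate) are hypothetical placeholders.

import random

def bayesian_illumination_step(archive, fit_gp, expected_improvement,
                               mutate, descriptor, evaluate, n_candidates=32):
    """One illustrative generation of a surrogate-guided quality-diversity loop.

    `archive` maps a niche key (a behavioural descriptor) to the best molecule
    found so far in that niche; molecules are assumed to carry `.fingerprint`
    and `.fitness` attributes.
    """
    # 1. Fit a Gaussian-process surrogate on all molecules evaluated so far.
    elites = list(archive.values())
    surrogate = fit_gp([(m.fingerprint, m.fitness) for m in elites])

    # 2. Propose candidates by mutating elites sampled from the archive.
    parents = random.choices(elites, k=n_candidates)
    candidates = [mutate(parent) for parent in parents]

    # 3. Rank candidates with the acquisition function; evaluate only the best.
    best = max(candidates, key=lambda m: expected_improvement(surrogate, m.fingerprint))
    best.fitness = evaluate(best)

    # 4. Keep only the best molecule per niche (the "illumination" archive).
    niche = descriptor(best)
    if niche not in archive or best.fitness > archive[niche].fitness:
        archive[niche] = best
    return archive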

Author comments upon resubmission

We are grateful for the opportunity to revise and resubmit our manuscript entitled "Bayesian Illumination: Inference and Quality-Diversity Accelerate Generative Molecular Models". We thank you and the reviewers for your thorough and thoughtful evaluations. The feedback we received was both constructive and insightful, and we have made several important revisions to improve the clarity, rigor, and reproducibility of our work. Below, we summarise our responses to the reviewers’ comments:

Reviewer 1:

Reference Formatting: We carefully reviewed references 14, 19, 24, 70, 88, and 98 to resolve formatting issues and improved the overall citation style.

Data and Code Accessibility: We improved the structure of our GitHub repository to better link figures in the manuscript to the corresponding code and data. A dedicated folder now contains the scripts and processed data used to generate key figures, and this is clearly referenced in the top-level README.md.

Reproducibility: We created a permanent, versioned GitHub release (v1.0-paper-submission) that corresponds exactly to the version used in the manuscript. This release has been archived via Zenodo and is now cited in the revised manuscript.

Reviewer 2:

Method Naming Clarification: We clarified the rationale behind the name “graph-based Bayesian illumination,” emphasizing its alignment with related methods and the role of genetic algorithms, which are indeed central to our approach.

Generalization to Other Domains: While the manuscript focuses on small molecules, we acknowledged the reviewer’s suggestion and added discussion of future work exploring broader chemical domains such as macrocycles and modular materials.

Kernel Choice: We elaborated on our use of the Tanimoto kernel, noting its empirical effectiveness with binary molecular fingerprints and discussing the possibility of other kernel functions, which are a valuable avenue for future study.
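For concreteness, the sketch below computes a Tanimoto (Jaccard) kernel matrix over binary Morgan fingerprints, the setting in which this kernel is empirically effective. The fingerprint parameters (radius 2, 2048 bits) are illustrative assumptions, and the code is a minimal sketch rather than the implementation used in the manuscript.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_fingerprints(smiles_list, radius=2, n_bits=2048):
    """Convert SMILES strings to binary Morgan fingerprint vectors."""
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        bv = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        fps.append(np.array(list(bv), dtype=float))
    return np.stack(fps)

def tanimoto_kernel(X, Y):
    """Tanimoto similarity k(x, y) = <x, y> / (|x| + |y| - <x, y>) for binary vectors."""
    cross = X @ Y.T
    norm_x = X.sum(axis=1, keepdims=True)  # number of on-bits per fingerprint
    norm_y = Y.sum(axis=1, keepdims=True)
    return cross / (norm_x + norm_y.T - cross)

if __name__ == "__main__":
    fps = morgan_fingerprints(["CCO", "CCN", "c1ccccc1"])
    print(tanimoto_kernel(fps, fps))  # 3x3 similarity matrix with ones on the diagonal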

Clarification of Multi-Property Optimization: We clarified our use of the term "multi-property optimization" to refer specifically to a fixed scalar function combining multiple molecular properties, rather than multi-objective (Pareto-based) optimization.
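As an illustration of this usage of the term, the sketch below scores a molecule with a fixed scalar combination of two properties. The chosen properties (QED and a rescaled molecular-weight penalty) and the weights are hypothetical examples, not the objectives used in the reported benchmarks.

from rdkit import Chem
from rdkit.Chem import QED, Descriptors

def scalar_fitness(smiles, weights=(0.7, 0.3)):
    """Fixed weighted combination of drug-likeness (QED) and a molecular-weight penalty."""
    mol = Chem.MolFromSmiles(smiles)
    qed_score = QED.qed(mol)                               # drug-likeness in [0, 1]
    mw_penalty = min(Descriptors.MolWt(mol) / 500.0, 1.0)  # rescaled to [0, 1]
    w_qed, w_mw = weights
    return w_qed * qed_score + w_mw * (1.0 - mw_penalty)

print(scalar_fitness("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin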

SELFIES Performance: We expanded our discussion of SELFIES and provided context for their underperformance in our setting, consistent with other benchmarks in the field, while recognizing their strengths in specific auto-encoder contexts.

Improved Citations: We revised in-text citations to include article titles where appropriate and added several new references to better situate our work within the relevant literature.

We have also submitted a detailed, point-by-point response for each of the reviewers. Thank you again for your time and consideration. We hope that the revised manuscript meets the expectations of the journal and look forward to your feedback.

List of changes


  • Three recommended references were incorporated into the manuscript to situate our work more clearly within the existing literature: Machine Learning and the Future of Bayesian Computation (https://doi.org/10.1021/acscentsci.0c00026), Augmenting genetic algorithms with machine learning for inverse molecular design (https://doi.org/10.1039/D4SC02934H), and Deep Evolutionary Learning for Molecular Design (https://doi.org/10.1109/MCI.2022.3155308).
  • The citation style was updated throughout the revised manuscript from a minimal, conference-style format to a complete bibliographic format including full author lists, article titles, journal names, volume/issue numbers, page ranges, and publication years.
  • A permanent GitHub release (v1.0-paper-submission) was created to represent the exact version of the code used in this work. This release has been archived via Zenodo and is now mentioned in the Data Availability section of the revised manuscript.

Published as SciPost Chem. 4, 001 (2025)
