
Add model training with mini-batches#59

Draft
JMorado wants to merge 8 commits into chemle:main from JMorado:feature_batching

Conversation

@JMorado (Contributor) commented Jul 23, 2025

This PR adds support for training EMLE models using mini-batches. As training datasets continue to grow in size, this feature becomes essential to avoid memory issues: unlike the QM7 dataset, larger datasets can no longer be held entirely in memory.

The implementation introduces three flags: --use-minibatch, which enables/disables mini-batch training; --batch-size, which specifies the size of each mini-batch; and --shuffle, which shuffles the training data. By default, training still uses the original full-batch optimization.
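The behaviour the three flags describe can be sketched as follows. This is a minimal illustration in PyTorch, not the PR's actual code: the `train` function, model, and loss are hypothetical placeholders, and only the flag semantics (mini-batch on/off, batch size, shuffling, full-batch default) are taken from the description above.

```python
# Sketch of mini-batch vs. full-batch optimisation, mirroring the PR's flags.
# The model and loss here are placeholders, not the EMLE training code.
import torch
from torch.utils.data import DataLoader, TensorDataset


def train(model, X, y, use_minibatch=False, batch_size=64, shuffle=False, epochs=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.MSELoss()
    if use_minibatch:
        # --use-minibatch: iterate over mini-batches of size --batch-size,
        # optionally shuffled each epoch (--shuffle).
        loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=shuffle)
    else:
        # Default: a single full batch, i.e. the original optimisation scheme.
        loader = [(X, y)]
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model
```

With `use_minibatch=False` the loop degenerates to one gradient step per epoch over the whole dataset, which is why the default preserves the original full-batch behaviour.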

@lohedges (Contributor)

Thanks for this. Ignore the test failures. This is because sqm is currently completely broken with recent versions of ambertools. (There are glibc issues, so the package will likely need to be rebuilt.)

@JMorado (Contributor, Author) commented Jul 23, 2025

I've added a proof-of-concept implementation that performs the IVM and AEV calculations and makes the valence-width training step "lazy": batches of masked AEVs are written to disk and loaded on the fly as needed. This is necessary because large datasets otherwise cannot be loaded into memory (training does not get past the AEV computation, as it is impossible to store the aev_mols tensor in memory). I've been testing this on a dataset of ca. 0.5 M configurations, and so far it looks like a viable, if not the most performant, solution. I'm keen to improve the implementation, so any suggestions are welcome!
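The lazy scheme described above can be sketched like this. Everything here is an assumption for illustration: the function names (`cache_batches`, `iter_batches`), the per-batch file layout, and the use of `torch.save`/`torch.load` are not taken from the PR, which only states that batches of masked AEVs are written to disk and loaded on the fly.

```python
# Hypothetical sketch of the "lazy" AEV scheme: split a large tensor into
# per-batch files on disk, then stream the files back one at a time so peak
# memory is a single batch rather than the full aev_mols tensor.
import os
import torch


def cache_batches(aevs, batch_size, cache_dir):
    """Write contiguous slices of `aevs` to disk; return the file paths."""
    os.makedirs(cache_dir, exist_ok=True)
    paths = []
    for i, start in enumerate(range(0, aevs.shape[0], batch_size)):
        path = os.path.join(cache_dir, f"aev_batch_{i:05d}.pt")
        # .clone() so each file holds only its slice, not a view of the whole tensor.
        torch.save(aevs[start:start + batch_size].clone(), path)
        paths.append(path)
    return paths


def iter_batches(paths):
    """Yield one cached batch at a time, loading each file on demand."""
    for path in paths:
        yield torch.load(path)
```

In a real training loop the generator from `iter_batches` would feed the valence-width optimisation directly, so no more than one batch of masked AEVs is ever resident in memory; the trade-off, as noted above, is the extra disk I/O per epoch.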
