Hi,
does anyone have experience with large-scale pre-training of Masked Autoencoders, similar to FAIR's SAM models, and can point me to resources on how best to do this for 2D (and, in the long term, also 3D) microscopy data?
I am particularly interested in:
- the minimal number of (unlabeled) training samples such that the model does not overfit
- whether this can be pulled off on 100 Quadro RTX 6000 GPUs connected via 100G InfiniBand in a reasonable amount of time
- whether there are well-structured code repositories I can draw inspiration from for scaling to larger-data, distributed training settings (see the rough sketch after this list for the kind of setup I have in mind)
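
To make the last point concrete: what I currently imagine is a fairly standard PyTorch DDP loop, roughly like the sketch below, launched with `torchrun`. `MyMAE` and `MicroscopyPatches` are placeholder names for whatever MAE implementation and microscopy dataset I end up using; this is not code from the original repo, just my mental model of the scaffolding.

```python
# Rough sketch of the multi-node pre-training loop I have in mind.
# Assumes launch via `torchrun --nnodes=... --nproc_per_node=...`.
# MyMAE and MicroscopyPatches are placeholders, not real classes.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def main():
    dist.init_process_group(backend="nccl")            # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = MyMAE().cuda(local_rank)                   # placeholder MAE model
    model = DDP(model, device_ids=[local_rank])

    dataset = MicroscopyPatches("/data/unlabeled")     # placeholder unlabeled dataset
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler,
                        num_workers=8, pin_memory=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.05)
    scaler = torch.cuda.amp.GradScaler()               # mixed precision to fit the 24 GB cards

    for epoch in range(800):
        sampler.set_epoch(epoch)                       # reshuffle across ranks each epoch
        for imgs in loader:
            imgs = imgs.cuda(local_rank, non_blocking=True)
            with torch.cuda.amp.autocast():
                loss = model(imgs)                     # assume the forward returns the reconstruction loss
            optimizer.zero_grad()
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

If there is a repository that does this (plus the usual extras like gradient accumulation, checkpointing, and logging) in a clean, readable way, that is exactly the kind of pointer I am after.
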
I started playing around with the original MAE repository and got it to run a year ago. Then other priorities arose, and I have now realized that the repo is archived.
Are there better ways of doing this now? For example, I read that JEPA should require less compute for pre-training, but has anyone outside FAIR tried it?
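
To check my understanding of why it should be cheaper: as far as I can tell, a JEPA-style objective predicts latent representations of masked regions instead of reconstructing pixels with a decoder, roughly like the toy sketch below. This is just my understanding written down so it can be corrected; `context_encoder`, `target_encoder`, `predictor`, and the function names are made up and not taken from any I-JEPA code.

```python
# Toy sketch of my understanding of a JEPA-style latent prediction loss.
# All module names are placeholders; real implementations drop masked
# tokens from the context instead of zeroing them as done here.
import torch
import torch.nn.functional as F


@torch.no_grad()
def update_target_encoder(context_encoder, target_encoder, momentum=0.996):
    # EMA update: the target encoder slowly tracks the context encoder.
    for p_c, p_t in zip(context_encoder.parameters(), target_encoder.parameters()):
        p_t.mul_(momentum).add_(p_c, alpha=1.0 - momentum)


def jepa_loss(patches, context_mask, target_mask,
              context_encoder, target_encoder, predictor):
    # patches: (B, N, D) patch embeddings; masks: boolean (B, N)
    ctx = context_encoder(patches * context_mask.unsqueeze(-1).float())
    with torch.no_grad():
        tgt = target_encoder(patches)          # no gradients through the targets
    pred = predictor(ctx)                      # predict latents of the masked patches
    # loss only on the target positions, in representation space (no pixel decoder)
    return F.smooth_l1_loss(pred[target_mask], tgt[target_mask])
```

If that picture is wrong, or if someone has actually compared MAE vs. JEPA pre-training cost on non-natural images, I would love to hear about it.
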
Best wishes,
Eric