Back to Home

MolFun

Open-source framework for fine-tuning protein structure prediction models

MolFun Hero

MolFun is an open-source framework for fine-tuning and adapting pre-trained protein structure prediction models like OpenFold to specific downstream tasks. Instead of training from scratch, researchers can leverage existing model knowledge while customizing for their particular molecular problems.

It offers four fine-tuning strategies — from lightweight Head Only (~50K params) to Full fine-tuning (~93M params), with LoRA and Partial options in between — each optimized for different dataset sizes and computational budgets.

Beyond fine-tuning, MolFun provides built-in experiment tracking with native integrations for WandB, Comet, MLflow, Langfuse, and HuggingFace — supporting composite tracking to multiple backends simultaneously.

The framework provides:

  • Four fine-tuning strategies (Head Only, LoRA, Partial, Full) for adapting protein structure prediction models to specific downstream tasks.
  • Modular architecture with swappable components via a registry system — attention mechanisms, block types, structure modules, and embedders.
  • Experiment tracking with native integrations for WandB, Comet, MLflow, Langfuse, and HuggingFace with composite multi-backend support.
  • Complete pipeline: data loading, MSA handling, featurization, and model export to ONNX, TorchScript, and HuggingFace Hub.
  • Experiment tracking with native integrations for WandB, Comet, MLflow, Langfuse, and HuggingFace.

MolFun makes protein model adaptation accessible and efficient, enabling researchers to fine-tune state-of-the-art models without building pipelines from scratch.

By combining modular fine-tuning strategies with comprehensive experiment tracking, MolFun bridges the gap between pre-trained models and real-world protein engineering tasks.

A framework where protein model adaptation is modular, trackable, and reproducible.

Features

Comprehensive fine-tuning and acceleration tools for protein structure prediction

Four Fine-Tuning Strategies

Head Only (~50K params), LoRA (~600K params), Partial (~5M params), and Full (~93M params) — each optimized for different dataset sizes, from under 100 samples to over 10K.

Modular Architecture

Registry-based system for swappable components: attention mechanisms (standard, Flash, linear, gated), block types (Evoformer, Pairformer), structure modules (IPA, diffusion), and embedders.

Experiment Tracking

Native integrations with WandB, Comet, MLflow, Langfuse, and HuggingFace. Supports composite tracking to multiple backends simultaneously for full reproducibility.

Complete ML Pipeline

End-to-end workflow: data loading, MSA handling, featurization, trajectory analysis, and model export to ONNX, TorchScript, and HuggingFace Hub.

Experiment Tracking

Native integrations with WandB, Comet, MLflow, Langfuse, and HuggingFace for seamless experiment monitoring and reproducibility.

Seamless PyTorch Integration

Built on PyTorch 2.x with native compatibility. Drop-in replacements for standard operations with automatic GPU acceleration and zero-copy tensor handling.