ai-boost/awesome-ai-for-science

✨ Awesome AI for Science (AI4Science) ✨

A curated list of awesome AI tools, libraries, papers, datasets, and frameworks that accelerate scientific discovery across all disciplines.

Deutsch | English | Español | français | 日本語 | 한국어 | Português | Русский | 中文

AI is revolutionizing scientific research - from drug discovery and materials design to climate modeling and astrophysics. This repository collects the best resources to help researchers leverage AI in their work.

📚 Contents


🧪 AI Tools for Research

Literature & Knowledge Management

  • Semantic Scholar - AI-powered academic search (Allen AI)
  • arXiv - Open-access repository of electronic preprints and postprints
  • OpenAlex - Open catalog of scholarly papers and authors
  • CORE - Aggregator of open access research papers

Data Analysis & Visualization

  • PandasAI - Conversational data analysis using natural language
  • DeepAnalyze - First agentic LLM for autonomous data science, with an end-to-end pipeline from raw data to analyst-grade reports
  • AutoViz - Automated data visualization with minimal code
  • Chat2Plot - Secure text-to-visualization through standardized chart specifications

Data Labeling & Annotation

  • Label Studio - Multi-type data labeling and annotation tool
  • Snorkel - Programmatic data labeling and weak supervision

Research Workbench & Plugins

  • Claude Scientific Skills - Comprehensive collection of 125+ ready-to-use scientific skill modules for Claude AI across bioinformatics, cheminformatics, clinical research, ML, and materials science

📄 Paper→Poster / Slides / Graphical Abstract

Poster Generation

  • Paper2Poster - Multi-agent system with Parser-Planner-Painter architecture converting paper.pdf to editable poster.pptx, outperforms GPT-4o with 87% fewer tokens
  • mPLUG-PaperOwl - Multimodal LLM for understanding and generating scientific charts and diagrams

Slides & Presentation Generation

  • Auto-Slides - Multi-agent academic paper to high-quality presentation slides with interactive refinement
  • PPTAgent - Beyond text-to-slides generation with PPTEval multi-dimensional evaluation (EMNLP 2025)
  • paper2slides - Transform arXiv papers into Beamer slides using LLMs
  • PaperToSlides - AI-powered tool that automatically converts academic papers (PDF) into presentation slides
  • pdf2slides - Convert PDF files into editable slides with three lines of code
  • SlideDeck AI - Co-create PowerPoint presentations with Generative AI from documents or topics
  • AI Multi-Agent Presentation Builder - Azure Semantic Kernel multi-agent PPT generation reference

Video & Media Generation

  • Paper2Video - First benchmark for automatic video generation from scientific papers (NeurIPS 2025)
  • paper2video - Transform arXiv research papers into engaging presentations and YouTube-ready videos

Website & Interactive Content Generation

  • Paper2All - AI-powered pipeline converting papers into interactive websites, posters, and multimedia presentations with "Let's Make Your Paper Alive!" philosophy

Chart & Visualization Generation

Note: For comprehensive chart understanding and code generation tools, see the 📊 Chart Understanding & Generation section.


📊 Chart Understanding & Generation

Chart-to-Code & Reproducibility

Scientific Visualization Tools

  • Chat2Plot - Secure text-to-visualization through standardized chart specifications
  • AutoViz - Automated data visualization with minimal code
  • PlotlyAI - AI-powered data visualization and dashboard creation

🔄 Paper-to-Code & Reproducibility

Automated Code Generation

  • AutoP2C - LLM agent framework generating runnable repositories from academic papers
  • ResearchCodeAgent - Multi-agent system for automated codification of research methodologies
  • ToolMaker - Convert papers with code into callable agent tools

Experiment Automation

  • BioProBench - Comprehensive benchmark for automatic evaluation of LLMs on biological protocols and procedural understanding
  • Alhazen - Extract experimental metadata and protocol information from scientific documents

📋 Scientific Documentation & Parsing

High-Performance Document Processing

  • MinerU (2024/2025) - SOTA multimodal document parsing with 1.2B parameters outperforming GPT-4o, converts PDFs to LLM-ready Markdown/JSON
  • PDF-Extract-Kit (2024) - Comprehensive toolkit for high-quality PDF content extraction with layout detection, formula recognition, and OCR
  • Docling (IBM, AAAI 2025) - Multi-format (PDF/DOCX/PPTX/HTML/Images) → structured data (Markdown/JSON) with layout reconstruction, table/formula recovery
  • Nougat (Meta AI) - Neural optical understanding for academic documents, transforms scientific PDFs to Markdown with mathematical formula support
  • PaddleOCR 3.0 (2024/2025) - Advanced OCR with PP-StructureV3 document parsing, 13% accuracy improvement, supports 80+ languages
  • Unstructured - Production-grade ETL for transforming complex documents into structured formats, with open-source API
  • Marker - High-accuracy PDF→Markdown/JSON/HTML conversion, specialized for tables/formulas/code blocks with benchmark scripts
  • S2ORC doc2json (AllenAI) - Large-scale PDF/LaTeX/JATS parsing to standardized JSON for millions of papers
  • GROBID - Machine learning software for extracting structured metadata from scholarly documents
  • Science-Parse / SPv2 (AllenAI) - Parse scientific papers to structured fields (title/author/sections/references)

Production Pipelines & Data Preparation

Figure & Table Extraction

  • PDFFigures2 - Extract figures, tables, captions, and section titles from scholarly PDFs
  • TableBank - Large-scale table detection and recognition dataset with pre-trained models

Scientific Literature RAG & Analysis

  • PaperQA2 - High-accuracy RAG for scientific PDFs with citation support, agentic RAG, and contradiction detection
  • OpenScholar - Retrieval-augmented LM synthesizing scientific literature from 45M papers with human-expert-level citation accuracy, outperforming GPT-4o by 5% on ScholarQABench (Nature 2026, UW & Ai2)
  • paper-reviewer - Generate comprehensive reviews from arXiv papers and convert to blog posts

🧰 Research Workbench & Plugins

Interactive Research Environments

Literature Management Plugins

Scientific Writing & Collaboration


🕸 Knowledge Extraction & Scholarly KGs

Knowledge Graph Construction

  • iText2KG - Incremental knowledge graph construction using LLMs with entity extraction and Neo4j visualization
  • GraphGen - Knowledge graph-guided synthetic data generation for LLM fine-tuning, achieving strong performance on scientific QA (GPQA-Diamond) and math reasoning (AIME)
  • KoPA - Structure-aware prefix adaptation for integrating LLMs with knowledge graphs (ACM MM 2024)
  • Scholarly KGQA - LLM-powered question answering over scholarly knowledge graphs (arXiv paper)

Knowledge Graph Resources

  • Awesome-LLM-KG - Comprehensive collection of papers on unifying LLMs and knowledge graphs

🤖 Research Agents & Autonomous Workflows

Autonomous Research Systems (2024-2025 Breakthroughs)

  • The AI Scientist v1 (2024) - First fully autonomous research system: hypothesis→experiment→writing→review simulation
  • The AI Scientist v2 (2025) - Enhanced with Agentic Tree Search and reduced template dependency; produced the first fully AI-generated paper accepted at a peer-reviewed workshop
  • DeepScientist - First system progressively surpassing human SOTA on frontier AI tasks (183.7%, 1.9%, 7.9% improvements), month-long autonomous discovery with 20,000+ GPU hours
  • Kosmos - Extended autonomy AI scientist with 200 parallel agent rollouts, 42K lines of code execution, 1.5K papers analyzed per run, achieving 79.4% accuracy and 7 scientific discoveries (Edison Scientific)
  • AlphaResearch - Autonomous algorithm discovery combining evolutionary search with peer-review reward models, achieving best-known performance on circle packing problems
  • AI-Researcher - Autonomous pipeline from literature review→hypothesis→algorithm implementation→publication-level writing with Scientist-Bench evaluation
  • Agent Laboratory - Multi-agent workflows for complete research cycles with AgentRxiv for cumulative discovery
  • InternAgent - Closed-loop multi-agent system from hypothesis to verification across 12 scientific tasks, #1 on MLE-Bench (36.44%)
  • freephdlabor - First fully customizable open-source multiagent framework automating complete research lifecycle from idea conception to LaTeX papers with dynamic workflows
  • ToolUniverse - Democratizing AI scientists by transforming any LLM into research systems with 600+ scientific tools (Harvard MIMS)
  • LabClaw - Skill operating layer for biomedical AI agents with 211 production-ready SKILL.md files across 7 domains (including biology, pharmacology, medicine, data science, and literature search), enabling modular dry-lab reasoning and protocol composition for Stanford LabOS-compatible agents
  • Robin - FutureHouse's end-to-end multi-agent system for scientific discovery, orchestrating literature search (Crow/Falcon) and data analysis (Finch) agents; identified ripasudil as a novel therapeutic candidate for dry AMD, presented as the first AI-generated drug discovery (2025)
  • Aviary - Language agent gymnasium for challenging scientific tasks including DNA manipulation, literature search, and protein engineering
  • Curie - Automated and rigorous experiments using AI agents for scientific discovery
  • POPPER - Automated hypothesis testing with agentic sequential falsifications
  • autoresearch - Andrej Karpathy's autonomous LLM research framework: AI agent runs overnight experiments on a real training setup, auto-editing code→5min training→evaluation in a loop, ~100 experiments per night on a single GPU
  • UniScientist - Universal scientific research intelligence covering 50+ disciplines, repositioning LLMs as cross-disciplinary generators with human experts as verifiers; 30B model outperforms Claude Opus and GPT on 5 research benchmarks

Evaluation & Benchmarking

  • ScienceAgentBench (ICLR 2025) - 102 executable tasks from 44 peer-reviewed papers across 4 disciplines with containerized evaluation
  • BuildArena - First physics-aligned interactive benchmark for LLM agents in engineering construction, designing rockets/cars/bridges in physics simulator with 3D spatial geometry library
  • SciTrust (2024) - Trustworthiness evaluation framework for scientific LLMs (truthfulness, hallucination, sycophancy)
  • SciCode - Research coding benchmark curated by scientists with 338 subproblems across 16 subdomains (physics, math, materials, biology, chemistry), evaluating LLMs on realistic scientific programming tasks with gold-standard solutions (NeurIPS 2024)
  • SciBench - College-level scientific problem-solving evaluation across multiple domains

Academic Review & Evaluation

  • AgentReview - LLM agents simulating academic peer review ecosystems
  • LLM-Peer-Review - Web application for LLM-assisted manuscript review and annotation

Domain-Specific Research Agents

  • Aletheia - Google DeepMind's autonomous mathematics research agent powered by Gemini Deep Think, autonomously solving 4 open problems from 700 Erdős conjectures and generating complete research papers without human intervention (February 2026)
  • AlphaGeometry - DeepMind's Olympiad-level geometry theorem prover combining neural language model with symbolic deduction engine, AlphaGeometry2 solves 84% of IMO geometry problems (42/50) at gold-medalist level (Nature 2024)
  • Goedel-Prover-V2 - Strongest open-source automated theorem prover in Lean 4, 8B model matches DeepSeek-Prover-V2-671B at 84.6% MiniF2F, 32B model achieves 90.4% with self-correction, using scaffolded data synthesis and verifier-guided proof refinement (Princeton, 2025)
  • BioDiscoveryAgent - AI agent for biological discovery and research automation
  • MOOSE - Large Language Models for automated open-domain scientific hypotheses discovery (ACL 2024, ICML Best Poster)
  • ChemCrow - LLM agents for chemistry research with tool integration
  • Coscientist - Autonomous chemical experiment planning and execution

🏷 Data Labeling & Curation

Weak Supervision & Auto-Labeling

  • Snorkel - Programmatic data labeling and weak supervision for scientific datasets
  • PandasAI - Conversational data analysis and visualization using natural language

⚗ Scientific Machine Learning

Neural Differential Equations

Physics-Informed Neural Networks

  • DeepXDE - Deep learning library for solving PDEs
  • Lang-PINN - LLM-driven multi-agent system that builds trainable PINNs from natural language task descriptions, achieving 3-5 orders of magnitude MSE reduction and 50%+ execution success improvement (ICLR 2026)
  • PINNs - Physics-informed neural networks
  • NVIDIA PhysicsNeMo - Open-source framework for building physics-ML models at scale (renamed from Modulus, 2025)
  • PINA - Physics-Informed Neural networks for Advanced modeling in PyTorch
  • SciANN - Keras-based scientific neural networks
  • NeuralPDE.jl - Physics-informed neural networks in Julia

Neural Operators & Model Discovery

  • DeepONet - Learning nonlinear operators
  • PySINDy - Sparse identification of nonlinear dynamics
  • PySR - High-performance symbolic regression for discovering interpretable scientific equations from data, multi-population evolutionary search with Python/Julia backend, widely used in physics and astronomy (Cambridge, NeurIPS 2023)
  • LLM-SR - Scientific equation discovery and symbolic regression using LLMs, combining code generation with evolutionary search (ICLR 2025 Oral)
  • Fourier Neural Operator - Learning operators in Fourier space

📖 Papers & Reviews

Foundational Papers

📊 Comprehensive Surveys & Reviews (2024-2025)

AI for Scientific Research

Scientific Large Language Models

Scientific Machine Learning

Uncertainty Quantification

Automation & Self-Driving Laboratories

Policy & Strategic Perspectives

  • Artificial Intelligence for Science (CSIRO 2022) - Landmark report analyzing AI adoption across 98% of scientific fields over 60 years
  • AI for Science 2025 (Fudan University & Nature 2025) - Comprehensive report on AI's transformative impact across 7 scientific fields, 28 research directions, and 90+ challenges
  • AI in science evidence review (European Scientific Advice 2024) - Policy-focused evidence review on AI's impact in research

🚀 AI Scientist & Autonomous Research (2024-2025 Breakthroughs)

Recent Advances & Domain Applications

📈 Evaluation & Benchmarking


🔬 Domain-Specific Applications

🧬 Biology & Medicine

Protein & Drug Discovery

  • AlphaFold - Protein structure prediction
  • ColabFold (2025 Updates) - Accessible AlphaFold/ESMFold implementation with AF3 JSON export and database updates
  • OpenFold3 - Fully open-source (Apache 2.0) biomolecular structure prediction reproducing AlphaFold3, free for academic and commercial use (Columbia AlQuraishi Lab & OpenFold Consortium, 2025)
  • Protenix - Trainable PyTorch reproduction of AlphaFold 3
  • Chai-1 - Multi-modal foundation model for biomolecular structure prediction (proteins, small molecules, DNA, RNA, glycans) achieving SOTA across benchmarks, with optional MSA/template support (Chai Discovery, 2024)
  • Boltz - First fully open-source model achieving AlphaFold3-level accuracy with 1000x faster binding affinity prediction (MIT)
  • BoltzGen - De novo protein binder design via generative model, achieving nanomolar binding for 66% of novel targets tested (MIT, 2025)
  • xfold - Democratizing AlphaFold3: PyTorch reimplementation to accelerate protein structure prediction research
  • MegaFold - Cross-platform system optimizations for accelerating AlphaFold3 training with 1.73x speedup and 1.23x memory reduction
  • Graphormer - General-purpose deep learning backbone for molecular modeling
  • DiffDock - Diffusion-based molecular docking achieving SOTA blind docking performance, treating ligand pose prediction as generative diffusion over SE(3), with DiffDock-L update for improved generalization (MIT CSAIL, ICLR 2023)
  • targetdiff - 3D Equivariant Diffusion for Target-Aware Molecule Generation (ICLR 2023)
  • ReQFlow - Rectified Quaternion Flow for efficient protein backbone generation, 37× faster than RFdiffusion with 0.972 designability (ICML 2025)
  • BioEmu - Microsoft's generative model for sampling protein equilibrium conformations 100,000× faster than MD simulations, predicting domain motions, local unfolding and cryptic binding pockets on a single GPU (Science 2025)
  • ProteinMPNN - Deep learning-based protein sequence design (inverse folding) from backbone structures, achieving 52.4% sequence recovery vs 32.9% for Rosetta, core tool in modern protein design pipelines (Baker Lab, Science 2022)
  • RFdiffusion3 - Latest RFdiffusion for protein structure design with 10× speedup and atom-level precision (December 2025)
  • IgGM - Generative foundation model for functional antibody and nanobody design, supporting de novo generation, affinity maturation, inverse design, structure prediction, and humanization (Tencent AI4S, ICLR 2025)
  • DrugAssist - LLM-based molecular optimization tool
  • mint - Learning the language of protein-protein interactions
  • Mol-Instructions - Large-scale biomolecular instruction dataset for chemistry/biology LLMs (ICLR 2024)
  • Uni-Mol - Universal 3D molecular pretraining framework with 209M conformations, scaling to 1.1B parameters (Uni-Mol2) on 800M conformations for molecular property prediction, docking, and quantum chemistry (ICLR 2023, NeurIPS 2024)
  • ChemBERTa - Chemical language model
  • DeepChem - Machine learning for chemistry
  • DeepMol - Unified ML/DL framework for drug discovery workflows, integrating RDKit, DeepChem, and scikit-learn with SHAP explainability
  • RDKit - Cheminformatics toolkit
  • ESM3 - 98B-parameter frontier generative model jointly reasoning over protein sequence, structure, and function, trained on 2.78 billion proteins; generated a novel fluorescent protein (esmGFP) with only 58% sequence identity to known GFPs (EvolutionaryScale, 2024)
  • ESMFold - Protein structure prediction from ESM models

Genomics & Bioinformatics

  • RhoFold+ - End-to-end RNA 3D structure prediction using RNA language model pretrained on 23.7M sequences, outperforming existing methods and human expert groups on RNA-Puzzles and CASP15 (Nature Methods 2024)
  • Evo 2 - Arc Institute's 40B-parameter genome foundation model trained on 9 trillion nucleotides from all domains of life, supporting 1M base pair context for generalist DNA/RNA/protein prediction and design (Nature 2026)
  • LucaOne - Generalized biological foundation model with unified nucleic acid and protein language, integrating DNA/RNA/protein sequences (Nature Machine Intelligence 2025)
  • Geneformer - Single-cell transformer foundation model pretrained on 104M human transcriptomes via masked gene prediction, enabling transfer learning for cell type classification, gene network analysis, and in silico perturbation with limited labeled data (Nature 2023, V2 2024)
  • scFoundation - 100M-parameter foundation model pretrained on 50M+ human single-cell transcriptomes covering ~20,000 genes, achieving SOTA on gene expression enhancement, drug response and perturbation prediction (Nature Methods 2024)
  • scvi-tools - Deep probabilistic framework for single-cell and spatial omics analysis, integrating scVI, scANVI, totalVI and other VAE-based models for batch correction, cell annotation, multi-omics integration, and RNA velocity (scverse/NumFOCUS, Nature Methods 2018/2024)
  • GEARS - Geometric deep learning model predicting transcriptional outcomes of novel single- and multi-gene perturbations using gene–gene knowledge graphs, 40% higher precision than prior methods on combinatorial perturbation prediction (Stanford, Nature Biotechnology 2024)
  • scGPT - Single-cell analysis with transformers
  • Cell2Sentence - Teaching Large Language Models the Language of Biology through single-cell transcriptomics (ICML 2024)
  • ChatSpatial - MCP server enabling spatial transcriptomics analysis via natural language, integrating 60+ methods including SpaGCN, Cell2location, LIANA+, CellRank for Visium, Xenium, MERFISH platforms
  • Enformer - Gene expression prediction
  • DNABERT - DNA sequence analysis
  • scBERT - Single-cell BERT for gene expression
  • GenePT - Generative pre-training for genomics
  • DNA Claude Analysis - Interactive personal genome analysis toolkit using Claude Code and Python. Parses raw genotyping data from consumer DNA services and analyzes SNPs across 17 categories including health risks, pharmacogenomics, ancestry, and nutrition, with a terminal-style HTML dashboard.

Medical AI & Clinical Applications

  • Cellpose - Generalist deep learning algorithm for cell and nucleus segmentation across diverse image types, with human-in-the-loop training (2.0) and one-click image restoration (3.0), 70K+ training objects (Nature Methods 2021/2022/2025)
  • MedSAM - Universal medical image segmentation foundation model trained on 1.57M image-mask pairs across 10 imaging modalities and 30+ cancer types, with MedSAM2 extending to 3D and video segmentation (Nature Communications 2024)
  • MedAgents - Multi-disciplinary collaboration framework for zero-shot medical reasoning using role-playing LLM agents (ACL 2024)
  • MedAgentGym - Scalable agentic training environment for code-centric reasoning in biomedical data science

⚛ Chemistry & Materials

LLM for Chemistry

  • LLM4Chemistry - Curated paper list about LLMs for chemistry covering fine-tuning, reasoning, multi-modal models, agents, and benchmarks (COLING 2025)

Materials Discovery

  • GNoME - DeepMind's graph neural network for materials exploration, discovering 2.2M new crystal structures (380K most stable) equivalent to 800 years of traditional research, with 520K+ materials dataset open-sourced (Nature 2023)
  • FAIRChem (OMat24) - Meta's comprehensive ML ecosystem for materials/chemistry with 118M+ DFT calculations, EquiformerV2 models achieving top Matbench Discovery performance
  • MACE - Machine learning interatomic potentials
  • CHGNet - Universal pretrained neural network potential with charge and magnetic moment awareness, trained on 1.5M+ Materials Project inorganic structures for charge-informed molecular dynamics and phase diagram prediction (Berkeley, Nature Machine Intelligence 2023 Cover)
  • MatterSim - Deep learning atomistic model across elements, temperatures, and pressures
  • Crystal Graph CNNs - Crystal property prediction
  • MatBench - Materials informatics benchmark
  • Best of Atomistic Machine Learning - Curated list of atomistic ML projects for materials science

Chemical Synthesis

  • AiZynthFinder - AstraZeneca's industrial-grade retrosynthetic planning tool using MCTS to recursively decompose molecules into purchasable precursors, with multi-step route scoring and support for custom one-step models (v4.0, 2024)
  • Molecular Transformers - AI for chemical reaction prediction and synthesis planning

🌌 Physics & Astronomy

Machine Learning for Physics

  • FermiNet - DeepMind's neural network for ab-initio quantum chemistry, directly solving the many-electron Schrödinger equation via variational Monte Carlo with antisymmetric wavefunctions, extended to excited states (Phys. Rev. Research 2020, Science 2024)
  • JAX-MD - Molecular dynamics in JAX
  • Neural ODEs - Differential equations with neural networks
  • Physics-Informed Neural Networks - Physics-constrained ML
  • EquiformerV2 - Improved equivariant Transformer for 3D atomic graphs (ICLR 2024)
  • Equiformer - Equivariant graph attention Transformer (ICLR 2023)

Astronomy & Astrophysics

🌍 Earth & Climate Science

Climate Modeling

  • GenCast - Google DeepMind's diffusion-based ensemble weather forecasting model at 0.25° resolution, outperforming ECMWF ENS on 97.2% of targets up to 15 days ahead, with open-source code and weights (Nature 2024)
  • Aurora - Microsoft's foundation model for the Earth system supporting weather, air pollution, and ocean wave forecasting at multiple resolutions, trained on 1M+ hours of diverse atmospheric data (Nature 2025)
  • ClimaX - First foundation model for weather and climate by Microsoft, Vision Transformer-based architecture trained on heterogeneous datasets (ICML 2023)
  • NeuralGCM - Google Research's hybrid ML/physics atmospheric model combining learned dynamics with physical constraints, outperforming traditional models on 2-15 day forecasts and 40-year climate simulation, developed with ECMWF (Nature 2024)
  • NVIDIA Earth-2 - World's first fully open, accelerated weather AI software stack with Medium Range forecasting and Nowcasting models using generative AI (January 2026)
  • Pangu-Weather - Huawei's 3D high-resolution global weather forecast model at 0.25° resolution, first AI method to comprehensively outperform traditional NWP across all variables and lead times, integrated into ECMWF operational forecasts (Nature 2023)
  • Prithvi WxC - IBM-NASA open-source 2.3B parameter weather and climate foundation model trained on 160 MERRA-2 variables, runs on desktop with fine-tuned variants for climate downscaling and gravity wave parameterization
  • ClimateBench - Climate data benchmark for ML models
  • WeatherBench - Weather prediction benchmark
  • WeatherGFT - Physics-AI hybrid modeling for fine-grained weather forecasting (NeurIPS 2024)
  • Awesome Large Weather Models - Curated list of large weather models for AI Earth science
  • TerraTorch - Python toolkit for fine-tuning geospatial foundation models
  • Earth-Agent - LLM agent framework for Earth Observation with 104 specialized tools across 5 functional kits
  • AI for Earth - Microsoft's environmental AI

🌾 Agriculture & Ecology

Agricultural AI

  • PlantNet - Plant identification using AI and citizen science
  • AgML - Agricultural machine learning platform

Ecological Modeling


🤖 Foundation Models for Science

General Science Models

  • Galactica - Large language model for science
  • Llemma - Open language model for mathematics (7B/34B) trained on Proof-Pile-2, outperforming Minerva at equal scale on MATH benchmark, with tool use and formal theorem proving in Lean without finetuning (EleutherAI, ICLR 2024)
  • MinervaAI - Mathematical reasoning
  • PaLM-2 - Scientific reasoning capabilities

Domain-Specific Models

  • ESM - Protein language models
  • BioNeMo Framework - NVIDIA's open-source platform for building and adapting biological AI models at scale, bundling ESM-2, Geneformer, MolMIM and DNA embedding models with recipes for single-GPU to multi-node training (2025)
  • ChemGPT - Chemistry-focused language model
  • BioGPT - Biomedical text generation

📈 Datasets & Benchmarks

Multidisciplinary

Biology & Medicine

  • TDC - Therapeutics Data Commons: 66 AI-ready datasets across 22 drug discovery tasks with 29 leaderboards, covering target identification, molecular generation, ADMET prediction, and clinical trial outcomes (Harvard MIMS, NeurIPS 2021/2024)
  • Protein Data Bank - Protein structures
  • ChEMBL - Chemical bioactivity data
  • Human Protein Atlas - Protein expression data
  • Chinese Medical Dataset - Comprehensive collection of Chinese medical datasets for AI research

Chemistry & Materials

Physics


💻 Computing Frameworks

Machine Learning

  • PyTorch - Deep learning framework
  • JAX - High-performance ML research
  • TensorFlow - End-to-end ML platform

Scientific Computing

Scientific Machine Learning Frameworks

  • SciML - Scientific machine learning ecosystem
  • DifferentialEquations.jl - Multi-language suite for high-performance differential equation solving and scientific machine learning (3.0k+ stars)
  • ModelingToolkit.jl - Acausal modeling framework for automatically parallelized scientific machine learning (1.5k+ stars)
  • SciMLBenchmarks.jl - Scientific machine learning benchmarks & differential equation solvers
  • NeuralPDE.jl - Physics-informed neural networks (PINNs) for solving partial differential equations (1.1k+ stars)
  • DiffEqFlux.jl - Neural ordinary differential equations with O(1) backprop and GPU support (900+ stars)
  • Optimization.jl - Unified interface for local, global, gradient-based and derivative-free optimization (800+ stars)
  • PaddleScience - SDK & library for AI-driven scientific computing applications
  • Flux.jl - Machine learning in Julia

Specialized Frameworks

  • MDAnalysis - Molecular dynamics analysis
  • MDtrajNet - Neural network foundation model that directly generates MD trajectories bypassing force calculations, accelerating simulations by up to 100× with equivariant Transformer architecture (2025)
  • ASE - Atomic Simulation Environment for materials modeling
  • PyMC - Probabilistic programming
  • OpenMM - High-performance molecular simulation toolkit

🎓 Educational Resources

Courses & Tutorials

Open Access Educational Materials

📋 Paper Collections & Repositories

YouTube Channels


🏛 Research Communities

Conferences

Organizations

Online Communities


📚 Related Awesome Lists

This project builds upon and complements several excellent resources:

🎯 Specialized Collections

📊 Paper & Research Collections

🌟 Key Insights from These Collections

  • Current Focus: Shift from tool-level assistance to autonomous scientific agents
  • Emerging Trends: Multi-modal scientific models, self-improving research systems
  • Research Gaps: Evaluation frameworks, ethical governance, human-AI collaboration
  • Future Directions: Fully autonomous discovery cycles, robotic lab integration

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

How to Contribute

  1. Fork this repository
  2. Add your resource in the appropriate section
  3. Ensure the format matches existing entries
  4. Submit a pull request with a clear description

Contribution Guidelines

  • Ensure the resource is actively maintained
  • Include a brief, clear description
  • Check for duplicates before adding
  • Use proper markdown formatting
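
The duplicate check in the guidelines above can be scripted before opening a pull request. The following is a minimal sketch, assuming entries follow the `Name - description` pattern used throughout this list and start with a `•`, `-`, or `*` bullet. The function name `find_duplicate_entries` and the sample text are hypothetical and not part of this repository.

```python
import re
from collections import defaultdict

# Matches a list entry such as "  • Snorkel - Programmatic data labeling".
# The accepted bullet characters and the " - " separator are assumptions
# based on how existing entries in this list are written.
ENTRY_RE = re.compile(r"^\s*[•\-*]\s+(?P<name>.+?)\s+-\s+(?P<desc>.+)$")


def find_duplicate_entries(text):
    """Return {normalized name: [line numbers]} for names listed more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = ENTRY_RE.match(line)
        if match:
            # Compare names case-insensitively so "snorkel" and "Snorkel" collide.
            seen[match.group("name").strip().lower()].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


# Hypothetical sample input; in practice, read the README file instead.
sample = """\
  • Snorkel - Programmatic data labeling and weak supervision
  • Label Studio - Multi-type data labeling and annotation tool
  • Snorkel - Programmatic data labeling for scientific datasets
"""

duplicates = find_duplicate_entries(sample)
print(duplicates)
```

Some tools are cross-listed on purpose under more than one section, so a reported name is a prompt for review rather than an error. In the raw markdown, the captured name will include the link syntax (for example `[Name](url)`), which still works for exact-match comparison.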

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

Special thanks to all researchers and developers pushing the boundaries of AI for Science. This list is inspired by the awesome community and the transformative potential of AI in scientific discovery.

Star ⭐ this repository if you find it helpful!


Last updated: January 2026 - Enhanced with 2025-2026 breakthroughs in autonomous research, equation discovery, and scientific foundation models
