"Workspace" monorepo plan #1773

@ryan-williams

(Versioned write-up here, synced via ghpr)

Status

  • #1690 (step 1): Init workspace, move marin to lib/marin/
  • #1723 (step 2): Ingest Levanter as lib/levanter/
  • 🚧 Step 3: Ingest Haliax as lib/haliax/
flowchart TB
    subgraph " "
        experiments["<b>experiments</b><br/><small>step 1</small>"]
        data_browser["<b>data_browser</b><br/><small>independent</small>"]
    end

    subgraph "lib/"
        marin["<b>marin ✅</b><br/><small>step 1</small>"]
        levanter["<b>levanter ✅</b><br/><small>step 2</small>"]
        haliax["<b>haliax 🚧</b><br/><small>step 3</small>"]
        thalas["<b>thalas 📋</b><br/><small>step 4</small>"]
        zephyr["<b>zephyr ✅</b><br/><small><a href='https://github.com/marin-community/marin/pull/1646'>#1646</a></small>"]
    end

    experiments --> marin
    experiments --> levanter
    experiments --> haliax
    experiments --> zephyr

    marin --> levanter
    marin --> zephyr

    levanter --> haliax

    style experiments fill:#d4edda,color:#000
    style marin fill:#d4edda,color:#000
    style levanter fill:#d4edda,color:#000
    style zephyr fill:#d4edda,color:#000
    style haliax fill:#fff3cd,color:#000
    style thalas fill:#f8d7da,color:#000
    style data_browser fill:#e2e3e5,color:#000

    classDef completed fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
    classDef inProgress fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000
    classDef planned fill:#f8d7da,stroke:#dc3545,stroke-width:2px,stroke-dasharray: 5 5,color:#000
    classDef independent fill:#e2e3e5,stroke:#6c757d,stroke-width:2px,color:#000

Legend:

  • ✅ Completed & merged
  • 🚧 In progress
  • 📋 Planned

(data_browser stays independent, not a workspace member)

Problem

The Marin and Levanter repos contain components that depend on one another in ways the current repo split doesn't reflect well, which makes co-development awkward.

Proposed solution: uv workspaces

"Workspaces" provide a way to colocate distinct libraries in one repo. Each library can still be published and depended on independently (by external users), while inside the repo the libraries naturally depend on each other's HEAD commits and can easily be updated in lockstep (the common case for internal co-development).

Implementation Plan

Below is a rough sequence of steps to get there, with the goal of minimizing disruption along the way.

Workspace migration scripts provide hermetic replay of the steps below on top of arbitrary Marin/Levanter main commits. This helps avoid conflicts while developing, and the scripts are more legible for review than the huge PR patches they generate.

Step 1: init workspace, marin member (#1690)

 marin/
   pyproject.toml  # Workspace root (experiments/ becomes part of the root member)
   experiments/    # Becomes part of workspace root member
-  src/            # Move to lib/marin/
+  lib/
+    marin/
+      pyproject.toml
+      src/

Note: data_browser stays independent (separate deps/venv, excluded from workspace).
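
A minimal sketch of what the root pyproject.toml could look like after this step, assuming uv's `[tool.uv.workspace]` and `[tool.uv.sources]` tables. The project name `marin-root`, the version, and the `lib/*` glob are illustrative assumptions, not taken from #1690:

```toml
# pyproject.toml at the repo root (illustrative sketch, not the actual file)
[project]
name = "marin-root"          # hypothetical name for the root member that holds experiments/
version = "0.0.0"            # placeholder
dependencies = ["marin"]     # experiments/ imports from the marin member

[tool.uv.workspace]
# Every directory matched here must contain its own pyproject.toml.
members = ["lib/*"]
# data_browser/ is not matched by the glob above, so it stays outside the workspace.
# An explicit `exclude = [...]` entry is only needed for paths a members glob would match.

[tool.uv.sources]
# Resolve "marin" from lib/marin/ in this repo instead of from a package index.
marin = { workspace = true }
```

With a glob like this, later steps would only need to add a directory under lib/ for uv to treat it as a member.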

Step 2: Levanter member (#1723)

 marin/
   pyproject.toml
   experiments/
   lib/
     marin/
       pyproject.toml
       src/
+    levanter/
+      pyproject.toml
+      src/

Additional notes:

  • This will require namespacing the GitHub Actions workflow .ymls with levanter- and marin- prefixes, to tell them apart.
  • We'll also want to path-restrict the workflows so each only runs on changes to relevant paths.
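
For the dependency side of this step, a member can declare a sibling as a normal dependency and mark it as a workspace source. A hedged sketch for lib/marin/pyproject.toml (the distribution names and the version are assumptions; the actual file in #1723 may differ):

```toml
# lib/marin/pyproject.toml (illustrative excerpt)
[project]
name = "marin"               # assumed distribution name
version = "0.0.0"            # placeholder
dependencies = [
    "levanter",              # a plain requirement, as external users would see it
]

[tool.uv.sources]
# Inside the workspace, uv resolves levanter from lib/levanter/ (installed editable),
# so marin always builds against the checked-out Levanter code.
levanter = { workspace = true }
```

`[tool.uv.sources]` is uv-specific and is not part of the published package metadata, so external users of a published marin would still resolve `levanter` from a package index.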

Step 3: Haliax member

 marin/
   pyproject.toml
   experiments/
   lib/
+    haliax/
+      pyproject.toml
+      src/
     levanter/
       pyproject.toml
       src/
     marin/
       pyproject.toml
       src/
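
Once haliax is a member, both of its dependents in the diagram (experiments and levanter) can resolve it from the workspace. One possible way to express that, sketched here as an assumption rather than the plan's actual config, is a single sources table in the root pyproject.toml, since uv applies root-level sources to every member unless a member overrides them:

```toml
# Root pyproject.toml excerpt (illustrative; assumes distribution names match the lib/ directory names)
[tool.uv.sources]
marin = { workspace = true }
levanter = { workspace = true }
haliax = { workspace = true }   # new in this step: "haliax" requirements now resolve to lib/haliax/
```

The `dependencies` lists in the members stay unchanged under this approach; only the source that uv resolves them from moves into the repo.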

Step 4: "Thalas" (executor) member

Thalas was an attempt at factoring Marin's executor code out as a separate library (and repo).

The new plan is to make it a member of the workspace repo instead:

 marin/
   pyproject.toml
   experiments/
   lib/
     haliax/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
     marin/
       pyproject.toml
       src/
+    thalas/
+      pyproject.toml
+      src/

Step Omega: ray_tpu, rl, marin-core, marin-crawl, experiments packages

 marin/
   pyproject.toml
   experiments/
+    hero_runs/
+      pyproject.toml
+      expXXX_tootsie8b.py
+    compel/
+      pyproject.toml
+      expXXX_compel_v0.py
   lib/
-    marin/
-      pyproject.toml
-      src/
+    marin-core/
+      pyproject.toml
+      src/
     haliax/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
+    marin-crawl/
+      pyproject.toml
+      src/
+    ray_tpu/
+      pyproject.toml
+      src/
+    rl/
+      pyproject.toml
+      src/
     thalas/
       pyproject.toml
       src/
