
dobidu/layered_music_gen


musicgen — synthetic music dataset generator

Open In Colab — demo Open In Colab — neural Binder

A Python library and CLI for generating reproducible, fully-annotated synthetic music datasets for ML/MIR research. Each sample is a complete training example: mixed audio, per-layer stems, per-layer MIDI, and a rich JSON annotation with every musical and synthesis parameter.

Suitable for training models that learn music tagging, source separation, beat/tempo/downbeat detection, and audio→MIDI transcription at the 1k–10k sample scale.


Versions

| Version | What shipped |
|---|---|
| v0.8 | Soundfont license audit (default SF2 → FluidR3_GM, MIT); sharded layout (--shard-width 3) for 100k+ datasets; SF2 pool expansion (GeneralUserGS, MuseScoreGeneral, SGM-V2); rock genre; beat-pattern coverage for all time signatures across all genres; chord_type_hard_filter set for classical/pop; measures_per_part_override; create_genre.py wizard |
| v0.7 | Dataset export (musicgen export / stats); quality pipeline (musicgen score / filter); eval CLI (musicgen eval reliability / validity); neural tests |
| v0.6 | Asset downloader: musicgen download-assets bootstraps SF2 soundfonts and MIDI corpora from open sources; assets.toml registry with checksums and license metadata |
| v0.5 | ML-assisted generators: LSTM chord/melody models trained on a self-generated corpus; extract-sequences + train CLI |
| v0.4 | Sample composition: real audio samples alongside or substituting FluidSynth layers; standalone musicality package |
| v0.3 | Higher-order Markov: 2nd-order chords, two-layer quality gate, calibration harness |
| v0.2 | Genre system: 8 built-in genres, GenreSpec composition engine, extended chord vocabulary |
| v0.1 | Initial release: single-sample API, parallel batch, full CLI, determinism contract |

Quick start

# 1. Clone and install
git clone https://github.com/dobidu/layered_music_gen.git
cd layered_music_gen
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

# 2. Install FluidSynth (system dependency)
sudo apt-get install fluidsynth      # Ubuntu/Debian
# brew install fluidsynth            # macOS

# 3. Download default soundfonts  ← required before generating (FluidR3_GM, MIT, ~141 MB)
musicgen download-assets --sf2

# 4. Generate a dataset
musicgen generate --seed 42 --count 10 --out ./dataset

# 5. Explore the output
ls dataset/000000/
# mix.wav  sample.json  stems/  midi/

A generated dataset contains a manifest.jsonl log plus one directory per sample:

dataset/
├── manifest.jsonl
└── 000000/
    ├── sample.json      # full annotation (written last — completion sentinel)
    ├── mix.wav          # final mixed audio
    ├── stems/           # post-FX per-layer WAV stems
    │   ├── beat.wav
    │   ├── melody.wav
    │   ├── harmony.wav
    │   └── bassline.wav
    └── midi/            # per-layer MIDI (concatenated across all song parts)
        ├── beat.mid
        ├── melody.mid
        ├── harmony.mid
        └── bassline.mid

For 100k+ samples use --shard-width 3 (see Sharded layout).


Installation

Base (required)

pip install -e .

Requires Python ≥ 3.10 and FluidSynth on PATH:

sudo apt-get install fluidsynth   # Ubuntu/Debian
brew install fluidsynth           # macOS
# Windows: https://www.fluidsynth.org/

Optional extras

pip install -e '.[samples]'   # real audio sample composition (v0.4)
pip install -e '.[neural]'    # LSTM chord/melody backends (v0.5)
pip install -e '.[dev]'       # test suite

Asset management (v0.6)

musicgen ships an asset registry (assets.toml) and a downloader that bootstraps soundfonts and MIDI corpora from free/open sources. No manual file hunting.

Soundfonts

# Download default SF2 (FluidR3_GM, 141 MB, MIT — placed in all sf/<layer>/ dirs)
musicgen download-assets --sf2

# List all available sources
musicgen download-assets --list

# Download a specific opt-in source by name
musicgen download-assets --name GeneralUserGS    # ~31 MB, melody/harmony layers
musicgen download-assets --name TimGM6mb         # ~5.7 MB, all layers (GPL-2.0, see note)

# Re-download (overwrite)
musicgen download-assets --sf2 --force

Default SF2 sources (included in --sf2):

| Name | Size | Layers | License | Dataset-safe? |
|---|---|---|---|---|
| FluidR3_GM | ~141 MB | all | MIT | Yes |

Opt-in SF2 sources (use --name):

| Name | Size | Layers | License | Dataset-safe? |
|---|---|---|---|---|
| GeneralUserGS | ~31 MB | all | custom-permissive | Yes |
| MuseScoreGeneral | ~206 MB | all | MIT | Yes |
| SGM-V2 | ~236 MB | all | CC-BY 3.0 | Yes, with attribution |
| TimGM6mb | ~5.7 MB | all | GPL-2.0 | Only if the dataset is GPL-compatible |

GPL notice: Audio rendered with TimGM6mb is a derivative work under GPL-2.0. Do not use it when building datasets intended for public distribution unless your dataset license is GPL-compatible. Use FluidR3_GM (default, MIT) instead. See LICENSES.soundfonts.md for full details.

MIDI corpora

Downloaded MIDI files land in midi_assets/<layer>/ and can be indexed with musicgen index-midi.

# Download default MIDI corpora (GrooveMIDI + FreeMidiChords)
musicgen download-assets --midi

# Opt-in large corpus
musicgen download-assets --name LakhCleanMIDI    # 223 MB, CC-BY-4.0

Default MIDI sources (included in --midi):

| Name | Files | Layers | License |
|---|---|---|---|
| GrooveMIDI | 1,150 drum files | beat | CC-BY-4.0 |
| FreeMidiChords | 11,400 progressions | harmony | MIT |

Opt-in MIDI sources:

| Name | Contents | Layers | License |
|---|---|---|---|
| FannonChords | ~400 chord shapes | harmony | MIT |
| FannonScales | ~300 scale patterns | melody | MIT |
| LakhCleanMIDI | 17k multi-track files | melody, harmony, bassline | CC-BY-4.0 |

Auto-download

When auto_download_sf2 = true (default), musicgen silently downloads default SF2 sources on the first generate call if any layer pool is empty. Disable with:

export MUSICGEN_AUTO_DOWNLOAD_SF2=false

Adding custom sources

Add an entry to assets.toml. Fill in url, then run sha256sum <file> on the downloaded file to get the value for sha256:

[[sf2]]
name         = "MySF2"
description  = "My custom soundfont"
url          = "https://example.com/my.sf2"
sha256       = "abc123..."          # sha256sum my.sf2
filename     = "my.sf2"
size_hint_mb = 20
license      = "MIT"
license_url  = "https://example.com/LICENSE"
layers       = ["melody", "harmony"]
default      = false
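Once url and sha256 are filled in, a downloaded file can be checked against the registry value. The sketch below shows that check with only hashlib; sha256_of and verify_sha256 are hypothetical helpers, not part of musicgen's API, and the downloader's own verification may be implemented differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large SF2 files never load whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # same string `sha256sum <file>` prints


def verify_sha256(path: Path, expected: str) -> bool:
    """Hypothetical helper: compare a file with the sha256 field of an assets.toml entry."""
    return sha256_of(path) == expected.strip().lower()
```

Reading in fixed-size chunks keeps memory flat even for the ~236 MB SGM-V2 soundfont.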

CLI reference

musicgen generate

musicgen generate --seed SEED [options]

| Option | Default | Description |
|---|---|---|
| --seed / -s | (required) | Global RNG seed. |
| --count / -n | 1 | Number of samples to generate. |
| --out / -o | ./dataset | Output directory. |
| --workers / -w | all cores | Parallel workers. |
| --output-mode / -m | full | full, mix-only, stems-only, or midi-only |
| --genre / -g | None | Genre preset (repeatable for composition). |
| --genres-dir | <repo>/genres | Custom genres directory. |
| --min-musicality-score | (from Config) | See Musicality scoring. |
| --verbose / -v | | DEBUG logging. |
| --quiet / -q | | ERROR-only logging. |

Sample composition options (requires musicgen[samples]):

| Option | Default | Description |
|---|---|---|
| --sample-db PATH | None | SampleManager JSON library. Enables sample composition. |
| --sample-beat MODE | alongside | alongside, substitution, adlib, or off |
| --sample-bassline MODE | alongside | Same modes. |
| --sample-melody MODE | off | Same modes. |
| --sample-harmony MODE | off | Same modes. |
| --sample-gain DB | -3.0 | Gain applied to all sample layers. |
| --sample-min-score FLOAT | 0.0 | Minimum musicality score for sample selection. |

Neural backend options (requires musicgen[neural]):

| Option | Default | Description |
|---|---|---|
| --chord-backend | markov | markov or neural |
| --melody-backend | markov | markov or neural |
| --models-dir PATH | <repo>/models | Directory with trained .pt files. |

Layout options (v0.8):

| Option | Default | Description |
|---|---|---|
| --shard-width INT | 0 | Shard prefix length. 0 = flat; 3 = <root>/000/000042/ (for ≤ 1M samples). |

musicgen export / musicgen stats

Export a generated dataset to JSONL, CSV, Parquet, or HuggingFace AudioFolder.

musicgen export ./dataset --out dataset.jsonl           # JSONL (default)
musicgen export ./dataset --out dataset.csv --fmt csv
musicgen export ./dataset --out dataset.parquet --fmt parquet   # requires musicgen[export]
musicgen export ./dataset --out hf/ --fmt hf            # HuggingFace audiofolder + metadata.jsonl
musicgen export ./dataset --out dataset.jsonl --relative  # paths relative to dataset root

musicgen stats ./dataset                                # distribution summary (text)
musicgen stats ./dataset --fmt json                     # machine-readable

| Option | Default | Description |
|---|---|---|
| --out / -o | (required) | Output path (file for JSONL/CSV/Parquet, directory for HF). |
| --fmt | jsonl | jsonl, csv, parquet, or hf |
| --relative / -r | False | Store paths relative to the dataset root. |
| --no-midi | False | Omit MIDI path columns from the output. |

Both commands auto-detect flat and sharded dataset layouts.

musicgen score / musicgen filter

Post-hoc quality scoring and filtering for generated datasets.

# Score all unscored samples; updates sample.json atomically
musicgen score ./dataset

# Re-score even already-scored samples
musicgen score ./dataset --force

# Filter: move samples below threshold to a reject directory
musicgen filter ./dataset --min-score 0.6
musicgen filter ./dataset --min-score 0.6 --reject-dir ./bad
musicgen filter ./dataset --min-score 0.6 --dry-run     # preview without moving

| Option | Default | Description |
|---|---|---|
| --force | False | (score) Re-score even samples that already have a score. |
| --min-score FLOAT | (required) | (filter) Samples below this are moved to the reject directory. |
| --reject-dir PATH | <dataset>/rejected | (filter) Destination for rejected samples. |
| --dry-run | False | (filter) Report what would happen without moving anything. |
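The filter semantics above (threshold, reject directory, dry run) can be sketched in a few lines. This is an illustration only: filter_by_score is a hypothetical function, it handles the flat layout only, and it assumes unscored samples are kept rather than rejected.

```python
import json
import shutil
from pathlib import Path


def filter_by_score(dataset_root: str, min_score: float, reject_dir: str,
                    dry_run: bool = False) -> list[str]:
    """Move flat-layout samples with musicality_score < min_score; return their names."""
    rejected = []
    for d in sorted(Path(dataset_root).iterdir()):  # list is built before any move
        meta = d / "sample.json"
        if not d.is_dir() or not meta.is_file():
            continue  # only completed samples carry a score
        score = json.loads(meta.read_text(encoding="utf-8")).get("musicality_score")
        if score is not None and score < min_score:  # assumption: unscored samples stay
            rejected.append(d.name)
            if not dry_run:
                Path(reject_dir).mkdir(parents=True, exist_ok=True)
                shutil.move(str(d), str(Path(reject_dir) / d.name))
    return rejected
```

Because the directory listing is materialized first, a reject directory inside the dataset root (the documented default) is never scanned mid-move.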

musicgen eval

Measure scorer reliability and construct validity.

musicgen eval reliability --type det     # determinism: same input → same score
musicgen eval reliability --type rinv    # rank-invariance: monotone transform preserves order
musicgen eval reliability --type seed    # seed stability: score variance across seeds

musicgen eval validity --mode both       # AUROC ≥ 0.80 criterion (8 pathologies)
musicgen eval validity --mode good       # good-set statistics only

| Option | Default | Description |
|---|---|---|
| --type | det | (reliability) det, rinv, or seed |
| --n-samples | 10 | Samples per test. |
| --mode | both | (validity) good, bad, or both |
| --bootstrap-n | 200 | Bootstrap iterations for the AUROC CI. |
| --output PATH | None | Write the JSON result to a file. |

Both commands exit non-zero when the criterion fails.
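For readers unfamiliar with the validity criterion: AUROC is the probability that a randomly chosen good sample outscores a randomly chosen pathological one (ties count half), and a percentile bootstrap gives its confidence interval. The sketch below computes both; auroc and bootstrap_ci are hypothetical names, and musicgen eval may compute these statistics differently.

```python
import random
from typing import Sequence


def auroc(good: Sequence[float], bad: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(good score > bad score); ties count 0.5."""
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


def bootstrap_ci(good, bad, n=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC, resampling each group with replacement."""
    rng = random.Random(seed)  # seeded, so the interval is reproducible
    stats = sorted(
        auroc(rng.choices(good, k=len(good)), rng.choices(bad, k=len(bad)))
        for _ in range(n)
    )
    return stats[int(alpha / 2 * n)], stats[min(n - 1, int((1 - alpha / 2) * n))]
```

Under this reading, the check passes when auroc(good, bad) >= 0.80.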

musicgen download-assets

musicgen download-assets [--sf2] [--midi] [--all] [--name NAME] [--list] [--force]

| Option | Description |
|---|---|
| --sf2 | Download all default SF2 sources. |
| --midi | Download all default MIDI corpus sources. |
| --all | Download all default sources (SF2 + MIDI). |
| --name NAME | Download a specific source by name, whether or not it is marked default in assets.toml. |
| --list / -l | List all sources with URLs, layers, and license. |
| --force / -f | Re-download even if files already exist. |

musicgen samples build

Build a SampleManager library from a directory of audio files.

musicgen samples build --dir ./drums --output drums.json --musicality
musicgen samples build --dir ./loops --output loops.json \
    --category bass --genre electronic --recursive

| Option | Default | Description |
|---|---|---|
| --dir / -d | (required) | Audio files directory (WAV/FLAC/OGG/AIF). |
| --output / -o | (required) | Output SampleManager JSON path. |
| --category | auto | Force category: beat, bass, melody, or harmony. |
| --genre TAG | None | Genre tag applied to all samples (repeatable). |
| --mood TAG | None | Mood tag (repeatable). |
| --tag TAG | None | Extra tag (repeatable). |
| --musicality | False | Score samples with musicality.explain(). |
| --recursive / -r | False | Walk subdirectories. |

Category is inferred from filename keywords when --category is not set:

| Category | Keywords |
|---|---|
| beat | beat, kick, hat, snare, drum, perc, clap, hh, hihat |
| bass | bass, sub |
| harmony | pad, chord, harm, atmo, ambient, strings, vox, choir, keys, piano, organ |
| melody | lead, melody, lick, riff, synth, arp, melo, hook (default fallback) |
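Keyword inference of this kind is typically a case-insensitive substring match against the filename. A minimal sketch using the table above; infer_category is hypothetical, and the checking order (beat, bass, harmony, melody) is an assumption that decides ambiguous names such as bass_drum.wav.

```python
from pathlib import Path

# Keywords copied from the table above; the checking order is an assumption.
CATEGORY_KEYWORDS = [
    ("beat", ["beat", "kick", "hat", "snare", "drum", "perc", "clap", "hh", "hihat"]),
    ("bass", ["bass", "sub"]),
    ("harmony", ["pad", "chord", "harm", "atmo", "ambient", "strings", "vox",
                 "choir", "keys", "piano", "organ"]),
    ("melody", ["lead", "melody", "lick", "riff", "synth", "arp", "melo", "hook"]),
]


def infer_category(filename: str) -> str:
    """Return the first category whose keyword appears in the lowercased file stem."""
    stem = Path(filename).stem.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(kw in stem for kw in keywords):
            return category
    return "melody"  # documented default fallback
```

Plain substring matching is cheap but loose (for example, "hat" also matches "what"), which is why --category exists as an override.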

musicgen index-midi

Index generated MIDI files into a MidiManager database (requires midi_file_manager).

musicgen index-midi --dataset ./dataset --out ./midi_db.json [--csv ./midi_db.csv]

| Option | Default | Description |
|---|---|---|
| --dataset / -d | (required) | musicgen dataset root. |
| --out / -o | ./midi_db.json | Output database path. |
| --midi-dir | None | Base directory for relative MIDI paths in the db. |
| --csv | None | Also export a CSV. |

musicgen index-audio

Index generated WAV stems into a SampleManager database (requires audio_sample_manager).

musicgen index-audio --dataset ./dataset --out ./audio_db.json [--csv ./audio_db.csv]

| Option | Default | Description |
|---|---|---|
| --dataset / -d | (required) | musicgen dataset root. |
| --out / -o | ./audio_db.json | Output database path. |
| --samples-dir | None | Base directory for relative WAV paths in the db. |
| --csv | None | Also export a CSV. |

Other commands

musicgen list-genres [--genres-dir DIR]   # list available genre presets
musicgen calibrate [-v]                   # measure FluidSynth pre-roll offset (run once per machine)
musicgen clean --failed [--out DIR]       # remove partial sample directories

Genre system (v0.2)

musicgen ships 9 built-in genre presets that constrain generation parameters. Genres are composable — specify multiple to merge their constraints.

musicgen generate --seed 42 --genre jazz
musicgen generate --seed 42 --genre jazz --genre latin   # composition
musicgen list-genres

| Genre | Tempo | Swing | Time signatures | Style |
|---|---|---|---|---|
| jazz | 80–200 BPM | 0.60–0.75 | 4/4, 3/4, 6/8, 12/8 | Swing-heavy, maj7/m7 chords |
| hip-hop | 70–110 BPM | 0.50–0.65 | 4/4 dominant | Heavy kick-snare, minor-key bias |
| blues | 60–140 BPM | 0.55–0.70 | 4/4, 6/8, 12/8 | Dominant 7ths, shuffle feel |
| pop | 90–140 BPM | 0.50–0.55 | 4/4 dominant | Clean patterns, major-key bias |
| electronic | 110–160 BPM | 0.50–0.55 | 4/4 dominant | Four-on-the-floor, synth layers |
| latin | 90–140 BPM | 0.50–0.60 | 4/4, 3/4, 6/8 | Clave syncopation, conga patterns |
| reggae | 60–90 BPM | 0.50–0.58 | 4/4 dominant | One-drop + steppers patterns, bass-heavy |
| classical | 50–160 BPM | 0.50–0.52 | 4/4, 3/4, 2/4, 5/4, 6/8, 12/8 | Wide dynamics, orchestral timbres |
| rock | 70–180 BPM | 0.50–0.57 | 4/4, 3/4, 6/8, 12/8 | Strong backbeat, power chords, guitar-driven |

Genre constraints applied per parameter type:

  • Tempo/swing — hard bounds: drawn value clamped to [min, max]
  • Time signature — soft weights: shifts draw probabilities
  • Key/scale, chord type, inversions — soft weight dicts + optional hard filter
  • Chord type hard filter — when set, restricts the allowed chord vocabulary entirely (e.g. classical blocks sus/add9; jazz blocks plain triads)
  • Drum patterns — per-time-sig patterns_*.txt files; each genre ships patterns for every time sig it uses
  • FX profile — multiplies effect probabilities
  • Soundfonts — per-layer tag overrides when SoundfontManager is active
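The difference between hard bounds and soft weights can be illustrated with two small functions. This is a sketch, not GenreSpec's code: clamp and draw_time_signature are hypothetical, and combining base and genre weights by multiplication is an assumption made for illustration.

```python
import random


def clamp(value: float, lo: float, hi: float) -> float:
    """Hard bound: pull a drawn value back into [lo, hi]."""
    return max(lo, min(hi, value))


def draw_time_signature(rng: random.Random, base: dict, genre_weights: dict) -> str:
    """Soft weights: rescale base probabilities by genre weights, then sample.

    Multiplying the two weight sets is an assumption; a genre weight of 0.0
    removes a signature, and a missing entry leaves its base weight unchanged.
    """
    sigs = sorted(base)  # fixed order so a given seed always yields the same draw
    weights = [base[s] * genre_weights.get(s, 1.0) for s in sigs]
    return rng.choices(sigs, weights=weights, k=1)[0]
```

For example, with jazz's 80–200 BPM bounds, clamp(220, 80, 200) returns 200, while a time-signature weight only shifts odds unless it is zero.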

Genre wizard

create_genre.py is an interactive terminal wizard for authoring new genre configurations:

python create_genre.py                   # guided wizard, start fresh
python create_genre.py --from rock       # clone an existing genre as defaults
python create_genre.py --list            # list all genres with their files
python create_genre.py --midi            # MIDI drum note reference table

The wizard walks through all spec.json fields, auto-normalizes weight dicts, generates starter beat-pattern files (choose a style: backbeat / swing / electronic / one-drop / minimal), and optionally installs a chord-transition Markov matrix.

See genres/README.md for the full spec.json format and how to write custom genres.


Output format

Directory layout

Flat layout (default, --shard-width 0):

<dataset_root>/
├── manifest.jsonl                  # one append-per-sample log
└── 000042/
    ├── sample.json                 # full annotation — written LAST (completion sentinel)
    ├── mix.wav
    ├── stems/
    │   ├── beat.wav
    │   ├── melody.wav
    │   ├── harmony.wav
    │   └── bassline.wav
    └── midi/
        ├── beat.mid
        ├── melody.mid
        ├── harmony.mid
        └── bassline.mid

Sharded layout (--shard-width 3, recommended for 100k+ samples):

<dataset_root>/
├── manifest.jsonl
├── 000/
│   ├── 000000/          # shard prefix = first 3 chars of zero-padded index
│   │   ├── sample.json
│   │   └── ...
│   └── 000042/
│       └── ...
└── 001/
    └── 001000/
        └── ...

Use sample_dir_path(dataset_root, index, shard_width) from config.py to compute paths. Export, quality pipeline, and manifest traversal all auto-detect the layout.
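To make the layout rule concrete, here is a hypothetical re-derivation of the path computation. It is not config.py's sample_dir_path; the 6-digit zero padding is inferred from the examples above (000042, 001000).

```python
from pathlib import Path


def shard_dir(dataset_root: str, index: int, shard_width: int = 0, pad: int = 6) -> Path:
    """Hypothetical illustration of the documented layout.

    shard_width=0 -> <root>/000042
    shard_width=3 -> <root>/000/000042 (prefix = first 3 chars of the padded index)
    """
    if not 0 <= shard_width <= 5:
        raise ValueError("shard_width must be in 0..5")
    name = f"{index:0{pad}d}"  # zero-padded sample index, e.g. 42 -> "000042"
    if shard_width == 0:
        return Path(dataset_root) / name
    return Path(dataset_root) / name[:shard_width] / name
```

With shard_width=3 each shard directory holds at most 1,000 samples, which keeps directory listings small at the 100k+ scale.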

sample.json is always written last. Its presence means the sample is complete. Re-running generate() with the same (global_seed, sample_index) skips work when this sentinel exists.
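Because sample.json doubles as a completion sentinel, finding partial samples only requires checking for that file. A minimal sketch for the flat layout; split_by_sentinel is hypothetical (musicgen clean --failed is the supported cleanup path), and a sharded dataset would need one extra directory level.

```python
from pathlib import Path


def split_by_sentinel(dataset_root: str) -> tuple[list[Path], list[Path]]:
    """Partition flat-layout sample dirs into complete vs partial.

    A sample counts as complete only if its sample.json sentinel exists.
    """
    complete, partial = [], []
    for d in sorted(Path(dataset_root).iterdir()):
        if not d.is_dir() or not d.name.isdigit():
            continue  # skip manifest.jsonl and non-sample folders
        (complete if (d / "sample.json").is_file() else partial).append(d)
    return complete, partial
```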

sample.json schema

Every sample carries:

  • Identity: seed, musicgen_version, fluidsynth_version
  • Musical params: key, mode, tempo_bpm, time_signature, swing, duration_seconds
  • Structure: song_arrangement ([{part, start_seconds, end_seconds}])
  • Per-part: chord_progression, active_layers, soundfonts, fx_params, time_signatures_per_part, measures_per_part
  • Annotations: beat_times, downbeat_times (seconds, swing-aware from MIDI ticks)
  • Quality: musicality_score (tempo 30%, harmony 30%, rhythm 25%, noise 15%, with render-integrity penalty)
  • Routing: split (train / valid / test, deterministic from seed)
  • Paths: mix, stems.*, midi.* (relative to sample dir)
  • Sample composition: used_samples (when --sample-db is active)
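A deterministic split only needs a dedicated RNG seeded from the sample seed and the configured split_ratios. The sketch below is illustrative: split_for_seed is a hypothetical name, and drawing a single uniform value and comparing it with cumulative ratios is an assumed method, not necessarily what seeds.py does.

```python
import random


def split_for_seed(sample_seed: int, ratios=(0.8, 0.1, 0.1)) -> str:
    """Map a seed to train/valid/test; the same seed always gives the same split."""
    u = random.Random(sample_seed).random()  # dedicated RNG, global state untouched
    cumulative = 0.0
    for name, ratio in zip(("train", "valid", "test"), ratios):
        cumulative += ratio
        if u < cumulative:
            return name
    return "test"  # guards against ratios summing to slightly under 1.0
```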

sample.json is serialized with sort_keys=True — byte-identical re-runs are detectable via SHA-256 without parsing.
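Sorting keys makes the serialized bytes independent of dict insertion order, so equal annotations hash equally. A small illustration with the standard library; canonical_sha256 is a hypothetical helper, and musicgen's exact indent and separator settings are not shown here (they only need to be consistent between runs).

```python
import hashlib
import json


def canonical_sha256(annotation: dict) -> str:
    """Serialize with sorted keys so key insertion order cannot change the bytes."""
    payload = json.dumps(annotation, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = {"tempo_bpm": 120, "key": "A", "mode": "minor"}
b = {"mode": "minor", "key": "A", "tempo_bpm": 120}  # same content, different order
assert canonical_sha256(a) == canonical_sha256(b)
```

On disk, comparing hashlib.sha256(Path("sample.json").read_bytes()).hexdigest() across two runs is enough; no JSON parsing is needed.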

manifest.jsonl

One JSON object per sample: sample_index, seed, status (ok/failed), split, path, musicality_score, duration_seconds, attempt, wrote_at.
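Since the manifest is plain JSONL, per-split counts need nothing beyond the standard library. A sketch with a hypothetical summarize_manifest; it assumes that when a sample_index appears more than once (for example after a rerun), the last line wins.

```python
import json
from collections import Counter


def summarize_manifest(path: str) -> dict:
    """Count ok samples per split and failed samples in a manifest.jsonl log."""
    latest = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate blank lines
                rec = json.loads(line)
                latest[rec["sample_index"]] = rec  # assumption: last entry per index wins
    ok = Counter(r["split"] for r in latest.values() if r["status"] == "ok")
    failed = sum(1 for r in latest.values() if r["status"] == "failed")
    return {"per_split": dict(ok), "failed": failed}
```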


Musicality scoring and quality gate (v0.3)

The gate is meant to reject samples that would contaminate a training distribution, not to separate good music from very good music.

Two-layer architecture

Layer 1 — symbolic (pre-render, < 5 ms). check_midi_quality(midi_paths, key) runs hard checks (empty layer, stuck pitch > 80%, extreme pitch range > 36 semitones) and soft metrics on the melody (Krumhansl–Schmuckler key-profile correlation, scale adherence, melodic step fraction, n-gram entropy, LZ compression ratio). Failing hard checks → score 0.0, no render.
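The three hard checks are simple enough to express over a list of MIDI pitch numbers. This sketch is not check_midi_quality (which takes MIDI paths and a key); hard_check_failures is hypothetical, and it reads "stuck pitch > 80%" as one pitch accounting for more than 80% of a layer's notes.

```python
from collections import Counter
from typing import Sequence


def hard_check_failures(pitches: Sequence[int]) -> list[str]:
    """Return hard-check failures for one layer's MIDI pitches (empty list = pass)."""
    if not pitches:
        return ["empty layer"]
    failures = []
    top_share = Counter(pitches).most_common(1)[0][1] / len(pitches)
    if top_share > 0.80:  # one pitch dominates the layer
        failures.append("stuck pitch")
    if max(pitches) - min(pitches) > 36:  # wider than three octaves
        failures.append("extreme pitch range")
    return failures
```

Any non-empty result would map to a score of 0.0 and skip rendering, which is what keeps this layer under a few milliseconds.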

Layer 2 — audio integrity (post-render). get_musicality_score(filename) applies a render-integrity penalty (clipping, silence, DC offset) to a weighted musical analysis (tempo stability/clarity 30%, harmony KS correlation 30%, rhythm regularity/strength 25%, noise/spectral 15%).
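The weighting can be written out directly. In the sketch below only the four weights come from the text; combine_scores is hypothetical, and applying the render-integrity penalty as a multiplicative factor in [0, 1] is an assumption, since the README does not give the penalty's form.

```python
WEIGHTS = {"tempo": 0.30, "harmony": 0.30, "rhythm": 0.25, "noise": 0.15}


def combine_scores(components: dict, integrity_penalty: float) -> float:
    """Weighted musical score scaled down by a render-integrity penalty.

    Components are expected in [0, 1]; the multiplicative penalty is an assumption.
    """
    musical = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return max(0.0, min(1.0, musical * (1.0 - integrity_penalty)))
```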

Quality-gate loop

from musicgen import generate, Config

result = generate(Config(
    global_seed=42,
    sample_index=0,
    dataset_root="./dataset",
    min_musicality_score=0.6,   # reject below 0.6; 0.0 = disabled
    max_attempts=3,             # re-roll up to 3x with distinct seeds
))
print(result.attempt)           # which attempt was accepted (1, 2, or 3)
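The re-roll loop behind min_musicality_score and max_attempts can be sketched independently of rendering. Everything here is hypothetical: generate_with_gate takes a make_and_score callback in place of the real pipeline, attempt_seed shows one way to get distinct reproducible seeds per attempt, and returning (None, best score) when every attempt fails is an assumption about the failure case.

```python
import hashlib
from typing import Callable, Optional, Tuple


def attempt_seed(sample_seed: int, attempt: int) -> int:
    """Hypothetical: derive a distinct, reproducible seed for each attempt."""
    digest = hashlib.sha256(f"{sample_seed}:{attempt}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def generate_with_gate(
    sample_seed: int,
    make_and_score: Callable[[int], float],
    min_score: float = 0.6,
    max_attempts: int = 3,
) -> Tuple[Optional[int], float]:
    """Re-roll until a candidate clears min_score; return (accepted attempt, score)."""
    best = 0.0
    for attempt in range(1, max_attempts + 1):
        score = make_and_score(attempt_seed(sample_seed, attempt))
        best = max(best, score)
        if min_score == 0.0 or score >= min_score:  # 0.0 disables the gate
            return attempt, score
    return None, best  # every attempt fell below the threshold
```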

Standalone musicality package

pip install -e '.[samples]'   # musicality is bundled in src/musicality/

musicality score  ./mix.wav
musicality explain ./mix.wav
musicality batch  ./dataset/**/*.wav --output scores.csv

See docs/musicality-scoring.md for metric derivations and literature references.


Neural backends (v0.5)

Replace Markov matrices with small LSTMs trained on a self-generated corpus.

Install

pip install -e '.[neural]'   # requires torch >= 2.0

Workflow

# 1. Generate a training corpus (MIDI-only is fast)
musicgen generate --count 500 --seed 1 --out ./corpus --output-mode midi-only

# 2. Extract chord/melody sequences
musicgen extract-sequences --dataset ./corpus --output sequences.json

# 3. Train models
musicgen train --sequences sequences.json --layer chord --output-dir ./models
musicgen train --sequences sequences.json --layer melody --output-dir ./models

# Genre-specific models take precedence at inference
musicgen train --sequences sequences.json --layer chord --genre jazz --output-dir ./models

# 4. Generate with neural backends
musicgen generate --count 32 --seed 1 --out ./dataset \
    --chord-backend neural --melody-backend neural \
    --models-dir ./models

models_dir lookup order: chord_{genre}.pt → chord.pt. Missing file → Markov fallback with a warning.

Model sizes

| Model | Params | Architecture |
|---|---|---|
| ChordLSTM | ~35K | 2-layer LSTM, hidden=64, genre one-hot conditioning |
| MelodyLSTM | ~10K | 2-layer LSTM, hidden=32 |

Determinism

The determinism contract is preserved: logits are a pure function of the input (fixed weights give fixed logits for a given input), and sampling uses rng.choices(tokens, weights=softmax(logits)), drawing from the same seeded random.Random instance used by the Markov path.
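The property described above holds for any fixed logits: if all randomness flows through one seeded random.Random, the sampled sequence is reproducible. In this sketch the chord tokens and logit values are made-up stand-ins for ChordLSTM output, and sample_token and sample_sequence are illustrative helpers.

```python
import math
import random
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(rng: random.Random, tokens: Sequence[str], logits: Sequence[float]) -> str:
    """Draw one token; the only randomness comes from the passed-in seeded rng."""
    return rng.choices(tokens, weights=softmax(logits), k=1)[0]


tokens = ["C", "F", "G", "Am"]
logits = [2.0, 0.5, 1.0, -1.0]  # stand-in for fixed model output


def sample_sequence(seed: int, n: int) -> list[str]:
    rng = random.Random(seed)  # one seeded instance for the whole sequence
    return [sample_token(rng, tokens, logits) for _ in range(n)]


assert sample_sequence(7, 8) == sample_sequence(7, 8)  # same seed, same chords
```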

See docs/neural-generators.md for model architecture, sequences.json schema, and training hyperparameters.


Sample composition (v0.4)

Mix real audio samples alongside or instead of FluidSynth-rendered layers.

Install

pip install -e '.[samples]'   # audio-sample-manager, soundfile, rubberband-stretch

Workflow

# 1. Build a sample library
musicgen samples build --dir ./my_drums --output drums.json --musicality
musicgen samples build --dir ./bass_loops --output bass.json \
    --category bass --genre electronic --recursive

# 2. Generate with sample composition
musicgen generate --seed 42 --count 10 --out ./dataset \
    --sample-db drums.json \
    --sample-beat alongside \
    --sample-bassline alongside \
    --sample-gain -6 \
    --sample-min-score 0.65

Mixing modes

| Mode | Behaviour |
|---|---|
| alongside | Sample overlaid on the FluidSynth-rendered mix (additive). |
| substitution | Sample replaces the FluidSynth stem before mixing. |
| adlib | One-shot placed at a specific beat offset. Requires oneshot_at_beat in the Python API. |
| off | FluidSynth only (default for melody and harmony). |
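The arithmetic behind alongside and substitution, including how a dB gain such as --sample-gain -6 becomes an amplitude factor, fits in a few lines. This is a simplified sketch: mix_layer is hypothetical, it works on plain float lists instead of audio segments, and it assumes the sample is already stretched and tiled to the stem's length (the job of sample_mixer.py).

```python
from typing import Sequence


def db_to_gain(db: float) -> float:
    """Convert a dB offset to a linear amplitude factor (-6 dB is about 0.5)."""
    return 10 ** (db / 20)


def mix_layer(stem: Sequence[float], sample: Sequence[float], mode: str,
              gain_db: float) -> list[float]:
    """Combine one rendered stem with an equal-length, already-tiled sample."""
    g = db_to_gain(gain_db)
    if mode == "alongside":
        return [s + g * x for s, x in zip(stem, sample)]  # additive overlay
    if mode == "substitution":
        return [g * x for x in sample]  # sample replaces the stem
    if mode == "off":
        return list(stem)  # FluidSynth only
    raise ValueError(f"unsupported mode: {mode}")  # adlib needs a beat offset
```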

Python API

from musicgen import generate, Config
from musicgen.sample_composition import SampleLayerRule, SampleCompositionConfig

cfg = Config(
    global_seed=42,
    sample_index=0,
    dataset_root="./dataset",
    sample_composition=SampleCompositionConfig(
        sample_db_path="./library.json",
        layer_rules={
            "beat": SampleLayerRule(
                layer="beat",
                mode="alongside",
                gain_db=-6.0,
                max_bpm_stretch_pct=15.0,
                min_musicality_score=0.65,
                genre=["hip-hop"],
            ),
            "bassline": SampleLayerRule(
                layer="bassline",
                mode="substitution",
                gain_db=-3.0,
            ),
        },
        global_min_musicality=0.50,
        allow_transposition=True,
        allow_time_stretching=True,
    ),
)
result = generate(cfg)

See docs/sample-composition.md for full reference.


Optional integrations (v0.2)

All three integrations are opt-in with zero new hard dependencies. Each package is lazy-imported; a clear ImportError with an install hint is raised when absent.

SoundfontManager — tag-based soundfont selection

pip install git+https://github.com/dobidu/soundfont_manager

Replaces blind rng.choice(os.listdir(...)) with metadata-aware tag-based selection from a SoundfontManager JSON database.

from musicgen import generate, Config

result = generate(Config(
    global_seed=42,
    sample_index=0,
    dataset_root="./dataset",
    soundfont_manager_db="/path/to/soundfonts.json",
    soundfont_manager_sf_dir="/path/to/sf2/files",
))

Layer → tag mapping: beat → ["drums", "percussion"], melody → ["melody", "lead", "piano", "strings"], harmony → ["harmony", "chords", "pads"], bassline → ["bass"].

Fallback: any error or empty tag result → sorted directory scan.

MIDI indexer

pip install git+https://github.com/dobidu/midi_file_manager

musicgen index-midi --dataset ./dataset --out ./midi_db.json

Indexes all generated MIDI files into a MidiManager database with ground-truth musicgen metadata (tempo_bpm, key, time_signature, split, musicality_score).

Audio indexer

pip install git+https://github.com/dobidu/audio_sample_manager

musicgen index-audio --dataset ./dataset --out ./audio_db.json

Indexes generated WAV stems into a SampleManager database alongside external audio libraries — enables unified cross-library queries (e.g., "all bass stems at 90 BPM in A minor").


Library API

from musicgen import generate, generate_batch, Config, SampleResult, BatchResult

# Single sample
result = generate(Config(global_seed=42, sample_index=0, dataset_root="./dataset"))
print(result.sample_dir)        # "./dataset/000000"
print(result.split)             # "train" | "valid" | "test"
print(result.musicality_score)  # float
print(result.status)            # "ok" | "failed"

# Batch
result = generate_batch(Config(global_seed=1, count=32, dataset_root="./dataset", workers=4))
print(result.succeeded, result.failed, result.skipped)

Re-running with the same (global_seed, sample_index) short-circuits when sample.json exists — batches are idempotent.


Configuration reference

Config is a @dataclass with three precedence layers: CLI args > env vars > defaults.
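The CLI > env > defaults order amounts to "first value that is set wins". A minimal sketch; resolve and parse_bool are hypothetical helpers, and the set of strings treated as true is an assumption.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_bool(text: str) -> bool:
    """Accept common spellings, e.g. MUSICGEN_AUTO_DOWNLOAD_SF2=false -> False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


def resolve(cli_value: Optional[T], env_var: str, default: T,
            cast: Callable[[str], T]) -> T:
    """CLI argument wins, then the environment variable, then the default."""
    if cli_value is not None:
        return cli_value
    raw = os.environ.get(env_var)
    if raw is not None:
        return cast(raw)
    return default
```

For example, resolve(None, "MUSICGEN_COUNT", 1, int) returns the environment value when it is set, and 1 otherwise.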

Core fields

| Field | Default | Env var | Notes |
|---|---|---|---|
| global_seed | None | | Required at generate time. |
| sample_index | 0 | | Per-sample identity within the dataset. |
| dataset_root | <repo>/dataset | MUSICGEN_DATASET_ROOT | Output directory. |
| count | 1 | MUSICGEN_COUNT | Samples per generate_batch. |
| workers | None (all cores) | | generate_batch parallelism. |
| output_mode | "full" | MUSICGEN_OUTPUT_MODE | full / mix-only / stems-only / midi-only |
| split_ratios | (0.8, 0.1, 0.1) | | Train/valid/test split. |

Quality gate

| Field | Default | Env var | Notes |
|---|---|---|---|
| min_musicality_score | 0.0 | MUSICGEN_MIN_MUSICALITY_SCORE | 0.0 = disabled. |
| max_attempts | 1 | MUSICGEN_MAX_ATTEMPTS | Max re-roll attempts per sample. |

Soundfonts and assets

| Field | Default | Env var | Notes |
|---|---|---|---|
| sf_dir | <repo>/sf | MUSICGEN_SF_DIR | Root directory for sf/<layer>/ subdirs. |
| auto_download_sf2 | True | MUSICGEN_AUTO_DOWNLOAD_SF2 | Download default SF2 on empty pool. |
| assets_toml | <repo>/assets.toml | | Asset registry path. |
| soundfont_manager_db | None | MUSICGEN_SOUNDFONT_MANAGER_DB | Activates tag-based soundfont selection. |
| soundfont_manager_sf_dir | None | MUSICGEN_SOUNDFONT_MANAGER_SF_DIR | Base dir for relative SF2 paths in the SoundfontManager db. |

Genre

| Field | Default | Env var | Notes |
|---|---|---|---|
| genre | None | MUSICGEN_GENRE (comma-separated) | Genre name(s) for constrained generation. |
| genres_dir | <repo>/genres | MUSICGEN_GENRES_DIR | Root dir for genre spec files. |

Neural backends

| Field | Default | Notes |
|---|---|---|
| chord_backend | "markov" | "markov" or "neural". Falls back to Markov when the model is absent. |
| melody_backend | "markov" | Same. |
| models_dir | <repo>/models | Directory with .pt checkpoint files. |

Layout (v0.8)

| Field | Default | Notes |
|---|---|---|
| shard_width | 0 | Shard prefix length. 0 = flat (<root>/000042/); 3 = <root>/000/000042/. Range 0–5. |
| measures_per_part_override | None | Dict overriding per-part measure counts after time-sig scaling, e.g. {"intro": 4, "verse": 8, "chorus": 8, "bridge": 4, "outro": 4} for short listening demos. |

Domain-specific config files

| File | Purpose |
|---|---|
| song_structures.json | Song arrangements (intro/verse/chorus/bridge/outro). |
| chord_patterns.txt | Chord progressions per song part. |
| beat_roll_patterns_<sig>.txt | Drum patterns per time signature. |
| inst_probabilities.json | Per-layer inclusion probabilities. |
| levels.json | Per-layer gain and pan. |
| *_fx.json | FX chain parameter ranges per layer. |

Determinism

Same global_seed + same sample_index → bit-identical MIDI + bit-identical canonical sample.json regardless of PYTHONHASHSEED. WAV bit-identity holds when the FluidSynth binary version matches.

Five named random.Random instances per sample, derived deterministically from the sample seed:

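from musicgen.seeds import derive_sample_seed, make_rngs

global_seed, sample_index = 42, 0
sample_seed = derive_sample_seed(global_seed, sample_index)   # sha256[:8]
rngs = make_rngs(sample_seed)
# five streams (params, generators, soundfonts, fx, mix), each seeded with seed ^ offset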

Zero bare random.* calls anywhere in src/musicgen/ — enforced by an AST static guard. Global random state is never touched.
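The same scheme can be reproduced with the standard library to see why it is PYTHONHASHSEED-independent: SHA-256 is stable across processes, while Python's built-in hash() of strings is randomized per process. In this sketch sample_seed_for and named_rngs are hypothetical stand-ins for derive_sample_seed and make_rngs; the hash input format, reading "sha256[:8]" as the first 8 digest bytes, and the offset values are all assumptions.

```python
import hashlib
import random

# Hypothetical offsets; the real values live in musicgen's seeds.py.
RNG_OFFSETS = {"params": 1, "generators": 2, "soundfonts": 3, "fx": 4, "mix": 5}


def sample_seed_for(global_seed: int, sample_index: int) -> int:
    """Hash (global_seed, sample_index) and keep the first 8 digest bytes as an int."""
    digest = hashlib.sha256(f"{global_seed}:{sample_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def named_rngs(sample_seed: int) -> dict:
    """One independent random.Random per pipeline stage; global state is untouched."""
    return {name: random.Random(sample_seed ^ off) for name, off in RNG_OFFSETS.items()}
```

Separate named streams mean that, for instance, adding an FX parameter draw cannot shift the chord sequence produced by the generators stream.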

Regression tests (tests/test_determinism_golden.py):

  • TestSameProcessStability — fast, no FluidSynth — runs generate() twice and asserts sha256(sample.json) matches.
  • TestDeterminismGoldens (@pytest.mark.slow): compares SHA-256 artifacts for mix.wav, the MIDIs, and sample.json across separate process invocations.

Architecture

src/musicgen/
├── __init__.py           # public exports: generate, generate_batch, Config, SampleResult, BatchResult
├── api.py                # generate(Config) — composition root; resolve_genre_spec
├── batch.py              # generate_batch(Config) → BatchResult via ProcessPoolExecutor
├── cli.py                # typer app — all CLI commands
├── config.py (root)      # Config dataclass with CLI > env > defaults precedence
├── asset_downloader.py   # download SF2/MIDI from assets.toml; auto-trigger on empty pool
├── calibrate.py          # FluidSynth pre-roll measurement + .musicgen/ cache
├── seeds.py              # derive_sample_seed, make_rngs, save_random_state, assign_split
├── genre.py              # GenreSpec, load_genre, merge_genres, resolve_genres
├── sampler.py            # SongParams + genre-constrained draws
├── generators/
│   ├── chord.py          # Markov/neural chord generation; extended chord vocab
│   ├── melody.py         # Markov/neural melody; scale-degree path
│   ├── bassline.py       # Bassline generation (keyed to chords + melody)
│   └── beat.py           # Drum patterns + swing; genre pattern union
├── neural/               # optional — requires musicgen[neural]
│   ├── model.py          # ChordLSTM, MelodyLSTM, NeuralSampler
│   ├── trainer.py        # train(), save_model(), load_model()
│   └── sampler.py        # sample_chord_neural(), sample_melody_neural()
├── corpus_extractor.py   # extract_sequences() — dataset → sequences.json
├── renderer.py           # FluidSynth wrapper; ThreadPoolExecutor stem rendering; soundfont selection
├── mixer.py              # FX (pedalboard), pydub overlay, layer mask, part concat
├── beats.py              # MIDI-tick beat/downbeat extraction (mido), swing-aware
├── annotator.py          # pure-function sample.json assembler
├── musicality.py         # Layer 1 MIDI quality + Layer 2 audio integrity scorer
├── writer.py             # atomic sample dir, sum-of-stems assertion, output_mode routing
├── manifest.py           # ManifestWriter (JSONL, append-under-lock)
├── quality.py            # score_dataset(), filter_dataset(), quality_report() — batch quality pipeline
├── exporter.py           # collect_samples(), export_dataset() — JSONL/CSV/Parquet/HF export
├── sample_composition.py # SampleLayerRule, SampleCompositionConfig
├── sample_mixer.py       # BPM stretch, key shift, loop tiling, alongside/substitution
├── sample_builder.py     # build_library() — WAV dir → SampleManager JSON
├── midi_indexer.py       # index_midi_dataset() — indexes MIDI into MidiManager db
└── audio_indexer.py      # index_audio_dataset() — indexes WAV into SampleManager db

Pipeline:

resolve_genre_spec → sampler (genre-constrained draws)
  → generators (chord: LSTM or Markov; melody: LSTM or Markov; bassline/beat: Markov)
  → check_midi_quality (Layer 1: hard + soft symbolic checks, < 5 ms)
  → [re-roll up to max_attempts if score < min_musicality_score]
  → renderer (FluidSynth parallel stems; genre soundfont tags; auto-download on empty pool)
  → mixer (FX + overlay + concat; genre FX profile)
  → beats (MIDI-tick extraction)
  → get_musicality_score (Layer 2: audio integrity + musical analysis)
  → annotator (sample.json dict + pre-roll offset)
  → writer (atomic sample dir + sum-of-stems + output_mode routing)
  → manifest (JSONL append)
  → SampleResult

generate_batch wraps generate in a ProcessPoolExecutor (spawn context) and returns BatchResult.


Try in the cloud

| Platform | How |
|---|---|
| Google Colab | Click a badge at the top. Each notebook has a setup cell that apt-installs FluidSynth + fluid-soundfont-gm and pip-installs musicgen. Demo · Sample composition · Neural generators |
| mybinder.org | JupyterLab in the browser, all deps pre-wired. Cold build ~5–10 min. Launch |
| HuggingFace Spaces | Gradio web UI wrapping musicgen.generate(). Source under hf_space/ (Dockerfile + app.py). See hf_space/README.md. |

Tests

pytest -m "not slow"    # fast suite — 1648 tests, ~12 s
pytest -m slow          # requires FluidSynth binary + populated sf/ pools
pytest                  # everything

Coverage target: ≥ 80% on pure functions (samplers, generators, annotator, beats, validators).


Contributing

PRs welcome. Run pytest -m "not slow" before submitting. Project planning lives under .planning/.


License

See LICENSE.

Acknowledgments

About

Random (and 'coherent') Music Generator
