A Python library and CLI for generating reproducible, fully-annotated synthetic music datasets for ML/MIR research. Each sample is a complete training example: mixed audio, per-layer stems, per-layer MIDI, and a rich JSON annotation with every musical and synthesis parameter.
Suitable for training models that learn music tagging, source separation, beat/tempo/downbeat detection, and audio→MIDI transcription at the 1k–10k sample scale.
| Version | What shipped |
|---|---|
| v0.8 | Soundfont license audit (default SF2 → FluidR3_GM MIT); sharded layout --shard-width 3 for 100k+ datasets; SF2 pool expansion (GeneralUserGS/MuseScoreGeneral/SGM-V2); rock genre; beat-pattern coverage for all time sigs across all genres; chord_type_hard_filter set for classical/pop; measures_per_part_override; create_genre.py wizard |
| v0.7 | Dataset export (musicgen export/stats); quality pipeline (musicgen score/filter); eval CLI (musicgen eval reliability/validity); neural tests |
| v0.6 | Asset downloader — musicgen download-assets bootstraps SF2 soundfonts and MIDI corpora from open sources; assets.toml registry with checksums and license metadata |
| v0.5 | ML-assisted generators — LSTM chord/melody models trained on self-generated corpus; extract-sequences + train CLI |
| v0.4 | Sample composition — real audio samples alongside/substituting FluidSynth layers; musicality standalone package |
| v0.3 | Higher-order Markov — 2nd-order chords, two-layer quality gate, calibration harness |
| v0.2 | Genre system — 8 built-in genres, GenreSpec composition engine, extended chord vocabulary |
| v0.1 | Initial release — single-sample API, parallel batch, full CLI, determinism contract |
# 1. Clone and install
git clone https://github.com/dobidu/layered_music_gen.git
cd layered_music_gen
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
# 2. Install FluidSynth (system dependency)
sudo apt-get install fluidsynth # Ubuntu/Debian
# brew install fluidsynth # macOS
# 3. Download default soundfonts ← required before generating (FluidR3_GM, MIT, ~141 MB)
musicgen download-assets --sf2
# 4. Generate a dataset
musicgen generate --seed 42 --count 10 --out ./dataset
# 5. Explore the output
ls dataset/000000/
# mix.wav  sample.json  stems/  midi/

Each sample directory contains:

dataset/
├── manifest.jsonl
└── 000000/
    ├── sample.json   # full annotation (written last; completion sentinel)
    ├── mix.wav       # final mixed audio
    ├── stems/        # post-FX per-layer WAV stems
    │   ├── beat.wav
    │   ├── melody.wav
    │   ├── harmony.wav
    │   └── bassline.wav
    └── midi/         # per-layer MIDI (concatenated across all song parts)
        ├── beat.mid
        ├── melody.mid
        ├── harmony.mid
        └── bassline.mid
For 100k+ samples use --shard-width 3 (see Sharded layout).
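Because `sample.json` is written last, its presence can serve as a completeness check when reading a dataset back. The following is a minimal sketch for the flat layout only; `iter_complete_samples` is a hypothetical helper, not part of musicgen, and the demo directory is a throwaway stand-in rather than real generator output.

```python
import json
import tempfile
from pathlib import Path


def iter_complete_samples(dataset_root):
    """Yield (sample_dir, annotation) for each flat-layout sample that has sample.json."""
    for sample_dir in sorted(Path(dataset_root).iterdir()):
        sentinel = sample_dir / "sample.json"
        # sample.json is written last, so a directory without it is partial.
        if sample_dir.is_dir() and sentinel.is_file():
            yield sample_dir, json.loads(sentinel.read_text(encoding="utf-8"))


# Self-contained demo on a temporary directory (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "manifest.jsonl").write_text("", encoding="utf-8")
    (root / "000000").mkdir()
    (root / "000000" / "sample.json").write_text('{"tempo_bpm": 120}', encoding="utf-8")
    (root / "000001").mkdir()  # partial sample: no sentinel yet
    complete = [(d.name, ann) for d, ann in iter_complete_samples(root)]

print(complete)
```

Directories without the sentinel are skipped rather than reported; a real loader might prefer to log them.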
pip install -e .

Requires Python ≥ 3.10 and FluidSynth on PATH:

sudo apt-get install fluidsynth # Ubuntu/Debian
brew install fluidsynth # macOS
# Windows: https://www.fluidsynth.org/

pip install -e '.[samples]' # real audio sample composition (v0.4)
pip install -e '.[neural]' # LSTM chord/melody backends (v0.5)
pip install -e '.[dev]' # test suite

musicgen ships an asset registry (assets.toml) and a downloader that bootstraps soundfonts and MIDI corpora from free/open sources. No manual file hunting.
# Download default SF2 (FluidR3_GM, 141 MB, MIT — placed in all sf/<layer>/ dirs)
musicgen download-assets --sf2
# List all available sources
musicgen download-assets --list
# Download a specific opt-in source by name
musicgen download-assets --name GeneralUserGS # ~31 MB, melody/harmony layers
musicgen download-assets --name TimGM6mb # ~5.7 MB, all layers (GPL-2.0, see note)
# Re-download (overwrite)
musicgen download-assets --sf2 --force

Default SF2 sources (included in --sf2):
| Name | Size | Layers | License | Dataset-safe? |
|---|---|---|---|---|
| FluidR3_GM | ~141 MB | all | MIT | Yes |
Opt-in SF2 sources (use --name):
| Name | Size | Layers | License | Dataset-safe? |
|---|---|---|---|---|
| GeneralUserGS | ~31 MB | all | custom-permissive | Yes |
| MuseScoreGeneral | ~206 MB | all | MIT | Yes |
| SGM-V2 | ~236 MB | all | CC-BY 3.0 | Yes — with attribution |
| TimGM6mb | 5.7 MB | all | GPL-2.0 | Only if dataset is GPL-compatible |
GPL notice: Audio rendered with `TimGM6mb` is a derivative work under GPL-2.0. Do not use it when building datasets intended for public distribution unless your dataset license is GPL-compatible. Use `FluidR3_GM` (default, MIT) instead. See `LICENSES.soundfonts.md` for full details.
Downloaded MIDI files land in midi_assets/<layer>/ and can be indexed with musicgen index-midi.
# Download default MIDI corpora (GrooveMIDI + FreeMidiChords)
musicgen download-assets --midi
# Opt-in large corpus
musicgen download-assets --name LakhCleanMIDI # 223 MB, CC-BY-4.0

Default MIDI sources (included in --midi):
| Name | Files | Layers | License |
|---|---|---|---|
| GrooveMIDI | 1,150 drum files | beat | CC-BY-4.0 |
| FreeMidiChords | 11,400 progressions | harmony | MIT |
Opt-in MIDI sources:
| Name | Contents | Layers | License |
|---|---|---|---|
| FannonChords | ~400 chord shapes | harmony | MIT |
| FannonScales | ~300 scale patterns | melody | MIT |
| LakhCleanMIDI | 17k multi-track | melody, harmony, bassline | CC-BY-4.0 |
When auto_download_sf2 = true (default), musicgen silently downloads default SF2 sources on the first generate call if any layer pool is empty. Disable with:
export MUSICGEN_AUTO_DOWNLOAD_SF2=false

To register a custom source, edit assets.toml. Fill url= and run sha256sum <file> to get the checksum:
[[sf2]]
name = "MySF2"
description = "My custom soundfont"
url = "https://example.com/my.sf2"
sha256 = "abc123..." # sha256sum my.sf2
filename = "my.sf2"
size_hint_mb = 20
license = "MIT"
license_url = "https://example.com/LICENSE"
layers = ["melody", "harmony"]
default = false

musicgen generate --seed SEED [options]

| Option | Default | Description |
|---|---|---|
| `--seed` / `-s` | (required) | Global RNG seed. |
| `--count` / `-n` | `1` | Number of samples to generate. |
| `--out` / `-o` | `./dataset` | Output directory. |
| `--workers` / `-w` | all cores | Parallel workers. |
| `--output-mode` / `-m` | `full` | `full` / `mix-only` / `stems-only` / `midi-only` |
| `--genre` / `-g` | None | Genre preset (repeatable for composition). |
| `--genres-dir` | `<repo>/genres` | Custom genres directory. |
| `--min-musicality-score` | - | Via Config; see Musicality scoring. |
| `--verbose` / `-v` | - | DEBUG logging. |
| `--quiet` / `-q` | - | ERROR-only logging. |
Sample composition options (requires musicgen[samples]):
| Option | Default | Description |
|---|---|---|
| `--sample-db PATH` | None | SampleManager JSON library. Enables sample composition. |
| `--sample-beat MODE` | `alongside` | `alongside` / `substitution` / `adlib` / `off` |
| `--sample-bassline MODE` | `alongside` | Same modes. |
| `--sample-melody MODE` | `off` | Same modes. |
| `--sample-harmony MODE` | `off` | Same modes. |
| `--sample-gain DB` | `-3.0` | Gain applied to all sample layers. |
| `--sample-min-score FLOAT` | `0.0` | Min musicality score for sample selection. |
Neural backend options (requires musicgen[neural]):
| Option | Default | Description |
|---|---|---|
| `--chord-backend` | `markov` | `markov` / `neural` |
| `--melody-backend` | `markov` | `markov` / `neural` |
| `--models-dir PATH` | `<repo>/models` | Directory with trained .pt files. |
Layout options (v0.8):
| Option | Default | Description |
|---|---|---|
| `--shard-width INT` | `0` | Shard prefix length. 0 = flat; 3 = `<root>/000/000042/` for ≤1M samples. |
Export a generated dataset to JSONL, CSV, Parquet, or HuggingFace AudioFolder.
musicgen export ./dataset --out dataset.jsonl # JSONL (default)
musicgen export ./dataset --out dataset.csv --fmt csv
musicgen export ./dataset --out dataset.parquet --fmt parquet # requires musicgen[export]
musicgen export ./dataset --out hf/ --fmt hf # HuggingFace audiofolder + metadata.jsonl
musicgen export ./dataset --out dataset.jsonl --relative # paths relative to dataset root
musicgen stats ./dataset # distribution summary (text)
musicgen stats ./dataset --fmt json # machine-readable

| Option | Default | Description |
|---|---|---|
| `--out` / `-o` | (required) | Output path (file for JSONL/CSV/Parquet, dir for HF). |
| `--fmt` | `jsonl` | `jsonl` / `csv` / `parquet` / `hf` |
| `--relative` / `-r` | `False` | Store paths relative to dataset root. |
| `--no-midi` | `False` | Omit MIDI path columns from output. |
Both commands auto-detect flat and sharded dataset layouts.
Post-hoc quality scoring and filtering for generated datasets.
# Score all unscored samples; updates sample.json atomically
musicgen score ./dataset
# Re-score even already-scored samples
musicgen score ./dataset --force
# Filter: move samples below threshold to a reject directory
musicgen filter ./dataset --min-score 0.6
musicgen filter ./dataset --min-score 0.6 --reject-dir ./bad
musicgen filter ./dataset --min-score 0.6 --dry-run # preview without moving

| Option | Default | Description |
|---|---|---|
| `--force` | `False` | (score) Re-score even samples that already have a score. |
| `--min-score FLOAT` | (required) | (filter) Samples below this are moved to reject dir. |
| `--reject-dir PATH` | `<dataset>/rejected` | (filter) Destination for rejected samples. |
| `--dry-run` | `False` | (filter) Report what would happen without moving anything. |
Measure scorer reliability and construct validity.
musicgen eval reliability --type det # determinism: same input → same score
musicgen eval reliability --type rinv # rank-invariance: monotone transform preserves order
musicgen eval reliability --type seed # seed stability: score variance across seeds
musicgen eval validity --mode both # AUROC ≥ 0.80 criterion (8 pathologies)
musicgen eval validity --mode good # good-set statistics only

| Option | Default | Description |
|---|---|---|
| `--type` | `det` | (reliability) `det` / `rinv` / `seed` |
| `--n-samples` | `10` | Samples per test. |
| `--mode` | `both` | (validity) `good` / `bad` / `both` |
| `--bootstrap-n` | `200` | Bootstrap iterations for AUROC CI. |
| `--output PATH` | None | Write JSON result to file. |
Both commands exit non-zero when the criterion fails.
musicgen download-assets [--sf2] [--midi] [--all] [--name NAME] [--list] [--force]

| Option | Description |
|---|---|
| `--sf2` | Download all default SF2 sources. |
| `--midi` | Download all default MIDI corpus sources. |
| `--all` | Download all default sources (SF2 + MIDI). |
| `--name NAME` | Download a specific source by name (ignores default flag). |
| `--list` / `-l` | List all sources with URLs, layers, and license. |
| `--force` / `-f` | Re-download even if files already exist. |
Build a SampleManager library from a directory of audio files.
musicgen samples build --dir ./drums --output drums.json --musicality
musicgen samples build --dir ./loops --output loops.json \
--category bass --genre electronic --recursive

| Option | Default | Description |
|---|---|---|
| `--dir` / `-d` | (required) | Audio files directory (WAV/FLAC/OGG/AIF). |
| `--output` / `-o` | (required) | Output SampleManager JSON path. |
| `--category` | auto | Force category: beat / bass / melody / harmony. |
| `--genre TAG` | None | Genre tag applied to all samples (repeatable). |
| `--mood TAG` | None | Mood tag (repeatable). |
| `--tag TAG` | None | Extra tag (repeatable). |
| `--musicality` | `False` | Score samples with musicality.explain(). |
| `--recursive` / `-r` | `False` | Walk subdirectories. |
Category is inferred from filename keywords when --category is not set:
| Category | Keywords |
|---|---|
| `beat` | beat, kick, hat, snare, drum, perc, clap, hh, hihat |
| `bass` | bass, sub |
| `harmony` | pad, chord, harm, atmo, ambient, strings, vox, choir, keys, piano, organ |
| `melody` | lead, melody, lick, riff, synth, arp, melo, hook (default fallback) |
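The keyword table above can be read as a simple substring lookup. The sketch below is an illustration only: `infer_category` is a hypothetical name, and the order in which categories are checked (beat, bass, harmony, melody) is an assumption, since the README does not say how a filename matching several categories is resolved.

```python
from pathlib import Path

# Keyword lists copied from the table; the check order is an assumption.
CATEGORY_KEYWORDS = {
    "beat": ["beat", "kick", "hat", "snare", "drum", "perc", "clap", "hh", "hihat"],
    "bass": ["bass", "sub"],
    "harmony": ["pad", "chord", "harm", "atmo", "ambient", "strings", "vox",
                "choir", "keys", "piano", "organ"],
    "melody": ["lead", "melody", "lick", "riff", "synth", "arp", "melo", "hook"],
}


def infer_category(filename):
    """Return the first category whose keyword occurs in the file stem."""
    stem = Path(filename).stem.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in stem for keyword in keywords):
            return category
    return "melody"  # default fallback when nothing matches


names = ["kick_01.wav", "sub_drop.wav", "warm_pad.flac", "take_07.wav"]
inferred = {name: infer_category(name) for name in names}
print(inferred)
```

Plain substring matching is crude (a name like `synth_bass.wav` matches two categories), which is one reason to pass `--category` explicitly for ambiguous libraries.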
Index generated MIDI files into a MidiManager database (requires midi_file_manager).
musicgen index-midi --dataset ./dataset --out ./midi_db.json [--csv ./midi_db.csv]

| Option | Default | Description |
|---|---|---|
| `--dataset` / `-d` | (required) | musicgen dataset root. |
| `--out` / `-o` | `./midi_db.json` | Output database path. |
| `--midi-dir` | None | Base dir for relative MIDI paths in the db. |
| `--csv` | None | Also export a CSV. |
Index generated WAV stems into a SampleManager database (requires audio_sample_manager).
musicgen index-audio --dataset ./dataset --out ./audio_db.json [--csv ./audio_db.csv]

| Option | Default | Description |
|---|---|---|
| `--dataset` / `-d` | (required) | musicgen dataset root. |
| `--out` / `-o` | `./audio_db.json` | Output database path. |
| `--samples-dir` | None | Base dir for relative WAV paths in the db. |
| `--csv` | None | Also export a CSV. |
musicgen list-genres [--genres-dir DIR] # list available genre presets
musicgen calibrate [-v] # measure FluidSynth pre-roll offset (run once per machine)
musicgen clean --failed [--out DIR] # remove partial sample directories

musicgen ships 9 built-in genre presets that constrain generation parameters. Genres are composable: specify multiple to merge their constraints.
musicgen generate --seed 42 --genre jazz
musicgen generate --seed 42 --genre jazz --genre latin # composition
musicgen list-genres

| Genre | Tempo | Swing | Time sigs | Style |
|---|---|---|---|---|
| `jazz` | 80–200 BPM | 0.60–0.75 | 4/4, 3/4, 6/8, 12/8 | Swing-heavy, maj7/m7 chords |
| `hip-hop` | 70–110 BPM | 0.50–0.65 | 4/4 dominant | Heavy kick-snare, minor-key bias |
| `blues` | 60–140 BPM | 0.55–0.70 | 4/4, 6/8, 12/8 | Dominant 7ths, shuffle feel |
| `pop` | 90–140 BPM | 0.50–0.55 | 4/4 dominant | Clean patterns, major-key bias |
| `electronic` | 110–160 BPM | 0.50–0.55 | 4/4 dominant | Four-on-floor, synth layers |
| `latin` | 90–140 BPM | 0.50–0.60 | 4/4, 3/4, 6/8 | Clave syncopation, conga patterns |
| `reggae` | 60–90 BPM | 0.50–0.58 | 4/4 dominant | One-drop + steppers patterns, bass-heavy |
| `classical` | 50–160 BPM | 0.50–0.52 | 4/4, 3/4, 2/4, 5/4, 6/8, 12/8 | Wide dynamics, orchestral timbres |
| `rock` | 70–180 BPM | 0.50–0.57 | 4/4, 3/4, 6/8, 12/8 | Strong backbeat, power chords, guitar-driven |
Genre constraints applied per parameter type:
- Tempo/swing: hard bounds; the drawn value is clamped to `[min, max]`
- Time signature: soft weights that shift draw probabilities
- Key/scale, chord type, inversions: soft weight dicts plus an optional hard filter
- Chord type hard filter: when set, restricts the allowed chord vocabulary entirely (e.g. classical blocks sus/add9; jazz blocks plain triads)
- Drum patterns: per-time-sig `patterns_*.txt` files; each genre ships patterns for every time sig it uses
- FX profile: multiplies effect probabilities
- Soundfonts: per-layer tag overrides when SoundfontManager is active
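To make the difference between hard bounds, soft weights, and a hard filter concrete, here is a small sketch. The helper names and the weight values are hypothetical and chosen for illustration; only the jazz tempo range (80 to 200 BPM) comes from the genre table.

```python
import random


def clamp(value, lo, hi):
    """Hard bound: force a drawn value into [lo, hi]."""
    return max(lo, min(hi, value))


def draw_weighted(rng, weights, hard_filter=None):
    """Draw a key from a soft-weight dict, optionally restricted by a hard filter."""
    allowed = {k: w for k, w in weights.items()
               if hard_filter is None or k in hard_filter}
    options = sorted(allowed)  # sorted so the draw order is stable
    return rng.choices(options, weights=[allowed[k] for k in options], k=1)[0]


rng = random.Random(42)
tempo = clamp(rng.uniform(40, 240), 80, 200)  # jazz tempo bounds
time_sig = draw_weighted(rng, {"4/4": 0.6, "3/4": 0.2, "6/8": 0.1, "12/8": 0.1})
# A hard filter removes options entirely, whatever their soft weight.
chord = draw_weighted(rng, {"maj": 0.5, "maj7": 0.3, "m7": 0.2},
                      hard_filter={"maj7", "m7"})
print(tempo, time_sig, chord)
```

Soft weights only bias the draw, so every listed time signature stays reachable; the hard filter is what guarantees that a plain `maj` triad can never be drawn here.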
create_genre.py is an interactive terminal wizard for authoring new genre configurations:
python create_genre.py # guided wizard, start fresh
python create_genre.py --from rock # clone an existing genre as defaults
python create_genre.py --list # list all genres with their files
python create_genre.py --midi # MIDI drum note reference table

The wizard walks through all spec.json fields, auto-normalizes weight dicts, generates starter beat-pattern files (choose a style: backbeat / swing / electronic / one-drop / minimal), and optionally installs a chord-transition Markov matrix.
See genres/README.md for the full spec.json format and how to write custom genres.
Flat layout (default, --shard-width 0):
<dataset_root>/
├── manifest.jsonl    # one append-per-sample log
└── 000042/
    ├── sample.json   # full annotation, written LAST (completion sentinel)
    ├── mix.wav
    ├── stems/
    │   ├── beat.wav
    │   ├── melody.wav
    │   ├── harmony.wav
    │   └── bassline.wav
    └── midi/
        ├── beat.mid
        ├── melody.mid
        ├── harmony.mid
        └── bassline.mid
Sharded layout (--shard-width 3, recommended for 100k+ samples):
<dataset_root>/
├── manifest.jsonl
├── 000/
│   ├── 000000/       # shard prefix = first 3 chars of zero-padded index
│   │   ├── sample.json
│   │   └── ...
│   └── 000042/
│       └── ...
└── 001/
    └── 001000/
        └── ...
Use sample_dir_path(dataset_root, index, shard_width) from config.py to compute paths. Export, quality pipeline, and manifest traversal all auto-detect the layout.
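For readers who want to see the path arithmetic without importing the package, the two layouts can be sketched as below. This is not the library's `sample_dir_path`; `sketch_sample_dir` is a hypothetical stand-in, and the 6-digit zero padding is an assumption inferred from the `000042` examples.

```python
from pathlib import Path


def sketch_sample_dir(dataset_root, index, shard_width=0, pad=6):
    """Compute a sample directory for the flat (0) or sharded (>0) layout."""
    name = f"{index:0{pad}d}"  # zero-padded index, e.g. 42 -> "000042"
    root = Path(dataset_root)
    if shard_width == 0:
        return root / name
    # The shard directory is the first `shard_width` characters of the name.
    return root / name[:shard_width] / name


flat = sketch_sample_dir("dataset", 42)
sharded = sketch_sample_dir("dataset", 42, shard_width=3)
next_shard = sketch_sample_dir("dataset", 1000, shard_width=3)
print(flat, sharded, next_shard)
```

With a width of 3 and 6-digit names, each shard directory holds at most 1,000 samples, which keeps directory listings small for large datasets.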
sample.json is always written last. Its presence means the sample is complete. Re-running generate() with the same (global_seed, sample_index) skips work when this sentinel exists.
Every sample carries:
- Identity: `seed`, `musicgen_version`, `fluidsynth_version`
- Musical params: `key`, `mode`, `tempo_bpm`, `time_signature`, `swing`, `duration_seconds`
- Structure: `song_arrangement` (`[{part, start_seconds, end_seconds}]`)
- Per-part: `chord_progression`, `active_layers`, `soundfonts`, `fx_params`, `time_signatures_per_part`, `measures_per_part`
- Annotations: `beat_times`, `downbeat_times` (seconds, swing-aware from MIDI ticks)
- Quality: `musicality_score` (tempo 30%, harmony 30%, rhythm 25%, noise 15%, with render-integrity penalty)
- Routing: `split` (train/valid/test, deterministic from seed)
- Paths: `mix`, `stems.*`, `midi.*` (relative to sample dir)
- Sample composition: `used_samples` (when `--sample-db` is active)
sample.json is serialized with sort_keys=True — byte-identical re-runs are detectable via SHA-256 without parsing.
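The effect of `sort_keys=True` can be demonstrated with the standard library alone. This sketch hashes in-memory dicts rather than real files, and the exact `json.dumps` arguments musicgen uses beyond `sort_keys=True` (indentation, separators) are not stated here, so treat `canonical_digest` as a hypothetical illustration of the idea.

```python
import hashlib
import json


def canonical_digest(annotation):
    """SHA-256 of a key-sorted JSON serialization."""
    payload = json.dumps(annotation, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = {"seed": 42, "key": "A", "tempo_bpm": 120}
b = {"tempo_bpm": 120, "key": "A", "seed": 42}  # same content, different insertion order

same = canonical_digest(a) == canonical_digest(b)
changed = canonical_digest({**a, "tempo_bpm": 121}) != canonical_digest(a)
print(same, changed)
```

For files on disk, hashing the raw bytes of two `sample.json` files is enough, which is why no parsing is needed.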
manifest.jsonl holds one JSON object per sample: sample_index, seed, status (ok/failed), split, path, musicality_score, duration_seconds, attempt, wrote_at.
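Since the manifest is plain JSONL, a quick health check needs nothing beyond the standard library. The sketch below counts statuses and splits; `summarize_manifest` is a hypothetical helper and the three demo records are invented, using only field names listed above.

```python
import io
import json
from collections import Counter


def summarize_manifest(lines):
    """Count samples per status, and per split for successful samples."""
    status_counts = Counter()
    split_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        status_counts[record["status"]] += 1
        if record["status"] == "ok":
            split_counts[record["split"]] += 1
    return status_counts, split_counts


demo = io.StringIO(
    '{"sample_index": 0, "status": "ok", "split": "train"}\n'
    '{"sample_index": 1, "status": "failed", "split": "train"}\n'
    '{"sample_index": 2, "status": "ok", "split": "valid"}\n'
)
status_counts, split_counts = summarize_manifest(demo)
print(dict(status_counts), dict(split_counts))
```

Passing an open file handle (`open(path, encoding="utf-8")`) instead of the `StringIO` works the same way, since both iterate line by line.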
The musicality score is designed to reject samples that would contaminate a training distribution, not to rank good music against very good music.
Layer 1 — symbolic (pre-render, < 5 ms). check_midi_quality(midi_paths, key) runs hard checks (empty layer, stuck pitch > 80%, extreme pitch range > 36 semitones) and soft metrics on the melody (Krumhansl–Schmuckler key-profile correlation, scale adherence, melodic step fraction, n-gram entropy, LZ compression ratio). Failing hard checks → score 0.0, no render.
Layer 2 — audio integrity (post-render). get_musicality_score(filename) applies a render-integrity penalty (clipping, silence, DC offset) to a weighted musical analysis (tempo stability/clarity 30%, harmony KS correlation 30%, rhythm regularity/strength 25%, noise/spectral 15%).
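The Layer 2 weighting can be written out as a short calculation. This is a sketch under stated assumptions: the four weights come from the text, but the component scores are invented, and applying the render-integrity penalty multiplicatively is a guess, since the README does not specify how the penalty is combined.

```python
# Component weights as stated in the README.
WEIGHTS = {"tempo": 0.30, "harmony": 0.30, "rhythm": 0.25, "noise": 0.15}


def combine_score(components, integrity_penalty=0.0):
    """Weighted sum of component scores in [0, 1], scaled by (1 - penalty)."""
    base = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Assumption: multiplicative penalty; the real combination rule may differ.
    return max(0.0, min(1.0, base * (1.0 - integrity_penalty)))


components = {"tempo": 0.9, "harmony": 0.8, "rhythm": 0.7, "noise": 0.6}
clean = combine_score(components)
clipped = combine_score(components, integrity_penalty=0.5)
print(round(clean, 4), round(clipped, 4))
```

Whatever the exact rule, the design point is that a render fault such as clipping or silence should pull the score down even when the musical analysis looks fine.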
from musicgen import generate, Config

result = generate(Config(
    global_seed=42,
    sample_index=0,
    dataset_root="./dataset",
    min_musicality_score=0.6,  # reject below 0.6; 0.0 = disabled
    max_attempts=3,            # re-roll up to 3x with distinct seeds
))
print(result.attempt)  # which attempt was accepted (1, 2, or 3)

pip install -e '.[samples]' # musicality is bundled in src/musicality/
musicality score ./mix.wav
musicality explain ./mix.wav
musicality batch ./dataset/**/*.wav --output scores.csv

See docs/musicality-scoring.md for metric derivations and literature references.
Replace Markov matrices with small LSTMs trained on a self-generated corpus.
pip install -e '.[neural]' # requires torch >= 2.0

# 1. Generate a training corpus (MIDI-only is fast)
musicgen generate --count 500 --seed 1 --out ./corpus --output-mode midi-only
# 2. Extract chord/melody sequences
musicgen extract-sequences --dataset ./corpus --output sequences.json
# 3. Train models
musicgen train --sequences sequences.json --layer chord --output-dir ./models
musicgen train --sequences sequences.json --layer melody --output-dir ./models
# Genre-specific models take precedence at inference
musicgen train --sequences sequences.json --layer chord --genre jazz --output-dir ./models
# 4. Generate with neural backends
musicgen generate --count 32 --seed 1 --out ./dataset \
--chord-backend neural --melody-backend neural \
--models-dir ./models

models_dir lookup order: chord_{genre}.pt → chord.pt. Missing file → Markov fallback with warning.
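The lookup order can be pictured as a first-match search over candidate filenames. The sketch below is illustrative: `resolve_checkpoint` is a hypothetical helper, and extending the `chord_{genre}.pt` pattern to other layers such as melody is an assumption.

```python
import tempfile
from pathlib import Path


def resolve_checkpoint(models_dir, layer, genre=None):
    """Return the first existing checkpoint, or None to signal Markov fallback."""
    candidates = []
    if genre:
        candidates.append(Path(models_dir) / f"{layer}_{genre}.pt")  # genre-specific first
    candidates.append(Path(models_dir) / f"{layer}.pt")              # then the generic model
    for path in candidates:
        if path.is_file():
            return path
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "chord.pt").touch()
    (Path(tmp) / "chord_jazz.pt").touch()
    jazz = resolve_checkpoint(tmp, "chord", "jazz").name
    rock = resolve_checkpoint(tmp, "chord", "rock").name
    melody = resolve_checkpoint(tmp, "melody", "jazz")

print(jazz, rock, melody)
```

Returning `None` instead of raising keeps the caller free to warn and fall back to the Markov generator, matching the behaviour described above.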
| Model | Params | Architecture |
|---|---|---|
| ChordLSTM | ~35 K | 2-layer LSTM, hidden=64, genre one-hot conditioning |
| MelodyLSTM | ~10 K | 2-layer LSTM, hidden=32 |
The determinism contract is preserved: logits are a pure function of the input (fixed weights give a fixed output for a given input), and sampling uses rng.choices(tokens, weights=softmax(logits)), drawing from the same seeded random.Random instance used by the Markov path.
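The sampling step can be reproduced without torch to show why it stays deterministic. In this sketch the logits are a fixed list standing in for model output, and the token names are invented; only the `rng.choices(tokens, weights=softmax(logits))` pattern is taken from the text.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(seed, tokens, logits, n):
    """Draw n tokens from one seeded random.Random instance."""
    rng = random.Random(seed)
    return [rng.choices(tokens, weights=softmax(logits), k=1)[0] for _ in range(n)]


tokens = ["C", "F", "G", "Am"]
logits = [2.0, 1.0, 0.5, 0.1]  # stand-in for fixed model output

run_a = sample_sequence(7, tokens, logits, 8)
run_b = sample_sequence(7, tokens, logits, 8)
print(run_a == run_b, run_a)
```

Because all randomness flows through the seeded instance and never through `torch` or the global `random` state, two runs with the same seed draw the same sequence.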
See docs/neural-generators.md for model architecture, sequences.json schema, and training hyperparameters.
Mix real audio samples alongside or instead of FluidSynth-rendered layers.
pip install -e '.[samples]' # audio-sample-manager, soundfile, rubberband-stretch

# 1. Build a sample library
musicgen samples build --dir ./my_drums --output drums.json --musicality
musicgen samples build --dir ./bass_loops --output bass.json \
--category bass --genre electronic --recursive
# 2. Generate with sample composition
musicgen generate --seed 42 --count 10 --out ./dataset \
--sample-db drums.json \
--sample-beat alongside \
--sample-bassline alongside \
--sample-gain -6 \
--sample-min-score 0.65

| Mode | Behaviour |
|---|---|
| `alongside` | Sample overlaid on the FluidSynth-rendered mix (additive). |
| `substitution` | Sample replaces the FluidSynth stem before mixing. |
| `adlib` | One-shot placed at a specific beat offset. Requires oneshot_at_beat in Python API. |
| `off` | FluidSynth only (default for melody and harmony). |
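The difference between the additive and replacing modes can be shown on toy sample buffers. This is a simplified sketch, not the library's mixer: audio is modelled as plain lists of floats of equal length, `apply_sample` is a hypothetical function, `adlib` is left out because it needs beat timing, and the dB-to-linear conversion uses the standard amplitude formula.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20.0)


def apply_sample(rendered, sample, mode, gain_db=-3.0):
    """Combine a rendered buffer with a gain-scaled sample buffer."""
    gain = db_to_linear(gain_db)
    scaled = [s * gain for s in sample]
    if mode == "alongside":
        return [a + b for a, b in zip(rendered, scaled)]  # additive overlay
    if mode == "substitution":
        return scaled                                      # sample replaces the render
    if mode == "off":
        return list(rendered)                              # FluidSynth only
    raise ValueError(f"mode not covered by this sketch: {mode}")


rendered = [0.1, 0.2, -0.1]
sample = [0.5, 0.0, -0.5]
layered = apply_sample(rendered, sample, "alongside", gain_db=-6.0)
replaced = apply_sample(rendered, sample, "substitution", gain_db=-6.0)
untouched = apply_sample(rendered, sample, "off")
print(layered, replaced, untouched)
```

A gain of -6 dB roughly halves the sample's amplitude, which is why a negative `--sample-gain` helps avoid clipping when layering in `alongside` mode.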
from musicgen import generate, Config
from musicgen.sample_composition import SampleLayerRule, SampleCompositionConfig
cfg = Config(
global_seed=42,
sample_index=0,
dataset_root="./dataset",
sample_composition=SampleCompositionConfig(
sample_db_path="./library.json",
layer_rules={
"beat": SampleLayerRule(
layer="beat",
mode="alongside",
gain_db=-6.0,
max_bpm_stretch_pct=15.0,
min_musicality_score=0.65,
genre=["hip-hop"],
),
"bassline": SampleLayerRule(
layer="bassline",
mode="substitution",
gain_db=-3.0,
),
},
global_min_musicality=0.50,
allow_transposition=True,
allow_time_stretching=True,
),
)
result = generate(cfg)

See docs/sample-composition.md for full reference.
All three integrations are opt-in with zero new hard dependencies. Each package is lazy-imported; a clear ImportError with an install hint is raised when absent.
pip install git+https://github.com/dobidu/soundfont_manager

Replaces blind rng.choice(os.listdir(...)) with metadata-aware tag-based selection from a SoundfontManager JSON database.

from musicgen import generate, Config

result = generate(Config(
    global_seed=42,
    sample_index=0,
    dataset_root="./dataset",
    soundfont_manager_db="/path/to/soundfonts.json",
    soundfont_manager_sf_dir="/path/to/sf2/files",
))

Layer → tag mapping: beat → ["drums", "percussion"], melody → ["melody", "lead", "piano", "strings"], harmony → ["harmony", "chords", "pads"], bassline → ["bass"].
Fallback: any error or empty tag result → sorted directory scan.
pip install git+https://github.com/dobidu/midi_file_manager
musicgen index-midi --dataset ./dataset --out ./midi_db.json

Indexes all generated MIDI files into a MidiManager database with ground-truth musicgen metadata (tempo_bpm, key, time_signature, split, musicality_score).
pip install git+https://github.com/dobidu/audio_sample_manager
musicgen index-audio --dataset ./dataset --out ./audio_db.json

Indexes generated WAV stems into a SampleManager database alongside external audio libraries, enabling unified cross-library queries (e.g., "all bass stems at 90 BPM in A minor").
from musicgen import generate, generate_batch, Config, SampleResult, BatchResult
# Single sample
result = generate(Config(global_seed=42, sample_index=0, dataset_root="./dataset"))
print(result.sample_dir) # "./dataset/000000"
print(result.split) # "train" | "valid" | "test"
print(result.musicality_score) # float
print(result.status) # "ok" | "failed"
# Batch
result = generate_batch(Config(global_seed=1, count=32, dataset_root="./dataset", workers=4))
print(result.succeeded, result.failed, result.skipped)

Re-running with the same (global_seed, sample_index) short-circuits when sample.json exists, so batches are idempotent.
Config is a @dataclass with three precedence layers: CLI args > env vars > defaults.
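The precedence rule can be illustrated with a two-field stand-in. This is a sketch, not the real `Config`: `SketchConfig`, `ENV_BINDINGS`, and `resolve_config` are hypothetical, and only the two env var names (`MUSICGEN_COUNT`, `MUSICGEN_OUTPUT_MODE`) are taken from this README.

```python
import os
from dataclasses import dataclass


@dataclass
class SketchConfig:
    count: int = 1
    output_mode: str = "full"


# Field name -> (env var, cast from string).
ENV_BINDINGS = {
    "count": ("MUSICGEN_COUNT", int),
    "output_mode": ("MUSICGEN_OUTPUT_MODE", str),
}


def resolve_config(cli_args, environ=None):
    """Apply CLI args first, then env vars, then dataclass defaults."""
    environ = os.environ if environ is None else environ
    values = {}
    for name, (env_name, cast) in ENV_BINDINGS.items():
        if cli_args.get(name) is not None:   # 1. explicit CLI value wins
            values[name] = cli_args[name]
        elif env_name in environ:            # 2. otherwise the env var
            values[name] = cast(environ[env_name])
        # 3. otherwise fall through to the dataclass default
    return SketchConfig(**values)


defaults = resolve_config({}, {})
from_env = resolve_config({}, {"MUSICGEN_COUNT": "32"})
from_cli = resolve_config({"count": 8}, {"MUSICGEN_COUNT": "32"})
print(defaults, from_env, from_cli)
```

Env values arrive as strings, so each binding carries its own cast; passing `environ` explicitly keeps the function easy to test without touching the real environment.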
| Field | Default | Env var | Notes |
|---|---|---|---|
| `global_seed` | None | - | Required at generate time. |
| `sample_index` | `0` | - | Per-sample identity within dataset. |
| `dataset_root` | `<repo>/dataset` | `MUSICGEN_DATASET_ROOT` | Output directory. |
| `count` | `1` | `MUSICGEN_COUNT` | Samples per generate_batch. |
| `workers` | None (all cores) | - | generate_batch parallelism. |
| `output_mode` | `"full"` | `MUSICGEN_OUTPUT_MODE` | full / mix-only / stems-only / midi-only |
| `split_ratios` | `(0.8, 0.1, 0.1)` | - | Train/valid/test split. |

| Field | Default | Env var | Notes |
|---|---|---|---|
| `min_musicality_score` | `0.0` | `MUSICGEN_MIN_MUSICALITY_SCORE` | 0.0 = disabled. |
| `max_attempts` | `1` | `MUSICGEN_MAX_ATTEMPTS` | Max re-roll attempts per sample. |

| Field | Default | Env var | Notes |
|---|---|---|---|
| `sf_dir` | `<repo>/sf` | `MUSICGEN_SF_DIR` | Root directory for `sf/<layer>/` subdirs. |
| `auto_download_sf2` | `True` | `MUSICGEN_AUTO_DOWNLOAD_SF2` | Download default SF2 on empty pool. |
| `assets_toml` | `<repo>/assets.toml` | - | Asset registry path. |
| `soundfont_manager_db` | None | `MUSICGEN_SOUNDFONT_MANAGER_DB` | Activates tag-based soundfont selection. |
| `soundfont_manager_sf_dir` | None | `MUSICGEN_SOUNDFONT_MANAGER_SF_DIR` | Base dir for relative SF2 paths in SM db. |

| Field | Default | Env var | Notes |
|---|---|---|---|
| `genre` | None | `MUSICGEN_GENRE` (comma-separated) | Genre name(s) for constrained generation. |
| `genres_dir` | `<repo>/genres` | `MUSICGEN_GENRES_DIR` | Root dir for genre spec files. |

| Field | Default | Notes |
|---|---|---|
| `chord_backend` | `"markov"` | "markov" or "neural". Falls back to Markov when model is absent. |
| `melody_backend` | `"markov"` | Same. |
| `models_dir` | `<repo>/models` | Directory with .pt checkpoint files. |

| Field | Default | Notes |
|---|---|---|
| `shard_width` | `0` | Shard prefix length. 0 = flat (`<root>/000042/`); 3 = `<root>/000/000042/`. Range 0–5. |
| `measures_per_part_override` | `None` | Dict overriding per-part measure counts after time-sig scaling. E.g. `{"intro": 4, "verse": 8, "chorus": 8, "bridge": 4, "outro": 4}` for short listening demos. |

| File | Purpose |
|---|---|
| `song_structures.json` | Song arrangements (intro/verse/chorus/bridge/outro). |
| `chord_patterns.txt` | Chord progressions per song part. |
| `beat_roll_patterns_<sig>.txt` | Drum patterns per time signature. |
| `inst_probabilities.json` | Per-layer inclusion probabilities. |
| `levels.json` | Per-layer gain and pan. |
| `*_fx.json` | FX chain parameter ranges per layer. |
Same global_seed + same sample_index → bit-identical MIDI + bit-identical canonical sample.json regardless of PYTHONHASHSEED. WAV bit-identity holds when the FluidSynth binary version matches.
Five named random.Random instances per sample, derived deterministically from the sample seed:
sample_seed = derive_sample_seed(global_seed, sample_index) # sha256[:8]
rngs = make_rngs(sample_seed)
# params, generators, soundfonts, fx, mix: each seeded with seed ^ offset

There are zero bare random.* calls anywhere in src/musicgen/; this is enforced by an AST static guard. Global random state is never touched.
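The seed derivation can be approximated with the standard library to show the shape of the scheme. This is not the code in seeds.py: the `sketch_` functions are hypothetical, the hash input format is invented, "sha256[:8]" is read here as the first 8 bytes of the digest (it could equally mean 8 hex characters), and the offsets 1 to 5 are placeholders for the real per-stream offsets.

```python
import hashlib
import random

STREAMS = ("params", "generators", "soundfonts", "fx", "mix")


def sketch_derive_sample_seed(global_seed, sample_index):
    """Hash (global_seed, sample_index) into a 64-bit integer seed."""
    digest = hashlib.sha256(f"{global_seed}:{sample_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # assumption: first 8 bytes


def sketch_make_rngs(sample_seed):
    """One independent random.Random per named stream, seeded with seed ^ offset."""
    # Assumption: offsets are simply 1..5.
    return {name: random.Random(sample_seed ^ offset)
            for offset, name in enumerate(STREAMS, start=1)}


seed = sketch_derive_sample_seed(42, 0)
first = {name: rng.random() for name, rng in sketch_make_rngs(seed).items()}
again = {name: rng.random() for name, rng in sketch_make_rngs(seed).items()}
print(first == again, sorted(first))
```

Hashing with SHA-256 rather than Python's built-in `hash()` is what makes the result independent of `PYTHONHASHSEED`, and separate streams mean that adding a draw in one stage does not shift the values seen by another.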
Regression tests (tests/test_determinism_golden.py):
- `TestSameProcessStability`: fast, no FluidSynth; runs `generate()` twice and asserts `sha256(sample.json)` matches.
- `TestDeterminismGoldens`: `@pytest.mark.slow`; compares SHA-256 artifacts for mix.wav + MIDIs + sample.json across separate process invocations.
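An AST guard against bare `random.*` calls, as mentioned earlier in this section, could look roughly like the sketch below. It is an illustration of the technique, not the project's actual guard: `find_bare_random_calls` is a hypothetical function, and it only catches the `random.<name>(...)` form, so `from random import choice` would slip through.

```python
import ast

ALLOWED_ATTRS = {"Random"}  # constructing a seeded instance is fine


def find_bare_random_calls(source):
    """Return (line, attr) for every random.<attr>(...) call outside the allow-list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "random"
                and func.attr not in ALLOWED_ATTRS):
            hits.append((node.lineno, func.attr))
    return sorted(hits)


SOURCE = (
    "import random\n"
    "rng = random.Random(1)\n"
    "x = rng.random()\n"          # method on a seeded instance: allowed
    "y = random.choice([1, 2])\n"  # module-level call: flagged
)
hits = find_bare_random_calls(SOURCE)
print(hits)
```

Run over every file in a package inside a test, a check like this turns the "no global random state" rule into something that fails CI instead of relying on review.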
src/musicgen/
├── __init__.py # public exports: generate, generate_batch, Config, SampleResult, BatchResult
├── api.py # generate(Config) — composition root; resolve_genre_spec
├── batch.py # generate_batch(Config) → BatchResult via ProcessPoolExecutor
├── cli.py # typer app — all CLI commands
├── config.py (root) # Config dataclass with CLI > env > defaults precedence
├── asset_downloader.py # download SF2/MIDI from assets.toml; auto-trigger on empty pool
├── calibrate.py # FluidSynth pre-roll measurement + .musicgen/ cache
├── seeds.py # derive_sample_seed, make_rngs, save_random_state, assign_split
├── genre.py # GenreSpec, load_genre, merge_genres, resolve_genres
├── sampler.py # SongParams + genre-constrained draws
├── generators/
│ ├── chord.py # Markov/neural chord generation; extended chord vocab
│ ├── melody.py # Markov/neural melody; scale-degree path
│ ├── bassline.py # Bassline generation (keyed to chords + melody)
│ └── beat.py # Drum patterns + swing; genre pattern union
├── neural/ # optional — requires musicgen[neural]
│ ├── model.py # ChordLSTM, MelodyLSTM, NeuralSampler
│ ├── trainer.py # train(), save_model(), load_model()
│ └── sampler.py # sample_chord_neural(), sample_melody_neural()
├── corpus_extractor.py # extract_sequences() — dataset → sequences.json
├── renderer.py # FluidSynth wrapper; ThreadPoolExecutor stem rendering; soundfont selection
├── mixer.py # FX (pedalboard), pydub overlay, layer mask, part concat
├── beats.py # MIDI-tick beat/downbeat extraction (mido), swing-aware
├── annotator.py # pure-function sample.json assembler
├── musicality.py # Layer 1 MIDI quality + Layer 2 audio integrity scorer
├── writer.py # atomic sample dir, sum-of-stems assertion, output_mode routing
├── manifest.py # ManifestWriter (JSONL, append-under-lock)
├── quality.py # score_dataset(), filter_dataset(), quality_report() — batch quality pipeline
├── exporter.py # collect_samples(), export_dataset() — JSONL/CSV/Parquet/HF export
├── sample_composition.py # SampleLayerRule, SampleCompositionConfig
├── sample_mixer.py # BPM stretch, key shift, loop tiling, alongside/substitution
├── sample_builder.py # build_library() — WAV dir → SampleManager JSON
├── midi_indexer.py # index_midi_dataset() — indexes MIDI into MidiManager db
└── audio_indexer.py # index_audio_dataset() — indexes WAV into SampleManager db
Pipeline:
resolve_genre_spec → sampler (genre-constrained draws)
→ generators (chord: LSTM or Markov; melody: LSTM or Markov; bassline/beat: Markov)
→ check_midi_quality (Layer 1: hard + soft symbolic checks, < 5 ms)
→ [re-roll up to max_attempts if score < min_musicality_score]
→ renderer (FluidSynth parallel stems; genre soundfont tags; auto-download on empty pool)
→ mixer (FX + overlay + concat; genre FX profile)
→ beats (MIDI-tick extraction)
→ get_musicality_score (Layer 2: audio integrity + musical analysis)
→ annotator (sample.json dict + pre-roll offset)
→ writer (atomic sample dir + sum-of-stems + output_mode routing)
→ manifest (JSONL append)
→ SampleResult
generate_batch wraps generate in a ProcessPoolExecutor (spawn context) and returns BatchResult.
| Platform | How |
|---|---|
| Google Colab | Click a badge at the top. Each notebook has a setup cell that apt installs FluidSynth + fluid-soundfont-gm and pip-installs musicgen. Demo · Sample composition · Neural generators |
| mybinder.org | JupyterLab in the browser, all deps pre-wired. Cold build ~5–10 min. Launch |
| HuggingFace Spaces | Gradio web UI wrapping musicgen.generate(). Source under hf_space/ (Dockerfile + app.py). See hf_space/README.md. |
pytest -m "not slow" # fast suite — 1648 tests, ~12 s
pytest -m slow # requires FluidSynth binary + populated sf/ pools
pytest # everything

Coverage target: ≥ 80% on pure functions (samplers, generators, annotator, beats, validators).
PRs welcome. Run pytest -m "not slow" before submitting. Project planning lives under .planning/.
See LICENSE.
- music21 for music theory primitives
- FluidSynth for soundfont synthesis
- pedalboard for audio effects
- mido for MIDI manipulation
- librosa for audio analysis