SkillEvo

SkillEvo is a replay-driven skill self-evolution runner for AutoSkill.

It does not modify autoskill/ or write back to the main SkillBank. It reads:

online skill provenance
offline conversation provenance
offline requirement stats
current skill snapshots from SkillBank

Then it runs a local self-evolution loop:

build a frozen replay pool for one skill lineage
compile 3-6 binary eval rules
generate small mutations under a fixed budget
evaluate on mutate_dev
promote only if the candidate beats the current SkillEvo champion on promotion_test
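
The loop above can be sketched roughly as follows. This is a minimal illustration, not the code in runner.py: `evolve`, `EvolutionResult`, `keyword_score`, and the callable signatures are hypothetical names introduced here, and the scorer is a toy stand-in for the compiled binary eval rules.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical types: a skill is a prompt string, a sample is a replay record,
# and a scorer returns the fraction of binary eval rules passed (0.0 to 1.0).
Scorer = Callable[[str, Sequence[dict]], float]
Mutator = Callable[[str], str]


@dataclass
class EvolutionResult:
    champion: str
    promoted: bool
    candidates_tried: int


def evolve(champion: str, mutators: Sequence[Mutator], score: Scorer,
           mutate_dev: Sequence[dict], promotion_test: Sequence[dict],
           budget: int = 6) -> EvolutionResult:
    """Try at most `budget` mutations; promote only on a strict promotion_test win."""
    best, best_dev = champion, score(champion, mutate_dev)
    tried = 0
    for mutate in mutators[:budget]:
        candidate = mutate(champion)
        tried += 1
        dev = score(candidate, mutate_dev)
        if dev > best_dev:  # keep the strongest candidate seen on mutate_dev
            best, best_dev = candidate, dev
    # The held-out split decides promotion, so overfitting to mutate_dev is not rewarded.
    promoted = (best != champion
                and score(best, promotion_test) > score(champion, promotion_test))
    return EvolutionResult(best if promoted else champion, promoted, tried)


def keyword_score(skill: str, samples: Sequence[dict]) -> float:
    """Toy scorer: share of samples whose required keyword appears in the skill text."""
    if not samples:
        return 0.0
    return sum(s["must_include"] in skill for s in samples) / len(samples)
```

Selecting on mutate_dev and promoting on promotion_test keeps the two decisions on separate data, which is the point of the two-split design described above.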

Layout

SkillEvo/
  registry/
  datasets/
  evals/
  runs/
  champions/
  reports/
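
For scripts that need this tree to exist before a run, a small helper along these lines may be enough. The subdirectory names come from the layout above; `ensure_layout` and `SUBDIRS` are hypothetical names, and the package may already create these directories on its own.

```python
from pathlib import Path

# Subdirectory names are taken from the layout above; the helper itself is illustrative.
SUBDIRS = ("registry", "datasets", "evals", "runs", "champions", "reports")


def ensure_layout(root: Path) -> list[Path]:
    """Create the SkillEvo/ working tree under `root` if any part of it is missing."""
    created = []
    for name in SUBDIRS:
        path = root / "SkillEvo" / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```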

Commands

Build replay:

python3 -m SkillEvo.cli build-replay --user-id u1 --skill-id <skill_id>
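
How build-replay assigns samples to mutate_dev and promotion_test is not documented here. One common way to keep a replay pool frozen across reruns is to derive the split from a hash of a stable sample id, sketched below. `assign_split`, the id format, and the 50/50 default are assumptions for illustration only.

```python
import hashlib


def assign_split(sample_id: str, test_fraction: float = 0.5) -> str:
    """Map a sample id to a split deterministically, so reruns never reshuffle the pool."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "promotion_test" if bucket < test_fraction else "mutate_dev"
```

Because the assignment depends only on the id, adding new samples later does not move existing ones between splits.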

Compile evals:

python3 -m SkillEvo.cli compile-evals --user-id u1 --skill-id <skill_id>

Run the full self-evolution loop:

python3 -m SkillEvo.cli run --user-id u1 --skill-id <skill_id>

Run one stored skill with a real LLM on one replay sample:

python3 -m SkillEvo.run_one_skill_with_llm \
  --user-id WildChat_4.8M_qwen \
  --skill-id 2158daa6-570a-485f-8e0b-ea0a1e1632e5 \
  --llm-provider dashscope \
  --llm-model qwen-plus \
  --sample-split mutate_dev \
  --sample-index 0

Run one stored skill on ad-hoc input:

python3 -m SkillEvo.run_one_skill_with_llm \
  --user-id WildChat_4.8M_qwen \
  --skill-id 2158daa6-570a-485f-8e0b-ea0a1e1632e5 \
  --llm-provider dashscope \
  --llm-model qwen-plus \
  --custom-input 'input_template: Please look at this sentence: {{sentence}}. Does this sentence contain any personal names?'

Run the full autoresearch loop with real LLMs:

python3 -m SkillEvo.run_autoresearch_skill \
  --user-id WildChat_4.8M_qwen \
  --skill-id 2158daa6-570a-485f-8e0b-ea0a1e1632e5 \
  --llm-provider dashscope \
  --llm-model qwen-plus \
  --judge-provider dashscope \
  --judge-model qwen-plus \
  --mutation-mode hybrid \
  --mutation-budget 6 \
  --min-replay-samples 2

Current MVP

Implemented:

online replay reconstruction from stored history[].messages
offline replay reconstruction via source_file + conversation_index
lineage registry and replay dataset persistence
heuristic eval compilation from prompt + requirement stats
programmatic + judge-LLM binary rule engine
heuristic mutations plus optional LLM-guided mutation
local champion registry under SkillEvo/champions/
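
As an illustration of the programmatic + judge-LLM binary rule engine listed above, the sketch below treats every rule as a function from model output to pass/fail. `BinaryRule`, `run_rules`, `pass_rate`, and `make_judge_rule` are hypothetical names, not the API in evals.py, and the judge is any callable that answers "yes" or "no".

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BinaryRule:
    name: str
    check: Callable[[str], bool]  # True means the output passes the rule


def make_judge_rule(name: str, question: str,
                    judge: Callable[[str, str], str]) -> BinaryRule:
    """Wrap a judge callable (question, output) -> 'yes'/'no' text as a binary rule."""
    return BinaryRule(
        name, lambda out: judge(question, out).strip().lower().startswith("yes"))


def run_rules(output: str, rules: list[BinaryRule]) -> dict[str, bool]:
    """Evaluate every rule on one output and return a name -> pass/fail map."""
    return {rule.name: bool(rule.check(output)) for rule in rules}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of rules passed; 0.0 when there are no rules."""
    return sum(results.values()) / len(results) if results else 0.0


# Two programmatic example rules; a judge rule is added by the caller.
EXAMPLE_RULES = [
    BinaryRule("non_empty", lambda out: bool(out.strip())),
    BinaryRule("no_markdown_heading", lambda out: not re.search(r"^#", out, re.M)),
]
```

In a real run the judge callable would wrap the model selected with --judge-provider and --judge-model; a stub that always returns "yes" is enough for local testing.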

Not implemented yet:

retrieval-only evals
automatic write-back into the main SkillBank
large-scale tournament scheduling across many lineages

Notes

If the replay pool for a lineage is too small, the runner keeps that lineage in the incubating state instead of evolving it.
The default config uses the AutoSkill store at ./SkillBank.
TOML config loading is optional; when TOML support is unavailable the default config still works.
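
The optional TOML behavior can be approximated with a guarded import, as in the sketch below. It assumes the standard-library tomllib (Python 3.11+); `load_config`, `DEFAULT_CONFIG`, and the `skillbank_path` key are hypothetical, and only the ./SkillBank default comes from the notes above.

```python
from pathlib import Path
from typing import Optional

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    tomllib = None  # TOML support unavailable; the defaults below are used instead

# Only the store path comes from the notes above; the key name is hypothetical.
DEFAULT_CONFIG = {"skillbank_path": "./SkillBank"}


def load_config(path: Optional[Path] = None) -> dict:
    """Return the defaults, overlaid with values from a TOML file when possible."""
    config = dict(DEFAULT_CONFIG)
    if path is None or tomllib is None or not path.is_file():
        return config
    with path.open("rb") as handle:  # tomllib.load requires a binary file object
        config.update(tomllib.load(handle))
    return config
```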
