This tool is built in Python. It creates many training data files with random numeric names ending in .td, e.g. 956309819349.td
It is run with: python ortd.py [path to git repo] "[preamble for each training sample created]"
e.g. python ortd.py ../src/github.com/pavebank/pavebank "PaveBank core banking system"
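A minimal sketch of the command-line parsing implied by the usage above; the argument names are assumptions, only the two positional arguments come from the spec:

```python
import argparse

def parse_args():
    """Parse the two positional arguments shown in the usage example."""
    parser = argparse.ArgumentParser(
        description="Generate .td training data files from a git repository.")
    parser.add_argument("repo_path", help="path to the git repo to index")
    parser.add_argument("preamble", help="preamble prepended to every training sample")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.repo_path, args.preamble)
```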
The basic loop recurses through the code base, visiting each source file (.go and .py only for now). For each file found (a sketch of this loop follows the list):
- create a training data file containing the whole source
- create a training data file for each function
- propose 5 hypothetical questions about the content of the file, for example: "This authentication handler uses the oauth2 middleware, what other handlers use this?" or "This function uses the sql orm, where is the up/down schema migration defined?"
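A sketch of this loop, assuming write_whole_file_sample, write_function_samples, and propose_questions are helpers defined elsewhere (they are placeholders here, not part of the spec):

```python
import os

SOURCE_EXTENSIONS = (".go", ".py")

def iter_source_files(repo_path):
    """Yield (path, source) for every .go and .py file under the repository."""
    for root, dirs, files in os.walk(repo_path):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip git metadata
        for name in files:
            if name.endswith(SOURCE_EXTENSIONS):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    yield path, f.read()

def process_repo(repo_path, question_queue):
    """Apply the three per-file steps and enqueue questions for the LLM."""
    for path, source in iter_source_files(repo_path):
        write_whole_file_sample(path, source)    # hypothetical helper: one .td with the whole source
        write_function_samples(path, source)     # hypothetical helper: one .td per function
        for question in propose_questions(path, source, n=5):  # hypothetical helper
            question_queue.append((path, question))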
The Ollama LLM runtime is running locally with a model called "devstral". All questions are sent to the local LLM to answer.
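A sketch of how each question could be sent to the local model; this assumes Ollama's default HTTP endpoint on localhost:11434 and a non-streaming chat request:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "devstral"

def ask_llm(question, preamble):
    """Send one question to the locally running Ollama model and return its answer text."""
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": preamble},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }, timeout=600)
    response.raise_for_status()
    return response.json()["message"]["content"]
```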
Tools are defined and implemented by this system, including:
- list_files - lists files in a directory
- list_files_recursively - lists files recursively in a directory
- find_in_files - searches in a single file, or performs a deep search if given a directory
- implement other tools as necessary
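A sketch of how these tools might be implemented and registered for dispatch by name; the registry shape is an assumption, and exposing them to the model may additionally require a JSON schema per tool depending on how tool calling is wired up:

```python
import os

def list_files(directory):
    """List files in a single directory (non-recursive)."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

def list_files_recursively(directory):
    """List all files under a directory tree."""
    results = []
    for root, _dirs, files in os.walk(directory):
        results.extend(os.path.join(root, name) for name in files)
    return sorted(results)

def find_in_files(needle, path):
    """Search for a string in one file, or in every file under a directory."""
    targets = [path] if os.path.isfile(path) else list_files_recursively(path)
    hits = []
    for target in targets:
        try:
            with open(target, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if needle in line:
                        hits.append(f"{target}:{lineno}: {line.rstrip()}")
        except OSError:
            continue
    return hits

# Registry consulted when the LLM requests a tool call by name.
TOOLS = {
    "list_files": list_files,
    "list_files_recursively": list_files_recursively,
    "find_in_files": find_in_files,
}
```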
A queue is implemented to hold questions to be sent to the LLM. Breadth-first traversal should be preferred over depth-first.
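A sketch of that queue, assuming a plain FIFO deque: popping from the front and appending follow-ups to the back gives the breadth-first ordering described above. ask_llm is the helper sketched earlier and write_td_file is sketched at the end of this document; both are assumptions.

```python
from collections import deque

question_queue = deque()

def drain_queue(queue, preamble):
    """Answer queued questions in FIFO (breadth-first) order."""
    while queue:
        path, question = queue.popleft()
        answer = ask_llm(question, preamble)              # assumed helper, sketched above
        write_td_file(preamble, path, question, answer)   # assumed helper, sketched below
```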
The goal is to create many training data files, each answering a single question about the code base. Each file begins with the preamble provided for the code base and looks like this (simplified example):
** Pave Bank Core banking system **
Implemented in pavebank/auth/handler.go is a function called get_user.
How does this work?
The get_user function defined in pavebank/auth/handler.go makes an RPC call to ...
.td files should contain extensive code snippets and comments, and explain assumed business logic.
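A sketch of how each .td file might be written, using a random 12-digit filename as in the example at the top; the exact layout of the body is an assumption based on the simplified example above:

```python
import os
import random

def write_td_file(preamble, path, question, answer, out_dir="."):
    """Write one training data file answering a single question about the code base."""
    filename = os.path.join(out_dir, f"{random.randrange(10**11, 10**12)}.td")
    with open(filename, "w", encoding="utf-8") as f:
        f.write(f"** {preamble} **\n")
        f.write(f"Context: {path}\n")
        f.write(f"{question}\n")
        f.write(f"{answer}\n")
    return filename
```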