A lightweight set of programs to test different Large Language Models (LLMs) with Ollama.
This project uses a list of models in models.txt. You create a stop.bat to make sure no models are left in RAM, a get.bat to pull the models, and a run.bat to run them.
More info below
Many are here to see how well GPT-OSS runs on a standard computer. The results below were run on my hardware.
The table below emphasizes that the GPU is faster!
| Model | Execution Tool | Processor | Quantized Variant | Tokens/Second |
|---|---|---|---|---|
| gpt-oss:20b | Ollama | 100% CPU | MXFP4 | 7.73 |
| gpt-oss:20b | LM Studio | 100% GPU | MXFP4 | 23.32 |
| gpt-oss:120b | Ollama | 100% CPU | MXFP4 | 5.26 |
| gpt-oss:120b | LM Studio | 100% GPU | MXFP4 | 14.87 |
- https://ollama.com/library/gpt-oss:20b
- https://ollama.com/library/gpt-oss:120b
Ollama as used here runs from the command line. LM Studio can also be run from the command line, but that is an exercise for another day; the LM Studio results above were produced through its GUI.
Yes, with more work you can make a very nice table. An exercise left for the user.
I assume you have a place on your disk for your project.
Windows git: download the installer from https://git-scm.com/downloads/win
Linux git: use your package manager.
Example (Ubuntu): sudo apt install git -y
Example (Fedora): sudo dnf update; sudo dnf install git
```text
git clone https://github.com/steve100/local_llm_examples.git
cd local_llm_examples
```
If you have not done so already, please download the latest version at:
https://ollama.com/download
Open OllamaSetup and run the install.
It is very simple:
To install Ollama, download the appropriate installer for your operating system (Windows, macOS, or Linux) from the official Ollama website. For Windows, you can download the setup.exe file and run the installer. Once installed, Ollama can be started by running the ollama serve command in your terminal. After that, you can download and run language models using commands like ollama pull <model_name> and ollama run <model_name>.
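Before pulling models you can sanity-check from Python that the `ollama` CLI is installed. This is a small sketch, not part of the repo's scripts; it only assumes `ollama` ends up on your PATH after installation.

```python
# Sketch: verify the `ollama` CLI is installed before pulling models.
import shutil
import subprocess

def ollama_available() -> bool:
    """Return True if the `ollama` executable can be found on PATH."""
    return shutil.which("ollama") is not None

if __name__ == "__main__":
    if ollama_available():
        # `ollama --version` prints the installed version
        subprocess.run(["ollama", "--version"], check=False)
    else:
        print("ollama not found - install it from https://ollama.com/download")
```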
This video demonstrates how to install Ollama on Windows and run a language model: https://youtu.be/3W-trR0ROUY
https://www.udemy.com/course/meta-llama-3-demystified-beginner-to-pro-with-ollama/
https://www.datacamp.com/tutorial/qwen3-ollama
https://ollama.com/blog/gpt-oss https://ollama.com/library/gpt-oss
Tested under Windows 11 Pro. It should run under Linux if you change the .bat files to .sh files.
I used Python to create the files to run because it is cross-platform.
Mini PC: AMD Ryzen™ 7 8845HS, 8C/16T, 3.8 GHz to 5.1 GHz, 16 MB L3 cache, AMD Radeon™ 780M 12-core 2700 MHz
https://www.bee-link.com/products/beelink-ser8-8845hs
RAM: more RAM makes it more interesting. I have 128 GB.
NPU: I did not buy the NPU. Debatable use case; its only use for now is Windows 11 Copilot.
This project is designed to be simple. It uses Ollama and the command line.
If Ollama detects your GPU, that is great!
Otherwise it will run on the CPU.
input1: models.txt - modify it to list the models you would like to test
input2: run_models.py - modify this Python program to change your query
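A minimal sketch of what a run_models.py-style loop can look like. This is not the repo's exact code: the query string and the `#`-comment convention in models.txt are assumptions for illustration.

```python
# Hedged sketch of a run_models.py-style loop (not the repo's exact code).
# Assumption: models.txt lists one model name per line; '#' starts a comment.
QUERY = "Who was Ludwig van Beethoven. Use about 50 words"  # change your query here

def load_models(path="models.txt"):
    """Return model names, skipping blank lines and comment lines."""
    with open(path, encoding="utf-8") as f:
        return [ln.strip() for ln in f if ln.strip() and not ln.lstrip().startswith("#")]

def run_command(model, query=QUERY):
    """Build the `ollama run` command line for one model."""
    return f'ollama run {model} "{query}" --verbose'

if __name__ == "__main__":
    try:
        models = load_models()
    except FileNotFoundError:
        models = ["gpt-oss:20b"]  # fallback so the sketch runs standalone
    for m in models:
        print(run_command(m))
```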
Keep an eye on how much disk space the models will use. They are stored in the .ollama folder in your home directory, which is why you want a lot of fast disk storage.
Not your average Windows command (available from Git for Windows or WSL2): du -ch .ollama
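If you don't have `du` handy, the same check can be done portably from Python. A sketch, assuming Ollama's default model store at `~/.ollama`:

```python
# Cross-platform stand-in for `du -ch .ollama`: walk the model store and
# total the file sizes. Assumes Ollama's default store at ~/.ollama.
import os
from pathlib import Path

def dir_size_bytes(root) -> int:
    """Sum the sizes of all files under root (0 if root doesn't exist)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += (Path(dirpath) / name).stat().st_size
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total

if __name__ == "__main__":
    store = Path.home() / ".ollama"
    print(f"{store}: {dir_size_bytes(store) / 1e9:.1f} GB")
```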
Erase the *.bat files if you want to create your own.
-- The scripts write three (3) small .bat files:
   stop_models.py > stop.bat
   get-models.py > get.bat
   run_models.py > run.bat
-- stop.bat stops any of the models in models.txt
-- get.bat pulls the models from Ollama's repository
-- run.bat runs each model
-- stop.bat.output.txt shows the stopping of the models
-- get.bat.output.txt shows the pulling of the models
-- run.bat.output.txt shows the running of each model
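The idea behind the generator scripts can be sketched in one file. This is a hedged condensation, not the repo's actual stop_models.py / get-models.py / run_models.py: the model list is hardcoded here instead of being read from models.txt, and the query is the example used later in this README.

```python
# Hedged sketch: generate stop.bat / get.bat / run.bat for a model list.
# The repo's three scripts do this one file at a time; the commands are
# standard Ollama CLI (ollama stop / pull / run).
MODELS = ["gpt-oss:20b", "gpt-oss:120b"]  # normally read from models.txt
QUERY = "Who was Ludwig van Beethoven. Use about 50 words"

def write_bat(name, commands):
    """Write a small batch file, one command per line."""
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(commands) + "\n")

if __name__ == "__main__":
    write_bat("stop.bat", [f"ollama stop {m}" for m in MODELS])
    write_bat("get.bat", [f"ollama pull {m}" for m in MODELS])
    write_bat("run.bat", [f'ollama run {m} "{QUERY}" --verbose' for m in MODELS])
```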
P.S. run_models_fancy.py - another way to run Ollama commands. I found it interesting but not useful.
You can bring up Windows Task Manager and watch the Performance metrics.

Other interesting metrics to watch:

- The eval rate. It looks like this: `eval rate: 13.53 tokens/s`
- The memory used by Ollama and whether it used your GPU: `qwen3:32b 030ee887880f 45 GB 100% CPU 131072 4 minutes from now`
- We could have used `ollama serve` and REST calls. That is for another day.
- On my hardware LM Studio works better than Ollama.
- Ollama is very easy to set up on Linux and Windows. LM Studio, for now, is very easy to set up on Windows but difficult on Linux.
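The `ollama serve` + REST idea mentioned above can be sketched with only the Python standard library, using Ollama's documented /api/generate endpoint on the default local port 11434 (the server must already be running):

```python
# Hedged sketch: query a running `ollama serve` over its REST API using
# only the standard library. Endpoint and JSON shape follow Ollama's
# documented /api/generate; default local port is 11434.
import json
import urllib.request

def generate(model, prompt, host="http://localhost:11434"):
    """POST to /api/generate and return the full (non-streamed) response text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    try:
        print(generate("gpt-oss:20b", "Who was Ludwig van Beethoven. Use about 50 words"))
    except OSError as err:  # connection refused if `ollama serve` isn't running
        print(f"Could not reach the server - is `ollama serve` running? ({err})")
```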
Ideally you would want to run on the GPU.
AMD and Intel were "stuck" on gamers and gaming; they are only now making AI cards.
Most cards you will find are "consumer grade" cards, often called RDNA4. They often do not work with the ROCm drivers, although AMD is improving.
Use Vulkan drivers for now.
LM Studio will install them for you. Ollama will not, and it will not use the AMD graphics card under Windows 11.
Recommended: for Linux or Windows, use a recent version of Windows 11 or Fedora Linux. https://www.amd.com/en/products/graphics/desktops/radeon.html
A 1 TB disk makes everything much easier.
If you can afford it, buy:

- An Apple M4 Pro or better with at least 64 GB or 96 GB of RAM and a 512 GB or 1 TB disk.
- A Framework Desktop (Apple M4 Pro-class speeds): a mini PC or desktop with an AMD Ryzen™ AI Max 385 (soldered) or AMD Ryzen™ AI Max+ 395 (soldered). https://frame.work/desktop
I ran gpt-oss:20b after the other runs.
You may need a newer version of Ollama:

```text
ollama pull gpt-oss:20b
pulling manifest
Error: pull model manifest: 412:
The model you are attempting to pull requires a newer version of Ollama.

Please download the latest version at:
        https://ollama.com/download
```

Open OllamaSetup and run the install. (No surprises.)
Used 15.6/16.0 GB dedicated GPU memory
Used 45.8/55.9 GB shared GPU memory
Here is the Ollama gpt-oss:20b example:

```text
C:\Users\Steve\projects>ollama pull gpt-oss:20b
pulling manifest
pulling b112e727c6f1: 100% ▕██████████████▏  13 GB
pulling fa6710a93d78: 100% ▕██████████████▏ 7.2 KB
pulling f60356777647: 100% ▕██████████████▏  11 KB
pulling d8ba2f9a17b3: 100% ▕██████████████▏   18 B
pulling 55c108d8e936: 100% ▕██████████████▏  489 B
verifying sha256 digest
writing manifest
success
```
```text
C:\Users\Steve\projects>ollama run gpt-oss:20b "Who was Ludwig van Beethoven. Use about 50 words" --verbose
Thinking...
We need to provide a short answer, about 50 words, about Ludwig van Beethoven. We need to ensure
the word count is approximately 50 words. Let's craft something: "Ludwig van Beethoven (1770–1827)
was a German composer and pianist, pivotal in transition between Classical and Romantic eras.
Renowned for symphonies, piano sonatas, and operas, he composed despite deafness, producing works
like Ninth Symphony, Moonlight Sonata, and his later sonatas, influencing generations."
Count words: Let's count: Ludwig(1) van2 Beethoven3 (1770–1827)4 was5 a6 German7 composer8 and9
pianist,10 pivotal11 in12 transition13 between14 Classical15 and16 Romantic17 eras.18 Renowned19
for20 symphonies,21 piano22 sonatas,23 and24 operas,25 he26 composed27 despite28 deafness,29
producing30 works31 like32 Ninth33 Symphony,34 Moonlight35 Sonata,36 and37 his38 later39 sonatas,40
influencing41 generations42. That's 42 words. Need about 50. Add 8 words. Add "and his
groundbreaking string quartets that shaped future composers' approach." Count words: "and1 his2
groundbreaking3 string4 quartets5 that6 shaped7 future8 composers'9 approach10." 10 words. Total
becomes 42+10=52. Good. We need about 50 words, 52 is okay. Ensure not to exceed too many.
Provide answer.
...done thinking.

Ludwig van Beethoven (1770–1827) was a German composer and pianist, pivotal in the transition from
Classical to Romantic music. Renowned for symphonies, piano sonatas, and operas, he composed
despite deafness, producing masterpieces like the Ninth Symphony, Moonlight Sonata, and his
groundbreaking string quartets that shaped future composers' approach.

total duration:       1m14.8871917s
load duration:        14.4378225s
prompt eval count:    78 token(s)
prompt eval duration: 5.8681307s
prompt eval rate:     13.29 tokens/s
eval count:           422 token(s)
eval duration:        54.5806016s
eval rate:            7.73 tokens/s
```
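The reported eval rate is just eval count divided by eval duration, which you can verify from the numbers above:

```python
# Cross-check the verbose stats above: eval rate = eval count / eval duration.
eval_count = 422            # token(s), from the run above
eval_duration = 54.5806016  # seconds, from the run above
rate = eval_count / eval_duration
print(f"eval rate: {rate:.2f} tokens/s")  # matches the reported 7.73 tokens/s
```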
Why RAM and VRAM are important (unified or shared RAM is best):
```text
ollama ps
NAME         ID            SIZE   PROCESSOR  CONTEXT  UNTIL
gpt-oss:20b  aa4295ac10c3  50 GB  100% CPU   131072   3 minutes from now
```
Not as slow as I would have thought.
A slightly different answer:

> Ludwig van Beethoven (1770‑1827) was a German‑born composer and pianist, pivotal in the transition from Classical to Romantic music. Blind in later years, he created symphonies, sonatas, concertos, and string quartets, including the iconic Ninth Symphony. His works remain central to Western classical repertoire, with enduring influence worldwide, still celebrated today.

```text
total duration:       1m45.0750902s
load duration:        17.0121888s
prompt eval count:    78 token(s)
prompt eval duration: 9.3676108s
prompt eval rate:     8.33 tokens/s
eval count:           414 token(s)
eval duration:        1m18.6941755s
eval rate:            5.26 tokens/s
```
Seen with a low context window on gpt-oss:20b from an unofficial source:
Who was Ludwig van Beethoven. Use about 50 words

> Ludwig van Beethoven is not an actual historical figure; the name appears to conflate "Ludwig van Beethoven" with the real composer Ludwig van Beethoven, perhaps from a fictional work or a misprint. No reliable record exists of anyone by that exact name—any mention is likely mistaken or literary.