A lightweight set of programs to test different Large Language Models (LLMs) with Ollama.
This project uses a list of models in models.txt. You create a stop.bat to make sure no models are left in RAM, a get.bat to pull the models, and a run.bat to run them.
More info below
Many are here to see how well GPT-OSS runs on a standard computer. The results below were run on my hardware.
The table below emphasizes that the GPU is faster!
| Model | Execution Tool | Processor | Quantized Variant | Tokens/Second |
|---|---|---|---|---|
| gpt-oss:20b | Ollama | 100% CPU | MXFP4 | 7.73 |
| gpt-oss:20b | LM Studio | 100% GPU | MXFP4 | 23.32 |
| gpt-oss:120b | Ollama | 100% CPU | MXFP4 | 5.26 |
| gpt-oss:120b | LM Studio | 100% GPU | MXFP4 | 14.87 |
- https://ollama.com/library/gpt-oss:20b
- https://ollama.com/library/gpt-oss:120b
Ollama as used here runs from the command line. LM Studio can also be run from the command line, but that is an exercise for another day; the LM Studio results above were produced through its GUI.
Yes, with more work you can make a very nice table. An exercise left for the user.
I assume you have a place on your disk for your project.
Windows git: download the installer from https://git-scm.com/downloads/win
Linux git: use your package manager.
Example (Ubuntu): sudo apt install git -y
Example (Fedora): sudo dnf update; sudo dnf install git
```text
git clone https://github.com/steve100/local_llm_examples.git
cd local_llm_examples
```
If you have not done so already, please download the latest version at:
https://ollama.com/download
Open OllamaSetup and run the install.
It is very simple:
To install Ollama, download the appropriate installer for your operating system (Windows, macOS, or Linux) from the official Ollama website. For Windows, you can download the setup.exe file and run the installer. Once installed, Ollama can be started by running the ollama serve command in your terminal. After that, you can download and run language models using commands like ollama pull <model_name> and ollama run <model_name>.
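Before pulling models you can sanity-check from Python that the `ollama` CLI is installed. This is a small sketch, not part of the repo's scripts; it only assumes `ollama` ends up on your PATH after installation.

```python
# Sketch: verify the `ollama` CLI is installed before pulling models.
import shutil
import subprocess

def ollama_available() -> bool:
    """Return True if the `ollama` executable can be found on PATH."""
    return shutil.which("ollama") is not None

if __name__ == "__main__":
    if ollama_available():
        # `ollama --version` prints the installed version
        subprocess.run(["ollama", "--version"], check=False)
    else:
        print("ollama not found - install it from https://ollama.com/download")
```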
This video demonstrates how to install Ollama on Windows and run a language model: https://youtu.be/3W-trR0ROUY
https://www.udemy.com/course/meta-llama-3-demystified-beginner-to-pro-with-ollama/
https://www.datacamp.com/tutorial/qwen3-ollama
https://ollama.com/blog/gpt-oss https://ollama.com/library/gpt-oss
Tested under Windows 11 Pro. It should run under Linux if you change the .bat files to .sh files.
I used Python to create the files to run because it is cross-platform.
Mini PC: AMD Ryzen™ 7 8845HS, 8C/16T, 3.8 GHz to 5.1 GHz, 16 MB L3 cache, AMD Radeon™ 780M 12-core 2700 MHz
https://www.bee-link.com/products/beelink-ser8-8845hs
RAM: more RAM makes it more interesting. I have 128 GB.
NPU: I did not buy the NPU. Debatable use case; its only use for now is Windows 11 Copilot.
This project is designed to be simple. It uses Ollama and the command line.
If Ollama detects your GPU, that is great!
Otherwise it will run on the CPU.
input1: models.txt - modify it to list the models you would like to test
input2: run_models.py - modify this Python program to change your query
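A minimal sketch of what a run_models.py-style loop can look like. This is not the repo's exact code: the query string and the `#`-comment convention in models.txt are assumptions for illustration.

```python
# Hedged sketch of a run_models.py-style loop (not the repo's exact code).
# Assumption: models.txt lists one model name per line; '#' starts a comment.
QUERY = "Who was Ludwig van Beethoven. Use about 50 words"  # change your query here

def load_models(path="models.txt"):
    """Return model names, skipping blank lines and comment lines."""
    with open(path, encoding="utf-8") as f:
        return [ln.strip() for ln in f if ln.strip() and not ln.lstrip().startswith("#")]

def run_command(model, query=QUERY):
    """Build the `ollama run` command line for one model."""
    return f'ollama run {model} "{query}" --verbose'

if __name__ == "__main__":
    try:
        models = load_models()
    except FileNotFoundError:
        models = ["gpt-oss:20b"]  # fallback so the sketch runs standalone
    for m in models:
        print(run_command(m))
```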
Keep an eye on how much disk space the models will use. They are stored in the .ollama folder in your home directory, which is why you want a lot of fast disk storage.
Not your average Windows command (available from Git for Windows or WSL2): du -ch .ollama
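If you don't have `du` handy, the same check can be done portably from Python. A sketch, assuming Ollama's default model store at `~/.ollama`:

```python
# Cross-platform stand-in for `du -ch .ollama`: walk the model store and
# total the file sizes. Assumes Ollama's default store at ~/.ollama.
import os
from pathlib import Path

def dir_size_bytes(root) -> int:
    """Sum the sizes of all files under root (0 if root doesn't exist)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += (Path(dirpath) / name).stat().st_size
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total

if __name__ == "__main__":
    store = Path.home() / ".ollama"
    print(f"{store}: {dir_size_bytes(store) / 1e9:.1f} GB")
```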
Erase the *.bat files if you want to create your own.
-- The scripts write three (3) small .bat files:
   stop_models.py > stop.bat
   get-models.py > get.bat
   run_models.py > run.bat
-- stop.bat stops any of the models in models.txt
-- get.bat pulls the models from Ollama's repository
-- run.bat runs each model
-- stop.bat.output.txt shows the stopping of the models
-- get.bat.output.txt shows the pulling of the models
-- run.bat.output.txt shows the running of each model
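The idea behind the generator scripts can be sketched in one file. This is a hedged condensation, not the repo's actual stop_models.py / get-models.py / run_models.py: the model list is hardcoded here instead of being read from models.txt, and the query is the example used later in this README.

```python
# Hedged sketch: generate stop.bat / get.bat / run.bat for a model list.
# The repo's three scripts do this one file at a time; the commands are
# standard Ollama CLI (ollama stop / pull / run).
MODELS = ["gpt-oss:20b", "gpt-oss:120b"]  # normally read from models.txt
QUERY = "Who was Ludwig van Beethoven. Use about 50 words"

def write_bat(name, commands):
    """Write a small batch file, one command per line."""
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(commands) + "\n")

if __name__ == "__main__":
    write_bat("stop.bat", [f"ollama stop {m}" for m in MODELS])
    write_bat("get.bat", [f"ollama pull {m}" for m in MODELS])
    write_bat("run.bat", [f'ollama run {m} "{QUERY}" --verbose' for m in MODELS])
```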
P.S. run_models_fancy.py - another way to run Ollama commands. I found it interesting but not useful.
You can bring up Windows Task Manager and watch the Performance metrics.

Other interesting metrics to watch:

- The eval rate. It looks like this: `eval rate: 13.53 tokens/s`
- The memory used by Ollama and whether it used your GPU: `qwen3:32b 030ee887880f 45 GB 100% CPU 131072 4 minutes from now`
- We could have used `ollama serve` and REST calls. That is for another day.
- On my hardware LM Studio works better than Ollama.
- Ollama is very easy to set up on Linux and Windows. LM Studio, for now, is very easy to set up on Windows but difficult on Linux.
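The `ollama serve` + REST idea mentioned above can be sketched with only the Python standard library, using Ollama's documented /api/generate endpoint on the default local port 11434 (the server must already be running):

```python
# Hedged sketch: query a running `ollama serve` over its REST API using
# only the standard library. Endpoint and JSON shape follow Ollama's
# documented /api/generate; default local port is 11434.
import json
import urllib.request

def generate(model, prompt, host="http://localhost:11434"):
    """POST to /api/generate and return the full (non-streamed) response text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    try:
        print(generate("gpt-oss:20b", "Who was Ludwig van Beethoven. Use about 50 words"))
    except OSError as err:  # connection refused if `ollama serve` isn't running
        print(f"Could not reach the server - is `ollama serve` running? ({err})")
```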
Ideally you would want to run on the GPU.
AMD and Intel were "stuck" on gamers and gaming; they are only now making AI cards.
Most cards you will find are "consumer grade" cards, often called RDNA4. They often do not work with the ROCm drivers, although AMD is improving.
Use Vulkan drivers for now.
LM Studio will install them for you. Ollama will not, and it will not use the AMD graphics card under Windows 11.
Recommended: for Linux or Windows, use a recent version of Windows 11 or Fedora Linux. https://www.amd.com/en/products/graphics/desktops/radeon.html
A 1 TB disk makes everything much easier.
If you can afford it, buy:

- An Apple M4 Pro or better with at least 64 GB or 96 GB of RAM and a 512 GB or 1 TB disk.
- A Framework Desktop (Apple M4 Pro-class speeds): a mini PC or desktop with an AMD Ryzen™ AI Max 385 (soldered) or AMD Ryzen™ AI Max+ 395 (soldered). https://frame.work/desktop
I ran gpt-oss:20b after the other runs.
You may need a newer version of Ollama:

```text
ollama pull gpt-oss:20b
pulling manifest
Error: pull model manifest: 412:
The model you are attempting to pull requires a newer version of Ollama.

Please download the latest version at:
        https://ollama.com/download
```

Open OllamaSetup and run the install. (No surprises.)
Used 15.6/16.0 GB dedicated GPU memory
Used 45.8/55.9 GB shared GPU memory
Here is the Ollama gpt-oss:20b example:

```text
C:\Users\Steve\projects>ollama pull gpt-oss:20b
pulling manifest
pulling b112e727c6f1: 100% ▕██████████████▏  13 GB
pulling fa6710a93d78: 100% ▕██████████████▏ 7.2 KB
pulling f60356777647: 100% ▕██████████████▏  11 KB
pulling d8ba2f9a17b3: 100% ▕██████████████▏   18 B
pulling 55c108d8e936: 100% ▕██████████████▏  489 B
verifying sha256 digest
writing manifest
success
```
```text
C:\Users\Steve\projects>ollama run gpt-oss:20b "Who was Ludwig van Beethoven. Use about 50 words" --verbose
Thinking...
We need to provide a short answer, about 50 words, about Ludwig van Beethoven. We need to ensure
the word count is approximately 50 words. Let's craft something: "Ludwig van Beethoven (1770–1827)
was a German composer and pianist, pivotal in transition between Classical and Romantic eras.
Renowned for symphonies, piano sonatas, and operas, he composed despite deafness, producing works
like Ninth Symphony, Moonlight Sonata, and his later sonatas, influencing generations."
Count words: Let's count: Ludwig(1) van2 Beethoven3 (1770–1827)4 was5 a6 German7 composer8 and9
pianist,10 pivotal11 in12 transition13 between14 Classical15 and16 Romantic17 eras.18 Renowned19
for20 symphonies,21 piano22 sonatas,23 and24 operas,25 he26 composed27 despite28 deafness,29
producing30 works31 like32 Ninth33 Symphony,34 Moonlight35 Sonata,36 and37 his38 later39 sonatas,40
influencing41 generations42. That's 42 words. Need about 50. Add 8 words. Add "and his
groundbreaking string quartets that shaped future composers' approach." Count words: "and1 his2
groundbreaking3 string4 quartets5 that6 shaped7 future8 composers'9 approach10." 10 words. Total
becomes 42+10=52. Good. We need about 50 words, 52 is okay. Ensure not to exceed too many.
Provide answer.
...done thinking.

Ludwig van Beethoven (1770–1827) was a German composer and pianist, pivotal in the transition from
Classical to Romantic music. Renowned for symphonies, piano sonatas, and operas, he composed
despite deafness, producing masterpieces like the Ninth Symphony, Moonlight Sonata, and his
groundbreaking string quartets that shaped future composers' approach.

total duration:       1m14.8871917s
load duration:        14.4378225s
prompt eval count:    78 token(s)
prompt eval duration: 5.8681307s
prompt eval rate:     13.29 tokens/s
eval count:           422 token(s)
eval duration:        54.5806016s
eval rate:            7.73 tokens/s
```
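The reported eval rate is just eval count divided by eval duration, which you can verify from the numbers above:

```python
# Cross-check the verbose stats above: eval rate = eval count / eval duration.
eval_count = 422            # token(s), from the run above
eval_duration = 54.5806016  # seconds, from the run above
rate = eval_count / eval_duration
print(f"eval rate: {rate:.2f} tokens/s")  # matches the reported 7.73 tokens/s
```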
Why RAM and VRAM are important (unified or shared RAM is best):
```text
ollama ps
NAME         ID            SIZE   PROCESSOR  CONTEXT  UNTIL
gpt-oss:20b  aa4295ac10c3  50 GB  100% CPU   131072   3 minutes from now
```
Not as slow as I would have thought.
A slightly different answer:

> Ludwig van Beethoven (1770‑1827) was a German‑born composer and pianist, pivotal in the transition from Classical to Romantic music. Blind in later years, he created symphonies, sonatas, concertos, and string quartets, including the iconic Ninth Symphony. His works remain central to Western classical repertoire, with enduring influence worldwide, still celebrated today.

```text
total duration:       1m45.0750902s
load duration:        17.0121888s
prompt eval count:    78 token(s)
prompt eval duration: 9.3676108s
prompt eval rate:     8.33 tokens/s
eval count:           414 token(s)
eval duration:        1m18.6941755s
eval rate:            5.26 tokens/s
```
Seen with a low context window on gpt-oss:20b from an unofficial source:
Who was Ludwig van Beethoven. Use about 50 words

> Ludwig van Beethoven is not an actual historical figure; the name appears to conflate "Ludwig van Beethoven" with the real composer Ludwig van Beethoven, perhaps from a fictional work or a misprint. No reliable record exists of anyone by that exact name—any mention is likely mistaken or literary.