
An easy way to try and demonstrate various LLMs with Ollama. Works with Windows and Linux; change the .bat files to .sh files for Linux.

local-llm-example

A lightweight set of programs to test different Large Language Models with Ollama.

This project uses a list of models in models.txt. You create stop.bat to make sure no models are in RAM, get.bat to pull the models, and run.bat to run the models.

More info below

GPT-OSS Results

Many readers are here to see how well GPT-OSS runs on a standard computer. These results were produced on my hardware (described below).

Table emphasizing that the GPU is faster!

| Model        | Execution Tool | Processor | Quantized Variant | Tokens/Second |
|--------------|----------------|-----------|-------------------|---------------|
| gpt-oss:20b  | Ollama         | 100% CPU  | MXFP4             | 7.73          |
| gpt-oss:20b  | LM Studio      | 100% GPU  | MXFP4             | 23.32         |
| gpt-oss:120b | Ollama         | 100% CPU  | MXFP4             | 5.26          |
| gpt-oss:120b | LM Studio      | 100% GPU  | MXFP4             | 14.87         |

Links for information about the models from Ollama.

https://ollama.com/library/gpt-oss:20b
https://ollama.com/library/gpt-oss:120b

More Information

Ollama, as used here, runs from the command line. LM Studio can also be run from the command line, but that is an exercise for another day; the LM Studio results above were produced through its GUI.

Yes, with more work you can make a very nice table. An exercise left for the reader.

Installation of Ollama and this project

I assume you have a place on your disk for this project.

Installing git on your system

Windows: download the installer from https://git-scm.com/downloads/win

Linux: use your package manager.
Ubuntu example: sudo apt install git -y
Fedora example: sudo dnf update; sudo dnf install git

Using Git to download the project

git clone https://github.com/steve100/local_llm_examples.git
cd local_llm_examples

If you have not done so already, please download the latest version at:

    https://ollama.com/download

Open OllamaSetup and run the install. It is very simple.

More information on installation of Ollama (from Google):

To install Ollama, download the appropriate installer for your operating system (Windows, macOS, or Linux) from the official Ollama website. For Windows, you can download the setup.exe file and run the installer. Once installed, Ollama can be started by running the ollama serve command in your terminal. After that, you can download and run language models using commands like ollama pull <model_name> and ollama run <model_name>.

This video demonstrates how to install Ollama on Windows and run a language model: https://youtu.be/3W-trR0ROUY

Udemy course

https://www.udemy.com/course/meta-llama-3-demystified-beginner-to-pro-with-ollama/

How to run Qwen - qwen has a /think or /nothink option

https://www.datacamp.com/tutorial/qwen3-ollama

Ollama discussing the new gpt-oss

https://ollama.com/blog/gpt-oss
https://ollama.com/library/gpt-oss

Testing these programs

Tested under Windows 11 Pro. It should run under Linux; change the .bat files to .sh files.

I used Python to create the files to run because it is cross-platform.

My Hardware to run models

Mini PC: AMD Ryzen™ 7 8845HS, 8C/16T, 3.8-5.1 GHz, 16 MB L3 cache, with AMD Radeon™ 780M graphics (12 cores, 2700 MHz)
https://www.bee-link.com/products/beelink-ser8-8845hs

RAM: more RAM makes it more interesting; I have 128 GB.
NPU: I did not buy the NPU. Debatable use case; its only use for now is Copilot on Windows 11.

Graphics Cards

This project is designed to be simple. It uses Ollama and the command line.

If Ollama detects your GPU, that is great!
Otherwise it will run on the CPU.

Usage of these programs

input 1: models.txt - modify it to list the models you would like to test

input 2: run_models.py - modify the Python program to change your query
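Since run_models.py itself is not reproduced here, the following is only a minimal sketch of what such a script might look like; the function names, the one-model-per-line models.txt format, and the default query are assumptions, not the repo's actual code.

```python
import subprocess

# Edit this string to change your query (an assumption about how run_models.py works).
QUERY = "Who was Ludwig van Beethoven. Use about 50 words"

def parse_models(text):
    """Parse models.txt content: one model name per line; blanks and # comments skipped."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]

def build_command(model, query=QUERY):
    """Build the command list: ollama run <model> "<query>" --verbose.

    --verbose makes Ollama print the eval rate (tokens/s) after the answer."""
    return ["ollama", "run", model, query, "--verbose"]

def run_all(models_file="models.txt"):
    """Run the query against every model listed in models.txt."""
    with open(models_file) as f:
        for model in parse_models(f.read()):
            subprocess.run(build_command(model), check=False)
```

Because it shells out through `subprocess`, the same script works unchanged on Windows and Linux, which matches the cross-platform reasoning above.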

Keep an eye on how much disk space the models will use. They are stored in the .ollama folder in your home directory; this is why you want a lot of fast disk storage.

Not your average Windows command (available from Git for Windows or WSL2):

du -ch .ollama
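If du is not available, the same check can be done cross-platform in pure Python. This is a sketch, not part of the repo; it undercounts slightly versus du because it ignores filesystem block rounding.

```python
from pathlib import Path

def dir_size_bytes(root):
    """Total size of all regular files under root (roughly what `du` reports)."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())

def human(n):
    """Format a byte count the way `du -h` does (powers of 1024)."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n}B" if unit == "B" else f"{n:.1f}{unit}"
        n /= 1024

# Example usage: report the size of the Ollama model store in your home directory.
# print(human(dir_size_bytes(Path.home() / ".ollama")))
```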

Erase the *.bat files if you want to create your own.

-- The script stop_run_bat will write three (3) small .bat files:

stop_models.py > stop.bat
get-models.py > get.bat
run_models.py > run.bat

-- stop.bat stops any of the models in models.txt
-- get.bat pulls the models from Ollama's repository
-- run.bat runs each model
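As a rough illustration of what the generator scripts might emit, here is a hedged sketch; the exact lines written by the real scripts are an assumption, though `ollama stop`, `ollama pull`, and `ollama run` are all real Ollama subcommands.

```python
def stop_lines(models):
    """Lines for stop.bat: unload each model from RAM."""
    return [f"ollama stop {m}" for m in models]

def get_lines(models):
    """Lines for get.bat: pull each model from Ollama's repository."""
    return [f"ollama pull {m}" for m in models]

def run_lines(models, query):
    """Lines for run.bat: run each model; --verbose prints the eval rate."""
    return [f'ollama run {m} "{query}" --verbose' for m in models]

def write_script(path, lines):
    """Write the command lines to a file.

    On Linux, pass a .sh path instead of .bat (and add a #!/bin/sh first line)."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```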

Output - all output is to standard out

stop.bat.output.txt - shows the stopping of the models
get.bat.output.txt - shows the pulling of the models
run.bat - runs each model

P.S. run_models_fancy.py is another way to run Ollama commands. I found it interesting but not useful.

Things to watch for while running

You can bring up the Windows Task Manager and watch the Performance metrics.

Other Interesting metrics to watch:

  1. The eval rate. It looks like this: eval rate: 13.53 tokens/s

  2. The memory used by Ollama, and whether your GPU was used. Example ollama ps row: qwen3:32b 030ee887880f 45 GB 100% CPU 131072 4 minutes from now

  3. We could have used ollama serve and REST calls. That is for another day.

  4. On my hardware, LM Studio works better than Ollama.

  5. Ollama is very easy to set up on Linux and Windows. LM Studio, for now, is very easy to set up on Windows but difficult on Linux.
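The serve-and-REST route mentioned above can be sketched like this: with `ollama serve` running, a single POST to the default local endpoint returns the answer plus the same token counts that `--verbose` prints. This is a hedged sketch, not part of the repo; the endpoint and field names follow Ollama's documented /api/generate interface.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default `ollama serve` endpoint

def build_payload(model, prompt):
    """Request body for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, url=OLLAMA_URL):
    """POST the prompt and return (answer_text, tokens_per_second)."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = request.Request(url, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # eval_count / eval_duration (nanoseconds) yields the same tokens/s as --verbose.
    rate = body["eval_count"] / (body["eval_duration"] / 1e9)
    return body["response"], rate

# Example usage (requires a running `ollama serve` with the model pulled):
# answer, rate = generate("gpt-oss:20b", "Who was Ludwig van Beethoven. Use about 50 words")
```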

More on AMD graphics cards (GPUs)

Ideally you would want to run on the GPU.

AMD and Intel were "stuck" on gamers and gaming; only now are they making AI cards.

Most cards you will find are "consumer grade" cards, often called RDNA4. They often do not work with the ROCm drivers, although AMD is improving.

Use Vulkan drivers for now.

LM Studio will install them for you; Ollama will not, nor will it use the AMD graphics card under Windows 11.

Recommended: for Linux or Windows, use a recent version of Windows 11 or Fedora Linux. https://www.amd.com/en/products/graphics/desktops/radeon.html

Other hardware

A 1 TB disk makes everything much easier.

If you can afford it buy:

  1. An Apple M4 Pro or better, with at least 64 GB or 96 GB of RAM and a 512 GB or 1 TB disk.

  2. A Framework Desktop (Apple M4 Pro speeds): a Mini PC or desktop with an AMD Ryzen™ AI Max 385 (soldered) or AMD Ryzen™ AI Max+ 395 (soldered). https://frame.work/desktop

Addendum

I ran gpt-oss:20b after the other runs.

You may need a newer version of ollama

ollama pull gpt-oss:20b
pulling manifest
Error: pull model manifest: 412:

The model you are attempting to pull requires a newer version of Ollama.

Please download the latest version at:

    https://ollama.com/download

Open OllamaSetup and run the install. (No surprises.)

GPT-OSS Runs

Ollama

gpt-oss:20b -- Ollama running on CPU: 7.73 tokens/s

gpt-oss:120b -- Ollama running on CPU: 5.26 tokens/s

LM Studio

gpt-oss:20b -- 100% GPU memory: 23.32 tokens/s

Used 15.6/16.0  Dedicated GPU memory

gpt-oss:120b -- 100% GPU, memory split: 14.87 tok/sec

Used 15.6/15    Dedicated GPU memory
Used 45.8/55.9  Shared    GPU memory

Here is the Ollama gpt-oss:20b example:

C:\Users\Steve\projects>ollama pull gpt-oss:20b
pulling manifest
pulling b112e727c6f1: 100% ▕██████████████████████████████████████████████████████████▏  13 GB
pulling fa6710a93d78: 100% ▕██████████████████████████████████████████████████████████▏ 7.2 KB
pulling f60356777647: 100% ▕██████████████████████████████████████████████████████████▏  11 KB
pulling d8ba2f9a17b3: 100% ▕██████████████████████████████████████████████████████████▏   18 B
pulling 55c108d8e936: 100% ▕██████████████████████████████████████████████████████████▏  489 B
verifying sha256 digest
writing manifest
success

C:\Users\Steve\projects>ollama run gpt-oss:20b "Who was Ludwig van Beethoven. Use about 50 words" --verbose

Thinking... We need to provide a short answer, about 50 words, about Ludwig van Beethoven. We need to ensure the word count is approximately 50 words. Let's craft something: "Ludwig van Beethoven (1770–1827) was a German composer and pianist, pivotal in transition between Classical and Romantic eras. Renowned for symphonies, piano sonatas, and operas, he composed despite deafness, producing works like Ninth Symphony, Moonlight Sonata, and his later sonatas, influencing generations." Count words: Let's count: Ludwig(1) van2 Beethoven3 (1770–1827)4 was5 a6 German7 composer8 and9 pianist,10 pivotal11 in12 transition13 between14 Classical15 and16 Romantic17 eras.18 Renowned19 for20 symphonies,21 piano22 sonatas,23 and24 operas,25 he26 composed27 despite28 deafness,29 producing30 works31 like32 Ninth33 Symphony,34 Moonlight35 Sonata,36 and37 his38 later39 sonatas,40 influencing41 generations42. That's 42 words. Need about 50. Add 8 words. Add "and his groundbreaking string quartets that shaped future composers' approach." Let's add: and3? Let's add phrase: "and his groundbreaking string quartets that shaped future composers' approach." Count words: "and1 his2 groundbreaking3 string4 quartets5 that6 shaped7 future8 composers'9 approach10." 10 words. Total becomes 42+10=52. Good. We need about 50 words, 52 is okay. Ensure not to exceed too many. Provide answer. ...done thinking.

Ludwig van Beethoven (1770–1827) was a German composer and pianist, pivotal in the transition from Classical to Romantic music. Renowned for symphonies, piano sonatas, and operas, he composed despite deafness, producing masterpieces like the Ninth Symphony, Moonlight Sonata, and his groundbreaking string quartets that shaped future composers’ approach.

total duration:       1m14.8871917s
load duration:        14.4378225s
prompt eval count:    78 token(s)
prompt eval duration: 5.8681307s
prompt eval rate:     13.29 tokens/s
eval count:           422 token(s)
eval duration:        54.5806016s
eval rate:            7.73 tokens/s
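If you want to collect these figures programmatically, a small hypothetical helper (not part of this repo) can parse the `--verbose` summary lines into a dict:

```python
import re

def parse_verbose_stats(text):
    """Parse `ollama run --verbose` summary lines like 'eval rate: 7.73 tokens/s'."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([a-z ]+):\s+(.*)", line)
        if m:
            stats[m.group(1).strip()] = m.group(2).strip()
    return stats

def tokens_per_second(stats):
    """Pull the numeric eval rate out of the parsed stats."""
    return float(stats["eval rate"].split()[0])
```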

Why RAM and vRAM are important (unified or shared RAM is best)

ollama ps

NAME          ID            SIZE   PROCESSOR  CONTEXT  UNTIL
gpt-oss:20b   aa4295ac10c3  50 GB  100% CPU   131072   3 minutes from now

gpt-oss:120b

Not as slow as I would have thought.

slightly different answer: Ludwig van Beethoven (1770‑1827) was a German‑born composer and pianist, pivotal in the transition from Classical to Romantic music. Blind in later years, he created symphonies, sonatas, concertos, and string quartets, including the iconic Ninth Symphony. His works remain central to Western classical repertoire, with enduring influence worldwide, still celebrated today.

total duration:       1m45.0750902s
load duration:        17.0121888s
prompt eval count:    78 token(s)
prompt eval duration: 9.3676108s
prompt eval rate:     8.33 tokens/s
eval count:           414 token(s)
eval duration:        1m18.6941755s
eval rate:            5.26 tokens/s

Example of a hallucination

Seen with a low context window on gpt-oss:20b from an unofficial source.

Who was Ludwig van Beethoven. Use about 50 words

Ludwig van Beethoven is not an actual historical figure; the name appears to conflate “Ludwig van Beethoven” with the real composer Ludwig van Beethoven, perhaps from a fictional work or a misprint. No reliable record exists of anyone by that exact name—any mention is likely mistaken or literary.
