Merged
Changes from 1 commit
53 commits
8498429
`main`/`server`: rename to `llama` / `llama-server` for consistency w…
Jun 6, 2024
f298cc6
server: update refs -> llama-server
Jun 6, 2024
f5f19a2
server: simplify nix package
Jun 6, 2024
8b7c734
main: update refs -> llama
Jun 6, 2024
9a03341
main/server: fix targets
Jun 6, 2024
8695bae
update more names
Jun 6, 2024
a0a7f2b
Update build.yml
Jun 6, 2024
fbd8313
Merge remote-tracking branch 'origin/master' into bins
Jun 6, 2024
99df4cc
rm accidentally checked in bins
Jun 7, 2024
7fbe600
update straggling refs
Jun 7, 2024
af8f016
Update .gitignore
Jun 7, 2024
0dba582
Update server-llm.sh
Jun 7, 2024
fe93cc9
Merge remote-tracking branch 'origin/master' into bins
Jun 8, 2024
23d0df5
main: target name -> llama-cli
Jun 8, 2024
ab5efbb
Prefix all example bins w/ llama-
Jun 8, 2024
78bca8c
fix main refs
Jun 8, 2024
10650b6
rename {main->llama}-cmake-pkg binary
Jun 8, 2024
81222f0
prefix more cmake targets w/ llama-
Jun 8, 2024
b648243
add/fix gbnf-validator subfolder to cmake
Jun 8, 2024
eef922e
sort cmake example subdirs
Jun 8, 2024
b0eb3b8
rm bin files
Jun 8, 2024
efaa441
fix llama-lookup-* Makefile rules
Jun 8, 2024
78eae7f
gitignore /llama-*
Jun 8, 2024
347f308
rename Dockerfiles
Jun 8, 2024
5265c15
rename llama|main -> llama-cli; consistent RPM bin prefixes
Jun 10, 2024
daeaeb1
Merge remote-tracking branch 'origin/master' into bins
Jun 10, 2024
0bb2a3f
fix some missing -cli suffixes
Jun 10, 2024
0fcf2c3
rename dockerfile w/ llama-cli
Jun 10, 2024
1cc6514
rename(make): llama-baby-llama
Jun 10, 2024
051633e
update dockerfile refs
Jun 10, 2024
b8cb44e
more llama-cli(.exe)
Jun 10, 2024
4881a94
fix test-eval-callback
Jun 10, 2024
b843639
rename: llama-cli-cmake-pkg(.exe)
Jun 10, 2024
f9cfd04
address gbnf-validator unused fread warning (switched to C++ / ifstream)
Jun 10, 2024
0be5f39
add two missing llama- prefixes
Jun 10, 2024
e7e0373
Updating docs for eval-callback binary to use new `llama-` prefix.
HanClinto Jun 10, 2024
2fd66b2
Updating a few lingering doc references for rename of main to llama-cli
HanClinto Jun 10, 2024
72660c3
Updating `run-with-preset.py` to use new binary names.
HanClinto Jun 10, 2024
70de0de
Updating documentation references for lookup-merge and export-lora
HanClinto Jun 10, 2024
82df7f9
Merge pull request #1 from HanClinto/bins-rename-nits
ochafik Jun 10, 2024
1f5ec2c
Updating two small `main` references missed earlier in the finetune d…
HanClinto Jun 10, 2024
8cf8c12
Update apps.nix
Jun 10, 2024
2a9c4cd
Merge remote-tracking branch 'origin/master' into bins
Jun 11, 2024
166397f
update grammar/README.md w/ new llama-* names
Jun 11, 2024
ee3a086
Merge pull request #2 from HanClinto/bins-nits-2
ochafik Jun 11, 2024
e474ef1
update llama-rpc-server bin name + doc
Jun 11, 2024
be66f9e
Revert "update llama-rpc-server bin name + doc"
Jun 12, 2024
ceb2859
Merge remote-tracking branch 'origin/master' into bins
Jun 12, 2024
08da184
add hot topic notice to README.md
Jun 12, 2024
ecdde74
Update README.md
ochafik Jun 12, 2024
1910241
Update README.md
ochafik Jun 12, 2024
48e5009
rename gguf-split & quantize bins refs in **/tests.sh
Jun 12, 2024
73d4a4a
Merge branch 'bins' of https://github.com/ochafik/llama.cpp into bins
Jun 12, 2024
server: update refs -> llama-server
gitignore llama-server
Olivier Chafik committed Jun 6, 2024
commit f298cc63d2cec6cfa72446b8e7f4ec5448f3fd54
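In practical terms, this commit renames the build target and the produced binary together, so both the `make` invocation and the command line change. A minimal before/after sketch based on the README changes below (the model path is illustrative):

```bash
# Before this commit:
make server
./server -m models/7B/ggml-model.gguf -c 2048

# After this commit:
make llama-server
./llama-server -m models/7B/ggml-model.gguf -c 2048
```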
6 changes: 3 additions & 3 deletions .devops/server-cuda.Dockerfile
@@ -25,13 +25,13 @@ ENV LLAMA_CUDA=1
# Enable cURL
ENV LLAMA_CURL=1

RUN make -j$(nproc) server
RUN make -j$(nproc) llama-server

FROM ${BASE_CUDA_RUN_CONTAINER} as runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev libgomp1

COPY --from=build /app/server /server
COPY --from=build /app/llama-server /llama-server

ENTRYPOINT [ "/server" ]
ENTRYPOINT [ "/llama-server" ]
4 changes: 2 additions & 2 deletions .devops/server-intel.Dockerfile
@@ -38,8 +38,8 @@ RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRO
RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

COPY --from=build /app/build/bin/server /server
COPY --from=build /app/build/bin/llama-server /llama-server

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/server" ]
ENTRYPOINT [ "/llama-server" ]
4 changes: 2 additions & 2 deletions .devops/server-rocm.Dockerfile
@@ -45,6 +45,6 @@ ENV LLAMA_CURL=1
RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev

RUN make -j$(nproc)
RUN make -j$(nproc) llama-server

ENTRYPOINT [ "/app/server" ]
ENTRYPOINT [ "/app/llama-server" ]
4 changes: 2 additions & 2 deletions .devops/server-vulkan.Dockerfile
@@ -23,9 +23,9 @@ RUN cmake -B build -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \

# Clean up
WORKDIR /
RUN cp /app/build/bin/server /server && \
RUN cp /app/build/bin/llama-server /llama-server && \
rm -rf /app

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/server" ]
ENTRYPOINT [ "/llama-server" ]
6 changes: 3 additions & 3 deletions .devops/server.Dockerfile
@@ -11,15 +11,15 @@ COPY . .

ENV LLAMA_CURL=1

RUN make -j$(nproc) server
RUN make -j$(nproc) llama-server

FROM ubuntu:$UBUNTU_VERSION as runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev libgomp1

COPY --from=build /app/server /server
COPY --from=build /app/llama-server /llama-server

ENV LC_ALL=C.utf8

ENTRYPOINT [ "/server" ]
ENTRYPOINT [ "/llama-server" ]
2 changes: 1 addition & 1 deletion .devops/tools.sh
@@ -26,7 +26,7 @@ elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
fi
done
elif [[ "$arg1" == '--server' || "$arg1" == '-s' ]]; then
./server "$@"
./llama-server "$@"
else
echo "Unknown command: $arg1"
echo "Available commands: "
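`tools.sh` dispatches on its first argument, so its `--server` / `-s` mode now forwards the remaining arguments to the renamed binary. A small sketch of what that dispatch amounts to (paths are illustrative):

```bash
# Invoking the helper in server mode...
./.devops/tools.sh --server -m models/7B/ggml-model.gguf -c 2048

# ...is now equivalent to calling the renamed binary directly:
./llama-server -m models/7B/ggml-model.gguf -c 2048
```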
2 changes: 1 addition & 1 deletion .gitignore
@@ -76,7 +76,7 @@ models-mnt
/quantize-stats
/result
/save-load-state
/server
/llama-server
/simple
/batched
/batched-bench
2 changes: 1 addition & 1 deletion examples/json-schema-pydantic-example.py
@@ -1,5 +1,5 @@
# Usage:
#! ./server -m some-model.gguf &
#! ./llama-server -m some-model.gguf &
#! pip install pydantic
#! python json-schema-pydantic-example.py

2 changes: 1 addition & 1 deletion examples/server-llama2-13B.sh
@@ -16,7 +16,7 @@ GEN_OPTIONS="${GEN_OPTIONS:---ctx_size 4096 --batch-size 1024}"


# shellcheck disable=SC2086 # Intended splitting of GEN_OPTIONS
./server $GEN_OPTIONS \
./llama-server $GEN_OPTIONS \
--model "$MODEL" \
--threads "$N_THREAD" \
--rope-freq-scale 1.0 \
22 changes: 11 additions & 11 deletions examples/server/README.md
@@ -80,41 +80,41 @@ The project is under active development, and we are [looking for feedback and co

## Build

`server` is built alongside everything else from the root of the project
`llama-server` is built alongside everything else from the root of the project

- Using `make`:

```bash
make server
make llama-server
```

- Using `CMake`:

```bash
cmake -B build
cmake --build build --config Release -t server
cmake --build build --config Release -t llama-server
```

Binary is at `./build/bin/server`
Binary is at `./build/bin/llama-server`

## Build with SSL

`server` can also be built with SSL support using OpenSSL 3
`llama-server` can also be built with SSL support using OpenSSL 3

- Using `make`:

```bash
# NOTE: For non-system openssl, use the following:
# CXXFLAGS="-I /path/to/openssl/include"
# LDFLAGS="-L /path/to/openssl/lib"
make LLAMA_SERVER_SSL=true server
make LLAMA_SERVER_SSL=true llama-server
```

- Using `CMake`:

```bash
cmake -B build -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -t server
cmake --build build --config Release -t llama-server
```

## Quick Start
@@ -124,13 +124,13 @@ To get started right away, run the following command, making sure to use the cor
### Unix-based systems (Linux, macOS, etc.)

```bash
./server -m models/7B/ggml-model.gguf -c 2048
./llama-server -m models/7B/ggml-model.gguf -c 2048
```

### Windows

```powershell
server.exe -m models\7B\ggml-model.gguf -c 2048
llama-server.exe -m models\7B\ggml-model.gguf -c 2048
```

The above command will start a server that by default listens on `127.0.0.1:8080`.
@@ -629,11 +629,11 @@ bash chat.sh

### OAI-like API

The HTTP `server` supports an OAI-like API: https://github.com/openai/openai-openapi
The HTTP `llama-server` supports an OAI-like API: https://github.com/openai/openai-openapi

### API errors

`server` returns errors in the same format as OAI: https://github.com/openai/openai-openapi
`llama-server` returns errors in the same format as OAI: https://github.com/openai/openai-openapi

Example of an error:

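Since the README above notes that `llama-server` exposes an OAI-like API, a quick way to smoke-test the renamed binary is a chat-completions request; the endpoint and payload below follow the OpenAI convention the README references and are meant as a sketch, not an exhaustive schema (model path illustrative):

```bash
# Start the renamed server, then query its OpenAI-compatible endpoint
./llama-server -m models/7B/ggml-model.gguf -c 2048 &

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one word."}], "temperature": 0.7}'
```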
2 changes: 1 addition & 1 deletion examples/server/bench/README.md
@@ -99,7 +99,7 @@ The `bench.py` script does several steps:
It aims to be used in the CI, but you can run it manually:

```shell
LLAMA_SERVER_BIN_PATH=../../../cmake-build-release/bin/server python bench.py \
LLAMA_SERVER_BIN_PATH=../../../cmake-build-release/bin/llama-server python bench.py \
--runner-label local \
--name local \
--branch `git rev-parse --abbrev-ref HEAD` \
2 changes: 1 addition & 1 deletion examples/server/bench/bench.py
@@ -245,7 +245,7 @@ def start_server(args):

def start_server_background(args):
# Start the server
server_path = '../../../build/bin/server'
server_path = '../../../build/bin/llama-server'
if 'LLAMA_SERVER_BIN_PATH' in os.environ:
server_path = os.environ['LLAMA_SERVER_BIN_PATH']
server_args = [
4 changes: 2 additions & 2 deletions examples/server/public_simplechat/readme.md
@@ -44,12 +44,12 @@ http module.

### running using examples/server

bin/server -m path/model.gguf --path ../examples/server/public_simplechat [--port PORT]
./llama-server -m path/model.gguf --path examples/server/public_simplechat [--port PORT]

### running using python3's server module

first run examples/server
* bin/server -m path/model.gguf
* ./llama-server -m path/model.gguf

next run this web front end in examples/server/public_simplechat
* cd ../examples/server/public_simplechat
2 changes: 1 addition & 1 deletion examples/server/tests/README.md
@@ -40,7 +40,7 @@ It's possible to override some scenario steps values with environment variables:
| variable | description |
|--------------------------|------------------------------------------------------------------------------------------------|
| `PORT` | `context.server_port` to set the listening port of the server during scenario, default: `8080` |
| `LLAMA_SERVER_BIN_PATH` | to change the server binary path, default: `../../../build/bin/server` |
| `LLAMA_SERVER_BIN_PATH` | to change the server binary path, default: `../../../build/bin/llama-server` |
| `DEBUG` | "ON" to enable steps and server verbose mode `--verbose` |
| `SERVER_LOG_FORMAT_JSON` | if set switch server logs to json format |
| `N_GPU_LAYERS` | number of model layers to offload to VRAM `-ngl --n-gpu-layers` |
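For the server test suite, only the default binary path changes; runs that set `LLAMA_SERVER_BIN_PATH` explicitly just need the new name. A sketch of overriding it (the `tests.sh` entry point and build directory are assumed from the surrounding repository layout):

```bash
cd examples/server/tests
# Point the harness at the renamed binary and enable verbose steps
LLAMA_SERVER_BIN_PATH=../../../build/bin/llama-server DEBUG=ON ./tests.sh
```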
4 changes: 2 additions & 2 deletions examples/server/tests/features/steps/steps.py
@@ -1272,9 +1272,9 @@ def context_text(context):

def start_server_background(context):
if os.name == 'nt':
context.server_path = '../../../build/bin/Release/server.exe'
context.server_path = '../../../build/bin/Release/llama-server.exe'
else:
context.server_path = '../../../build/bin/server'
context.server_path = '../../../build/bin/llama-server'
if 'LLAMA_SERVER_BIN_PATH' in os.environ:
context.server_path = os.environ['LLAMA_SERVER_BIN_PATH']
server_listen_addr = context.server_fqdn
2 changes: 1 addition & 1 deletion grammars/README.md
@@ -1,6 +1,6 @@
# GBNF Guide

GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/server`.
GBNF (GGML BNF) is a format for defining [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) to constrain model outputs in `llama.cpp`. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in `examples/main` and `examples/llama-server`.

## Background

Expand Down