Merged
34 commits
76290d9
initial porting of previous LLG patch
mmoskal Jan 25, 2025
f19655c
update for new APIs
mmoskal Jan 25, 2025
f4dc4b8
build: integrate llguidance as an external project
mmoskal Jan 25, 2025
afb6cac
use '%llguidance' as marker to enable llg lark syntax
mmoskal Jan 26, 2025
b5399d4
add some docs
mmoskal Jan 26, 2025
adc4aed
clarify docs
mmoskal Jan 26, 2025
2a92bfb
code style fixes
mmoskal Jan 26, 2025
8cb12d4
remove llguidance.h from .gitignore
mmoskal Jan 26, 2025
de269a1
fix tests when llg is enabled
mmoskal Jan 26, 2025
a7be666
pass vocab not model to llama_sampler_init_llg()
mmoskal Jan 26, 2025
3675050
copy test-grammar-integration.cpp to test-llguidance.cpp
mmoskal Jan 26, 2025
58006dd
clang fmt
mmoskal Jan 26, 2025
036b91f
fix ref-count bug
mmoskal Jan 26, 2025
f245ca2
build and run test
mmoskal Jan 26, 2025
16a5484
gbnf -> lark syntax
mmoskal Jan 26, 2025
2937537
conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
mmoskal Jan 26, 2025
c7ebf57
rename llguidance test file to test-grammar-llguidance.cpp
mmoskal Jan 26, 2025
0a211fc
add gh action for llg test
mmoskal Jan 26, 2025
8e027f8
align tests with LLG grammar syntax and JSON Schema spec
mmoskal Jan 26, 2025
ca88ce7
llama_tokenizer() in fact requires valid utf8
mmoskal Jan 26, 2025
44e1973
update llg
mmoskal Jan 26, 2025
c9e9853
format file
mmoskal Jan 26, 2025
efc36c9
add $LLGUIDANCE_LOG_LEVEL support
mmoskal Jan 26, 2025
08fefd1
fix whitespace
mmoskal Jan 26, 2025
1afc53a
fix warning
mmoskal Jan 26, 2025
00fcd98
include <cmath> for INFINITY
mmoskal Jan 26, 2025
437ff31
add final newline
mmoskal Jan 26, 2025
5475357
fail llama_sampler_init_llg() at runtime
mmoskal Jan 29, 2025
d06448a
Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
mmoskal Jan 29, 2025
59da969
simplify #includes
mmoskal Jan 30, 2025
d59d939
improve doc string for LLAMA_LLGUIDANCE
mmoskal Jan 30, 2025
6b2de55
Merge branch 'master' into llg
mmoskal Jan 31, 2025
a049afb
typo in merge
mmoskal Jan 31, 2025
7057589
bump llguidance to 0.6.12
mmoskal Jan 31, 2025
Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
mmoskal committed Jan 29, 2025
commit d06448a06a55fcf2fc306c0f1e584c02b62648d3
50 changes: 5 additions & 45 deletions docs/llguidance.md
@@ -2,7 +2,7 @@

[LLGuidance](https://github.com/guidance-ai/llguidance) is a library for constrained decoding (also called constrained sampling or structured outputs) for Large Language Models (LLMs). Initially developed as the backend for the [Guidance](https://github.com/guidance-ai/guidance) library, it can also be used independently.

LLGuidance supports JSON Schemas and arbitrary context-free grammars (CFGs) written in a [variant](https://github.com/guidance-ai/llguidance/blob/main/parser/src/lark/README.md) of Lark syntax. It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench) and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/parser/src/json/README.md) JSON Schema coverage but requires the Rust compiler, which complicates the llama.cpp build process.
LLGuidance supports JSON Schemas and arbitrary context-free grammars (CFGs) written in a [variant](https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md) of Lark syntax. It is [very fast](https://github.com/guidance-ai/jsonschemabench/tree/main/maskbench) and has [excellent](https://github.com/guidance-ai/llguidance/blob/main/docs/json_schema.md) JSON Schema coverage but requires the Rust compiler, which complicates the llama.cpp build process.

## Building

@@ -19,6 +19,8 @@ This requires the Rust compiler and the `cargo` tool to be [installed](https://w

There are no new command-line arguments or modifications to `common_params`. When enabled, grammars starting with `%llguidance` are passed to LLGuidance instead of the [current](../grammars/README.md) llama.cpp grammars. Additionally, JSON Schema requests (e.g., using the `-j` argument in `llama-cli`) are also passed to LLGuidance.

For your existing GBNF grammars, you can use the [gbnf_to_lark.py](https://github.com/guidance-ai/llguidance/blob/main/scripts/gbnf_to_lark.py) script to convert them to the LLGuidance Lark-like format.
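
For example, a grammar file passed to `llama-cli` might look like the following (a minimal illustrative sketch; the `%llguidance {}` marker on the first line is what routes the grammar to LLGuidance instead of the built-in GBNF parser):

```lark
%llguidance {}

// constrain output to a simple sum of integers, e.g. "12+7+42"
start: NUMBER ("+" NUMBER)*
NUMBER: /[0-9]+/
```

Note the Lark convention used throughout: `start` is a lowercase CFG rule, while `NUMBER` is an uppercase lexeme defined by a regular expression.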

## Performance

Computing a "token mask" (i.e., the set of allowed tokens) for a llama3 tokenizer with 128k tokens takes, on average, 50μs of single-core CPU time for the [JSON Schema Bench](https://github.com/guidance-ai/jsonschemabench). The p99 time is 0.5ms, and the p100 time is 20ms. These results are due to the lexer/parser split and several [optimizations](https://github.com/guidance-ai/llguidance/blob/main/docs/optimizations.md).
@@ -38,53 +40,11 @@ Unsupported schemas result in an error message; no keywords are silently ignored.
GBNF lacks the concept of a lexer.

Most programming languages, including JSON, use a two-step process: a lexer (built with regular expressions) converts a byte stream into lexemes, which are then processed by a CFG parser. This approach is faster because lexers are cheaper to evaluate, and there are ~10x fewer lexemes than bytes.
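
The split can be sketched in a few lines of Python (an illustrative toy, not the llguidance implementation): a single regex alternation collapses the character stream into far fewer lexemes before any CFG parsing happens.

```python
import re

# Toy lexer: one regex alternation with a named group per lexeme kind.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>[0-9]+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()=;]))"
)

def lex(text):
    """Turn a character stream into a list of (kind, value) lexemes."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

# "count = count + 12;" is 19 characters but only 6 lexemes; the CFG
# parser then runs over the much shorter lexeme stream.
print(lex("count = count + 12;"))
```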

LLM tokens often align with lexemes, so the parser is engaged for under 0.5% of tokens, with the lexer handling the rest.

However, the user has to make the distinction between lexemes and CFG symbols explicit. In [Lark](https://github.com/lark-parser/lark), lexeme names are uppercase, while CFG symbols are lowercase.

For example, a simplified C grammar in Lark:

```lark
%llguidance {}

start: program

program: (function_definition | declaration)*

function_definition: type ID "(" parameter_list? ")" "{" statement* "}"
parameter_list: parameter ("," parameter)*
parameter: type ID

declaration: type variable_list ";"
variable_list: ID ("," ID)*

type: "int" | "float" | "char" | "void"

statement: declaration
| assignment ";"
| "return" expr ";"
| if_statement
| while_statement
| expr ";"

assignment: ID "=" expr
expr: term (("+" | "-") term)*
term: factor (("*" | "/") factor)*
factor: ID | NUMBER | "(" expr ")"

if_statement: "if" "(" expr ")" "{" statement* "}" ("else" "{" statement* "}")?
while_statement: "while" "(" expr ")" "{" statement* "}"

ID: /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER: /[0-9]+/

%ignore /[ \t\f\r\n]+/
```

In GBNF, lexemes like `ID` and `NUMBER` are typically lowercase and converted to CFG rules instead of remaining regular expressions. Ignoring whitespace would need to be explicitly specified everywhere.

Writing grammars without lexemes would be slower and might result in "single-byte lexeme" errors in LLGuidance, fixable by renaming symbols to uppercase.
The [gbnf_to_lark.py script](https://github.com/guidance-ai/llguidance/blob/main/scripts/gbnf_to_lark.py) can often take care of this automatically.
See [LLGuidance syntax docs](https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md#terminals-vs-rules) for more details.
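
To make the contrast concrete, here is the same identifier written both ways (an illustrative sketch). In GBNF it is a lowercase rule that gets expanded into the CFG:

```
# GBNF: lowercase rule, expanded into CFG productions
id ::= [a-zA-Z_] [a-zA-Z0-9_]*
```

In the Lark-like syntax it is an uppercase lexeme that stays a single regular expression, so the cheap lexer handles it:

```lark
// Lark: uppercase lexeme, kept as one regular expression
ID: /[a-zA-Z_][a-zA-Z0-9_]*/
```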

## Error Handling
