GitHub

gptwc: wc for GPT tokens

The wc utility counts words or characters. The gptwc utility functions similarly but counts tokens. Tokens are smaller than words but larger than characters, and are a more compact representation of text used by large language models.

Use gptwc to check the number of tokens in a string, in order to remain under the token limit for your large language model API. Uses tiktoken.

Installation

$ pip install gptwc

$ echo "Simple is better than complex." | gptwc
6

Example Usage

$ cat LICENSE | gptwc
200
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165


$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470

curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
38071


$ cat LICENSE | gptwc --model o200k_base
200
$ cat LICENSE | gptwc --model cl100k_base
201


$ cat README.md | pbcopy
$ gptwc -c
517

Options

usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE             Text files to count tokens in

options:
  -h, --help       show this help message and exit
  --files0-from F  Read input from the files specified by NUL-terminated names in file F
  --model MODEL    Encoding or model to use for tokenization (default: o200k_base, used by GPT-5 and GPT-4o)
  -c, --clipboard  Read input from the system clipboard
  --version        show program's version number and exit

Which Tokenizer Does Each Model Use?

See tiktoken/model.py for an up-to-date list.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
gptwc		gptwc
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

gptwc: wc for GPT tokens

Installation

Example Usage

Options

Which Tokenizer Does Each Model Use?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

lwneal/gptwc

Folders and files

Latest commit

History

Repository files navigation

gptwc: wc for GPT tokens

Installation

Example Usage

Options

Which Tokenizer Does Each Model Use?

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages