Tokenizers Ruby

🙂 Fast state-of-the-art tokenizers for Ruby

Installation

Add this line to your application’s Gemfile:

gem "tokenizers"

Getting Started

Load a pretrained tokenizer

tokenizer = Tokenizers.from_pretrained("bert-base-cased")

Encode

encoded = tokenizer.encode("I can feel the magic, can you?")
encoded.ids
encoded.tokens

Decode

tokenizer.decode(encoded.ids)
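To make the relationship between tokens, ids, and decoding concrete, here is a toy sketch in plain Ruby. It does not use the gem and is not how the library works internally: the five-entry vocabulary, the whitespace splitting, and the names `toy_encode`, `toy_decode`, and `ToyEncoding` are all assumptions made for illustration. Real pretrained tokenizers such as bert-base-cased use much larger vocabularies and subword splitting.

```ruby
# Toy illustration only: a hypothetical five-entry vocabulary, not the
# bert-base-cased vocabulary and not the gem's implementation.
VOCAB = { "[UNK]" => 0, "i" => 1, "can" => 2, "feel" => 3, "magic" => 4 }.freeze
ID_TO_TOKEN = VOCAB.invert.freeze

# Mirrors the two readers used above (ids and tokens) in a plain Struct.
ToyEncoding = Struct.new(:ids, :tokens)

# Lowercase, split on whitespace, and map each token to its id,
# falling back to [UNK] for tokens missing from the vocabulary.
def toy_encode(text)
  tokens = text.downcase.split.map { |t| VOCAB.key?(t) ? t : "[UNK]" }
  ToyEncoding.new(tokens.map { |t| VOCAB.fetch(t) }, tokens)
end

# Map ids back to tokens and join them with spaces.
def toy_decode(ids)
  ids.map { |id| ID_TO_TOKEN.fetch(id) }.join(" ")
end

encoded = toy_encode("I can feel the magic")
p encoded.ids             # [1, 2, 3, 0, 4]
p encoded.tokens          # ["i", "can", "feel", "[UNK]", "magic"]
p toy_decode(encoded.ids) # "i can feel [UNK] magic"
```

In this sketch, decoding does not restore the original text exactly: the out-of-vocabulary word comes back as `[UNK]` and the casing is lost.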

Load a tokenizer from files

tokenizer = Tokenizers::CharBPETokenizer.new("vocab.json", "merges.txt")

History

View the changelog in CHANGELOG.md

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/ankane/tokenizers-ruby.git
cd tokenizers-ruby
bundle install
bundle exec rake compile
bundle exec rake download:files
bundle exec rake test
