ENSNormalize.java

Zero-dependency ENSIP-15 implementation in Java

Reference Implementation: adraffy/ens-normalize.js
- Unicode: 17.0.0
- Spec Hash: 4febc8f5d285cbf80d2320fb0c1777ac25e378eb72910c34ec963d0a4e319c84
Passes 100% ENSIP-15 Validation Tests
Passes 100% Unicode Normalization Tests
Space Efficient: ~58KB .jar using binary resources via make.js
JDK Support: 8+
Maven Central Repository: io.github.adraffy — 0.3.1

import io.github.adraffy.ens.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)

Primary API ENSIP15

// String -> String
// throws on invalid names
ENSNormalize.ENSIP15.normalize("RaFFY🚴‍♂️.eTh"); // "raffy🚴‍♂.eth"

// works like normalize()
ENSNormalize.ENSIP15.beautify("1⃣2⃣.eth"); // "1️⃣2️⃣.eth"

Additional NormDetails (Experimental)

// works like normalize(), throws on invalid names
// String -> NormDetails
NormDetails details = ENSNormalize.ENSIP15.normalizeDetails("💩ì.a");

String name; // normalized name
boolean possiblyConfusing; // if name should be carefully reviewed
HashSet<Group> groups; // unique groups in name
HashSet<EmojiSequence> emojis; // unique emoji in name
String groupDescription(); // group summary for name, e.g. "Emoji+Latin"
boolean hasZWJEmoji(); // if any emoji contain 200D
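
To make the "contain 200D" wording concrete, here is a minimal stdlib-only sketch that scans a raw string for U+200D (ZERO WIDTH JOINER). The class and method names are hypothetical and this is not the library's code: hasZWJEmoji() inspects the emoji found during normalization, while this sketch only looks at code points.

```java
final class ZwjScan {
    static final int ZWJ = 0x200D; // ZERO WIDTH JOINER

    // Hypothetical helper: true if any code point in s is U+200D.
    static boolean containsZwj(String s) {
        return s.codePoints().anyMatch(cp -> cp == ZWJ);
    }

    public static void main(String[] args) {
        // U+1F6B4 (written as a surrogate pair), then ZWJ, then U+2642
        String manBiking = "\uD83D\uDEB4\u200D\u2642";
        System.out.println(containsZwj(manBiking));   // true
        System.out.println(containsZwj("raffy.eth")); // false
    }
}
```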

Output-based Tokenization Label

// String -> List<Label>
// never throws
List<Label> labels = ENSNormalize.ENSIP15.split("💩Raffy.eth_");
// [
//   Label {
//     input: [ 128169, 82, 97, 102, 102, 121 ],  
//     tokens: [
//       OutputToken { cps: [ 128169 ], emoji: EmojiSequence { ... } }
//       OutputToken { cps: [ 114, 97, 102, 102, 121 ] }
//     ],
//     normalized: [ 128169, 114, 97, 102, 102, 121 ],
//     group: Group { name: "Latin", ... }
//   },
//   Label {
//     input: [ 101, 116, 104, 95 ],
//     tokens: [ 
//       OutputToken { cps: [ 101, 116, 104, 95 ] }
//     ],
//     error: NormException { kind: "underscore allowed only at start" }
//   }
// ]
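
The input arrays above are Unicode code points, not UTF-16 chars. As a rough illustration using only the JDK, the sketch below converts a string to code points and back. The helper names explode and implode are assumptions made for this example, and the split on "." only mimics how the name divides into labels; it performs no normalization.

```java
import java.util.Arrays;

final class CodePointDemo {
    // Hypothetical helper: decode a String into Unicode code points.
    static int[] explode(String s) {
        return s.codePoints().toArray();
    }

    // Hypothetical helper: encode code points back into a String.
    static String implode(int[] cps) {
        return new String(cps, 0, cps.length);
    }

    public static void main(String[] args) {
        // Limit -1 keeps trailing empty labels, which split() drops by default.
        // The first two chars are the surrogate pair for U+1F4A9.
        String[] labels = "\uD83D\uDCA9Raffy.eth_".split("\\.", -1);
        for (String label : labels) {
            System.out.println(Arrays.toString(explode(label)));
        }
        // [128169, 82, 97, 102, 102, 121]
        // [101, 116, 104, 95]
    }
}
```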

Normalization Properties

Group — ENSIP15.groups: List<Group>
EmojiSequence — ENSIP15.emojis: List<EmojiSequence>
Whole — ENSIP15.wholes: List<Whole>

Error Handling

All errors are safe to print. NormException { kind: string, reason: string? } is the base exception. Functions that accept names as input wrap their exceptions in InvalidLabelException { start, end, error: NormException } for additional context.

"disallowed character" — DisallowedCharacterException { cp }
"illegal mixture" — IllegalMixtureException { cp, group, other? }
"whole-script confusable" — ConfusableException { group, other }
"empty label"
"duplicate non-spacing marks"
"excessive non-spacing marks"
"leading fenced"
"adjacent fenced"
"trailing fenced"
"leading combining mark"
"emoji + combining mark"
"invalid label extension"
"underscore allowed only at start"
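
As an illustration of one of these rules, the following stdlib-only sketch checks "underscore allowed only at start" on a single label. It is an approximation written for this README, not the library's implementation, and the class and method names are hypothetical.

```java
final class UnderscoreRule {
    // True if every '_' in the label belongs to the leading run of underscores.
    static boolean underscoreOnlyAtStart(String label) {
        int i = 0;
        // Skip the leading run of underscores.
        while (i < label.length() && label.charAt(i) == '_') {
            i++;
        }
        // Any underscore after that run breaks the rule.
        return label.indexOf('_', i) < 0;
    }

    public static void main(String[] args) {
        System.out.println(underscoreOnlyAtStart("__ab")); // true
        System.out.println(underscoreOnlyAtStart("eth_")); // false
        System.out.println(underscoreOnlyAtStart("a_b"));  // false
    }
}
```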

Utilities

Normalize name fragments for substring search:

// String -> String
// only throws InvalidLabelException w/DisallowedCharacterException
ENSNormalize.ENSIP15.normalizeFragment("AB--");
ENSNormalize.ENSIP15.normalizeFragment("..\u0300");
ENSNormalize.ENSIP15.normalizeFragment("\u03BF\u043E");
// note: normalize() throws on these

Construct safe strings:

// int -> String
ENSNormalize.ENSIP15.safeCodepoint(0x303); // "◌̃ {303}"
ENSNormalize.ENSIP15.safeCodepoint(0xFE0F); // "{FE0F}"
// int[] -> String
ENSNormalize.ENSIP15.safeImplode(0x303, 0xFE0F); // "◌̃{FE0F}"
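
The brace notation in the outputs above is uppercase hex without padding. A minimal sketch of that formatting, assuming nothing beyond what the examples show (the helper name braceHex is hypothetical, and the sketch does not reproduce the "◌" placeholder or the library's decision about which code points to escape):

```java
import java.util.Locale;

final class HexEscape {
    // Hypothetical helper: format a code point as {HEX}, e.g. 0xFE0F -> "{FE0F}".
    static String braceHex(int cp) {
        return "{" + Integer.toHexString(cp).toUpperCase(Locale.ROOT) + "}";
    }

    public static void main(String[] args) {
        System.out.println(braceHex(0xFE0F)); // {FE0F}
        System.out.println(braceHex(0x303));  // {303}
    }
}
```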

Determine if a character shouldn't be printed directly:

// ReadOnlyIntSet 
ENSNormalize.ENSIP15.shouldEscape.contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true

Determine if a character is a combining mark:

// ReadOnlyIntSet
ENSNormalize.ENSIP15.combiningMarks.contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true
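
For comparison, the JDK exposes Unicode general categories through Character.getType. The sketch below is a rough stdlib approximation with hypothetical helper names. The library's sets come from the ENSIP-15 spec data, while the JDK tables follow whichever Unicode version the JDK bundles, so membership may differ.

```java
final class MarkCheck {
    // Hypothetical helper: true for general categories Mn, Me, and Mc.
    static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    // Hypothetical helper: true for general category Cf (format controls).
    static boolean isFormat(int cp) {
        return Character.getType(cp) == Character.FORMAT;
    }

    public static void main(String[] args) {
        System.out.println(isMark(0x20E3));   // COMBINING ENCLOSING KEYCAP => true
        System.out.println(isMark('a'));      // false
        System.out.println(isFormat(0x202E)); // RIGHT-TO-LEFT OVERRIDE => true
    }
}
```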

Unicode Normalization Forms NF

import io.github.adraffy.ens.ENSNormalize;

// String -> String
ENSNormalize.NF.NFC("\u0065\u0300"); // "\u00E8"
ENSNormalize.NF.NFD("\u00E8");       // "\u0065\u0300"

// int[] -> int[]
ENSNormalize.NF.NFC(0x65, 0x300); // [0xE8]
ENSNormalize.NF.NFD(0xE8);        // [0x65, 0x300]
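
The JDK ships its own implementation of these forms in java.text.Normalizer, which can serve as a cross-check for simple inputs. This is a sketch with hypothetical wrapper names; the JDK's result depends on the Unicode version bundled with the running JDK, so it may disagree with ENSNormalize.NF for recently added characters.

```java
import java.text.Normalizer;

final class JdkNormalizer {
    // Hypothetical wrapper around the JDK's NFC implementation.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Hypothetical wrapper around the JDK's NFD implementation.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // e + COMBINING GRAVE ACCENT composes to a single code point U+00E8.
        System.out.println(nfc("e\u0300").equals("\u00E8")); // true
        // U+00E8 decomposes back to two code points.
        System.out.println(nfd("\u00E8").length());          // 2
    }
}
```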

Publish Instructions

Sync and Compress
Update Gradle: ./gradlew wrapper --gradle-version {VERSION}
Run Tests: ./gradlew test
Ensure Access Token
Stage: ./gradlew publish
Publish: ./gradlew jreleaserDeploy
