String.normalize


Converts all characters in string to the Unicode normalization form identified by form.

Invalid Unicode codepoints are skipped and the remainder of the string is converted. If you want the algorithm to stop and return at the first invalid codepoint, use :unicode.characters_to_nfd_binary/1, :unicode.characters_to_nfc_binary/1, :unicode.characters_to_nfkd_binary/1, or :unicode.characters_to_nfkc_binary/1 instead.
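The difference between the two behaviours can be sketched as follows. The input binary is an illustrative assumption: the byte 0xFF can never appear in valid UTF-8, so it stands in for an invalid codepoint. The exact contents of the lenient result are deliberately not asserted here.

```elixir
# A binary with one invalid byte (0xFF is never valid UTF-8) between valid text.
input = <<"a", 0xFF, "b">>

# String.normalize/2 does not fail on the invalid byte; it still returns a binary.
lenient = String.normalize(input, :nfd)
IO.inspect(is_binary(lenient), label: "lenient result is a binary")

# The Erlang function stops at the invalid byte and returns an error tuple
# of the shape {:error, converted_so_far, rest}.
strict = :unicode.characters_to_nfd_binary(input)

case strict do
  {:error, converted, rest} ->
    IO.inspect(converted, label: "converted before the error")
    IO.inspect(rest, label: "unconverted rest")

  binary when is_binary(binary) ->
    IO.inspect(binary, label: "fully converted")
end
```

Matching on the error tuple is useful when invalid input should be rejected or logged instead of being passed through.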

Normalization forms :nfkc and :nfkd should not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets.
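A small sketch of the information loss described above. The sample strings are illustrative choices, written with \u escapes so the codepoints are unambiguous: U+00B2 is SUPERSCRIPT TWO and U+2460 is CIRCLED DIGIT ONE.

```elixir
squared = "x\u00B2"   # renders as x followed by a superscript two
circled = "\u2460"    # renders as a circled digit one

# Canonical normalization keeps the formatting distinction.
IO.inspect(String.normalize(squared, :nfc) == squared)

# Compatibility normalization replaces the characters with plain digits,
# so the original text can no longer be recovered from the result.
IO.inspect(String.normalize(squared, :nfkc))  # "x2"
IO.inspect(String.normalize(circled, :nfkc))  # "1"
```

After the compatibility forms are applied, "x2" could have come from either "x2" or the superscript version, which is why round-trip conversion is no longer possible.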

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.

  • :nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.

  • :nfkd - Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order.

  • :nfkc - Normalization Form Compatibility Composition. Characters are decomposed and then recomposed by compatibility equivalence.
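Because composed and decomposed text usually renders identically, the effect of each form is easier to see at the codepoint level. The following is a minimal sketch using \u escapes; the sample strings are illustrative choices.

```elixir
composed = "\u00E9"      # single codepoint: e with acute accent
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# The two strings look the same but are different binaries.
IO.inspect(composed == decomposed)                           # false

# :nfd splits the precomposed character; :nfc puts it back together.
IO.inspect(String.normalize(composed, :nfd) == decomposed)   # true
IO.inspect(String.normalize(decomposed, :nfc) == composed)   # true
IO.inspect(String.codepoints(String.normalize(composed, :nfd)) |> length())  # 2

# Combining marks are put into a fixed order: the dot below (U+0323)
# sorts before the acute accent (U+0301), whatever the input order was.
reordered = String.normalize("a\u0301\u0323", :nfd)
IO.inspect(reordered == "a\u0323\u0301")                     # true

# The compatibility forms additionally replace characters such as the
# "fi" ligature (U+FB01) with their plain equivalents.
IO.inspect(String.normalize("\uFB01", :nfc) == "\uFB01")     # true
IO.inspect(String.normalize("\uFB01", :nfkd))                # "fi"
```

When the goal is only to compare two strings for canonical equivalence, String.equivalent?/2 performs that check without building the normalized strings by hand.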

Examples

iex> String.normalize("yêṩ", :nfd)
"yêṩ"

iex> String.normalize("leña", :nfc)
"leña"

iex> String.normalize("ﬁ", :nfkd)
"fi"

iex> String.normalize("ﬁ", :nfkc)
"fi"