UTF8

The UTF8 class provides rune-aware string operations and bidirectional (BiDi) text reordering.


Static Methods

UTF8.char_at(str[, index])

Get the rune at the given index in a string.

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to extract a rune from.
index      int     No        Index of the rune to get. Default: 0.
  • Returns: string: The rune at the given index, or an empty string if the index is out of bounds.
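
Because an out-of-bounds index yields an empty string rather than an error, a bounds check can be written as a comparison with "". A minimal sketch using only the call documented above (the sample string is illustrative):

```squirrel
local word = "日本語"   // 3 runes

print(UTF8.char_at(word) + "\n")      // "日": index defaults to 0
print(UTF8.char_at(word, 2) + "\n")   // "語"

// Index 3 is past the last rune, so char_at returns an empty string.
if (UTF8.char_at(word, 3) == "") {
    print("index 3 is out of bounds\n")
}
```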

UTF8.insert(str, index, text)

Insert a string into another at the given index.

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to insert into.
index      int     Yes       Index at which to insert the text.
text       string  Yes       The string to insert.
  • Returns: string: A new string with text inserted at index.
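
A short sketch of typical calls, with illustrative strings. Since insert returns a new string, the original is left untouched.

```squirrel
local label = "Score: 100"

// Index 7 is the position of "1", so the text lands right before the number.
print(UTF8.insert(label, 7, "+") + "\n")    // "Score: +100"

// Index 0 prepends.
print(UTF8.insert(label, 0, "> ") + "\n")   // "> Score: 100"

// The original string is unchanged.
print(label + "\n")                         // "Score: 100"
```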

UTF8.len(str)

Compute the number of Unicode code points (runes) in a string.

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to count runes for.
  • Returns: int: Number of runes in the string.
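
The rune count can differ from the string's storage size. The sketch below compares UTF8.len with Squirrel's built-in string len(). It assumes Nutshell stores strings as UTF-8 bytes and that the built-in len() counts those bytes, which this page does not confirm.

```squirrel
local text = "Hello Nutshell! 🐿️"

// Rune count: 16 ASCII runes + U+1F43F + U+FE0F.
print(UTF8.len(text) + "\n")   // 18

// Assumption: the built-in len() reports the storage size in bytes, not runes.
// In UTF-8, U+1F43F takes 4 bytes and U+FE0F takes 3, so this would exceed 18.
print(text.len() + "\n")
```

Use UTF8.len whenever an index or count will be passed to other UTF8 methods, since they work in runes.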

UTF8.reorder_bidi(str)

Converts a string from Logical Order (the order characters are typed and stored in memory) into Visual Order (the physical left-to-right sequence required for screen display).

When to use this method?

The Nutshell Font class already handles character ordering automatically, so you do not need this method to display right-to-left text correctly.

Just use the rtl = true option in the font.draw(x, y, text[, options]) call.

Use this method only when drawing right-to-left text manually, character by character (e.g. with a custom bitmap font and the Image class).

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to apply the BiDi algorithm to.
  • Returns: string: A new string in visual display order.
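
For the manual case, a character-by-character drawing loop could look like the sketch below. draw_glyph and GLYPH_WIDTH are hypothetical placeholders, not part of Nutshell. Substitute your own bitmap-font drawing code.

```squirrel
// Hypothetical helper, NOT a Nutshell API: stands in for your own
// glyph-drawing code (e.g. drawing one glyph of a bitmap font).
function draw_glyph(rune, x, y) {
    print("draw [" + rune + "] at x=" + x + ", y=" + y + "\n")
}

local GLYPH_WIDTH = 8    // hypothetical fixed advance, in pixels
local visual = UTF8.reorder_bidi("مرحبا بالعالم")

// The string is already in visual order, so glyphs are placed left to right.
local count = UTF8.len(visual)
for (local i = 0; i < count; i++) {
    draw_glyph(UTF8.char_at(visual, i), 10 + i * GLYPH_WIDTH, 20)
}
```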

UTF8.reverse(str)

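Reverse a string by grapheme clusters, so that a character made of several runes (such as an emoji followed by a variation selector) stays intact.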

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to reverse.
  • Returns: string: The reversed string.
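
To see why reversing by grapheme cluster matters, compare it with a naive rune-by-rune reversal built from UTF8.len and UTF8.char_at. The sketch relies only on the methods documented on this page.

```squirrel
local text = "ab🐿️"   // 4 runes: "a", "b", U+1F43F, U+FE0F

// Naive reversal: walk the runes from last to first.
local naive = ""
for (local i = UTF8.len(text) - 1; i >= 0; i--) {
    naive += UTF8.char_at(text, i)
}
// naive now holds U+FE0F, U+1F43F, "b", "a": the variation selector
// precedes the squirrel it was meant to follow, so the emoji is broken apart.

// UTF8.reverse keeps the two runes of the emoji together.
print(UTF8.reverse(text) + "\n")   // "🐿️ba"
```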

UTF8.slice(str[, start[, end]])

Create a rune-based substring of a string.

Parameters:

Parameter  Type    Required  Description
str        string  Yes       The string to slice.
start      int     No        Start index, inclusive. Default: 0.
end        int     No        End index, exclusive. A negative value means "until the end of the string". Default: -1.
  • Returns: string: The substring.
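
A common use of slice is truncating text to a maximum number of runes. The helper below is a hypothetical sketch (truncate is not a Nutshell function). It checks the length first, because this page does not specify what slice does when end exceeds the string length.

```squirrel
// Hypothetical helper: keep at most max_runes runes,
// appending "..." when the text is cut.
function truncate(str, max_runes) {
    if (UTF8.len(str) <= max_runes) {
        return str
    }
    return UTF8.slice(str, 0, max_runes) + "..."
}

print(truncate("Hello Nutshell! 🐿️", 5) + "\n")   // "Hello..."
print(truncate("Hello", 10) + "\n")                // "Hello"
```

Note that slice counts runes, not grapheme clusters, so a cut can separate an emoji from its trailing variation selector.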

Examples

local text = "Hello Nutshell! 🐿️";

print(UTF8.len(text)+"\n")                  // 18

print(UTF8.slice(text, 6, 15)+"\n")         // "Nutshell!"
print(UTF8.slice(text, 6)+"\n")             // "Nutshell! 🐿️"

print(UTF8.char_at(text, 16)+"\n")                          // "🐿"
print(UTF8.char_at(text, 16)+UTF8.char_at(text, 17)+"\n")   // "🐿️" = base squirrel rune (U+1F43F) + Variation Selector-16 (U+FE0F, requests emoji presentation)
print(UTF8.slice(text, 16, 18)+"\n")                        // "🐿️"

print(UTF8.insert(text, 5, " 🐿️")+"\n")     // "Hello 🐿️ Nutshell! 🐿️"
print(UTF8.reverse(text)+"\n")              // "🐿️ !llehstuN olleH"


// Bidirectional / RTL Visual Reordering ---------------------------------------
// "مرحبا بالعالم" is Arabic for "Hello World" (Logical order in memory)
// [م] [ر] [ح] [ب] [ا] [ ] [ب] [ا] [ل] [ع] [ا] [ل] [م]
local arabicText = "مرحبا بالعالم"

// Shapes the letters into their connected (contextual) forms and reorders them into visual left-to-right order
local visualText = UTF8.reorder_bidi(arabicText);

// Print each character individually to the console on the same line
// to prevent the terminal from automatically re-ordering the text layout.
local length = UTF8.len(visualText)
for (local i = 0; i < length; i++) {
    print("[" + UTF8.char_at(visualText, i) + "] ")
}
print("\n");

// Console Output will safely display:
// [ﻣ] [ﺮ] [ﺣ] [ﺒ] [ﺎ] [ ] [ﺑ] [ﺎ] [ﻟ] [ﻌ] [ﺎ] [ﻟ] [ﻢ]