I’ve been fortunate to have some free time recently, and I’ve put a few days into this book idea.
I’ve been “runinating” (pun intended), as I needed to put together a code example on the rune type for said book.
Now, I’ve always found rune examples a bit abstract, a bit meaningless. One problem is the terminology (what is a “rune”, anyway?), but the bigger problem is that runes get discussed in isolation, and I’m left thinking “so what?”
But explaining the relationship between ASCII, byte, Unicode, rune and UTF-8 concisely isn’t simple, I’ve found.
So here’s a short example, which you can also find on the Playground here: https://go.dev/play/p/luDIj6DwPAG
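package main

import (
	"encoding/hex"
	"fmt"
)

func main() {
	myString := "🚀"
	// A rune holds the Unicode code point as an integer (printed here in decimal).
	fmt.Println("Unicode codepoint represented by rune:", []rune(myString))
	// The same character as the raw bytes of its UTF-8 encoding (also in decimal).
	fmt.Println("UTF-8 code represented by up to 4 bytes:", []byte(myString))
	// The same bytes again, this time as hexadecimal.
	fmt.Println("UTF-8 code represented as Hexadecimal:", hex.EncodeToString([]byte(myString)))
	// len counts bytes, not characters.
	fmt.Println("length", len(myString))
}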
Is it helpful if you cross-reference the output of this program against this page? https://codepoints.net/U+1F680
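One thing that may trip up the comparison: the page gives the code point in U+ notation (it’s right there in the URL), while the program prints the rune and the bytes as decimal numbers. Here’s a rough sketch that prints the same values in a form that’s easier to match by eye, using fmt’s %U and % X verbs. The describe helper is just a name I made up for this illustration.

```go
package main

import "fmt"

// describe is a hypothetical helper for this post: it returns the first
// code point of s in U+ notation, and all the UTF-8 bytes of s as
// space-separated hexadecimal.
func describe(s string) (codePoint string, utf8Hex string) {
	r := []rune(s)[0] // first Unicode code point in the string
	return fmt.Sprintf("%U", r), fmt.Sprintf("% X", []byte(s))
}

func main() {
	cp, hexBytes := describe("🚀")
	fmt.Println("code point: ", cp)       // code point:  U+1F680
	fmt.Println("UTF-8 bytes:", hexBytes) // UTF-8 bytes: F0 9F 9A 80
}
```

The decimal 128640 printed by the original program is the same number as hexadecimal 1F680, so nothing changes except the notation.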
Can you pick out what data is stored by the rune, why it can’t be a byte like an ASCII character, and why Go will use up to four bytes to hold the UTF-8 representation of the character?
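If you’d like a hint, here’s a small sketch that puts the rocket next to an ASCII letter and a couple of in-between characters. It’s only one way to look at it: isASCII and byteCount are names I’ve invented for this post, built on the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether the code point is below 128, the range that
// UTF-8 stores in a single byte. (Hypothetical helper for this post.)
func isASCII(r rune) bool {
	return r < utf8.RuneSelf
}

// byteCount returns how many bytes UTF-8 needs to encode the code point.
// (Hypothetical helper for this post; it just wraps utf8.RuneLen.)
func byteCount(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Ranging over a string yields one rune (code point) per iteration,
	// not one byte.
	for _, r := range "Aé€🚀" {
		fmt.Printf("%c  %U  decimal %d  UTF-8 bytes: %d  ASCII: %t\n",
			r, r, r, byteCount(r), isASCII(r))
	}
}
```

The four characters need 1, 2, 3 and 4 bytes respectively. A single byte only goes up to 255, and the rocket’s code point is 128640, which is why it can’t be stored the way an ASCII character is.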
Feedback welcome!