Tokenizer comparison
Self Note
This note is for my own understanding of the concepts.
A very good resource: Karpathy's tokenizer video.
Different LLMs use different tokenizers, and I want to test how much they differ on the same text.
I downloaded the test text from gutenberg.org/cache/epub/100/pg100.txt
The comparison is in my Colab notebook.
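
A minimal sketch of how such a comparison could be set up, using only the Python standard library. The four tokenizers below are simple stand-ins assumed for illustration, not the tokenizers of any actual LLM, and the names (`compare_tokenizers`, `TOKENIZERS`, and the individual functions) are hypothetical. The idea is only to show the shape of the test: run the same text through each tokenizer and compare token counts.

```python
import re
from typing import Callable, Dict, List, Sequence


def whitespace_tokens(text: str) -> List[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def word_punct_tokens(text: str) -> List[str]:
    """Split into word runs and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text: str) -> List[str]:
    """One token per Unicode character."""
    return list(text)


def byte_tokens(text: str) -> List[int]:
    """One token per UTF-8 byte (the starting point of byte-level BPE)."""
    return list(text.encode("utf-8"))


# Stand-in tokenizers (assumed for illustration). A real LLM tokenizer could
# be added here as another callable that maps a string to a token sequence.
TOKENIZERS: Dict[str, Callable[[str], Sequence]] = {
    "whitespace": whitespace_tokens,
    "word_punct": word_punct_tokens,
    "char": char_tokens,
    "byte": byte_tokens,
}


def compare_tokenizers(
    text: str,
    tokenizers: Dict[str, Callable[[str], Sequence]] = TOKENIZERS,
) -> Dict[str, int]:
    """Return the number of tokens each tokenizer produces for `text`."""
    return {name: len(tokenize(text)) for name, tokenize in tokenizers.items()}


if __name__ == "__main__":
    # A short sample string stands in for the downloaded file here.
    sample = "To be, or not to be"
    for name, count in compare_tokenizers(sample).items():
        print(f"{name:<12}{count:>6} tokens  {len(sample) / count:.2f} chars/token")
```

To run it on the downloaded file instead, read the file into a string (for example `open("pg100.txt", encoding="utf-8").read()`, assuming that is the local file name) and pass it to `compare_tokenizers`. Fewer tokens for the same text means each token covers more characters on average.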