Tokeniser

Tokenization is the process of breaking text into smaller pieces called tokens. In natural language processing (NLP) it is usually one of the first steps: once the data (for example, a sentence) has been gathered, it is split into smaller units that are easier to assign meaning to. Depending on the tokenizer, these units can be sentences, words, or sub-words. For example, the sentence “I won” can be tokenized into two word tokens, “I” and “won”.
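The paragraph above names three granularities of token. The following is a minimal sketch, using only the Python standard library, of one naive way to produce each. The function names and the toy vocabulary are hypothetical and chosen for illustration. The sentence and word rules are simple regular expressions, and the sub-word function uses a greedy longest-match-first split similar in spirit to WordPiece (pieces after the first are prefixed with `##`). This is one possible approach, not the only one. Production tokenizers handle abbreviations, contractions and Unicode edge cases far more carefully.

```python
import re


def sentence_tokenize(text: str) -> list[str]:
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text: str) -> list[str]:
    # A run of word characters is one token; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    # Greedy longest-match-first split over a (hypothetical) sub-word vocabulary.
    # Non-initial pieces carry a "##" prefix; returns ["[UNK]"] if the word
    # cannot be fully covered by vocabulary entries.
    tokens: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens


if __name__ == "__main__":
    # Toy vocabulary, assumed for illustration only.
    toy_vocab = {"token", "##ization", "##ize", "un", "##break", "##able"}
    print(word_tokenize("I won"))                       # ['I', 'won']
    print(sentence_tokenize("I won. You lost!"))        # ['I won.', 'You lost!']
    print(subword_tokenize("tokenization", toy_vocab))  # ['token', '##ization']
    print(subword_tokenize("unbreakable", toy_vocab))   # ['un', '##break', '##able']
```

The naive sentence rule shows why dedicated tokenizers exist: it would split "Dr. Smith won." into "Dr." and "Smith won.", because it cannot tell an abbreviation from the end of a sentence.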
