A collection of text is known as a corpus. It determines the LLM's vocabulary and therefore the words it can generate.
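
A minimal sketch of how a corpus fixes the vocabulary, assuming simple whitespace tokenization (real LLMs typically use subword tokenizers). The toy corpus and the variable names are illustrative only.

```python
# Toy corpus: a hypothetical example, not taken from any real dataset.
corpus = "the cat sat on the mat . the dog sat on the rug ."

# Whitespace tokenization is an assumption made for simplicity.
tokens = corpus.split()

# The vocabulary is the set of distinct tokens seen in the corpus.
vocabulary = sorted(set(tokens))

print(vocabulary)
# A word that never appears in the corpus can never be generated.
print("bird" in vocabulary)
```

Because "bird" never occurs in the corpus, it is not in the vocabulary and a model built from this corpus could not produce it.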
A bigram model uses only the last word of a given prompt to predict the next word and ignores the rest of the prompt.
This is also known as the **Markov Assumption**.
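
The bigram idea can be sketched with simple counts: for each word, record which words follow it in the corpus, then predict the most frequent follower of the prompt's last word. This is an illustrative count-based sketch; `train_bigram`, `predict_next`, and the toy corpus are hypothetical names and data chosen for this example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prompt):
    """Predict the next word using ONLY the last word of the prompt."""
    last_word = prompt.split()[-1]
    followers = counts.get(last_word)
    if not followers:
        return None  # last word never seen with a follower in the corpus
    return followers.most_common(1)[0][0]


# Toy corpus, tokenized on whitespace (an assumption for simplicity).
bigram_tokens = "the cat sat on the mat . the cat ate the fish .".split()
bigram_counts = train_bigram(bigram_tokens)

# Both prompts end in "the", so the prediction is identical:
# everything before the last word is ignored.
print(predict_next(bigram_counts, "sat on the"))
print(predict_next(bigram_counts, "completely different words then the"))
```

Both calls return "cat", since "cat" follows "the" more often than any other word in this corpus and nothing earlier in the prompt is consulted.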
A trigram model uses the previous two words as context to predict the next word.
An 8-gram model would use the previous seven words as context to predict the next word.
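
In general, an n-gram model conditions on the previous n - 1 words. A rough count-based sketch of that generalization follows; `train_ngram`, `predict_next_ngram`, and the toy corpus are hypothetical, and whitespace tokenization is again an assumption.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Map each (n - 1)-word context to counts of the word that follows it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        counts[context][next_word] += 1
    return counts


def predict_next_ngram(counts, prompt_tokens, n):
    """Predict the next word from the last n - 1 words of the prompt."""
    start = max(0, len(prompt_tokens) - (n - 1))
    context = tuple(prompt_tokens[start:])
    followers = counts.get(context)
    if not followers:
        return None  # this exact context never appeared in the corpus
    return followers.most_common(1)[0][0]


ngram_tokens = "the cat sat on the mat . the dog sat on the rug .".split()

trigram_counts = train_ngram(ngram_tokens, 3)
eightgram_counts = train_ngram(ngram_tokens, 8)

# Trigram: the two-word context (".", "the") was only ever followed by "dog".
print(predict_next_ngram(trigram_counts, ["mat", ".", "the"], 3))

# Every context stored by the 8-gram model is seven words long.
print({len(context) for context in eightgram_counts})
```

Longer contexts make predictions more specific, but each exact context appears less often in the corpus, so unseen contexts (which return `None` here) become much more common as n grows.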