A collection of text is known as a corpus. The corpus determines the LLM's vocabulary and therefore the words it can generate. A bigram model uses only the last word of a given prompt to predict the next word and ignores the rest of the prompt. This is also known as the **Markov Assumption**. A trigram model uses two words of context to predict the next word. An 8-gram model would use seven words of context to predict the next word. In general, an n-gram model uses the previous n-1 words as context.
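
To make the context-window idea concrete, here is a minimal sketch of a count-based n-gram predictor. The function names (`train_ngram`, `predict_next`) and the toy corpus are made up for illustration, and the sketch assumes whitespace tokenization and a "pick the most frequent follower" rule, which is only one simple way to turn counts into a prediction.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count which word follows each (n-1)-word context in the corpus."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])   # the n-1 context words
        next_word = tokens[i + n - 1]          # the word that followed them
        counts[context][next_word] += 1
    return counts

def predict_next(counts, prompt_tokens, n):
    """Predict from the last n-1 prompt words only (requires n >= 2)."""
    context = tuple(prompt_tokens[-(n - 1):])  # the rest of the prompt is ignored
    followers = counts.get(context)
    if not followers:
        return None                            # context never appeared in the corpus
    return followers.most_common(1)[0][0]

# Toy corpus (hypothetical); its words are the only ones the model can produce.
corpus = "the cat sat on the mat and the cat ran".split()
bigram = train_ngram(corpus, 2)
trigram = train_ngram(corpus, 3)

print(predict_next(bigram, "a dog chased the".split(), 2))  # cat
print(predict_next(bigram, "on the".split(), 2))            # cat
print(predict_next(trigram, "on the".split(), 3))           # mat
```

The bigram model gives the same answer for both prompts because it only sees the final word "the", while the trigram model also sees "on" and so predicts differently. A word that never appears in the corpus can never be predicted, which is the sense in which the corpus fixes the vocabulary.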