Couldn't LLMs predict the next array of tokens rather than each token individually?

When you ask a regular person to complete "the square of the hypotenuse is", they say "the sum of the squares of the other two sides" without even thinking. Sure, giving some thought to an answer is great, but sometimes you just have the answer ready and don't need to think your way through it to produce the next phrase.

Couldn't LLMs perhaps predict the next array of tokens rather than each token individually? I know tokens are supposed to be the smallest unit of the language, or something like that, but maybe a "chunk approach" could work. A rough sketch of what I mean by "each token individually" is below.
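
To be clear about the thing I'm asking about: here's a minimal, self-contained sketch of the token-by-token loop as I understand it. The "model" is just a toy lookup table I made up (`TOY_MODEL` and `generate` are hypothetical names, not any real library's API); a real LLM conditions on the whole context, but the loop structure is the same: one forward pass, one token, repeat.

```python
import random

# Toy next-token distribution: given only the last token, probabilities
# for the next one. Purely illustrative, not a real model.
TOY_MODEL = {
    "the": {"square": 0.6, "sum": 0.4},
    "square": {"of": 1.0},
    "of": {"a": 0.5, "the": 0.5},
    "a": {"hypotenuse": 1.0},
    "hypotenuse": {"is": 1.0},
}

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        dist = TOY_MODEL.get(tokens[-1])
        if dist is None:  # no known continuation: stop
            break
        # One loop iteration = one sampled token. My question is why
        # this step couldn't emit a whole chunk of tokens at once.
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return tokens

print(generate(["the"]))  # e.g. ['the', 'square', 'of', 'a', 'hypotenuse', 'is']
```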

I'm sure there's a valid reason why that isn't how it's done; I'm just wondering what that reason is.