I think only hobbyists and researchers will want to run LLMs locally. For most people, the model runs for only a few seconds per hour. Even if you are a writer who needs the LLM to style/spell/grammar check your work and you work all day, you only run the model for a few minutes in total. The usage is VERY "peaky": you need a lot of computation for a few seconds, and then for many minutes you need none.
With high peak demands and then nothing, the most economical thing to do is to share a large computer. This is why the cloud "works": one large server can support hundreds of users, which keeps its average load high. Keeping the average load high is the only way to justify the cost.
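To make the "peaky" point concrete, here is a rough back-of-envelope sketch in Python. The numbers (about 30 seconds of GPU time per user per hour, a 70% target load on the shared server) are assumptions I picked only to show the shape of the argument, not measurements.

```python
# Back-of-envelope duty-cycle sketch. All figures below are illustrative assumptions.

ACTIVE_SECONDS_PER_USER_PER_HOUR = 30   # assumed: ~30 s of inference per user per hour
SECONDS_PER_HOUR = 3600

# Fraction of the hour a single user's workload keeps a GPU busy.
per_user_utilization = ACTIVE_SECONDS_PER_USER_PER_HOUR / SECONDS_PER_HOUR
print(f"One user keeps the GPU busy {per_user_utilization:.1%} of the time")

# If a shared server can be driven to ~70% average utilization, it can absorb
# roughly this many such users (ignoring queueing and peak-overlap effects).
TARGET_SERVER_UTILIZATION = 0.70         # assumed target load
users_per_server = TARGET_SERVER_UTILIZATION / per_user_utilization
print(f"A shared server at {TARGET_SERVER_UTILIZATION:.0%} load covers ~{users_per_server:.0f} users")
```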
Today, if I wanted local performance as good as I can get in the cloud, I'd need to spend about $10,000 on a fast Xeon-based computer with an Nvidia RTX6000 GPU. Yes, the hardware price will come down over the years, but the size of the LLMs we run will grow. It will ALWAYS be expensive to run these large models locally, so it will always be more economical to share the large servers.
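Here is a minimal break-even sketch comparing the $10,000 local machine (amortized over a few years) against renting shared GPU time by the hour. The cloud price, lifetime, and usage figures are assumptions chosen only to illustrate the comparison, not real quotes.

```python
# Break-even sketch: dedicated local hardware vs. pay-per-use shared GPU time.
# All prices and usage numbers are assumptions for illustration.

LOCAL_HARDWARE_COST = 10_000.0     # assumed: Xeon workstation + RTX 6000-class GPU, USD
HARDWARE_LIFETIME_YEARS = 3        # assumed useful life before the models outgrow it

CLOUD_PRICE_PER_GPU_HOUR = 2.00    # assumed on-demand price, USD/hour
GPU_HOURS_PER_DAY = 0.25           # assumed: ~15 minutes of actual GPU time per working day
WORKING_DAYS_PER_YEAR = 250

local_cost_per_year = LOCAL_HARDWARE_COST / HARDWARE_LIFETIME_YEARS
cloud_cost_per_year = CLOUD_PRICE_PER_GPU_HOUR * GPU_HOURS_PER_DAY * WORKING_DAYS_PER_YEAR

print(f"Local (amortized):  ${local_cost_per_year:,.0f}/year")
print(f"Cloud (pay per use): ${cloud_cost_per_year:,.0f}/year")
# With these assumptions the shared option wins by a wide margin,
# because the local GPU would sit idle most of the day.
```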
What we might do is pre-process the data locally. Certainly, it makes sense to tokenize the data locally, and maybe more can be done on the local machine, but the model itself needs a lot of "compute" power.
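As one illustration of that split, here is a small sketch that tokenizes text locally (using the tiktoken library as an example of a cheap, CPU-only step) and leaves the expensive model inference to the shared server. The send_to_remote_model function is a hypothetical placeholder, not a real API.

```python
# Sketch of "tokenize locally, run the model remotely".
# tiktoken is one example of a local tokenizer; the remote call is hypothetical.

import tiktoken


def tokenize_locally(text: str) -> list[int]:
    """Convert raw text into token IDs on the local machine (cheap, CPU-only)."""
    enc = tiktoken.get_encoding("cl100k_base")
    return enc.encode(text)


def send_to_remote_model(token_ids: list[int]) -> str:
    """Hypothetical placeholder: the heavy inference happens on the shared server."""
    raise NotImplementedError("replace with your provider's API call")


if __name__ == "__main__":
    token_ids = tokenize_locally("Please check the grammar of this sentence.")
    print(f"Sending {len(token_ids)} tokens to the shared server...")
    # response = send_to_remote_model(token_ids)
```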