LLMs: Why Apple may finally be able to upgrade base Macs to 16/512

senttoschool · May 6, 2023

Background assumptions:

For over a decade, RAM requirements did not increase as they did before. The demand for RAM on laptops plateaued because most common applications can be comfortably used on an 8GB or 16GB machine.
Apple relies on the 8/265 to 16/512 $400 upgrade in order to have the profit margins they normally have for their products. Without this upgrade, Apple's Mac business would be much less lucrative.
The 8/256 base is by design because it's just enough for a comfortable experience but will require a $400 upgrade if you want to do anything beyond the basics. The requirements for a "comfortable experience" must raise in order for Apple to want to raise the base specs.
If Apple sets the base at 16/512, then they need to replace those who would have upgraded from 8/265 to 16/512, to now upgrade from 16/512 to 32/512 or 16/512 to 16/2TB.

Enter LLMs (large language models such as ChatGPT). These are new kinds of applications that require a significant increase in high-bandwidth RAM. LLMs such as LLaMA, Vicuna, etc. require approximately 30GB of RAM to run well for a 7 billion model. There are versions with 7, 13, 33, 65 billion parameters.

If the future is such that everyone will be running an LLM like ChatGPT on local hardware, then RAM demand will increase drastically. And I mean drastically.

An 8GB Macbook Air is simply not enough to run any decent LLM. The minimum may be as high as 32GB of RAM. Ideally, you're using 128GB+ of RAM in order to run better and larger LLMs.

Hence, if Apple raises the base to 16/512, then there will still be plenty of demand for upgrades to 32/1TB and beyond, preserving their profit margins.

By next year, I predict that threads here will go from "Is 8GB enough for my use case?" to "Is 64GB enough to run Vicuna 33b parameter model?".

Note: LLMs require high bandwidth RAM such as those found on a GPU. However, because Apple has a unified memory model, all system RAM already has high bandwidth. For PCs, they will have to drastically increase VRAM on a GPU, not system RAM. For Apple, they just need to increase system RAM.

skaertus · May 6, 2023

senttoschool said:
Background assumptions:

For over a decade, RAM requirements did not increase as they did before. The demand for RAM on laptops plateaued because most common applications can be comfortably used on an 8GB or 16GB machine.

Apple relies on the 8/265 to 16/512 $400 upgrade in order to have the profit margins they normally have for their products. Without this upgrade, Apple's Mac business would be much less lucrative.

The 8/256 base is by design because it's just enough for a comfortable experience but will require a $400 upgrade if you want to do anything beyond the basics. The requirements for a "comfortable experience" must raise in order for Apple to want to raise the base specs.

If Apple sets the base at 16/512, then they need to replace those who would have upgraded from 8/265 to 16/512, to now upgrade from 16/512 to 32/512 or 16/512 to 16/2TB.

Enter LLMs (large language models such as ChatGPT). These are new kinds of applications that require a significant increase in high-bandwidth RAM. LLMs such as LLaMA, Vicuna, etc. require approximately 30GB of RAM to run well for a 7 billion model. There are versions with 7, 13, 33, 65 billion parameters.

If the future is such that everyone will be running an LLM like ChatGPT on local hardware, then RAM demand will increase drastically. And I mean drastically.

An 8GB Macbook Air is simply not enough to run any decent LLM. The minimum may be as high as 32GB of RAM. Ideally, you're using 128GB+ of RAM in order to run better and larger LLMs.

Hence, if Apple raises the base to 16/512, then there will still be plenty of demand for upgrades to 32/1TB and beyond, preserving their profit margins.

By next year, I predict that threads here will go from "Is 8GB enough for my use case?" to "Is 64GB enough to run Vicuna 33b parameter model?".

Note: LLMs require high bandwidth RAM such as those found on a GPU. However, because Apple has a unified memory model, all system RAM already has high bandwidth. For PCs, they will have to drastically increase VRAM on a GPU, not system RAM. For Apple, they just need to increase system RAM.

There is also the issue of competition. Here in Brazil, Apple sells an 8 GB/256 GB MacBook Air for about the same price Dells sells a 32 GB/1 TB 13-inch XPS Plus. One may say that a Mac has no competitors and blah blah blah, but this is starting to look ridiculous.

mi7chy · May 6, 2023

Increasing local model size not only requires more RAM but also more compute performance unless you have a lot of time to waste and reduced models are gimmicks for chit chat and not for anything useful like code generation/debugging. Anyone serious will pick up Nvidia 80GB A100 multi-GPU setup for cheaper when data centers dump theirs for latest and greatest H100. So, unlikely the general public is going to waste their time with local reduced models when top tier GPT-4 via free Bing Chat or even paid ChatGPT Plus is significantly more powerful and responsive.

dmccloud · May 6, 2023

The entirety of the OPs post is based on two assumptions: a) ChatGPT/similar will start to be used by a sizeable percentage of the userbase, and b) such usage will be on locally run instances instead of via internet-based methods.

Given the sheer backend resources needed to run a complete GPT instance, the only way to get the full experience is via internet-based methods, because even the most powerful CPUs available to consumers wouldn't handle a full GPT installation. Even if any manufacturer wanted to build a system capable of hosting GPT locally, the cost would be extremely prohibitive because of how much storage, RAM, and even cooling would be needed. At the same time, increased demand for RAM would make DDR4/DDR5 prices jump again, adding even more to the cost of capable machines.

With the internet-based instances of ChatGPT, the bulk of the processing is being done on that remote server farm rather than on a local machine (distributed computing across multiple network nodes), which accomplishes two things: a) moves the burden and expenses of hardware support to the service providers rather than the consumers/end users, and b) makes the technology accessible to more people. Accessibility does not equal usage though - that's something to keep in mind.

Instead of the scenario the OP describes, I feel that ChatGPT usage overall will actually decrease as the bulk of currently interested users decide it's not for them and/or doesn't fit their needs. That will leave usage of the service to the researchers, high-end developers, etc. rather than the market as a whole.

AgentMcGeek · May 7, 2023

I think a more realistic deployment scenario for local instances of LLMs would be their integration into third party apps. For example Office using a local LLM to improve its text correction and generation functions.

AgentMcGeek · May 7, 2023

Besides, there already exists lightweight LLMs. I’m using this one locally on my iPhone: https://mlc.ai/mlc-llm/

Joe Dohn · May 7, 2023

AgentMcGeek said:
I think a more realistic deployment scenario for local instances of LLMs would be their integration into third party apps. For example Office using a local LLM to improve its text correction and generation functions.

Yes please. PLEASE. The spell checkers of Word processors are SO outdated. They work fine for more mechanical mistakes, but are an absolute pain if you need e.g, to replace text.

Joe Dohn · May 7, 2023

dmccloud said:
The entirety of the OPs post is based on two assumptions: a) ChatGPT/similar will start to be used by a sizeable percentage of the userbase, and b) such usage will be on locally run instances instead of via internet-based methods.

Given the sheer backend resources needed to run a complete GPT instance, the only way to get the full experience is via internet-based methods, because even the most powerful CPUs available to consumers wouldn't handle a full GPT installation. Even if any manufacturer wanted to build a system capable of hosting GPT locally, the cost would be extremely prohibitive because of how much storage, RAM, and even cooling would be needed. At the same time, increased demand for RAM would make DDR4/DDR5 prices jump again, adding even more to the cost of capable machines.

With the internet-based instances of ChatGPT, the bulk of the processing is being done on that remote server farm rather than on a local machine (distributed computing across multiple network nodes), which accomplishes two things: a) moves the burden and expenses of hardware support to the service providers rather than the consumers/end users, and b) makes the technology accessible to more people. Accessibility does not equal usage though - that's something to keep in mind.

Instead of the scenario the OP describes, I feel that ChatGPT usage overall will actually decrease as the bulk of currently interested users decide it's not for them and/or doesn't fit their needs. That will leave usage of the service to the researchers, high-end developers, etc. rather than the market as a whole.

If you run a scaled down version, e.g, Alpaca 7b, the requirements decrease dramatically. Yes, their responses tend to be of inferior quality, but if the models are specialized, that can absolutely be countered (GPT-Bio, which is better than your average human expert in interpreting biomedical papers, is a fork of GPT-2 trained on high quality data).

In other words, if you get a lightweight model for a very specialized and train it enough, you can definitely make it run on older computers. Those models won't be versatile at all, but if you delegate them to very specific tasks? No problem!

senttoschool · May 9, 2023

dmccloud said:
The entirety of the OPs post is based on two assumptions: a) ChatGPT/similar will start to be used by a sizeable percentage of the userbase, and b) such usage will be on locally run instances instead of via internet-based methods.

The reason ChatGPT has to run in the cloud is that consumer chip companies have not focused on building affordable AI hardware for the masses. There wasn't a huge demand until now. It all changes now because AMD, Intel, Nvidia, Apple, Qualcomm, and countless AI chip startups will focus on making NPUs far bigger and more efficient. Previously, the focused on the CPU and GPU while NPUs were an afterthought. NPUs are really the new brains of computers - not CPUs.

I'm betting that due to privacy concerns, cost of use, latency, and better NPUs, LLMs will move to local hardware in the future.

But it isn't just LLMs. Even your normal apps like Photoshop and Excel are getting LLM and generative AI features. The need for RAM will finally increase after over a decade of stagnation.

dmccloud said:
Instead of the scenario the OP describes, I feel that ChatGPT usage overall will actually decrease as the bulk of currently interested users decide it's not for them and/or doesn't fit their needs. That will leave usage of the service to the researchers, high-end developers, etc. rather than the market as a whole.

I disagree. Look up tools like AutoGPT. People are building some incredible things with LLMs after only 5 months. Chegg, the textbook company, just had their stock drop by 50% because students are using ChatGPT to do their homework.

We will not go back. LLMs are the real deal.

Xiao_Xi · May 9, 2023

LLM models could face a monetization problem.

Google “We Have No Moat, And Neither Does OpenAI”

Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI The text below is a very recent leaked document, which was shared by an anonymous individual on a public Disc…

www.semianalysis.com

dmccloud · May 9, 2023

senttoschool said:
The reason ChatGPT has to run in the cloud is that consumer chip companies have not focused on building affordable AI hardware for the masses. There wasn't a huge demand until now. It all changes now because AMD, Intel, Nvidia, Apple, Qualcomm, and countless AI chip startups will focus on making NPUs far bigger and more efficient. Previously, the focused on the CPU and GPU while NPUs were an afterthought. NPUs are really the new brains of computers - not CPUs.

I'm betting that due to privacy concerns, cost of use, latency, and better NPUs, LLMs will move to local hardware in the future.

But it isn't just LLMs. Even your normal apps like Photoshop and Excel are getting LLM and generative AI features. The need for RAM will finally increase after over a decade of stagnation.

I disagree. Look up tools like AutoGPT. People are building some incredible things with LLMs after only 5 months. Chegg, the textbook company, just had their stock drop by 50% because students are using ChatGPT to do their homework.

We will not go back. LLMs are the real deal.

Consumer hardware hasn't and likely won't improve to the point where a full GPT-based model can be run locally for the vast majority of the PC/Mac user base, at least not any time in the near future. When you factor in higher demand for RAM as a result of the drive towards AI leading to even higher prices for DDR5 (possibly even DDR4 as a spillover effect), then a lot of the prospective market for AI would be priced out of the market. That would actually increase the cost of use rather than lowering it.

As far as Chegg goes, there's two reasons students are turning towards ChatGPT (and often getting caught in the process): laziness in terms of not wanting to do any work themselves and Chegg's price structure itself.

Kazgarth · May 9, 2023

Just because ChatGPT is popular doesn't mean every average user needs to run it locally.

It's doesn't cost anything but few bytes of internet bandwidth to get the same output from a data center with 1000x the speed and RAM of your supposed Mac machine.

mi7chy · May 9, 2023

Doesn't make economical sense either. With fewer unit sales they need to maximize per unit profit so equipping with higher base RAM and storage does the opposite of reducing per unit profit.

purplerainpurplerain · May 9, 2023

if you need to run local language models or AI enabled features you don’t need something the size of chatgpt. It’s overly large just because it’s a data hungry brute force way for them to get attention and is built on super large amounts of debt.

There are much leaner models that you can run local if you tailor them for your specific area or tasks.

As always you need to be super careful when using cloud based ones because their data policies are dangerous and even their terms and conditions try as best as they can to sound polite about it. A lot of businesses will be harmed by the simple fact their data will be shared with third parties who are unknown to them or difficult to take action against.

You will not be able to delete data from many of these kind of services. OpenAI and Midjourney won’t even let users delete images even the most garbage ones. Just mountains of data growing and growing.

A lot of large businesses will simply not be able to use them because of these legal issues. Samsung already had a bad data leak and there will be much worse examples to come.

This page here sounds like a mine field. It is one of the worst privacy policy pages I have read.

Auto-GPT - The next evolution of data driven Chat AI

Auto-GPT goes beyond traditional language generation tools by incorporating data from multiple sources, including news articles, scientific research papers, and social media feeds. This allows the system to generate text that is not only accurate but also reflects the latest trends and...

auto-gpt.ai

ahurst · May 10, 2023

Joe Dohn said:
If you run a scaled down version, e.g, Alpaca 7b, the requirements decrease dramatically. Yes, their responses tend to be of inferior quality, but if the models are specialized, that can absolutely be countered (GPT-Bio, which is better than your average human expert in interpreting biomedical papers, is a fork of GPT-2 trained on high quality data).

At risk of getting off-topic, I found that claim about BioGPT pretty astonishing as someone who works in the sciences, so I decided to look it up myself.

From what I can tell, the main evidence the researchers have that it exceeds the abilities of the average human expert is that it did ~3% better on a specific series of yes/no questions than a *single expert* from a 2019 paper. I also found a piece on BioGPT that paints a somewhat… different picture of its abilities than Microsoft’s press release. Some choice quotes:

Asked about the average number of ghosts haunting an American hospital, for example, it cited nonexistent data from the American Hospital Association that it said showed the "average number of ghosts per hospital was 1.4." Asked how ghosts affect the length of hospitalization, the AI replied that patients "who see the ghosts of their relatives have worse outcomes while those who see unrelated ghosts do not."

Asked about the topic, it replied that "vaccines are one of the possible causes of autism." (However, it hedged in a head-scratching caveat, "I am not advocating for or against the use of vaccines.")

So, uh, yeah. Not quite at the point where I’d want to use anything like it for actual research (or medical decisions). Reminds me a lot of how Meta’s scientific-journal-trained Galactica model imploded under scrutiny after only 3 days online for similar reasons (people quickly got it to write summaries on the health benefits of eating crushed glass, among other things).

anshuvorty · May 10, 2023

senttoschool said:
Background assumptions:

For over a decade, RAM requirements did not increase as they did before. The demand for RAM on laptops plateaued because most common applications can be comfortably used on an 8GB or 16GB machine.

Apple relies on the 8/265 to 16/512 $400 upgrade in order to have the profit margins they normally have for their products. Without this upgrade, Apple's Mac business would be much less lucrative.

The 8/256 base is by design because it's just enough for a comfortable experience but will require a $400 upgrade if you want to do anything beyond the basics. The requirements for a "comfortable experience" must raise in order for Apple to want to raise the base specs.

If Apple sets the base at 16/512, then they need to replace those who would have upgraded from 8/265 to 16/512, to now upgrade from 16/512 to 32/512 or 16/512 to 16/2TB.

Enter LLMs (large language models such as ChatGPT). These are new kinds of applications that require a significant increase in high-bandwidth RAM. LLMs such as LLaMA, Vicuna, etc. require approximately 30GB of RAM to run well for a 7 billion model. There are versions with 7, 13, 33, 65 billion parameters.

If the future is such that everyone will be running an LLM like ChatGPT on local hardware, then RAM demand will increase drastically. And I mean drastically.

An 8GB Macbook Air is simply not enough to run any decent LLM. The minimum may be as high as 32GB of RAM. Ideally, you're using 128GB+ of RAM in order to run better and larger LLMs.

Hence, if Apple raises the base to 16/512, then there will still be plenty of demand for upgrades to 32/1TB and beyond, preserving their profit margins.

By next year, I predict that threads here will go from "Is 8GB enough for my use case?" to "Is 64GB enough to run Vicuna 33b parameter model?".

Note: LLMs require high bandwidth RAM such as those found on a GPU. However, because Apple has a unified memory model, all system RAM already has high bandwidth. For PCs, they will have to drastically increase VRAM on a GPU, not system RAM. For Apple, they just need to increase system RAM.

Your whole argument falls apart when you consider that you can run LLM in the cloud in AWS or Azure or Google Cloud...

TechnoMonk · May 10, 2023

What LLM runs with 16 GB unified memory? I can barely get decent results and performance with 64 GB. LLM running on 16 Gb with decent accuracy and performance is years away.

senttoschool · May 11, 2023

anshuvorty said:
Your whole argument falls apart when you consider that you can run LLM in the cloud in AWS or Azure or Google Cloud...

You can run any application in the cloud. You can even run browsers on the cloud. So what?

senttoschool · May 11, 2023

TechnoMonk said:
What LLM runs with 16 GB unified memory? I can barely get decent results and performance with 64 GB. LLM running on 16 Gb with decent accuracy and performance is years away.

Exactly! Hence, memory requirements could drastically increase with the advent of AI applications.

senttoschool · May 11, 2023

Kazgarth said:
Just because ChatGPT is popular doesn't mean every average user needs to run it locally.

It's doesn't cost anything but few bytes of internet bandwidth to get the same output from a data center with 1000x the speed and RAM of your supposed Mac machine.

Progress is being made to drastically lower the requirements of LLMs so that you don't need a data center to run it. In addition, progress is being made on consumer AI hardware (Apple Silicon, for example).

These two forces will converge.

But like others have said, you don't need to run GPT4. There will be plenty of other AI applications as well that require a drastic step up in RAM.

Xiao_Xi · May 11, 2023

Market competition, not the ability to run LLM locally, will force Apple to upgrade entry-level Macs to 16/512. However, the ability to run LLM will help Apple sell higher spec Macs, just as gaming and 3D rendering help sell premium GPUs.

TechnoMonk · May 11, 2023

senttoschool said:
Exactly! Hence, memory requirements could drastically increase with the advent of AI applications.

You don’t need to have Ram or use a full blown model. I can see run time compact vector Databases as future or consumer/Device AI. Just vector tokenize subsets of information/books, package in a vector database with app. Open AI Vectorization is much better than ChatGPT if you get past the hype.
Microsoft and Google will milk it in the cloud, rather than giving the users offline device level processing.

senttoschool · May 11, 2023

TechnoMonk said:
Microsoft and Google will milk it in the cloud, rather than giving the users offline device level processing.

I can see Apple's model moving AI to local for latency, privacy, and because Apple has a killer SoC that they can improve on for AI.

floral · May 11, 2023

I don't use LLMs on device, I usually just look one up on the web and use it when or if I need to...

anshuvorty · May 11, 2023

senttoschool said:
You can run any application in the cloud. You can even run browsers on the cloud. So what?

In my opinion, LLMs demand significant RAM and GPU resources, making it more logical to operate them on the cloud rather than on the device. Therefore, it is unlikely that Apple will feel compelled to offer the 16/512 configuration as the standard configuration for their devices.

LLMs: Why Apple may finally be able to upgrade base Macs to 16/512

macrumors 68030

macrumors 601

Suspended

macrumors 68040

macrumors 6502

macrumors 6502

macrumors 6502a

macrumors 6502a

macrumors 68030

macrumors 68000

macrumors 68040

macrumors 6502

Suspended

Suspended

macrumors 6502

macrumors 68040

macrumors 68040

macrumors 68030

macrumors 68030

macrumors 68030

macrumors 68000

macrumors 68040

macrumors 68030

macrumors 65816

macrumors 68040

Our Staff