
MacRumors

macrumors bot
Original poster
Apr 12, 2001


Apple researchers have developed a new method for training large language models (LLMs) that seamlessly integrates both text and visual information.


The company's findings, detailed in a research paper titled "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training," showcase a new approach to creating more intelligent and flexible AI systems. By utilizing a diverse dataset comprising image-caption pairs, interleaved image-text documents, and text-only data, Apple claims that the MM1 model sets a new standard in AI's ability to perform tasks such as image captioning, visual question answering, and natural language inference with a high level of accuracy.

Apple's research focuses on the combination of different types of training data and model architectures, which enables the AI to understand and generate language based on a mix of visual and linguistic cues. This capability is vital for tasks that require a nuanced comprehension of the world, such as interpreting complex images or answering questions that involve visual elements.

The paper also highlights the MM1 model's exceptional in-context learning abilities, particularly in the largest 30 billion parameter configuration of the model. This version apparently exhibits remarkable capabilities for multi-step reasoning over multiple images using few-shot "chain-of-thought" prompting, a technique that allows the AI to perform complex, open-ended problem solving based on minimal examples.
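To make the few-shot "chain-of-thought" idea concrete, here is a minimal sketch of how such a prompt is typically assembled (this is an illustrative text-only example, not Apple's actual MM1 prompts): each worked example includes its intermediate reasoning, which nudges the model to emit reasoning steps of its own before answering.

```python
# Illustrative few-shot chain-of-thought prompt construction (sketch only,
# not Apple's actual MM1 prompt format): each shot pairs a question with
# an answer that shows the intermediate reasoning steps.
examples = [
    ("There are 3 red and 2 blue blocks in the image. How many blocks?",
     "Count red: 3. Count blue: 2. 3 + 2 = 5. Answer: 5"),
    ("A sign reads '50% off $8'. What is the sale price?",
     "Half of 8 is 4. 8 - 4 = 4. Answer: $4"),
]

def build_cot_prompt(question, shots):
    """Concatenate the worked examples, then the new question with an open answer."""
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt("There are 4 apples and 3 pears. How many fruits?", examples)
print(prompt)
```

The model completes the trailing "A:", and because every in-context example reasoned step by step, it tends to do the same; in MM1's multimodal setting the "shots" would interleave images with the text.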

This research emerges as part of Apple's broader initiative to enhance its AI capabilities amid growing competition. Earlier today, Bloomberg's Mark Gurman reported that Apple is in discussions with Google to license Google's Gemini generative large-language models to power new features coming to the iPhone as part of iOS 18.

Article Link: Apple Publishes Details About New 'MM1' AI Model
 
Apple has recently been killing it in the deep learning space. Apple released MM1, and they have released more information than most open-source LLM companies. I have been testing MLX for some of my workflows; it's probably the fastest among the Python libraries I've tried, and it runs open-source LLM models on my iPad Pro. Gonna be interesting once it gets to iPhone and other devices. With recent updates, I can run a Falcon 180B on my M1 Max, and my Nvidia RTX 4090 GPU can only dream. I hope Apple keeps up with the releases.
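Some back-of-the-envelope arithmetic shows why a discrete GPU struggles here while machines with very large unified memory pools fare better (a rough sketch: it counts the weights alone and ignores the KV cache, activations, and runtime overhead):

```python
# Rough weight-memory estimate for running a large LLM locally
# (weights only; ignores KV cache, activations, and runtime overhead).
def weights_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

falcon_180b = 180e9  # 180 billion parameters

print(weights_gb(falcon_180b, 16))  # fp16: ~360 GB
print(weights_gb(falcon_180b, 4))   # 4-bit quantized: ~90 GB

# An RTX 4090 has 24 GB of VRAM, so even the 4-bit weights cannot fit on
# the GPU alone; a Mac's unified memory lets the GPU address the same
# large pool of RAM as the CPU.
print(weights_gb(falcon_180b, 4) > 24)  # True
```

Even heavily quantized, the weights dwarf any consumer GPU's VRAM, which is exactly the gap unified memory closes.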
 
I won’t be surprised to hear Google is going to pay Apple to use Gemini. Obvious opportunity to deal a body blow to OpenAI.
Gemini is trash. I have been trying Gemini Pro free for two months, which also comes with a Google One subscription. GPT-3.5 is leaps and bounds better, and GPT-4 is in another universe compared to Gemini Pro. Anthropic is pretty decent and could be a candidate for acquisition. The bar is so low on Gemini that Apple can easily make something better.
 
Gemini is trash. I have been trying Gemini Pro free for two months, which also comes with a Google One subscription. GPT-3.5 is leaps and bounds better, and GPT-4 is in another universe compared to Gemini Pro. Anthropic is pretty decent and could be a candidate for acquisition. The bar is so low on Gemini that Apple can easily make something better.
Just confirmed what I was thinking. Thanks.
 
Again, this is all propaganda. Apple is not going to release the key to their secret sauce until at least WWDC, and maybe not even then.

Cook needs any goodwill the media will give him. I see him running from department to department asking, "What can we release through the backdoor? I'm sinking here!!!!"
 
Again, this is all propaganda. Apple is not going to release the key to their secret sauce until at least WWDC, and maybe not even then.

Cook needs any goodwill the media will give him. I see him running from department to department asking, "What can we release through the backdoor? I'm sinking here!!!!"
We have no idea what's going on at Apple or what Apple is planning for iOS 18, and everything you wrote is hyperbole. I would take any report of new iOS 18 features with a grain of salt; the feature set would have been locked down months ago.
 
Again, this is all propaganda. Apple is not going to release the key to their secret sauce until at least WWDC, and maybe not even then.

Cook needs any goodwill the media will give him. I see him running from department to department asking, "What can we release through the backdoor? I'm sinking here!!!!"
I don’t know what secret sauce means, but the amount of info Apple released puts to shame the companies that call themselves open or open source. It was a shock to many; we just have to wait and watch how they proceed with further releases. Apple has so far been surprisingly open about their DL initiatives.
 
Everyone makes fun of Google and what it does with its data, yet nobody here blinks an eye that it could be the backbone of Apple's AI iPhone integration???

Anyone?
 
Everyone makes fun of Google and what it does with its data, yet nobody here blinks an eye that it could be the backbone of Apple's AI iPhone integration???

Anyone?
Gemini doesn’t run on devices. I would be wary if Apple sends the data to the cloud for running models. If it runs on device, and Apple doesn’t train on your data, that’s different. I can see Apple keeping models open but making them run on device (iPhone, iPad, and Mac).
 
Will be extremely funny to read all this PR from Apple about the big important ML research they are doing and then find out at WWDC they are just gonna outsource Siri to Microsoft, lmao
The former head of AI at Google has been an SVP at Apple for nearly a decade now. He, like the PA Semi acquisition, has been hard at work for his respective team.

Apple hasn't become who they are, with such a vast treasure trove of expertise and wealth, by sheer dumb luck.
 
I got an idea to see what people would like to use AI for.

So let's make a test and use emoji reactions as votes. What will you use AI for on iPhone on a regular basis the most?

👍 - text generation
❤️ - text editing/manipulation
🤣 - smart search
😲 - image/video generation
☹️ - image/video editing/manipulation
😡 - controlling your devices/home
👎 - creating automation/shortcuts


There are no more reactions, but I haven't got any other ideas for general use anyway. You can choose only one...
 
exceptional in-context learning abilities, particularly in the largest 30 billion parameter configuration of the model
On one hand, 30 billion isn't a lot compared to open-source models like Meta's Llama 2 (70 billion) or commercial models like OpenAI's GPT-4 (and its rumored 1.76 trillion parameters)...

...but then, if it's running on a phone, it doesn't need all that power; it just has to be smarter than the Siri we all know and, uh... love...? Yeah. Honestly, you could run one of the really tiny open-source 2 billion parameter models that are already publicly available, and it would still vastly outperform Siri. :p
 