Windows š„ļø, Android š±... Llama š¦
With Llama 3.1, Meta takes another step towards becoming the āWindows of AI".
Zuckerberg's (AKA - the Zuck's) operating thesis for the AI space has always included a āopen vs. closed" system mental model. Think Microsoft/Windows and Google/Android vs. Appleās Mac/iPhone (both powered by Linux, a huge inspiration for Zuck's approach here).
Open vs Closed Platforms
Open Platforms
On one side, a compute platform thatās open, customizable, high interoperability between devices and providers. Open platform ecosystems see a large diversity of products and providers. Theyāre also typically cheaper than their closed platform counterparts. Open platforms also tend to (though not exclusively) lean more enterprise-friendly (outside of tech - most of corporate America uses Windows at work).
Closed Platforms
The closed model, in contrast, would be companies creating āwalled gardenā type ecosystems. A suite of products that all work well together, but donāt let the user plug and play with different providers if they wanted. These companies are typically highly vertically integrated, self-serving many functions that open platforms would lean on the ecosystem for. Closed platforms have been successful at driving premium-feeling consumer products. While not especially customizable, this usually affords greater security and performance.
For example, consider Google's Android (open) vs. Apple's Mac/iOS (closed). Itās far easier for a developer to publish a mobile app to the Play store, than on the iOS app store. The Android ecosystem is much more diverse and interoperable than Appleās (in terms of companies and devices I can use with Android). And Androids are far more customizable than iPhones.
Appleās closed model also has advantages. The App store is safer and more mature than the Play store - an iPhone user can safely explore and download new apps with reckless abandon in a way Android users simply cannot. Apple products work better together than Google products do, despite being less widely interoperable. And while iPhones arenāt (historically) customizable, they are generally more consumer-friendly, secure and durable.
This isnāt to say one model is better than another. Typically both closed and open systems have a part to play as any new computing platform enters the market.
Meta is angling to be the "open platform" of AI.
Zuck has bet on Meta being the āopen platformā of the emerging AI space. The open, interoperable, enterprise-friendly platform. Llama 3.1 shows Meta doubling down on this strategy in 3 ways -
1. Can be run locally
A major blocker for enterprises adopting LLMs thus far, has been the requirement to lose control of your data to another company. Previously, even simple GPT calls requires providing OpenAI valuable user-data in the prompt context. And finetuning a private model for your own purposes requires giving up huge parts of your data to both the foundational model, and whatever service youāre using to finetune. Meta faces an additional headwind here, given their previous track record around using user data and respecting privacy.
Meta made Llama 3.1 available to be downloaded and run locally. This lets enterprises run this powerful foundational model on their own premises, eliminating any concerns about data moving between companies. This means finetuning and distilling Lama 3.1 can also be done locally, and arguably for free (if enterprises have the compute ready).
Releasing their biggest model (405B) also available to download and use locally, is particulalry telling here. 405B is over 231 GB - impossible for any hacker to play with locally. Evidence - here's (opens in a new tab) 405B running on an NVIDIA 4090, taking 30 minutes to produce the word "the".
Meta is aiming squarely at enterprises here - large companies are the only ones with the infrastructure to run this model. It also makes Llama3.1 an extremely formidable from a cost perspective as well. Instead of paying another company per inference for access to a closed model, why not download an equally strong model for free and run it yourself?
2. Distributing via inference as a service platforms
Meta also made the decision to not host Llama 3.1 themselves, and instead lean on the existing inference as a service players for distribution. Unlike OpenAIās GPTs, which can be called from OpenAIās APIs, Llama3.1 is only available via platforms like Groq (opens in a new tab), togetherAI (opens in a new tab) and AWS bedrock (opens in a new tab).
Signals like this help dissuade concerns around vendor lock in. It also affirms Metaās commitment to focusing on developing foundational models, and relying on the ecosystem for distribution and powering inference.
On a practical note, this also helps small-mid companies that may not have the compute to run Llama 3.1 locally, but are still concerned about data privacy. Inference as a Service platforms like groq have committed to providing inference while retaining 0 data.
3. Easily customizable
Llama 3.1 is easier and cheaper to customize for specific needs, than any similarly sized foundational model. Because Llama is available to download and the model weights are open, it can easily be finetuned on your own data. And in contrast with the āfinetune as a serviceā vendors, developers have a lot of control over how they want to finetune Llama 3.1 (how much GPU to provide for finetuning, how long youād like it to take, how much it costs etcā¦). Dell (opens in a new tab) also noted that Llama 3.1 is particularly suited for synthetic data generation, effectively letting it assist in itās own finetuning
Meta also released 3 versions of Llama 3.1 of varying sizes - and each is publicly available to be distilled, quantized or further finetuned. Assuming teams have the GPU, this affords a ton of flexibility over model size and performance against specific tasks.
Itās clear Meta wants you to use Llama 3.1 as a starter model to be customzied for your own needs, because theyāve publicly released cookbooks for finetuning (opens in a new tab) and quantizing (opens in a new tab) Lama 3.1
What's Meta's long term play here?
Meta's approach does feel unique compared with its peers. Meta has no hand in the compute, inference or "LLMOps" side of things. They seem to be focused on producing useful models to the open source commnity. For example, last year they release LlamaGuard, a small language model finetuned to evaluate conversations between humans and agents. This is a small language model, finetuned from Llama2, and then released to the OSS community. This makes me wonder how (or if?) Meta plans on ever 'cashing in' or monetizing on its AI work.Amazon, Google and Apple have all made steps towards more obviously monetizable aspects of AI - compute, inference, evaluation, distribution etc... Meta, in contrast, is clearly forgoing all of these opportunities (for now), in favor of focusing only on releasing models (foundational models such as Llama 3.1, or specialized smaller models like LlamaGuard.)
Perhaps Meta believes the future of AI is multiple models of varying sizes; some large general purpose models, alongside more specialized smaller models. Maybe they believe that more of these models being by, or at least from, Meta can only be a good thing.
Or perhaps Meta is learning from its experience with React and React Native. The open source community helped make React/React Native better, which in turn benefitted Meta's own products. It also helped widen their talent funnel (more React devs to hire), and position their brand as "thought leaders" of the ecosystem.
Very grateful for all the OSS contributions Meta will make to the space while they figure this out!
Ā© lytix