Meta, the social media giant formerly known as Facebook, has been a pioneer in artificial intelligence (AI) for more than a decade, using it to power its services such as News Feed, Facebook Ads, Messenger and virtual reality. But as the demand for more advanced and scalable AI solutions grows, so does the need for more innovative and efficient AI infrastructure.
At the AI Infra @ Scale event today — a one-day virtual conference hosted by Meta’s engineering and infrastructure teams — the company announced a series of new hardware and software projects that aim to support the next generation of AI applications. The event featured speakers from Meta who shared their insights and experiences on building and deploying AI systems at large scale.
Among the announcements was a new AI data center design that will be optimized for both AI training and inference, the two main phases of developing and running AI models. The new data centers will leverage Meta’s own silicon, the Meta Training and Inference Accelerator (MTIA), a chip designed to help accelerate AI workloads across domains such as computer vision, natural language processing and recommendation systems.
Meta also revealed that it has already built the Research SuperCluster (RSC), an AI supercomputer integrating 16,000 GPUs to help train large language models (LLMs) like the LLaMA project, which Meta announced at the end of February.
“We’ve been building advanced infrastructure for AI for years now, and this work reflects long-term efforts that will enable even more advances and better use of this technology across everything we do,” Meta CEO Mark Zuckerberg said in a statement.
Building AI infrastructure is table stakes in 2023
Meta is far from the only hyperscaler or large IT vendor thinking about purpose-built AI infrastructure. In November, Microsoft and Nvidia announced a partnership for an AI supercomputer in the cloud. The system benefits (not surprisingly) from Nvidia GPUs, connected with Nvidia’s Quantum-2 InfiniBand networking technology.
A few months later in February, IBM outlined details of its AI supercomputer, codenamed Vela. IBM’s system uses x86 silicon, alongside Nvidia GPUs and Ethernet-based networking. Each node in the Vela system is packed with eight 80GB A100 GPUs. IBM’s goal is to build out new foundation models that can help serve enterprise AI needs.
Not to be outdone, Google has also jumped into the AI supercomputer race with an announcement on May 10. The Google system uses Nvidia GPUs along with custom infrastructure processing units (IPUs) to enable rapid data flow.
Meta is now also jumping into the custom silicon space with its MTIA chip. Custom-built AI inference chips are not a new thing either: Google has been building out its tensor processing units (TPUs) for several years, and Amazon has had its own AWS Inferentia chips since 2018.
For Meta, the need for AI inference spans multiple aspects of its operations for its social media sites, including news feeds, ranking, content understanding and recommendations. In a video outlining the MTIA silicon, Meta research scientist for infrastructure Amin Firoozshahian commented that traditional CPUs are not designed to handle the inference demands of the applications Meta runs, which is why the company decided to build its own custom silicon.
“MTIA is a chip that is optimized for the workloads we care about and tailored specifically for those needs,” Firoozshahian said.
Meta is also a big user of the open source PyTorch machine learning (ML) framework, which it originally created. Since 2022, PyTorch has been under the governance of the Linux Foundation’s PyTorch Foundation effort. Part of the goal with MTIA is to provide highly optimized silicon for running PyTorch workloads at Meta’s large scale.
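PyTorch’s compiler and device abstractions are the usual integration point for custom accelerators. As a minimal sketch of what “a PyTorch workload” means in this context (the toy model below and the use of torch.compile on the default CPU backend are illustrative assumptions, not Meta’s actual MTIA software stack):

```python
import torch
import torch.nn as nn

# A toy recommendation-style model standing in for a real workload.
# (Hypothetical example; not Meta's actual model or MTIA toolchain.)
class TinyRanker(nn.Module):
    def __init__(self, num_items: int = 10_000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_items, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # Look up item embeddings and produce one score per item.
        return self.score(self.embed(item_ids)).squeeze(-1)

model = TinyRanker().eval()

# torch.compile is the hook where optimized backends plug in; a custom
# accelerator would expose its own backend, while "cpu" runs anywhere.
compiled = torch.compile(model)

with torch.no_grad():
    scores = compiled(torch.randint(0, 10_000, (32,)))
print(scores.shape)  # torch.Size([32])
```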
The MTIA silicon is a 7nm (nanometer) process design and can provide up to 102.4 TOPS (trillion operations per second). MTIA is part of a highly integrated approach within Meta to optimizing AI operations, including networking, data center optimization and power utilization.
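For a rough sense of what that peak figure means, here is a back-of-envelope calculation (the one-billion-operation model cost is a hypothetical number chosen for illustration, not a Meta figure):

```python
# Back-of-envelope only: peak-rate math with a hypothetical model cost,
# ignoring memory bandwidth, utilization and batching effects.
peak_ops_per_second = 102.4e12   # 102.4 TOPS, per Meta's stated spec
ops_per_inference = 1e9          # hypothetical 1-GOP inference model

max_inferences_per_second = peak_ops_per_second / ops_per_inference
print(f"{max_inferences_per_second:,.0f} inferences/sec at peak")  # 102,400
```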
The data center of the future is built for AI
Meta has been building its own data centers for over a decade to meet the needs of its billions of users. So far it has been doing just fine, but the explosive growth in AI demands means it’s time to do more.
“Our current generation of data center designs is world class, energy- and power-efficient,” Rachel Peterson, VP for data center strategy at Meta, said during a roundtable discussion at the Infra @ Scale event. “It’s actually really supported us through multiple generations of servers, storage and network, and it’s really able to serve our current AI workloads really well.”
As AI use grows across Meta, more compute capacity will be needed. Peterson noted that Meta sees a future in which AI chips are expected to consume more than 5x the power of its typical CPU servers. That expectation has caused Meta to rethink data center cooling and to bring liquid cooling to the chips in order to deliver the right level of power efficiency. Enabling the right cooling and power for AI is the driving force behind Meta’s new data center designs.
“As we look towards the future, it’s always been about planning for the future of AI hardware and systems and how we can have the most performant systems in our fleet,” Peterson said.