AMD has unveiled the Instinct MI350P, a new AI accelerator designed for conventional air-cooled servers in a PCIe add-in-card format. The card carries over key capabilities from the MI350 family, notably 144 GB of HBM3E memory, within a 600W power envelope. Significantly, it is the first AMD GPU to feature a 12V-2×6 power connector, the 16-pin plug required to meet its power demands. The release targets enterprises that want to run inference, RAG, or generative AI workloads on-premises without OAM platforms, dedicated racks, or liquid cooling.
The MI350P is a full-height, dual-slot PCIe card with a passive heatsink, relying on the host server's internal airflow for cooling. AMD positions it as a drop-in option for existing infrastructure, supporting up to eight cards per server, and pitches it for small, medium, and large LLMs, enterprise inference, and RAG pipelines. Dell has already confirmed support for the MI350P in its PowerEdge XE7745 and R7725 servers starting July 2026, enabling generative and agentic AI deployments without major data center redesigns.
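As a rough illustration of what 144 GB of on-card memory means for hosting LLMs, the sketch below estimates whether a model's weights fit on a single card. The model sizes and quantization widths are hypothetical examples chosen for illustration, not AMD guidance, and real deployments also need headroom for KV cache and activations.

```python
# Back-of-the-envelope check: do a model's weights fit in the MI350P's
# 144 GB of HBM3E? Weight footprint ~= parameter count x bytes per
# parameter. This ignores KV cache and activation memory.

HBM_GB = 144  # MI350P memory capacity


def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


# Hypothetical example models and precisions, for illustration only.
for name, params, bpp in [
    ("70B @ FP16", 70, 2.0),
    ("70B @ FP8", 70, 1.0),
    ("70B @ MXFP4", 70, 0.5),
    ("405B @ MXFP4", 405, 0.5),
]:
    gb = weights_gb(params, bpp)
    verdict = "fits" if gb < HBM_GB else "needs multiple cards"
    print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} in {HBM_GB} GB")
```

By this crude measure, a 70B-parameter model fits on one card at FP16 only barely, comfortably at FP8 or MXFP4, while a 405B-parameter model needs several of the up-to-eight cards a server can host.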
AMD Instinct MI350P Specifications
The AMD Instinct MI350P is built using a combination of TSMC's 3nm and 6nm nodes. Internally, its CDNA 4 GPU provides 8,192 Stream Processors organized into 128 Compute Units, alongside 512 Matrix Cores, with a maximum clock speed of 2,200 MHz. Its standout feature is its memory: a substantial 144 GB of HBM3E on a 4,096-bit interface, delivering up to 4 TB/s of bandwidth. It also includes 128 MB of last-level cache and draws up to 600W, configurable down to 450W. As mentioned, it uses the new 12V-2×6 power connector.
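The stated bus width and bandwidth can be cross-checked with simple arithmetic. The per-pin data rate below is inferred from those two figures, not quoted by AMD:

```python
# Sanity-check the stated 4 TB/s bandwidth against the 4,096-bit bus.
# Bandwidth = (bus width in bytes) x (per-pin data rate), so the implied
# per-pin rate is an inference from AMD's two published numbers.

bus_bits = 4096               # HBM3E interface width from the spec sheet
peak_bw_gb_s = 4000           # 4 TB/s expressed in GB/s (1 TB = 1000 GB)

bus_bytes = bus_bits // 8     # 512 bytes transferred per beat
pin_rate_gbps = peak_bw_gb_s / bus_bytes  # implied Gbit/s per pin

print(f"Implied per-pin rate: {pin_rate_gbps:.2f} Gb/s")
```

The implied rate of roughly 7.8 Gb/s per pin sits within the range HBM3E stacks are known to support, so the two headline figures are mutually consistent.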
In terms of theoretical performance, AMD claims up to 4.6 PFLOPS for MXFP4 and MXFP6, 2.3 PFLOPS for MXFP8/OCP-FP8, 1.15 PFLOPS for FP16/BF16 matrix operations, 72 TFLOPS for FP32/FP16, and 36 TFLOPS for FP64. AMD's presentation slides also distinguish peak from delivered performance: MXFP4 is listed at 2,299 TFLOPS delivered versus 4,600 TFLOPS peak, and memory bandwidth at 3.6 TB/s delivered versus 4.0 TB/s peak. AMD designates these figures as preliminary estimates based on engineering projections or early measurements, so they may change.
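The gap between the two sets of figures is easy to quantify. Using only the numbers from AMD's slides:

```python
# Compare AMD's "delivered" vs "peak" figures from the presentation
# slides. Both values per metric come straight from the article; only
# the ratio is computed here.

figures = {
    "MXFP4 compute (TFLOPS)": (2299, 4600),
    "Memory bandwidth (TB/s)": (3.6, 4.0),
}

for metric, (delivered, peak) in figures.items():
    ratio = delivered / peak
    print(f"{metric}: {delivered} delivered of {peak} peak ({ratio:.0%})")
```

Delivered MXFP4 throughput works out to about 50% of peak, while delivered memory bandwidth reaches 90% of peak, a reminder that compute utilization, not bandwidth, is usually the harder number to sustain.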
The MI350P is essentially a scaled-down MI350X adapted to the PCIe form factor. AMD is not repurposing defective MI350X chips; rather, the card uses a smaller configuration of the same chiplet design: a single IOD with four XCDs, versus the MI350X's two IODs and eight XCDs. That is why the MI350P has half the Compute Units, Matrix Cores, memory, and memory bandwidth of the MI350X, an OAM module with 288 GB of HBM3E and 8 TB/s of bandwidth.
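The halved-configuration claim checks out across the headline numbers. In the sketch below, the MI350X Compute Unit count is derived by doubling the MI350P's 128 CUs, as the article implies; the memory and bandwidth figures are quoted directly:

```python
# Verify that the MI350P's headline specs track a halved MI350X
# (one IOD / four XCDs vs two IODs / eight XCDs).

mi350x = {"compute_units": 256, "hbm3e_gb": 288, "bandwidth_tb_s": 8.0}
mi350p = {"compute_units": 128, "hbm3e_gb": 144, "bandwidth_tb_s": 4.0}

for key in mi350x:
    ratio = mi350p[key] / mi350x[key]
    print(f"{key}: {mi350p[key]} vs {mi350x[key]} ({ratio:.0%})")
```

Every metric lands at exactly 50%, consistent with cutting the chiplet count in half rather than binning down a full die.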
Rivalry with NVIDIA GPUs
The AMD Instinct MI350P does not directly compete with high-end NVIDIA offerings like the B200 SXM/HGX. Instead, it addresses an underserved segment: server AI accelerators in a PCIe format with substantial HBM memory. NVIDIA's closest offering here is the RTX PRO 6000 Blackwell Server Edition, also a 600W card, but it carries 96 GB of GDDR7 rather than HBM. The NVIDIA H200 offers 141 GB of HBM3E and 4.8 TB/s of bandwidth, but it belongs to the Hopper generation and is deployed on more specialized platforms.
In essence, AMD is targeting a niche that has been overlooked in the race toward ever-denser AI racks. While OAM, SXM, UBB, and HGX platforms offer maximum performance, their power, cooling, validation, and integration requirements are prohibitive for many customers. The Instinct MI350P provides a less extreme, more modular alternative that slots into existing servers. It marks AMD's return to the PCIe Instinct format after nearly half a decade, in a segment where NVIDIA currently lacks a direct equivalent: a PCIe server GPU with HBM memory.
Furthermore, there are ample supplies of both chips and memory for this segment. The entire industry has shifted production from consumer hardware to AI data centers, recognizing the immense profitability and immediate sales of these components. It’s an opportune moment for AMD to capitalize on the demand for AI hardware.
