Unity™ — Proximal Cloud Hardware Platform

Built for population scale

One platform.
A 50× trajectory.

Unity is engineered to bend the cost-and-performance curve of enterprise AI — delivering more tokens, more parameters and more reach per rack, every year through 2030.

Platform improvement

50×

Compounding gains across silicon, memory and fabric by 2030.

Total cost

3×

Lower cost per token than today's single-vendor rack-scale stacks.

Performance

15×

More raw performance for the workloads enterprises actually run.

The enterprise question

Real decisions span all of your data.

Fast, private, on-prem. The answers that move a business don't live in one database — they live across every modality at once.

How do the latest tariffs or steel prices affect my quarterly results, my ability to deliver products, my revenue and my cost?

— paraphrasing Larry Ellison on the enterprise AI workload

Enterprise truth lives across four data modalities

Relational SQL

Transactions, financials, ERP.

Vector DB

Semantic search, embeddings.

Graph DB

Relationships, supply chain.

LLM Inference

Reasoning, synthesis.

Answering these questions requires running all four — together, securely, at scale. Today's infrastructure can't.

Today's leading approach

Rack-scale, single-vendor compute.

The current state of the art orchestrates three specialized rack types across separate fabrics — powerful, but heavy, costly and far from sovereign.

GPU Rack · PREFILL / ATTENTION

Vera Rubin NVL72

NVLink spine

72 high-bandwidth GPUs co-located for KV-cache-heavy work.

Inference Rack · DECODE / FFN

Rubin CPX

Direct chip-to-chip spine

Token-by-token decode at scale, fed by interim activations.

CPU Rack · ORCHESTRATION

Vera CPU

Spectrum-X Ethernet

Arm-based CPUs running the control plane and data plumbing.

Orchestrated by Nvidia Dynamo · KV-aware routing (ATTN ⇄ FFN)

Powerful — but multi-rack, GPU-heavy, single-vendor, and far from sovereign or affordable for the rest of the world.

The directional shift

From racks of silos
to memory-centric nodes.

CPU, GPU and xPU stop living in separate buildings — and start living next to memory, each specialized for the part of the workload it's best at.

Yesterday

Rack-scale silos

GPU RACK

XPU RACK

CPU RACK

3 racks, 3 fabrics, 1 vendor. Compute travels far to reach memory.

→

Tomorrow

Memory-centric integrated node

HBM · 128–512 GB

CPU

x86

Agentic Compute

GPU

Tensor

LLM · Prefill

xPU

Custom

Decode

One-hop memory fabric

DDR · SRAM · HBF

One node. All compute. Memory at the center.

KV-aware routing happens inside the node — not across three buildings of fabric.

The Unity™ solution

Two views.
One node.

The same memory-centric node, seen two ways. As an enterprise workload — four data modalities running side-by-side. And as a runtime — every LLM execution function sitting one hop from HBM.

View 01 · Workload

Four modalities, one node.

Unity™ Node · Compute.AI

HBM · 128 – 512 GB

MEMORY AT THE CENTER

LLM Inference

GPU

Relational SQL

CPU

Compute.AI

xPU

x86 · 24–128 cores
orchestrator

Vector DB

CPU / GPU

Graph DB

CPU

↕ 800G

4× 400G / 800G ETHERNET · UEC

View 02 · Execution

Every function, one hop.

Unity™ Node · One-hop runtime

HBM · 128 – 512 GB

MEMORY AT THE CENTER

▴ One hop ▴

▲

KV-cache

Attention

▲

LLM

Tensor

▲

Agents

x86

▲

Prefill

GPU

▲

Decode

xPU

▲

SQL

CPU

↕ 800G

4× 400G / 800G ETHERNET · UEC

Memory at the center

Every workload, every function reaches HBM in one hop — no rack-crossing penalties, no fabric tax.

Function-aware silicon

CPU runs agents and SQL. GPU runs LLM and prefill. xPU runs custom decode. Each function on the right engine.

Open, heterogeneous compute

x86 + GPU + xPU coexist on a shared memory plane. No single-vendor lock-in.

Standard 2U, standard Ethernet

4× 400G→800G uplinks over the Ultra Ethernet Consortium standard. Drops into any datacenter.
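The function-aware placement described above can be sketched as a simple lookup table. This is an illustrative sketch only: the `Engine` enum, the `PLACEMENT` table and the `engine_for` helper are hypothetical names, not part of any published Unity interface. Only the function-to-engine pairs themselves come from the text.

```python
from enum import Enum


class Engine(Enum):
    """The three engines named on the page (labels are illustrative)."""
    CPU = "x86 CPU"
    GPU = "GPU"
    XPU = "xPU"


# Placement as stated in the text: CPU runs agents and SQL,
# GPU runs LLM and prefill, xPU runs custom decode.
# The keys are hypothetical labels chosen for this sketch.
PLACEMENT = {
    "agents": Engine.CPU,
    "sql": Engine.CPU,
    "llm": Engine.GPU,
    "prefill": Engine.GPU,
    "decode": Engine.XPU,
}


def engine_for(function: str) -> Engine:
    """Return the engine a workload function is placed on."""
    try:
        return PLACEMENT[function.lower()]
    except KeyError:
        raise ValueError(f"no placement defined for {function!r}") from None


if __name__ == "__main__":
    for fn in PLACEMENT:
        print(f"{fn:8s} -> {engine_for(fn).value}")
```

A static table is the simplest possible reading of "each function on the right engine"; a real scheduler would presumably also weigh load and memory locality, which the page does not describe.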

Meet the appliance

Compute.AI.
The Unity™ node, in a box.

A standard rack appliance with everything the platform needs — CPU, GPU, xPU, HBM and HBF — built around a one-hop memory plane.

Front

Signature red status line. Hot-swap cooling intakes. Tool-free rails.

3/4 perspective

Standard 2U / 3U form factor. Drops into any 19" rack.

Top

Full-width airflow. Optimized for high-density rack deployment.

Form factor

2U / 3U

Standard 19" rack

Power

1 kW

per node, nominal

Cooling

Air

Front-to-back

Network

4× 800G

UEC Ethernet

Platform roadmap — 50×

Three generations.
Costs cut by 3×. Performance up by 15×.

Result: a platform 50× better than today — with 10× more capacity. Memory falls in cost, the bandwidth fabric grows, and every layer of the node steps up together.

3× lower cost × 15× more performance ≈ 50× platform improvement @ 10× more capacity
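As a quick check of the headline arithmetic, the two stated factors can be multiplied directly. The sketch below assumes the platform figure is simply the product of the cost and performance factors, which is how the line above presents it; the function name is hypothetical.

```python
def combined_gain(cost_reduction: float, performance_gain: float) -> float:
    """Performance-per-cost gain, assuming the two factors multiply."""
    return cost_reduction * performance_gain


if __name__ == "__main__":
    # 3x lower cost and 15x more performance, as stated in the roadmap.
    gain = combined_gain(3, 15)
    print(f"combined gain: {gain}x")  # prints "combined gain: 45x"
```

The product is 45×, so the 50× headline reads as a rounded figure rather than an exact one.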

GEN 1 · 2026

128 GB HBM

24 x86

220 Tensor

400 GbE

MEMORY

HBM · 128 GB · 1 TB/s

DDR5 · 512 GB · 64 GB/s/ch

Memory cost $400K

GEN 2 · 2028

256 GB HBM

64 x86

512 Tensor

800 GbE

MEMORY

HBM · 256 GB · 2 TB/s

DDR6 · 512 GB · 96 GB/s/ch

HBF-1 · 1 TB · 128 GB/s

Memory cost $300K

GEN 3 · 2030

512 GB HBM

96 x86

1024 Tensor

1600 GbE

MEMORY

HBM · 512 GB · 4 TB/s

DDR6 · 1 TB · 128 GB/s/ch

HBF-2 · 4 TB · 256 GB/s/card

Memory cost $200K
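The three generation cards above can be captured as a small data structure to compare the first and last generation component by component. This is a sketch under stated assumptions: the `Generation` class and its field names are hypothetical, and the page does not say what units the "x86" and "Tensor" counts are in, so they are carried as plain numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    """One roadmap card; field names are illustrative, values from the page."""
    name: str
    year: int
    hbm_gb: int
    hbm_tb_per_s: int
    x86: int              # unit not stated on the page
    tensor: int           # unit not stated on the page
    ethernet_gbe: int
    memory_cost_kusd: int


ROADMAP = [
    Generation("Gen 1", 2026, 128, 1, 24, 220, 400, 400),
    Generation("Gen 2", 2028, 256, 2, 64, 512, 800, 300),
    Generation("Gen 3", 2030, 512, 4, 96, 1024, 1600, 200),
]


def ratios(first: Generation, last: Generation) -> dict:
    """Per-component improvement factors between two generations."""
    return {
        "hbm_capacity": last.hbm_gb / first.hbm_gb,
        "hbm_bandwidth": last.hbm_tb_per_s / first.hbm_tb_per_s,
        "x86": last.x86 / first.x86,
        "tensor": last.tensor / first.tensor,
        "ethernet": last.ethernet_gbe / first.ethernet_gbe,
        # Cost falls, so the factor is first / last.
        "memory_cost_reduction": first.memory_cost_kusd / last.memory_cost_kusd,
    }


if __name__ == "__main__":
    for component, factor in ratios(ROADMAP[0], ROADMAP[-1]).items():
        print(f"{component:22s} {factor:.2f}x")
```

These per-component ratios describe only the listed card values. The page does not show how they combine into the headline 3×, 15× and 10× figures, so the sketch does not attempt to derive them.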

UNITY™ HARDWARE PLATFORM · 50× ROADMAP

Explore in 3D

Pull the appliance apart.


Front bezel

Red status line · vents

x86 CPU

Agentic Compute · 24–128 cores

GPU

LLM · Tensor · Prefill

xPU

Custom · Decode

HBM memory

128 – 512 GB · at the center

HBF cards

4 / 8 / 16 configurable

800G UEC ports

4× uplinks · 3.2 Tbit/s per node

Inside the chassis

One chassis.
Three engines. N cards.

Pop the lid and the same one-hop architecture is laid out in silicon: an x86 CPU for agentic compute, a GPU for tensor work, and an xPU with a stack of HBF cards sitting next to high-bandwidth memory.

Compute.AI · internal subsystem

01

x86 CPU

Agentic Compute

24 – 128 cores. Runs the orchestration, agents and control plane.

02

GPU

LLM · Tensor · Prefill

High-bandwidth tensor engine for prefill, attention and LLM inference.

03

xPU + HBF cards

Custom · Decode

Configurable stack of accelerator + High-Bandwidth Flash cards, sitting directly on the memory plane.

4

HBF cards

Entry — for compact inference workloads.

8

HBF cards

Standard — most enterprise deployments.

16

HBF cards

Max — large-model, long-context inference.

The platform, exploded

Every piece, in one frame.

A blowup of every component in the Compute.AI appliance — the silicon, the memory, the fabric and the network — stacked the way they actually sit.

Front bezel

Compute.AI · status line

Layer 01

Bezel & chassis frame

2U / 3U standard form, hot-swap intakes, tool-free rails.

x86 CPU board

Agentic compute · 24 – 128 cores

Layer 02

CPU · Agents & orchestration

The agentic compute plane — orchestrators, control logic, SQL and graph workloads.

GPU module

LLM · Tensor · Prefill

Layer 03

GPU · LLM & prefill

Tensor cores for prefill, attention, vector and LLM inference workloads.

xPU module

Custom · Decode

Layer 04

xPU · Decode engine

Custom silicon for token-by-token decode — the cheapest, most power-efficient path to high throughput.

HBM memory

128 – 512 GB · memory at the center

Layer 05

HBM · memory plane

One hop from every engine. Every workload, every function, lives next to HBM.

HBF card stack

4 / 8 / 16 cards · High-Bandwidth Flash

Layer 06

HBF · tiered memory

Configurable High-Bandwidth Flash cards extend the memory plane for long-context, large-model inference.

Network & backplane

4× 800G · UEC Ethernet

Layer 07

Backplane & uplinks

4× 400G→800G Ultra Ethernet uplinks. Standard, open, datacenter-ready.

Compute.AI · the Unity™ hardware platform, in seven layers

Scale · open interconnect

One rack.
Trillion-parameter scale.

An open 800G Ultra Ethernet (UEC) fabric stitches Compute.AI appliances into a single, coherent inference engine — multi-trillion-parameter models, sovereign and ready to deploy, in a single rack.

Compute.AI rack with Arista 800G UEC top-of-rack switch

800G UEC switch

3.2 Tbit/s per node

48 Compute.AI nodes

Open 800G UEC fabric.
One rack. Many models.

No proprietary spine. No vendor lock-in. Standard 800G Ultra Ethernet wires every appliance to every other — and to the top-of-rack switch — at line rate.

Ultra Ethernet Consortium · open standard

800G per port

Up to 3.2 Tbit/sec per node across 4 uplinks.

Open standard

Multi-vendor switches, optics and NICs — no single-vendor spine.

RDMA + lossless

Modern transport for KV-cache exchange and tensor parallelism.

Rack-scale models

3.3T parameters served from a single rack — no spine fabric needed.
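The per-node bandwidth figure follows directly from the port count and port speed. A minimal sketch, with a hypothetical function name, that also covers the 400G configuration mentioned elsewhere on the page:

```python
def node_uplink_tbit_s(ports: int = 4, gbit_per_port: int = 800) -> float:
    """Aggregate uplink bandwidth per node in Tbit/s (raw line rate)."""
    return ports * gbit_per_port / 1000


if __name__ == "__main__":
    print(node_uplink_tbit_s())         # 4 x 800G
    print(node_uplink_tbit_s(4, 400))   # 4 x 400G
```

This is raw line rate only; usable throughput after protocol overhead would be lower and is not stated on the page.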

48

nodes / rack

3.3T

parameters

2.5M

tokens / sec

48 kW

total power
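The rack totals above can be broken down per node and per kilowatt. The sketch below assumes the load is spread evenly across all 48 nodes, which the page does not state, and all names are hypothetical.

```python
NODES_PER_RACK = 48
NODE_POWER_KW = 1.0            # "1 kW per node, nominal"
RACK_PARAMETERS = 3.3e12       # 3.3T parameters per rack
RACK_TOKENS_PER_S = 2.5e6      # 2.5M tokens/sec per rack


def rack_summary() -> dict:
    """Derived per-node figures, assuming an even split across nodes."""
    rack_power_kw = NODES_PER_RACK * NODE_POWER_KW
    return {
        "rack_power_kw": rack_power_kw,
        "parameters_per_node": RACK_PARAMETERS / NODES_PER_RACK,
        "tokens_per_s_per_node": RACK_TOKENS_PER_S / NODES_PER_RACK,
        "tokens_per_s_per_kw": RACK_TOKENS_PER_S / rack_power_kw,
    }


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key:24s} {value:,.2f}")
```

Under that even-split assumption, 48 nodes at 1 kW nominal reproduce the 48 kW rack figure, and each node would hold roughly 69 billion parameters and serve roughly 52 thousand tokens per second.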

Unity™ · Hardware Platform

Four principles. One node.

Memory-centric

HBM at the core. One-hop memory fabric across the node.

Heterogeneous

x86 + GPU + xPU under one roof, each on the right workload.

Open

Standard 2U, standard 800G UEC Ethernet, no lock-in.

Sovereign

Made in India.

2.5M

tokens / sec per rack

3.3T

parameters per rack

50×

platform improvement by 2030

Tech specs

Unity™ Node at a glance.

Form factor

Standard 2U / 3U rack node — drops into any datacenter

Memory

128 – 512 GB HBM · DDR · SRAM · HBF, memory at the center

CPU

x86 · 24–128 cores · agentic compute, orchestration & control plane

GPU

Tensor · LLM, prefill and attention workloads

xPU

Custom silicon · token-by-token decode engine

HBF cards

4 / 8 / 16 High-Bandwidth Flash cards — configurable per workload

Memory fabric

One-hop · every engine sits next to HBM

Networking

4× 400G / 800G UEC Ethernet · up to 3.2 Tbit/s per node

Workloads

LLM, KV-cache, agents, prefill, decode — all one hop from memory

Per rack

48 nodes · 48 kW · 3.3T parameters · 2.5M tokens/sec

Origin

Made in India · sovereign by design


The dot in dot.ai: built in India, for population scale.
