Inference Economics
Articles (5)

DeepSeek V4-Pro on Ascend 950PR: The Two-Stack AI Reality
DeepSeek V4-Pro runs on Huawei's Ascend 950PR as the State Department pivots export controls from chip access to model IP. Together, the two developments describe two parallel AI stacks.

UCCL-EP vs. NCCL EP: Portability or Consolidation for MoE Communication?
Two new expert-parallel efforts point to different futures for MoE systems: one built for heterogeneous fleets, the other folded into NVIDIA’s stack.

AI Costs Are Falling 1,000x. It Is Not Enough.
AI inference costs have fallen 1,000x, yet agentic workloads still cost hundreds daily. Anthropic's decision to block OpenClaw from subscriptions proves that consumer pricing cannot absorb real infrastructure economics.

FlatAttention Claims 4× Speedup Over FlashAttention-3 — But on What Hardware?
FlatAttention claims a 4× speedup over FlashAttention-3 on unnamed tile-based accelerators. No code, no hardware vendor, no deployment path yet.

d-Matrix Acquires GigaIO's Data Center Business, Betting That Inference Is a Systems Problem
d-Matrix acquired GigaIO's data center business, gaining FabreX PCIe memory fabric and SuperNODE rack-scale technology to build a vertically integrated AI inference platform around its Corsair accelerator.