Inference Economics
Articles (7)

Three Bets Against Nvidia's Inference Margin, One Shared Dependency
OpenAI, Qualcomm, and Etched are betting against Nvidia's inference margin. Escaping that margin means queuing for the same TSMC packaging and memory as Nvidia. Most ASIC challengers die on software, not silicon.

Thermodynamic Computing's First Silicon Is Back from the Fab. The Power Math Comes Next.
Normal Computing's CN101 is in characterization. Extropic has a prototype platform, an MIT-co-authored arXiv preprint, and an ETH Zurich hackathon in June. After two years as a manifesto, thermodynamic computing is producing the kind of artifacts readers can evaluate.

DeepSeek V4-Pro on Ascend 950PR: The Two-Stack AI Reality
DeepSeek V4-Pro runs on Huawei's Ascend 950PR while the State Department pivots export controls from chip access to model IP, pointing to two parallel AI stacks.

UCCL-EP vs. NCCL EP: Portability or Consolidation for MoE Communication?
Two new expert-parallel efforts point to different futures for MoE systems: one built for heterogeneous fleets, the other folded into NVIDIA’s stack.

AI Costs Are Falling 1,000x. It Is Not Enough.
AI inference costs have fallen 1,000x, yet agentic workloads still cost hundreds of dollars a day. Anthropic blocking OpenClaw from subscriptions shows that consumer pricing cannot absorb real infrastructure economics.

FlatAttention Claims 4× Speedup Over FlashAttention-3 — But on What Hardware?
FlatAttention claims a 4× speedup over FlashAttention-3 on unnamed tile-based accelerators. No code, no hardware vendor, no deployment path yet.

d-Matrix Acquires GigaIO's Data Center Business, Betting That Inference Is a Systems Problem
d-Matrix acquired GigaIO's data center business, gaining the FabreX PCIe memory fabric and SuperNODE rack-scale technology to build a vertically integrated AI inference platform around its Corsair accelerator.