Inference Economics

Articles (5)

Two parallel rows of AI server infrastructure diverging from a central fault line, representing frontier AI compute splitting between established and emerging hardware ecosystems.
AI analysis

DeepSeek V4-Pro on Ascend 950PR: The Two-Stack AI Reality

DeepSeek V4-Pro runs on Huawei Ascend 950PR as the State Department pivots export controls from chip access to model IP. Together, the two developments point to two parallel AI stacks.

Export Controls & Trade Policy, AI Infrastructure
SCN Staff
Illustration of token streams routing through a central AI communication layer to a small set of active compute nodes inside a larger data center, representing sparse activation and expert-parallel communication.
AI analysis

UCCL-EP vs. NCCL EP: Portability or Consolidation for MoE Communication?

Two new expert-parallel efforts point to different futures for MoE systems: one built for heterogeneous fleets, the other folded into NVIDIA’s stack.

AI Infrastructure, Inference Economics
SCN Staff
Macro view of a red lobster embedded among AI chips, cooling elements, and high-bandwidth memory, illustrating the hardware bottlenecks that keep agentic AI expensive.
AI news

AI Costs Are Falling 1,000x. It Is Not Enough.

AI inference costs have fallen 1,000x, yet agentic workloads still cost hundreds a day. Anthropic's decision to block OpenClaw from subscriptions shows that consumer pricing cannot absorb real infrastructure economics.

AI Infrastructure, Agentic AI
SCN Staff
SambaNova Systems CEO Rodrigo Liang holds the SN40L Reconfigurable Dataflow Unit (RDU), the company's fourth-generation AI inference chip. SambaNova's dataflow architecture makes it one of the most likely candidates to demonstrate whether FlatAttention's collective-primitive approach generalizes beyond the unnamed hardware tested in the April 2026 paper.
AI analysis

FlatAttention Claims 4× Speedup Over FlashAttention-3 — But on What Hardware?

FlatAttention claims 4× speedup over FlashAttention-3 on unnamed tile-based accelerators. No code, no hardware vendor, no deployment path yet.

AI Infrastructure, Inference Economics
SCN Staff
d-Matrix Corsair accelerator card and GigaIO™ SuperNODE™ rack
AI news

d-Matrix Acquires GigaIO's Data Center Business, Betting That Inference Is a Systems Problem

d-Matrix acquired GigaIO's data center business, gaining FabreX PCIe memory fabric and SuperNODE rack-scale technology to build a vertically integrated AI inference platform around its Corsair accelerator.

AI Infrastructure, Inference Economics
SCN Staff