Inference Economics

Articles (7)

Three dark custom AI chips mounted on one glowing substrate with memory stacks beneath it, three diverging paths behind them.

Three Bets Against Nvidia's Inference Margin, One Shared Dependency

OpenAI, Qualcomm, and Etched are betting against Nvidia's inference margin. Escaping it means queuing for the same TSMC packaging and memory. Most ASIC challengers die on software, not silicon.

AI Infrastructure · NVIDIA

SCN Staff·Jul 2, 2026

Thermodynamic ASIC die on a black probe-station carrier, with copper traces and a green stochastic-path pattern.

Emerging · news

Thermodynamic Computing's First Silicon Is Back from the Fab. The Power Math Comes Next.

Normal Computing's CN101 is in characterization. Extropic has a prototype platform, an MIT-co-authored arXiv preprint, and an ETH Zurich hackathon in June. After two years as a manifesto, thermodynamic computing is producing the kind of artifacts readers can evaluate.

AI Infrastructure · Semiconductor Manufacturing

SCN Staff·May 19, 2026

Two parallel rows of AI server infrastructure diverging from a central fault line, representing frontier AI compute splitting between established and emerging hardware ecosystems.

AI · analysis

DeepSeek V4-Pro on Ascend 950PR: The Two-Stack AI Reality

DeepSeek V4-Pro runs on Huawei Ascend 950PR as the State Department pivots export controls from chip access to model IP, a shift that points to two parallel AI stacks.

Export Controls & Trade Policy · AI Infrastructure

SCN Staff·Apr 28, 2026

Illustration of token streams routing through a central AI communication layer to a small set of active compute nodes inside a larger data center, representing sparse activation and expert-parallel communication.

AI · analysis

UCCL-EP vs. NCCL EP: Portability or Consolidation for MoE Communication?

Two new expert-parallel efforts point to different futures for MoE systems: one built for heterogeneous fleets, the other folded into NVIDIA’s stack.

AI Infrastructure · Inference Economics

SCN Staff·Apr 13, 2026

Macro view of a red lobster embedded among AI chips, cooling elements, and high-bandwidth memory, illustrating the hardware bottlenecks that keep agentic AI expensive.

AI · news

AI Costs Are Falling 1,000x. It Is Not Enough.

AI inference costs have fallen 1,000x, yet agentic workloads still cost hundreds of dollars a day. Anthropic blocking OpenClaw from subscriptions shows that consumer pricing cannot absorb real infrastructure economics.

AI Infrastructure · Agentic AI

SCN Staff·Apr 8, 2026

SambaNova Systems CEO Rodrigo Liang holds the SN40L Reconfigurable Dataflow Unit (RDU), the company's fourth-generation AI inference chip. SambaNova's dataflow architecture makes it one of the most likely candidates to demonstrate whether FlatAttention's collective-primitive approach generalizes beyond the unnamed hardware tested in the April 2026 paper.

AI · analysis

FlatAttention Claims 4× Speedup Over FlashAttention-3 — But on What Hardware?

FlatAttention claims 4× speedup over FlashAttention-3 on unnamed tile-based accelerators. No code, no hardware vendor, no deployment path yet.

AI Infrastructure · Inference Economics

SCN Staff·Apr 7, 2026

d-Matrix Corsair accelerator card and GigaIO™ SuperNODE™ rack

AI · news

d-Matrix Acquires GigaIO's Data Center Business, Betting That Inference Is a Systems Problem

d-Matrix acquired GigaIO's data center business, gaining FabreX PCIe memory fabric and SuperNODE rack-scale technology to build a vertically integrated AI inference platform around its Corsair accelerator.

AI Infrastructure · Inference Economics

SCN Staff·Apr 2, 2026