NVIDIA Dynamo is a reality check on the broken economics of agentic coding
NVIDIA is rebuilding the inference stack with KV-aware routing because traditional architectures cannot survive the hidden cost of agentic API loops.
news, tips, and reviews that make thinking machines useful
XTag archive
Everything we’ve published under Infrastructure so far.
Follow this lane
Infrastructure appears in infrastructure and deployment, a useful buying-intent lane for relevant AI products and services.
NVIDIA is rebuilding the inference stack with KV-aware routing because traditional architectures cannot survive the hidden cost of agentic API loops.
OpenAI pushed GPT-5.5 to Chat Completions and Responses with a 1M context window, while putting GPT-5.5-pro behind Responses. The real product is fewer retries — and a nudge off legacy chat endpoints.
Perplexity is deploying GPT-5.5 as the default orchestrator for its agentic tier. It proves the next phase of AI architecture is a barbell: heavy routers delegating to cheap generators.
The introduction of inline audio tags in Gemini 3.1 TTS isn't just a formatting trick. It is a fundamental shift from probabilistic guessing to deterministic steering, aimed directly at the hidden costs of inference.
Google’s TPU 8i and 8t announcement sounds like a hardware story. It's actually a confession that AI agents turn latency and serving costs into your biggest product bottlenecks.