How OpenRouter's top cohort_pct (e.g. "5%") of users route across models — and why their behavior is the leading indicator for the rest of the market.
The default mental model of an OpenRouter user is someone who picks a single LLM and stays there — pin Claude, pin GPT, pin Gemini, never look back. That is not what the data shows.
Across N_users users and token_volume (e.g. "12 trillion") tokens routed in OpenRouter's date_range data, a clear cohort emerges: users who actively switch models — across providers, across sessions, sometimes within a single conversation. We call them Power Switchers.
Power Switchers are a small fraction of the total user base — roughly cohort_pct — but they generate token_share_pct of OpenRouter's tokens and revenue_share_pct of its revenue. They use a median of median_models_pw_per_week distinct models per week, against median_models_typical for the typical user. They are the cohort whose behavior most closely tracks where the AI market is going — because they are the ones doing the picking.
Power Switchers do not switch at random. They switch when something about the work changes — a longer context arrives, a task pivots from chat to code, a model returns a structurally broken JSON object, latency spikes on a paid endpoint, or a coding agent fails its third retry.
We classified N_session_transitions mid-session model transitions across the cohort and grouped them into five trigger families:
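The trigger families above lend themselves to a rule-based first pass. This is a minimal, hypothetical sketch — the record fields (`context_tokens_delta`, `json_parse_failed`, `latency_ms`, `retry_count`, `task_before`/`task_after`) and thresholds are illustrative, not the report's actual classification schema:

```python
from dataclasses import dataclass

# Hypothetical transition record; field names and cutoffs are illustrative,
# not the report's actual pipeline schema.
@dataclass
class Transition:
    context_tokens_delta: int  # change in prompt context size at the switch
    task_before: str           # classified task before the switch
    task_after: str            # classified task after the switch
    json_parse_failed: bool    # previous response failed structured-output parsing
    latency_ms: float          # latency of the previous call
    retry_count: int           # consecutive failed retries before the switch

def trigger_family(t: Transition) -> str:
    """Map a mid-session model transition to one of five trigger families."""
    if t.retry_count >= 3:
        return "agent_retry_exhaustion"
    if t.json_parse_failed:
        return "structured_output_failure"
    if t.latency_ms > 10_000:
        return "latency_spike"
    if t.context_tokens_delta > 50_000:
        return "context_growth"
    if t.task_before != t.task_after:
        return "task_pivot"
    return "other"
```

In practice a learned classifier would replace the hard-coded cutoffs, but the precedence ordering (hard failures before soft signals) is the part that matters.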
This is the section dev-Twitter will screenshot. We classified the top tasks Power Switchers route on, then computed the model with the largest revealed-preference share for each — the model the market actually picks, not the model that wins the benchmark.
The clearest patterns:
Power Switchers are not uniformly cost-sensitive. Two clear sub-cohorts emerge: frontier-only switchers (who route exclusively across the top of the price curve) and cost-aware switchers (who fall back to OSS or smaller models on tasks where quality holds).
The cost-aware sub-cohort tells the OSS-share story. When they fall back, they fall back to:
Provider economics cross-cut (the Anthropic 12%/46% finding, applied to this cohort): Power Switchers route provider_X_token_share of their tokens to provider_X but pay provider_X_dollar_share of their dollars to that provider. The token-to-dollar gap by provider:
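The gap itself is simple arithmetic: each provider's share of dollars minus its share of tokens. A minimal sketch, using made-up numbers that echo the 12%-of-tokens / 46%-of-dollars shape mentioned above:

```python
def share_gap(usage):
    """usage: {provider: (tokens, dollars)}.
    Returns each provider's dollar share minus its token share;
    a positive gap means the provider is paid more than its token volume implies."""
    total_tokens = sum(t for t, _ in usage.values())
    total_dollars = sum(d for _, d in usage.values())
    return {
        p: d / total_dollars - t / total_tokens
        for p, (t, d) in usage.items()
    }

# Illustrative numbers only, not the report's data.
gap = share_gap({"provider_X": (12, 46), "others": (88, 54)})
# provider_X carries 12% of tokens but 46% of dollars: gap of +0.34
```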
This is the section that justifies the report's existence. Power Switchers do not just consume the market — they predict it.
For every model launched in time_window, we measured how long it took Power Switchers to reach a stable share of routes versus how long it took the broader user base. The gap is consistent: Power Switchers reach steady-state adoption a median of lead_days days before the rest of the user base.
This is the framing that makes the cohort interesting rather than just valuable. Watching what the Power Switchers route to this week is watching what the rest of the market will route to next quarter.
Three falsifiable, dated, named-model claims — engineered to be argued with on Twitter and re-checked in the Q4 follow-up report. Each prediction includes (a) the trigger condition, (b) the metric, (c) the threshold, (d) the date the report will check it.
We will publish the check in the Q4 Power Switcher Report. Public predictions, public scores.
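The four-part prediction format above — trigger, metric, threshold, check date — can be sketched as a record. The field contents below are invented examples, not the report's actual predictions:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical schema mirroring the (a)-(d) prediction format described above.
@dataclass(frozen=True)
class Prediction:
    trigger: str       # (a) the trigger condition
    metric: str        # (b) the metric to measure
    threshold: float   # (c) the pass/fail threshold
    check_date: date   # (d) when the Q4 report will score it

def score(p: Prediction, observed: float) -> bool:
    """A prediction 'hits' if the observed metric meets its threshold."""
    return observed >= p.threshold
```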
Power Switcher cohort definition: a user is classified as a Power Switcher in week W if they routed to ≥ N distinct models in W and ≥ M distinct models in the preceding 4-week rolling window. The thresholds were chosen per rationale — e.g. "to capture the top decile of model diversity while excluding bot/eval traffic".
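The cohort rule stated above translates directly to code. A minimal sketch — the threshold values N=4 and M=6 are arbitrary stand-ins, since the report leaves N and M as parameters:

```python
from datetime import date, timedelta

def is_power_switcher(routes, week_start, n=4, m=6):
    """routes: iterable of (day: date, model: str) for one user.
    A user qualifies in the week starting `week_start` if they routed to
    >= n distinct models that week and >= m distinct models in the
    preceding 4-week rolling window. n, m are illustrative defaults."""
    week_models, window_models = set(), set()
    for day, model in routes:
        if week_start <= day < week_start + timedelta(days=7):
            week_models.add(model)
        elif week_start - timedelta(weeks=4) <= day < week_start:
            window_models.add(model)
    return len(week_models) >= n and len(window_models) >= m
```

Note that both conditions must hold, so a single week of heavy experimentation by an otherwise single-model user does not qualify.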
Privacy-preserving aggregation: all reported figures are computed on user-anonymized session data, with all prompts and responses excluded from the analysis pipeline. Only metadata (model, token count, timestamp, task classification) flows to the analysis surface. The classification pipeline itself is open-sourced alongside this report.
Limitations. Self-hosted models routed through OpenRouter are included; users routing entirely outside OpenRouter are not visible. Task classification uses a learned classifier with accuracy as noted above on a hand-labeled held-out set. Bot and CI/eval traffic was filtered as described above; residual contamination is bounded.
Co-author: Stanford HAI / MIT IDE co-author.
Advisory reviewers — these reviewers provided pre-publication feedback. They did not author the report and the conclusions are OpenRouter's:
Data team: Megamind · Justin Summerville · data team attribution.
OpenRouter routes across 300+ models for 5M+ developers. Talk to enterprise sales for a private cut of the data on your traffic.
Talk to enterprise sales →