Foundation models are updated, retrained and silently swapped behind the APIs your business relies on. Shoothill is launching a free tool that catches it when they slip.
There’s a quiet assumption running through almost every business currently using AI: that the model you bought into is the model you’re still using.
It often isn’t.
The large language models that sit behind ChatGPT, Claude, Gemini, Copilot and the rest are not static products. They’re updated, retrained, fine-tuned and routed through changing infrastructure on a continuous basis. Most of those changes aren’t announced. A handful are flagged in release notes that almost nobody reads. The version that drafted your contract reviews flawlessly in March may behave noticeably differently in May, with no banner, no email, no warning.
That matters because the AI tools you’ve embedded into your business (the ones writing customer emails, summarising reports, processing forms, generating quotes and checking compliance) all depend on the model behind them behaving consistently. When the model shifts, the output shifts. And the first person to notice is rarely you. It’s usually a customer, a regulator, or somebody on your team who happens to spot something off in a draft they were about to send.
We’ve been integrating AI into UK businesses for long enough now to have seen this happen more than once. A workflow that was reliable on Monday is producing subtly wrong outputs by Friday. Sometimes the change is small enough to live with. Sometimes it isn’t. Either way, finding out in real time matters.
So we built the thing we wished existed.
What the AI Signal does
signal.shoothill.ai is a continuous, public benchmark of the major AI models. Every hour, ten of the most widely used models, from the flagship ones (GPT-5.5, Gemini 2.5 Pro, Claude Opus 4.7) to the cheaper, faster variants most businesses actually deploy at scale (GPT-5.4 mini, Gemini 2.5 Flash, Claude Haiku 4.5), are put through the same fixed library of 50 test prompts. Each answer is scored automatically: pass, fail or partial credit. No human grading, no judgement calls, no room for the score to drift on us.
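To make "scored automatically" concrete, here is a minimal sketch of what rule-based grading with partial credit could look like. It is an illustration only, not the AI Signal's actual grader: the `TestPrompt` class, the `required_terms` rule and the 1.0 / 0.5 / 0.0 values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestPrompt:
    prompt: str
    # Hypothetical rule: terms a fully correct answer must contain.
    required_terms: tuple[str, ...]


def grade(response: str, test: TestPrompt) -> float:
    """Return 1.0 (pass), 0.5 (partial credit) or 0.0 (fail)."""
    text = response.lower()
    hits = sum(term.lower() in text for term in test.required_terms)
    if hits == len(test.required_terms):
        return 1.0
    if hits > 0:
        return 0.5
    return 0.0


def pass_rate(grades: list[float]) -> float:
    """Average grade across a run, as a percentage from 0 to 100."""
    return 100.0 * sum(grades) / len(grades)


test = TestPrompt(
    prompt="What is the capital of France, and which river runs through it?",
    required_terms=("Paris", "Seine"),
)
grades = [
    grade("Paris, which sits on the Seine.", test),   # both terms present
    grade("Paris, which sits on the Thames.", test),  # one term present
    grade("Lyon.", test),                             # neither term present
]
print(grades, pass_rate(grades))
```

Because the rule is fixed in code, the same answer always earns the same grade, which is the property that keeps a score from drifting.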
The dashboard surfaces six headline measures, in plain English:
- Gets facts right. How often the model gives the correct answer rather than making something up.
- Thinks through problems. How well it handles multi-step logic, maths and planning.
- Follows instructions. How closely the output matches the format and constraints it was given.
- Stays consistent. How stable its answers are when the same question is asked repeatedly.
- Business ready. How well it handles real office work: extraction, calculation, redaction, refusals.
- Made-up answers. The percentage of responses where the model invented a fact. Lower is better.
These feed five weighted pillars (Truthfulness 30%, Reasoning 25%, Discipline 15%, Stability 15%, Business readiness 15%) which combine into a single AI Signal score from 0 to 100 per model. The weights are fixed. A score today is directly comparable to one from last month, last quarter or this time next year.
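As a worked illustration, the sketch below combines five pillar scores using the weights quoted above. It assumes each pillar is itself scored from 0 to 100 and that the combination is a plain weighted average; the published methodology may differ, and the pillar scores in the example are invented.

```python
# Weights as quoted in the article; they sum to 1.0.
WEIGHTS = {
    "truthfulness": 0.30,
    "reasoning": 0.25,
    "discipline": 0.15,
    "stability": 0.15,
    "business_readiness": 0.15,
}


def ai_signal(pillars: dict[str, float]) -> float:
    """Weighted average of pillar scores (assumed to be 0-100 each)."""
    if set(pillars) != set(WEIGHTS):
        raise ValueError("expected exactly one score per pillar")
    return sum(WEIGHTS[name] * pillars[name] for name in WEIGHTS)


# Invented pillar scores for a hypothetical model.
example = {
    "truthfulness": 90.0,
    "reasoning": 80.0,
    "discipline": 70.0,
    "stability": 100.0,
    "business_readiness": 60.0,
}
# 27 + 20 + 10.5 + 15 + 9 = 81.5
print(round(ai_signal(example), 1))
```

Because the weights never change, a move in the combined score can only come from a move in one or more pillars.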
The public leaderboard sorts models by AI Signal by default, with the top three highlighted gold, silver and bronze. When a model you’ve told us you depend on slips, we email you. That’s it. No dashboard you have to log into daily, no alerts about things that don’t matter to you.
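The article does not say what counts as a "slip", so the sketch below uses one plausible rule: compare the latest hourly score against the average of the previous 24 readings and flag a drop of more than five points. The window, the threshold and the function names are assumptions for illustration, not the AI Signal's actual alerting logic.

```python
from statistics import mean


def has_slipped(history: list[float], latest: float,
                window: int = 24, threshold: float = 5.0) -> bool:
    """True if `latest` is more than `threshold` points below the recent average."""
    recent = history[-window:]
    if not recent:
        return False  # no baseline yet, so nothing to compare against
    return mean(recent) - latest > threshold


def models_to_alert(watched: set[str],
                    history: dict[str, list[float]],
                    latest: dict[str, float]) -> list[str]:
    """Names of watched models whose latest score counts as a slip."""
    return sorted(
        name for name in watched
        if name in latest and has_slipped(history.get(name, []), latest[name])
    )


# Placeholder model names and invented scores.
history = {"model-a": [82.0, 81.0, 83.0], "model-b": [70.0, 71.0, 69.0]}
latest = {"model-a": 74.0, "model-b": 70.0}
print(models_to_alert({"model-a", "model-b"}, history, latest))
```

Filtering on the watched set is what keeps the alerts limited to the models a subscriber has said they depend on.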
Why we publish the methodology, and why 40 of the 50 questions are private
The full methodology is on the site, including 10 of the 50 test prompts as worked examples. The remaining 40 are deliberately kept private. If we published every prompt, the providers we’re testing could train against them and the index would tell you nothing. Withholding them is what keeps the score honest.
That trade-off, transparent enough to be defensible, opaque enough to be useful, is the design decision we spent the longest getting right.
Free, forever
We aren’t charging for it. The AI Signal is free for any business that wants to use it, and it will stay that way.
We do this work because Shoothill builds AI integrations for clients who need them to keep working. The AI Signal makes our own work better, and it makes the wider UK AI ecosystem a bit more honest. You can sign up at signal.shoothill.ai, pick the models you want to monitor, and we’ll do the rest. If the model you actually depend on isn’t on the list, you can request it.
If you’re running AI in your business and you don’t have a way of knowing when it changes, that’s a gap worth closing. Quietly, this week, before you find out the way many businesses do: through somebody else.