Stop pretending candidates aren't using AI. Probe runs realistic engineering tasks where AI is allowed — and a second AI silently watches, scores against your rubric, and tells you who can actually think.
"Use asyncio.wait_for with a bounded budget, then surface the timeout as a typed exception so the retry loop can decide whether to back off."

Every candidate has Claude open in another tab. Whiteboarding tests recall. Take-homes test patience. Neither tells you whether someone can decompose a real problem, prompt well, or notice when their assistant is wrong.
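The pattern the demo assistant suggests can be sketched roughly as follows. This is a hypothetical illustration, not Probe's code: the names BudgetExceeded, call_with_budget, and fetch_with_retry are invented for the example.

```python
import asyncio

class BudgetExceeded(Exception):
    """Typed timeout (hypothetical name) so callers can match on it."""
    def __init__(self, budget: float):
        super().__init__(f"call exceeded {budget}s budget")
        self.budget = budget

async def call_with_budget(coro_factory, budget: float = 2.0):
    # Bound the awaited call, then surface the timeout as a typed exception
    # instead of letting asyncio.TimeoutError leak out.
    try:
        return await asyncio.wait_for(coro_factory(), timeout=budget)
    except asyncio.TimeoutError as exc:
        raise BudgetExceeded(budget) from exc

async def fetch_with_retry(coro_factory, attempts: int = 3):
    # The retry loop decides what to do with the typed exception:
    # back off exponentially, and re-raise once attempts are exhausted.
    delay = 0.1
    for attempt in range(attempts):
        try:
            return await call_with_budget(coro_factory)
        except BudgetExceeded:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2
```

The typed exception is the point: the retry loop can distinguish "ran out of budget" from other failures without string-matching on a generic timeout.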
Probe takes the AI as a given. The interview becomes a test of judgment — exactly what you were trying to measure all along.
You write the rubric. We run the interview. Your candidates work in a real editor with a real AI assistant they can ask anything — and a second AI silently watches and grades against the dimensions you care about.
The watcher agent has full context — the diff, the prompt history, the test runs — and never speaks to the candidate. It builds an evidence-cited score against your rubric and flags the moments that actually matter: shortcuts that hide tradeoffs, tests that don't really exercise the thing they claim to.
"test_concurrent passes but the mock returns instantly — not real concurrency signal. Citation captured."

Every score cites a moment from the transcript. Every recommendation can be traced back to specific prompts, edits, and test runs. No black-box "8.4/10" — defensible signal you can take to a hiring committee.
The honest answers — including what we don't claim and what we're still building.
Set up in under five minutes.