Stop pretending candidates aren't using AI. Probe runs realistic engineering tasks where AI is allowed — and a second AI silently watches, scores against your rubric, and tells you who can actually think.
"Use asyncio.wait_for with a bounded budget, then surface the timeout as a typed exception so the retry loop can decide whether to back off."

Every candidate has Claude open in another tab. Whiteboarding tests recall. Take-homes test patience. Neither tells you whether someone can decompose a real problem, prompt well, or notice when their assistant is wrong.
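The pattern the demo assistant suggests can be sketched roughly as follows. This is a hypothetical illustration, not Probe's code: the names BudgetExceeded, call_with_budget, and fetch_with_retry are invented for the example.

```python
import asyncio

class BudgetExceeded(Exception):
    """Typed timeout (hypothetical name) so callers can match on it."""
    def __init__(self, budget: float):
        super().__init__(f"call exceeded {budget}s budget")
        self.budget = budget

async def call_with_budget(coro_factory, budget: float = 2.0):
    # Bound the awaited call, then surface the timeout as a typed exception
    # instead of letting asyncio.TimeoutError leak out.
    try:
        return await asyncio.wait_for(coro_factory(), timeout=budget)
    except asyncio.TimeoutError as exc:
        raise BudgetExceeded(budget) from exc

async def fetch_with_retry(coro_factory, attempts: int = 3):
    # The retry loop decides what to do with the typed exception:
    # back off exponentially, and re-raise once attempts are exhausted.
    delay = 0.1
    for attempt in range(attempts):
        try:
            return await call_with_budget(coro_factory)
        except BudgetExceeded:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2
```

The typed exception is the point: the retry loop can distinguish "ran out of budget" from other failures without string-matching on a generic timeout.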
Probe takes the AI as a given. The interview becomes a test of judgment — exactly what you were trying to measure all along.
You write the rubric. We run the interview. Your candidates work in a real editor with a real AI assistant they can ask anything — and a second AI silently watches and grades against the dimensions you care about.
The watcher agent has full context — the diff, the prompt history, the test runs — and never speaks to the candidate. It builds an evidence-cited score against your rubric and flags the moments that actually matter: shortcuts that hide tradeoffs, tests that don't really exercise the thing they claim to.
"test_concurrent passes but the mock returns instantly — not real concurrency signal. Citation captured."

Every score cites a moment from the transcript. Every recommendation can be traced back to specific prompts, edits, and test runs. No black-box "8.4/10" — defensible signal you can take to a hiring committee.
The honest answers — including what we don't claim and what we're still building.
Set up in under five minutes.