I build AI systems that survive real traffic — then I harden them for scale.

CTO @ Techjays. Previously global-award-winning engineering manager at Akamai. Design Thinking @ MIT Sloan.

Selected work

Reduced human screening by ~96%

Role: CTO & Lead Architect · HireFinch — AI voice interviewer

The problem

Recruiters were drowning in early screens. Hiring managers lacked structured signal. Ad-hoc interviews wasted everyone's time and produced inconsistent data that made panel prep a guessing game.

What we built

An agentic voice interviewer that maps to job-specific rubrics. Every score cites the exact transcript span and rubric level — managers get an executive summary in minutes, not hours. We chose multi-model failover (OpenAI ⇄ Gemini) because no single provider meets our uptime bar. A provider-agnostic inference shim with circuit breakers handles routing; the candidate never notices a failover.
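The failover shim described above can be sketched roughly like this — a minimal illustration, not the production code. Provider names, thresholds, and the `complete` interface are assumptions for the sketch; the real system adds streaming, schema normalization, and observability.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; half-opens after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open after the cooldown: let one probe request through.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()


class InferenceShim:
    """Routes each request to the first healthy provider, in priority order."""

    def __init__(self, providers):
        # providers: ordered list of (name, callable) pairs, e.g.
        # [("openai", openai_call), ("gemini", gemini_call)]
        self.providers = [(name, fn, CircuitBreaker()) for name, fn in providers]

    def complete(self, prompt):
        last_err = None
        for name, fn, breaker in self.providers:
            if not breaker.allow():
                continue  # breaker open: skip this provider entirely
            try:
                reply = fn(prompt)
                breaker.record(ok=True)
                return reply
            except Exception as err:
                breaker.record(ok=False)
                last_err = err
        raise RuntimeError("all providers unavailable") from last_err
```

From the caller's side a failover is invisible: `complete()` returns a reply either way, which is what keeps the candidate experience seamless.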

The evidence

Realtime reply latency at p95 ≤ 1.2s with failover. Availability SLO of 99.95% with error-budget guardrails. Rubric-adherence F1 ≥ 0.85. PII leakage = 0 on the policy suite. Proctoring blends stylometry, latency profiles, and webcam snapshots with ≤7-day retention.

Read the full build notes
$5M saved, 91% live task success

Role: Product Engineering Lead · Enterprise LLM Copilot for Ops, Freight & Finance

The problem

Ops, freight, and finance teams burned time chasing scale tickets and reconciling buy/sell lines. Month-end spikes delayed invoicing. Chase emails were the default workflow — and they didn't scale.

What we built

An agentic workflow with RAG over OMS/TMS, email, and knowledge bases. The copilot handles tool calls for TMS status, proof-of-delivery retrieval, and portal/EDI fetches — all read-only. We enforced PII scrubbing on ingest, human-in-the-loop gates for sensitive actions, and audit logs on every tool call.
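The gating pattern above can be sketched as a thin dispatcher: every tool call lands in an audit log, and anything outside the read-only set is blocked unless a human approves it. Tool names, the `approver` callback, and the log shape here are illustrative assumptions, not the deployed schema.

```python
import time

# Illustrative tool sets; the real allowlist lives in config.
READ_ONLY_TOOLS = {"tms_status", "pod_retrieve", "edi_fetch"}
SENSITIVE_TOOLS = {"invoice_adjust"}

audit_log = []


def call_tool(name, args, tools, approver=None):
    """Dispatch one tool call with an audit record and a HITL gate."""
    entry = {"tool": name, "args": args, "ts": time.time()}
    if name in SENSITIVE_TOOLS:
        # Human-in-the-loop gate: sensitive actions need explicit approval.
        if approver is None or not approver(name, args):
            entry["outcome"] = "blocked_pending_approval"
            audit_log.append(entry)
            raise PermissionError(f"{name} requires human approval")
    result = tools[name](**args)
    entry["outcome"] = "ok"
    audit_log.append(entry)
    return result
```

Keeping the gate in the dispatcher rather than in each tool means no agent code path can reach a sensitive action without leaving an audit entry.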

The evidence

Scale-ticket chase emails dropped 45%. Invoice cycle time fell 1.3 days (~28%). Exception rate cut from 12% to 6%. Interactive latency at p95 1.2–2.0s, availability 99.95%, zero critical incidents post-GA. Offline eval accuracy improved from 92% to 97%; hallucination rate under 1.5% after red-teaming.

Read the full build notes

More case studies on the Techjays site →

How I build dependable AI

  1. Eval gates before GA.

    No model ships without offline eval sets, live disagreement tracking, CI gates, and canary releases. HireFinch and the Ops Copilot ship with automated eval gates on every PR.

  2. Humans own the hard calls.

    PII decisions, rubric overrides, and release approvals stay with people. Read-only posture for copilots. Audit trails on every tool call and prompt.

  3. SLOs or it didn't ship.

    99.95% availability, p95 latency budgets, and cost-per-action targets from day one. Multi-model routing with circuit breakers and schema-normalized failover.

  4. Discovery → Pilot → GA (with gates).

    Start in the user's workflow. Ship a HITL pilot. Add release gates for accuracy, latency, privacy, and fairness before GA. Prefer organic pull over mandates.

  5. Red-team everything.

    Periodic red-teaming, bias checks, parity audits across cohorts, and clear rollback playbooks. If you can't roll it back in minutes, it's not ready to ship.
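The release gates in points 1 and 3 reduce to a simple check run in CI: a build ships only if every offline eval metric clears its threshold. The metric names and thresholds below are illustrative (drawn from the targets quoted in the case studies), not the actual gate file.

```python
# (threshold, direction): "min" means higher is better, "max" means lower is better.
GATES = {
    "rubric_f1": (0.85, "min"),
    "p95_latency_s": (1.2, "max"),
    "pii_leakage": (0.0, "max"),
    "hallucination_rate": (0.015, "max"),
}


def release_ready(metrics):
    """Return (ok, failures) for a dict of offline eval metrics."""
    failures = []
    for name, (threshold, direction) in GATES.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif direction == "min" and value < threshold:
            failures.append(f"{name}: {value} < {threshold}")
        elif direction == "max" and value > threshold:
            failures.append(f"{name}: {value} > {threshold}")
    return (not failures, failures)
```

Wired into CI, a non-empty `failures` list fails the pipeline, so a regression can never ride a green build into GA.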

Background

Akamai — infrastructure at scale

Worked on telemetry and alerting flows for massive infrastructure: high-cardinality data, noisy signals, near-real-time extraction. One of five global managers recognized with the GSS Pinnacle Award for leadership in Global Services & Support.

Assemble Effective Teams · Bottom-Line Results · Customer First

Google — developer experience

Contributed to developers.google.com, sharpening a bias for clear information architecture and accessible documentation.

MIT Sloan — Mastering Design Thinking

Executive program focused on framing ambiguous problems, prototyping rapidly, and scaling solutions with measurable outcomes.

Connect

Reach out on LinkedIn — I respond within 24 hours.