Services

Hands-on agent engineering past the demo stage

Multi-agent design, fine-tuning, self-improving agents, retrieval, and eval gates, shipped as PRs in your repo. Scope and pricing are agreed on LinkedIn; nothing is listed here.

How to start

Two common ways to start. Deliverables are fixed; we align terms on a short LinkedIn intro.

Agent system diagnostic

5 business days

We map how your agents behave in production: tools, handoffs, retrieval, and eval coverage. You get a clear picture before committing to a build phase.

Eval scorecard on agent workflows (tools, retrieval, generation, multi-step paths)
Failure taxonomy: top regressions ranked by user impact
Fix recommendations with effort estimates
30-min readout with your eng lead

Scope and commercial terms agreed on LinkedIn.

2-week productionization pilot

2 weeks

Hands-on work in your repo, not a slide audit. Week 1 establishes a baseline; week 2 lands fixes and eval integration.

Week 1: failure taxonomy + eval baseline merged in your repo
Week 2: quality gate + 2 critical-path fixes on prod agent or retrieval flows
Handoff runbook your team extends

Often follows the diagnostic when there is a fit. Terms scoped on LinkedIn.

For AI consultancies & dev shops

White-label delivery on fixed SOWs. Your brand with clients; my hands in the codebase.

Multi-agent systems, eval harnesses, fine-tuning, and knowledge layers under your client SOW
ADRs, runbooks, and client-ready documentation included
Commercial terms and SOW shape agreed on LinkedIn
4h ET overlap as a committed daily window

How I work

Typical engagement: 3–6 months
4h US Eastern overlap daily, held as a fixed calendar block
Week 1 includes PRs in your repo (with prod access you scope). First milestone is shipped work, not decks.
Availability and scheduling constraints confirmed in writing before kickoff.

Full service menu

Multi-Agent System Design

For: Teams shipping collaborative agents, workflows, or voice/chat automation past prompt chains.

Agent roles, handoffs, and tool policies for your product
Orchestration patterns (single vs. multi-agent, when to specialize)
Failure recovery and human-in-the-loop checkpoints
ADRs your team can extend (not shelfware)

Typical duration: 1–2 week design sprint or 4–12 week build advisory

LLM Fine-Tuning & Domain Adaptation

For: Products that need models to speak your domain, not generic base-model behavior.

Fine-tuning strategy (QLoRA, full fine-tune, or hybrid with retrieval)
Dataset curation, eval sets, and regression gates for model updates
Integration with your agent stack and release process
Guidance on when fine-tuning beats prompting alone

Typical duration: 3–8 week sprint

Self-Improving & Tool-Using Agents

For: Teams building agents that iterate, call tools, and improve from feedback.

Tool schemas, routing, and guardrails for safe action in prod
Self-improvement loops: critique → revise → verify (with eval boundaries)
Trace-backed logging so you can debug multi-step runs
Patterns for long-horizon tasks without runaway cost

Typical duration: 4–8 week sprint

Knowledge & Retrieval for Agents

For: Agents that need governed context from docs, APIs, or databases.

Hybrid retrieval (structured + unstructured) aligned to agent decisions
Tenant-scoped knowledge boundaries
Ingestion and refresh patterns your team operates
When retrieval belongs in the agent loop vs. offline indexing

Typical duration: 4–8 week sprint

Evaluation & Production Readiness

For: Teams shipping agent changes without knowing what regressed.

Trace-backed evaluation harness for workflows and tool paths
CI quality gates aligned to your release process
Production-readiness checklist (evals, rollback, tenant safety)

Typical duration: 2–4 week bootstrap + optional retainer

Incident Recovery (Agent Programs)

For: Live agent products where behavior, tools, or workflows broke in production.

Hands-on recovery on agent flows, retrieval, and eval gaps
Root-cause write-up and prioritized fix list
Runbooks so the team can detect similar failures earlier

Typical duration: Time-boxed recovery (days–weeks) + optional retainer

Fractional Agent AI Lead

For: Startups and scale-ups needing Staff-level agent ownership without a full-time hire.

Roadmap across agents, fine-tuning, retrieval, and evals
Architecture ownership and review gates on agent PRs
Mentoring on multi-agent design and production iteration
Bridge between research ideas and shippable agent products

Typical duration: 3–6 month engagements, scope on LinkedIn

FAQ

Do you write prompts?: Only when it serves the agent system. Prompting sits inside orchestration, tool design, and fine-tuning, not as a standalone package.
Audit vs. hands-on?: Default is hands-on: PRs in your repo from week one. Read-only reviews are available, but they are not the default entry point.
How do engagements start?: Message on LinkedIn to align on problem, scope, and timeline. Diagnostic or pilot shapes above are common starting points; terms are agreed privately after we connect.
Fine-tuning vs. RAG vs. agents?: I help you choose the layer that fits: orchestration, fine-tunes, retrieval, and evals. This is not a DevOps or platform-infra engagement. The focus is agent behavior in your codebase.
Research vs. consulting?: PhD (MARL) is part-time and informs multi-agent coordination design; consulting engagements are delivery-focused with clear artifacts.

Tell me what is breaking in production, or message on LinkedIn to scope a diagnostic or pilot.

Message on LinkedIn