Lead Specialist Solutions Architect (Applied AI and ML) @ Databricks. I work on the layer between LLM evaluation infrastructure and the agent frameworks built on top: tracing, scorers, judges, retry semantics, safety hardening.
Reviewer for NeurIPS 2026 (main track, Evaluations & Datasets, Position Papers). IEEE Senior Member.
SycoBench-600: Measuring Sycophancy and Correction Selectivity in LLM Assistants - ACL 2026 Findings. Introduces correction selectivity as an evaluation axis distinct from sycophancy. Code, dataset, and per-model results at sycobench-600.
Completing OTel observability for omnigent (3 merged, 5 open): GenAI semconv span attributes, W3C TRACEPARENT propagation to subprocesses, GenAI metric instruments, retry events on the production async path, an end-to-end OTLP receiver test, and the canonical Databricks integration guide.
Practical Machine Learning on Databricks (Packt, 2023)
SycoBench-600: Measuring Sycophancy and Correction Selectivity in LLM Assistants (ACL 2026 Findings)
Learning to Translate with Products of Novices (TACL, 2013)
- Reproducible Model Dependencies with uv and MLflow
- Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow
- Agent Trace Evaluation with TruLens Scorers in MLflow
- Deterministic Safety and PII Checks with Guardrails AI in MLflow
- Deploy MLflow Models to Serverless GPUs with Modal
- Third-Party Scorers in MLflow