About
I've spent about 8 years in AI/ML, the last 3 as an AI Engineer at Prediction Guard, where I focus on AI safety in production for regulated industries. My work spans cost-efficient safeguards for AI systems, model evaluation for production deployments, and privacy-enhancing techniques for sensitive data pipelines.
I co-authored BenchRisk (NeurIPS 2025), in which we examined 26 popular LLM benchmarks and identified 57 failure modes that can lead organizations to unsafe deployment decisions. I'm also an active contributor to open-source AI safety projects.