Senior and Principal Applied Scientists - CoreAI
Microsoft
Mountain View, CA, USA · Redmond, WA, USA
USD 119,800-234,700 / year
Responsibilities
- Develop evaluation and measurement frameworks for single-agent and multi-agent systems, spanning quality, safety, reliability, cost, and behavioral consistency.
- Design methodologies that connect offline evals, online signals, and production telemetry to explain how prompt, tool, model, or orchestration changes affect real-world agent performance.
- Define scientifically grounded quality signals and benchmarks for agent systems, including task success, tool-use effectiveness, plan quality, failure modes, coordination quality, and user-perceived outcomes.
- Build models and analysis techniques that help detect regressions, identify root causes, and characterize agent behavior across diverse workflows and environments.
- Advance observability for AI systems through new approaches to trace analysis, agent health modeling, behavioral clustering, anomaly detection, and multi-agent coordination analysis.
- Partner with engineering teams to operationalize evaluation and observability methods in production systems, enabling safe iteration through staged rollouts, experimentation, A/B testing, and automated regression detection.
- Contribute to instrumentation and semantic standards for agent observability, helping make agent execution more explainable, diagnosable, and comparable across systems.
- Collaborate deeply with product and platform teams across Foundry, Azure Monitor, and agent runtimes to shape end-to-end experiences for evaluation, benchmarking, monitoring, and investigation.
- Act as a technical leader by setting scientific direction, driving research-informed product decisions, mentoring others, and raising the technical bar across the organization.
- Evaluation science for agent and multi-agent systems: offline, online, and continuous evals; benchmark design; synthetic data; task success measurement
- Agent and multi-agent architectures: planners, tool use, memory, orchestration, and coordination patterns
- Applied machine learning and statistical methods for behavioral analysis, anomaly detection, experimentation, and regression detection
- Observability data for AI systems: traces, logs, metrics, evaluations, and cost/performance signals
- Safety and responsible AI signals: policy compliance, risk detection, auditability, and safe logging
- Benchmarking and experimentation for agent systems, including A/B tests, canaries, and staged rollouts
- Explainability and diagnosis for complex agent workflows and model-driven decision paths
Qualifications
- Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 9+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience
- Experience designing evaluation methodologies, experiments, or measurement systems for complex intelligent or distributed systems
- Experience analyzing large-scale production or experimental data to derive actionable insights and drive product or system improvements
- Strong coding and prototyping skills in Python or similar languages, with the ability to work closely with engineering teams on production-facing systems
- Demonstrated ability to lead cross-team technical direction through scientific depth, influence, and strong problem framing
- Advanced degree in Computer Science, Machine Learning, Statistics, Applied Mathematics, or related field
- Experience building or evaluating LLM- or agent-based systems in production
- Familiarity with agent frameworks such as LangChain, LangGraph, OpenAI SDK, or equivalent orchestration frameworks
- Experience with evaluation frameworks for AI systems, including benchmarking, regression analysis, and human-in-the-loop assessment
- Experience with observability systems, telemetry analysis, or distributed tracing data in large-scale environments
- Background in AI safety, guardrails, and responsible AI measurement
- Experience with experimentation platforms, causal inference, or statistical methods for product and model evaluation
- Experience working with cloud-scale monitoring platforms such as Azure Monitor / Application Insights or equivalent
Applied Sciences IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Applied Sciences IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.