Senior Applied Scientist
Microsoft
Suzhou, Jiangsu, China
Overview
As a Senior Applied Scientist, you will help transform how Copilot features are evaluated and improved. Your team will deliver end-to-end experimentation, evaluation, and insights to Copilot engineers, PMs, and fellow scientists. You’ll work on the data generation platform, building algorithms and ML pipelines that simulate user actions and produce datasets reflecting real-world usage and user preferences, and you’ll develop metrics that go beyond accuracy to capture nuance, intent, and satisfaction. You’ll also work on scalable pipelines that support offline evaluations for Copilot Search, BizChat, Connectors, and Agents (DAs). This opportunity will allow you to dive deep into Copilot technologies, shape how AI quality is measured at scale, build critical skills in the AI era, and rapidly grow your career.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience.
- Excellent communication and collaboration skills, with the ability to work across engineering and product management.
Other Requirements:
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience.
- 3+ years' experience conducting research as part of a research program (in academic or industry settings).
- 1+ year(s) experience developing and deploying live production systems, as part of a product team.
- Experience with synthetic data generation, data ingestion, and management, especially for evaluation or training purposes.
- Experience designing or implementing evaluation metrics and methodologies for LLMs or generative AI systems.
- Experience developing agentic solutions using LLMs or multi-agent frameworks.
- Familiarity with SEVAL and its application in offline evaluation pipelines.
- Solid analytical mindset with a data-driven approach to problem-solving, consistently upholding high standards of quality and engineering rigor.
- Collaborative and team-oriented, skilled at working across disciplines, levels, and product areas to drive alignment and shared success.
- Proficient in using Azure Machine Learning (AML) for model development, pipeline orchestration, experiment tracking, and compute/resource management.
Responsibilities
- Design and implement offline evaluation strategies that capture real-world usage and reflect end-user preferences.
- Develop scientifically sound metrics that diagnose model regressions, benchmark against baselines (e.g., ChatGPT, Glean), and validate product improvements.
- Manufacture synthetic yet realistic user activity data using LLMs to simulate diverse usage scenarios.
- Collaborate on multi-agent systems or agentic workflows to automate evaluation flows and generate high-signal insights.
- Analyze evaluation outputs to identify gaps in coverage, quality, and usability across Copilot canvases.
- Partner with engineering and PMs to ensure insights are integrated into product workflows and experimentation pipelines.
- Publish learnings in internal forums and at external conferences, and contribute to best practices in applied science.