Service Engineer
Microsoft
Redmond, WA, USA
USD 102,100-202,200 / year
Are you a customer-obsessed, AI-curious problem-solver who thrives in an inclusive, collaborative global team? Join Engineering Operations (EngOps) – the organization driving operational excellence across the Microsoft Cloud to strengthen quality, reliability, security, and customer trust. As part of EngOps, you’ll design solutions that prevent issues before they happen, embed AI-powered automation, and turn signals into actions that deliver measurable customer impact. Our culture of empowerment, inclusion, and growth mindset defines how we work.
The Customer Reliability Engineering (CRE) team within Azure EngOps is a top-level pillar of Azure Engineering. It is responsible for world-class live-site management, customer reliability engagements, and modern customer-first experiences for scale, and it drives deep customer insights and empathy into the broader Azure Engineering organization. Our “no dead ends” philosophy ensures that every customer, regardless of size or scale, can realize their full potential through the Microsoft Cloud. Our operations are enabled with AI to drive quality and governance, correlate incidents across services, predict customer impact, and deliver actionable intelligence to CRE engineers and customer-facing teams in real time.
We’re looking for a Service Engineer who blends operational rigor with AI skills. You will build and manage the end-to-end solutions that power these operations: data pipelines, AI-powered agents, internal dashboards, and automation that enable engineering and customer-facing teams to take decisive action. You will work across the full stack, collaborating with engineers, program managers, and customer-facing teams to turn operational problems into reliable, scalable agents and skills.
Every day, our customers stake their business and reputation on the cloud. You can help #EngOps provide our customers with the world-class cloud services they need to succeed.
Responsibilities
Contribute to building intelligent agents, LLM-powered workflows, and AI-assisted coding tools that automate incident triage, customer impact assessment, and operational intelligence. Build proactive systems (automated validators, release gates, monitoring) that eliminate classes of operational failures before they impact customers
Build and maintain data integrations across incident management systems, Azure DevOps, Azure Data Explorer, and other platforms. Identify and automate manual processes, and build monitoring and self-healing capabilities that reduce toil
Work with EngOps operations, program management, customer-facing teams, and partner engineering teams to translate business requirements into technical solutions. Participate in design reviews and code reviews
Cloud operations are unpredictable. You will adapt quickly, reprioritize when incidents demand it, and engage during major cloud incidents when needed
Use metrics to assess operational effectiveness, platform health, and the impact of reliability improvements
Bring an engineering mindset to data operations—balancing agility, scalability, and technical excellence to solve operational challenges
Exhibit strong cross-team collaboration, engineering mindset, and results-oriented execution under pressure
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls
- OR equivalent experience
Other Requirements:
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
Preferred Qualifications:
- 5+ years of experience in cloud operations, incident response, or problem management
- Familiarity with AI/ML technologies such as large language models (LLMs), agentic frameworks (MCP, function calling), AI-assisted coding, or multi-model evaluation and orchestration
- Experience developing and managing AI skills and agents
- Experience with operational data and telemetry platforms such as Azure Data Explorer (Kusto), Azure DevOps APIs, or similar monitoring systems
- Proficiency in big data concepts and query writing using Kusto/KQL, data visualization tools (e.g., Power BI), and statistical software (e.g., R, Python)
- Comfort working in ambiguous, high-urgency environments where priorities shift quickly
- Good written and verbal communication skills in English, coupled with sound problem-solving, judgment, and decision-making abilities for high-stakes scenarios
- Relevant certifications in cloud technologies, incident management, or data analytics
#azcre
Service Engineering IC3 - The typical base pay range for this role across the U.S. is USD $102,100 - $202,200 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $133,800 - $219,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.