Senior Site Reliability Engineer
Microsoft
Hyderabad, Telangana, India
Overview
We’re seeking a skilled engineer to improve how enterprise customers manage their fleets of Surface devices. Our team creates and maintains online portals, backend APIs, microservices, Function Apps, Web Jobs, and integrations with supply chain systems. Our solutions use AI and Copilots to boost productivity, deliver the best possible experience, and streamline operations for enterprise customers.
As a key member of the team, you will be responsible for designing and deploying reliable distributed platforms, empowering commercial customers to self-serve, manage and monitor Surface devices at scale. This is an exciting opportunity to demonstrate broad leadership and impact across Devices.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's or Master's degree in Computer Science or a related engineering field
- 8+ years of technical experience in software engineering and DevOps, including developing build and deployment pipelines, building infrastructure, and running cloud services at large scale
- 4+ years of software development experience with C#, Web APIs, Azure Cosmos DB, SQL Azure, and Microsoft Fabric
- Excellent technical design, problem-solving, and debugging skills
- Excellent leadership, communication, teamwork, and cross-organizational collaboration skills
- Passionate, motivated, and self-driven, with the ability to learn quickly
- Ability to handle the ambiguity that comes with working in a fast-paced environment
- Systematic approach to problem solving, coupled with effective communication skills and a sense of curiosity
- Expertise in analyzing, troubleshooting, and automating root cause analysis and mitigation of incidents affecting large-scale distributed systems
Other Requirements:
Candidates must be able to meet the Microsoft, customer, and/or government security screening requirements for this role. These include, but are not limited to, the following specialized security screening:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Excellent written and verbal communication skills.
- Experience developing monitoring and telemetry tools, containers (Azure Kubernetes Service), and CI/CD pipelines.
- Experience building dashboards, performing code analysis, and applying secure development practices.
Responsibilities
- Champion and implement DevOps and Site Reliability Engineering best practices to ensure system reliability, observability, and operational excellence.
- Own the uptime and performance of applications built on Azure Containers, APIs, and modern UI frameworks, ensuring they meet stringent SLAs and customer expectations.
- Drive incident response, root cause analysis, and postmortem processes to continuously improve system resilience.
- Develop and maintain automation for deployment, monitoring, alerting, and self-healing systems to reduce manual toil and improve efficiency.
- Partner closely with software engineers and product owners to design scalable, fault-tolerant systems.
- Monitor system performance and plan for future growth, ensuring infrastructure is right-sized and cost-effective.
- Ensure systems are secure, compliant, and aligned with Microsoft’s security standards and policies.
- Guide and mentor junior engineers, fostering a culture of learning, ownership, and continuous improvement.