Principal Forensic Engineer
Microsoft
Multiple Locations, United States
Overview
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day, and we need you as a Principal Forensic Engineer. The Forensic Engineering Team plays a critical role in ensuring the uptime, reliability, and availability of Microsoft’s cloud infrastructure. We are seeking a Principal Forensic Engineer with a strong electrical engineering background and deep experience in critical environments to lead complex investigations, shape long-term strategies, and influence global standards across our datacenter portfolio. This high-visibility individual will serve as a subject matter expert in electrical systems within the team, supporting the team and organization with expertise in datacenter electrical systems, design, and troubleshooting, and presenting updates to leadership and other teams.
Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. As a CO+I Forensic Engineer, you will play a key role in delivering the core infrastructure and foundational technologies for Microsoft's online services, including Bing, Office 365, Xbox, OneDrive, and the Microsoft Azure platform. Our infrastructure comprises a large global portfolio of more than 200 datacenters in 32 countries and millions of servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide.
With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required/Minimum Qualifications:
- Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 8+ years technical engineering experience
- OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience
- OR Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience
- 5+ years of experience in critical environments.
Other Requirements
- Ability to meet Microsoft, customer, and/or government security screening requirements, including:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Additional or Preferred Qualifications:
- Forensic engineering experience
- Deep expertise in electrical systems in datacenter environments
- Experience with root cause analysis methodologies
- Experience with IT infrastructure in a data center engineering or operations environment
- Experience leading cross-disciplinary reviews to identify and mitigate risks
- Strong sense of urgency and passion for uncovering root causes and systemic triggers
- Strategic thinker with deep understanding of other datacenter topologies and mission-critical systems
- Advanced analytical skills with ability to synthesize complex data across systems and drive insights
- Collaborative leader with ability to influence across engineering, operations, and leadership teams
- Excellent written and verbal communication skills, including the ability to produce and review complex technical documentation
Reliability Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until November 10th, 2025.
#COICareers | #EPCCareers | #DCDCareers
Responsibilities
- Conduct advanced root cause analysis, guide cross-functional teams in mitigation strategies, and guide the implementation of corrective and preventive actions.
- Lead and oversee forensic investigations of datacenter infrastructure events, driving corrective solutions and systemic improvements across global operations.
- Evaluate compliance with maintenance programs, staffing models, and procedural readiness to improve operational resilience.
- Review equipment and system performance data to identify issues through trend and data analysis, and develop solutions for identified defects.
- Assist in troubleshooting issues in the field.
- Drive the creation of novel, scalable solutions to complex and ambiguous engineering and operations challenges, including technical improvements to our infrastructure, process improvements, and development of data analytics tools.
- Act as a strategic advisor to engineering and operations leadership, influencing decisions on risk mitigation, design resiliency, and long-term infrastructure planning.
- Navigate and influence proactive implementation of lessons learned across business units (design, construction, and support service teams).
- Collaborate across disciplines to establish visual standards, process improvements, and error-proofing systems that elevate global datacenter availability.
- Develop and champion methodologies to validate datacenter performance, system control parameters, and operational efficiency against design intent and determine quantifiable deviations.
- Provide expertise in the development of visualization and reporting of information - dashboards, KPIs, and frameworks to monitor trends and inform strategic decisions.
- Mentor engineers and field operations staff through sharing of expertise, fostering a culture of continuous learning and innovation.