
TPM GPU Operations Standards and Quality

Microsoft

Operations, Quality Assurance
USD 100,600-199,000 / year + Equity
Posted on Oct 29, 2025

Redmond, Washington, United States

Date posted: Oct 28, 2025
Job number: 1900061
Work site: 3 days / week in-office
Travel: 0-25%
Role type: Individual Contributor
Profession: Program Management
Discipline: Technical Program Management
Employment type: Full-Time

Overview

Microsoft’s Cloud Operations & Innovation (CO+I) powers cloud services by ensuring datacenter availability and operational continuity. The Global IT Service Transition team standardizes processes so new sites and IT support teams can achieve Day 1 Operational Readiness efficiently. The Technical Program Manager (TPM) for GPU Operations Standards and Quality leads the development and enforcement of operational standards, quality assurance, and readiness for GPU deployments. This role partners across engineering, supply chain, and operations to ensure GPU deployments meet regulatory, security, and performance standards, enabling scalable and reliable operations.

Qualifications

Required/minimum qualifications
  • Bachelor's Degree AND 2+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 1+ year(s) of experience managing cross-functional and/or cross-team projects.
Additional or preferred qualifications
  • Bachelor's Degree AND 5+ years of experience in engineering, product/technical program management, data analysis, or product development OR equivalent experience.
  • 4+ years of experience managing cross-functional and/or cross-team projects.
  • 1+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos).

Background Check Requirements:

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Technical Program Management IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $131,400 - $215,400 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until November 4, 2025.

#COICareers | #EPCCareers | #DCDCareers

Responsibilities

  • Align with Microsoft’s culture, objectives and Datacenter Operational policies and standards.
  • Deliver a best-in-class, new service transition and onboarding program to achieve site & operational readiness.
  • Define and implement GPU operational standards across deployment, servicing, and lifecycle management.
  • Drive cross-functional programs to define, implement, and validate GPU compliance standards across global datacenter environments.
  • Partner with engineering, supply chain, and operations teams to ensure GPU hardware and software configurations meet internal and external compliance requirements.
  • Lead risk assessments and mitigation strategies related to GPU deployments, including site operational readiness and scaled growth.
  • Develop and maintain documentation for GPU standards, audit procedures, and compliance tracking.
  • Represent CO+I in industry forums and regulatory engagements related to GPU infrastructure.
  • Establish KPIs and reporting mechanisms to monitor compliance health and drive continuous improvement.
  • Lead quality assurance initiatives to ensure compliance with performance, reliability, and safety benchmarks.
  • Develop and maintain readiness scorecards and validation frameworks for GPU infrastructure.
  • Coordinate cross-functional efforts across hardware, serviceability, and tooling teams.
  • Manage escalations, fault code governance, and exception handling for GPU-related incidents.
  • Drive continuous improvement through data-driven insights and stakeholder feedback.
  • Evolve operational excellence with key focus areas of risk management, uptime availability and safety.
  • Build strong working relationships and engagement with our Engineering, Procurement & Construction (EPC) teams, support and tooling partners.
  • Establish operational representation through design, build, commissioning and turnover project phases, as required.
  • Create an environment to promote learning and innovation opportunities.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.