Principal AI Platform Architect
Microsoft
Redmond, Washington, United States
Overview
The Azure Platform Architecture team is at the forefront of technology and system design, leading the way for the next generation of systems and AI supercomputers. Our mission is to architect the most performant, secure, reliable, and cost- and power-optimized solutions, which are deployed and managed at hyperscale and power Azure. Leading the AI platform architecture for these systems, which power one of the largest hardware deployments on earth, requires deep technical knowledge and partnership across many teams. This individual will act as the subject matter expert and platform architect for Microsoft's internal Artificial Intelligence (AI) Accelerator family of products, helping articulate and define our next-generation platforms. This requires working across multiple domains, including product, software, electrical, mechanical, thermal, performance, and deployment, to find the right solution trade-offs.
We are looking for a Principal AI Platform Architect to join the team.
Our team is part of a broader hardware and infrastructure organization known as Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE). SCHIE is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's 200+ online businesses, including Teams, OneDrive, Office 365, Xbox Live, Skype, Bing, MSN, and the Microsoft Azure platform globally.
We architect and design the server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions to support these businesses. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide. As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, at hyperscale, with high reliability and the best performance/price level is paramount.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Qualifications
Required Qualifications:
- 10+ years of technical engineering experience
o OR Bachelor's degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years of technical engineering experience
o OR Master's degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years of technical engineering experience
o OR Doctorate degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 4+ years of technical engineering experience.
- 10+ years demonstrated expertise in AI platform and/or rack-scale server architecture and design.
- 10+ years demonstrated expertise in co-designing with datacenter, server, silicon, firmware/software orchestration, and manufacturing engineering organizations.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Experience deploying AI or GPU systems at scale within a cloud service provider or hyper-scale company.
- Knowledge of AI training and inference workloads, and understanding of how hardware impacts AI performance, operations, and efficiencies.
- Ability to analyze AI system concepts from a total cost of ownership (TCO), performance per TCO, and performance per watt perspective, including understanding system constraints that drive design tradeoffs.
- Expertise in conducting tradeoff studies for electrical, mechanical, thermal, and hardware systems.
- Experience in PCBA (Printed Circuit Board Assembly) design, including schematic creation, layout, routing, power, and signal integrity.
Hardware Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $180,400 - $294,000 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until December 23, 2024.
#AfroTech2024
Responsibilities
- Drive platform, rack, and datacenter-level architectural concepts and definition for Microsoft AI system products.
- Build relationships with our internal silicon development organizations, technology, and development partners to drive leading edge innovation into our next generation products.
- Partner across Microsoft teams and collaborate to deliver industry leading products.
- Distill and articulate architectural tradeoffs encompassing electrical, signal integrity, mechanical, power, and thermal inputs in terms of key metrics such as Total Cost of Ownership (TCO), performance, power efficiency, schedule, and risk.
- Drive and influence technology providers and design partners towards optimal components and solutions to meet the future requirements for Azure’s infrastructure.