Connecting people I'd hire with companies I'd work at

Matt Wallaert

Principal Software Engineering Manager - GPU Inference Optimization

Microsoft

Software Engineering, Other Engineering
Posted on Sep 23, 2025

Beijing, China

Date posted: Sep 23, 2025
Job number: 1881372
Work site: 4 days / week in-office
Travel: 0-25%
Role type: People Manager
Profession: Software Engineering
Discipline: Software Engineering
Employment type: Full-Time

Overview

The R&D of Search Ads aims to build an online advertising ecosystem of users, advertisers, and the search engine.
The Bing Search Ads Understanding team is chartered to deliver world-class algorithms using web-scale data. Our mission is to drive user satisfaction, advertiser ROI, and Bing revenue. A core challenge is to match advertisers' "Ad display" with users' "query" by building an intelligent system that truly understands users' needs. This is a very hard problem that demands the most advanced AI models and sophisticated engineering systems. Join us to work on projects highly strategic to Bing search in a fun and fast-paced environment!
We are hiring a Principal Software Engineering Manager (GPU Inference Optimization) to lead the team's work on GPU inference optimization of large and small language models, supporting GPU serving of these models for Ads tasks such as retrieval, relevance, and creative generation. As the manager of this team, you will have the opportunity to lead innovation in the fundamental abstractions, programming models, runtimes, libraries, and APIs that enable large-scale inference and online serving of models on novel AI hardware.
This is a lead role focused on GPU inference optimization of large and small language models. It requires hands-on software development skills and experience leading a team by applying the model-coach-care practices. We’re looking for someone who has a demonstrated history of solving hard technical problems and is motivated to tackle the hardest problems in building a full end-to-end AI stack. An entrepreneurial approach and the ability to take initiative and move fast are essential.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate

Qualifications

• Bachelor's degree in computer science or a related technical field AND 5+ years of technical engineering experience coding in languages including, but not limited to, C/C++, CUDA, or ROCm, or equivalent experience

• Practical experience writing new GPU kernels, beyond running GPU workloads on existing library kernels

• Quick learner with good communication skills (fluent in English) and solid problem-solving skills

• Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers

• Experience in low-level performance analysis and optimization; proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute is a plus

• Familiarity with LLM inference optimization; experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM is a plus

Responsibilities

• Lead software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.
• Analyze metrics and identify opportunities based on offline and online testing; develop and deliver robust and scalable solutions.
• Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost.
• Engage with key partners to understand and implement inference and training optimization for state-of-the-art LLMs and other models.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.