Senior Software Engineer--M365 IDEAS ML Platform
Microsoft
Suzhou, Jiangsu, China
Overview
Microsoft 365 is at the core of Microsoft's mission to enable people and organizations to achieve more. Intelligent Data Engineering and Analytics (IDEAs) services handle millions of users and exabytes of data. This is not just another large-scale web service: the implementation ranks among the world’s largest state-of-the-art distributed systems, spanning data centers around the world.
The Microsoft 365 IDEAs team’s goal is to help customers improve productivity, champion a data-informed culture, and enable the entire Microsoft 365 organization to make more informed decisions through data. We see this effort as a huge opportunity to provide information to both external and internal users that will improve efficiency, increase empowerment, and help Microsoft win in the critical cloud business sector.
We are looking for an experienced Senior Software Engineer who will collaborate with Data Scientists, Program Managers, and Platform Engineers to design and implement high-quality end-to-end ML solutions, covering data ingestion, feature engineering, training, scoring, monitoring, and endpoint integration. The candidate should research innovative optimization methods, manage a high-quality feature store, and develop tools for streamlined model onboarding.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- 2+ years of experience implementing and optimizing a variety of ML algorithms in production (Regression, Deep Learning, Gradient Boosting, Dimensionality Reduction)
- Solid Python programming skills and coding practices (CI/CD, package design, unit testing, etc.)
- Solid experience with common machine learning frameworks: TensorFlow, PyTorch, Keras, Spark ML
- Proven track record of optimizing models on various types of compute (CPU/GPU, Spark, Kubernetes)
- Experience joining and processing terabyte-scale data sources into curated feature datasets (batch and streaming methods)
- Experience with orchestration frameworks like Azure Data Factory, Airflow, or equivalent
- Experience implementing API interfaces for model serving (e.g., OpenAPI, FastAPI)
Other Requirements:
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Knowledge of statistics and experiment design (e.g., A/B testing, causal inference).
- Experience with containerization: Docker, Kubernetes.
- Experience with Azure Machine Learning (or equivalent).
- Experience with MLOps frameworks like MLflow, Kubeflow, or equivalent.
- Experience with compiled languages: C++, C#, or Java.
- Experience with Microsoft analytic systems: Cosmos, Kusto, Substrate AI, Synapse.
Responsibilities
- Work closely with Data Scientists, Program Managers, and Platform Engineers to design and implement high-quality end-to-end ML solutions in production (including data ingestion, feature engineering, training, scoring, monitoring, telemetry, and endpoint integration).
- Research innovative ways to optimize all aspects of managing models in production to reduce implementation times and ensure the best possible model quality and performance (including algorithms, parameter tuning, compute environments, model management tools, and feature selection).
- Onboard feature data from a wide variety of sources and manage a high-quality feature store to support production models and data scientist productivity.
- Build a deep understanding of Microsoft ML platforms and open-source frameworks to guide what capabilities we can adopt in our environment.
- Develop packages and tools to streamline model onboarding and management.
- Monitor production model performance and health, identify where improvements need to be made, and handle production incidents when they occur.