Principal Data Engineer
Capgemini
Get The Future You Want!
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.
Your Role:
We are seeking a highly skilled and motivated Data Engineer with hands-on experience in the Azure Modern Data Platform. The ideal candidate will have a strong foundation in Azure Data Factory, Azure Databricks, Azure Synapse Analytics (formerly Azure SQL DW), and Azure Data Lake, along with proficiency in Python, R, or Scala. This role requires a deep understanding of both traditional and NoSQL databases, distributed data processing, and data transformation techniques.
- Design, develop, and maintain scalable data pipelines using Azure Data Factory, Databricks, and Synapse Analytics.
- Perform data transformation and analysis using Python/R/Scala on Azure Databricks or Apache Spark.
- Optimize Spark jobs and debug performance issues using tools like Ganglia UI.
- Work with structured, semi-structured, and unstructured data to extract insights and build data models.
- Implement data storage solutions using Parquet, Delta Lake, and other optimized formats.
- Collaborate with cross-functional teams to understand data requirements and deliver high-quality solutions.
- Ensure data security and compliance with Information Security principles.
- Utilize version control systems like GitHub and follow Gitflow practices.
- Participate in Agile development methodologies including Scrum, XP, and Kanban.
Job Profile
- 10 years of experience with Azure Data Factory, Azure Databricks, Apache Spark (PySpark), and Azure Synapse Analytics
- Strong programming skills in Python, R, or Scala
- Proficient in NoSQL databases such as MongoDB, Cassandra, Neo4j, and Azure Cosmos DB, and in the Gremlin graph query language
- Skilled in traditional RDBMS like SQL Server and Oracle, and MPP systems such as Teradata and Netezza
- Hands-on experience with ETL tools including Informatica, IBM DataStage, and Microsoft SSIS
- Excellent communication and collaboration abilities
- Proven track record of working with large, complex codebases and Agile development teams
- Demonstrated leadership in guiding technical teams and mentoring junior engineers
- Familiar with data governance and data quality frameworks
- Certified in Azure Data Engineering or related technologies
About Capgemini
Capgemini is a global business and technology transformation partner, helping organizations accelerate their dual transition to a digital and sustainable world while creating tangible impact for enterprises and society. It is a responsible and diverse group of 350,000 team members in more than 50 countries. With a strong heritage of over 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions, leveraging strengths from strategy and design to engineering, all fueled by its market-leading capabilities in AI, cloud, and data, combined with its deep industry expertise and partner ecosystem. The Group reported 2023 global revenues of €22.5 billion.
Get The Future You Want | www.capgemini.com