Senior Products & Systems Technician
Capgemini
Argentina
Posted on Dec 19, 2024
Brief description
The Senior Site Reliability Engineer (SRE) at EmployBridge will focus on enhancing the reliability, scalability, and performance of cloud infrastructure and services. This role involves overseeing the AWS platform, ensuring high availability and resilience, and driving the implementation of observability practices. The SRE will also manage the Observability and Platforms practice to optimize operational efficiency.
Key Responsibilities:
- Lead system availability, performance, and efficiency efforts through monitoring, capacity planning, and SLO/SLI management.
- Develop and improve the Observability strategy and automate processes for increased system reliability.
- Collaborate with teams to integrate observability tools and deliver insights for system improvement.
- Implement automation to reduce toil and improve productivity.
- Propose architectural changes based on data-driven analysis for better performance and availability.
- Maintain technical documentation, including design specs and best practices.
- Engage in Agile processes and contribute to team collaboration.
Qualifications:
- 7+ years of experience in SRE and observability, with expertise in AWS and Infrastructure as Code (IaC).
- AWS certification required (e.g., Solutions Architect, Developer, SysOps Administrator).
- Strong knowledge of SRE concepts (e.g., SLOs, error budgets, monitoring, incident management).
- Experience with observability tools like Datadog, New Relic, Prometheus, and CloudWatch.
- Expertise in AWS cloud infrastructure, Kubernetes, and CI/CD practices.
- Proficiency in programming and scripting languages such as Python, PowerShell, and Bash.
- Excellent communication and mentoring skills, with the ability to influence teams across the organization.