Job Description
Canonical is hiring a Senior Site Reliability Engineer to support and scale mission-critical cloud infrastructure used by global customers. This role is fully remote and suited to an experienced engineer with strong Python development skills and a deep interest in open-source infrastructure, from bare metal to containers. The position operates in a high-availability, high-pressure environment where reliability, security, and performance are essential.
Key Responsibilities
- Design, operate, and improve large-scale cloud infrastructure using OpenStack, Kubernetes, and software-defined storage.
- Apply DevSecOps best practices across the full infrastructure stack, from bare metal to applications.
- Develop and maintain automation and tooling using Python to improve reliability and operational efficiency.
- Monitor, troubleshoot, and resolve incidents affecting mission-critical production systems.
- Lead and execute infrastructure upgrades to ensure systems remain secure, stable, and up to date.
- Collaborate with distributed engineering teams to enable DevSecOps for applications running on managed infrastructure.
Job Requirements
- Bachelor’s degree in Software Engineering, Computer Science, or a related field.
- Strong experience with Linux systems, including networking and storage concepts.
- Proven Python software development expertise.
- Hands-on operational experience supporting production environments.
- Excellent communication, problem-solving, and collaboration skills.
- Ability to work under pressure while supporting globally distributed services.
- Willingness to travel internationally up to twice a year for company events (up to two weeks per trip).
Nice-to-Have Skills
- Experience deploying or operating OpenStack and/or Kubernetes environments.
- Familiarity with public or private cloud platforms and cloud management tools.
