Job Description
Job Opening: Site Reliability Manager
We are seeking an experienced Site Reliability Manager with 7+ years of hands-on experience in maintaining, optimizing, and scaling infrastructure, ensuring reliability and performance for mission-critical systems. You will collaborate closely with development, operations, and security teams to automate processes, monitor systems, and resolve operational issues while improving the overall user experience.
Work Mode: Hybrid
Employment Type: Full-time
Salary: ₦800,000
Location: Lagos
Responsibilities:
- Maintain the availability and performance of critical infrastructure.
- Implement monitoring, logging, and alerting systems.
- Ensure high uptime and fast response times across services.
- Develop and maintain infrastructure automation tools.
- Drive Infrastructure as Code (IaC) practices (Terraform, Ansible, etc.).
- Implement CI/CD pipelines and automated testing.
- Act as the escalation point for critical system incidents.
- Lead incident response, recovery, and root cause analysis.
- Conduct post-mortems and implement long-term solutions.
- Design scalable systems to grow with demand.
- Optimize systems for cost efficiency and performance improvements.
- Ensure security best practices and compliance with industry standards.
- Manage vulnerability assessments and data protection.
- Work closely with development and product teams to align goals.
- Provide mentorship to junior engineers and promote knowledge sharing.
Requirements:
- 7+ years of experience in site reliability, DevOps, or related engineering roles.
- Experience with security practices and compliance (SOC 2, GDPR).
- Hands-on experience with CI/CD tools like Jenkins, GitLab CI/CD, or GitHub Actions.
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Experience with cloud platforms such as AWS and Azure, and with serverless computing.
- Understanding of networking concepts and protocols (TCP/IP, DNS, HTTP, SSL/TLS).
- Proficient in Linux/Unix system administration and troubleshooting.
- Familiarity with monitoring and logging tools like Prometheus, ELK stack (Elasticsearch, Logstash, Kibana), Grafana or Splunk.
- Infrastructure as Code (IaC) proficiency in Terraform, CloudFormation, or Ansible.
- Strong analytical and problem-solving skills with the ability to troubleshoot complex issues in distributed systems.
- Excellent communication and collaboration skills to work effectively in a cross-functional team environment.
- Database management experience with MySQL, PostgreSQL, MongoDB, etc.
- Ability to work in fast-paced environments and handle multiple projects.
- Leadership and mentorship capabilities.
Qualified candidates should send their CVs to recruitmenthr422@gmail.com