Site Reliability Manager

IT
November 4, 2024
800,000 / month

Job Description

Job Opening: Site Reliability Manager

We are seeking an experienced Site Reliability Manager with 7+ years of hands-on experience maintaining, optimizing, and scaling infrastructure to keep mission-critical systems reliable and performant. You will collaborate closely with development, operations, and security teams to automate processes, monitor systems, and resolve operational issues while improving the overall user experience.

Work Mode: Hybrid
Employment Type: Full-time
Salary: ₦800,000 per month
Location: Lagos

Responsibilities:

  • Maintain the availability and performance of critical infrastructure.
  • Implement monitoring, logging, and alerting systems.
  • Ensure high uptime and fast response times across services.
  • Develop and maintain infrastructure automation tools.
  • Drive Infrastructure as Code (IaC) practices (Terraform, Ansible, etc.).
  • Implement CI/CD pipelines and automated testing.
  • Act as the escalation point for critical system incidents.
  • Lead incident response, recovery, and root cause analysis.
  • Conduct post-mortems and implement long-term solutions.
  • Design scalable systems to grow with demand.
  • Optimize systems for cost efficiency and performance.
  • Ensure security best practices and compliance with industry standards.
  • Manage vulnerability assessments and data protection.
  • Work closely with development and product teams to align goals.
  • Provide mentorship to junior engineers and promote knowledge sharing.

Requirements:

  • 7+ years of experience in site reliability, DevOps, or related engineering roles.
  • Experience with security practices and compliance (SOC2, GDPR).
  • Hands-on experience with CI/CD tools like Jenkins, GitLab CI/CD, or GitHub Actions.
  • Familiarity with containerization and orchestration tools like Docker and Kubernetes.
  • Experience with cloud platforms such as AWS and Azure, and with serverless computing.
  • Understanding of networking concepts and protocols (TCP/IP, DNS, HTTP, SSL/TLS).
  • Proficiency in Linux/Unix system administration and troubleshooting.
  • Familiarity with monitoring and logging tools like Prometheus, the ELK stack (Elasticsearch, Logstash, Kibana), Grafana, or Splunk.
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible.
  • Strong analytical and problem-solving skills with the ability to troubleshoot complex issues in distributed systems.
  • Excellent communication and collaboration skills to work effectively in a cross-functional team environment.
  • Database management experience with MySQL, PostgreSQL, MongoDB, or similar systems.
  • Ability to work in fast-paced environments and handle multiple projects.
  • Leadership and mentorship capabilities.

Qualified candidates should send their CVs to recruitmenthr422@gmail.com