Site Reliability Engineering Manager
Posted on 2025-02-01
Job Summary
As the Site Reliability Engineering Manager, you will oversee the SRE team and work closely with engineering, product, and infrastructure teams to ensure the continuous operation of our platform. You will be responsible for defining and driving the SRE strategy, implementing best practices, and ensuring that the systems we deliver are highly reliable, scalable, and resilient. Your leadership will be essential in creating a culture of operational excellence across the organization.
Key Responsibilities:
- Lead, mentor, and develop a high-performing SRE team, fostering a culture of collaboration, accountability, and innovation. Shape the overall strategy for site reliability engineering across the company.
- Own the availability, performance, and scalability of our production systems. Work proactively to identify potential issues and eliminate risks to ensure a seamless user experience.
- Lead post-incident reviews and root cause analyses to drive continuous improvement. Implement and refine incident management processes and workflows.
- Champion the development and implementation of automation tools to improve system reliability, reduce manual intervention, and enable faster recovery. Oversee the implementation of monitoring and alerting systems to ensure proactive issue detection.
- Partner with software engineers, infrastructure teams, and product teams to design, build, and maintain systems that align with our high standards for availability, scalability, and performance.
- Establish and drive best practices for system reliability, testing, and incident response. Regularly evaluate and enhance existing processes and tools.
- Drive capacity planning and scaling strategies to meet the demands of our growing user base and business needs. Ensure that the system architecture is built to support future growth.
- Ensure systems are secure and compliant with industry standards, safeguarding user data and privacy.
Required Qualifications:
- BTech, Degree, Master's, or PhD in Computer Science, Information Technology, Information Systems, Computer Engineering or related fields.
Experience:
- BTech in Computer Science, Information Technology, Information Systems, Computer Engineering or related fields coupled with 13 years of relevant working experience; or
- Degree in Computer Science, Information Technology, Information Systems, Computer Engineering or related fields coupled with 9 years of relevant working experience; or
- Master's Degree in Computer Science, Information Technology, Information Systems, Computer Engineering or related fields coupled with 7 years of relevant working experience; or
- PhD in Computer Science, Information Technology, Information Systems, Computer Engineering or related fields coupled with 5 years of relevant working experience.
- Computer and network infrastructure implementation
- IT service, operations and management, including significant responsibility over Service Level Agreements
- IT infrastructure or software team leadership
- IT Architecture and Governance
- Project management
- IT systems engineering, application support, and user management
- IT governance and security
- Data governance and security
- IT availability, resilience and redundancy
- Systems analysis, design and engineering
- Experience in supporting distributed software systems in a production environment such as Cloud and/or Data Centres
- Procurement and IT asset management
Skills:
Essential:
- Experience working with Linux and within the Open Source Software Ecosystem
- Experience with DevOps tools, processes and culture.
- Experience and/or certification and knowledge in SRE, ITIL or related IT Management processes.
- Experience supporting and maintaining large-scale High-Performance Computing (HPC) and storage systems.
- Advanced experience with programming and/or scripting languages such as Python
If you’re ready to make a lasting impact in the human capital development space and have the experience and passion to drive our site reliability initiatives, apply now!
Site Reliability Engineering Manager position available in the Western Cape. This position was posted by One Connect Solutions on 2025-02-01 at 01:21:34 in the IT Computer category.