Job Description
- Infrastructure Design and Management
  - Design, deploy, and maintain cloud infrastructure using platforms like Microsoft Azure, AWS, or Google Cloud.
  - Ensure high availability, scalability, and fault tolerance of applications and services, including managing containerized environments.
  - Utilize Rancher for managing Kubernetes clusters, ensuring efficient deployment and orchestration of containerized applications across various environments.
- Automation and CI/CD Pipeline Development
  - Build and maintain continuous integration/continuous deployment (CI/CD) pipelines for automated testing, building, and deployment of software.
  - Automate infrastructure provisioning using tools like Terraform, Ansible, or ARM templates.
  - Leverage Rancher’s CI/CD capabilities and integrations to streamline the deployment process for containerized applications in Kubernetes environments.
- Monitoring and Performance Optimization
  - Implement and maintain monitoring, logging, and alerting solutions to track application and infrastructure performance, using tools such as Azure Monitor, Prometheus, or CloudWatch.
  - Use Rancher’s built-in monitoring tools to observe Kubernetes clusters and containers, ensuring that applications are performing optimally.
  - Optimize infrastructure for cost-efficiency and performance, configuring autoscaling and resource management within Rancher-managed Kubernetes clusters.
- Security and Compliance
  - Ensure the security of cloud infrastructure by configuring firewalls, access controls, and encryption for sensitive data.
  - Implement security best practices to maintain compliance with industry regulations and standards, including role-based access control (RBAC) within Rancher for managing Kubernetes security.
  - Monitor for security vulnerabilities, manage container security, and perform regular security audits of Kubernetes clusters and cloud resources.
- Collaboration with Development and Operations Teams
  - Work closely with software development teams to understand application requirements and provide the necessary infrastructure support, particularly for containerized workloads.
  - Collaborate with operations teams to ensure the smooth operation of deployed services, particularly within containerized environments managed through Rancher.
- Incident Management and Troubleshooting
  - Investigate and resolve platform-related issues, including application outages, network failures, and security incidents.
  - Utilize Rancher’s centralized logging and monitoring to quickly identify and troubleshoot issues within Kubernetes clusters.
  - Provide on-call support and contribute to incident response strategies, ensuring minimal downtime and fast recovery of services.
- System Upgrades and Patching
  - Manage platform updates, patches, and upgrades to ensure systems remain secure and up-to-date.
  - Plan and execute Kubernetes cluster upgrades and Rancher version updates to stay current with new features and security patches.
  - Ensure that containerized applications remain compatible and functional after updates.
- Documentation and Knowledge Sharing
  - Maintain clear, comprehensive documentation of infrastructure configurations, deployment processes, and troubleshooting procedures.
  - Share knowledge of Rancher, Kubernetes, and cloud infrastructure best practices with team members to improve platform operations and efficiency.
- Capacity Planning and Scaling
  - Monitor resource usage and plan for capacity scaling to meet changing business and application demands.
  - Implement scaling strategies for Kubernetes clusters in Rancher, including auto-scaling of pods, nodes, and applications to accommodate varying workloads.
- Cost Management and Optimization
  - Track and analyze cloud resource usage and costs to ensure efficient resource allocation.
  - Optimize cloud spending by implementing best practices like reserved instances, spot instances, and resource rightsizing.
  - Use Rancher to monitor the resource consumption of containerized applications and optimize the deployment of Kubernetes clusters to reduce infrastructure costs.
- Disaster Recovery and Backup Planning
  - Implement disaster recovery strategies and data backup solutions to minimize downtime and data loss.
  - Regularly test backup systems and recovery procedures to ensure reliability in case of failure, including implementing backup solutions for Kubernetes environments managed through Rancher.
Essential Skills
- Several years (typically 3-5 years) of experience in a related field (e.g., systems engineering, DevOps, infrastructure engineering).
- Bachelor’s degree in Computer Science or a related field, and/or certifications such as Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Red Hat Certified Engineer (RHCE).
- Strong verbal and written communication skills, with the ability to convey complex ideas clearly and effectively.
- Experience working collaboratively in cross-functional teams, with a focus on achieving shared goals.
- Expertise in managing multiple projects simultaneously, with a track record of delivering on time and within scope.
- Exceptional attention to detail, ensuring high standards of quality in all outputs.
- Ability to adapt quickly to changing environments and priorities, maintaining effectiveness in dynamic situations.
- Skills in designing highly available and fault-tolerant systems, ensuring platforms are resilient under various conditions.
- Proven working experience with tools like Prometheus, Grafana, Datadog, New Relic, or ELK stack to monitor the health of infrastructure, applications, and services.
- Excellent skills in identifying, diagnosing, and resolving infrastructure issues quickly, especially when systems fail or behave unexpectedly.
- Knowledge of securing infrastructure and applications, including role-based access control (RBAC), encryption, and network security.
- A solid understanding of Git for source code management, collaboration, and version control is essential.
- A strong understanding of container orchestration with Kubernetes, particularly in deploying, managing, and scaling containerized applications across multi-cluster environments.
- Proficiency in configuring and maintaining Rancher for cluster management, along with expertise in implementing security policies, monitoring, and logging within Kubernetes clusters, is essential for optimizing containerized workloads and ensuring high availability, security, and performance.
- A deep understanding of cloud infrastructure management, such as provisioning and configuring virtual machines, networking, and storage solutions, and implementing security best practices with services like Azure Active Directory and network security groups.
- Proficiency in automation and CI/CD pipelines using Azure DevOps, along with expertise in Azure monitoring tools like Azure Monitor and Application Insights, is crucial for ensuring high availability, security, cost optimization, and efficient deployment of applications in the cloud.
- Strong focus on automating manual processes and optimizing workflows for more efficient system management.
- Experience with designing distributed systems, microservices, and understanding the trade-offs between performance, consistency, and scalability.
Please call us on (***)***-**** for more information.
NB: Should you not hear from us within 6 weeks, please consider your application unsuccessful.