Job Description
Johannesburg – Gauteng – South Africa
- Own uptime, performance, and monitoring for all production applications.
- Manage Heroku pipelines, CI/CD, review apps, and production environments.
- Operate Celery workers and queues, monitor health, and handle missed task check-ins.
- Define and track service level objectives (SLOs) for availability, latency, and task success rate.
- Maintain runbooks and a centralised wiki for incident response, and lead post-mortems.
- Run periodic disaster recovery drills and coordinate penetration tests.
- Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
- Manage daily backups, and ensure restore tests and disaster recovery runbooks are in place.
- Standardise infrastructure (Terraform or scripts for DO/AWS; app.json for Heroku).
- Manage Cloudflare for DNS, edge security, and performance optimisation.
- Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
- Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.
- Maintain CI pipelines with type checking, linting, and security scanning.
- Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
- Support Developers with tooling for local/staging environments and build self-service dashboards (e.g., Celery queue status).
- Collaborate with Developers to streamline workflows and educate them on secure coding practices.
- Own vulnerability management and dependency patching cadence.
- Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
- Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
- Contribute evidence and responses for security questionnaires and SOC 2 audits.
- Maintain a security pack with architecture, sub-processors, and DR/backup processes.
- Configure Sentry ownership rules, Cron Monitors, and release health.
- Centralise metrics/logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus/New Relic).
- Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
- Conduct capacity planning and track resource usage trends.
- Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure service level agreements (SLAs) and performance targets are met.
- Assess new tools/services to enhance platform capabilities (e.g., observability, security).
- Track costs, security posture, and integration quality for all third-party services.
- Cloud Infrastructure Management: 3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
- CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
- Monitoring & incident response: Experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
- Security Fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
- Disaster recovery & backups: Experience implementing and operating automated backups, restore testing, and writing/maintaining incident runbooks.
- Communication & collaboration: Ability to document processes clearly and work closely with Developers in a small team.
- Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
- Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
- Scaling & cost optimisation: Capacity planning, performance tuning, and managing infra spend.
- Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
- Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.
- Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).
- Proficiency in Python; familiarity with Django/Flask.
- Experience with DNS/CDN/edge security (e.g., Cloudflare).
- Observability platforms (Prometheus, Grafana, New Relic).
- Static analysis and code quality tools (mypy, Bandit, SonarQube).
- Prior exposure to multi-tenant SaaS environments.