Job Description
Johannesburg – Gauteng – South Africa
- Own uptime, performance, and monitoring for all production applications.
- Manage Heroku pipelines, CI/CD, review apps, and production environments.
- Operate Celery workers and queues, monitor health, and handle missed task check-ins.
- Define and track service level objectives (SLOs) for availability, latency, and task success rate.
- Maintain runbooks and a centralised wiki for incident response, and lead post-mortems.
- Run periodic disaster recovery drills and coordinate penetration tests.
- Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
- Manage daily backups and ensure restore tests and disaster recovery runbooks are in place.
- Standardise infrastructure (Terraform or scripts for DO/AWS; app.json for Heroku).
- Manage Cloudflare for DNS, edge security, and performance optimisation.
- Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
- Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.
- Maintain CI pipelines with type checking, linting, and security scanning.
- Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
- Support Developers with tooling for local/staging environments and build self-service dashboards (e.g., Celery queue status).
- Collaborate with Developers to streamline workflows and educate on secure coding practices.
- Own vulnerability management and dependency patching cadence.
- Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
- Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
- Contribute evidence and responses for security questionnaires and SOC 2 audits.
- Maintain a security pack with architecture, sub-processors, and DR/backup processes.
- Configure Sentry ownership rules, Cron Monitors, and release health.
- Centralise metrics/logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus/New Relic).
- Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
- Conduct capacity planning and track resource usage trends.
- Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure service level agreements (SLAs) and performance targets are met.
- Assess new tools/services to enhance platform capabilities (e.g., observability, security).
- Track costs, security posture, and integration quality for all third-party services.
- Cloud Infrastructure Management: 3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
- CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
- Monitoring & incident response: Experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
- Security Fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
- Disaster recovery & backups: Experience implementing and operating automated backups, restore testing, and writing/maintaining incident runbooks.
- Communication & collaboration: Ability to document processes clearly and work closely with Developers in a small team.
- Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
- Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
- Scaling & cost optimisation: Capacity planning, performance tuning, and managing infra spend.
- Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
- Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.
- Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).
- Proficiency in Python; familiarity with Django/Flask.
- Experience with DNS/CDN/edge security (e.g., Cloudflare).
- Observability platforms (Prometheus, Grafana, New Relic).
- Static analysis and code quality tools (mypy, Bandit, SonarQube).
- Prior exposure to multi-tenant SaaS environments.