Johannesburg: Big Data Data Engineer posted by PBT Group
Job Description
We are seeking a skilled Data Engineer to design and develop scalable data pipelines that ingest raw, unstructured JSON data from source systems and transform it into clean, structured datasets within our Hadoop-based data platform. The ideal candidate will play a critical role in enabling data availability, quality, and usability by engineering the movement of data from the Raw Layer to the Published and Functional Layers.
Key Responsibilities:
- Design, build, and maintain robust data pipelines to ingest raw JSON data from source systems into the Hadoop Distributed File System (HDFS).
- Transform and enrich unstructured data into structured formats (e.g., Parquet, ORC) for the Published Layer using tools like PySpark, Hive, or Spark SQL.
- Develop workflows to further process and organize data into Functional Layers optimized for business reporting and analytics.
- Implement data validation, cleansing, schema enforcement, and deduplication as part of the transformation process.
- Collaborate with Data Analysts, BI Developers, and Business Users to understand data requirements and ensure datasets are production-ready.
- Optimize ETL/ELT processes for performance and reliability in a large-scale distributed environment.
- Maintain metadata, lineage, and documentation for transparency and governance.
- Monitor pipeline performance and implement error handling and alerting mechanisms.
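For illustration only, the Raw-to-Published step described above (validation, cleansing, schema enforcement, and deduplication) could be sketched as follows. This is a minimal plain-Python sketch, not the employer's actual pipeline; in practice this logic would more likely run in PySpark or Hive. The field names, the `SCHEMA` mapping, and the `enforce_schema` and `to_published` helpers are all hypothetical.

```python
import json

# Hypothetical target schema for the Published Layer: field name -> type.
SCHEMA = {"event_id": str, "user_id": int, "amount": float}


def enforce_schema(record):
    """Keep only the declared fields and coerce each to its declared type."""
    clean = {}
    for name, cast in SCHEMA.items():
        value = record[name]       # KeyError if the field is missing
        if value is None:
            raise ValueError(f"null value for {name}")
        clean[name] = cast(value)  # ValueError/TypeError if not coercible
    return clean


def to_published(raw_lines):
    """Turn raw JSON lines into clean, deduplicated records.

    Returns (published, rejected); rejected holds the raw lines that
    failed parsing or schema enforcement so they can be inspected later.
    """
    published, rejected, seen = [], [], set()
    for line in raw_lines:
        try:
            clean = enforce_schema(json.loads(line))
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
            continue
        # Deduplicate on the (assumed) business key, keeping the first record.
        if clean["event_id"] in seen:
            continue
        seen.add(clean["event_id"])
        published.append(clean)
    return published, rejected


# Made-up sample input standing in for the Raw Layer.
raw = [
    '{"event_id": "e1", "user_id": "42", "amount": "19.99", "extra": "x"}',
    '{"event_id": "e1", "user_id": 42, "amount": 19.99}',  # duplicate key
    '{"event_id": "e2", "user_id": "abc", "amount": 5}',   # bad type
    'not json',                                            # unparseable
]
published, rejected = to_published(raw)
```

Rejected lines are collected rather than silently dropped, which is one simple way to support the error handling and alerting mentioned in the list above.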
Technical Skills & Experience:
- 3+ years of experience in data engineering or ETL development within a big data environment.
- Strong experience with Hadoop ecosystem tools: HDFS, Hive, Spark, YARN, and Sqoop.
- Proficiency in PySpark, Spark SQL, and HQL (Hive Query Language).
- Experience working with unstructured JSON data and transforming it into structured formats.
- Solid understanding of data lake architectures: Raw, Published, and Functional layers.
- Familiarity with workflow orchestration tools like Airflow, Oozie, or NiFi.
- Experience with schema design, data modeling, and partitioning strategies.
- Comfortable with version control tools (e.g., Git) and CI/CD processes.
Nice to Have:
- Experience with data cataloging and governance tools (e.g., Apache Atlas, Alation).
- Exposure to cloud-based Hadoop platforms like AWS EMR, Azure HDInsight, or GCP Dataproc.
- Experience with containerization (e.g., Docker) and/or Kubernetes for pipeline deployment.
- Familiarity with data quality frameworks (e.g., Deequ, Great Expectations).
Qualifications:
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field.
- Relevant certifications (e.g., Cloudera, Databricks, AWS Big Data) are a plus.
* In order to comply with the POPI Act, for future career opportunities, we require your permission to maintain your personal details on our database. By completing and returning this form, you give PBT your consent.
* If you have not received any feedback after 2 weeks, please consider your application unsuccessful.