Job Title: Data Engineer: Spark/Scala/Python
Job Location: Dubai – UAE
Job Duration: 12 months (extendable)
Job Description:
We are looking for a highly skilled and motivated Spark Data Engineer to join our team. The ideal
candidate will have a strong background in Apache Spark, data ingestion, data processing, and data
integration, and will be responsible for developing and maintaining our dynamic data ingestion framework
built on Spark. The role calls for expertise in building scalable, high-performance, fault-tolerant
data processing pipelines and in optimizing Spark jobs for performance and scalability. It also requires
experience designing and implementing data models, handling data errors, implementing data quality and
validation processes, and integrating Spark applications with other big data technologies in the
Hadoop ecosystem.
Responsibilities:
• Develop and maintain a dynamic data ingestion framework using Apache Spark
• Implement data ingestion pipelines for batch processing and real-time streaming using Spark’s
batch and Structured Streaming APIs.
• Design and implement data models using Spark’s DataFrame and Dataset APIs
• Optimize Spark jobs for performance and scalability, including caching, broadcasting, and data
partitioning techniques.
• Implement error handling and fault tolerance mechanisms to handle data errors, processing
failures, and system failures in Spark applications.
• Implement data quality and validation processes, including data profiling, data cleansing, and data
validation rules built with Spark’s DataFrame transformations.
• Integrate Spark applications with other big data technologies in the Hadoop ecosystem, such as
HDFS, Hive, HBase, and Kafka.
• Ensure data security by implementing data encryption, data masking, and data access controls in
Spark applications.
• Use version control systems, such as Git, for source code management, and implement DevOps
practices, such as continuous integration, continuous delivery, and automated deployments, in
Spark application development workflows.
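As an illustration of the kind of pipeline these responsibilities describe, the Scala sketch below reads a batch source with an explicit schema, applies simple validation rules as DataFrame filters, enriches the data with a broadcast join, and writes partitioned output. This is a minimal sketch, not our actual framework: all paths, column names, and schemas are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}
import org.apache.spark.sql.types._

object IngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ingestion-sketch")
      .getOrCreate()

    // Explicit schema: catch malformed input early instead of relying on inference.
    val orderSchema = StructType(Seq(
      StructField("order_id", LongType,   nullable = false),
      StructField("country",  StringType, nullable = true),
      StructField("amount",   DoubleType, nullable = true)
    ))

    // Paths are placeholders -- a real job would read them from configuration.
    val raw = spark.read.schema(orderSchema).csv("/data/raw/orders")

    // Validation rules expressed as DataFrame filters; rejected rows are
    // routed to a quarantine location for later inspection.
    val valid    = raw.filter(F.col("order_id").isNotNull && F.col("amount") >= 0)
    val rejected = raw.exceptAll(valid)
    rejected.write.mode("append").parquet("/data/quarantine/orders")

    // Broadcast the small dimension table so the join avoids a shuffle.
    val countries = spark.read.parquet("/data/dim/countries")
    val enriched  = valid.join(F.broadcast(countries), Seq("country"), "left")

    // Partition output by a low-cardinality column so downstream reads can prune.
    enriched.write.mode("overwrite")
      .partitionBy("country")
      .parquet("/data/curated/orders")

    spark.stop()
  }
}
```

The same structure extends naturally: more validation rules become additional filter expressions, and caching (`valid.cache()`) helps when a validated DataFrame feeds several downstream writes.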
Qualifications:
• Bachelor’s or master’s degree in Computer Science, Data Engineering, or a related field.
• Strong proficiency in Apache Spark, including Spark Core, Spark SQL, Spark Streaming, and Spark
MLlib, with experience developing and deploying multiple production workloads.
• Proficiency in either Scala or Python, with knowledge of functional programming concepts.
• Experience in developing and maintaining dynamic data ingestion frameworks using Spark
• Experience in data processing, data integration, and data modeling using Spark’s DataFrame and
Dataset APIs
• Knowledge of performance optimization techniques in Spark, including caching, broadcasting, and
data partitioning
• Experience in implementing error handling and fault tolerance mechanisms in Spark applications
• Knowledge of data quality and validation techniques implemented with Spark’s DataFrame and
Dataset APIs.
• Familiarity with other big data technologies in the Hadoop ecosystem, such as HDFS, Hive,
HBase, and Kafka.
• Experience in implementing data security measures in Spark applications, such as data encryption,
data masking, and data access controls.
• Strong problem-solving skills and ability to troubleshoot and resolve issues related to Spark
applications.
• Proficiency in using version control systems, such as Git, and implementing DevOps practices in
Spark application development workflows.
• Excellent communication and collaboration skills, with the ability to work effectively in a
team-oriented environment.
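The streaming, Kafka, and fault-tolerance qualifications above can be illustrated with a minimal Structured Streaming sketch. Checkpointing lets the query resume from its committed offsets after a failure, which is one standard way to meet the fault-tolerance requirement. The broker address, topic name, and paths are placeholders, not real infrastructure.

```scala
import org.apache.spark.sql.SparkSession

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-sketch")
      .getOrCreate()

    // Kafka source; broker and topic names are hypothetical.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "orders")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // The checkpoint location stores progress and offsets, so a restarted
    // query continues where the failed one left off.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/streaming/orders")
      .option("checkpointLocation", "/checkpoints/orders")
      .start()

    query.awaitTermination()
  }
}
```

In practice the checkpoint directory should live on durable shared storage (e.g. HDFS or object storage) so recovery works across executor and driver restarts.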