Description
About Us:
We are committed to driving innovation and efficiency through the power of data. We are looking for a Data Engineer who is passionate about data technologies and eager to work on cutting-edge solutions that fuel our decision-making processes and strategic initiatives.
Role and Responsibilities:
As a Data Engineer, you will play a pivotal role in designing, implementing, and maintaining scalable data pipelines and architectures. Your expertise will enhance our data ecosystem, ensuring data quality, accessibility, and timeliness for business-critical processes. Your key responsibilities include:
- ETL/ELT Development:
- Build, automate, and maintain ETL/ELT pipelines to process and transform high-volume datasets from diverse sources using best practices and modern technologies.
- Data Cleansing and Enrichment:
- Develop methods to clean and validate data, ensuring accuracy and consistency. Enrich data to add value and support advanced analytics and business intelligence.
- Big Data Engineering:
- Architect and manage large-scale data processing systems using distributed computing platforms and frameworks such as Databricks, Apache Spark, and Hadoop to handle big data challenges effectively.
- Data Pipelines and Streaming:
- Design and implement robust, scalable, and efficient data pipelines, utilizing tools such as Apache Kafka for real-time data streaming and processing.
- Optimization and Performance Tuning:
- Optimize data architectures for performance and cost, implementing best practices for data storage, retrieval, and throughput.
- Collaboration and Leadership:
- Work closely with data analysts, data scientists, and cross-functional teams to meet data requirements and support projects. Mentor junior engineers and contribute to a culture of continuous improvement.
Key Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Information Technology, or a related field.
- Experience in data engineering, with a proven track record of designing and implementing data solutions in complex environments.
Must-Have Skills and Tools:
- Databricks and Apache Spark: Extensive experience with Databricks and Spark for large-scale batch and stream processing.
- Apache Hadoop: Proficiency with the Hadoop ecosystem (HDFS, MapReduce, Hive) for big data storage and processing.
- Apache Kafka: Hands-on experience in deploying Kafka for scalable event streaming and data integration solutions.
- SQL and NoSQL Databases: Strong skills in both SQL (e.g., PostgreSQL, MySQL) and NoSQL (e.g., MongoDB, Cassandra) databases for diverse data storage needs.
Good to Have:
- Experience with cloud platforms such as AWS, Azure, or Google Cloud, utilizing services like AWS Glue, Azure Data Factory, or Google BigQuery.
- Familiarity with containerization and orchestration technologies like Docker and Kubernetes.
- Knowledge of data warehousing and data lake architectures.
- Understanding of CI/CD processes and DevOps principles for data engineering.
- Certifications in relevant technologies or cloud platforms.