Data Engineer
Job Description
- Build pipelines to bring in a wide variety of data from multiple sources within the organization, as well as from social media and public data sources.
- Collaborate with cross-functional teams to source data and make it available for downstream consumption.
- Work with the team to provide an effective solution design to meet business needs.
- Communicate regularly with key stakeholders to understand any concerns about how the initiative is being delivered, and any risks or issues that have not yet been identified or are not being progressed.
- Ensure dependencies and challenges (risks) are escalated and managed. Escalate critical issues to the Sponsor and/or Head of Data Engineering.
- Ensure timelines (milestones, decisions and delivery) are managed and the value of the initiative is achieved, within budget and without compromising quality.
- Ensure an appropriate and coordinated communications plan is in place for initiative execution and delivery, both internal and external.
Job Requirements
- Work as a team player
- Excellent problem analysis skills
- Experience with at least one cloud infrastructure provider (Azure / AWS)
- Experience in building batch data pipelines with Apache Spark (Spark SQL, DataFrame API) or Hive query language (HQL)
- Experience in building streaming data pipelines using Apache Spark Structured Streaming or Apache Flink with Kafka and Delta Lake
- Knowledge of NoSQL databases; experience with Cosmos DB and GraphQL is a plus
- Knowledge of big data ETL processing tools
- Experience with Hive and Hadoop file formats (Avro / Parquet / ORC)
- Basic knowledge of scripting (shell / bash)
- Experience working with multiple data sources, including relational databases (SQL Server / Oracle / DB2 / Netezza), NoSQL / document databases, and flat files