Remote
Posted 11 months ago
Remote opportunity with a Large Financial Client
- Must have designed the E2E architecture of a unified data platform covering all aspects of the data lifecycle: ingestion, transformation, serving, and consumption.
- Must have excellent coding skills in either Python or Scala, preferably Python.
- Must have 10+ years of experience in the Data Engineering domain, with 12+ years of total experience.
- Must have designed and implemented at least 2-3 projects end-to-end in Databricks.
- Must have 3+ years of experience on Databricks, covering the components below (a brief illustrative sketch follows this list):
- Delta Lake
- Databricks Connect (dbConnect)
- Databricks API 2.0
- SQL endpoints (Photon engine)
- Unity Catalog
- Databricks Workflows orchestration
- Security management
- Platform governance
- Data security
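For context, a minimal sketch of how several of these components typically fit together in the ingest, transform, serve, and consume flow described above. All paths, catalog, and table names are illustrative rather than from this posting, and it assumes a Databricks runtime where `spark` is predefined and Delta Lake is available:

```python
from pyspark.sql import functions as F

# Ingest: read raw landing-zone files into a DataFrame (path is hypothetical)
raw = spark.read.json("/mnt/landing/transactions/")

# Transform: a simple cleansing step
clean = (raw.filter(F.col("amount") > 0)
            .withColumn("ingest_date", F.current_date()))

# Serve: persist as a Delta table registered in Unity Catalog,
# using the three-level catalog.schema.table namespace
clean.write.format("delta").mode("append").saveAsTable("finance.curated.transactions")

# Consume: downstream users query the table, e.g. via a SQL endpoint
spark.sql("""
    SELECT ingest_date, SUM(amount) AS total
    FROM finance.curated.transactions
    GROUP BY ingest_date
""").show()
```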
- Must have applied appropriate architectural principles to design the solution best suited to each problem.
- Must be well versed in the Databricks Lakehouse concept and its implementation in enterprise environments.
- Must have a strong understanding of data warehousing and of the governance and security standards around Databricks.
- Must have knowledge of cluster optimization and of Databricks integration with various cloud services.
- Must have a good understanding of how to create complex data pipelines.
- Must be strong in SQL and Spark SQL.
- Must have strong performance optimization skills to improve efficiency and reduce cost.
- Must have designed both batch and streaming data pipelines (see the sketch below).
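A minimal streaming counterpart of the batch flow sketched earlier, assuming Databricks Auto Loader (`cloudFiles`) for incremental file ingestion and a Delta sink; paths, checkpoint location, and table names are again illustrative:

```python
from pyspark.sql import functions as F

# Incrementally ingest new files as they land (Databricks Auto Loader)
stream = (spark.readStream.format("cloudFiles")
               .option("cloudFiles.format", "json")
               .load("/mnt/landing/transactions/"))

# Apply the same cleansing logic, then write to a Delta table
query = (stream.filter(F.col("amount") > 0)
               .writeStream.format("delta")
               .option("checkpointLocation", "/mnt/checkpoints/transactions")  # enables exactly-once recovery
               .trigger(availableNow=True)  # process the backlog, then stop
               .toTable("finance.curated.transactions"))
query.awaitTermination()
```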
- Must have extensive knowledge of the Spark and Hive data processing frameworks.
- Must have worked on at least one cloud (Azure, AWS, or GCP) and its most common services, such as ADLS/S3, ADF/Lambda, Cosmos DB/DynamoDB, ASB/SQS, and cloud databases.
- Must be strong in writing unit and integration tests (a sample unit test is sketched below).
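A minimal sketch of the expected testing style, using pytest with a local SparkSession; `filter_valid_amounts` is a hypothetical transformation under test, not something named in this posting:

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Small local session so tests run without a cluster
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def filter_valid_amounts(df):
    # Transformation under test: keep only positive amounts
    return df.filter(df.amount > 0)

def test_filter_valid_amounts(spark):
    df = spark.createDataFrame([(1, 100.0), (2, -5.0)], ["id", "amount"])
    result = filter_valid_amounts(df)
    assert result.count() == 1
    assert result.first()["id"] == 1
```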
- Must have strong communication skills and must have worked with cross-platform teams.
- Must have a great attitude toward learning new skills and upskilling existing ones.
- Responsible for setting best practices around Databricks CI/CD.
- Must understand composable architecture to take full advantage of Databricks capabilities.
- Good to have REST API knowledge.
- Good to have an understanding of cost distribution.
- Good to have worked on a migration project to build a unified data platform.
- Good to have knowledge of dbt.
- Experience with DevSecOps, including Docker and Kubernetes.
- Experience with full-lifecycle software development methodologies, patterns, frameworks, libraries, and tools.
- Knowledge of programming and scripting languages such as JavaScript, PowerShell, Bash, SQL, Java, Python, etc.
- Experience with data ingestion technologies such as Azure Data Factory, SSIS, Pentaho, Alteryx
- Experience with visualization tools such as Tableau, Power BI
- Experience with machine learning tools such as MLflow, Databricks AI/ML, Azure ML, AWS SageMaker, etc.
- Experience distilling complex technical challenges into actionable decisions for stakeholders, and guiding project teams by building consensus and mediating compromises when necessary.
- Experience coordinating complex system dependencies and interactions.
- Experience in solution delivery using common methodologies, especially SAFe Agile, but also Waterfall, Iterative, etc.
- Demonstrated knowledge of relevant industry trends and standards