We are in search of a seasoned Data Engineer who has a strong background in Machine Learning Engineering. In this unique role, you will primarily focus on data engineering tasks while leveraging your ML expertise to collaborate with data scientists for model deployment.
The ideal candidate is someone who excels in building robust data infrastructures and also has a proven track record in helping data teams deploy machine learning models in real-world environments. You'll be working hand-in-hand with our platform team, supporting and maintaining our data infrastructure, and concurrently assisting data scientists in model deployment and monitoring.
Day in the Life
- Design, build, and maintain efficient, reliable ETL pipelines, including complex ones, that process and analyze large volumes of data from various sources.
- Develop and enhance our data lakehouse, driving data quality across departments and building self-service tools for analysts.
- Define, build, and own data architecture for a trusted, governed, dimensionally modeled repository of data.
- Collaborate with cross-functional teams, including data scientists, to deploy and monitor machine learning models in production environments.
- Help data scientists develop and maintain ML API services for seamless integration into the company's infrastructure.
- Apply knowledge of real-time, streaming, and batch processing concepts to optimize model performance and data handling.
- Participate in code and design reviews to maintain high development standards.
Who You Are
- Bachelor's/Master's degree in Computer Science, Data Science, or a related quantitative field.
- Proficiency in Python and sound software engineering practices.
- Proven experience as a Data Engineer, with a solid understanding of SQL and Big Data technologies.
- Expertise in containerization and orchestration technologies like Docker and Kubernetes.
- Knowledge of vector stores, databases, and data warehousing concepts.
- Experience deploying and monitoring ML API services using Flask or FastAPI.
- Strong project management skills, with the ability to collaborate effectively with cross-functional teams.
Preferred Qualifications:
- Experience with Hevo Data or other streaming vendors (Fivetran, Airbyte, DMS)
- Experience with dbt
- Experience with Snowflake or Redshift
- Experience with orchestration tools such as Airflow
- Experience with data catalog solutions such as Atlan
- Experience with Metaflow is a plus
- Experience with cloud platforms such as AWS, GCP, or Azure
- Experience with specialized ML serving tools like BentoML, Seldon Core, Hugging Face Inference, or SageMaker Endpoints is a plus
What We Offer:
- A mission-driven and value-based company dedicated to empowering deskless workers and local businesses
- An early employee opportunity at a Series B hyper-growth startup
- Work with the founding team and industry veterans to accelerate your career
- Competitive salary and equity
- Comprehensive health coverage
- Performance-based year-end bonuses
- Unlimited PTO
- Remote/WFH schedule