Algolia is set to enable every company to create world-class Search and Discovery experiences with an API-first approach. Performance and Scalability are at the heart of our mission: we power 1.5 trillion searches a year for 10K+ customers all over the world.
If you're a problem solver, able to think outside the box and eager to nurture others and learn from them, then this is your challenge!
The Team
The Platform as a Service (PaaS) team is dedicated to empowering development teams by creating toolchains, guidelines, and standards. Our focus is on enabling seamless automation and CI/CD, comprehensive observability, and unwavering reliability in a secured cloud-native environment.
The Opportunity
The Senior Site Reliability Engineer (IC4) position within the Platform as a Service team presents an exciting opportunity for a seasoned professional to enhance scalable infrastructure with a focus on CI/CD, Observability, and application hosting. In this role, you will bridge the gap between our junior and senior staff, playing a critical role in ensuring the reliability, scalability, and performance of Algolia’s Search Products. As a senior contributor, you will be responsible for building and optimizing systems that ensure the platform’s efficiency and reliability, while also mentoring junior engineers and collaborating across teams. Your work will be pivotal in improving infrastructure, enhancing observability standards, and streamlining CI/CD processes. You will play a significant role in transitioning legacy systems to a modern Kubernetes-based architecture, contributing to long-term infrastructure strategies, and ensuring alignment with business needs.
Your role will consist of:
- CI/CD Development and Maintenance: Contribute to the design, optimization, and maintenance of the CI/CD pipelines to improve the
speed, reliability, and efficiency of the development lifecycle. Assist in driving standardization across various services hosted on the
platform.
- Observability Enhancement: Lead efforts to improve the observability of critical systems, working closely with cross-functional teams to
ensure actionable monitoring and alerting frameworks are in place. Help troubleshoot complex issues and optimize system reliability.
- Kubernetes and Cloud Management: Contribute to the development and operation of our Kubernetes-based architecture. Ensure
systems are resilient, scalable, and optimized for performance. Actively participate in enhancing cloud-based solutions for API
management and microservices.
- System Optimization and Scaling: Collaborate with team members to ensure system scalability, operability, and performance. Lead
initiatives to optimize resource utilization, focusing on cost efficiency while maintaining high system availability.
- Mentorship and Knowledge Sharing: Mentor mid-level engineers (IC3) by providing guidance on technical challenges and SRE best
practices. Support team growth by fostering knowledge-sharing sessions and helping establish processes that drive operational
excellence.
- Cross-Team Collaboration: Work closely with product, software, and other SRE teams to ensure that platform goals align with broader
business objectives. Drive initiatives aimed at enhancing platform stability, security, and scalability.
You might be a fit if you have:
- Strong Programming Skills: Proficient in Golang and Python with a solid understanding of software craftsmanship. Knowledge of Ruby
is a plus.
- Experience in CI/CD Pipelines: Hands-on experience in building and maintaining CI/CD pipelines using tools like GitHub Actions,
CircleCI, or alternatives. Familiarity with best practices for ensuring build and deployment reliability.
- Observability: Experience designing and implementing monitoring, alerting, and observability frameworks that provide actionable
insights. Strong troubleshooting skills in production environments.
- Kubernetes and Cloud Infrastructure: Proven experience in managing and optimizing Kubernetes-based architectures and working
with public cloud providers such as GCP, AWS, or Microsoft Azure.
- Distributed Systems Expertise: Experience in designing, building, and operating distributed systems at scale, with a focus on reliability,
availability, and performance.
- Mentorship and Leadership: Experience mentoring junior engineers and helping them grow. Ability to collaborate with cross-functional
teams and contribute to strategic initiatives.
- Problem-Solving Skills: Ability to independently solve complex technical problems with minimal supervision while collaborating
effectively with other team members.
- Excellent Communication and Organizational Skills: Strong ability to communicate complex technical issues to both technical and
non-technical audiences. Ability to organize and prioritize multiple projects.
We’re looking for someone who can live our values:
GRIT - Problem-solving and perseverance capability in an ever-changing and growing environment.
TRUST - Willingness to trust our co-workers and to take ownership.
CANDOR - Ability to receive and give constructive feedback.
CARE - Genuine care about other team members, our clients and the decisions we make in the company.
HUMILITY - Aptitude for learning from others, putting ego aside.