The opportunity
Altruist is in the midst of an exceptional growth phase, and we’re excited to hire a Staff DevOps Engineer to join our growing team. We’re open to Senior and Principal level candidates as well. In this role, you’ll demonstrate expertise in managing and scaling complex infrastructure, leading automation initiatives, and implementing advanced SRE practices. With a strong background in Kubernetes, AWS services, and Infrastructure as Code (IaC), you will play a pivotal role in ensuring the reliability, scalability, and security of our systems. This role requires a proactive problem solver who can mentor junior engineers, lead teams through technical initiatives, and drive innovative DevOps solutions.
Your impact
- Architect, deploy, and maintain scalable and secure infrastructure with Kubernetes (EKS) and AWS services.
- Lead root cause analysis and implement comprehensive solutions to prevent future incidents.
- Establish advanced monitoring, logging, and alerting systems using tools like Prometheus and Datadog.
- Design and optimize complex CI/CD pipelines with GitLab, ensuring seamless deployment workflows.
- Champion Site Reliability Engineering (SRE) best practices to enhance system resilience.
- Oversee and optimize network configurations, security groups, VPCs, and other cloud networking elements.
- Mentor and guide junior DevOps engineers, fostering a culture of collaboration and continuous learning.
- Design and implement disaster recovery strategies, including automated failover mechanisms.
- Drive resource optimization initiatives to improve performance and reduce costs.
- Apply AWS enterprise security best practices and frameworks.
- Serve as a senior escalation point during on-call rotations and incident management.
What you bring
- Experience - 8+ years of experience working in DevOps or a related field.
- Mastery of the Kubernetes (K8s) ecosystem (Helm, Vault, Docker, CNIs).
- Prior experience as a technical lead.
- Extensive experience with AWS services such as RDS and CloudWatch, plus Kafka and advanced networking (Transit Gateways, Load Balancing, DNS).
- Expertise in Infrastructure as Code (IaC) using Terraform and other tools.
- Strong hands-on experience with monitoring tools like Prometheus and Datadog.
- Advanced proficiency in GitLab CI/CD and automating complex pipelines.
- Proficiency in Linux environments, shell scripting, and at least one modern programming language.
- In-depth knowledge of system architecture and distributed systems design.
- Proven experience designing and implementing disaster recovery infrastructure.
- Education - Ideally a B.A. or B.S. degree in a relevant field such as Computer Science.
- Technical aptitude - You’re technologically savvy and can easily get up to speed on modern tech stacks.
- Ownership - The pride you put into every aspect of your work is unparalleled and undeniable.
- Superb communication - Intentional dialogue is a superpower. You listen as well as you share your perspective with others.
- Resilience - We’re inspired by your unwavering determination to achieve success, no matter the adversity you face along the way.
- Assurance - Your confidence is brilliant, yet ego-less. You possess a strong knowledge base, the ability to discover the unknown, and are open to differing perspectives.
- Creative problem solving - Identifying the problem is simply not enough. You’re instinctively creative in your approach to finding solutions to roadblocks.