Senior Database Reliability Engineer

ClickUp • San Diego, California, United States • 1w ago

We are looking for driven and innovative software engineers with a strong site reliability engineering (SRE) discipline, or an interest in this area, to help us make ClickUp the "one app to rule them all". As an SRE at ClickUp, your primary role will be improving the stability, availability, and reliability of the globally distributed, cloud-based infrastructure that powers our app for thousands of users daily. If you are a rockstar engineer with an entrepreneurial, fast-paced mindset who is ready to own, drive, and tackle some of the most complex problems out there, we would love to hear from you!

What you'll do:

Work on database reliability and performance aspects for ClickUp from within Platform Engineering
Participate in the on-call support rotation
Implement best practices for our PostgreSQL database cluster and its components
Work on observability of relevant database metrics and make sure we reach our database objectives
Provide database expertise to engineering teams (review of database migrations, queries and performance optimizations)
Work on automation of database infrastructure and help engineering succeed by providing self-service tools
Plan the growth and manage the capacity of ClickUp's database infrastructure
Design, build and maintain core database infrastructure pieces that allow ClickUp to scale
Support and debug database production issues across services and levels of the stack
Make sure monitoring and alerting fire on symptoms and SLOs rather than on outages
Document every action so your learnings turn into repeatable actions, then into automation
Run blameless RCAs on incidents and outages, relentlessly looking for answers that will prevent the incident from ever happening again
Work with peer DBAs/DBREs to roll out changes to our production environment
Help mitigate database-related production incidents

What we’re looking for:

5 years of experience running PostgreSQL in large production environments.
2 years of experience with infrastructure automation and configuration management (Chef, Ansible, Puppet, Terraform…).
Experience managing a database of at least 100 GB in a high-throughput OLTP environment
2 years of experience with any programming language or advanced scripting (e.g. Python, Bash)
Knowledge of Linux and/or the Unix Shell
Have solid knowledge of SQL and PL/pgSQL
Have solid knowledge of the internals of PostgreSQL.
Have experience working in a distributed production environment.
Share ClickUp’s core values and work in accordance with those values.
Have excellent written and verbal English communication skills.
Have great collaboration skills and can communicate asynchronously.
Are proactive: when you see something broken, you can't help but fix it!
Have an urge to deliver quickly and iterate fast.
Have a passion for stable and secure systems management practices.
Cloud technology experience (AWS, Azure, GCP)
Some data modeling and design skills
Show ownership of the relational database/PostgreSQL ecosystem
Bonus if you know non-relational database technologies (DynamoDB, Redis, etc.)