We are looking for driven, innovative software engineers with a strong site reliability engineering (SRE) discipline, or an interest in this area, to help us make ClickUp the "one app to rule them all". As an SRE at ClickUp, your primary role will be to improve the stability, availability and reliability of the globally distributed, cloud-based infrastructure that powers our app for thousands of users daily. If you are a rockstar engineer with an entrepreneurial, fast-paced mindset who is ready to own, drive and tackle some of the most complex problems out there, we would love to hear from you!
What you'll do:
- Work on database reliability and performance aspects for ClickUp from within Platform Engineering
- Participate in the on-call support rotation
- Implement best practices for our PostgreSQL database cluster and its components
- Work on observability of relevant database metrics and make sure we reach our database objectives
- Provide database expertise to engineering teams (review of database migrations, queries and performance optimizations)
- Work on automation of database infrastructure and help engineering succeed by providing self-service tools
- Plan the growth and manage the capacity of ClickUp's database infrastructure
- Design, build and maintain core database infrastructure pieces that allow ClickUp to scale
- Support and debug database production issues across services and levels of the stack
- Make monitoring and alerting fire on symptoms and SLOs rather than on outages
- Document every action so your learnings turn into repeatable actions, then into automation
- Run blameless RCAs on incidents and outages, relentlessly looking for answers that will prevent the incident from ever happening again
- Work with peer DBAs/DBREs to roll out changes to our production environment
- Help mitigate database-related production incidents
What we’re looking for:
- 5 years of experience running PostgreSQL in large production environments
- 2 years of experience with infrastructure automation and configuration management (Chef, Ansible, Puppet, Terraform…)
- Experience managing a database of at least 100 GB in a high-throughput OLTP environment
- 2 years of experience with any programming language or advanced scripting (e.g. Python, Bash)
- Knowledge of Linux and/or the Unix Shell
- Have solid knowledge of SQL and PL/pgSQL
- Have solid knowledge of the internals of PostgreSQL
- Have experience working in a distributed production environment
- Share ClickUp’s core values and work in accordance with those values
- Have excellent written and verbal English communication skills
- Have great collaboration skills and can communicate asynchronously
- Are proactive: when you see something broken, you can't help but fix it!
- Have an urge to deliver quickly and iterate fast
- Have a passion for stable and secure systems management practices
- Cloud technology experience with AWS, Azure or GCP
- Some data modeling and design skills
- Show ownership of the relational database/PostgreSQL ecosystem
- Bonus if you know non-relational database technologies (DynamoDB, Redis, etc.)