About the company
We combine the power of Web3 and creativity to build experiences that connect people from all corners of the globe.
Job Summary
Responsibilities:
📍 Identify, propose and execute improvements that address performance and scalability bottlenecks in our current systems and infrastructure on AWS
📍 Measure system health, scalability and performance metrics, and identify areas for improvement
📍 Use your knowledge of code to solve broad operational challenges within Limit Break's Infrastructure and Platform
📍 Work with the wider engineering team to provide the most production-like environment possible for both manual and automated testing
📍 Define SLOs, SLIs, monitoring, alerting and incident response practices
Qualifications:
📍 5+ years of experience in SRE, DevOps or systems engineering
📍 Strong background in Kubernetes
📍 Extensive experience with Terraform and Ansible
📍 CI/CD and automation experience
📍 Strong background in AWS
📍 Ability to participate in an on-call rotation
📍 Effective communication skills, with the ability to clearly explain the reasoning and thought process behind anything you propose
📍 Excellent collaboration skills, with the ability to work closely with product engineers and product owners to understand their context and co-design solutions that balance feature velocity with site reliability
📍 Experience implementing in-house monitoring and observability infrastructure
📍 Experience implementing an Elasticsearch stack or equivalent solution for capturing logs from all environments
📍 Experience working with InfoSec to implement tools that monitor and protect the environment in real time