About the company
Gemini is a regulated cryptocurrency exchange, wallet, and custodian that makes it simple and secure to buy bitcoin, ether, and other cryptocurrencies.
Job Summary
Responsibilities:
• Lead and manage a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and operational excellence.
• Develop and execute the SRE team's strategic goals, objectives, and roadmap in alignment with the overall business objectives.
• Oversee the design, implementation, and maintenance of highly available and scalable production systems.
• Drive continuous improvement initiatives by identifying areas for enhancement and implementing best practices, automation, and process improvements.
• Collaborate with cross-functional teams and departments to ensure smooth integration of applications and systems.
• Define and enforce Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure system reliability and uptime.
• Monitor system performance, troubleshoot issues, and ensure timely incident response, root cause analysis, and problem resolution.
• Implement effective monitoring, logging, and alerting systems to proactively identify and mitigate potential issues.
• Stay up-to-date with industry trends, emerging technologies, and best practices related to SRE and DevOps, and apply them to improve operational efficiency.
Minimum Qualifications:
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
• Proven experience as a Site Reliability Engineer or in a similar role, with at least 5 years of hands-on experience managing production systems.
• Strong expertise in the following technologies: Ansible, Concourse CI, Jenkins, GitHub Actions, EKS (Kubernetes), and Linux administration.
• Demonstrated experience in leading and managing a team of technical professionals.
• Solid understanding of SRE principles, including reliability, scalability, availability, and performance.
• Proficiency in scripting and automation (e.g., Python, Bash, or similar).
• Experience with infrastructure-as-code (IaC) tools, configuration management, and CI/CD pipelines.
• Knowledge of cloud platforms (e.g., AWS, Azure, or Google Cloud) and containerization technologies (e.g., Docker).
• Excellent problem-solving skills and the ability to thrive in a fast-paced, dynamic environment.
• Strong communication and leadership skills, with the ability to collaborate effectively with both technical and non-technical stakeholders.
Preferred Qualifications:
• Relevant certifications, such as Certified Kubernetes Administrator (CKA) or AWS Certified DevOps Engineer.
• Experience with monitoring and observability tools (e.g., Datadog, New Relic, Prometheus, Grafana, ELK Stack).
• Familiarity with agile methodologies and experience working in an Agile/Scrum environment.