About the company
Zeta Markets is the fastest-growing perps exchange on Solana, offering unparalleled speed, simplicity, and security so you can trade your favourite cryptocurrencies at the click of a button.
Job Summary
Responsibilities
Infrastructure Management
📍 Design, deploy, and maintain cloud-based infrastructure, leveraging the AWS stack.
📍 Utilize tools like Terraform, CloudFormation, and Ansible for Infrastructure as Code (IaC) and automation.
📍 Maintain and optimize infrastructure components, including cranks, indexers, and trade feeds written in Rust.

Observability and Reliability
📍 Maintain and enhance the observability stack, including Prometheus, InfluxDB, and Grafana.
📍 Establish and manage robust alerting systems and on-call schedules through PagerDuty or equivalent tools.
📍 Monitor system performance and implement proactive measures to reduce downtime and improve reliability.

CI/CD and Automation
📍 Develop and improve CI/CD pipelines to enable automated testing, staging, and production deployments.
📍 Ensure secure and efficient management of package versions and deployment processes.

Performance Optimization
📍 Analyze and optimize systems to reduce latency, manage congestion, and improve overall performance.
📍 Design and implement strategies for scaling and distributing systems to handle increasing traffic and demand.
📍 Apply expertise in low-latency trading systems and distributed architectures to enhance performance.

Collaboration and Leadership
📍 Work closely with development teams to support and integrate new features into production.
📍 Take ownership of on-call schedules, ensuring proper incident management and response protocols.
📍 Provide mentorship and guidance to team members on best practices for infrastructure and DevOps.
Qualifications
Must-Have Skills

Technical Expertise:
📍 Strong experience with AWS cloud tools and services (e.g., EC2, S3, Lambda, DynamoDB, ECS etc.).
📍 Proficiency in Rust, particularly for maintaining backend infrastructure (e.g., cranks, indexers, trade feeds).
📍 Hands-on experience with observability tools (Prometheus, Grafana, InfluxDB).
📍 Knowledge of CI/CD tools (e.g., GitHub Actions).
📍 Proficiency with Infrastructure as Code (IaC) tools like Terraform and Ansible.
📍 Strong understanding of distributed systems and low-latency architectures.

Operational Skills:
📍 Experience managing on-call schedules and incident response protocols (PagerDuty or similar).
📍 Proven ability to reduce system latencies and optimize for high performance.
📍 Expertise in deploying and maintaining highly available and reliable systems.