About the company
Interested in working on cutting-edge blockchain technology and creating equitable access to the global financial system? Since 2014, the mission-driven team at the Stellar Development Foundation (SDF) has helped fuel the tremendous growth of the Stellar blockchain network, an open-source platform that operates at high scale today. Developers and companies around the world build on it, and the SDF team is expanding to support the rapidly growing and changing Stellar ecosystem. The launch of Soroban, the new smart contracts platform designed to work well with Stellar, brings a wealth of opportunity for innovation. When you join SDF as a Community Manager, you will be part of a team that's leveraging that opportunity to increase developer participation in order to bootstrap the ecosystem of tools, protocols, dapps, and educational resources necessary for Soroban to succeed.
Job Summary
In this role, you will:
- Maintain, improve, scale, and secure our AWS/GCP infrastructure and Linux systems.
- Assist our development teams in running, packaging, deploying, and troubleshooting applications.
- Work with developers on streamlining deployment processes with Jenkins and other CI/CD tooling.
- Build, maintain, monitor, and improve our Kubernetes clusters.
- Work with development teams on migrating applications to Kubernetes.
- Be responsible for maintenance and improvements to multiple internal services, for example Kubernetes, Prometheus, and ELK.
- Monitor, triage, and respond to alerts in our high-availability environments.
- Participate in design and code reviews, and ensure that the foundation for our services is best in class.
- Evaluate new technologies, then design and implement them as appropriate.
- Identify automation opportunities and implement them by building custom solutions or using off-the-shelf ones.
You have:
- 5+ years of experience working in cloud-based systems operations as an SRE or DevOps engineer.
- First-hand experience with configuration management and infrastructure as code (Ansible, Puppet, Terraform).
- Proficiency in applying SRE methodologies such as capacity planning and disaster recovery testing to ensure the scalability, resilience, and availability of critical services.
- A strong understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
- Experience managing production workloads and skill in using monitoring tools to detect issues early.
- Comfort with participating in on-call rotations and conducting thorough root cause analyses to keep systems running smoothly.
- Proficiency in at least one programming language.
- A commitment to supporting teammates, especially during challenging times, and excitement about working in a close-knit, growing team. You are approachable, empathetic, and proactive in promoting collaboration and innovation.
- The ability to work independently and accomplish tasks without constant monitoring.
- Production experience building and maintaining Kubernetes clusters.