About the company
The DFINITY Foundation is a major contributor to the Internet Computer blockchain.
Job Summary
What You’ll Do
📍Own Product Reliability: Take ownership of the availability and reliability of the Caffeine.ai platform. You'll define our Service Level Objectives (SLOs), provide a reliable Continuous Delivery (CD) platform, and work across teams to meet and exceed them.
📍Build Deep Product Insight: Design, implement, and manage our observability stack (Datadog, OpenTelemetry, distributed tracing, logs, metrics) to provide high-fidelity signals into the health of our services and, most importantly, the user experience.
📍Engineer Scalable Solutions: Dive deep into our architecture to identify and eliminate performance bottlenecks, single points of failure, and sources of toil. You'll write code, primarily in Rust, Go, and TypeScript (we use Pulumi), to automate operations and build robust, self-healing systems. You will set up routing and service mesh configurations (e.g., Istio).
📍Champion Reliability from Day One: Partner with software engineers during design and code reviews to proactively bake in reliability, scalability, and operability. You will be the expert voice that helps the team build for production from the start.
📍Lead and Learn from Incidents: Coordinate the incident response process for our production services. You'll lead blameless post-mortems that drive meaningful improvements across our systems and processes.
📍Participate in an On-Call Rotation: As a key member of the team, you will be part of a compensated on-call rotation focused on coordinating incident response and ensuring platform stability.
Who You Are
📍You are a product-minded engineer with proven experience as a Site Reliability Engineer, with a strong focus on user-facing applications and distributed service architectures.
📍You have deep expertise in building and running modern observability stacks (e.g., Datadog, OpenTelemetry) and believe in data-driven decision-making.
📍You are a proficient software developer. You have experience designing and writing production-grade applications and automation, ideally in a systems and infrastructure language like Rust or Go, and are open to using Python, TypeScript, or Bash.




