About the company

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world’s fastest-growing, emerging markets. By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote-first dynamic culture with travel, health and learning benefits, among others. Being a part of dLocal means working with 1000+ teammates from 30+ different nationalities and developing an international career that impacts millions of people’s daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.

Job Summary

What will you do?

📍Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across the three main signals (logs, metrics, and traces), ensuring standardized, scalable, and efficient data ingestion.
📍Optimize ingestion strategies to balance cost, performance, and usability.
📍Empower Engineering Teams: Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team.
📍Drive adoption of best practices while ensuring teams own their telemetry.
📍Support Incident Management: Be the Engineering side of our Incident Management Team, designing the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident.
📍Collaborate Across Teams: Interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO/SLA requirements, and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
📍Automate Observability Infrastructure: Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL Pipelines.
📍Define Baseline Observability Standards: Design base-level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level.
📍Own Technical and Security Health: Take full ownership of dLocal’s infrastructure reliability, ensuring adherence to key availability and security KPIs.
📍Optimize Alerting Systems: Continuously refine alerting signals to minimize noise and ensure they are always actionable, reducing fatigue and improving response efficiency.

Which skills do you need?

📍Over 4 years of experience as an SRE or in a similar role with a stronger focus on observability.
📍Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices.
📍Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization.
📍Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog.
📍Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar).
📍Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows.


Site Reliability Engineer, Technical Referent
