About the company
Livepeer is a decentralized video infrastructure network built for developers.
Job Summary
We are looking for someone who cares about the reliability of the infrastructure as much as we do. You will ensure the final product is high quality and works as intended.
RESPONSIBILITIES
Be on an on-call (PagerDuty) rotation to respond to incidents that impact Livepeer availability, and provide support for service engineers with customer incidents.
Use your on-call shift to prevent incidents from ever happening.
Run our infrastructure with Chef, Ansible, Terraform, GitHub CI/CD, and Kubernetes.
Build monitoring that alerts on symptoms rather than on outages.
Document every action so your findings turn into repeatable actions and then into automation.
Improve operational processes (such as deployments and upgrades) to make them as boring as possible.
Design, build and maintain core infrastructure that enables Livepeer to scale to hundreds of thousands of concurrent users.
Debug production issues across services and levels of the stack.
Plan the growth of Livepeer infrastructure.
DESIRED SKILLS
Think about systems: edge cases, failure modes, behaviours, specific implementations.
Know your way around Linux and the Unix shell.
Know what configuration management systems like Chef and Ansible are used for.
Have strong programming skills: Shell, Python and/or Go.
Have an urge to collaborate and communicate asynchronously.
Have an urge to document all the things so you don't need to learn the same thing twice.
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
Have an urge to deliver quickly and effectively, and to iterate fast.
Have experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies.
Be able to use GitHub.
INTERVIEW PROCESS
Initial Phone Interview
First and Second Round Interview
Reference Checks & Verbal Offer
Official Offer Letter Signing & Onboarding Process