About the company
OpenAIās mission is to ensure that general-purpose artificial intelligence benefits all of humanity. Our Communications team is composed of PR/Media Relations, Events, Design, and other external-facing functions. The teamās ethos is to support OpenAI's mission and goals by clearly and authentically explaining our technology, values, and approach to safely building powerful AI. The Events team is a dynamic group dedicated to crafting extraordinary experiences that encompass our company's values and mission. Our team is driven by a passion for bringing people together to connect in meaningful ways.
Job Summary
Responsibilities
šDesign, develop, and maintain diagnostics software infrastructure for automating multi-node system testing at scale. šBuild and integrate diagnostic test tools for both at-scale automation and manual lab debugging. šCollaborate with hardware teams and external vendors to define and build diagnostics and qualification software for compute infrastructure. šHelp with bringups and lab debugging using your tooling. šLead and define diagnostics software requirements with vendors, review deliverables, and manage third-party software integration. šMonitor vendor software development, ensuring adherence to technical standards, timelines, and quality metrics. šCollaborate cross-functionally with hardware, system architecture, power, and cooling teams. šSupport integration and readiness of compute systems through scalable software solutions. šEnable the utmost reliability and health of our serving machines by ensuring systems entering and operating in our fleet are healthy thanks to rigorous diagnostics testing.
Qualifications
šExperience with building fleet-level system test automation and diagnostic systems. šStrong Python, C, C++, and Linux shell scripting skills. Rust experience is also a big plus. šKnowledge of hardware subsystems in compute environments and the common failure-modes that need to be tested or debugged. šExperience working with hardware engineers to understand their test and diagnostics requirements and building associated tooling. šProven track record defining and managing vendor relationships and deliverables.
The crypto industry is evolving rapidly, offering new opportunities in blockchain, web3, and remote crypto roles ā donāt miss your chance to be part of it.