About the company
Jump Trading Group is committed to world-class research. We empower exceptional talent in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incenting collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.
Job Summary
What You'll Do:
• Design, implement, maintain, and support high-performance compute and storage systems.
• Implement and support performance monitoring and fault monitoring systems.
• Monitor systems and storage performance, up to and including network components.
• Build tooling to compile, package, install, and upgrade software and operating system components at scale.
• Write code to automate frequently performed tasks.
• Collaborate directly with researchers to optimize their use of HPC infrastructure.
• Develop and improve systems and user documentation.
• Develop and monitor the tools used to maintain a production computing environment.
• Provide operational support on a rotating basis and as needed.
• Other duties as assigned or needed.
Skills You'll Need:
• At least 5 years of professional experience in high-performance computing (HPC); experience with parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects is a plus, but not required
• At least 5 years of experience with Linux systems administration
• High proficiency in at least one programming or scripting language (e.g., Go, Python, C)
• Extensive experience designing, building, and maintaining complex, interdependent, distributed systems
• Extensive experience profiling and debugging application stacks (debuggers and profilers)
• Experience with system configuration management tools (SaltStack, Ansible, Puppet, etc.)
• A compulsion to perform root cause analysis
• Reliable and predictable availability