About the company
Hex Trust is Asia's leading fully licensed and insured digital asset custodian. Led by veteran banking technologists and award-winning financial services experts, Hex Trust has built a proprietary bank-grade platform, Hex Safe, that delivers a custody solution for banks, financial institutions, asset managers, exchanges, and corporations. Through Hex Safe, clients can access liquidity providers, exchanges, and lending and staking platforms, enabling seamless access to services while assets are held in our highly secure and regulated platform. Hex Trust has offices in Hong Kong and Singapore and is expanding to the European market during 2021. Hex Trust is a registered Trust Company under the Hong Kong Trust Ordinance and holds a Trust or Company Service Provider (TCSP) license under the Anti-Money Laundering and Counter-Terrorist Financing Ordinance.
Job Summary
Position Summary
- Monitor the digital asset custody technology stack to ensure it is operational 24x7, stable, and has sufficient capacity. Provide timely root cause analysis of platform and application incidents, involving key resources as needed, so that normal operation can be restored quickly.
- Build automated solutions for complex operational problems using industry best practices and cloud-native technologies. Actively seek out innovative solutions for operational excellence, and coordinate proactively with development, operations, and the wider platform team to improve system availability, security, performance, and maintainability.
Duties & Responsibilities
- Ensure continuous, scalable, and robust operation of our production environments.
- Collaborate closely with the development teams in a fast-paced delivery environment to foster an SRE mindset as part of the software development process.
- Codify and roll out shared tooling, processes, and services that enable development teams to continuously deliver new features while improving non-functional requirements such as system availability, security, performance, and maintainability.
- Work with development and DevOps to ensure the appropriate level of component redundancy and infrastructure capacity is in place.
- Proactively analyse events and provide ongoing recommendations for process improvements that prevent service-impacting incidents.
- Manage Level 1 through Level 3 production support.

Requirements
- 2+ years in SRE or Production Support, with experience in the financial industry or in managing business-critical systems requiring 99.9% uptime.
- Proven background in hands-on production / application support.
- Expert proficiency in Linux, shell scripting, and monitoring and instrumentation tools.
- Comfortable working with developer tools such as Git, Jira, and Notion.
- Familiar with Node.js, MariaDB/MySQL, and managing automated build/deployment pipelines (e.g. GitLab, Ansible).