About the company
OKX is a leading crypto trading app and Web3 ecosystem. Trusted by more than 20 million customers in over 180 international markets, OKX is known as the fastest and most reliable crypto trading app for investors and professional traders globally. Our Singapore office is a Product and Engineering hub, and we are in the process of expanding our Singapore teams to support the continued growth of our global business. We build and maintain the core trading platform, which serves millions of daily active users. Design, Product, and Engineering teams work cross-functionally to identify customer needs and ship high-quality features through fast iteration.
Job Summary
Responsibilities:
📍Platform Core: Design and operate large-scale distributed data systems
📍Own the big data compute and storage infrastructure (MaxCompute/ODPS, Hologres, Spark)
📍Build and maintain multi-site task orchestration that dynamically selects engines and enforces policy
📍Drive reliability and performance improvements across batch and real-time pipelines
📍AI Integration: Build the AI-native platform layer
📍Develop and expose MCP (Model Context Protocol) tool interfaces so AI agents can interact with platform APIs
📍Build scheduling and cost-optimization agents that auto-tune resource allocation and alert severity
📍Instrument platform telemetry to feed AI-driven SLA monitoring and anomaly detection
📍Design context retrieval pipelines (RAG / vector search) for SQL code and config knowledge bases
📍Tooling & DX: Evolve the developer experience
📍Own the internal data development platform: IDE integrations, code review automation, deployment tooling
📍Build API-first tools (backfill, ingestion automation) designed for future MCP integration
📍Collaborate with data warehouse and service teams to define platform contracts
📍Ops & Governance: Drive operational excellence
📍Establish SLA benchmarks, cost metrics, and latency dashboards as AI optimization targets
📍Build automated incident response and root-cause analysis pipelines
📍Define and enforce infrastructure policies across multi-cloud environments
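To make the "expose tool interfaces so AI agents can interact with platform APIs" item concrete, here is a minimal sketch of the general tool-calling pattern: registering a platform function with a JSON-schema signature and dispatching an agent's call to it. All names here (the registry, `get_pipeline_status`, the pipeline name) are hypothetical illustrations, not OKX's actual platform or the real MCP SDK.

```python
import json
from typing import Any, Callable, Dict

# Illustrative tool registry in the spirit of MCP-style tool calling.
# Tool and pipeline names below are made up for this sketch.
TOOLS: Dict[str, Dict[str, Any]] = {}

def tool(name: str, description: str, parameters: Dict[str, Any]):
    """Register a function as an agent-callable tool with a JSON-schema signature."""
    def decorator(fn: Callable[..., Any]):
        TOOLS[name] = {"description": description, "parameters": parameters, "fn": fn}
        return fn
    return decorator

@tool(
    name="get_pipeline_status",
    description="Return the latest run status for a named batch pipeline.",
    parameters={
        "type": "object",
        "properties": {"pipeline": {"type": "string"}},
        "required": ["pipeline"],
    },
)
def get_pipeline_status(pipeline: str) -> Dict[str, Any]:
    # A real implementation would query the orchestrator; stubbed here.
    return {"pipeline": pipeline, "status": "SUCCESS"}

def dispatch(name: str, arguments: str) -> Any:
    """Invoke a registered tool from an agent's (name, JSON-arguments) call."""
    entry = TOOLS[name]
    return entry["fn"](**json.loads(arguments))

result = dispatch("get_pipeline_status", '{"pipeline": "daily_trades"}')
```

The schema half of each registry entry is what an agent sees when deciding which tool to call; the dispatcher is the platform-side boundary where policy and auditing would be enforced.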
Requirements:
📍5+ years of experience building large-scale data platforms (Hadoop/Spark/Flink or equivalent)
📍Deep expertise in distributed storage and compute systems (MaxCompute, Hologres, ClickHouse, Hive)
📍Strong software engineering skills in Java, Scala, or Python; experience with API-first design
📍Hands-on experience with task scheduling systems (Airflow, DolphinScheduler, or in-house equivalents)
📍Solid understanding of multi-cloud architectures and cost governance
📍Familiarity with LLM integration patterns: tool calling, RAG pipelines, context management
📍Experience with MCP or similar agent-tool frameworks is a strong plus
📍Passion for building systems that make other engineers 10x more productive
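The core job a task scheduling system such as Airflow or DolphinScheduler performs is resolving a dependency graph into a valid execution order. A toy sketch of that idea, using the standard library (the task names are invented for illustration):

```python
from graphlib import TopologicalSorter

# Toy pipeline dependency graph: each task maps to the set of tasks
# it depends on. Task names are hypothetical examples.
deps = {
    "ingest_trades": set(),
    "clean_trades": {"ingest_trades"},
    "aggregate_daily": {"clean_trades"},
    "publish_dashboard": {"aggregate_daily", "clean_trades"},
}

# static_order() yields tasks so every task appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Production schedulers layer retries, backfills, engine selection, and resource policy on top of exactly this ordering step.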


