Tailored for e-commerce, accelerating new growth opportunities
Hyper-personalized search and discovery for every user journey
Acquire high-value customers with efficiency and precision
Transform content creation with generative intelligence
Our proprietary, high-performance LLM inference framework is purpose-built for the DeepSeek model family. It integrates advanced system techniques, including prefill/decode (PD) disaggregation, EPLB (expert-parallel load balancing), DeepEP (expert-parallel communication), and DeepGEMM (optimized GEMM kernels), delivering over 50% throughput gains and roughly halving end-to-end latency in multi-GPU, multi-node environments. This robust foundation enables scalable, real-time AI deployment for mission-critical business scenarios.
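The prefill/decode disaggregation mentioned above can be illustrated with a toy sketch: a compute-bound prefill stage builds the KV cache for the whole prompt once, and a memory-bound decode stage then generates tokens one at a time. All names and the stand-in arithmetic here are illustrative only, not ByteArk's actual framework API.

```python
# Toy sketch of prefill/decode (PD) disaggregation. Names and the
# stand-in arithmetic are illustrative, not a real inference engine.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list                                    # prompt token ids
    kv_cache: list = field(default_factory=list)    # filled by prefill
    output: list = field(default_factory=list)      # filled by decode

def prefill(req: Request) -> Request:
    # Compute-bound: process the whole prompt in one pass,
    # materializing a KV-cache entry for every prompt token.
    req.kv_cache = [tok * 2 for tok in req.prompt]  # stand-in for attention K/V
    return req

def decode_step(req: Request) -> Request:
    # Memory-bound: one new token per step, appending to the cache.
    new_tok = sum(req.kv_cache) % 100               # stand-in for sampling
    req.output.append(new_tok)
    req.kv_cache.append(new_tok * 2)
    return req

# In a disaggregated deployment, prefill and decode run on separate
# GPU pools so each stage can be batched and scaled independently;
# here they are just two queues processed in sequence.
prefill_queue = [Request(prompt=[1, 2, 3]), Request(prompt=[4, 5])]
decode_queue = [prefill(r) for r in prefill_queue]  # "prefill workers"
for r in decode_queue:                              # "decode workers"
    for _ in range(3):
        decode_step(r)
```

Separating the two stages matters because their bottlenecks differ: prefill saturates compute, while decode is dominated by memory bandwidth, so pooling them separately avoids one stage starving the other.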
ByteArk is a pioneering technology company specializing in AI infrastructure and enterprise solutions, headquartered in Hangzhou Future Science City. We focus on LLM inference optimization, industry-grade AI applications, and high-performance GPU computing, building a unified platform that bridges foundational computing and business innovation.
We are driven by an engineering-first culture: over 70% of our team are technical experts from top global universities and Fortune 500 tech leaders. ByteArk delivers reliable, scalable AI compute to clients worldwide. Recognized as a National High-Tech Enterprise and a Zhejiang Provincial Innovation Leader, we hold 100+ patents and software copyrights and are rapidly expanding our global AI infrastructure footprint.
Create tenfold value, take modest returns, give back to society
'Entrepreneurship is like sailing: you need a distant destination, but you must also discover new islands for resources along the way.' CEO David, a serial entrepreneur, founded ByteArk in 2018 after a career as an IT engineer in the semiconductor industry, where he led smartphone projects.
A visionary entrepreneur of the 1980s generation. Founded multiple global businesses with annual revenues exceeding $20M. Early blockchain pioneer since 2014, specializing in trading and capital management. In 2018, founded ByteArk, managing over $40M in revenue and $100M+ in digital assets.
This role focuses on the efficiency and execution path of the inference execution stage itself, including prefill/decode decoupling, cache scheduling, and sampling optimization.
Responsibilities:
1. Own system-level optimization of the LLM inference system's execution path, resource scheduling, and communication modules;
2. Design and implement a scheduling and execution architecture supporting large-scale multi-GPU deployments, raising system throughput;
3. Optimize communication links and data transfer to reduce cross-node latency and bandwidth bottlenecks;
4. Drive efficient application of mixed-precision strategies (e.g., FP16, BF16, INT8) in the inference framework;
5. Support and advance deep system-level performance evolution of open-source or in-house inference frameworks (e.g., vLLM, SGLang).
Requirements:
1. Bachelor's degree or above in computer science, artificial intelligence, software engineering, or a related field;
2. Familiarity with mainstream inference frameworks; optimization experience with vLLM, SGLang, TensorRT-LLM, or similar is a plus;
3. Familiarity with communication optimization and hands-on experience with libraries such as NCCL, NVSHMEM, or RDMA, with an understanding of how to reduce communication overhead;
4. Understanding of resource-management mechanisms, including task scheduling, concurrency control, NUMA architecture, and CPU/GPU affinity tuning;
5. Ability to analyze system-level performance bottlenecks, lead cross-module diagnosis of complex performance issues, and drive end-to-end optimization to closure.
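The mixed-precision work in item 4 of the responsibilities often starts with post-training quantization. The following is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization; real frameworks implement this with fused GPU kernels, and all names here are illustrative.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization,
# the kind of mixed-precision strategy referenced above.
# Pure-Python illustration; production code uses fused GPU kernels.

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0      # map [-amax, amax] to [-127, 127]
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error is bounded by half the scale, which is why INT8 usually preserves accuracy when activation ranges are narrow and why outlier values motivate finer-grained (per-channel or per-block) scales.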
This role concerns the inference framework's underlying infrastructure and system architecture, such as resource allocation, cross-node communication, GPU orchestration, and mixed-precision computation.