Technology

Core innovations powering LongCat AI models

Key Technologies

Zero-Computation Experts

MoE routing that includes zero-computation (identity) experts, letting the router allocate compute dynamically per token: only 18.6B–31.3B of the 560B total parameters are activated per token (~27B on average), cutting cost while maintaining competitive quality.
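A minimal sketch of the idea: when some routing slots are identity ("zero-computation") experts, the number of activated parameters varies per token. Expert counts, parameter sizes, and the `route` function below are illustrative, not the actual LongCat-Flash configuration.

```python
import random

NUM_FFN_EXPERTS = 8    # experts that do real computation
NUM_ZERO_EXPERTS = 4   # identity experts: pass the token through unchanged
PARAMS_PER_EXPERT = 2  # illustrative per-expert parameter count (in "B")
TOP_K = 2              # experts selected per token

def route(token_scores):
    """Pick the top-k experts; zero-computation experts cost no parameters."""
    ranked = sorted(range(len(token_scores)), key=lambda i: -token_scores[i])
    chosen = ranked[:TOP_K]
    # Only indices below NUM_FFN_EXPERTS are real FFN experts.
    active_params = sum(PARAMS_PER_EXPERT for i in chosen if i < NUM_FFN_EXPERTS)
    return chosen, active_params

random.seed(0)
tokens, total = 1000, 0
for _ in range(tokens):
    scores = [random.random() for _ in range(NUM_FFN_EXPERTS + NUM_ZERO_EXPERTS)]
    _, p = route(scores)
    total += p
print(f"average activated params per token: {total / tokens:.2f}B "
      f"(max {TOP_K * PARAMS_PER_EXPERT}B)")
```

Because tokens that land on identity experts skip computation entirely, the average activated parameter count sits below the top-k maximum, which is the mechanism behind the variable 18.6B–31.3B activation range.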

Shortcut-connected MoE (ScMoE)

Adds a cross-layer shortcut so that expert communication overlaps with dense computation, reducing latency at scale. Also enables unified expert routing across modalities in the Omni models.
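The benefit of overlapping can be sketched with two stand-in tasks: one simulating the expert all-to-all communication, one simulating the shortcut dense computation. The sleep durations and function names are hypothetical placeholders, not real GPU kernels.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def all_to_all_dispatch(tokens):
    time.sleep(0.05)  # stands in for inter-GPU communication latency
    return tokens

def dense_ffn(tokens):
    time.sleep(0.05)  # stands in for the shortcut dense-path computation
    return [t * 2 for t in tokens]

def sequential(tokens):
    """Communicate, then compute: total time is the sum of both."""
    start = time.perf_counter()
    all_to_all_dispatch(tokens)
    dense_ffn(tokens)
    return time.perf_counter() - start

def overlapped(tokens):
    """Launch communication and the shortcut path concurrently."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        comm = pool.submit(all_to_all_dispatch, tokens)
        dense = pool.submit(dense_ffn, tokens)
        comm.result()
        dense.result()
    return time.perf_counter() - start

toks = list(range(8))
print(f"sequential: {sequential(toks):.3f}s, overlapped: {overlapped(toks):.3f}s")
```

In the overlapped schedule the wall-clock time approaches the longer of the two tasks rather than their sum, which is the latency win ScMoE targets at cluster scale.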

DORA Training System

Dynamic Orchestration for Asynchronous rollout (DORA) enables efficient large-scale training across domains; LongCat models were trained on >20T tokens in ~30 days across large GPU clusters.
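A hedged sketch of the asynchronous-rollout pattern: worker threads generate trajectories at uneven speeds while the trainer consumes whichever finishes first, so fast workers never wait on stragglers. The structure and names below are illustrative, not DORA's actual API.

```python
import queue
import random
import threading
import time

traj_queue = queue.Queue()  # thread-safe buffer of finished trajectories

def rollout_worker(worker_id, n_episodes, rng):
    """Generate trajectories with uneven latency and enqueue them."""
    for ep in range(n_episodes):
        time.sleep(rng.uniform(0.001, 0.01))  # simulated generation time
        traj_queue.put((worker_id, ep))

def trainer(total):
    """Consume trajectories in completion order, not submission order."""
    consumed = []
    while len(consumed) < total:
        consumed.append(traj_queue.get())
    return consumed

workers = [
    threading.Thread(target=rollout_worker, args=(i, 5, random.Random(i)))
    for i in range(4)
]
for w in workers:
    w.start()
batch = trainer(20)
for w in workers:
    w.join()
print(f"trained on {len(batch)} trajectories from "
      f"{len({wid for wid, _ in batch})} workers")
```

Decoupling generation from consumption this way is what lets an asynchronous system keep accelerators busy even when individual rollouts vary widely in length.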

Dual-Path Reasoning Framework

Combines agentic tool use with formal reasoning for enhanced problem-solving capabilities. Featured in the Flash-Thinking model.

Modality-Decoupled Parallel (MDP) Training

Parallel training schedule that decouples the modalities from one another, enabling efficient multi-modal learning. Used in the Omni model.

Progressive Multi-Modal Fusion

Curriculum learning approach for multi-modal alignment, gradually integrating different modalities during training.
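One way to picture such a curriculum is a schedule that mixes in more modalities as training progresses. The stage boundaries and modality order below are invented for illustration only.

```python
# Hypothetical fusion curriculum: (training progress threshold, active modalities).
CURRICULUM = [
    (0.0, ["text"]),
    (0.3, ["text", "image"]),
    (0.6, ["text", "image", "audio"]),
    (0.8, ["text", "image", "audio", "video"]),
]

def active_modalities(progress):
    """Return the modalities mixed into training at a progress value in [0, 1]."""
    active = CURRICULUM[0][1]
    for threshold, modalities in CURRICULUM:
        if progress >= threshold:
            active = modalities
    return active

print(active_modalities(0.5))  # → ['text', 'image']
```

Starting from text and widening the mix stage by stage keeps early alignment stable before the harder modalities are introduced.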

Training Innovations

  • Hyperparameter transfer: tune on small proxy models, then transfer the settings for efficient scaling
  • Model-growth initialization: initialize the full model from a smaller trained one for progressive capacity expansion
  • Variance alignment: controls activation variance for training stability
  • Router balancing: balances expert load for optimal utilization
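Router balancing is commonly implemented as an auxiliary load-balance loss in the style popularized by the Switch Transformer; whether LongCat uses this exact form is an assumption made for illustration.

```python
def load_balance_loss(assignments, router_probs, num_experts):
    """aux = N * sum_i f_i * P_i, where f_i is the fraction of tokens routed
    to expert i and P_i is the mean router probability assigned to expert i."""
    n_tokens = len(assignments)
    loss = 0.0
    for e in range(num_experts):
        f_e = sum(1 for a in assignments if a == e) / n_tokens
        p_e = sum(p[e] for p in router_probs) / n_tokens
        loss += f_e * p_e
    return num_experts * loss

# Perfectly balanced routing over 4 experts hits the minimum value 1.0.
balanced = load_balance_loss(
    assignments=[0, 1, 2, 3],
    router_probs=[[0.25] * 4] * 4,
    num_experts=4,
)
print(balanced)  # → 1.0
```

Any skew toward one expert raises the loss above 1.0, so minimizing it pushes the router toward the even utilization the bullet describes.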

Architecture Highlights

  • MoE Architecture: 560B parameters with efficient routing
  • High-throughput inference: 100+ tokens/s on H800 GPUs
  • Extended context: Up to 128K tokens
  • Multi-modal support: Unified architecture for text, image, audio, video