Deploy, integrate, and troubleshoot LongCat open-source models
Production Deployment
The LongCat-Flash MoE models (560B total parameters) require vLLM or SGLang with tensor and expert parallelism. Smaller models such as Flash-Lite can run on fewer GPUs with Hugging Face Transformers.
- Deploy with vLLM: single-node FP8, multi-node BF16, MTP (multi-token prediction)
- Deploy with SGLang: EP-MoE (expert-parallel MoE), FlashInfer backend, MTP
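As a rough illustration of the single-node FP8 path, the sketch below only assembles a `vllm serve` command line; it does not start a server. The Hugging Face repo IDs are assumptions (confirm the exact names on the model page), and the layout (tensor parallelism within a node, one pipeline stage per node, expert parallelism for the MoE layers) is one plausible configuration, not the official recipe. The flags are standard vLLM options, but check them against your installed version, because `--enable-expert-parallel` only exists in recent releases.

```python
import shlex

# Assumed Hugging Face repo IDs; confirm the exact names on the model page.
MODEL_IDS = {
    "fp8": "meituan-longcat/LongCat-Flash-Chat-FP8",
    "bf16": "meituan-longcat/LongCat-Flash-Chat",
}


def vllm_serve_cmd(precision: str, gpus_per_node: int = 8, nodes: int = 1) -> list[str]:
    """Assemble (but do not run) a `vllm serve` argument list."""
    if precision not in MODEL_IDS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return [
        "vllm", "serve", MODEL_IDS[precision],
        # Allow custom modeling code from the repo, if the model ships any.
        "--trust-remote-code",
        # Shard each layer across the GPUs of one node.
        "--tensor-parallel-size", str(gpus_per_node),
        # One pipeline stage per node when a single node is not enough.
        "--pipeline-parallel-size", str(nodes),
        # Distribute MoE experts across GPUs instead of splitting each expert.
        "--enable-expert-parallel",
    ]


if __name__ == "__main__":
    # Single-node FP8 launch line, e.g. for 8x H20.
    print(shlex.join(vllm_serve_cmd("fp8")))
```

For the multi-node BF16 case, `vllm_serve_cmd("bf16", nodes=2)` adds a second pipeline stage, which additionally requires a multi-node runtime (typically Ray) that this sketch does not set up.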
Hardware: LongCat-Flash-Chat in FP8 needs at least one 8-GPU node (e.g., 8× H20); BF16 needs at least two nodes (e.g., 16× H800).
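These thresholds follow from weight-storage arithmetic: 560B parameters take about 560 GB in FP8 (1 byte per parameter) and about 1,120 GB in BF16 (2 bytes per parameter). The sketch below reproduces that back-of-the-envelope check. The per-GPU capacities (96 GB for H20, 80 GB for H800) are assumed nominal figures, and the estimate covers weights only, so it is a lower bound rather than a sizing guarantee.

```python
import math

# Assumed nominal memory per GPU, in GB.
GPU_MEMORY_GB = {"H20": 96, "H800": 80}
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gb(params_billion: float, precision: str) -> float:
    # 1e9 parameters x N bytes each = N GB per billion parameters.
    return params_billion * BYTES_PER_PARAM[precision]


def min_nodes_for_weights(params_billion: float, precision: str, gpu: str,
                          gpus_per_node: int = 8) -> int:
    # Lower bound only: KV cache, activations, and runtime overhead must fit
    # in whatever memory remains after the weights are loaded.
    node_gb = GPU_MEMORY_GB[gpu] * gpus_per_node
    return math.ceil(weight_gb(params_billion, precision) / node_gb)


if __name__ == "__main__":
    for precision, gpu in [("fp8", "H20"), ("bf16", "H800")]:
        nodes = min_nodes_for_weights(560, precision, gpu)
        total_gb = GPU_MEMORY_GB[gpu] * 8 * nodes
        share = weight_gb(560, precision) / total_gb
        print(f"{precision} on {gpu}: {nodes} node(s), weights use {share:.0%} of GPU memory")
```

With these numbers, FP8 weights fill roughly three quarters of one 8× H20 node, while BF16 weights need two 8× H800 nodes and leave only about 160 GB across all 16 GPUs for the KV cache and activations.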
Multimodal Models
Video Generation
LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video continuation. It is optimized for long-form video (up to 5 minutes) with high temporal consistency.
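One common way to reach multi-minute lengths is to generate a video in fixed-length chunks and condition each new chunk on the tail of the previous one, which is what a continuation task provides. The sketch below shows only that scheduling arithmetic; it is a generic illustration, not LongCat-Video's actual pipeline, and `plan_chunks`, the chunk length, the context length, and the frame rate are all hypothetical.

```python
def plan_chunks(total_frames: int, chunk_frames: int,
                context_frames: int) -> list[tuple[int, int]]:
    """Return (start, end) frame spans covering [0, total_frames).

    Hypothetical scheduling sketch: every chunk after the first starts
    `context_frames` before the previous chunk ended, so those overlapping
    frames act as conditioning and only the rest are newly generated.
    """
    if not 0 <= context_frames < chunk_frames:
        raise ValueError("need 0 <= context_frames < chunk_frames")
    spans = [(0, min(chunk_frames, total_frames))]
    while spans[-1][1] < total_frames:
        # Re-use the tail of the previous chunk as context for the next one.
        start = spans[-1][1] - context_frames
        spans.append((start, min(start + chunk_frames, total_frames)))
    return spans


if __name__ == "__main__":
    # Hypothetical numbers: 5 minutes at an assumed 15 fps, 90-frame chunks.
    spans = plan_chunks(5 * 60 * 15, chunk_frames=90, context_frames=15)
    print(f"{len(spans)} chunks, last span {spans[-1]}")
```

For example, `plan_chunks(100, chunk_frames=40, context_frames=10)` returns the overlapping spans `(0, 40)`, `(30, 70)`, and `(60, 100)`, so each chunk after the first contributes 30 new frames.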
Digital Human Video (Video-Avatar 1.5)
LongCat-Video-Avatar 1.5 is a commercial-grade open-source digital human model. It supports AT2V (audio-text-to-video), ATI2V (audio-text-image-to-video), and video continuation.
- Whisper-large audio encoding for precise lip sync
- DMD (distribution matching distillation) 8-step inference: ~15× faster; shared base model + LoRA adapters
- Multi-person & open-domain: humans, anime, animals
- Long-video stability: Cross-Chunk Latent Stitching
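The "shared base + LoRA adapters" item refers to low-rank adaptation (LoRA). In the standard LoRA formulation (general notation, not taken from the LongCat release), an adapted weight matrix is the frozen base weight plus a scaled low-rank update:

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad
W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only B and A differ between adapters, so a single copy of W_0 can serve several adapters, and each adapter stores r(d + k) parameters per matrix instead of d × k.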
Resources: Model page | GitHub | Hugging Face
Image Generation & Editing
LongCat-Image powers image generation on LongCat Web and excels at rendering Chinese text. See the model page and the Chinese text generation guide.
License & Usage
All LongCat models are released under the MIT License, allowing distillation, fine-tuning, and secondary development. Evaluate models before use in sensitive or high-risk scenarios and ensure compliance with applicable laws.