Documentation

Deploy, integrate, and troubleshoot LongCat open-source models

Production Deployment

LongCat-Flash MoE models (560B) require vLLM or SGLang with tensor and expert parallelism. Smaller models like Flash-Lite can run on fewer GPUs via Transformers.
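As a rough illustration of the vLLM route, the sketch below assembles a `vllm serve` command line with tensor and expert parallelism enabled. The repository id `meituan-longcat/LongCat-Flash-Chat-FP8`, the use of `--trust-remote-code`, and the helper name `build_vllm_command` are assumptions made for this example. Check the model card for the exact invocation and any additional flags.

```python
import shlex


def build_vllm_command(model_id, tensor_parallel_size, expert_parallel=True):
    """Assemble a `vllm serve` command line as a list of arguments."""
    cmd = [
        "vllm", "serve", model_id,
        # Shard each layer's weights across this many GPUs.
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Assumption: the model ships custom modeling code.
        "--trust-remote-code",
    ]
    if expert_parallel:
        # Distribute MoE experts across GPUs instead of replicating them.
        cmd.append("--enable-expert-parallel")
    return cmd


# Assumed repository id; substitute the one listed on the model page.
command = build_vllm_command(
    "meituan-longcat/LongCat-Flash-Chat-FP8", tensor_parallel_size=8
)
print(shlex.join(command))
```

The helper only builds the argument list; pass it to `subprocess.run(command)` on a node where vLLM is installed to start the server.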

Hardware: the FP8 build of Flash-Chat needs at least one 8-GPU node (e.g. 8× H20); the BF16 build needs at least two 8-GPU nodes (e.g. 16× H800).
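The hardware figures above are consistent with a back-of-the-envelope weight-memory estimate. The sketch below assumes 1 byte per parameter for FP8, 2 bytes for BF16, 96 GB per H20, and 80 GB per H800. It counts model weights only, so the KV cache, activations, and runtime overhead must fit in whatever headroom remains.

```python
# Total parameter count, taken from the 560B figure above.
PARAMS = 560_000_000_000

# Storage cost per parameter for each precision.
BYTES_PER_PARAM = {"FP8": 1, "BF16": 2}

# Assumed per-GPU memory in GB; verify against your actual hardware.
GPU_MEMORY_GB = {"H20": 96, "H800": 80}


def weight_memory_gb(precision):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def cluster_memory_gb(gpu, count):
    """Aggregate GPU memory across `count` devices, in GB."""
    return GPU_MEMORY_GB[gpu] * count


for precision, gpu, count in [("FP8", "H20", 8), ("BF16", "H800", 8), ("BF16", "H800", 16)]:
    need = weight_memory_gb(precision)
    have = cluster_memory_gb(gpu, count)
    print(f"{precision} on {count}x {gpu}: weights {need:.0f} GB, "
          f"available {have} GB, fits={need < have}")
```

Under these assumptions the FP8 weights (560 GB) fit within one 8× H20 node (768 GB), while the BF16 weights (1120 GB) exceed one 8× H800 node (640 GB) but fit within two (1280 GB).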

Multimodal Models

Video Generation

LongCat-Video provides a unified interface for text-to-video, image-to-video, and video continuation. It is optimized for long-form videos (up to 5 minutes) with high temporal consistency.
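One way a single entry point can cover all three tasks is to select the task from how many conditioning frames accompany the prompt. The sketch below illustrates that idea only; `VideoRequest`, `cond_frames`, and `infer_task` are hypothetical names and are not part of the LongCat-Video API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoRequest:
    """Hypothetical request object: a prompt plus optional conditioning frames."""
    prompt: str
    cond_frames: List[str] = field(default_factory=list)  # paths to frames


def infer_task(request):
    """Pick the generation task from the number of conditioning frames."""
    n = len(request.cond_frames)
    if n == 0:
        return "text-to-video"        # prompt only
    if n == 1:
        return "image-to-video"       # one reference image
    return "video-continuation"       # several frames from an existing clip


print(infer_task(VideoRequest("a cat surfing")))
print(infer_task(VideoRequest("a cat surfing", ["first_frame.png"])))
print(infer_task(VideoRequest("a cat surfing", ["f0.png", "f1.png", "f2.png"])))
```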

Digital Human Video (Video-Avatar 1.5)

LongCat-Video-Avatar 1.5 is a commercial-grade, open-source digital human model. It supports audio-text-to-video (AT2V), audio-text-image-to-video (ATI2V), and video continuation.

  • Whisper-large audio encoding for precise lip sync
  • DMD 8-step inference: ~15× faster; shared base + LoRA adapters
  • Multi-person & open-domain: humans, anime, animals
  • Long-video stability: Cross-Chunk Latent Stitching

Resources: Model page | GitHub | Hugging Face

Image Generation & Editing

LongCat-Image powers image generation on LongCat Web and renders Chinese text particularly well. See the model page and the Chinese text generation guide.

License & Usage

All LongCat models are released under the MIT License, allowing distillation, fine-tuning, and secondary development. Evaluate models before use in sensitive or high-risk scenarios and ensure compliance with applicable laws.