Documentation & Quick Start

Getting Started

LongCat-2.0

LongCat-2.0 (1.6T MoE, 1M context) is the flagship model for agentic coding and enterprise agents. Access via longcat.ai, OpenRouter, or the LongCat API platform. Self-hosted deployment documentation will be added when weights are published.

Flash MoE Models

LongCat-Flash MoE models (560B) require vLLM or SGLang with tensor and expert parallelism. Smaller models like Flash-Lite can run on fewer GPUs via Transformers.

Hardware: FP8 Flash-Chat needs ≥1× 8-GPU node (e.g. 8× H20); BF16 needs ≥2 nodes (e.g. 16× H800).

Multimodal Models

Video Generation

LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video continuation. Optimized for long-form videos (up to 5 minutes) with high temporal consistency.

Digital Human Video (Video-Avatar 1.5)

LongCat-Video-Avatar 1.5 is the commercial-grade open-source digital human model. Supports AT2V, ATI2V, and video continuation.

Whisper-large audio encoding for precise lip sync
DMD 8-step inference: ~15× faster; shared base + LoRA adapters
Multi-person & open-domain: humans, anime, animals
Long-video stability: Cross-Chunk Latent Stitching

Resources: Model page | GitHub | Hugging Face

Image Generation & Editing

LongCat-Image powers generation on LongCat Web with superior Chinese text rendering. See model page and Chinese text generation guide.

Official Resources

Comparisons & Pricing

License & Usage

All LongCat models are released under the MIT License, allowing distillation, fine-tuning, and secondary development. Evaluate models before use in sensitive or high-risk scenarios and ensure compliance with applicable laws.