Documentation & Quick Start
Get started with LongCat AI models
Quick Start
LongCat-Flash uses a chat template defined in tokenizer_config.json. Examples:
First Turn
[Round 0] USER:{query} ASSISTANT:
With System Prompt
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
Multi-Turn
SYSTEM:{system_prompt} [Round 0] USER:{q} ASSISTANT:{r} ... [Round N-1] USER:{q} ASSISTANT:{r} [Round N] USER:{q} ASSISTANT:
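The templates above map directly onto an OpenAI-style message list. Below is a minimal sketch of that mapping in Python, for illustration only; the exact whitespace and any special tokens come from the shipped tokenizer_config.json, so prefer tokenizer.apply_chat_template from the Hugging Face tokenizer in practice.

```python
# Minimal sketch: build a LongCat-Flash prompt string from an OpenAI-style
# message list, following the templates above. Exact separators and special
# tokens are defined in tokenizer_config.json; prefer apply_chat_template.
def build_longcat_prompt(messages):
    system = ""
    rounds = []  # list of [user, assistant-or-None] pairs
    for msg in messages:
        if msg["role"] == "system":
            system = f"SYSTEM:{msg['content']} "
        elif msg["role"] == "user":
            rounds.append([msg["content"], None])
        elif msg["role"] == "assistant":
            rounds[-1][1] = msg["content"]

    prompt = system
    for i, (user, assistant) in enumerate(rounds):
        prompt += f"[Round {i}] USER:{user} ASSISTANT:"
        if assistant is not None:
            prompt += f"{assistant} "
    return prompt


print(build_longcat_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me a joke."},
]))
# SYSTEM:You are a helpful assistant. [Round 0] USER:Hi ASSISTANT:Hello! [Round 1] USER:Tell me a joke. ASSISTANT:
```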
Tool Call Envelope
{tool_description}
## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
The model emits each tool invocation wrapped in a tagged envelope:
<longcat_tool_call>{"name": <function-name>, "arguments": <args-dict>}</longcat_tool_call>
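A client serving tool-enabled prompts needs to detect and decode this envelope in the model's reply. A minimal standard-library sketch follows; the get_weather call in the usage example is made up.

```python
import json
import re

# Extract tool invocations from a reply that uses the <longcat_tool_call>
# envelope shown above. DOTALL lets arguments span multiple lines.
TOOL_CALL_RE = re.compile(r"<longcat_tool_call>(.*?)</longcat_tool_call>", re.DOTALL)

def parse_tool_calls(response_text):
    """Return a list of {'name': ..., 'arguments': ...} dicts."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(response_text)]

reply = '<longcat_tool_call>{"name": "get_weather", "arguments": {"city": "Beijing"}}</longcat_tool_call>'
print(parse_tool_calls(reply))
# [{'name': 'get_weather', 'arguments': {'city': 'Beijing'}}]
```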
Deployment
Flash-Chat & Flash-Thinking
SGLang and vLLM adaptations enable high-throughput inference for LongCat-Flash models. The deployment guides cover environment setup, tensor parallelism, and inference configuration, and the stack supports both single-user and multi-user scenarios with cost-efficient inference at roughly $0.7 per 1M output tokens on H800 GPUs.
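As a rough sketch of what a deployment looks like, the snippet below uses the vLLM Python API. It assumes a vLLM build with LongCat-Flash support; the model ID, tensor-parallel degree, and sampling settings are illustrative placeholders to adapt per the deployment guides.

```python
# Minimal offline-inference sketch with the vLLM Python API. Assumes a vLLM
# build that supports LongCat-Flash; values below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meituan-longcat/LongCat-Flash-Chat",  # assumed Hugging Face model ID
    tensor_parallel_size=8,                      # adjust to your GPU count
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["[Round 0] USER:Hello ASSISTANT:"], params)
print(outputs[0].outputs[0].text)
```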
Video Generation
LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video-continuation tasks. It is optimized for generating long-form videos (up to 5 minutes) with high temporal consistency and physically plausible motion.
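The snippet below sketches what such a unified interface could look like. LongCatVideoPipeline, VideoRequest, and the dispatch logic are hypothetical stand-ins, not the actual LongCat-Video API; consult the repository for the real entry points.

```python
# Hypothetical sketch of a unified video-generation interface. All names
# here are illustrative stand-ins, not the real LongCat-Video API.
from dataclasses import dataclass

@dataclass
class VideoRequest:
    prompt: str
    image_path: str | None = None   # set for image-to-video
    video_path: str | None = None   # set for video continuation
    num_seconds: int = 10

class LongCatVideoPipeline:          # hypothetical name
    def generate(self, request: VideoRequest) -> str:
        # One entry point dispatches across the three tasks: text-to-video
        # when only a prompt is given, image-to-video when an image is
        # attached, and continuation when an input video is attached.
        if request.video_path:
            task = "video-continuation"
        elif request.image_path:
            task = "image-to-video"
        else:
            task = "text-to-video"
        return f"would run {task} for: {request.prompt!r}"

pipe = LongCatVideoPipeline()
print(pipe.generate(VideoRequest(prompt="a cat surfing at sunset")))
```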
License & Usage
All LongCat models are released under the MIT License, allowing model distillation, fine-tuning, and secondary development. Evaluate and validate the models before use in sensitive or high-risk scenarios, and ensure compliance with applicable laws and regulations for your use case.