LongCat AI – Next-Generation Multi-Modal Models
Open-source MoE LLMs by Meituan: Flash-Chat, Flash-Thinking, Video, Audio-Codec, and Omni. Fast, efficient, and production-ready.
LongCat-Flash-Omni (November 2025)
The first open-source real-time all-modality interaction model. Omni unifies text, image, audio, and video in a single end-to-end ScMoE backbone, enabling low-latency streaming multi-modal understanding and generation. It supports context windows of up to 128K tokens and robust multi-turn, long-horizon dialogue.
Modalities
- Text: instruction following, reasoning, coding
- Image: VQA, fine-grained recognition, OCR
- Audio: speech understanding, streaming ASR
- Video: temporal reasoning, event grounding
Architectural Highlights
- Unified ScMoE: single trunk with expert routing shared across modalities (see the sketch after this list)
- MDP training: modality-decoupled parallel schedule
- Progressive fusion: curriculum for multi-modal alignment
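The shortcut idea can be illustrated in a few lines. Below is a minimal, single-device PyTorch sketch of an ScMoE-style block: a routed expert mixture plus a dense FFN on a shortcut path. All sizes, the top-1 routing, and the module names are illustrative assumptions, not the published design; in deployment the dense path runs concurrently with the experts' all-to-all communication to hide latency, which a single-process sketch cannot show.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFN(nn.Module):
    """Plain two-layer feed-forward network."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class ScMoEBlock(nn.Module):
    """One trunk layer: routed experts plus a dense FFN on a shortcut path.

    Illustrative only: in deployment the shortcut's dense compute overlaps
    with the experts' all-to-all communication across devices.
    """
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(FFN(d_model, d_hidden) for _ in range(n_experts))
        self.shortcut_ffn = FFN(d_model, d_hidden)  # dense path on the shortcut

    def forward(self, x):                        # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)            # top-1 routing for brevity
        moe_out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                       # tokens routed to expert e
            if sel.any():
                moe_out[sel] = gate[sel, None] * expert(x[sel])
        return x + moe_out + self.shortcut_ffn(x)

block = ScMoEBlock()
tokens = torch.randn(16, 512)                    # any modality's embeddings
print(block(tokens).shape)                       # torch.Size([16, 512])
```

Because all modalities' embeddings share the trunk's dimensionality, the same router and expert pool serve text, image, audio, and video tokens alike.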
Performance
- Omni-Bench: open-source SOTA
- WorldSense: open-source SOTA
Key Capabilities
- Real-time: low-latency interactive audio/video streams (see the sketch after this list)
- Long-context: 128K tokens, multi-turn long-session memory
- Tool-use: agentic calls across modalities
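To make the real-time claim concrete, here is a minimal sketch of a chunked streaming loop: audio arrives in small fixed-size frames, and partial output tokens are emitted as soon as they are decoded rather than after the utterance ends. `OmniSession`, `feed_audio`, and the chunk size are hypothetical stand-ins, not the actual serving API.

```python
import time
from collections.abc import Iterator

CHUNK_MS = 80  # small frames keep first-token latency low

class OmniSession:
    """Hypothetical stand-in for a streaming client; not the real API."""
    def feed_audio(self, chunk: bytes) -> Iterator[str]:
        # A real session would run incremental ASR/understanding on the
        # chunk and yield newly decoded tokens; the stub yields a marker.
        yield f"<{len(chunk)}B>"

def microphone(n_chunks: int = 5) -> Iterator[bytes]:
    """Fake audio source emitting CHUNK_MS frames of 16 kHz 16-bit mono."""
    for _ in range(n_chunks):
        time.sleep(CHUNK_MS / 1000)
        yield b"\x00" * 2560                 # 80 ms * 16000 Hz * 2 bytes

session = OmniSession()
for chunk in microphone():
    for token in session.feed_audio(chunk):  # tokens stream out mid-utterance
        print(token, end=" ", flush=True)
print()
```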
Applications
- Multi-modal assistants and voice agents
- Visual Q&A and scene understanding
- Real-time AI video customer support
Model Series
Flash-Chat
Foundation dialogue model (560B-parameter MoE). Achieves 100+ tokens/s on H800 GPUs with ~27B active params per token (see the arithmetic sketch below).
Released: Sept 1, 2025
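A quick back-of-the-envelope shows why sparse activation matters, assuming the common approximation that a forward pass costs roughly 2 FLOPs per active parameter per token:

```python
# Back-of-the-envelope: why activating ~27B of 560B params per token matters.
# Uses the common ~2 FLOPs per active parameter per token approximation.
TOTAL_PARAMS = 560e9
ACTIVE_PARAMS = 27e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_flops = 2 * TOTAL_PARAMS      # if every parameter were used per token
moe_flops = 2 * ACTIVE_PARAMS       # sparse activation via expert routing

print(f"active fraction : {active_fraction:.1%}")              # ~4.8%
print(f"per-token FLOPs : {moe_flops:.2e} vs {dense_flops:.2e} dense")
print(f"compute saving  : {1 - moe_flops / dense_flops:.1%}")  # ~95%
```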
Flash-Thinking
Enhanced reasoning with a dual-path framework. Achieves 64.5% token savings in agentic scenarios.
Released: Sept 22, 2025
Key Highlights
- High-throughput inference: 100+ tokens/s on H800 GPUs
- Zero-Computation Experts: Activates only ~27B of the 560B parameter pool per token (see the sketch after this list)
- Extended context: Up to 128K tokens
- Open-source SOTA: Leading performance on Omni-Bench, WorldSense, MMLU, and more
- Production-ready: Deployed across Meituan's services
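As a rough illustration of the zero-computation idea, the sketch below mixes ordinary FFN experts with identity experts that return a token unchanged: tokens the router sends to an identity expert incur essentially no extra compute, so the average number of active parameters per token falls below the all-FFN figure. Sizes, expert counts, and routing details are illustrative assumptions, not the published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroComputationMoE(nn.Module):
    """MoE layer with identity 'zero-computation' experts (illustrative)."""
    def __init__(self, d_model=512, d_hidden=2048, n_ffn=6, n_zero=2, top_k=2):
        super().__init__()
        self.n_ffn, self.n_zero, self.top_k = n_ffn, n_zero, top_k
        # Experts 0..n_ffn-1 are real FFNs; n_ffn..n_ffn+n_zero-1 are identities.
        self.router = nn.Linear(d_model, n_ffn + n_zero)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_ffn))

    def forward(self, x):                                 # x: (tokens, d_model)
        topw, topi = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)      # renormalize gates
        out = torch.zeros_like(x)
        for e in range(self.n_ffn + self.n_zero):
            mask = (topi == e)
            rows = mask.any(dim=-1)
            if not rows.any():
                continue
            gate = (topw * mask).sum(dim=-1, keepdim=True)[rows]
            if e < self.n_ffn:
                out[rows] += gate * self.experts[e](x[rows])  # paid compute
            else:
                out[rows] += gate * x[rows]                   # identity: ~free
        return x + out

moe = ZeroComputationMoE()
x = torch.randn(16, 512)
print(moe(x).shape)  # torch.Size([16, 512])
```

Routing some tokens to identity experts is what lets the per-token active-parameter count vary and average out near ~27B rather than being fixed.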