LongCat Models

Comprehensive overview of all LongCat AI model variants and their capabilities.

Model Variants

LongCat-Flash-Chat

Released: September 1, 2025

Foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. Activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts.

  • Supports up to 128K context length
  • Achieves 100+ tokens/s on H800 GPUs
  • Strong instruction following, reasoning, and coding
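
The variable activation comes from the routing scheme: alongside ordinary FFN experts, the router can select "zero-computation" experts that pass a token through unchanged, so harder tokens spend more compute than easy ones. Below is a minimal PyTorch sketch of the idea; the layer sizes, expert counts, and top-k values are illustrative toys, not the released configuration:

```python
import torch
import torch.nn as nn

class ZeroComputationMoE(nn.Module):
    """Toy MoE layer in which some "experts" are identity functions.

    Tokens routed to a zero-computation expert skip the FFN entirely,
    so the number of activated parameters varies per token.
    Sizes here are illustrative, not LongCat-Flash's real config.
    """

    def __init__(self, d_model=64, d_ff=256,
                 num_ffn_experts=4, num_zero_experts=2, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.num_ffn_experts = num_ffn_experts
        self.num_experts = num_ffn_experts + num_zero_experts
        # Router scores both real FFN experts and zero-computation experts.
        self.router = nn.Linear(d_model, self.num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_ffn_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)       # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(self.num_experts):
                mask = idx[:, k] == e
                if not mask.any():
                    continue
                chunk = x[mask]
                # Experts beyond num_ffn_experts are zero-computation:
                # the token passes through unchanged (identity).
                y = self.experts[e](chunk) if e < self.num_ffn_experts else chunk
                out[mask] += weights[mask, k].unsqueeze(-1) * y
        return out

moe = ZeroComputationMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Because the top-k selection can land on identity experts, the count of activated FFN parameters differs from token to token, which is what produces a per-token activation range rather than a fixed number.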

LongCat-Flash-Thinking

Released: September 22, 2025

Enhanced reasoning model focused on "Agentic Reasoning" and "Formal Reasoning". It features a dual-path reasoning framework and the DORA asynchronous training system.

  • 64.5% token savings in tool-call scenarios through more efficient tool calling
  • Formal and agentic reasoning capabilities
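
In practice, agentic reasoning is exercised through a tool-call loop: the model either answers or requests a tool, the runtime executes the tool, and the result is fed back for the next turn. A minimal sketch of that loop, assuming the model is served behind an OpenAI-compatible chat endpoint; the base URL, API key, model name, and `get_weather` tool are placeholders, not part of any official API:

```python
import json
from openai import OpenAI

# Placeholder endpoint and model name; substitute your actual deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "LongCat-Flash-Thinking"

def get_weather(city: str) -> str:
    """Stand-in tool; a real agent would call an external API here."""
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 23})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]
while True:
    reply = client.chat.completions.create(model=MODEL, messages=messages,
                                           tools=TOOLS)
    msg = reply.choices[0].message
    if not msg.tool_calls:          # model produced a final answer
        print(msg.content)
        break
    messages.append(msg)            # keep the assistant's tool-call turn
    for call in msg.tool_calls:     # execute each requested tool
        args = json.loads(call.function.arguments)
        result = get_weather(**args)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": result})
```

Token savings in this setting means the model reaches the final answer with fewer reasoning and tool-call tokens across iterations of exactly this kind of loop.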

LongCat-Video

Released: October 27, 2025

Video generation model built on the Diffusion Transformer (DiT) architecture, with unified support for text-to-video, image-to-video, and video continuation tasks.

  • Generates 5-minute coherent videos at 720p/30fps
  • Long temporal sequences and cross-frame consistency
  • Physical motion plausibility
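
One way to read "unified support" is that the three tasks differ only in how many conditioning frames accompany the text prompt: zero for text-to-video, one for image-to-video, and several for continuation. A small sketch of a request shape built on that assumption; the `VideoRequest` interface is hypothetical, not the released API:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np

@dataclass
class VideoRequest:
    """Single request shape covering all three generation modes.

    Hypothetical interface for illustration: the tasks differ only
    in how many conditioning frames are supplied.
      0 frames  -> text-to-video
      1 frame   -> image-to-video
      >1 frames -> video continuation
    """
    prompt: str
    cond_frames: List[np.ndarray] = field(default_factory=list)  # HxWx3 uint8
    num_frames: int = 150                  # e.g. 5 seconds at 30 fps
    resolution: Tuple[int, int] = (1280, 720)

    @property
    def mode(self) -> str:
        n = len(self.cond_frames)
        if n == 0:
            return "text-to-video"
        return "image-to-video" if n == 1 else "video-continuation"

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
print(VideoRequest(prompt="a cat", cond_frames=[frame]).mode)  # image-to-video
```

Treating continuation as "generate the next frames given many conditioning frames" is also what makes minutes-long outputs possible: the model can repeatedly extend its own output while keeping cross-frame consistency.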

LongCat-Flash-Omni

Released: November 2025

First open-source real-time all-modality interaction model. Unifies text, image, audio, and video with a single end-to-end Shortcut-connected MoE (ScMoE) backbone.

  • Open-source SOTA on Omni-Bench and WorldSense
  • Low-latency, streaming multi-modal IO
  • 128K context with multi-turn dialogue
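
Low-latency streaming means the model consumes small input chunks and can begin emitting output before the input stream ends, rather than waiting for end-of-turn. The asyncio sketch below illustrates only that interaction pattern with stand-in producer and consumer tasks; `microphone` and `model` are hypothetical, and none of this is the released API:

```python
import asyncio

CHUNK_MS = 20  # small fixed-size chunks keep time-to-first-response low

async def microphone(queue: asyncio.Queue) -> None:
    """Stand-in audio source: pushes 20 ms PCM chunks, then a sentinel."""
    for _ in range(10):
        await asyncio.sleep(CHUNK_MS / 1000)
        await queue.put(b"\x00" * 640)   # 16 kHz, 16-bit mono = 640 B / 20 ms
    await queue.put(None)

async def model(queue: asyncio.Queue) -> None:
    """Stand-in for the model side of a duplex session.

    The key property of streaming all-modality IO: output can begin
    while input is still arriving.
    """
    received = 0
    while (chunk := await queue.get()) is not None:
        received += 1
        if received % 4 == 0:            # emit a partial response early
            print(f"partial response after {received * CHUNK_MS} ms of audio")
    print("final response")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    # Producer and consumer run concurrently: full-duplex interaction.
    await asyncio.gather(microphone(queue), model(queue))

asyncio.run(main())
```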

LongCat-Audio-Codec

Audio processing module that provides low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs.
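
A discrete codec's bitrate follows directly from its token rate: each token drawn from a codebook of size V carries log2(V) bits, so bits/s = tokens/s × codebooks × log2(V). A small sketch of that arithmetic; the rates and codebook sizes below are illustrative, not LongCat-Audio-Codec's published configuration:

```python
import math

def codec_bitrate(tokens_per_second: int, codebook_size: int,
                  num_codebooks: int = 1) -> float:
    """Bits per second for a discrete audio codec.

    Each token indexes one of `codebook_size` entries, so it carries
    log2(codebook_size) bits; parallel codebooks multiply the rate.
    """
    return tokens_per_second * num_codebooks * math.log2(codebook_size)

# Illustrative numbers only: 50 tokens/s from a 1024-entry codebook
# is 0.5 kbps per codebook.
print(codec_bitrate(50, 1024))     # 500.0 bits/s
print(codec_bitrate(50, 1024, 2))  # 1000.0 bits/s with two codebooks
```

Keeping this number low is what makes audio cheap for a speech LLM to consume: the fewer tokens per second of audio, the more seconds of speech fit in the model's context window.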

Model Comparison

| Model | Architecture | Key Feature | Use Case |
|---|---|---|---|
| Flash-Chat | 560B (MoE) | High-throughput dialogue | General conversation, coding |
| Flash-Thinking | 560B (MoE) | Enhanced reasoning | Tool use, formal reasoning |
| Video | DiT-based | Video generation | Text/image-to-video, continuation |
| Flash-Omni | ScMoE | All-modality | Multi-modal interaction |

Get Started

Choose a model above to explore its detailed documentation, benchmarks, and deployment guides.