
Qwen3 Release and Overview: How It Is Different From Other LLMs


Qwen3 Release and Overview

In late April 2025, Alibaba announced Qwen3, the third-generation large language model family in its open-source Tongyi Qwen series. Qwen3 was officially released on April 29, 2025 under the permissive Apache 2.0 open-source license. The Qwen3 family includes eight models ranging from 0.6 billion to 235 billion parameters.

Notably, Qwen3 offers two Mixture-of-Experts (MoE) variants – a flagship Qwen3-235B-A22B (235B total, 22B active parameters) and a smaller Qwen3-30B-A3B (30B total, 3B active) – alongside six dense (non-MoE) models (32B, 14B, 8B, 4B, 1.7B, and 0.6B parameters). The larger models support context windows of up to 128K tokens, enabling them to process very long inputs effectively.

Key Features and Capabilities

  • Hybrid “Thinking” Modes: Qwen3 introduces two inference modes. In Thinking Mode, the model performs step-by-step reasoning (chain-of-thought) before answering complex queries, while in Non-Thinking Mode it gives immediate, concise responses to simpler tasks. Users can toggle or budget these modes per task, balancing accuracy against latency and compute cost (a minimal usage sketch follows this list).
  • Multilingual Support: The models are trained on data from 119 languages and dialects, covering all major world languages. This extensive multilingual training enables Qwen3 to handle international use cases, from Chinese and English to many low-resource languages.
  • Agentic and Coding Abilities: Qwen3 is optimized for coding tasks and “agent” (tool-using) scenarios. Alibaba notes enhanced support for function calling and for the Model Context Protocol (MCP) used to integrate external tools. The models demonstrate strong performance on coding benchmarks and complex reasoning tests. In practice, Qwen3 excels at programming assistance, mathematical reasoning, logic puzzles, and instruction-following.
  • Extended Context Length: To handle long documents, Qwen3 was trained with very long contexts (up to 32K tokens in training, with deployment supporting up to 128K tokens for large models). This far exceeds most prior open LLMs and enables applications like long-form document analysis or book-length summaries.
  • Open-Source and Integration: All Qwen3 models are released under Apache-2.0 license. Pretrained weights are available on Hugging Face, GitHub, and Alibaba’s ModelScope. Developers can fine-tune or deploy them freely. Alibaba also offers Qwen3 via cloud APIs (Alibaba Cloud) and integrates it into its Tongyi AI app, allowing users immediate access to the model’s capabilities.
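
The snippet below is a minimal sketch of loading a Qwen3 checkpoint with Hugging Face transformers and toggling the hybrid thinking mode through the chat template, following the usage pattern in the Qwen3 model cards. The model name, generation settings, and the enable_thinking flag should be verified against the specific checkpoint you download.

```python
# Sketch: running a small Qwen3 model locally and switching between
# Thinking and Non-Thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # smallest dense variant; larger sizes follow the same API
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]

# enable_thinking=True lets the model emit a chain-of-thought block before its answer;
# set it to False for the fast, concise Non-Thinking mode described above.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```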

Comparison Table

Here’s a comparison table of key features and capabilities across Qwen3, DeepSeek-VL, ChatGPT (GPT-4-turbo), and LLaMA 3 (Meta’s latest open model):

| Feature / Model | Qwen3 (Alibaba) | DeepSeek-VL (DeepSeek AI) | ChatGPT (GPT-4-turbo) (OpenAI) | LLaMA 3 (Meta) |
| --- | --- | --- | --- | --- |
| Release Date | April 2025 | March 2024 (DeepSeek-VL) | Nov 2023 (GPT-4-turbo); ongoing updates | April 2024 |
| Model Type | Open-source (dense + MoE) | Open-source (multimodal LLM) | Closed-source, API only | Open-source (dense only) |
| Model Sizes | 0.6B → 235B (8 sizes; MoE & dense) | 1.3B, 7B | Single model (GPT-4-turbo; size undisclosed) | 8B, 70B |
| MoE Support | Yes (Qwen3-235B-A22B, Qwen3-30B-A3B) | No (dense only) | Undisclosed (MoE widely rumored, unconfirmed) | No |
| Max Context Length | Up to 128K tokens (larger models) | 32K tokens | Up to 128K tokens | 8K (default); extended in later variants |
| Multilingual Support | 119 languages & dialects | Multilingual text + vision inputs | Strong multilingual (50+ languages) | Multilingual, with less emphasis than Qwen/ChatGPT |
| Vision Support | Qwen-VL (separate model family) | Yes (vision + text) | Yes (GPT-4V) | Added later via Llama 3.2 vision models |
| Fine-tuning License | Apache 2.0 (fully open) | Open weights (DeepSeek model license) | Weights not available for fine-tuning | Llama Community License (open weights with conditions) |
| Tool Use / Function Calling | Yes, with Model Context Protocol (MCP) support | Limited tool use via API integration | Yes (OpenAI function calling & tools) | Experimental (via third-party frameworks) |
| Agent Capabilities | Yes – strong reasoning & code agents supported | Limited agent-style behavior | Yes – integrated with OpenAI Assistants | Not directly built in |
| Coding Capabilities | Strong (Codeforces, LiveCodeBench) | Good (stronger in the separate DeepSeek-Coder models) | Strong (GPT-4-class models power GitHub Copilot) | Strong (LLaMA 3-70B competes on HumanEval) |
| Math & Logic Reasoning | Excellent – top-tier on AIME, BFCL | Good, especially with CoT prompting | Strong at math/logic tasks | Moderate to strong (varies by size) |
| Training Data Volume | 36T tokens, multilingual + code/math enriched | ~2T tokens | Not disclosed | 15T+ tokens (public + licensed data) |
| Chain-of-Thought (CoT) | Hybrid mode (Thinking vs Non-Thinking toggle) | Supports CoT-style prompts | Yes (prompted CoT) | Yes (manual prompting) |
| Deployment Options | Hugging Face, ModelScope, Alibaba Cloud, local | Open download, API via DeepSeek | API only (OpenAI, Microsoft Azure, etc.) | Hugging Face, Meta-hosted, local |

Training Data and Methods

Qwen3 was pretrained on an extremely large corpus (~36 trillion tokens) – roughly twice the data used for its predecessor Qwen2.5 (18T). The data collection combined web text, text extracted from documents (using the earlier Qwen2.5-VL model), and synthetic math/code data generated by specialized Qwen2.5 models. Training proceeded in stages: an initial phase on about 30T tokens with moderate context length to build basic language skills, a second phase adding roughly 5T tokens enriched in STEM, coding, and reasoning content, and a final long-context stage using high-quality data to extend the context window to 32K tokens.

After pretraining, the team applied a multi-stage fine-tuning pipeline to build in the hybrid reasoning modes. This included: (1) fine-tuning on long chain-of-thought (CoT) examples to bootstrap reasoning, (2) reinforcement learning (RL) with rule-based rewards to refine exploration, (3) fusing the “thinking” and “non-thinking” behaviors by training on mixed CoT and instruction data, and (4) general RL across diverse tasks to improve alignment and robustness. These techniques produce a model that can dynamically switch between deep reasoning and quick answers.

Because of architectural and data improvements, even relatively small Qwen3 models match or exceed earlier models. For example, Qwen3-1.7B performs as well as Qwen2.5-3B, and Qwen3-4B matches Qwen2.5-7B, with outsized gains on math and code tasks. Likewise, the MoE models achieve similar capabilities to much larger dense models while using only ~10% of the parameters in any one pass, greatly saving compute during inference. Alibaba reports that the 235B model can run at full power on just four high-end GPUs (e.g. four H100s), using about one-third the memory of comparable models.
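
As a quick sanity check on the "~10% of the parameters" figure, the active-to-total ratios can be computed directly from the parameter counts quoted above (a back-of-the-envelope sketch, not an official Alibaba calculation):

```python
# Active-parameter fraction per forward pass for the two Qwen3 MoE variants,
# using the total/active counts cited earlier in this article.
moe_variants = {
    "Qwen3-235B-A22B": (235e9, 22e9),
    "Qwen3-30B-A3B": (30e9, 3e9),
}
for name, (total, active) in moe_variants.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# Qwen3-235B-A22B: 9.4% of parameters active per token
# Qwen3-30B-A3B: 10.0% of parameters active per token
```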

Performance Benchmarks

Alibaba has released internal benchmark comparisons showing that Qwen3 leads the open-model field on many tasks. For instance, the flagship Qwen3-235B-A22B model narrowly outperforms OpenAI’s o3-mini and Google’s Gemini 2.5 Pro on Codeforces (programming contest) challenges, and beats o3-mini on hard math (AIME) and function-calling (BFCL) benchmarks. In one math olympiad test (AIME25), Qwen3-235B scored 81.5 – the highest of any open model reported. It also surpasses xAI’s Grok-3 on coding (LiveCodeBench) and exceeds both OpenAI’s o1 and DeepSeek’s R1 on the ArenaHard reasoning test.

The largest dense Qwen3 model, Qwen3-32B, also ranks highly: it outperforms OpenAI’s o1 on several coding and reasoning benchmarks. In short, Qwen3’s training yields top-tier ability in coding assistance, math and problem-solving, instruction following, and other advanced tasks – on par with or exceeding contemporary open LLMs.

Applications and Availability

The Qwen3 series is designed for a wide range of AI applications. It excels at tasks like coding assistance, logic reasoning, math and science problem-solving, complex question answering, translation, and multi-turn agentic interactions. For example, Alibaba’s consumer AI app (“Tongyi”) immediately integrated Qwen3: users can employ it for logic puzzles, programming help, document translation, photo-based Q&A, and more. Its long-context ability also enables handling large documents or multi-step plans.

Because Qwen3 is fully open-source (Apache-2.0), developers worldwide can download the models (from Hugging Face, ModelScope, Kaggle, etc.) and use them commercially for free. Companies can deploy Qwen3 in cloud services – indeed, the models are offered via Alibaba Cloud’s AI platform and by third-party hosts (e.g. Fireworks AI, Hyperbolic) – or run them on-premise. The MoE variants provide a cost-effective path for large-scale AI services, while the smaller dense models allow easy integration into consumer apps and mobile tools. With support for 119 languages and strong tool-calling, Qwen3 aims to power everything from enterprise chatbots and code generators to educational and creative AI agents.
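
Because many of these hosts expose Qwen3 behind an OpenAI-compatible endpoint (for example, a locally run vLLM server or a third-party provider), a typical integration looks like the hypothetical sketch below. The base_url, API key, and model identifier are placeholders; substitute whatever your deployment or provider actually exposes.

```python
# Hypothetical sketch: querying a self- or third-party-hosted Qwen3 deployment
# through an OpenAI-compatible chat completions endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder: your server or provider URL
    api_key="EMPTY",                      # placeholder: key required by your host
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",  # placeholder model ID; use the name your host registers
    messages=[
        {"role": "user", "content": "Summarize the key points of this contract clause in plain English: ..."},
    ],
)
print(response.choices[0].message.content)
```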


Anas Hassan

April 29, 2025

Anas Hassan is a tech geek and cybersecurity enthusiast with vast experience in the digital transformation industry. When Anas isn’t blogging, he watches football.
