Metaverse Media Group

Microsoft introduces Phi-4-mini-flash-reasoning with up to 10x higher token throughput

by The Decoder
12 July 2025
Microsoft has introduced Phi-4-mini-flash-reasoning, a lightweight AI model built for scenarios with tight computing, memory, or latency limits. Designed for edge devices and mobile apps, the model aims to deliver strong reasoning abilities without demanding hardware.

Phi-4-mini-flash-reasoning packs 3.8 billion parameters and builds on the Phi-4 family introduced in December, with a focus on mathematical reasoning.

At the core of the new model is an updated architecture called SambaY, now featuring a Gated Memory Unit (GMU) and “differential attention.” Traditional transformers rely on complex attention in every layer to decide which parts of the input matter most.

The GMU streamlines this by replacing heavy cross-attention operations with a simple element-wise multiplication between the current input and a memory state from an earlier layer. This allows the model to dynamically recalibrate which tokens to focus on without the usual computational overhead.
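The gating step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Microsoft's implementation: the function names, the two projection matrices (`w_gate`, `w_out`), and the choice of SiLU as the gate activation are assumptions made for the example. The only part taken from the article is the element-wise product between a gate derived from the current input and a memory state from an earlier layer.

```python
import math


def silu(z: float) -> float:
    """SiLU activation, z * sigmoid(z). Using it for the gate is an assumption."""
    return z / (1.0 + math.exp(-z))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def gated_memory_unit(x, memory, w_gate, w_out):
    """Hypothetical GMU sketch for a single token.

    x       -- current layer input for this token
    memory  -- memory state for this token from an earlier layer
    w_gate  -- assumed projection that turns x into a gate
    w_out   -- assumed output projection
    """
    gate = [silu(g) for g in matvec(w_gate, x)]
    # The core operation: element-wise multiplication, no attention scores.
    gated = [m * g for m, g in zip(memory, gate)]
    return matvec(w_out, gated)


# Usage with identity projections so the gating itself is easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gated_memory_unit([1.0, 0.0], [2.0, 3.0], identity, identity)
print(out)
```

With identity projections, the second output component is zero because the gate for that channel is `silu(0.0)`, which shows how the current input decides which parts of the earlier memory pass through.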

Hybrid decoder architecture with Mamba/SWA self-decoder, GMU-supported cross-attention, and linear KV cache prefill. SambaY combines a full attention layer with gated memory units, reducing cross-attention and speeding up inference. | Image: Microsoft

SambaY mixes several attention mechanisms: a single full-attention layer creates a key-value cache that later layers can access, while GMUs take the place of about half the cross-attention layers, letting layers share information through lightweight multiplications. This approach slashes both memory use and compute requirements. In typical models, data transfers between memory and processor climb as sequence length grows, but with SambaY, this remains mostly flat.
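A toy cost model can make the memory-traffic argument concrete. It is a rough sketch under stated assumptions, not a reproduction of Microsoft's measurements: it counts only the cached values a layer must read to generate one token, ignores weight reads, and uses hypothetical layer counts and dimensions. A cross-attention layer is assumed to read keys and values for every cached position, while a GMU is assumed to read a single memory vector.

```python
def cross_attention_reads(seq_len: int, d_model: int) -> int:
    """Cached values read per generated token: keys and values for every position."""
    return 2 * seq_len * d_model


def gmu_reads(d_model: int) -> int:
    """Cached values read per generated token: one memory vector (assumption)."""
    return d_model


def cross_decoder_reads(seq_len: int, d_model: int, n_layers: int, gmu_fraction: float) -> int:
    """Total reads for a stack where a fraction of the layers are GMUs."""
    n_gmu = int(n_layers * gmu_fraction)
    n_attn = n_layers - n_gmu
    return n_attn * cross_attention_reads(seq_len, d_model) + n_gmu * gmu_reads(d_model)


# Hypothetical sizes, chosen only for illustration.
D_MODEL, N_LAYERS = 8, 4
for seq_len in (1_000, 32_000):
    all_attention = cross_decoder_reads(seq_len, D_MODEL, N_LAYERS, gmu_fraction=0.0)
    half_gmu = cross_decoder_reads(seq_len, D_MODEL, N_LAYERS, gmu_fraction=0.5)
    print(seq_len, all_attention, half_gmu)
```

In this model the GMU term does not depend on sequence length at all, so replacing half of the cross-attention layers roughly halves the traffic of this part of the stack. The layers that keep cross-attention still scale with length here, so the sketch isolates the GMU's contribution only.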

A new architecture for more efficient reasoning

These architectural changes bring a significant boost in performance. Microsoft says Phi-4-mini-flash-reasoning delivers up to ten times higher throughput and cuts average latency by a factor of two to three compared to its predecessor. However, these results are based on tests with industrial GPUs, not the low-resource devices the model is meant for.

Scatter plot: Inference latency of Phi-4-mini-reasoning and Phi-4-mini-flash-reasoning over generation lengths up to 32,000 tokens. Phi-4-mini-flash-reasoning shows much lower latency at 32,000 tokens than the standard reasoning model. | Image: Microsoft
Scatter plot: Latency versus throughput for the standard and flash reasoning models. The flash model reaches up to ten times higher throughput at the same latency. | Image: Microsoft

Phi-4-mini-flash-reasoning also handles long contexts well. The model supports a context window of up to 64,000 tokens and, according to Microsoft, maintains its speed and performance even at maximum capacity. Microsoft credits the SambaY design, which keeps processing speed steady as sequence length increases, an advantage over standard transformer models that tend to slow down in these scenarios.

Outperforming larger models in reasoning benchmarks

The flash version stands out in benchmarks. Phi-4-mini-flash-reasoning was trained on five trillion tokens from the same data as Phi-4-mini, including synthetic data, using 1,000 A100 GPUs over 14 days.
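The training figures above imply a rough compute budget and throughput. The back-of-the-envelope calculation below uses only the three numbers from the article and assumes, for simplicity, that all 1,000 GPUs ran for the full 14 days without interruption, which the article does not state.

```python
TOKENS = 5_000_000_000_000  # five trillion training tokens
GPUS = 1_000                # A100 GPUs
DAYS = 14

gpu_hours = GPUS * DAYS * 24
seconds = DAYS * 24 * 60 * 60

# Aggregate and per-GPU token throughput under the full-utilization assumption.
tokens_per_second = TOKENS / seconds
tokens_per_gpu_second = tokens_per_second / GPUS

print(f"GPU-hours: {gpu_hours:,}")  # 336,000
print(f"Tokens per second (cluster): {tokens_per_second:,.0f}")
print(f"Tokens per second (per GPU): {tokens_per_gpu_second:,.0f}")
```

Under that assumption the run comes to 336,000 A100 GPU-hours and a little over 4,000 tokens per second per GPU.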

In testing, it consistently beat the base model, especially on knowledge-intensive and programming tasks, with performance gains of several percentage points. The model also did better in math and scientific reasoning, all without the resource-heavy reinforcement learning step used in previous versions.

Bar chart: Pass@1 accuracy of six models on AIME24, AIME25, Math-500, and GPQA Diamond, with Phi-4-mini-flash-reasoning highest. Phi-4-mini-flash-reasoning outperforms its base model and, in some cases, even surpasses models twice its size. | Image: Microsoft

Phi-4-mini-flash-reasoning is available on Hugging Face, and Microsoft has released code examples in the Phi Cookbook. The full training codebase is open-sourced on GitHub.

Read the full article on The-Decoder.com