Metaverse Media Group

Microsoft introduces Phi-4-mini-flash-reasoning with up to 10x higher token throughput

by The Decoder
12 July 2025
This article first appeared on THE DECODER.

Summary

Microsoft has introduced Phi-4-mini-flash-reasoning, a lightweight AI model built for scenarios with tight computing, memory, or latency limits. Designed for edge devices and mobile apps, the model aims to deliver strong reasoning abilities without demanding hardware.

Phi-4-mini-flash-reasoning has 3.8 billion parameters and builds on the Phi-4 family introduced in December 2024, with a focus on mathematical reasoning.

At the core of the new model is an updated architecture called SambaY, now featuring a Gated Memory Unit (GMU) and “differential attention.” Traditional transformers rely on complex attention in every layer to decide which parts of the input matter most.

The GMU streamlines this by replacing heavy cross-attention operations with a simple element-wise multiplication between the current input and a memory state from an earlier layer. This allows the model to dynamically recalibrate which tokens to focus on without the usual computational overhead.
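
The gating idea can be sketched in a few lines of NumPy. This is a simplified illustration, not Microsoft's implementation: the projection matrices `w_gate` and `w_out`, the SiLU activation, and all dimensions are assumptions chosen for the example. What it shows is the core operation described above, an element-wise product between a gate computed from the current input and a memory state taken from an earlier layer.

```python
import numpy as np


def silu(z):
    """SiLU activation: z * sigmoid(z). The choice of activation is an assumption."""
    return z / (1.0 + np.exp(-z))


def gated_memory_unit(x, memory, w_gate, w_out):
    """Hypothetical GMU sketch.

    x:      current layer input, shape (seq_len, d_model)
    memory: state saved by an earlier layer, shape (seq_len, d_mem)
    w_gate: assumed projection from d_model to d_mem
    w_out:  assumed projection from d_mem back to d_model
    """
    gate = silu(x @ w_gate)   # gate derived from the current input
    gated = memory * gate     # element-wise multiplication, no attention scores
    return gated @ w_out      # project back to the model width


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_mem = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))
memory = rng.normal(size=(seq_len, d_mem))
w_gate = rng.normal(size=(d_model, d_mem))
w_out = rng.normal(size=(d_mem, d_model))

out = gated_memory_unit(x, memory, w_gate, w_out)
print(out.shape)  # (4, 8)
```

Note that the cost per token here depends only on the hidden sizes, not on how many earlier tokens exist, which is why such a unit is cheaper than attending over the whole sequence.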

Hybrid decoder architecture with Mamba/SWA self-decoder, GMU-supported cross-attention, and linear KV cache prefill.
SambaY combines a full attention layer with gated memory units, reducing cross-attention and speeding up inference. | Image: Microsoft

SambaY mixes several attention mechanisms: a single full-attention layer creates a key-value cache that later layers can access, while GMUs take the place of about half the cross-attention layers, letting layers share information through lightweight multiplications. This approach slashes both memory use and compute requirements. In typical models, data transfers between memory and processor climb as sequence length grows, but with SambaY, this remains mostly flat.
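
A rough back-of-the-envelope calculation illustrates why a single shared key-value cache saves memory compared with one cache per layer. The layer count, number of KV heads, head dimension, and 16-bit precision below are hypothetical values, not the actual Phi-4-mini-flash-reasoning configuration, and the sketch ignores the smaller state kept by sliding-window and state-space layers.

```python
def kv_cache_bytes(num_caching_layers, seq_len, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Size of a key-value cache in bytes.

    The leading factor 2 accounts for storing one key and one value
    vector per token, per head, per caching layer.
    """
    return (2 * num_caching_layers * seq_len * num_kv_heads
            * head_dim * bytes_per_value)


# Hypothetical model shape; only the 64,000-token context comes from the article.
SEQ_LEN = 64_000
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

per_layer_cache = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, NUM_KV_HEADS, HEAD_DIM)
shared_cache = kv_cache_bytes(1, SEQ_LEN, NUM_KV_HEADS, HEAD_DIM)

print(f"one cache per layer: {per_layer_cache / 2**30:.2f} GiB")
print(f"single shared cache: {shared_cache / 2**20:.0f} MiB")
print(f"reduction factor:    {per_layer_cache // shared_cache}x")
```

Under these assumed numbers, the cache shrinks by a factor equal to the number of layers that would otherwise each keep their own copy.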

A new architecture for more efficient reasoning

These architectural changes bring a significant boost in performance. Microsoft says Phi-4-mini-flash-reasoning delivers up to ten times higher throughput and cuts average latency by a factor of two to three compared to its predecessor. However, these results are based on tests with industrial GPUs, not the low-resource devices the model is meant for.

Scatter plot: Inference latency of Phi4-mini-Reasoning and Phi4-mini-Flash-Reasoning over generation lengths up to 32,000 tokens.
Phi-4-mini-flash-reasoning shows much lower latency at 32,000 tokens compared to the standard reasoning model, highlighting the efficiency of the flash method. | Image: Microsoft
Scatter plot: Latency vs. throughput for standard and flash reasoning, flash achieves 10× higher throughput with the same latency.
Phi-4-mini-flash-reasoning reaches up to ten times the throughput of the standard reasoning model at the same latency. | Image: Microsoft

Phi-4-mini-flash-reasoning also excels at handling long contexts. The model supports a context window of up to 64,000 tokens and can maintain its speed and performance even at maximum capacity. Microsoft cites the efficiency of the SambaY design, which keeps processing speeds steady even as sequence length increases—a clear advantage over standard transformer models that tend to slow down in these scenarios.

Outperforming larger models in reasoning benchmarks

The flash version stands out in benchmarks. Phi-4-mini-flash-reasoning was trained on five trillion tokens from the same data as Phi-4-mini, including synthetic data, using 1,000 A100 GPUs over 14 days.
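
To put the training figures in perspective, the reported numbers can be turned into GPU-hours and an average token rate. This is simple arithmetic on the values quoted above; it assumes all 1,000 GPUs ran continuously for the full 14 days, which the article does not state.

```python
# Figures quoted in the article.
NUM_GPUS = 1_000
TRAINING_DAYS = 14
TRAINING_TOKENS = 5 * 10**12  # five trillion tokens

# Assumption: every GPU is busy for the entire period.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
gpu_seconds = gpu_hours * 3600
tokens_per_gpu_second = TRAINING_TOKENS / gpu_seconds

print(f"GPU-hours: {gpu_hours:,}")
print(f"average tokens per GPU-second: {tokens_per_gpu_second:,.0f}")
```

That works out to 336,000 A100 GPU-hours, or roughly 4,100 training tokens per GPU per second on average under the full-utilization assumption.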

In testing, it consistently beat the base model, especially on knowledge-intensive and programming tasks, with performance gains of several percentage points. The model also did better in math and scientific reasoning, all without the resource-heavy reinforcement learning step used in previous versions.

Bar chart: Pass@1 accuracy of six models on AIME24, AIME25, Math-500, and GPQA Diamond, Phi-4-mini-flash-reasoning highest.
Phi-4-mini-flash-reasoning outperforms its base model and, in some cases, even surpasses models twice its size. | Image: Microsoft

Phi-4-mini-flash-reasoning is available on Hugging Face, and Microsoft has released code examples in the Phi Cookbook. The full training codebase is open-sourced on GitHub.

Read the full article on The-Decoder.com