DeepSeek targets mid-July V4 launch with new API pricing

DeepSeek announced Sunday that its V4 model will officially launch in mid-July with peak-valley API pricing that doubles rates during busy hours. [news.futunn]
The team also open-sourced DSpark, a speculative decoding framework co-developed with Peking University that speeds per-user generation by up to 85%. [marktechpost +1]
DSpark is already deployed across DeepSeek's production systems and works with third-party models including Alibaba's Qwen and Google's Gemma. [digg +1]

On June 27, researchers from Peking University and DeepSeek released DSpark, an open-source speculative decoding framework that accelerates large language model inference by 60 to 85 percent per user in live production systems. It is the Chinese AI lab's first major technical release since its $7 billion funding round. [marktechpost +2]

How DSpark Works

Speculative decoding splits text generation into two roles: a small, fast draft model proposes a batch of tokens, and the full target model verifies that batch in a single forward pass, keeping the drafted tokens it agrees with up to the first mismatch. DSpark improves on earlier approaches with two additions. First, rather than training a separate draft model from scratch, it grafts a lightweight speculative head directly onto the existing model checkpoint, so the underlying model's output quality remains unchanged. Second, a confidence-scoring system gives each drafted token a probability of surviving verification, while a hardware-aware scheduler adjusts how many tokens get checked based on current GPU load. When traffic is light, the system verifies longer runs of guesses; when traffic is heavy, it discards low-confidence tokens before they consume compute. [digg +3]
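
The draft-then-verify loop described above can be illustrated with a toy sketch. This is not DSpark's code: the function names, the confidence thresholds, and the `gpu_load` cutoff are all hypothetical, the "models" are simple functions over integer tokens, and verification is simulated position by position rather than in one forward pass. It uses plain greedy matching, which is enough to show why the output stays identical to what the target model would produce alone.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interfaces: a draft returns (next_token, confidence),
# a target returns only its own greedy next token.
DraftFn = Callable[[List[Token]], Tuple[Token, float]]
TargetFn = Callable[[List[Token]], Token]


def draft_tokens(draft: DraftFn, prefix: List[Token],
                 max_len: int, min_conf: float) -> List[Token]:
    """Propose up to max_len tokens; stop once confidence falls below min_conf."""
    proposed: List[Token] = []
    for _ in range(max_len):
        tok, conf = draft(prefix + proposed)
        if conf < min_conf:
            break  # drop low-confidence guesses before they cost verification work
        proposed.append(tok)
    return proposed


def verify(target: TargetFn, prefix: List[Token],
           proposed: List[Token]) -> List[Token]:
    """Keep the longest agreeing run of drafted tokens, plus one target token.

    A real system scores every position in one forward pass; this loop only
    simulates that result.
    """
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # the target's token replaces the bad guess
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # extra token when all guesses pass
    return accepted


def speculative_step(draft: DraftFn, target: TargetFn,
                     prefix: List[Token], gpu_load: float) -> List[Token]:
    """One decode step with an assumed two-level schedule keyed on GPU load."""
    # Assumed policy: long, permissive drafts when idle; short, strict when busy.
    max_len, min_conf = (8, 0.2) if gpu_load < 0.5 else (3, 0.6)
    return verify(target, prefix, draft_tokens(draft, prefix, max_len, min_conf))


def toy_target(seq: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Tuple[Token, float]:
    """Toy draft: matches the target except after a 4, where it guesses wrong."""
    if seq[-1] == 4:
        return 0, 0.3  # wrong guess, reported with low confidence
    return (seq[-1] + 1) % 10, 0.9


if __name__ == "__main__":
    print(speculative_step(toy_draft, toy_target, [0], gpu_load=0.1))
    print(speculative_step(toy_draft, toy_target, [0], gpu_load=0.9))
```

Under light load the toy draft proposes eight tokens and the verifier cuts the run at the first wrong guess; under heavy load the draft is capped at three tokens and low-confidence guesses never reach the verifier. In both cases every emitted token is one the target itself would have chosen.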

Performance and Compatibility

In DeepSeek's online production environment handling real user traffic, DSpark delivered 60 to 85 percent faster single-user generation on V4-Flash and 57 to 78 percent on V4-Pro compared to DeepSeek's prior MTP-1 baseline. Under certain latency conditions, throughput gains reached as high as 661 percent on Flash and 406 percent on Pro. Offline benchmarks showed accepted token length rising 26 to 31 percent over Eagle3 and 16 to 18 percent over DFlash. [youtube +2]
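
To put those percentages in more familiar terms, the short sketch below converts a "percent faster" figure into a speed multiplier and into the share of per-token time saved. It assumes the reported figures are relative gains over the baseline rate (so 85 percent faster means 1.85 times the baseline); the article does not spell out the exact definition, and the helper names are illustrative only.

```python
def speedup_factor(percent_gain: float) -> float:
    """Multiplier over baseline, assuming percent_gain is a relative rate increase."""
    return 1.0 + percent_gain / 100.0


def time_saved_fraction(percent_gain: float) -> float:
    """Fraction of per-token time removed at the given relative gain."""
    return 1.0 - 1.0 / speedup_factor(percent_gain)


if __name__ == "__main__":
    # Figures quoted in the article for V4-Flash and V4-Pro.
    for label, gain in [("Flash per-user, upper bound", 85),
                        ("Pro per-user, upper bound", 78),
                        ("Flash throughput, best case", 661),
                        ("Pro throughput, best case", 406)]:
        print(f"{label}: {speedup_factor(gain):.2f}x, "
              f"{time_saved_fraction(gain):.1%} less time per token")
```

Under that reading, an 85 percent gain is a 1.85x rate and a 661 percent gain is a 7.61x rate.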

The framework is model-agnostic. DeepSeek demonstrated compatibility with Alibaba's Qwen3 and Google's Gemma checkpoints. Alongside DSpark, the team open-sourced DeepSpec, a full-stack codebase for training and evaluating speculative decoding drafters, all under an MIT license on GitHub. [marktechpost +3]

Broader Context

The release arrives as DeepSeek prepares to officially launch its V4 model in mid-July with a new peak-and-off-peak API pricing mechanism. DSpark is already fully deployed across DeepSeek's online services, reducing wasted GPU compute from invalid verifications while maintaining output quality identical to the base model. DeepSeek founder Liang Wenfeng co-authored the accompanying paper, titled "DSpark: Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation." [pandaily +1]

Sources (14)

1. DeepSeek V4 is scheduled for mid-July, with concurrent ... (news.futunn.com)
2. DeepSeek Releases DSpark, a Speculative Decoding Framework ... (www.marktechpost.com)
3. DeepSeek DSpark Boosts Generation Speed by 85% in First Post ... (pandaily.com)
4. DeepSeek-AI and Peking University open-source DSpark, using ... (digg.com)
5. Open Source Speculative Decoding for 85% Faster Inference (www.youtube.com)
6. DSpark - DeepSeek Just Made Inference 85% Faster - YouTube (www.youtube.com)
7. DeepSpec: a full-stack codebase for training and ... - GitHub (github.com)
8. DeepSeek's DSpark Brings Speculative Decoding Back Into the ... (dev.to)
9. Open Source Inference Frameworks - Aussie AI (www.aussieai.com)
10. hemingkx/SpecDec: Codes for our paper "Speculative Decoding (github.com)
11. Best Inference Framework & Open Models for Orchestrator-Workers ... (forums.developer.nvidia.com)
12. mscheong01/speculative_decoding.c: minimal C implementation of ... (github.com)
13. DSpark: Speculative decoding accelerates LLM inference [pdf] (www.reddit.com)
14. DeepSpec/DSpark_paper.pdf at main · deepseek-ai ... - GitHub (github.com)
