On June 27, researchers from Peking University and DeepSeek released DSpark, an open-source speculative decoding framework that accelerates large language model inference by 60 to 85 percent per user in live production systems. It is the Chinese AI lab's first major technical release since its $7 billion funding round.
Speculative decoding splits text generation into two roles: a small, fast draft model proposes a batch of tokens, and the full target model verifies that batch in a single forward pass, keeping the leading run of tokens it agrees with. DSpark improves on earlier approaches with two additions. First, rather than training a separate draft model from scratch, it grafts a lightweight speculative head directly onto the existing model checkpoint, so the underlying model's output quality remains unchanged. Second, a confidence-scoring system gives each drafted token a probability of surviving verification, while a hardware-aware scheduler adjusts how many tokens get checked based on current GPU load. When traffic is light, the system verifies longer runs of guesses; when traffic is heavy, it discards low-confidence tokens before they consume compute.
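The draft-then-verify loop and the confidence cutoff described above can be illustrated with a toy sketch. This is not DSpark's code: every name here (`prune_by_confidence`, `verify_greedy`, `speculative_step`, `toy_draft`, `toy_target`) is hypothetical, the models are stand-in functions over integers, and verification uses simple greedy matching as an assumption, whereas production systems typically use a sampling-based acceptance rule. The `budget` and `min_confidence` arguments stand in for whatever the load-aware scheduler would choose.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
# Hypothetical interfaces, assumed for illustration only.
DraftFn = Callable[[Sequence[Token], int], List[Tuple[Token, float]]]
TargetFn = Callable[[Sequence[Token], Sequence[Token]], List[Token]]


def prune_by_confidence(drafted: Sequence[Tuple[Token, float]],
                        min_confidence: float,
                        budget: int) -> List[Token]:
    """Keep the leading drafted tokens whose confidence meets the threshold."""
    kept: List[Token] = []
    for token, confidence in drafted[:budget]:
        if confidence < min_confidence:
            break  # everything after a weak guess is dropped before verification
        kept.append(token)
    return kept


def verify_greedy(candidates: Sequence[Token],
                  target_tokens: Sequence[Token]) -> List[Token]:
    """Accept candidates up to the first mismatch, then add the target's own token.

    target_tokens must hold len(candidates) + 1 entries: the target's choice at
    each candidate position, plus one extra token after the last candidate.
    """
    accepted: List[Token] = []
    for position, token in enumerate(candidates):
        if token != target_tokens[position]:
            break
        accepted.append(token)
    accepted.append(target_tokens[len(accepted)])
    return accepted


def speculative_step(context: Sequence[Token], draft_fn: DraftFn,
                     target_fn: TargetFn, budget: int,
                     min_confidence: float) -> List[Token]:
    """Run one draft, prune, and verify round; return the newly emitted tokens."""
    drafted = draft_fn(context, budget)
    candidates = prune_by_confidence(drafted, min_confidence, budget)
    target_tokens = target_fn(context, candidates)  # one "forward pass"
    return verify_greedy(candidates, target_tokens)


def toy_draft(context: Sequence[Token], budget: int) -> List[Tuple[Token, float]]:
    """Toy drafter: two good guesses, one wrong guess, one low-confidence guess."""
    last = context[-1]
    proposals = [(last + 1, 0.9), (last + 2, 0.8), (last + 99, 0.6), (last + 4, 0.2)]
    return proposals[:budget]


def toy_target(context: Sequence[Token], candidates: Sequence[Token]) -> List[Token]:
    """Toy target: the correct next token is always the previous token plus one."""
    sequence = [context[-1]] + list(candidates)
    return [token + 1 for token in sequence]


if __name__ == "__main__":
    context = [1, 2, 3]
    # Light load: generous budget, permissive threshold.
    print(speculative_step(context, toy_draft, toy_target, budget=4, min_confidence=0.5))
    # Heavy load: small budget, strict threshold.
    print(speculative_step(context, toy_draft, toy_target, budget=2, min_confidence=0.85))
```

In the light-load call, three candidates reach the verifier and two are accepted before the mismatch, so the step emits three tokens from one verification pass. In the heavy-load call, only one candidate is verified. In both cases the emitted tokens are exactly what the toy target would have produced on its own, which is the property the article describes as unchanged output quality.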
In DeepSeek's online production environment handling real user traffic, DSpark delivered 60 to 85 percent faster single-user generation on V4-Flash and 57 to 78 percent on V4-Pro compared to DeepSeek's prior MTP-1 baseline. Under certain latency conditions, throughput gains reached as high as 661 percent on Flash and 406 percent on Pro. Offline benchmarks showed accepted token length rising 26 to 31 percent over Eagle3 and 16 to 18 percent over DFlash.
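For readers comparing these figures, a short conversion helps. The sketch below assumes each percentage is a relative increase over the baseline (so a 60 percent gain means 1.6 times the baseline rate); the article does not state this explicitly, and the helper name `gain_to_multiplier` is made up for illustration.

```python
def gain_to_multiplier(percent_gain: float) -> float:
    """Convert a relative gain in percent into a multiplier over the baseline."""
    return 1.0 + percent_gain / 100.0


if __name__ == "__main__":
    # Figures quoted in the article, read as gains over the MTP-1 baseline.
    reported = [
        ("V4-Flash single-user, low end", 60),
        ("V4-Flash single-user, high end", 85),
        ("V4-Flash peak throughput", 661),
        ("V4-Pro peak throughput", 406),
    ]
    for label, gain in reported:
        print(f"{label}: +{gain}% is about {gain_to_multiplier(gain):.2f}x baseline")
```

Under that reading, the peak throughput figure on Flash corresponds to roughly 7.6 times the baseline rather than 6.6 times.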
The framework is model-agnostic: DeepSeek demonstrated compatibility with Alibaba's Qwen3 and Google's Gemma checkpoints. Alongside DSpark, the team open-sourced DeepSpec, a full-stack codebase for training and evaluating speculative decoding drafters, all under an MIT license on GitHub.
The release arrives as DeepSeek prepares to officially launch its V4 model in mid-July with a new peak-and-off-peak API pricing mechanism. DSpark is already fully deployed across DeepSeek's online services, reducing wasted GPU compute from invalid verifications while maintaining output quality identical to the base model. DeepSeek founder Liang Wenfeng co-authored the accompanying paper, titled "DSpark: Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation."