
DeepSeek V4: the world's largest open-source model, at a price that breaks the market

Fernando Luis

02 May 2026

This article summarizes information originally published on github.com. For the full context, the original statements, and the details we have not included, see the source indicated.


Context

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
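To make "activated for each token" concrete, here is a minimal sketch of top-k expert routing in a generic MoE layer. The softmax gate, the eight scalar experts, and k=2 are illustrative assumptions, not DeepSeek-V3's actual configuration; only the 671B and 37B figures come from the text.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the outputs of only the k selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Toy setup (assumed): 8 scalar "experts", 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
selected = route_top_k(scores, k=2)
y = moe_forward(1.0, experts, scores, k=2)

# Share of parameters used per token, from the figures quoted in the text.
active_fraction = 37 / 671
print(selected, y, round(active_fraction, 4))
```

The six unselected experts are never evaluated, which is why compute per token tracks the activated parameter count (about 5.5% of the total here) rather than the full model size.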

What happened

To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
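The core idea usually associated with Multi-head Latent Attention is to cache a small latent vector per token and rebuild keys and values from it on demand. The sketch below illustrates only that low-rank compression step. All sizes and weight names are hypothetical, and other parts of the published design (such as positional handling and query compression) are left out.

```python
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_latent, n_heads, d_head = 16, 4, 2, 8  # toy sizes, all assumed

W_down = rand_matrix(d_latent, d_model, rng)           # hidden state -> latent
W_up_k = rand_matrix(n_heads * d_head, d_latent, rng)  # latent -> keys
W_up_v = rand_matrix(n_heads * d_head, d_latent, rng)  # latent -> values

kv_cache = []  # holds one small latent vector per token, not full keys/values

def append_token(hidden):
    """Compress the token's hidden state and store only the latent."""
    kv_cache.append(matvec(W_down, hidden))

def keys_and_values():
    """Reconstruct keys and values for all cached tokens from the latents."""
    keys = [matvec(W_up_k, c) for c in kv_cache]
    values = [matvec(W_up_v, c) for c in kv_cache]
    return keys, values

for _ in range(3):
    append_token([rng.gauss(0, 1) for _ in range(d_model)])

keys, values = keys_and_values()
cached_per_token = d_latent                 # floats stored with the latent cache
standard_per_token = 2 * n_heads * d_head   # floats stored by a plain K/V cache
print(cached_per_token, standard_per_token)
```

With these toy sizes the cache holds 4 numbers per token instead of 32. The trade-off is the extra up-projection at attention time.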

Details

Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.

Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.

Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
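To put the GPU-hour figure in perspective, the short calculation below converts it into a rental cost and a wall-clock duration. The price per GPU hour and the cluster size are assumed inputs that do not appear in the text above, and the duration assumes every GPU is busy for the whole run.

```python
GPU_HOURS = 2.788e6  # H800 GPU hours, as quoted in the text

def training_cost_usd(gpu_hours, usd_per_gpu_hour):
    """Rental cost if every GPU hour is billed at a flat rate."""
    return gpu_hours * usd_per_gpu_hour

def wall_clock_days(gpu_hours, num_gpus):
    """Elapsed days if num_gpus run in parallel with no idle time."""
    return gpu_hours / num_gpus / 24

cost = training_cost_usd(GPU_HOURS, usd_per_gpu_hour=2.0)  # assumed rental price
days = wall_clock_days(GPU_HOURS, num_gpus=2048)           # assumed cluster size
print(f"${cost:,.0f} over about {days:.1f} days")
```

Under these assumptions the run costs roughly 5.6 million USD and takes just under two months. Changing either input scales the result linearly.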

Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.

On top of the efficient architecture of DeepSeek-V2, we pioneer an auxiliary-loss-free strategy for load balancing, which minimizes the performance degradation that arises from encouraging load balancing.
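One way to balance expert load without an auxiliary loss is to give each expert a routing bias that is nudged after every step: down if the expert was overloaded, up if it was underloaded. The simulation below is a sketch of that idea under assumed settings (four experts, top-1 routing, a fixed step size, synthetic scores); it is not taken from the DeepSeek-V3 code.

```python
import random

def select_experts(scores, bias, k):
    """Pick the top-k experts by score + bias; the bias affects selection only."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    return ranked[:k]

def update_bias(bias, load, step_size):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step_size)
        elif n < mean_load:
            new_bias.append(b + step_size)
        else:
            new_bias.append(b)
    return new_bias

def simulate(token_scores, k, step_size, rounds):
    """Route the same batch repeatedly, updating the biases after each round."""
    n_experts = len(token_scores[0])
    bias = [0.0] * n_experts
    history = []
    for _ in range(rounds):
        load = [0] * n_experts
        for scores in token_scores:
            for i in select_experts(scores, bias, k):
                load[i] += 1
        history.append(load)
        bias = update_bias(bias, load, step_size)
    return history, bias

rng = random.Random(0)
# 200 toy tokens, 4 experts; expert 0 gets a +1.0 head start so it wins every token at first.
token_scores = [[rng.random() + (1.0 if i == 0 else 0.0) for i in range(4)]
                for _ in range(200)]
history, final_bias = simulate(token_scores, k=1, step_size=0.1, rounds=20)
print("first round load:", history[0])
print("last round load: ", history[-1])
```

In the first round all 200 tokens go to expert 0. As the biases adjust, the load spreads across the other experts, and no gradient from a balancing loss is involved.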

We investigate a Multi-Token Prediction (MTP) objective and prove it beneficial to model performance.
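As a rough illustration of what a multi-token objective changes, the sketch below builds training targets that cover the next several tokens at each position and combines the per-depth losses. The helper names, the example sentence, and the weighting rule are assumptions for illustration; the text above does not specify how DeepSeek-V3 combines the losses.

```python
def mtp_targets(tokens, n_future):
    """For each position, pair the current token with the next n_future tokens."""
    pairs = []
    for i in range(len(tokens) - n_future):
        pairs.append((tokens[i], tokens[i + 1 : i + 1 + n_future]))
    return pairs

def mtp_loss(main_loss, extra_losses, weight):
    """Next-token loss plus a weighted average of the extra-depth losses (assumed rule)."""
    if not extra_losses:
        return main_loss
    return main_loss + weight * sum(extra_losses) / len(extra_losses)

tokens = ["the", "cat", "sat", "on", "mat"]
pairs = mtp_targets(tokens, n_future=2)
total = mtp_loss(2.0, [3.0, 1.0], weight=0.5)
print(pairs)
print(total)
```

With n_future=1 this reduces to ordinary next-token prediction. Larger values give each position additional supervision from tokens further ahead.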

We design an FP8 mixed precision training framework and, for the first time, validate the feasibility and effectiveness of FP8 training on an extremely large-scale model.

Original source

Read the full article on github.com.

