Your Model, Your Rules: Mistral Forge and Proprietary AI

There is a subtle misconception at the heart of how most companies use artificial intelligence today. They send prompts to models trained on billions of internet pages, books, articles, forums, and public code on GitHub, and expect answers calibrated to their internal reality. But that internal reality—the operating procedures of a pharmaceutical company, the maintenance manuals of a turbine, the standard contracts of a Milanese law firm, the compliance policies of a bank—has never entered any training dataset. It is like asking someone who has read the entire Treccani encyclopedia to explain how the internal approval process for a vacation request works in your company. The answer will be generic, polite, and useless.
Mistral AI chose the stage of Nvidia GTC 2026, Jensen Huang's annual conference where this year the focus was almost exclusively on agentic AI for the enterprise, to announce Forge: a system that allows organizations to train language models directly on their own institutional knowledge. It is not a new chatbot, nor a tool to optimize prompts. It is something structurally different, and it is worth understanding exactly what, because the technical, economic, and geopolitical implications are anything but trivial.
What Forge Does, Concretely
To understand Forge, you first need to understand what distinguishes it from existing AI customization tools. The vast majority of enterprise solutions today work in two ways: Retrieval-Augmented Generation (RAG), where the model is not touched but is "informed" at the time of the response by retrieving relevant documents from a database, or superficial fine-tuning, where the model is retrained on a small specific dataset to slightly adapt its behavior. Both approaches leave the base model unchanged. It is like renting an apartment and bringing your own furniture: the building doesn't change, only the decor does.
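The RAG pattern described above can be sketched in a few lines: the model is never touched; relevant internal documents are retrieved at query time and prepended to the prompt. This is a minimal illustration, not Mistral's API; the documents, the bag-of-words "embedding", and the scoring are toy placeholders.

```python
# Minimal RAG sketch: company knowledge lives OUTSIDE the model and is
# injected into the prompt at query time. All names here are illustrative.
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector (a toy stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank internal documents by similarity to the query."""
    qv = bow(query)
    return sorted(docs, key=lambda d: cosine(qv, bow(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # The base model never changes; only the prompt carries company knowledge.
    context = "\n".join(retrieve(query, docs, k=1))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Vacation requests must be approved by the line manager within 5 days.",
    "Expense reports are filed through the finance portal each month.",
]
print(build_prompt("How do I get a vacation request approved?", docs))
```

The key point of the "rented apartment" analogy is visible in the code: the only place the company's reality enters is the prompt string, built fresh for every request.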
Forge proposes something radically different: constructing the building from scratch, to your own specifications. The product page describes a process spanning multiple phases of the model lifecycle. Pre-training, the deepest phase, allows training the model on large volumes of unstructured internal documentation, company codebases, operational data, and historical archives, so that the model doesn't merely consult that knowledge but internalizes it in its basic functioning. It is the difference between a doctor reading a medical record before a visit and a doctor who has spent ten years working in that specific department.
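The contrast with RAG can be made concrete with a toy "pre-training" sketch: here the knowledge ends up inside the model's parameters (bigram counts standing in for neural weights), so answering requires no document lookup at all. The corpus and tokenization are deliberately simplistic and purely illustrative.

```python
# Toy illustration of the pre-training idea: knowledge is encoded in the
# model's parameters at training time, not retrieved from an external store
# at query time. Bigram counts stand in for learned weights.
from collections import defaultdict, Counter

def train_bigram(corpus):
    """'Pre-train' by counting next-word frequencies over the whole corpus."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    """The answer comes from trained parameters; no document is consulted."""
    options = model.get(word.lower())
    return options.most_common(1)[0][0] if options else None

corpus = [
    "approval requests go to the line manager",
    "the line manager signs the approval form",
]
model = train_bigram(corpus)
print(predict_next(model, "line"))  # -> "manager", encoded at training time
```

Scaled up by many orders of magnitude, this is the structural difference Forge claims: the "doctor who worked ten years in the department" has the knowledge in their head, not in a chart they read before the visit.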
Alongside pre-training, Forge offers post-training tools to refine behavior on specific tasks: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to encode internal preferences and standards, and Low-Rank Adaptation (LoRA) for lighter adaptations that avoid retraining the entire model. The third leg of the system is reinforcement learning: through RLHF pipelines, organizations can align model behavior with their operational policies and evaluation criteria, and improve agent performance in complex environments, from workflow orchestration to tool usage and decision-making. The system is rounded out by synthetic data generation tools, essential for covering the edge cases that rarely surface in real data but make a difference in production, and by evaluation frameworks tied to the company's internal KPIs rather than the generic benchmarks used in academia.
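The LoRA technique mentioned above is worth a brief sketch. Instead of updating a full weight matrix W of shape d×k, you train two small matrices A (r×k) and B (d×r) with rank r much smaller than d and k; the adapted weight is W + (alpha/r)·B·A. The matrices and values below are illustrative, not taken from any Forge documentation.

```python
# Sketch of Low-Rank Adaptation (LoRA): the frozen base weight W stays
# untouched; only the small factors A and B are trained. With d = k = 4 and
# r = 1, that is 8 trainable numbers instead of 16.
def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_adapt(W, A, B, alpha=1.0):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight."""
    r = len(A)  # rank of the update
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0, 0.0, 0.0],   # frozen base weight (identity, for clarity)
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
A = [[0.1, 0.2, 0.3, 0.4]]          # 1 x 4 trainable factor
B = [[1.0], [0.0], [0.0], [0.0]]    # 4 x 1 trainable factor
W_adapted = lora_adapt(W, A, B)
print(W_adapted[0])  # only the rank-1 direction of W is shifted
```

This is why LoRA counts as a "lighter adaptation" in Forge's lineup: the number of trainable parameters grows linearly in r rather than quadratically in the layer size.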
A technical detail that matters for architectural choices is the support for both dense models and Mixture-of-Experts (MoE) architectures. As discussed in the earlier analysis of Devstral 2, MoE architectures activate only a subset of "specialized sub-networks" for each request, achieving capabilities comparable to much larger models with lower latency and computational cost. For a company deciding between a high-quality dense model and a more efficient MoE, this flexibility is not a cosmetic detail. Forge also supports multimodal inputs (text, images, audio) where the use case requires it.
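The MoE routing idea can be sketched concretely: a learned gate scores all experts for each input, but only the top-k actually run, which is why compute per request stays low while total model capacity stays high. The expert functions and gate scores below are toy placeholders standing in for sub-networks and a learned router.

```python
# Illustrative Mixture-of-Experts routing: score all experts, run only the
# top-k, and mix their outputs by normalized gate probability.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the top-k experts; a dense model would run all of them."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Four toy "experts"; in a real MoE each is a feed-forward sub-network.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x ** 2]
gate_scores = [2.0, 1.0, -1.0, 0.5]  # produced by a learned router in practice
print(moe_forward(3.0, experts, gate_scores, k=2))
```

With k=2 out of 4 experts, half the expert compute is skipped on every call; at the scale of real MoE models, that is the latency and cost advantage the article refers to.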
On the agentic front, Forge has been designed to work with Mistral Vibe, Mistral's autonomous agent that can use the platform to perform fine-tuning, find optimal hyperparameters, schedule jobs, and generate synthetic data autonomously. The system monitors metrics to avoid regressions on relevant benchmarks, and the entire interface is designed to be actionable in natural language, even by non-human agents.
Mistral has already made Forge available to a selected group of partners: ASML (the Dutch manufacturer of EUV lithography machines, which led Mistral's Series C round), Ericsson, the European Space Agency, DSO National Laboratories, HTX Singapore, and Reply, the Italian technology consulting company. These are names covering very different sectors: telecommunications, defense and security, aerospace, precision manufacturing, and tech consulting. The range is not accidental: Mistral wants to demonstrate that Forge responds to concrete industrial needs, not laboratory use cases.
The Comparison: Where Fine-Tuning Ends and Forge Begins
To understand where Forge positions itself relative to the existing ecosystem, it is worth doing an honest comparative exercise, starting with the most relevant competitors.
OpenAI offers fine-tuning on GPT-4o and other family models, but it is an adaptation of OpenAI's base model, not training from scratch on an architecture of the customer's choice. It is a more accessible, faster option with much lower entry barriers, but structurally limited: you are still working within the constraints of the base model, which remains OpenAI's property and can be deprecated, modified, or repriced without the customer having a say. The conceptual distance from Forge is that between customizing SaaS software and developing your own application.
Anthropic with Claude does not offer base model retraining: the paradigm is that of "skills" and contextual integration through system prompts and RAG. It is a leaner and more accessible approach, but explicitly designed to adapt behavior at runtime, not to modify the model's fundamental knowledge. Google with Vertex AI offers custom training capabilities, including training from scratch on custom architectures, but the platform is historically oriented towards traditional machine learning rather than large agentic language models, and integration with agent-first tools is less mature than what Forge claims.
The other significant alternative is local training on open-weight models, which gives maximum control over the entire chain, from hardware to model to data. But the difference from Forge lies in scale and in the expertise required. Pre-training an enterprise-scale model requires GPU clusters of hundreds of units, curated datasets in the terabyte range, and specialized skills that very few companies can afford to build internally. As documented in the analysis of SLMs, even fine-tuning a 7-billion-parameter model demands non-trivial equipment and skills; scaling up to full pre-training is a jump of several orders of magnitude. Forge positions itself as the managed service that removes this barrier, delegating infrastructure and technical know-how to Mistral while the company contributes domain knowledge and data.
On this point, Timothée Lacroix, co-founder and chief technologist of Mistral, was explicit with TechCrunch: the customer decides the model and infrastructure, but Mistral advises and accompanies. And for teams that need more than just consulting, Forge comes with "forward-deployed" engineers, a role Mistral has explicitly borrowed from the playbooks of Palantir and IBM: technical professionals who integrate directly into customer teams to oversee the construction of data pipelines, the definition of evals, and the calibration of the training process. It is a delivery model that implicitly admits that technology alone is not enough.
Real Advantages, Real Criticalities
Having discussed the tools, it is worth honestly analyzing what Forge promises and where open questions emerge.
The most structural advantage concerns control of intellectual property. A model trained on a company's internal data permanently encodes that knowledge into its architecture, not as an external searchable reference but as an integral part of its reasoning. This profoundly changes the nature of the AI agents built on that model: instead of agents that retrieve information from databases and incorporate it into responses, you get agents that reason using the organization's vocabulary, decision patterns, and operational constraints as a natural starting point. For critical workflows, the resulting behavior is more predictable, more adherent to internal procedures, and less subject to the hallucinations that emerge when a generalist model attempts to apply generic reasoning to highly specific contexts.
For sectors where the language is not English, or where specialized terminologies exist that do not appear in public training corpora, the advantage of pre-training on proprietary data is even more marked. A model trained on years of Italian regulatory standards understands the nuances of Italian administrative law not because someone explained them at runtime, but because it "read" them during training with the same depth as it read any other text.
The critical points, however, deserve just as much attention. The first concerns the data itself. Forge requires large volumes of structured, high-quality internal documentation to produce significant results. In practice, many organizations find themselves with non-homogeneous historical archives, documents in heterogeneous formats, non-normalized data, and conflicting versions of the same policies. "Garbage in, garbage out" applies even more strongly to training than to RAG: a model pre-trained on poor-quality data doesn't just retrieve it at runtime, it internalizes it. The risk of overfitting on too small a corpus or on obsolete policies is real, and the process of cleaning and curating the dataset is often as burdensome as the training itself.
The second critical point concerns costs and skills. Pre-training enterprise-scale models on high-end GPU clusters has costs that are difficult to justify for medium-sized entities. Mistral has not yet published a detailed pricing structure for Forge; the service is currently available upon direct request, making it difficult for a CFO to make a concrete assessment of return on investment before budget approval. The forward-deployed engineers (FDEs) included in the service solve part of the internal skills problem but introduce a human and organizational dependency with its own management costs.
The third issue, probably the most sensitive for managerial decision-makers, concerns infrastructure. The Forge product page speaks of "infrastructure flexibility" and promises deployment without "cloud lock-in." But reading the available documentation carefully, the distinction that emerges is between flexibility in inference, where the resulting model can effectively be deployed on private cloud, on-premise, or on Mistral Compute infrastructure at the customer's choice, and the training phase, for which Mistral does not publicly specify deployment options. Considering that pre-training a significant-sized model requires clusters of hundreds of H100 GPUs or equivalent, and it is highly unlikely that even the largest partners like ASML or Ericsson have this infrastructure in-house for a project of this type, it is reasonable to assume that at least the training phase takes place on Mistral infrastructure. But, it is important to specify, this is an assessment based on technical considerations and what Mistral doesn't say, not on explicit statements. Mistral neither confirms nor denies this reading in the available public documentation. Those evaluating Forge for particularly sensitive data would do well to clarify this point contractually before proceeding.
Europe in the Eye of the AI Storm
Forge was not announced in a vacuum. The timing at Nvidia GTC 2026 is an explicit positioning: Mistral presents itself on the stage of the industry's most influential conference, in front of the main players in the global AI ecosystem, with a product that competes directly with the enterprise offerings of OpenAI and Google Cloud. It is an act of lucid defiance, not improvisation.
As analyzed in this previous article of mine regarding Devstral 2, Mistral is in a structurally paradoxical position: it is the most convincing demonstration that Europe can produce frontier AI, and at the same time, it is a medium-sized company operating with resources incomparable to its American rivals. The 11.7 billion euro valuation reached with the Series C round led by ASML is a remarkable milestone by European standards and microscopic compared to OpenAI's valuation, which exceeds 150 billion dollars. The projection to exceed one billion dollars in ARR (Annual Recurring Revenue) in 2026, reported by the Financial Times, signals real commercial traction but does not solve the scale asymmetry.
In this context, Forge has a geopolitical reading that goes beyond the product itself. For a European company that helps its customers build proprietary models, the issue of data sovereignty is both a selling point and a political commitment. GDPR provides a regulatory framework that American providers must respect but did not help build. Mistral, as a French entity subject to European law, offers different structural guarantees on how data is handled during training, even if, as we have seen, the technical details of the infrastructure on which that training takes place remain partially opaque.
The point worth not romanticizing is that the dependence on NVIDIA hardware remains intact. Every Mistral model, including Forge, is trained on GPUs designed in California, which means that "European technological sovereignty" is inevitably partial until Europe has an equivalent of the AI chip production chain. ASML, which produces the EUV lithography machines without which no advanced chip can be manufactured, is a fundamental piece, but the road from ASML to a competitive European AI GPU is still long.
Questions That Remain Open
Forge is an interesting and technically ambitious product. But some questions remain unanswered, and they are questions that those who must make concrete decisions would do well to keep in mind.
The most urgent concerns the transparency of the training infrastructure: where physically does the model training take place? What contractual guarantees exist on data isolation during training? What security certifications cover the data during that process? Since detailed public documentation on these aspects does not yet exist, the answer today is: you have to ask Mistral directly and get it in writing.
The second question concerns the economic sustainability of the model for medium-sized organizations. Forge today seems optimized for large entities with budgets, datasets, and operational complexities that justify an investment of this scale. What happens when, or if, Mistral decides to extend Forge to the mid-market? Pricing and access modes could change significantly from the current consultative approach.
The third question is about the model lifecycle over time. A model trained today on an organization's data starts to diverge from operational reality the moment it is deployed, because organizations change, regulations are updated, and processes evolve. Forge includes drift detection tools and continuous improvement pipelines via RL, but how sustainable is it actually to keep a proprietary model updated compared to an external model that someone else updates continuously? It is a hidden cost that does not appear in press releases.
The fourth, and perhaps most fundamental, is the question of lock-in. Building a model deeply integrated with an organization's institutional knowledge is by definition an investment that is difficult to turn back from. If Mistral were to change strategy, be acquired, or simply decide to modify its service conditions, how difficult would it be to extract and reuse that encoded knowledge? It is the AI version of a question companies have already asked themselves with proprietary databases, CRMs, and ERP software: every tool that becomes critical infrastructure also becomes a risk of dependency.
In summary, Forge is a serious answer to a serious problem. The idea that AI models must learn to reason with the specific knowledge of those who use them, not simply consult it, is conceptually solid and probably represents an important direction for enterprise adoption of AI in the coming years. The open questions do not negate this; they make answering them all the more necessary.
