
Nebius has launched the Nebius Token Factory, a platform that allows AI companies and digital businesses to deploy and optimize both open-source and custom models at scale, with the reliability and control needed for enterprise-level use.
Built on Nebius’s comprehensive AI infrastructure, the Nebius Token Factory combines high-performance inference, post-training, and detailed access management into a single, secure platform. It supports all major open models, such as NVIDIA Nemotron, DeepSeek, GPT-OSS by OpenAI, Llama, and Qwen, while also allowing customers to host their own models.
As AI transitions from experimentation to production, relying on closed models can cause scaling challenges. Open-source and custom models help overcome these barriers, fostering innovation and improving cost-effectiveness. However, managing and securing these models at scale has often been complex and resource-heavy for many teams.
Nebius Token Factory helps teams unlock these benefits by combining the flexibility of open models with the governance, performance, and cost-efficiency required for large-scale AI operations. It’s optimized for efficiency, offering sub-second latency, auto-scaling throughput, and 99.9% uptime, even for workloads handling millions of requests per minute.
According to Coherent Market Insights, the enterprise application market is projected to grow at a CAGR of 6.9% from 2025 to 2032, rising from USD 319.40 billion in 2025 to around USD 509.88 billion by 2032. Enterprise applications are complex software systems that organizations use to support business operations, enhance productivity, and improve efficiency. The market comprises applications that help businesses automate back-office functions such as accounting, project management, and enterprise resource planning.
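As a quick sanity check on the market figures quoted above, the growth rate implied by the two endpoint values can be recomputed. The sketch below is illustrative only: the function name `cagr` is hypothetical, and it assumes seven compounding years between 2025 and 2032, which the source does not state explicitly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, returned as a fraction (0.05 means 5%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures as quoted from Coherent Market Insights, in USD billions.
start_usd_bn = 319.40   # 2025 market size
end_usd_bn = 509.88     # projected 2032 market size
years = 2032 - 2025     # assumption: seven compounding periods

rate = cagr(start_usd_bn, end_usd_bn, years)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption, the implied rate comes out at roughly 6.9%, which is consistent with the stated projection.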
Early adopters of Nebius Token Factory are using the platform to power a wide range of AI solutions, from intelligent chatbots and coding copilots to high-performance search, retrieval-augmented generation (RAG), document intelligence, and automated customer support.
Prosus, the power behind some of the world’s leading lifestyle and e-commerce brands, has achieved up to 26x cost reductions compared to proprietary models.
“We move fast, test and iterate quickly, and the flexibility, products and quick responses from Nebius Token Factory allowed us to keep this pace all the way through production,” said Zülküf Genç, Director of AI at Prosus. “By leveraging Nebius Token Factory’s dedicated endpoints, Prosus was able to secure guaranteed performance and isolation. The addition of autoscaling was the game-changer, allowing us to handle massive workloads of up to 200 billion tokens per day without manual intervention.”
Source:
News: Nebius
Company: Nebius
