You can spend billions on innovation. But sometimes, groundbreaking technology comes from lean, research-driven teams working on a shoestring budget. That’s where DeepSeek and its flagship model R1 made a disruptive impact on the Generative AI industry.
In this post, we’ll dive into the history of DeepSeek and its R1 model, explore the technical breakthroughs that make it a game changer, and walk you through how to deploy DeepSeek R1 on your local machine using Ollama and Open-WebUI.
A brief history of DeepSeek and its R1 model
The genesis of DeepSeek
Founded in 2023 by Chinese engineer and hedge-fund veteran Liang Wenfeng, DeepSeek was born out of a desire to push the boundaries of AI research without being solely driven by commercial gain. Liang, who grew up in Guangdong and studied at Zhejiang University, had already made waves in the AI arena by co-founding the hedge fund High-Flyer Quant in 2015, a venture that combined financial acumen with pioneering AI research and supercomputing power.
DeepSeek’s mission has been clear from the start: to leverage cutting-edge research and novel training techniques to build AI models that perform at a high level, at a fraction of the cost incurred by their U.S. counterparts. Operating with a modest team of around 140 engineers and researchers, DeepSeek has quickly become synonymous with low-cost innovation in the high-stakes world of AI.
The rise of R1
The crown jewel in DeepSeek’s portfolio is the R1 reasoning model. Unlike many of its U.S.-based rivals, which have reportedly spent upwards of a billion dollars developing advanced AI systems, DeepSeek achieved R1 with an investment of just under $6 million. This model has shaken up the tech market, sending ripples through U.S. AI stocks and prompting questions about the escalating costs of AI infrastructure.
At its core, R1 is a powerful reasoning model designed to process queries by generating coherent, multi-step chains of reasoning. It’s built using a hybrid training approach that combines a baseline of supervised learning with reinforcement learning techniques. Early iterations, including one known as R1-Zero, demonstrated the model’s ability to “reflect” on its own reasoning, a milestone that signaled a new era in AI model training.
Censorship and data privacy considerations
Despite its impressive technical achievements, it’s important to note that DeepSeek’s R1 is a product of its environment. As a Chinese AI, it is subject to heavy censorship. For example, if you visit the online version hosted by DeepSeek at https://chat.deepseek.com/, you’ll find that you cannot ask questions about banned topics in China, such as the events of Tiananmen Square. This level of censorship means that certain historical and political topics are filtered out, limiting the scope of inquiries that the model can handle.
Additionally, any data sent to the hosted application is transmitted directly to servers in China. If data privacy and control are top priorities for you or your project, this could be a significant concern. For those wary of having sensitive or personal data processed on foreign servers, a self-hosted solution may be a more secure alternative (explained later in this article!).
Technical breakthroughs that set R1 apart
While the sheer cost-efficiency of R1 is remarkable, its performance is underpinned by several innovative technical breakthroughs:
- 8-bit floating point (FP8) training: R1 uses low-precision FP8 arithmetic to save memory without sacrificing performance, enabling more efficient computation.
- Multi-token prediction: By predicting multiple tokens at once, the model roughly doubles its inference speed compared to traditional one-token-at-a-time decoding.
- Multi-head latent attention (MLA): This technique compresses the key-value (KV) cache, saving valuable VRAM while preserving attention quality.
- Mixture-of-Experts (MoE) architecture: Of a staggering 671 billion total parameters, R1 activates only 37 billion per token (roughly 5.5%), dramatically cutting the compute required for inference compared to a dense model of the same size.
These innovations not only make R1 an impressive feat of engineering but also position it as a viable option for developers looking to integrate advanced AI reasoning into their applications without incurring astronomical costs.
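To put the MoE numbers in perspective with some back-of-the-envelope arithmetic: at FP8 precision (one byte per parameter), the full 671 billion parameters occupy roughly 671 GB of memory, yet each forward pass only exercises about 37 GB worth of weights (37 billion active parameters at one byte each). The storage footprint stays enormous, but the per-token compute bill looks more like that of a mid-sized dense model.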
A guide to running DeepSeek on Ollama with Open-WebUI
For developers eager to experiment with DeepSeek’s R1 model, there’s good news: you can easily set it up on your local machine using Docker, Ollama for inference, and Open-WebUI as a front-end interface.
Note: The configuration below is optimized for CPU-only execution. If you’re looking to leverage GPU acceleration, refer to the Ollama documentation for additional options.
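If you do have an NVIDIA GPU and want to try it, the Ollama Docker docs suggest the same run command as Step 1 with --gpus=all added. As a rough sketch (assuming the NVIDIA Container Toolkit is installed on the host):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama-gpu ollama/ollama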
Step 1: Run Ollama for inference
First, start the Ollama container. This lightweight container handles model inference:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama-cpu ollama/ollama
This command does the following:
- Runs the Ollama container in detached mode (-d).
- Mounts a volume for persistent configuration.
- Maps port 11434 for API access.
- Names the container ollama-cpu for easy reference.
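Before moving on, it’s worth a quick sanity check that the server is reachable. Ollama’s HTTP API listens on port 11434, and the root path returns a simple liveness message:
curl http://localhost:11434/
If the container started correctly, this prints “Ollama is running”.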
Step 2: Run Open-WebUI for a chat interface
Next, deploy the Open-WebUI container, a web interface that lets you interact with the AI model in chat mode:
docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
This command sets up Open-WebUI with:
- Host network mode for seamless connectivity.
- A mounted volume to store application data.
- An environment variable pointing to the Ollama API.
- Auto-restart enabled to ensure continuous operation.
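Note that with --network=host there is no port mapping: Open-WebUI binds directly to its default port 8080 on the host. You can confirm it’s up with a quick request (the very first startup may take a minute while the backend initializes):
curl -I http://localhost:8080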
Step 3: Download the DeepSeek R1 model
With Ollama running, you can now download the DeepSeek R1 model. Use the following command to pull the lightweight deepseek-r1:8b version:
docker exec ollama-cpu ollama pull deepseek-r1:8b
This command executes within the running Ollama container, fetching the R1 model for inference.
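If you’d rather script against the model than chat in a browser, you can also call Ollama’s REST API directly. The example below uses the documented /api/generate endpoint; setting "stream": false returns a single consolidated JSON response instead of a token stream:
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:8b", "prompt": "Explain in two sentences why the sky is blue.", "stream": false}'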
As mentioned earlier, the raw model is heavily censored in line with Chinese government policy. Because the model is open-source, the community has produced fine-tuned variants with that filtering removed, known as “abliterated” models. For instance, you can download one of them using the command:
docker exec ollama-cpu ollama pull huihui_ai/deepseek-r1-abliterated:8b
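To confirm that the downloads succeeded, you can list every model available to the Ollama instance:
docker exec ollama-cpu ollama list
Both deepseek-r1:8b and the abliterated variant should appear in the output.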
Step 4: Start chatting with DeepSeek R1
Finally, open your web browser and navigate to http://localhost:8080.
Here, you’ll be greeted by the Open-WebUI interface. Simply select the DeepSeek R1 model from the list and start experimenting with advanced AI reasoning in a chat-like environment.
Conclusion
In my opinion, DeepSeek’s R1 is a strong, high-performance model that delivers impressive AI reasoning without the astronomical price tag. Innovative techniques like FP8 training, multi-token prediction, and a Mixture-of-Experts architecture prove that you don’t need a huge budget to make a significant impact in AI.
I also believe that DeepSeek has done a fantastic job of generating buzz and getting everyone talking about it: smart marketing that has amplified its reach. That said, if you’re concerned about data privacy and the heavy censorship that comes with a Chinese-hosted solution, consider self-hosting R1 with Ollama and Open-WebUI.
For me, DeepSeek’s R1 is a win-win: a solid model paired with open-source weights, making it well worth exploring for developers, researchers, and small companies alike.