
Install Ollama and DeepSeek-R1 on Debian 12

In this tutorial, we’ll cover how to install Ollama and the DeepSeek-R1 model on Debian, along with NVIDIA drivers and the CUDA Toolkit directly from NVIDIA’s official website. We’ll also configure GPU usage for all available GPUs and install OpenWebUI for a user-friendly interface.

Prerequisites:

  • Debian 12 (Bookworm) or later
  • An NVIDIA GPU compatible with CUDA (check NVIDIA’s CUDA GPUs list)
  • Internet connection

Step 1: Install NVIDIA Drivers and CUDA Toolkit

NVIDIA Drivers:

  1. Update Your System:
    • sudo apt update
  2. Install Required Packages:
    • sudo apt install -y build-essential dkms
  3. Download NVIDIA Drivers:
    • Go to NVIDIA’s driver download page (https://www.nvidia.com/Download/index.aspx), select your GPU model and Linux 64-bit, and download the .run installer to ~/Downloads.
  4. Install NVIDIA Drivers:
    • Switch to a TTY (e.g. Ctrl+Alt+F2), log in, and stop the display manager:
    • sudo service lightdm stop   # or gdm3 for GNOME
    • cd ~/Downloads
    • sudo sh NVIDIA-Linux-x86_64-<version>.run
    • Follow the prompts, accepting the license agreement. Choose to install the 32-bit compatibility libraries if prompted.
  5. Reboot (a quick driver check follows this list):
    • sudo reboot
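After the reboot, it’s worth confirming the driver actually loaded before moving on to CUDA. A quick check:

  # List the detected GPUs, driver version, and current memory usage
  nvidia-smi
  # Confirm the NVIDIA kernel modules are loaded
  lsmod | grep nvidia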

CUDA Toolkit:

  1. Download CUDA Toolkit:
    • Go to NVIDIA’s CUDA downloads page (https://developer.nvidia.com/cuda-downloads), select Linux → x86_64 → Debian → runfile (local), and download the installer.
  2. Install CUDA Toolkit:
    • Once downloaded:
    • sudo sh cuda_<version>_linux.run
    • Select ‘y’ for installing CUDA Toolkit, and ‘n’ for the driver installation since we’ve already done that.
  3. Set Environment Variables:
    • Add these lines to your ~/.bashrc or ~/.zshrc, then reload the shell (a verification check follows this list):
    • echo 'export PATH=/usr/local/cuda-<version>/bin:$PATH' >> ~/.bashrc
    • echo 'export LD_LIBRARY_PATH=/usr/local/cuda-<version>/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    • source ~/.bashrc
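With the environment variables in place, you can confirm the toolkit is reachable (the release number printed will match whichever version you installed):

  # The CUDA compiler should now be on the PATH
  nvcc --version
  # And the runtime libraries should be where LD_LIBRARY_PATH points
  ls /usr/local/cuda-<version>/lib64 | head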

Step 2: Install Ollama

  1. Download and Install Ollama:
    • Ollama doesn’t have an official Debian package, so we’ll use the official install script:
    • sudo apt install -y curl
    • curl -fsSL https://ollama.com/install.sh | sh
  2. Verify Installation (a service check follows this list):
    • ollama --version
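The install script also registers Ollama as a systemd service listening on port 11434 by default, which the remaining steps rely on. A quick sanity check, assuming the default install:

  # Confirm the service is active
  systemctl status ollama
  # The local API should answer on the default port
  curl http://localhost:11434
  # Expected response: "Ollama is running"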

Step 3: Configure GPU Usage

  1. Ollama Service Configuration:
    • If you’re running Ollama as a service, modify /etc/systemd/system/ollama.service:
    • Configuring Ollama for different hardware scenarios when running as a systemd service comes down to the environment variables in the service file. For a CPU-only setup, simply omit any GPU-related environment variables and Ollama will fall back to the CPU. To use a single GPU, such as GPU 0, set Environment="CUDA_VISIBLE_DEVICES=0" so that only this GPU is visible for computation. For a multi-GPU setup with GPUs 0, 1, and 2, set Environment="CUDA_VISIBLE_DEVICES=0,1,2" so Ollama can use all three. If you want to spread the computational load across those GPUs, also set Environment="OLLAMA_SCHED_SPREAD=1" alongside CUDA_VISIBLE_DEVICES; this encourages Ollama to distribute the workload across the available GPUs, which can improve performance for parallelizable tasks. After editing the service file, reload systemd (sudo systemctl daemon-reload) and restart the service (sudo systemctl restart ollama) to apply the changes (a drop-in alternative is sketched after the example below).
    • Example for three GPUs:
    • [Unit]
    • Description=Ollama Service
    • After=network-online.target
    • [Service]
    • Environment="CUDA_VISIBLE_DEVICES=0,1,2"
    • Environment="OLLAMA_SCHED_SPREAD=1"
    • ExecStart=/usr/local/bin/ollama serve
    • User=ollama
    • Group=ollama
    • Restart=always
    • RestartSec=3
    • Environment="PATH=/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"
    • [Install]
    • WantedBy=default.target
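Rather than editing the unit file in place (it can be overwritten when Ollama is reinstalled or updated), the same environment variables can be applied through a systemd drop-in. A minimal sketch, assuming the three-GPU example above:

  # Open (or create) a drop-in override for the Ollama unit
  sudo systemctl edit ollama
  # In the editor, add only the settings you want to override:
  #   [Service]
  #   Environment="CUDA_VISIBLE_DEVICES=0,1,2"
  #   Environment="OLLAMA_SCHED_SPREAD=1"
  # Apply the change and confirm the variables were picked up
  sudo systemctl daemon-reload
  sudo systemctl restart ollama
  systemctl show ollama --property=Environment
  journalctl -u ollama --no-pager -n 20   # startup logs should show the GPUs being detected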

Step 4: Install the DeepSeek-R1 Model

  1. Run the DeepSeek-R1 Model with Ollama:
    • The DeepSeek-R1 models come in various sizes, each with different hardware demands. The smallest, DeepSeek-R1-1.5B, can run on a CPU-only system with 8GB of RAM, making it suitable for basic setups without a GPU. The DeepSeek-R1-7B and DeepSeek-R1-8B models need a GPU with at least 8GB of VRAM for smooth performance. The larger DeepSeek-R1-14B and DeepSeek-R1-32B models require more substantial hardware, with GPUs offering 12–24GB of VRAM. The DeepSeek-R1-70B model demands high-end hardware, typically a GPU with 48GB of VRAM or a multi-GPU setup, and is aimed at enterprise-level deployments. Here’s a table of the commands to pull each model using Ollama (an example of querying a pulled model over the API follows this list):

      Model Name          Command to Pull Model
      DeepSeek-R1-1.5B    ollama run deepseek-r1:1.5b
      DeepSeek-R1-7B      ollama run deepseek-r1:7b
      DeepSeek-R1-8B      ollama run deepseek-r1:8b
      DeepSeek-R1-14B     ollama run deepseek-r1:14b
      DeepSeek-R1-32B     ollama run deepseek-r1:32b
      DeepSeek-R1-70B     ollama run deepseek-r1:70b
    • Remember, these commands assume you’re using Ollama to manage and run these models locally. Adjustments might be necessary based on your specific setup or if using different tools.
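Once a model has been pulled, you can also query it over Ollama’s local HTTP API rather than the interactive prompt, which is handy for scripting. A minimal sketch, assuming deepseek-r1:7b has already been pulled and the service is listening on the default port 11434:

  # Send a single, non-streaming prompt to the model
  curl http://localhost:11434/api/generate -d '{
    "model": "deepseek-r1:7b",
    "prompt": "Explain what CUDA_VISIBLE_DEVICES does in one sentence.",
    "stream": false
  }'
  # While the request runs, nvidia-smi in another terminal should show load on the GPUs exposed in Step 3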

Step 5: Install OpenWebUI

  1. Install Dependencies:
    • sudo apt install -y python3 python3-pip npm
    • pip3 install -r requirements.txt   # only needed if you build OpenWebUI from its cloned source repository
    • npm install                        # likewise, run from inside the cloned repository
  2. Download and Run OpenWebUI:
    • curl -LsSf https://astral.sh/uv/install.sh | sh   # installs the uv package manager; a launch sketch follows this list
  3. Access OpenWebUI:
    • Open your browser and navigate to http://localhost:8080.
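Before the UI answers on that address, OpenWebUI itself needs to be running; the uv installer above only puts the uv tool on your system. Here is a minimal launch sketch, assuming you run OpenWebUI through uv (pip3 install open-webui followed by open-webui serve is an alternative, though on Debian 12 that typically requires a virtual environment because of PEP 668):

  # Make sure uv is on the PATH for the current shell (the installer typically places it in ~/.local/bin)
  export PATH="$HOME/.local/bin:$PATH"
  # Run OpenWebUI through uv; DATA_DIR keeps chats and settings in a stable location
  DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
  # The UI listens on port 8080 by default, matching the address used in step 3 above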

