I remember using ChatGPT for the first time to write a reply when I received appreciation from the leadership team at my previous company. Nowadays it is part of day-to-day life; AI has made my life easier. That got me wondering: what if I could run an LLM locally on my laptop? I installed the Ollama desktop app for Windows. My laptop, with just 16 GB of RAM, handled small models fine for basic email-writing tasks, but running even a 1B-parameter model alongside my regular apps like Teams and Chrome frequently made it unresponsive. On my other laptop, which has a dedicated graphics card, I was able to run models of up to 8B parameters smoothly.

I thought: why can't we use the integrated Intel GPU to handle the GPU-heavy work on my laptop? I started exploring and found Intel's ipex-llm project on GitHub. It ships a portable zip file which you can extract to run Ollama locally on an Intel GPU. I did this setup on Ubuntu 24.04 running on Windows WSL. Note that GPU acceleration in WSL requires WSL 2, so it is worth confirming that before you start.
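A minimal pre-check, run from PowerShell on the Windows side (the distribution name Ubuntu-24.04 below is an assumption; use whatever name the list command reports for yours):

    # List installed distributions; the VERSION column should read 2
    wsl.exe -l -v
    # If it reads 1, convert the distribution to WSL 2
    wsl.exe --set-version Ubuntu-24.04 2

With that confirmed, here is the step-by-step process: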

  1. Update the GPU driver on the machine

    Follow the steps below to install the driver packages from Intel:

    A. Refresh the package index and install the tooling that provides add-apt-repository

    sudo apt-get update
    sudo apt-get install -y software-properties-common
    

    B. Add the intel-graphics Personal Package Archive (PPA)

    sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
    

    C. Install the compute-related packages

    sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
    

    D. Install the media-related packages

    sudo apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo
    

    E. Verify the installation

    clinfo | grep "Device Name"
    

    (Screenshot: result of running clinfo | grep "Device Name")

    If you do not see output like the above, your user may not have permission to access the GPU; run the commands below to add your user to the render group.

    sudo gpasswd -a ${USER} render
    newgrp render
    
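    To confirm the group change took effect, you can list the groups of your current session; render should appear in the output (a quick sanity check, not part of the original Intel instructions):

    # The render group should now appear in this list
    id -nG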

    With the above steps, we have installed the Intel graphics packages in Ubuntu running under WSL.

  2. Download the Ollama portable tgz for Linux from the ipex-llm releases page (see reference 2 below).
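
    If you prefer to download from the terminal, a wget sketch is below. The bracketed asset name is a placeholder, since it changes per release; copy the actual Linux tgz URL from the releases page:

    # Replace the bracketed part with the real asset name from the releases page
    wget https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/[linux-tgz-asset-name]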

  3. Extract the file

    tar -xvf [Downloaded tgz file path]
    
  4. Go to the extracted folder and run start-ollama.sh

    cd PATH/TO/EXTRACTED/FOLDER
    ./start-ollama.sh
    

    (Screenshot: Ollama server output after running start-ollama.sh)
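
    To confirm the server is actually up, you can query it from another terminal. Ollama listens on port 11434 by default; I am assuming the portable build keeps that default:

    # Should return a small JSON payload with the Ollama version
    curl http://localhost:11434/api/version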

  5. Open another terminal and run your model

    cd PATH/TO/EXTRACTED/FOLDER
    ./ollama run llama3.2:1b
    

    (Screenshot: sample run of ollama from the Ubuntu terminal)
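
    Besides the interactive prompt, the model can also be exercised over Ollama's REST API, which is handy for scripting (same default-port assumption as above):

    # One-shot generation against the locally running model
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b",
      "prompt": "Write a one-line summary of WSL.",
      "stream": false
    }'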

  6. You can verify the GPU usage from the Windows Task Manager (Performance tab).

    (Screenshot: GPU usage shown in Windows Task Manager)
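
    From inside WSL, ollama ps gives a quick cross-check of where a loaded model has been placed; the PROCESSOR column should report GPU rather than CPU:

    cd PATH/TO/EXTRACTED/FOLDER
    # Lists loaded models with their size and CPU/GPU placement
    ./ollama ps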

Conclusion

I was able to run small models like qwen3:1.7b, qwen3:0.6b, llama3.2:1b, and gemma3:1b smoothly. The deepseek-r1:1.5b model gave only garbage responses, and I managed to run gemma3:4b just once, after which it kept failing. But what more can I expect from a machine with 16 GB of RAM and an i5 processor? It was a good learning experience, and I even connected the locally running Ollama to LibreChat and played with it.
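LibreChat can connect to Ollama through Ollama's OpenAI-compatible endpoint, so before wiring up any client it is worth verifying that the endpoint responds (again assuming the default port):

    # Ollama exposes an OpenAI-compatible chat completions API under /v1
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3.2:1b",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }'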

References:

  1. https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md
  2. https://github.com/ipex-llm/ipex-llm/releases/tag/v2.3.0-nightly
  3. https://dgpu-docs.intel.com/driver/client/overview.html