Intro
AI, for better or worse, is becoming a powerful and nearly indispensable tool in many workflows. However, with the introduction of ads and excessive data collection, there seems to be more reason now than ever to locally host these useful tools. Ollama and Open WebUI provide an easy way to do this, with thousands of open-source models to work with and all data stored locally. Further, I hope this tutorial shows that you can achieve useful results even on an extremely low-spec server, because who can afford a 4090 and 128 GB of RAM these days?
Setting up Ollama is easy and the provided documentation is quite good. However, I wrote this because there seem to be few Arch-specific tutorials, and I want to have personal documentation in case I ever need to retrace my steps.
As a final note, I am running this on a very under-powered “home lab” that is really just a Lenovo ThinkCentre M910q with an i5-6000T and 8 GB of RAM. Accordingly, I ran these models without a GPU and was lucky enough to deal with incredible wait times. Anecdotally, long wait times help me determine whether something is actually suited for an LLM. Since I began locally hosting, I have found that if a problem is something I can solve on my own or research through traditional means, I will solve it myself instead of immediately reaching for ChatGPT.
I do not want to take away from the existing Arch tutorial by Jeremy Morgan or the official Ollama documentation; I am just documenting my process.
Setup
I am going to assume that you have a basic Arch setup and can ssh from your main PC.
First connect to the server and do the obligatory endless Arch updates:
ssh username@host
sudo pacman -Syu
If you are fortunate enough to have an NVIDIA graphics card, run:
sudo pacman -S nvidia-utils
However, this is largely optional as the basic drivers required are included in the kernel.
You may also need to set up CUDA and check that the GPU is visible. You can do this through:
nvidia-smi
If you have not yet, install Docker:
sudo pacman -S docker
and add your user to the docker group so you can run it without sudo:
sudo usermod -aG docker $USER
Log out and back in (or run newgrp docker) for the group change to take effect, and make sure the Docker service is enabled and running:
sudo systemctl enable --now docker
Now you should be ready.
Basic Ollama in the CLI
This is how to run Ollama in the CLI; for how to run Open WebUI, go to the next section. Note that Ollama in the CLI is slightly faster (as will be shown below).
For my CPU-only setup the install was as easy as running:
sudo pacman -S ollama
Hooray! You now have ollama on your server! Now for the exciting part 🙂
To start your Ollama server type:
ollama serve
To test if it is working you can visit
http://localhost:11434
It should display “Ollama is running”.
Now, in a new terminal (I typically leave the server running in one CLI and open another), you can choose whatever model you want to install from the Ollama model library at ollama.com/library.
Once you have chosen a model run:
ollama run YOUR_MODEL
In this example and for testing I will be using tinyllama:latest and phi4-mini:latest.
ollama run tinyllama:latest
Now you should be able to chat to your heart’s content, without anyone reading over your shoulder!
If you want to exit the chat just type /bye and you will return to the CLI.
Ollama CLI Expanded
If you want something a bit more usable than just talking in a CLI, you can call your Ollama server as an API for more modularity and increased control. While you could do this in Python or any other language, I will just be doing a simple call using curl to calculate the tokens per second of a given model on my hardware.
To do this we can run the request:
curl http://localhost:11434/api/generate -d '{
"model": "your_model",
"prompt": "your rad prompt"
}'
This should return a lot of text as the model streams its response, and at the end you should get a final JSON object with:
{
"model": "your_model",
"created_at": "YYYY-MM-DD",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": duration (in nanoseconds),
"load_duration": load (in nanoseconds),
"prompt_eval_count": # of tokens in prompt,
"prompt_eval_duration": duration (in nanoseconds),
"eval_count": # of tokens generated,
"eval_duration": duration (in nanoseconds)
}
To find tokens per second, you can take eval_count divided by eval_duration, multiplied by 10^9 (since the durations are reported in nanoseconds).
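Each line of the streamed output is its own JSON object, and the final one (where "done" is true) carries the timing fields. As a rough Python sketch of the same idea (assuming the default server address from above; the live request at the bottom is untested on your setup), you can reassemble the full reply and compute tokens per second from eval_count and eval_duration:

```python
import json

def assemble(stream_text):
    """Join the newline-delimited JSON chunks into (full_reply, final_stats)."""
    reply, stats = [], {}
    for line in stream_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        reply.append(chunk.get("response", ""))
        if chunk.get("done"):
            stats = chunk  # the last chunk holds the timing fields
    return "".join(reply), stats

def tokens_per_second(stats):
    """eval_duration is reported in nanoseconds, hence the 1e9."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

if __name__ == "__main__":
    # Live call against the Ollama server started earlier.
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "tinyllama:latest",
            "prompt": "write a short story about a snake learning C++",
        }).encode(),
    )
    with urllib.request.urlopen(req) as resp:
        text, stats = assemble(resp.read().decode())
    print(f"{tokens_per_second(stats):.2f} tokens/sec")
```

With the tinyllama numbers from my run (428 tokens in 19208094931 ns), this works out to about 22.28 tokens per second.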
For a reference on just how terrible my hardware is, I calculated the TPS for tinyllama and phi4-mini with the prompt “write a short story about a snake learning C++” and got:
| Model | Params (B) | Tokens | Time (ns) | TPS |
| --- | --- | --- | --- | --- |
| tinyllama | 1 | 428 | 19208094931 | 22.2823 |
| phi-4 mini | 3.8 | 491 | 84248724631 | 5.8274 |
Running Ollama and Open WebUI in a Docker Container
Running Open WebUI instead is as easy as running (this image bundles Ollama and Open WebUI together):
docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
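If you prefer docker-compose, an equivalent sketch of the same command (untested, same image, ports, and volumes as above) might look like:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - ollama:/root/.ollama
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
```

This makes it easier to tweak the port mapping later without retyping the whole run command.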
Now if you visit
http://hostname:3000
You should see a login screen. Make an account (stored only locally) and now you’re set up!
To add models, press the profile icon in the bottom left -> Admin Panel -> Settings -> Models -> Manage.
In this screen just type in the model from the ollama website and it will install.
So if the code on the ollama website is
ollama run tinyllama
just type in tinyllama. Once it downloads, you can use it in the chat screen like any other model.
Going Further
When I first transitioned from CLI Ollama to Open WebUI it felt very slow. While there is some performance loss, when I tested the CLI version against the UI on the same prompt I found that the numbers were very similar.
| Model | Params (B) | Tokens | Time (sec) | TPS |
| --- | --- | --- | --- | --- |
| Tinyllama CLI | 1 | 428 | 19.208 | 22.282 |
| Phi4-mini CLI | 3.8 | 491 | 84.258 | 5.827 |
| Tinyllama WebUI | 1 | 840 | 38.993 | 21.524 |
| Phi4-mini WebUI | 3.8 | 698 | 121.274 | 5.756 |
I have found that Open WebUI has a lot of features that the free tiers of corporate models lack. My personal favorite is that you can run multiple models at the same time and create a tree of merged, repeated, and diverging responses; these responses can be given feedback and are rated against one another.


You can also see the cumulative use and ratings of the models I use:


Something about it giving ratings like it is chess.com is so exciting to me, and it is information you would never see for free on corporate cloud models.
Finally, to make sure my models are accessible from everywhere, I use Tailscale, which lets me reach my local network from anywhere through their VPN.
Thanks for reading:)