Intro
AI, for better or worse, is becoming a powerful and nearly indispensable tool in many workflows. However, with the introduction of ads and excessive data collection, there seems to be more reason now than ever to locally host these useful tools. Ollama and Open WebUI provide an easy way to do this, with thousands of open-source models to work with and all data stored locally. Further, I hope this tutorial shows that you can achieve useful results even on an extremely low-spec server, because who can afford a 4090 and 128 GB of RAM these days?
Setting up Ollama is easy and the provided documentation is quite good. However, I wrote this because there seem to be few Arch-specific tutorials, and I want to have personal documentation in case I ever need to retrace my steps.
As a final note, I am running this on a very under-powered “home lab” that is really just a Lenovo ThinkCentre M910q with an i5-6000T and 8 GB of RAM. Accordingly, I ran these models without a GPU and was lucky enough to deal with incredible wait times. Anecdotally, long wait times help me determine whether something is actually suited for an LLM. Since I began locally hosting, I have found that if a problem is something I can solve on my own or research through traditional means, I will solve it myself instead of immediately reaching for ChatGPT.
I do not want to take away from the existing Arch tutorial by Jeremy Morgan or the official Ollama documentation; I am just documenting my process.
Setup
I am going to assume that you have a basic Arch setup and can ssh from your main PC.
First connect to the server and do the obligatory endless Arch updates:
ssh username@host
sudo pacman -Syu
If you are fortunate enough to have an NVIDIA graphics card, run:
sudo pacman -S nvidia-utils
However, this is largely optional as the basic drivers required are included in the kernel.
You may also need to set up CUDA and check that the GPU is visible. You can do this through:
nvidia-smi
If you have not yet, install Docker:
sudo pacman -S docker
and add your user to the docker group so you can run it without sudo:
sudo usermod -aG docker $USER
Log out and back in (or run newgrp docker) for the group change to take effect, and make sure the Docker service is enabled and running:
sudo systemctl enable --now docker
Now you should be ready.
Basic Ollama in the CLI
This is how to run Ollama in the CLI; for how to run Open WebUI, go to the next section. Note that Ollama in the CLI is slightly faster (as will be shown below).
For my CPU-only setup the install was as easy as running:
sudo pacman -S ollama
Hooray! You now have ollama on your server! Now for the exciting part 🙂
To start your Ollama server type:
ollama serve
To test if it is working you can visit
http://localhost:11434
It should display “Ollama is running”.
Now, in a new terminal (I typically leave the server running in one CLI and open another), you can choose whatever model you want to install from the Ollama model library at ollama.com/library.
Once you have chosen a model run:
ollama run YOUR_MODEL
In this example and for testing I will be using tinyllama:latest and phi4-mini:latest.
ollama run tinyllama:latest
Now you should be able to chat to your heart’s content, without anyone reading over your shoulder!
If you want to exit the chat just type /bye and you will return to the CLI.
Ollama CLI Expanded
If you want something a bit more usable than just talking in a CLI, you can call your Ollama server as an API for more modularity and increased control. While you could do this in Python or any other language, I will just be doing a simple call using curl to calculate the tokens per second of a given model on my hardware.
To do this we can run the request:
curl http://localhost:11434/api/generate -d '{
"model": "your_model",
"prompt": "your rad prompt"
}'
This should return a lot of text as the model streams its response, and at the end you should get a final JSON object with:
{
"model": "your_model",
"created_at": "YYYY-MM-DD",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": duration (in nanoseconds),
"load_duration": load (in nanoseconds),
"prompt_eval_count": # of tokens in prompt,
"prompt_eval_duration": duration (in nanoseconds),
"eval_count": # of tokens generated,
"eval_duration": duration (in nanoseconds)
}
To find tokens per second, you can take eval_count divided by eval_duration, multiplied by 10^9 (since the durations are reported in nanoseconds).
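Each line of the streamed output is its own JSON object, and the final one (where "done" is true) carries the timing fields. As a rough Python sketch of the same idea (assuming the default server address from above; the live request at the bottom is untested on your setup), you can reassemble the full reply and compute tokens per second from eval_count and eval_duration:

```python
import json

def assemble(stream_text):
    """Join the newline-delimited JSON chunks into (full_reply, final_stats)."""
    reply, stats = [], {}
    for line in stream_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        reply.append(chunk.get("response", ""))
        if chunk.get("done"):
            stats = chunk  # the last chunk holds the timing fields
    return "".join(reply), stats

def tokens_per_second(stats):
    """eval_duration is reported in nanoseconds, hence the 1e9."""
    return stats["eval_count"] / (stats["eval_duration"] / 1e9)

if __name__ == "__main__":
    # Live call against the Ollama server started earlier.
    import urllib.request
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "tinyllama:latest",
            "prompt": "write a short story about a snake learning C++",
        }).encode(),
    )
    with urllib.request.urlopen(req) as resp:
        text, stats = assemble(resp.read().decode())
    print(f"{tokens_per_second(stats):.2f} tokens/sec")
```

With the tinyllama numbers from my run (428 tokens in 19208094931 ns), this works out to about 22.28 tokens per second.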
For a reference on just how terrible my hardware is, I calculated the TPS for tinyllama and phi4-mini with the prompt “write a short story about a snake learning C++” and got:
| Model | Params (B) | Tokens | Time (ns) | TPS |
| --- | --- | --- | --- | --- |
| tinyllama | 1 | 428 | 19208094931 | 22.2823 |
| phi-4 mini | 3.8 | 491 | 84248724631 | 5.8274 |
Running Ollama and Open WebUI in a Docker Container
Running Open WebUI instead is as easy as running (this image bundles Ollama and Open WebUI together):
docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
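If you prefer docker-compose, an equivalent sketch of the same command (untested, same image, ports, and volumes as above) might look like:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:ollama
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - ollama:/root/.ollama
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
```

This makes it easier to tweak the port mapping later without retyping the whole run command.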
Now if you visit
http://hostname:3000
You should see a login screen. Make an account (stored only locally) and now you’re set up!
To add models, press the profile icon in the bottom left -> Admin Panel -> Settings -> Models -> Manage.
In this screen just type in the model from the ollama website and it will install.
So if the code on the ollama website is
ollama run tinyllama
just type in tinyllama. Once it downloads, you can use it in the chat screen like any other model.
Going Further
When I first transitioned from CLI Ollama to Open WebUI it felt very slow. While there is some performance loss, when I tested the CLI version against the UI on the same prompt I found that the numbers were very similar.
| Model | Params (B) | Tokens | Time (sec) | TPS |
| --- | --- | --- | --- | --- |
| Tinyllama CLI | 1 | 428 | 19.208 | 22.282 |
| Phi4-mini CLI | 3.8 | 491 | 84.258 | 5.827 |
| Tinyllama WebUI | 1 | 840 | 38.993 | 21.524 |
| Phi4-mini WebUI | 3.8 | 698 | 121.274 | 5.756 |
I have found that Open WebUI has a lot of features that the free tiers of corporate models lack. My personal favorite is that you can run multiple models at the same time and create a tree of merged, repeated, and diverging responses; these responses can be given feedback and are rated against one another.


You can also see the cumulative use and ratings of the models I use:


Something about it giving ratings like it is chess.com is so exciting to me, and it is information you would never see for free on corporate cloud models.
Finally, to make sure my models are accessible from everywhere, I use Tailscale, which lets me reach my local network from anywhere through their VPN.
Thanks for reading:)