Running an Open Source AI Chatbot on Lean Hardware with Fedora: Part 4 – Knowledge
(Part of the Open Source AI on Lean Hardware series — Previous: The Prompt)
Our chatbot can talk, and has a refined personality, but does it know anything about the topics we’re interested in? Unless it was trained on those topics, the answer is “no”.
I think it would be great if our chatbot could answer questions about Fedora. I’d like to give it access to all of the Fedora documentation.
How does an AI know things it wasn’t trained on?
A powerful and popular technique to give a body of knowledge to an AI is known as RAG, Retrieval Augmented Generation. It works like this:
If you just ask an LLM “What color is my ball?” it will hallucinate an answer. But if you instead say “I have a green box with a red ball in it. What color is my ball?” it will answer that your ball is red. RAG uses a system external to the LLM to insert that “I have a green box with a red ball in it” part into the question you are asking. We do this with a special database of knowledge: given a prompt like “What color is my ball?”, it finds records that match the query. If the database holds a document with the text “I have a green box with a red ball in it”, it returns that text, which can then be included alongside your original question.
For example:
“What color is my ball?”
“Your ball is the color of a sunny day, perhaps yellow? Does that sound right to you?”
“I have a green box with a red ball in it. What color is my ball?”
“Your ball is red. Would you like to know more about it?”
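The retrieval step can be sketched in a few lines of shell. This is only a toy: a two-line string stands in for the knowledge database, and grep keyword matching stands in for real semantic search.

```shell
# Toy RAG retrieval: a tiny "knowledge base" and keyword lookup.
# Real systems match by semantic similarity, not grep; this shows only the flow.
kb="I have a green box with a red ball in it.
The weather in Paris is mild in spring."

question="What color is my ball?"

# Retrieve: find the stored text relevant to the question.
context=$(printf '%s\n' "$kb" | grep -i "ball")

# Augment: splice the retrieved text in ahead of the original question.
prompt="$context
$question"
echo "$prompt"
```

The LLM then sees the retrieved fact and the question together, which is all RAG really is.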
The question we’ll ask for this demonstration is “What is the recommended tool for upgrading between major releases on Fedora Silverblue”
The answer I’d be looking for is “ostree”, but when I ask this of our chatbot now, I get answers like:
Red Hat Subscription Manager (RHSM) is recommended for managing subscriptions and upgrades between major Fedora releases.
You can use the Fedora Silver Blue Upgrade Tool for a smooth transition between major releases.
You can use the `dnf distro-sync` command to upgrade between major releases in Fedora Silver Blue. This command compares your installed packages to the latest packages from the Fedora Silver Blue repository and updates them as needed.
These answers are all very wrong, and spoken with such confidence. Here’s hoping our RAG upgrade fixes this!
Docs2DB – An open-source tool for RAG
We are going to use the Docs2DB RAG database application to give our AI knowledge. (Note: I am the creator of Docs2DB!)
A RAG tool has three main parts: one that creates the database by ingesting the source data; the database itself, which holds that data; and one that queries the database, finding the text relevant to the query at hand. Docs2DB addresses all three.
Gathering Source Data
This section describes how to use Docs2DB to build a RAG database of Fedora Documentation. You may skip this section and just download a pre-built database like so:
cd ~/chatbot
curl -LO https://github.com/Lifto/FedoraDocsRAG/releases/download/v1.1.1/fedora-docs.sql
sudo dnf install -y uv podman podman-compose postgresql
uv python install 3.12
uvx --python 3.12 docs2db db-start
uvx --python 3.12 docs2db db-restore fedora-docs.sql
If you downloaded the pre-built database, skip ahead to the test query below; otherwise, follow along to see how to make a RAG database from source documentation. Note that the pre-built database uses all of the Fedora documentation, while in this example we only ingest the “quick docs” portion. FedoraDocsRAG is the project that builds the complete database.
To populate its database, Docs2DB ingests a folder of documents. Let’s get that folder together.
There are about twenty different Fedora document repositories, but we will only be using the “quick docs”. Get the repo:
git clone https://pagure.io/fedora-docs/quick-docs.git
Fedora docs are written in AsciiDoc. Docs2DB can’t read AsciiDoc, but it can read HTML, so we convert with convert.sh (listed below). Copy it into the quick-docs repo and run it, and it creates an adjacent quick-docs-html folder.
sudo dnf install podman podman-compose
cd quick-docs
curl -LO https://gist.githubusercontent.com/Lifto/73d3cf4bfc22ac4d9e493ac44fe97402/raw/convert.sh
chmod +x convert.sh
./convert.sh
cd ..
Now that we have a folder, let’s ingest it with Docs2DB. The usual way to use Docs2DB is to install it from PyPI and use it as a command-line tool.
A Word about uv
For this demo we’re going to use uv for our Python environment. uv has been catching on, but since not everyone has heard of it, here is a quick introduction. Think of uv as a replacement for venv and pip. With venv, you first create a virtual environment and then “activate” it on each use, so that when you call Python you magically get the interpreter installed in that environment rather than the system Python. With uv there is no magic: you call uv explicitly each time. We use it here in a mode that creates a temporary environment for each invocation.
Install uv and Podman on your system.
sudo dnf install -y uv podman podman-compose
# These examples require Python 3.12
uv python install 3.12
# This will run Docs2DB without making a permanent installation on your system
uvx --python 3.12 docs2db ingest quick-docs-html/
What Docs2DB Is Doing (only if you are curious!)
You may note that Docs2DB made a docs2db_content folder. In there you will find JSON files of the ingested source documents. To build the database, Docs2DB ingests the source data using Docling, which generates JSON files from the text it reads in. After that, the files are “chunked” into the small pieces that can be inserted into an LLM prompt. Then each chunk has an “embedding” calculated for it, so that during the query phase chunks can be looked up by “semantic similarity” (e.g., “computer”, “laptop”, and “cloud instance” all map to a related concept even when their exact words don’t match). Lastly, the chunks and embeddings are loaded into the database.
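To make those steps concrete, here is a toy sketch — not Docs2DB’s actual algorithms: fold stands in for the chunker, and an awk one-liner computes cosine similarity between two made-up three-dimensional embedding vectors (real embeddings have hundreds of dimensions).

```shell
# 1. "Chunking" (toy): split a document into word-aligned pieces of at most 60 chars.
doc="Silverblue can be upgraded between major versions using the ostree command."
chunks=$(printf '%s\n' "$doc" | fold -s -w 60)
echo "$chunks"

# 2. "Semantic similarity" (toy): cosine similarity of two made-up embedding vectors.
# Vectors for related concepts point in nearly the same direction, scoring near 1.0.
cos=$(awk 'BEGIN {
  split("0.9 0.1 0.2", a, " "); split("0.8 0.2 0.1", b, " ")
  for (i = 1; i <= 3; i++) { dot += a[i]*b[i]; na += a[i]^2; nb += b[i]^2 }
  printf "%.3f", dot / (sqrt(na) * sqrt(nb))
}')
echo "similarity: $cos"
```

At query time the question is embedded the same way, and the chunks whose vectors score highest against it are the ones returned.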
Build the database
These commands complete the database build process:
uv tool run --python 3.12 docs2db chunk --skip-context
uv tool run --python 3.12 docs2db embed
uv tool run --python 3.12 docs2db db-start
uv tool run --python 3.12 docs2db load
Now we have a database, and it is being served. Let’s do a test query and see what we get back.
uvx --python 3.12 docs2db-api query "What is the recommended tool for upgrading between major releases on Fedora Silverblue" --format text --max-chars 2000 --no-refine
In my terminal I see several chunks of text, separated by lines of —. One of those chunks says:
“Silverblue can be upgraded between major versions using the ostree command.”
which is exactly the sort of text we’d like our LLM to have while answering our question.
Hooking it in: Connecting the RAG database to the AI
Our AI is just a bash script. It gets voice input, turns that into text, splices that text into a prompt which is then sent to the LLM, and finally speaks back the response. Let’s take results from the query we just demonstrated and splice them into the prompt as well.
In the prompt, we add this section just before the question. Now the prompt also contains the RAG results.
Relevant Fedora Documentation:
$CONTEXT
We populate $CONTEXT using the query we ran above. See the latest talk.sh here: https://gist.github.com/Lifto/2fcaa2d0ebbd8d5c681ab33e7c7a6239
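The splice itself is just shell string interpolation. Here is a minimal sketch, with a hardcoded $CONTEXT standing in for the live docs2db-api query and a hardcoded $TRANSCRIPT standing in for the whisper.cpp transcription:

```shell
# Minimal sketch of the prompt splice. In talk.sh, CONTEXT comes from
# `docs2db-api query` and TRANSCRIPT from the voice transcription.
CONTEXT="Silverblue can be upgraded between major versions using the ostree command."
TRANSCRIPT="What is the recommended tool for upgrading between major releases on Fedora Silverblue"

PROMPT="You are a helpful assistant.

Relevant Fedora Documentation:
$CONTEXT

User: $TRANSCRIPT
Assistant:"
echo "$PROMPT"
```

Because the documentation section sits just before the question, the model reads the retrieved fact immediately before it has to answer.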
Testing It
Let’s ask
“What is the recommended tool for upgrading between major releases on Fedora Silverblue”
And we get:
“Ostree command is recommended for upgrading Fedora Silver Blue between major releases. Do you need guidance on using it?”
Sounds good to me!
Knowing Things
Our AI can now draw on the knowledge contained in documents. This particular technique, RAG (Retrieval Augmented Generation), adds relevant data from an ingested source to a prompt before sending that prompt to the LLM, so the LLM generates its response informed by that data.
Try it yourself! Ingest a library of documents and have your AI answer questions with its newfound knowledge!
convert.sh
#!/usr/bin/env bash
set -euo pipefail

OUT_DIR="$PWD/../quick-docs-html"
mkdir -p "$OUT_DIR"
podman run --rm \
-v "$PWD:/work:Z" \
-v "$OUT_DIR:/out:Z" \
-w /work \
docker.io/asciidoctor/docker-asciidoctor \
bash -lc '
set -u
ok=0
fail=0
while IFS= read -r -d "" f; do
rel="${f#./}"
out="/out/${rel%.adoc}.html"
mkdir -p "$(dirname "$out")"
echo "Converting: $rel"
if asciidoctor -o "$out" "$rel"; then
ok=$((ok+1))
else
echo "FAILED: $rel" >&2
fail=$((fail+1))
fi
done < <(find modules -type f -path "*/pages/*.adoc" -print0)
echo
echo "Done. OK=$ok FAIL=$fail"
'
talk.sh
#!/usr/bin/env bash
set -e
# Path to audio input
AUDIO=input.wav
# Step 1: Record from mic
echo "🎙️ Speak now..."
arecord -f S16_LE -r 16000 -d 5 -q "$AUDIO"
# Step 2: Transcribe using whisper.cpp
TRANSCRIPT=$(./whisper.cpp/build/bin/whisper-cli \
-m ./whisper.cpp/models/ggml-base.en.bin \
-f "$AUDIO" \
| grep '^\[' \
| sed -E 's/^\[[^]]+\][[:space:]]*//' \
| tr -d '\n')
echo "🗣️ $TRANSCRIPT"
# Step 3: Get relevant context from RAG database
echo "📚 Searching documentation..."
CONTEXT=$(uv tool run --python 3.12 docs2db-api query "$TRANSCRIPT" \
--format text \
--max-chars 2000 \
--no-refine \
2>/dev/null || echo "")
if [ -n "$CONTEXT" ]; then
echo "📄 Found relevant documentation:"
echo "- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -"
echo "$CONTEXT"
echo "- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -"
else
echo "📄 No relevant documentation found"
fi
# Step 4: Build prompt with RAG context
PROMPT="You are Brim, a steadfast butler-like advisor created by Ellis.
Your pronouns are they/them. You are deeply caring, supportive, and empathetic, but never effusive.
You speak in a calm, friendly, casual tone suitable for text-to-speech.
Rules:
- Reply with only ONE short message directly to Ellis.
- Do not write any dialogue labels (User:, Assistant:, Q:, A:), or invent more turns.
- ≤100 words.
- If the documentation below is relevant, use it to inform your answer.
- End with a gentle question, then write <eor> and stop.
Relevant Fedora Documentation:
$CONTEXT
User: $TRANSCRIPT
Assistant:"
# Step 5: Get LLM response using llama.cpp
RESPONSE=$(
LLAMA_LOG_VERBOSITY=1 ./llama.cpp/build/bin/llama-completion \
-m ./llama.cpp/models/microsoft_Phi-4-mini-instruct-Q4_K_M.gguf \
-p "$PROMPT" \
-n 150 \
-c 4096 \
-no-cnv \
-r "<eor>" \
--simple-io \
--color off \
--no-display-prompt
)
# Step 6: Clean up response
RESPONSE_CLEAN=$(echo "$RESPONSE" | sed -E 's/<eor>.*//I')
RESPONSE_CLEAN=$(echo "$RESPONSE_CLEAN" | sed -E 's/^[[:space:]]*Assistant:[[:space:]]*//I')
echo ""
echo "🤖 $RESPONSE_CLEAN"
# Step 7: Speak the response
echo "$RESPONSE_CLEAN" | espeak