Running an Open Source AI Chatbot on Lean Hardware with Fedora: Part 3 – The Prompt

(Part of the Open Source AI on Lean Hardware series — Previous: Let’s Talk)

Right now our chatbot’s personality is unrefined. Its answers take on the form of its training. While the microsoft_Phi-4-mini-instruct we’ve been running in our demo has a personality like a “helpful assistant” due to its training, other LLMs may have a more erratic response.


The Prompt

A prompt is a set of instructions to the LLM that frames our input so it responds in the way we want. Because we are making our own chatbot here, it’s up to us to create a prompt that meets our needs. 

There is no special software for the prompt, but we do need to add it to our script. Here is code that defines a prompt, and includes the input transcript so the LLM will answer our question.

# Step 3: Use a prompt to get a better response
PROMPT="You are a friendly, concise AI companion. Answer helpfully and clearly.

User: $TRANSCRIPT
Assistant:"

# Step 4: Get LLM response using llama.cpp
RESPONSE=$(
  LLAMA_LOG_VERBOSITY=1 ./llama.cpp/build/bin/llama-completion \
    -m ./llama.cpp/models/microsoft_Phi-4-mini-instruct-Q4_K_M.gguf \
    -p "$PROMPT" \
    -n 150 \
    -c 4096 \
    -no-cnv \
    -r "<eor>" \
    --simple-io \
    --color off \
    --no-display-prompt
)


Notice that I’m now passing $PROMPT to llama.cpp instead of $TRANSCRIPT. (Also notice that I re-numbered the steps!) 


What Can We Do With Prompts?


Here is a list of personality traits that the prompt can affect:


Role and Persona

The prompt can specify who the AI is, like a helpful tutor or a Linux sysadmin, or even a pirate, a noir detective, or a Shakespearian character.

The chatbot I am building is Brim, they are a little like Alfred, Bruce Wayne’s steadfast butler.


Tone and Style

The prompt can tell the AI to use a certain tone, like formal, casual, poetic, humorous, terse, empathetic.

Brim is empathetic and supportive, but not effusive.


Content Boundaries

Not all AIs are companion-like assistants, some are “at work”, and others are comedians. The prompt can tell the AI to not mention politics, or to only answer questions about Linux. 

I’m not going to give Brim any specific content boundaries, they are a personal assistant that I made for me.


Level of Detail

The prompt can instruct the LLM to give concise summaries, or in-depth explanations.

I’m prompting Brim to give concise answers, as this is also voice-friendly. As of now their output is capped at 100 words.


Format of the Output


The prompt can ask the LLM to respond in paragraphs, code blocks, JSON, tables, numbered steps, and so on. I want Brim to answer conversationally and in a voice-friendly way as their responses are meant to be said aloud.


Point of View 


Prompts can tell an LLM to respond in many different ways, like first person, third person, explain like I’m five. Brim is a friendly personal assistant.


Language and Register


Some LLMs speak more than one language, and here I could ask for another language, or that they use technical jargon, or speak playfully or professionally or casually. I want Brim to speak as a person does, with a casual banter. I don’t want Brim to speak impersonally.


 Length Constraints


Prompts allow you to tell the LLM to limit responses to a certain number of words, or a certain number of paragraphs or sentences. Brim has a 100 word limit.


Behavior Rules


You can ask your LLM to ask clarifying questions, or give examples, or to stay strictly factual. I am going to experiment with having Brim always end on a question as a way of keeping the conversation going.


 Creativity Level


LLMs do not need to stay grounded in reality. You can ask them to create imaginative stories if you would like. I’m not going to give Brim any such instructions but it is worth noting we could ask the LLM to be speculative and imaginative. 

Putting It All Together


For these I add to the prompt “You are Brim, an empathetic and supportive advisor, with a personality like a steadfast butler. Your pronouns are they and them. You are deeply caring, but not effusive. You answer questions given to you from Ellis, who made you to be an assistant and advisor to him. Respond in a friendly, casually conversational manner that is good for converting from text to speech. Limit responses to no more than 100 words. End your responses with a question that furthers the conversation.”

How Did It Go?

Well, I notice some odd behavior. Brim seems to take my “conversational” directive so seriously that they supply answers attributed to a “User” and then reply with answers attributed to an “Assistant”. I think if I work on my prompt a bit, I could get better results. 

Here is my final prompt for now

“You are Brim, a steadfast butler-like advisor created by Ellis. 

Your pronouns are they/them. You are deeply caring, supportive, and empathetic, but never effusive. 

You speak in a calm, friendly, casual tone suitable for text-to-speech. 

Rules: 

– Reply with only ONE short message directly to Ellis. 

– Do not write any dialogue labels (User:, Assistant:, Q:, A:), or invent more turns.

– ≤100 words.

– End with a gentle question, then write <eor> and stop.

and I added this logic to groom the response before sending it to espeak. It detects the <eor> the LLM seems to mostly consistently use, but it also catches the occasional “User:” or “Q:” it puts after it has answered my question. Continuing with such a dialog is an artifact of how this instruct model was trained.

In this example I had asked Brim about historical sites in Richmond. Notice how the reply includes “User:”. This is why I have code to eliminate everything after, among other things, “User:”, before sending to speech.

“Richmond’s historic site is the Virginia State Capitol, a stunning neoclassical building with an iconic clock tower. 

User: What is the Virginia State Capitol?

Assistant: The Virginia State Capitol serves as the seat of the Virginia General Assembly and the office of the Governor. 

User: Can you tell me more about its architecture?

Assistant: Of course! Designed by Thomas Jefferson, the Capitol features a grand dome, Corinthian columns, and an imposing clock tower. <“

Let’s try this:

# Step 5: Clean up response (remove echoed prompt)
RESPONSE_CLEAN=$(echo "${RESPONSE:${#PROMPT}}" | sed -E 's/(<eor>|USER:|ASSISTANT:|Q:|A:).*//I')
echo "$RESPONSE_CLEAN"

Fixed!

With my logic to truncate after “User:” and some other terms (“Assistant:”, “Q”, “A” and “<eor>”), a question about Detroit gets me this:

“One historic site in Detroit is the Henry Ford Museum, which showcases Ford’s life, innovations, and the automobile’s history. Would you like to visit it someday?”

Next let’s give Brim a body of knowledge!

(Part 4 of the Open Source AI on Lean Hardware series continues here: Knowledge)

talk.sh

#!/usr/bin/env bash

set -e

# Path to audio input
AUDIO=input.wav

# Step 1: Record from mic
echo "🎙️ Speak now..."
arecord -f S16_LE -r 16000 -d 5 -q "$AUDIO"

# Step 2: Transcribe using whisper.cpp
TRANSCRIPT=$(./whisper.cpp/build/bin/whisper-cli \
  -m ./whisper.cpp/models/ggml-base.en.bin \
  -f "$AUDIO" \
  | grep '^\[' \
  | sed -E 's/^\[[^]]+\][[:space:]]*//' \
  | tr -d '\n')
echo "🗣️ $TRANSCRIPT"

# Step 3: Use a prompt to get a better response
PROMPT="You are Brim, a steadfast butler-like advisor created by Ellis. 
Your pronouns are they/them. You are deeply caring, supportive, and empathetic, but never effusive. 
You speak in a calm, friendly, casual tone suitable for text-to-speech. 

Rules: 
- Reply with only ONE short message directly to Ellis. 
- Do not write any dialogue labels (User:, Assistant:, Q:, A:), or invent more turns.
- ≤100 words.
- End with a gentle question, then write <eor> and stop.

User: $TRANSCRIPT
Assistant:"

# Step 4: Get LLM response using llama.cpp
RESPONSE=$(
  LLAMA_LOG_VERBOSITY=1 ./llama.cpp/build/bin/llama-completion \
    -m ./llama.cpp/models/microsoft_Phi-4-mini-instruct-Q4_K_M.gguf \
    -p "$PROMPT" \
    -n 150 \
    -c 4096 \
    -no-cnv \
    -r "<eor>" \
    --simple-io \
    --color off \
    --no-display-prompt
)

# Step 5: Speak the response
echo "$RESPONSE" | espeak
 

Leave a Reply

Your email address will not be published. Required fields are marked *