Long time no hear? Well, I’ve been chasing down many bad leads for RAG systems. Some performed terribly, others required a lot of hassle for minimal return, and in many cases, the differences were only noticeable in edge cases.
But in the end, I managed to get a system up and running that I’m comfortable with. Let’s start at the beginning. The focus of Project Skald is an AI chatbot to help me navigate my pen-and-paper notes.
I want to ask the chatbot a simple question. My go-to example is a short note about an adventure set in a Victorian fantasy world inspired by Fallen London. If you don’t know Fallen London, make sure to check it out!
The question I want answered is: “What is the relationship between the ambitious muse and the transfigured poet?”
In the notes, both characters have short biographies, and in one adventure, they interact. The poet becomes the muse’s new protégé. Unfortunately, he soon vanishes, and the muse asks the players to find him. Oh—and she provides him with a drug to enhance his poetry.
I want a clear answer that states their relationship. A little context is fine, but I don’t want to read an entire paragraph (otherwise, I could just search my notes myself). And hallucinations are a hard no. The AI must not invent information. Sometimes the AI says things like:
“While the text does not clearly state this, the poet might be romantically interested in the muse.”
This is tricky. On one hand, the AI clearly marks it as speculation, and often these guesses are based on interpretations of the text. Sometimes, the AI even spots valid inferences that I deliberately left vague. On the other hand, it’s still the model stating something my notes never say.
If you ask a public AI—like DeepSeek or ChatGPT—about the relationship between the muse and the poet, it’ll give you absolute nonsense. That’s fine—they don’t have my notes. ChatGPT suggests many plausible ideas, but none of them are actually correct.
But if you give the AI context, it can answer very well. When I say “AI,” I really mean a Large Language Model (LLM). These models are incredibly good at generating text—especially at summarizing and extracting relevant information. So if I pass the full note on the muse and poet to the LLM, and then ask my question, the answer is solid. That’s actually most of the “magic.” The LLM effectively summarizes a relevant text search for me.
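Here’s a minimal sketch of that “full note as context” approach, using the `ollama` Python package (more on my setup below). The model name, file path, and prompt wording are placeholders of mine, not the exact setup:

```python
# Sketch: hand the LLM the full note as context, then ask the question.
# Assumes the `ollama` Python package and a model already pulled locally.
import ollama

# Hypothetical path to the note about the muse and the poet.
with open("notes/muse_and_poet.md", encoding="utf-8") as f:
    note = f.read()

question = "What is the relationship between the ambitious muse and the transfigured poet?"

response = ollama.chat(
    model="gemma2:9b",  # placeholder; any local model works
    messages=[
        {"role": "system", "content": "Answer only from the provided notes. If the notes do not say, say so."},
        {"role": "user", "content": f"Notes:\n{note}\n\nQuestion: {question}"},
    ],
)
print(response["message"]["content"])
```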
This just shifts the problem: the quality of the LLM’s response depends heavily on the quality of the provided context. And now we need a way to automatically find good context. This is where RAG comes into play.
Retrieval-Augmented Generation (RAG) builds a context with all the relevant information for the LLM. It searches a database and computes distances between my question and the stored text chunks. A small distance means high similarity. We then take the top 10 most relevant chunks and pass them to the LLM.
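Conceptually, the retrieval step boils down to “embed the question, measure the distance to every stored chunk, keep the closest ones.” A rough sketch of that idea, where `embed()` is a stand-in for whatever embedding model is used:

```python
# Conceptual sketch of retrieval: smallest distance = most relevant chunk.
# `embed()` is a stand-in for an embedding model; chunk embeddings are assumed precomputed.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question: str, chunks: list[str], chunk_embeddings: list[np.ndarray],
             embed, top_k: int = 10) -> list[str]:
    q = embed(question)
    distances = [cosine_distance(q, e) for e in chunk_embeddings]
    ranked = np.argsort(distances)[:top_k]  # indices of the closest chunks
    return [chunks[i] for i in ranked]
```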
Building the database is relatively easy. An embedding model—similar to an LLM—computes an embedding for each text chunk. Based on these embeddings, we can compute similarity distances. Initially, I just split the text by character count and saved the chunks.
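That naive splitter really is just a couple of lines. Something along these lines, with an illustrative chunk size:

```python
# Naive chunking: cut the notes into fixed-size pieces by character count.
# The chunk size is illustrative; it is one of the parameters to tweak later.
def split_by_characters(text: str, chunk_size: int = 1000) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

with open("notes/muse_and_poet.md", encoding="utf-8") as f:  # hypothetical path
    chunks = split_by_characters(f.read())
```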
The first RAG output wasn’t great, so I started tweaking parameters. There are a lot—and very few guides suggest sensible defaults.
For the LLM part alone, there’s the choice of model and how you prompt it.

Even if the LLM is configured perfectly, RAG itself offers more tuning: how the text is chunked, which embedding model is used, and how many chunks end up in the context (the settings sketch below pulls these together).

So yes: many parameters to tweak. And since the goal is to overengineer everything, the question isn’t “should I tweak this?” but “can I?”
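To keep track of all of this, it helps to gather the knobs in one place. A sketch of such a settings object; every name and default here is illustrative, not a recommendation:

```python
# One place for all the knobs. Names and defaults are illustrative only.
from dataclasses import dataclass

@dataclass
class RagSettings:
    llm_model: str = "gemma2:9b"               # which local model to run (placeholder name)
    system_prompt: str = "Answer only from the provided notes."
    chunk_size: int = 1000                     # characters per chunk
    embedding_model: str = "all-MiniLM-L6-v2"  # placeholder embedding model name
    top_k: int = 10                            # how many chunks go into the context
```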
I went with a Python-based approach. The LLM runs via Ollama on my desktop machine. With models of up to 12B parameters, I can test a wide variety. I don’t plan to connect to cloud-based AIs.
The retrieval and database are handled by ChromaDB. Why? It was the easiest to set up. (Maybe I’ll share my Python struggles later—it’s not as portable as I hoped, especially for performant local databases.)
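For what it’s worth, the ChromaDB side really is only a handful of lines. A sketch using a persistent local collection and ChromaDB’s built-in default embedding model; the path, collection name, and chunk size are placeholders:

```python
# Sketch of the ChromaDB side: a persistent local collection with the
# built-in default embedding model. Paths and names are placeholders.
import chromadb

client = chromadb.PersistentClient(path="./skald_db")
collection = client.get_or_create_collection(name="notes")

with open("notes/muse_and_poet.md", encoding="utf-8") as f:
    text = f.read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]  # same naive splitter as above

# Index the chunks; ChromaDB computes the embeddings on add.
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# Retrieve the 10 chunks closest to the question.
results = collection.query(
    query_texts=["What is the relationship between the ambitious muse and the transfigured poet?"],
    n_results=10,
)
context = "\n\n".join(results["documents"][0])
```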
All parameters can be adjusted via a Gradio web UI.
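The UI itself is the least exciting part. A stripped-down sketch of what exposing a few of those parameters in Gradio can look like; the `answer()` function is a stand-in for the real retrieve-then-generate pipeline:

```python
# Stripped-down sketch of a Gradio UI for the tunable parameters.
# `answer()` is a stand-in for the actual retrieve-then-generate pipeline.
import gradio as gr

def answer(question: str, top_k: int, chunk_size: int) -> str:
    # Placeholder: the real code would retrieve `top_k` chunks and call the LLM.
    return f"(would retrieve {top_k} chunks of ~{chunk_size} characters for: {question})"

demo = gr.Interface(
    fn=answer,
    inputs=[
        gr.Textbox(label="Question"),
        gr.Slider(1, 20, value=10, step=1, label="Retrieved chunks"),
        gr.Slider(200, 4000, value=1000, step=100, label="Chunk size (characters)"),
    ],
    outputs=gr.Textbox(label="Answer"),
)

if __name__ == "__main__":
    demo.launch()
```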
I’m working on getting everything running in a Docker container. Once that’s done, I’ll share the project on GitHub. Until then, I still need to learn a bit more Python packaging and probably write up some documentation in case someone wants to build on the project.
Next up: How I test parameters and deal with even more LLM-related headaches.