Facts, Arguments, Theses: Building AI Knowledge Retrieval on Meaning, Not Slices
Progress notes from Zettelgarden: experimenting with LLM pipelines that embed semantic units - theses, arguments, and facts - instead of raw text chunks
Back in December, I wrote about Zettelgarden’s entity extraction (AI‑pulled people, places, and concepts). This update is me jotting down what I’ve been experimenting with lately in knowledge retrieval: what’s working, where it falls short, and what I want to try next.
Going back to the original problem: I have a piece of information, and I want to find other pieces of information like it. When is something like something else? Not exactly an easy question. RAG (chunking + embeddings + vector DB) is the standard approach, but in practice I’ve found it misses nuance at scale.
My impression from trying different tools is that existing RAG tooling often misses the mark. I get the sense that it is usually developed and tested against very small data sets, which gives a false impression of quality. Used seriously with large data sets, it doesn’t hold up. One example is Open WebUI’s Knowledge feature: in my experimentation, it struggled to use Knowledge appropriately with a data set of around 10 documents. I’m looking to work with data sets three orders of magnitude bigger.
Instead of embedding arbitrary text slices, we can embed structured, context-aware building blocks - theses, arguments, and facts. From my testing, this gives LLMs a leg up in generating summaries of articles, and hopefully will make large-scale knowledge work more practical. That’s the shift I’m pursuing with Zettelgarden.
How I’m Wiring It Up
As with many LLM-related topics, the extraction pipeline is really just a matter of chaining LLM requests together. I’d call it agentic for marketing purposes, but it’s mostly a regular ‘loop over a range of chunks’.
Here’s the JSON‑only extraction prompt I’ve been testing (stripped down but representative):
You are an assistant that extracts theses, facts, and arguments from text. We are trying to come up with a coherent summary of the article/podcast/book/etc. You will be looking at some or all of the writing and need to extract certain things from it. You will see the previously extracted theses and arguments from earlier chunks of the work. Please use this to inform yourself on where we have been and based on that, where the writer has taken the arguments in this section.
Format Example:
```json
{
  "thesis": "...",
  "facts": ["...", "..."],
  "arguments": [
    {"argument": "...", "importance": 8},
    {"argument": "...", "importance": 5}
  ]
}
```
The key, though, and I think one of my innovations, is that this prompt is altered for the second and subsequent chunks by feeding in the theses and arguments discovered so far. This lets the LLM read each chunk with the running context of the whole text. With longer works, you’ll often have an introductory section that lays out the thesis and arguments, and then each subsequent section goes into detail on one of them. Without that running context, the LLM has no idea what the relative importance of each argument is. With it, the model can make better decisions as it goes, link ideas together across chunks, and so on.
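In code, the loop is roughly the following. This is a simplified sketch rather than the actual Zettelgarden implementation: the OpenAI-style client, the model name, and the function names are all stand-ins, and the prompt is abbreviated.

```python
import json

from openai import OpenAI

client = OpenAI()  # stand-in: any chat-completion API would work here

# Abbreviated stand-in for the full extraction prompt shown above.
EXTRACTION_PROMPT = (
    "You are an assistant that extracts theses, facts, and arguments from text. "
    "Respond with JSON only, in the format shown in the example."
)


def extract_units(chunks: list[str], model: str = "gpt-4o-mini") -> dict:
    """Run the extraction prompt over each chunk, feeding the theses and
    arguments found so far back in so later chunks see the running context."""
    found = {"theses": [], "arguments": [], "facts": []}
    for chunk in chunks:
        # For the second and subsequent chunks, this context is what lets the
        # model judge relative importance and link ideas across chunks.
        context = json.dumps({"theses": found["theses"], "arguments": found["arguments"]})
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": EXTRACTION_PROMPT},
                {"role": "user", "content": f"Previously extracted:\n{context}\n\nCurrent chunk:\n{chunk}"},
            ],
            response_format={"type": "json_object"},  # ask for JSON-only output
        )
        data = json.loads(resp.choices[0].message.content)
        # Accumulate the structured units for the next iteration and later steps.
        if data.get("thesis"):
            found["theses"].append(data["thesis"])
        found["facts"].extend(data.get("facts", []))
        found["arguments"].extend(data.get("arguments", []))
    return found
```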
Applications: Summaries and Improved Fact Retrieval
Breaking text down into component parts like this produces two outcomes. The first is that generating summaries becomes really easy. It comes nearly for free: we already have a tree structure of the text’s arguments. From there, I ask the LLM to put together a human-readable summary, broken into an executive summary and then a more detailed reference of the main arguments, with some facts to back them up.
I feed the compiled information into an LLM with this (condensed) prompt:
Summarize the following aggregated analysis into a two-part markdown summary. The output should be **structured, concise, and tailored to distinct audiences**.
### Instructions:
1. **Format:** Use headings, subheadings, and bullets for clarity.
2. **Section 1: Executive Summary**
- Audience: Senior management, decision-makers, or non-specialist readers.
- Style: Concise, strategic, and outcome-focused.
- Length: ~4–6 bullet points.
3. **Section 2: Reference Summary**
- Audience: Researchers, analysts, technical leads, or specialists.
- Style: Well-structured, factual, and precise.
- Present information in a hierarchy (for each thesis, show its supporting arguments → facts).
- Exclude secondary/tangential details.
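The hand-off itself is just one more request: serialize the aggregated theses, arguments, and facts and send them along with the prompt above. Here’s a rough sketch under the same stand-in assumptions as the extraction loop (again, not the real implementation):

```python
import json

from openai import OpenAI

client = OpenAI()

# Abbreviated stand-in for the condensed summary prompt shown above.
SUMMARY_PROMPT = (
    "Summarize the following aggregated analysis into a two-part markdown "
    "summary: an executive summary and a reference summary."
)


def generate_summary(found: dict, model: str = "gpt-4o-mini") -> str:
    """Turn the aggregated theses/arguments/facts into a human-readable summary."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SUMMARY_PROMPT},
            {"role": "user", "content": json.dumps(found, indent=2)},
        ],
    )
    return resp.choices[0].message.content
```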
The summaries this produces are, I think, good and useful for me as a human. For shorter articles the benefit isn’t obvious, but with really long content (like that five‑hour podcast), the payoff is clear.
The other artifact of this process is the list of facts pulled out along the way.
This is where I think the real value is. Going back to the original RAG discussion, the purpose is to find information based on other information, and a list of facts like this is exactly the sort of thing we want to be matching against. Broken down into individual facts, the text gives vector embeddings much cleaner targets to match.
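To show what fact-level retrieval means in practice, here’s a rough sketch: each extracted fact gets its own embedding, and a query is matched against them with cosine similarity. The brute-force search and the embedding model name are only illustrative; a real setup would use a vector database.

```python
import math

from openai import OpenAI

client = OpenAI()


def embed(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed a batch of strings; the model name is just an example."""
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_facts(query: str, facts: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Return the k extracted facts most similar to the query."""
    fact_vectors = embed(facts)
    query_vector = embed([query])[0]
    scored = sorted(
        ((cosine(query_vector, vec), fact) for vec, fact in zip(fact_vectors, facts)),
        reverse=True,
    )
    return scored[:k]
```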
Where I’m Headed Next
That’s where things stand right now. The system produces workable summaries and fact lists, but I don’t know yet how much of that represents a real retrieval improvement versus just looking neat. Next, I want to build a repeatable way to test retrieval quality and summary clarity. I’m especially curious to see whether this holds up for really long texts (like monographs) versus shorter articles. More to come once I’ve had a chance to run those experiments.
I want to tackle this in a few ways:
Retrieval quality: Test whether fact‑based embeddings retrieve more relevant documents than traditional chunk embeddings. For example, given a query, do we get the right supporting facts more often? (A rough sketch of this check follows below.)
Summary quality: Create lightweight rubrics—clarity, coverage of main theses, evidence usefulness—that humans can score quickly and consistently.
Use-case benchmarks: Compare performance across content types (short articles vs. long-form works like monographs) to see if the value grows with document length.
The goal is a repeatable evaluation loop: not just to check that outputs feel useful, but to measure when and where this structured extraction actually outperforms standard RAG.
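For the retrieval-quality piece, the simplest harness I can imagine is a hand-labelled set of query-to-expected-fact pairs scored as a hit rate at k, reusing the top_facts sketch from above. The same harness could then be pointed at chunk embeddings for comparison; the case format here is just my guess at what a test set might look like.

```python
def hit_rate_at_k(cases: list[dict], facts: list[str], k: int = 5) -> float:
    """cases look like [{"query": "...", "expected": "exact fact string"}].
    Returns the fraction of queries whose expected fact shows up in the top k.
    Reuses top_facts() from the retrieval sketch above."""
    hits = 0
    for case in cases:
        retrieved = [fact for _, fact in top_facts(case["query"], facts, k=k)]
        if case["expected"] in retrieved:
            hits += 1
    return hits / len(cases)
```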
You can check out Zettelgarden either on its website or on GitHub: https://github.com/NickSavage/Zettelgarden
Half‑Baked Thoughts
You still need to chunk; long documents won’t fit into the context window. I’d like to be able to throw an academic monograph into this and get relevant answers. For short articles, this method probably produces much the same summary as just dumping the whole text into the context window, but it should be much better for anything longer.
Future improvements will be around managing the context the LLM gets. Right now, I’m giving it every thesis and argument found along the way. I’ll probably add another processing step to clean that list up: deduplicating it, removing things that aren’t as relevant, and so on.
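One possible shape for that cleanup step, reusing the embed and cosine helpers from the retrieval sketch: collapse near-duplicate arguments by embedding similarity before feeding the list back into the prompt. The similarity threshold here is arbitrary and would need tuning.

```python
def dedupe_arguments(argument_texts: list[str], threshold: float = 0.9) -> list[str]:
    """Drop arguments whose embedding is nearly identical to one already kept,
    so the context fed back to the LLM stays small. Reuses embed() and cosine()
    from the retrieval sketch; the 0.9 threshold is a guess to tune."""
    if not argument_texts:
        return []
    kept: list[str] = []
    kept_vectors: list[list[float]] = []
    for text, vec in zip(argument_texts, embed(argument_texts)):
        if all(cosine(vec, kv) < threshold for kv in kept_vectors):
            kept.append(text)
            kept_vectors.append(vec)
    return kept
```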
One thing you might notice in the list of facts is that a number of them are only meaningful in the context of the other facts and the article. For example: “He believes modern web development often overcomplicates simple CRUD operations”. Who is ‘he’? This isn’t quite the discrete fact I wanted; I’ll need to play with this to come up with a better approach.
Another example of this came from an article about AI pilot projects, where the extracted fact was something like “95% of pilots fail”. Without the context of AI projects, you (or a future AI!) might think that was about human aircraft pilots!
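One direction I might try is a follow-up pass that asks the model to rewrite each fact so it stands on its own, resolving pronouns and naming the subject. A sketch under the same stand-in assumptions as the earlier code; the prompt wording here is mine, not something Zettelgarden does today.

```python
import json

from openai import OpenAI

client = OpenAI()

# Illustrative prompt for rewriting context-dependent facts as standalone statements.
REWRITE_PROMPT = (
    "Rewrite each fact so it is understandable without the source article: "
    "resolve pronouns and name the subject explicitly. Respond with a JSON "
    'object {"facts": [...]} keeping the facts in the same order.'
)


def make_facts_self_contained(facts: list[str], thesis: str,
                              model: str = "gpt-4o-mini") -> list[str]:
    """Ask the model to rewrite context-dependent facts as standalone statements,
    using the thesis as the context it can draw on."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": json.dumps({"thesis": thesis, "facts": facts})},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content).get("facts", facts)
```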