TJ Miller

Truths: The Refinement Update

8 min read

What happens when your AI assistant confidently merges “TJ lives in Texas” with “I just moved to Colorado” into some frankentruth about living in both places simultaneously?

That’s the problem I ran into.

I’ve been heads-down on Iris for the past couple weeks, and the Truths system I announced recently has already seen some significant evolution. Quick refresher: Truths are the stable, core facts Iris knows about you. Things like your name, preferences, and relationships that stay relevant across conversations. They’re distinct from contextual memories that might only matter in specific situations.

What started as a way to separate universal facts from contextual memories quickly revealed some gaps in how I was thinking about knowledge refinement. Let me walk through what changed.

The Quality Problem

Before I even got to refinement, I had a more fundamental issue: the truths being created were often garbage.

Here’s an actual example of what the system was promoting: “Ellis has a 12 year old son that had a snow day on December 10th.”

See the problem? The stable, identity-defining fact is “Ellis has a son.” The age is temporal (it’ll change). The snow day is a one-time event that has no business being in a truth at all. But the old system would happily bundle it all together.

The fix was being much stricter about what qualifies as a truth in the first place. I rewrote the promotion logic to distinguish between:

  • Identity-defining facts: who the user IS (relationships, values, stable preferences)
  • Episodic events: what the user DID once (specific moments, one-time occurrences)
  • Temporal details: things that will become outdated (ages, dates, “recently”, “this week”)

Now when a memory like “Ellis’s son had a snow day on December 10th” comes through, the system extracts “Has a son” and drops everything else. When “User’s been vegetarian for 3 years after watching a documentary” comes through, it becomes “Is vegetarian.” The duration and catalyst are temporal context that doesn’t belong in a truth.
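
To make the shape of that concrete, here's a minimal PHP sketch of the promotion filter. The `FactClassifier` interface stands in for the LLM call that does the actual labeling; every name here is illustrative, not Iris's real API.

```php
<?php

enum FactType
{
    case IdentityDefining; // who the user IS
    case Episodic;         // what the user DID once
    case Temporal;         // details that will go stale
}

// Value object for one labeled fact pulled out of a memory.
final class Fact
{
    public function __construct(
        public string $content,
        public FactType $type,
    ) {}
}

// Stand-in for the LLM-backed step that splits a memory
// into candidate facts and labels each one.
interface FactClassifier
{
    /** @return Fact[] */
    public function facts(string $memory): array;
}

final class PromotionFilter
{
    public function __construct(private FactClassifier $classifier) {}

    // Keep only the identity-defining core; everything else
    // stays behind as a plain memory.
    public function extractTruth(string $memory): ?string
    {
        foreach ($this->classifier->facts($memory) as $fact) {
            if ($fact->type === FactType::IdentityDefining) {
                return $fact->content; // e.g. "Has a son"
            }
        }

        return null; // nothing here deserves promotion
    }
}
```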

The key insight: not every high-access memory deserves to be a truth. Some things get referenced frequently but aren’t identity-defining. Your office decorations, the specific game you’re playing this month, tool configurations. These might surface a lot in conversation but they don’t describe who you are.

The system now asks: “Will this still be true and relevant six months from now? Does it describe who the user is, or just what they did?” If the answer is no, it stays as a memory. Memories are still searchable and useful. They just don’t get elevated to core facts.

The Generation Problem

In the original implementation, truths could be “crystallized”: refined by folding in new evidence from memories. But I was tracking this with a simple crystallization_count integer. Every time a truth got refined, bump the counter.

The problem? It didn’t capture what I actually cared about: how far removed is this truth from its original evidence?

A truth crystallized 5 times from brand new memories is fundamentally different from a truth that’s been refined through 5 generations of increasingly abstract consolidation. The former is still closely tied to concrete observations. The latter might be several levels of inference deep.

So I replaced crystallization_count with generation. Generation 0 means the truth came directly from memories. Generation 1 means it was refined once. And so on, up to a configurable maximum.

This feels a lot cleaner. I can now see at a glance how “derived” a truth is from its original evidence. A generation 0 truth is basically a direct promotion. A high-generation truth has been through the wringer, and at a certain point the system protects it from further automatic refinement.
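
In code, the change is small but meaningful. A rough sketch, with an assumed default cap of 3 (the real maximum is configurable):

```php
<?php

final class Truth
{
    public function __construct(
        public string $content,
        public int $generation = 0, // 0 = promoted directly from memories
    ) {}
}

final class Crystallizer
{
    public function __construct(private int $maxGeneration = 3) {}

    public function refine(Truth $truth, string $refinedContent): Truth
    {
        // At the cap, the truth is protected from further
        // automatic refinement.
        if ($truth->generation >= $this->maxGeneration) {
            return $truth;
        }

        return new Truth($refinedContent, $truth->generation + 1);
    }
}
```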

When Evidence Contradicts

Here’s where things got interesting.

The original crystallizer would happily merge new evidence into existing truths. “TJ lives in Texas” + new memory about Austin details = slightly richer truth about living in Texas. Simple.

But what happens when I tell Iris “I just moved to Colorado”?

The old system would try to merge these. The result would be some frankentruth about living in both Texas and Colorado simultaneously. Not great.

I needed contradiction detection.

The Conflict System

Now when the crystallizer encounters new evidence, it first analyzes whether that evidence contradicts the existing truth. If it does, instead of blindly merging, it creates a conflict record that captures the existing content, new evidence, a proposed resolution, and the AI’s reasoning for the change.

Depending on evidence strength and whether the truth is protected, these conflicts either auto-resolve or get flagged for human review.
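
The conflict record itself is just a snapshot of both sides of the disagreement. Something like this, with illustrative field names:

```php
<?php

final class TruthConflict
{
    public function __construct(
        public string $existingContent,    // the truth as it stands
        public string $newEvidence,        // the memory that contradicts it
        public string $proposedResolution, // what the AI thinks the truth should become
        public string $reasoning,          // the AI's justification for the change
        public float $evidenceStrength,    // scored in [0, 1]; see below
    ) {}
}
```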

Evidence Strength

Not all contradictions are equal. If I mentioned Colorado once in passing, that probably shouldn’t override years of Texas-related memories. But if I’ve been consistently talking about my new life in Colorado across multiple conversations? That evidence is stronger.

The system calculates an evidence strength score based on how frequently the memory has been accessed, how recently it’s been relevant, and how refined it is. Strong evidence against unprotected truths can auto-resolve. Weaker evidence gets flagged for human review.
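
As a sketch, the score might combine those three signals into a single value in [0, 1]. The weights and decay here are my illustration, not Iris's actual formula:

```php
<?php

function evidenceStrength(int $accessCount, int $daysSinceAccess, int $refinementLevel): float
{
    $frequency  = min($accessCount / 10, 1.0);   // saturates after 10 accesses
    $recency    = exp(-$daysSinceAccess / 30);   // decays over roughly a month
    $refinement = min($refinementLevel / 3, 1.0); // more refined evidence weighs more

    return 0.5 * $frequency + 0.3 * $recency + 0.2 * $refinement;
}
```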

Protected Truths

Some truths shouldn’t be auto-updated regardless of evidence strength:

  • Pinned truths - I explicitly marked these as important
  • User-created truths - I wrote these manually
  • Agent-created truths - The AI decided these were significant enough to create directly

These always get flagged for review, even with overwhelming new evidence. I want a human in the loop for the things that matter most.
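
Put together, the routing logic is simple. A sketch with assumed flag names and an assumed threshold:

```php
<?php

function routeConflict(
    float $evidenceStrength,
    bool $pinned,
    string $source,          // 'user', 'agent', or 'distilled'
    float $threshold = 0.7,
): string {
    // Pinned, user-created, and agent-created truths always get
    // a human in the loop, regardless of evidence strength.
    $protected = $pinned || in_array($source, ['user', 'agent'], true);

    if ($protected || $evidenceStrength < $threshold) {
        return 'flag_for_review';
    }

    return 'auto_resolve';
}
```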

Temporal Facts

The crystallizer prompt got a significant update to handle temporal facts correctly. Age, job titles, locations, relationship status. These change over time, and the system needs to understand that newer information should replace older information, not merge with it.

When my son turns 13, that doesn’t contradict him being 12. It supersedes it. Same with job promotions, moves, relationship changes. The crystallizer now understands this distinction and handles temporal updates as natural progressions rather than conflicts.
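
One way to think about it: the crystallizer now effectively sorts incoming evidence into three buckets rather than two. This enum is my framing, not Iris's literal types:

```php
<?php

enum EvidenceKind
{
    case Reinforces;  // consistent with the truth; merge as before
    case Supersedes;  // temporal progression: age 12 -> 13, a move, a promotion
    case Contradicts; // genuine conflict; open a TruthConflict for resolution
}
```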

Truths About Others

Here’s something that bit me early on: subject ambiguity.

If Iris stores “Has ADHD” as a truth, who has ADHD? Me? My son? My coworker I mentioned in passing? The system had no way to know, and that ambiguity could lead to some awkward (or worse, harmful) assumptions.

The fix was enforcing subject preservation. Instead of “Has ADHD,” the system now captures “User’s son has ADHD.” Instead of “Works at Google,” it’s “User’s wife works at Google.”

Category hints help with this too. A truth in the “Relationships” category is probably about someone else. “Personal” or “Preferences” categories are usually about me. The crystallizer uses these signals to maintain clarity about who each fact actually describes.
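
The category hint is essentially a prior on who the subject is. A tiny sketch of the idea:

```php
<?php

// Illustrative mapping; the real crystallizer uses these signals
// inside its prompt rather than as hardcoded rules.
function likelySubject(string $category): string
{
    return match ($category) {
        'Relationships'           => 'someone other than the user',
        'Personal', 'Preferences' => 'the user',
        default                   => 'ambiguous, keep the subject explicit',
    };
}
```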

The Conflict Resolution UI

All these conflicts need to go somewhere. I built out a dedicated UI for reviewing and resolving truth conflicts.

Each conflict shows:

  • The existing truth content
  • The new evidence that triggered the conflict
  • The proposed update (what the AI thinks the truth should become)
  • The AI’s reasoning for the change
  • Evidence strength score

Resolution options:

  • Accept - Use the proposed content as-is
  • Reject - Keep the original truth, discard the new evidence
  • Merge - Edit the proposed content manually before accepting

The merge option turned out to be crucial. Sometimes the AI’s proposed update is close but not quite right. Being able to tweak it before accepting saves a lot of back-and-forth.
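
Under the hood, the three options reduce to choosing which content wins. Reusing the TruthConflict shape sketched earlier:

```php
<?php

enum Resolution
{
    case Accept; // use the proposed content as-is
    case Reject; // keep the original truth, discard the evidence
    case Merge;  // human-edited content replaces the proposal
}

function applyResolution(
    TruthConflict $conflict,
    Resolution $choice,
    ?string $editedContent = null,
): string {
    return match ($choice) {
        Resolution::Accept => $conflict->proposedResolution,
        Resolution::Reject => $conflict->existingContent,
        Resolution::Merge  => $editedContent ?? $conflict->proposedResolution,
    };
}
```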

Cleaner Candidate Selection

One more change worth mentioning: the distillation process now has a minimum absolute access threshold in addition to the percentile threshold.

Previously, if you only had 10 memories and 2 of them had been accessed once, those 2 memories would land in the top 20 percent and become promotion candidates. That’s… not great. A memory accessed once isn’t really proving its value.

Now there’s a configurable minimum access threshold. A memory has to meet both the percentile threshold and the absolute minimum to become a candidate. This prevents premature promotion while keeping the percentile-based approach that adapts to usage patterns.
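
Here's a sketch of the two-gate check. The thresholds are illustrative; both are configurable in the real system:

```php
<?php

/**
 * A memory must clear BOTH gates to become a promotion candidate:
 * the percentile cut relative to its peers, and an absolute floor.
 *
 * @param array<string, int> $accessCounts memory id => access count
 * @return array<string, int> the surviving candidates
 */
function promotionCandidates(
    array $accessCounts,
    float $percentile = 0.8,
    int $minAccesses = 5,
): array {
    $sorted = array_values($accessCounts);
    sort($sorted);

    // Access count at the requested percentile.
    $cut = $sorted[(int) floor($percentile * (count($sorted) - 1))];

    return array_filter(
        $accessCounts,
        fn (int $count) => $count >= $cut && $count >= $minAccesses,
    );
}
```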

Configuration

All of this is configurable. Maximum generation depth, evidence thresholds for auto-resolution, minimum access counts for promotion, staleness timeouts, and how many dynamic truths get included per conversation.

The defaults I landed on feel right for most use cases, but the ability to tune these knobs is important. Different applications have different tolerances for how aggressively truths should update or how derived they can become before stabilizing.
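
For reference, those knobs map to a config shape roughly like this. The keys and defaults are hypothetical; the real ones live in the Truths documentation:

```php
<?php

// Hypothetical config/iris-truths.php
return [
    'max_generation'          => 3,   // cap on refinement depth
    'auto_resolve_threshold'  => 0.7, // evidence strength needed to auto-resolve
    'min_access_count'        => 5,   // absolute floor for promotion candidates
    'promotion_percentile'    => 0.8, // top slice of memories considered
    'staleness_days'          => 90,  // how long before a candidate goes stale
    'truths_per_conversation' => 10,  // dynamic truths included per conversation
];
```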

What’s Next

The truth system is feeling more robust now. Truths being promoted are actually identity-defining facts, not event logs with temporal garbage attached. Conflicts get caught instead of creating weird merged truths. Temporal facts update correctly. Generation tracking gives me visibility into how derived each truth is. And subject preservation means Iris won’t confuse my son’s ADHD diagnosis with my own health history.

Right now I’m in tuning mode. Running distillation, evaluating what gets promoted, adjusting the thresholds. The system is live, but I’m watching it closely to see how these changes play out in practice. There’s probably more refinement ahead as I learn what works and what doesn’t.

That frankentruth problem I mentioned at the start? Solved.

If you want the full technical breakdown, check out the Truths documentation. It covers everything from the retrieval architecture to the Artisan commands for manual distillation.
