Building a RAG-Powered Chatbot: Grounding AI in Real Data

Most chatbots are party tricks. They sound impressive for thirty seconds, then confidently tell you something that isn't true. I wanted to build something different — an interactive experience on my portfolio site that could answer real questions about my background, projects, and expertise, grounded in actual data rather than hallucinated narratives.

This is the story of building that system, what's working, what's still in progress, and why the hardest part has nothing to do with the AI itself.

Why a Chatbot?

A portfolio site is inherently static. Someone lands on a page, reads what's there, and either reaches out or moves on. But people don't always know what they're looking for. A recruiter might want to know if I've worked with Salesforce. A founder might want to understand my advisory experience. A hiring manager might want specifics about enterprise go-lives.

A chatbot lets visitors ask what they actually want to know, in their own words, and get relevant answers pulled directly from my experience. Not generic AI responses — grounded, cited, verifiable answers.

The Architecture: Intent + Retrieval + Synthesis

The system isn't a single prompt thrown at an LLM. It's an orchestration layer with three distinct stages:

1. Intent Classification — Every incoming message gets classified into one of six categories: greeting, resume request, scheduling, lead capture, content citation, or general knowledge. This happens through pattern matching before any AI model gets involved, which keeps responses fast and predictable for common interactions.

2. Retrieval (the RAG layer) — For knowledge queries, the system searches a corpus built from multiple structured data sources. This is where grounding happens. Instead of asking an LLM to guess at my experience, the system retrieves actual data points and presents them.

3. Response Synthesis — Retrieved results get woven into a coherent narrative with domain-aware connectors. The system knows the difference between a technical skill result and a resume task result, and structures the response accordingly.
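To make the first stage concrete, here's a minimal sketch of pattern-based intent classification. The six category names come from the description above; the regex patterns and function names are illustrative, not the site's actual implementation:

```typescript
// Hypothetical sketch of the pattern-matching intent stage.
// No model call happens here — unmatched messages fall through to RAG.
type Intent =
  | "greeting" | "resume_request" | "scheduling"
  | "lead_capture" | "content_citation" | "general_knowledge";

const INTENT_PATTERNS: [Intent, RegExp][] = [
  ["greeting", /^(hi|hello|hey)\b/i],
  ["resume_request", /\b(resume|cv)\b/i],
  ["scheduling", /\b(schedule|meeting|call|calendar)\b/i],
  ["lead_capture", /\b(contact|reach out|email you)\b/i],
  ["content_citation", /\b(article|wrote|post)\b/i],
];

function classifyIntent(message: string): Intent {
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(message)) return intent;
  }
  // Anything unmatched goes to the retrieval pipeline.
  return "general_knowledge";
}
```

The ordered list doubles as a priority: the first matching pattern wins, and the knowledge pipeline is the default rather than a special case.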

Notion as the Knowledge Layer

This is where it gets interesting. The foundation of the entire system is a set of Notion databases that serve as the single source of truth for my professional data:

  • Resume Tasks — Individual accomplishments broken down by company, role, impact areas, systems used, and quantified metrics. Not paragraphs of prose — structured, queryable data points.
  • Technical Skills — Each skill with proficiency level, years of experience, context of use, and narrative.
  • Domain Knowledge — Industry and functional expertise with the same structured approach.
  • Soft Skills, Certifications, Education, Community Involvement — Each in their own database with consistent schemas.

When the chatbot needs to answer "Have you worked with Salesforce?", it doesn't guess. It retrieves the Salesforce entry from the technical skills database, finds resume tasks that reference Salesforce, and synthesizes a response from actual data — complete with companies, metrics, and context.

The system fetches this data from the Notion API at runtime with a 5-minute cache, so updates I make to my Notion workspace are reflected in chatbot responses within minutes.
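The runtime-fetch-plus-cache pattern is simple enough to sketch. The 5-minute TTL is from above; `fetchFromNotion` stands in for a real Notion API call and everything else is illustrative:

```typescript
// Illustrative in-memory cache with a 5-minute TTL.
type CacheEntry<T> = { value: T; fetchedAt: number };

const TTL_MS = 5 * 60 * 1000;
const cache = new Map<string, CacheEntry<unknown>>();

async function getCached<T>(
  key: string,
  fetchFromNotion: () => Promise<T>, // stand-in for the real API call
  now: () => number = Date.now
): Promise<T> {
  const hit = cache.get(key) as CacheEntry<T> | undefined;
  if (hit && now() - hit.fetchedAt < TTL_MS) return hit.value;
  const value = await fetchFromNotion(); // refresh on miss or expiry
  cache.set(key, { value, fetchedAt: now() });
  return value;
}
```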

Site Content as Corpus

Beyond Notion, the chatbot indexes the site itself:

  • Static pages (About, Projects, Tech Stack, AI Strategy) get parsed into searchable entries.
  • Articles (like the one you're reading) are extracted from MDX files and included in the knowledge corpus.
  • Interview Q&A clips from Notion are searchable and can be surfaced as video cards in the chat interface.

This creates a layered knowledge base: structured professional data from Notion, narrative content from the site, and multimedia from interview recordings.
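One way to make those layers searchable together is to normalize everything into a single corpus shape. The field names below are my assumptions about what such a schema might look like, not the site's actual types:

```typescript
// A unified corpus entry covering Notion data, pages, articles, and clips.
type CorpusSource = "notion" | "page" | "article" | "interview_clip";

interface CorpusEntry {
  id: string;
  source: CorpusSource;
  domain: string;   // e.g. "technical_skill", "resume_task", "content"
  title: string;
  text: string;     // the searchable body
  url?: string;     // where a content card should link
}

// Hypothetical adapter: an MDX article becomes one corpus entry.
function fromArticle(slug: string, title: string, body: string): CorpusEntry {
  return {
    id: `article:${slug}`,
    source: "article",
    domain: "content",
    title,
    text: body,
    url: `/articles/${slug}`,
  };
}
```

With one shape, the scoring pipeline downstream doesn't need to care whether a hit came from a Notion database or an MDX file.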

The Search Pipeline: Hybrid Scoring

Pure keyword search misses context. Pure semantic search hallucinates relevance. The system uses a hybrid approach:

Lexical scoring tokenizes the query, filters stop words, and calculates token overlap with corpus entries. Exact phrase matches get a 2x bonus — if someone asks about "enterprise go-lives," that phrase match outranks entries that merely contain "enterprise" and "go-lives" separately.
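The lexical stage can be sketched in a few lines. The 2x phrase bonus is from above; the stop-word list and scoring details are illustrative:

```typescript
// Sketch of lexical scoring: tokenize, drop stop words, score token
// overlap, and double the score on an exact phrase match.
const STOP_WORDS = new Set(["a", "an", "the", "with", "have", "you", "i", "of", "to"]);

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9-]+/g)?.filter(t => !STOP_WORDS.has(t)) ?? [];
}

function lexicalScore(query: string, entryText: string): number {
  const queryTokens = tokenize(query);
  const entryTokens = new Set(tokenize(entryText));
  if (queryTokens.length === 0) return 0;
  const overlap = queryTokens.filter(t => entryTokens.has(t)).length;
  let score = overlap / queryTokens.length;
  // Exact phrase match gets the 2x bonus.
  if (entryText.toLowerCase().includes(query.toLowerCase().trim())) score *= 2;
  return score;
}
```

Note that "enterprise go-lives" as a contiguous phrase scores twice what the same two tokens score when scattered through an entry, which is exactly the ranking behavior described above.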

Domain inference classifies the query into a primary domain (technical skill, domain knowledge, resume task, etc.) and boosts results from that domain by 1.5x. Supporting domain results get a 1.15x boost. This means a question about Python returns technical skill entries first, with relevant resume tasks as supporting context.
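The boost step itself is a small multiplier pass. The 1.5x and 1.15x factors are from above; the shapes are illustrative:

```typescript
// Apply the primary (1.5x) and supporting (1.15x) domain boosts, then re-sort.
interface Scored { domain: string; score: number }

function applyDomainBoost(
  results: Scored[],
  primary: string,
  supporting: string[]
): Scored[] {
  return results
    .map(r => {
      const boost =
        r.domain === primary ? 1.5 :
        supporting.includes(r.domain) ? 1.15 : 1.0;
      return { ...r, score: r.score * boost };
    })
    .sort((x, y) => y.score - x.score);
}
```

The effect: a technical-skill entry with a slightly lower raw score can still outrank a resume task once the query is classified as a technical question.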

Diversity reranking ensures the top results represent different facets of the answer. The system slots the best primary-domain result first, then alternates between primary and supporting domains to give a well-rounded response.
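The alternating slot pattern might look something like this sketch, where the best primary-domain result leads and the pools interleave from there (all names illustrative):

```typescript
// Diversity reranking: best primary-domain hit first, then alternate
// between primary and supporting pools until the limit is reached.
interface Ranked { domain: string; score: number }

function diversityRerank(results: Ranked[], primary: string, limit = 4): Ranked[] {
  const sorted = [...results].sort((a, b) => b.score - a.score);
  const primaryPool = sorted.filter(r => r.domain === primary);
  const otherPool = sorted.filter(r => r.domain !== primary);
  const out: Ranked[] = [];
  let takePrimary = true;
  while (out.length < limit && (primaryPool.length || otherPool.length)) {
    // Fall back to whichever pool still has entries.
    const pool = takePrimary && primaryPool.length ? primaryPool
               : otherPool.length ? otherPool : primaryPool;
    out.push(pool.shift()!);
    takePrimary = !takePrimary;
  }
  return out;
}
```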

Content Cards: Making Results Interactive

Raw text responses aren't enough. The chatbot returns interactive content cards alongside the narrative response. Each card represents a source — an article, a video clip, or a curated page — with type-specific styling and expandable detail views.

Cards include AI-generated summaries (produced in parallel using GPT-4o-mini with tight timeouts) that contextualize why each result is relevant to the query. If summary generation fails, the system falls back to static snippets. Graceful degradation is a theme throughout.
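The parallel-with-timeout-and-fallback pattern is worth showing on its own. Here `summarize` is a stand-in for the model call, and the 800ms timeout is an assumed value, not the site's actual budget:

```typescript
// Race the real work against a timer that resolves to the fallback.
// Errors also degrade to the fallback rather than failing the card.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  const timer = new Promise<T>(resolve => setTimeout(() => resolve(fallback), ms));
  return Promise.race([work.catch(() => fallback), timer]);
}

async function summarizeCards(
  cards: { snippet: string }[],
  summarize: (snippet: string) => Promise<string> // stand-in for GPT-4o-mini
): Promise<string[]> {
  // All summaries run in parallel; slow or failed ones fall back
  // to the card's static snippet.
  return Promise.all(
    cards.map(c => withTimeout(summarize(c.snippet), 800, c.snippet))
  );
}
```

The key property is that one slow summary never delays the others, and the user never sees an empty card.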

The Hallucination Problem

Here's the honest part: AI hallucination is the single biggest challenge in building a system like this, and it's why I call this a beta feature.

The entire RAG architecture exists because of hallucination. Without retrieval grounding, an LLM will confidently fabricate details about my career — inventing companies I've never worked at, technologies I've never used, metrics I've never achieved. That's worse than having no chatbot at all.

Grounding mitigates this but doesn't eliminate it. The synthesis layer can still introduce subtle inaccuracies when connecting retrieved facts into narrative prose. A system might retrieve that I used Salesforce at Company A and Python at Company B, then imply I built a Python-Salesforce integration that never happened.

My current approach to managing this:

  • Structured data over prose — Notion entries are discrete facts, not paragraphs. This limits the surface area for misinterpretation.
  • Citation-first design — Responses point to source cards. Users can verify claims against the actual content.
  • Constrained synthesis — The response builder uses templates and domain-specific connectors rather than free-form LLM generation for the narrative layer.
  • No fabrication fallback — If the retrieval layer returns weak results (below a 0.15 relevance threshold), the system says so rather than making something up.
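That last guardrail is the simplest to show in code. The 0.15 threshold is from above; the response shape is a hypothetical sketch:

```typescript
// "No fabrication" gate: below the relevance threshold, admit it.
const RELEVANCE_THRESHOLD = 0.15;

interface Result { title: string; score: number }

function buildAnswer(results: Result[]): string {
  const strong = results.filter(r => r.score >= RELEVANCE_THRESHOLD);
  if (strong.length === 0) {
    // Honest refusal beats a fluent guess.
    return "I don't have grounded data on that yet.";
  }
  return `Based on ${strong.length} source(s): ${strong.map(r => r.title).join(", ")}`;
}
```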

The Data Library Challenge

The chatbot is only as good as its data. This is the unsexy truth about RAG systems that doesn't make it into most demos.

Building and maintaining a comprehensive knowledge base is a significant ongoing effort. Every new skill needs a Notion entry with proficiency, context, and narrative. Every project needs structured task breakdowns with quantified impact. Every role needs detailed accomplishments, not just titles and dates.

The payoff is that this data serves multiple purposes: it feeds the chatbot, informs the referral engine, structures the resume, and provides content for the site. But the investment is real. A RAG system with thin data produces thin answers — and users can tell immediately.

I've learned that the quality of your data library is the primary differentiator between a chatbot that impresses and one that disappoints. The AI layer is almost commodity at this point. The data layer is the moat.

What's Next

This is a beta feature, and I'm treating it that way. Current priorities:

  • Expanding the knowledge corpus — More structured data, more interview clips, more detailed task breakdowns.
  • Conversation memory — The system currently keeps 12 messages per session. Longer context windows would enable more natural multi-turn conversations.
  • Feedback loops — Understanding which queries produce good results and which fall flat, so I can target data gaps.
  • Guardrails — More explicit boundaries around what the chatbot should and shouldn't attempt to answer.
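For reference, the current 12-message window amounts to a simple sliding buffer. The cap is real; the rest of this sketch is illustrative:

```typescript
// Sliding-window session memory: keep only the most recent 12 messages.
const MAX_MESSAGES = 12;

interface Message { role: "user" | "assistant"; text: string }

function appendToSession(history: Message[], msg: Message): Message[] {
  const next = [...history, msg];
  return next.length > MAX_MESSAGES ? next.slice(next.length - MAX_MESSAGES) : next;
}
```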

The Takeaway

Building a RAG-powered chatbot taught me that the interesting engineering isn't in the AI — it's in the data architecture, the retrieval pipeline, and the synthesis constraints that keep responses honest.

Anyone can wire up an LLM and get impressive-sounding answers. Building a system that gives accurate answers, grounded in real data, with graceful degradation when it doesn't know something — that's a different problem entirely. And it's a problem worth solving, because trust is the only thing that separates a useful tool from a liability.

If you want to try it, the chatbot is live on this site. Click the button in the bottom right corner and ask me anything. Just know that it's still learning — and so am I.
