March 2026
Building a RAG Pipeline Over 50K Entities
What actually matters when you're doing retrieval-augmented generation over a knowledge graph, not a pile of PDFs.
Most RAG tutorials start with “load your documents into a vector store.” That's fine if you have a folder of PDFs. We had a knowledge graph with 50,000+ entities, each with typed relationships, Schema.org properties, and links to external knowledge bases. The standard chunking playbook didn't apply.
This is how my team at Schema App built a RAG retrieval pipeline using LangChain, AWS Bedrock, and vector embeddings, and what we learned along the way.
Why knowledge graphs are different
A typical document has paragraphs. You chunk by token count or sentence boundary, embed each chunk, and call it a day. An entity in a knowledge graph, by contrast, is a structured object. It has a type (Organization, Product, Person), properties (name, description, URL), and relationships to other entities.
If you flatten all of that into a single text string and embed it, you lose the structure that makes the data valuable in the first place. If you embed each property separately, you get too many tiny chunks with no context. We needed something in between.
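To make the tradeoff concrete, here's a minimal sketch of what one entity might look like and what naive flattening does to it. The shape and values are illustrative, not our actual schema; only the property names follow Schema.org conventions.

```python
# A minimal, illustrative knowledge-graph entity in a JSON-LD-like shape.
# Property names follow Schema.org; the values are made up.
entity = {
    "@type": "Product",
    "name": "Acme Widget Pro",
    "description": "Industrial-grade widget for assembly lines.",
    "url": "https://example.com/widget-pro",
    "manufacturer": {"@type": "Organization", "name": "Acme Corp"},
    "isRelatedTo": [{"@type": "Product", "name": "Acme Widget Lite"}],
}

# Naive flattening: every value becomes undifferentiated text, so the
# embedding can no longer tell a type from a property from a neighbor.
flat = " ".join(str(v) for v in entity.values())
```

The other extreme, one chunk per property, gives you fragments like "Industrial-grade widget for assembly lines." with no entity name attached, which is just as useless at retrieval time.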
The chunking strategy we landed on
We ended up with what I'd call entity-centric chunks. For each entity, we generate a natural language summary that includes its type, key properties, and its most important relationships (parent organization, related products, etc.). That summary becomes the chunk. One entity, one chunk, with enough context to be useful in retrieval.
The tricky part was deciding which relationships to include. Include too many and the chunk gets noisy. Include too few and the retriever misses relevant connections. We started by including all direct relationships and then pruned based on retrieval performance. It took about three iterations to get this right.
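In sketch form, the summary step looks something like this. It assumes entities arrive as dicts with a type, flat properties, and relationships pre-sorted by importance; the function and field names are hypothetical, not our production code.

```python
def entity_to_chunk(entity, max_rels=5):
    """Render one entity as a single natural-language chunk.

    Keeps the type, key properties, and the top relationships so the
    embedding carries graph context. Relationships are assumed to be
    pre-sorted by importance; the tail is pruned because too many
    edges makes the chunk noisy for retrieval.
    """
    lines = [f"{entity['name']} is a {entity['type']}."]
    if entity.get("description"):
        lines.append(entity["description"])
    for predicate, target in entity.get("relationships", [])[:max_rels]:
        lines.append(f"Its {predicate} is {target}.")
    return " ".join(lines)

chunk = entity_to_chunk({
    "type": "Product",
    "name": "Widget Pro",
    "description": "An industrial-grade widget.",
    "relationships": [("parentOrganization", "Acme Corp"),
                      ("relatedProduct", "Widget Lite")],
})
```

The `max_rels` cutoff is where the iteration happened: it is the knob that trades chunk noise against missed connections.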
Reranking made the biggest difference
Initial retrieval accuracy was okay. Not great. The vector search would return entities that were semantically similar but not actually relevant to the question. A query about “product pricing” would return entities about “product features” because the embeddings are close.
Adding a reranking step changed everything. We used a cross-encoder reranker that takes the query and each candidate chunk and scores relevance more carefully than cosine similarity alone. Top-5 precision went from around 60% to over 85%. That single addition was worth more than any amount of chunking optimization.
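The rerank step itself is simple to sketch. Here the scorer is a deliberately crude term-overlap stand-in so the example runs offline; in a real pipeline you'd replace `score` with a cross-encoder that reads the (query, chunk) pair jointly (e.g. `CrossEncoder.predict` from the sentence-transformers library). Everything below is illustrative.

```python
def score(query, chunk):
    """Stand-in relevance scorer: term overlap between query and chunk.
    A real cross-encoder attends over both texts jointly and scores
    relevance far more carefully than this (or than cosine similarity)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query, candidates, top_k=5):
    """Re-order vector-search candidates by pairwise (query, chunk) score."""
    return sorted(candidates, key=lambda ch: score(query, ch), reverse=True)[:top_k]

hits = rerank("product pricing", [
    "Widget Pro product features and specs",
    "Widget Pro pricing tiers for the product",
])
```

The structure is the point: vector search casts a wide net cheaply, then a slower, more accurate scorer sorts the shortlist.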
Evaluating with Langfuse
You can't improve what you don't measure, and RAG evaluation is harder than it sounds. We used Langfuse to track every query, the retrieved chunks, the generated response, and human feedback scores. Over time this gave us a dataset we could use to compare chunking strategies, embedding models, and reranking thresholds.
The most useful metric turned out to be “answer groundedness” rather than relevance. Users didn't care much if the answer was slightly off-topic, but they cared a lot if the model made something up. So we optimized for reducing hallucination rather than maximizing recall. That changed which tradeoffs we made everywhere else in the pipeline.
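As an illustration of the idea (not our actual scorer, which relied on human feedback logged to Langfuse), a crude groundedness proxy checks how much of the answer's content actually appears in the retrieved chunks:

```python
def groundedness(answer, chunks):
    """Crude groundedness proxy: fraction of the answer's content words
    that appear somewhere in the retrieved chunks. Low scores flag
    likely hallucination. The length filter is a cheap stopword skip."""
    support = set(" ".join(chunks).lower().split())
    words = [w for w in answer.lower().split() if len(w) > 3]
    if not words:
        return 0.0
    return sum(w in support for w in words) / len(words)

grounded = groundedness("Widget Pro costs $50",
                        ["Widget Pro costs $50 per seat"])
hallucinated = groundedness("Widget Pro ships with a free drone",
                            ["Widget Pro costs $50 per seat"])
```

Even a proxy this blunt is enough to rank pipeline variants against each other, which is most of what you need during iteration.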
What I'd do differently next time
I'd start with evaluation infrastructure before writing a single line of pipeline code. We built Langfuse integration in week three, and for the first two weeks we were making chunking decisions based on vibes. That's two weeks of wasted iteration.
I'd also invest more in hybrid retrieval earlier. We added keyword search alongside vector search late in the process and it helped a lot for queries with specific entity names or IDs. Should have been there from the start.
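Fusing the two retrievers can be as simple as merging their ranked lists. A common recipe is reciprocal rank fusion; the sketch below is generic, with hypothetical entity IDs standing in for real results.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs. Each doc scores
    sum(1 / (k + rank)) across the lists it appears in. k=60 is the
    constant from the original RRF paper; it damps the dominance of
    any single list's top-ranked result."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["e42", "e7", "e19"]   # from embedding search
keyword_hits = ["e7", "e88"]         # from keyword / exact-ID match
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Note how "e7", ranked second by vectors but first by keywords, wins overall; that is exactly the behavior that rescues queries containing specific entity names or IDs.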
The pipeline is now serving customer-facing integrations and the hallucination rate is low enough that customers trust it. That took about three months of iteration. If I had to do it again with what I know now, I think we could get there in six weeks.
Have thoughts on this? I'd like to hear them: isser.akhil@gmail.com