RAGPipe
v0.1.0
Retrieval-Augmented Generation

How RAG Works

A technical walkthrough. Raw documents flow through extraction, chunking, embedding, storage, and retrieval — in 3 lines of code.

$ pip install ragpipe-ai
2+1 · Functions + Pipeline
0 · Config Needed
0.71s · For 7K Chunks
SEC.02 — Pipeline Flow

The Architecture

Five stages stacked vertically. Data flows top → bottom through extraction, chunking, embedding, storage, and retrieval.

01 · Extract

Read raw text from files, git repos, or web pages. 3 source types.

.py .md .html .txt
FileSource("./docs") · GitSource("https://...") · WebSource("https://...")
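The file side of extraction can be sketched as a directory walker. This is a minimal illustration, not RAGPipe's actual FileSource implementation; the `read_files` helper and its extension list are assumptions:

```python
from pathlib import Path

# Extensions the extraction stage accepts (per the badges above).
TEXT_EXTENSIONS = {".py", ".md", ".html", ".txt"}

def read_files(root: str) -> dict[str, str]:
    """Walk a directory tree and map each matching file path
    to its raw text content."""
    docs = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in TEXT_EXTENSIONS:
            docs[str(path)] = path.read_text(encoding="utf-8", errors="ignore")
    return docs
```

Git and web sources would plug in the same way: anything that yields (path, raw text) pairs can feed the chunker.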
02 · Chunk

Split into 512-char pieces with 64-char overlap. 3 chunking strategies.

RecursiveChunker(chunk_size=512, chunk_overlap=64)
1 document → 3 overlapping chunks
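The split can be sketched as a plain sliding character window. `chunk_text` is a hypothetical stand-in, not RAGPipe's RecursiveChunker (which, going by the name, likely splits on separators recursively rather than at fixed offsets):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into fixed-size windows, each overlapping the
    previous one by `chunk_overlap` characters so context at the
    boundaries is never lost."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, a ~1 KB document yields exactly the three overlapping chunks shown above.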
03 · Embed

Convert text → float vectors. Dimensions vary by backend: 768 (Ollama), 1536 (OpenAI), 384 (sentence-transformers).

AutoEmbed() # auto-detects Ollama → OpenAI → sentence-transformers
[0.12, -0.34, 0.87, 0.05, -0.62, ...] — each chunk becomes a vector
04 · Store

Persist vectors + text + metadata. 3 sink backends.

Qdrant Pinecone JSON
JSONSink("./index.json") · QdrantSink("collection") · PineconeSink("index")
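For the JSON backend, persisting can be as simple as dumping records that bundle vector, text, and metadata. A minimal sketch of what a JSONSink-style store might hold; the record layout and function names are assumptions, not RAGPipe's on-disk format:

```python
import json
from pathlib import Path

def save_index(path: str, records: list[dict]) -> None:
    """Write chunk records to disk. Each record carries the embedding
    vector, the original chunk text, and metadata such as the source file."""
    Path(path).write_text(json.dumps(records, indent=2))

def load_index(path: str) -> list[dict]:
    """Read the records back for querying."""
    return json.loads(Path(path).read_text())
```

A JSON file is fine for small corpora; Qdrant or Pinecone take over when the index outgrows a linear scan.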
05 · Query

Embed the query, compute cosine similarity against stored vectors, return top-K matches.

ragpipe.query("How does auth work?", top_k=5)
Top-5 results: 0.89 · 0.85 · 0.82 · 0.78 · 0.74
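Under the hood, the query step reduces to a nearest-neighbour scan by cosine similarity. A from-scratch sketch of that idea; `cosine` and `top_k` are illustrative helpers, not RAGPipe's API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """cos(A,B) = A · B / (|A| × |B|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], index: list[tuple[list[float], str]], k: int = 5):
    """index: list of (vector, chunk_text) pairs.
    Returns the k best matches, highest similarity first."""
    scored = [(cosine(query_vec, vec), text) for vec, text in index]
    return sorted(scored, reverse=True)[:k]
```

Real backends replace the linear scan with an ANN index, but the ranking criterion is the same.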
AutoEmbed Fallback Chain
1. Ollama (local, 768-dim) → 2. OpenAI (cloud, 1536-dim) → 3. sentence-transformers (local, 384-dim)

Zero configuration. Tries each backend in order and uses the first available. Ollama and sentence-transformers work locally with no API key.
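The chain is a simple try-in-order pattern. A conceptual sketch only; the probe callables stand in for whatever availability checks AutoEmbed actually performs:

```python
def first_available(backends):
    """backends: list of (name, probe, dim) tuples, in priority order.
    Return the first backend whose probe succeeds, treating any
    exception (e.g. a connection error) as 'not available'."""
    for name, probe, dim in backends:
        try:
            if probe():
                return name, dim
        except Exception:
            continue
    raise RuntimeError("no embedding backend available")
```

Because the last link is a local model, the chain can always terminate without an API key.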

SEC.03 — Vector Retrieval

Embedding Space

Each document chunk becomes a point in high-dimensional vector space. Chunks about similar topics cluster together geometrically. When you query, RAGPipe finds the nearest points by cosine similarity.

Algorithm: Cosine Similarity
cos(A,B) = A · B / (|A| × |B|)
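The formula is easy to verify by hand with small 2-d vectors (illustrative numbers only):

```python
import math

A = [3.0, 4.0]   # |A| = 5
B = [4.0, 3.0]   # |B| = 5
dot = sum(a * b for a, b in zip(A, B))            # 3·4 + 4·3 = 24
cos_ab = dot / (math.hypot(*A) * math.hypot(*B))  # 24 / (5 × 5) = 0.96
```

Identical directions score 1.0, orthogonal ones 0.0; these two nearly-aligned vectors land at 0.96.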
Query Time: 0.20s (keyword search, top-5)
Index Time: 0.71s (2,267 files, 7,388 chunks)
[Figure: t-SNE projection of the embedding space. Code (.py) chunks from src/, docs (.md) chunks from docs/, and config (*.yaml) chunks cluster separately; the query "auth flow?" lands nearest its top matches (SIM 0.89, 0.85, 0.82).]
SEC.04 — Implementation

The Comparison

Same result. Different complexity.

✗ LangChain
40 lines · 5+ pkgs

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

loader = WebBaseLoader(web_paths=(...))
docs = loader.load()

splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

vector_store = InMemoryVectorStore(OpenAIEmbeddings())
vector_store.add_documents(splits)

# ... then wire up retriever + LLM chain
# ... 20+ more lines to query

13× less code
✓ RAGPipe
3 lines · 1 pkg

import ragpipe

# Ingest anything
ragpipe.ingest("./docs")

# Query your data
results = ragpipe.query("What is the refund policy?")

# Or via CLI
$ ragpipe ingest ./docs
$ ragpipe query "refund policy?"

Feature             RAGPipe    LangChain    LlamaIndex
Basic RAG           3 lines    40 lines     5 lines
Packages            1          5+           2-3
CLI                 ✓          separate     separate
YAML pipelines      ✓          ✗            ✗
Git hooks           ✓          ✗            ✗
Zero-config embed   ✓          ✗            partial
REST API server     ✓          separate     ✗
Document loaders    3          160+         300+
SEC.05 — Execute

Ready to Pipeline?

One install. Zero config. Any data source.

> pip install ragpipe-ai