How RAG Works
A technical walkthrough. Raw documents flow through extraction, chunking, embedding, storage, and retrieval — in 3 lines of code.
pip install ragpipe-ai
The Architecture
The pipeline has five stages, and data passes through them in order: extraction, chunking, embedding, storage, and retrieval.
Extract
Read raw text from one of three source types: files, Git repos, or web pages.
Chunk
Split the text into 512-character chunks with a 64-character overlap. Three chunking strategies are available.
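As a rough illustration of the 512/64 split described above, here is a minimal sketch of fixed-size character chunking with overlap. The function name `chunk_text` and its parameters are hypothetical, and this is not RAGPipe's actual implementation, which may split on other boundaries.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # with 512/64, each window starts 448 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 1000-character sample produces two full chunks and one shorter tail.
sample = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [512, 512, 104]
```

The overlap means the last 64 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.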
Embed
Convert text into float vectors. Dimensions vary by backend: 768 (Ollama), 1536 (OpenAI), 384 (sentence-transformers).
Store
Persist vectors, text, and metadata. Three sink backends are supported.
Query
Embed the query, compute cosine similarity against stored vectors, return top-K matches.
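The query stage above can be sketched in plain Python. This is an illustrative example only: the helper names `cosine_similarity` and `top_k` are hypothetical, and the three-dimensional vectors are made-up toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Score every stored (text, vector) pair and return the k best as (score, text)."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors (assumed values, not produced by any real embedding backend).
stored = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Contact support to request a refund.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded query

results = top_k(query_vec, stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for small collections; larger stores typically use an approximate nearest-neighbor index instead.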
Embedding needs no configuration: RAGPipe tries each embedding backend in order and uses the first one available. Ollama and sentence-transformers run locally and need no API key.
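One way to picture the "first available" selection is a list of probes tried in order. The sketch below is an assumption-heavy illustration, not RAGPipe's code: the probe functions, the probe order, and the specific availability checks (Ollama's default local port 11434, an `OPENAI_API_KEY` environment variable, an importable `sentence_transformers` package) are all choices made for this example.

```python
import importlib.util
import os
import socket


def ollama_running(host="127.0.0.1", port=11434, timeout=0.2) -> bool:
    """Assumed check: something accepts TCP connections on Ollama's default port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def openai_key_present() -> bool:
    """Assumed check: an API key is set in the environment."""
    return bool(os.environ.get("OPENAI_API_KEY"))


def sentence_transformers_installed() -> bool:
    """Assumed check: the package can be found without importing it."""
    return importlib.util.find_spec("sentence_transformers") is not None


def first_available(probes):
    """Return the name of the first probe that succeeds, or None if none do."""
    for name, probe in probes:
        if probe():
            return name
    return None


# Hypothetical order; the order RAGPipe actually uses is not stated here.
PROBES = [
    ("ollama", ollama_running),
    ("openai", openai_key_present),
    ("sentence-transformers", sentence_transformers_installed),
]

backend = first_available(PROBES)
print(f"Selected embedding backend: {backend}")
```

Keeping the probes as plain callables makes the order easy to change and the selection logic easy to test with stubs.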
Embedding Space
Each document chunk becomes a point in high-dimensional vector space. Chunks about similar topics cluster together geometrically. When you query, RAGPipe finds the nearest points by cosine similarity.
The Comparison
Same result. Different complexity.
# LangChain
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

loader = WebBaseLoader(web_paths=(...))
docs = loader.load()
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)
vector_store = InMemoryVectorStore(OpenAIEmbeddings())
vector_store.add_documents(splits)
# ... then wire up retriever + LLM chain
# ... 20+ more lines to query
# RAGPipe
import ragpipe
# Ingest anything
ragpipe.ingest("./docs")
# Query your data
results = ragpipe.query("What is the refund policy?")
# Or via CLI
$ ragpipe ingest ./docs
$ ragpipe query "refund policy?"
| Feature | RAGPipe | LangChain | LlamaIndex |
|---|---|---|---|
| Basic RAG | 3 lines | 40 lines | 5 lines |
| Packages | 1 | 5+ | 2-3 |
| CLI | Yes | separate | separate |
| YAML pipelines | Yes | No | No |
| Git hooks | Yes | No | No |
| Zero-config embed | Yes | No | partial |
| REST API server | Yes | separate | No |
| Document loaders | 3 | 160+ | 300+ |
Ready to Pipeline?
One install. Zero config. Any data source.
pip install ragpipe-ai