RAG in 3 functions. Sources → Transforms → Sinks for vector databases.

pip install ragpipe-ai
Python 3.10+ · Qdrant · Pinecone · Ollama · OpenAI · CLI · YAML

What is RAG?

RAG (Retrieval-Augmented Generation) is how you give an AI access to your own data. Instead of guessing answers, the AI first searches your documents, finds the relevant parts, and then generates an answer based on what it found.

The problem? Getting your data into a searchable format is painful. You need to extract text, chunk it, embed it into numbers, store in a vector database, and search. RAGPipe does all of this in one pipeline.
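To make those steps concrete, here is a minimal, dependency-free sketch of chunking, embedding, storing, and searching. It is not RAGPipe's implementation: the function names are hypothetical, and the "embedding" is a toy bag-of-words count standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 80) -> list[str]:
    """Group whole sentences into chunks of roughly `size` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(store: list[tuple[str, Counter]], question: str, top_k: int = 1) -> list[str]:
    """Return the `top_k` stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


document = (
    "Refunds are issued within 30 days of purchase. "
    "Shipping takes five business days. "
    "Support is available by email on weekdays."
)
# "Store" step: keep each chunk next to its vector (a vector database does this at scale).
store = [(c, embed(c)) for c in chunk(document, size=60)]
best = search(store, "When are refunds issued?")
print(best[0])
```

The retrieved chunk is what would be handed to the language model as context for generating the answer.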

[Figure: RAGPipe architecture]

The 3 Functions

    import ragpipe

    # 1. Ingest anything: files, git repos, web pages
    ragpipe.ingest("./docs", sink="json", sink_path="./my_data.json")

    # 2. Query your data
    results = ragpipe.query("What is the refund policy?", sink_path="./my_data.json")
    print(results[0].content)

    # 3. Pipe: full control with the Pipeline API
    pipeline = ragpipe.Pipeline()
    pipeline.add_source(ragpipe.GitSource("https://github.com/owner/repo"))
    pipeline.add_transform(ragpipe.AutoEmbed())
    pipeline.add_sink(ragpipe.QdrantSink("my-repo"))
    pipeline.run()

CLI

    # Create a starter pipeline config
    ragpipe init

    # Ingest a directory
    ragpipe ingest ./docs

    # Ingest a GitHub repo
    ragpipe ingest https://github.com/owner/repo --embed

    # Query your data
    ragpipe query "How does auth work?"

    # Run a YAML pipeline
    ragpipe run pipeline.yaml

Why RAGPipe?

3 Functions

ingest(), query(), pipe(): that is the entire API. Think of Chroma's "4 functions", but for RAG pipelines.

Zero Config

Auto-detects files and auto-embeds with whichever backend you have installed, trying Ollama first, then OpenAI, then sentence-transformers.

YAML Pipelines

Declarative configs like docker-compose for RAG. Version control your ingestion pipelines.

Beautiful CLI

Rich progress bars, result tables, and status spinners. Built with Typer + Rich.

Any Source

Local files, git repositories, web pages with crawling — one interface for everything.

Any Vector DB

Qdrant, Pinecone, or just a JSON file for local development. Swap with one argument.

YAML Pipelines

Define your entire RAG pipeline in a YAML file:

    source:
      type: git
      repo_url: https://github.com/owner/repo
      file_patterns:
        - "src/**/*.py"
    transforms:
      - type: html_cleaner
      - type: recursive_chunker
        chunk_size: 512
      - type: auto_embed
    sinks:
      - type: qdrant
        collection_name: my-repo
        url: http://localhost:6333
        vector_size: 384

Then run it:

    ragpipe run pipeline.yaml
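One way to picture how a declarative config like this drives a pipeline is a registry that maps each `type` to a function and applies the steps in order. The sketch below is an assumption about the general pattern, not RAGPipe's internals: `REGISTRY`, `run_transforms`, and both transform functions are hypothetical, and the config is written as a Python dict so the example needs no YAML parser.

```python
import re


def html_cleaner(docs: list[str], options: dict) -> list[str]:
    """Toy HTML cleaner: drop anything that looks like a tag."""
    return [re.sub(r"<[^>]+>", "", d) for d in docs]


def simple_chunker(docs: list[str], options: dict) -> list[str]:
    """Stand-in for a real recursive chunker: plain fixed-size slicing."""
    size = options.get("chunk_size", 512)
    return [d[i:i + size] for d in docs for i in range(0, len(d), size)]


# Hypothetical registry: config "type" -> transform function.
REGISTRY = {
    "html_cleaner": html_cleaner,
    "recursive_chunker": simple_chunker,
}


def run_transforms(docs: list[str], config: dict) -> list[str]:
    """Apply each configured transform in order, passing its remaining keys as options."""
    for step in config["transforms"]:
        options = {k: v for k, v in step.items() if k != "type"}
        docs = REGISTRY[step["type"]](docs, options)
    return docs


config = {
    "transforms": [
        {"type": "html_cleaner"},
        {"type": "recursive_chunker", "chunk_size": 10},
    ]
}
chunks = run_transforms(["<p>Hello, RAG pipelines!</p>"], config)
print(chunks)
```

Because the steps are plain data, reordering or swapping a transform is a config change rather than a code change, which is what makes such files easy to review and version.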

Sources

| Source | Description |
| --- | --- |
| FileSource | Local files and directories; auto-detects text files |
| GitSource | Clone git repos (GitHub, GitLab, any git server) |
| WebSource | Scrape web pages with optional depth crawling |

Transforms

| Transform | Description |
| --- | --- |
| RecursiveChunker | Split text using hierarchical separators (paragraphs → sentences → words) |
| FixedSizeChunker | Split by fixed character count with overlap |
| SemanticChunker | Split by semantic similarity of sentences |
| HTMLCleaner | Strip HTML to clean text, remove scripts/styles |
| PIIRemover | Redact emails, phones, SSN, credit cards, IPs |
| AutoEmbed | Zero-config embeddings; auto-detects Ollama, OpenAI, sentence-transformers |
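The idea behind recursive chunking (paragraphs, then sentences, then words) can be sketched in a few lines. This is an illustrative version under stated assumptions, not the RecursiveChunker source: the function name, the separator list, and the `max_chars` parameter are all hypothetical.

```python
def recursive_chunk(text: str, max_chars: int, separators=("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first; recurse with finer ones for oversized pieces.

    Separators that fall on a chunk boundary are dropped from the output.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard slicing.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    head, *rest = separators
    chunks: list[str] = []
    current = ""
    for piece in text.split(head):
        candidate = f"{current}{head}{piece}" if current else piece
        if len(candidate) <= max_chars:
            # Keep merging small neighbours so chunks stay close to the limit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too big: retry with the next, finer separator.
            chunks.extend(recursive_chunk(piece, max_chars, tuple(rest)))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph is short.\n\nSecond paragraph is longer. It has two sentences."
pieces = recursive_chunk(sample, max_chars=30)
for p in pieces:
    print(repr(p))
```

Splitting on the largest natural boundary that still fits tends to keep related sentences together, which usually helps retrieval more than cutting at arbitrary character offsets.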

Sinks

| Sink | Description |
| --- | --- |
| JSONSink | Write to a local JSON file; great for prototyping |
| QdrantSink | Write to Qdrant vector database (local or cloud) |
| PineconeSink | Write to Pinecone vector database |

Embedding Backends

AutoEmbed tries each backend in order and uses the first available:

| Priority | Backend | Setup |
| --- | --- | --- |
| 1 | Ollama (local, free) | ollama pull nomic-embed-text |
| 2 | OpenAI | Set OPENAI_API_KEY |
| 3 | sentence-transformers (local, free) | pip install 'ragpipe-ai[local]' |
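A first-available fallback like this can be sketched as an ordered list of checks. How AutoEmbed actually probes each backend is not documented here, so the checks below are assumptions: an `ollama` binary on the PATH, a non-empty `OPENAI_API_KEY`, and an importable `sentence_transformers` package. The function name `detect_backend` is hypothetical.

```python
import importlib.util
import os
import shutil
from typing import Optional


def detect_backend(env=None, which=shutil.which, find_spec=importlib.util.find_spec) -> Optional[str]:
    """Return the name of the first available backend, or None if none is found.

    `env`, `which`, and `find_spec` are injectable so the logic can be tested
    without touching the real machine.
    """
    env = os.environ if env is None else env
    checks = [
        # Assumption: Ollama counts as available if its binary is on the PATH.
        ("ollama", lambda: which("ollama") is not None),
        ("openai", lambda: bool(env.get("OPENAI_API_KEY"))),
        ("sentence-transformers", lambda: find_spec("sentence_transformers") is not None),
    ]
    for name, available in checks:
        if available():
            return name
    return None


print(detect_backend())
```

Keeping the checks in a list makes the priority order explicit and easy to change.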

System Integrations

Smart Index

ragpipe index . — auto-detects language, ignores node_modules/.git, chunks and stores.

File Watcher

ragpipe watch . — auto-reindexes on every file change. Uses watchdog.

REST API

ragpipe serve — local API with /search, /health, /chunks, /reindex on port 7642.
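A client for the local API might look like the sketch below. Only the port and the endpoint paths come from the description above; the `q` query parameter and the JSON response format are assumptions, so check the server's actual interface before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7642"  # port from the description above


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a /search URL. Assumption: the query text goes in a `q` parameter."""
    return f"{base_url}/search?{urllib.parse.urlencode({'q': query})}"


def get_json(url: str, timeout: float = 5.0):
    """Fetch a URL and decode the body as JSON (assumes the server replies with JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(search_url("How does auth work?"))
    # With `ragpipe serve` running, these calls would hit the live server:
    # print(get_json(f"{BASE_URL}/health"))
    # print(get_json(search_url("How does auth work?")))
```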

Git Hooks

ragpipe git hook . — auto-index after every commit. Invisible, zero-friction.
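For readers curious what a post-commit hook amounts to, the sketch below writes one by hand. It assumes the hook simply runs `ragpipe index .` in the background; the script that `ragpipe git hook` actually installs may differ, and `install_post_commit_hook` is a hypothetical helper.

```python
import stat
import tempfile
from pathlib import Path

# Assumed hook body: re-index the repository in the background after each commit.
HOOK_BODY = "#!/bin/sh\nragpipe index . >/dev/null 2>&1 &\n"


def install_post_commit_hook(repo: Path, body: str = HOOK_BODY) -> Path:
    """Write an executable .git/hooks/post-commit script inside `repo`."""
    hook = repo / ".git" / "hooks" / "post-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(body)
    # Git only runs hooks that are marked executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


# Demonstrate against a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    hook_path = install_post_commit_hook(Path(tmp))
    hook_text = hook_path.read_text()
    print(hook_text)
```

Running the indexer in the background keeps `git commit` from blocking while the index is rebuilt.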

VSCode

ragpipe vscode tasks . — generates tasks.json with Index/Query/Serve/Watch tasks.

fzf

ragpipe search --fzf — interactive fuzzy search through your indexed data.

macOS Spotlight

ragpipe macos spotlight "query" — search via native mdfind.

Linux systemd

ragpipe linux service --install — run as a systemd service. Scheduled indexing with timers.