Trace

Completed

RAG Powered FinTech Platform

Hybrid RAG · FastAPI · ChromaDB · PaddleOCR
Role: AI & Software Engineer
Duration: 4 months

Overview

Trace is a next-generation financial assistant and production-grade Retrieval-Augmented Generation (RAG) pipeline. It solves the unstructured data problem in personal finance by converting raw receipt images into type-safe JSON. The platform enables natural language queries over financial data using a Hybrid Engine that fuses dense vector retrieval with sparse keyword matching, seamlessly streamed to a modern React 19 frontend.

Architecture

The ingestion pipeline extracts text via PaddleOCR, formats it into type-safe JSON using Instructor, and applies item-level chunking. For retrieval, async FastAPI endpoints query both ChromaDB and BM25 indices, merging the results before an ONNX-optimized Cross-Encoder reranks candidates for the final streamed response.
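The merge step before reranking can be sketched with Reciprocal Rank Fusion, a common way to combine dense and sparse result lists (an assumption here; the page does not name the fusion method used):

```python
def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    """Merge two ranked lists of document IDs via Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in; the
    constant k (60 is the conventional default) damps top ranks so
    neither retriever dominates the fused ordering.
    """
    scores = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical receipt IDs from each retriever:
fused = rrf_fuse(["r12", "r07", "r33"], ["r07", "r41"])
# "r07" ranks first because it appears in both lists.
```

Documents surfaced by both retrievers float to the top, which is exactly the behavior a hybrid engine wants before handing candidates to the reranker.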

Key Features

Hybrid Retrieval & Reranking

Achieved 91%+ Context Precision (benchmarked via RAGAS) by combining dense (semantic) retrieval with sparse (BM25) retrieval behind a Cross-Encoder reranking layer, and by pairing Parent Document Retrieval with item-level chunking to optimize context-window relevance.
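The reranking layer reduces to a scored sort over fused candidates. A minimal sketch, where `score_fn` stands in for the ONNX cross-encoder (the toy token-overlap scorer below is illustrative only):

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Score each (query, candidate) text pair and keep the best top_k.

    score_fn plays the role of the cross-encoder: it reads the raw
    query/passage pair jointly and returns a relevance score, which is
    what lets it beat similarity between independent embeddings.
    """
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def overlap(q, p):
    # Toy stand-in scorer: shared lowercase tokens between query and passage.
    return len(set(q.lower().split()) & set(p.lower().split()))


best = rerank("coffee total", ["Milk 2.50", "Coffee total 4.80", "Bus fare"],
              overlap, top_k=1)
# → ["Coffee total 4.80"]
```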

OCR & Structuring

Utilizes PaddleOCR combined with Instructor (Pydantic) to extract and validate type-safe JSON from receipts, featuring real-time OCR coordinate visualization on the frontend.
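A minimal sketch of such a type-safe schema, assuming Pydantic v2 (field names are illustrative, not the project's actual schema). Instructor passes a model like this as `response_model` so the LLM's output is parsed and validated against it:

```python
from pydantic import BaseModel, Field


class LineItem(BaseModel):
    name: str
    quantity: int = 1
    price: float = Field(ge=0)


class Receipt(BaseModel):
    merchant: str
    total: float = Field(ge=0)
    items: list[LineItem]


# Raw JSON as an LLM might emit it from OCR'd receipt text:
raw = ('{"merchant": "Corner Cafe", "total": 7.30, "items": '
       '[{"name": "Latte", "price": 4.80}, {"name": "Muffin", "price": 2.50}]}')
receipt = Receipt.model_validate_json(raw)  # raises ValidationError on bad data
```

The payoff is that malformed model output fails loudly at the boundary instead of leaking untyped dictionaries into the rest of the pipeline.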

Quantized SLM Inference

Runs quantized Phi-3.5 (4-bit) and Cross-Encoder models via ONNX Runtime to eliminate heavy PyTorch dependencies, ensuring fast inference.

Containerized & Privacy-First

Deployed a fully containerized stack using Docker Compose with persistent volumes for vector stores and model weights, ensuring 100% data privacy and offline capability.
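An illustrative Docker Compose fragment for this kind of setup (service names and mount paths are hypothetical): named volumes keep the Chroma index and model weights across restarts, and since no external API is called, the stack runs fully offline.

```yaml
services:
  api:
    build: ./backend
    ports:
      - "8000:8000"
    volumes:
      - model_weights:/models        # quantized ONNX weights, loaded locally
    depends_on:
      - chromadb
  chromadb:
    image: chromadb/chroma
    volumes:
      - chroma_data:/chroma/chroma   # persistent vector store

volumes:
  model_weights:
  chroma_data:
```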

Tech Stack

Backend

FastAPI
High-performance async API utilizing Server-Sent Events (SSE) for streaming
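The SSE wire format itself is simple enough to sketch without FastAPI: each event is a `data:` line terminated by a blank line, and a generator in this shape is what a `StreamingResponse` would consume (the `[DONE]` sentinel is a common convention, assumed here rather than taken from the project):

```python
def sse_events(token_stream):
    """Wrap model tokens in the Server-Sent Events wire format.

    Each event is a 'data: ...' line followed by a blank line; a final
    '[DONE]' sentinel tells the client the stream is complete.
    """
    for token in token_stream:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


chunks = list(sse_events(["Total", " was", " $4.80"]))
# chunks[0] == "data: Total\n\n"
```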

Frontend

React 19 & TypeScript
Modern UI with TanStack Query for optimistic updates and caching

Database

ChromaDB & BM25
Vector database and sparse retrieval engine for hybrid search
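The sparse side of the hybrid search can be illustrated with a toy Okapi BM25 scorer over pre-tokenized documents (a from-scratch sketch, not the library the project uses). Exact keyword matches such as merchant names or totals score highly even when embeddings would miss them:

```python
import math


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized docs against query terms with Okapi BM25.

    docs is a list of token lists. Rare terms get high IDF weight, and
    term frequency saturates via k1 while b normalizes for doc length.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = [0.0] * n
    for term in query_terms:
        df = sum(term in d for d in docs)
        if df == 0:
            continue  # term appears nowhere; contributes nothing
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            norm = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / norm
    return scores
```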

AI/ML

ONNX Runtime
Accelerated inference for embedding and reranking models
PaddleOCR
Lightweight, state-of-the-art text detection and extraction
Phi-3.5 (Quantized)
Efficient LLM for structuring OCR text into type-safe data

DevOps

Docker
Container orchestration ensuring offline capability and data privacy

Challenges & Solutions

Challenge

Semantic search alone failed on exact keyword matches (e.g., retrieving specific receipt totals or merchant names).

Solution

Architected a Hybrid RAG pipeline combining Dense Vector Search (ChromaDB) and Sparse Keyword Search (BM25).

Challenge

Token usage and context windows were bloated by passing entire receipt documents into the LLM.

Solution

Implemented semantic chunking by deconstructing receipts into 'Full Document' vectors for high-level summaries and 'Item Granularity' vectors for line-item accuracy.
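The dual-granularity split can be sketched as follows (field names are illustrative, not the project's schema): one parent document serves summary-style queries, while per-item chunks keep exact amounts retrievable without dragging the whole receipt into the context window.

```python
def chunk_receipt(receipt: dict):
    """Split a structured receipt into two vector granularities.

    Returns one 'document'-level chunk for the full receipt plus one
    'item'-level chunk per line item; each item carries a parent_id so
    retrieval can climb back to the full receipt when needed.
    """
    parent_id = receipt["id"]
    parent = {
        "id": parent_id,
        "text": f"{receipt['merchant']} receipt, total {receipt['total']}",
        "level": "document",
    }
    items = [
        {
            "id": f"{parent_id}:item:{i}",
            "text": f"{item['name']} {item['price']}",
            "level": "item",
            "parent_id": parent_id,
        }
        for i, item in enumerate(receipt["items"])
    ]
    return [parent] + items


chunks = chunk_receipt({
    "id": "r1", "merchant": "Corner Cafe", "total": 7.30,
    "items": [{"name": "Latte", "price": 4.80}, {"name": "Muffin", "price": 2.50}],
})
```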

Challenge

Full-size LLMs and heavy framework dependencies were too expensive and slow for per-receipt processing at scale.

Solution

Engineered a cost-effective ingestion pipeline using a Quantized SLM (Phi-3.5 4-bit), reducing ingestion costs by 90%.

Results

91%+ Context Precision (benchmarked via RAGAS)
90% reduction in data ingestion costs by utilizing local quantized models
100% data privacy and offline capability through Docker deployment
Eliminated PyTorch dependencies in production via ONNX Runtime