Tag: howto

All the articles with the tag "howto".

Hybrid Search: BM25 and Embeddings Are Better Together

14 May, 2024

Pure vector search quietly fails on the exact terms, codes, and acronyms users actually type. Combining BM25 with dense retrieval, fusing the two, and paying the latency bill that comes with it.
vLLM, Quantization, and Serving LLMs on a Budget

16 Apr, 2024

Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.
Your RAG Is Bad Because Your Chunking Is Bad

16 Jan, 2024

A year into production RAG, the retrieval problems teams keep blaming on the model are almost always chunking, metadata, and document structure. Concrete fixes, with the splitting code I actually run.
A Forecasting Ensemble That Actually Ships

19 Sep, 2023

A demand-forecasting ensemble (a classical statistical model, a sequence model, and gradient boosting) that took accuracy far enough to cut inventory hard, plus the boring data problems that mattered more than the model.
