Tag: howto
All the articles with the tag "howto".
-
Hybrid Search: BM25 and Embeddings Are Better Together
Pure vector search quietly fails on the exact terms, codes, and acronyms users actually type. Combining BM25 with dense retrieval, fusing the two, and paying the latency bill that comes with it.
-
vLLM, Quantization, and Serving LLMs on a Budget
Self-hosting an open model when GPUs are scarce and finance is reading the bill. Continuous batching, KV-cache, what quantization actually costs you, and when to just call a hosted API instead.
-
Your RAG Is Bad Because Your Chunking Is Bad
A year into production RAG, the retrieval problems teams keep blaming on the model are almost always chunking, metadata, and document structure. Concrete fixes, with the splitting code I actually run.
-
A Forecasting Ensemble That Actually Ships
A demand-forecasting ensemble (a classical statistical model, a sequence model, and gradient boosting) that took accuracy far enough to cut inventory hard, plus the boring data problems that mattered more than the model.