Follow along with the latest installment of Angela Shi's enterprise RAG series to learn why stacking a reranker on top of weak retrieval doesn’t save it, and what cross-encoders actually fix.
Towards Data Science
Internet Publishing
San Francisco, California 646,420 followers
Publish insights on the world-leading AI, ML & data-science platform and reach data professionals worldwide.
About us
Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward. Contributors receive editorial guidance, best-in-class publishing tools and prominent placement on our site, newsletter and social feeds. Accepted articles are eligible for the TDS Author Payment Program, which compensates writers based on reader engagement. If you have an idea worth sharing, submit your draft, join the conversation and connect with a global audience of data professionals. Insight Partners is an investor in Towards Data Science.
- Website
- http://towardsdatascience.com
- Industry
- Internet Publishing
- Company size
- 11-50 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Specialties
- Data Science, Machine Learning, Artificial Intelligence, Data Visualization, Data, Data Engineering, AI Agents, Software Development, DevOps, Programming, Technology, and Digital Publishing
Locations
- Primary
548 Market St
San Francisco, California 94104, US
Updates
-
"But what if we could exploit this structural predictability? What if we could predict the value of a section before we ever send it to the LLM, drastically cutting ingestion costs by strategically ignoring the noise?" Partha Sarkar introduces a novel, effective approach to named-entity resolution in RAG systems.
-
"This group doesn’t just think with AI—they actively think about how they’re thinking while using AI. And this skill may quietly become the defining human advantage in the AI era. That skill is: metacognitive regulation." Rashi Desai proposes a counterintuitive area for AI practitioners to focus on.
-
We're thrilled to share a new article by Minh Chien Vu: a thorough and accessible introduction to Qdrant TurboQuant, a recently released quantization method.
-
What do we lose when we outsource research — and other similar, cognitively demanding tasks — to AI agents? Jacopo Tagliabue offers a nuanced and frank reflection on an emerging conundrum.
-
Towards Data Science reposted this
If you ever wondered what "Lineage" means in DAX and how to manipulate it, read my latest piece on Towards Data Science, where I dive into this topic to show you how you can improve your DAX code by using the lineage.
Why does lineage matter in DAX? How can we use it in our day-to-day Power BI projects? Follow along with Salvatore Cagliari's new tutorial to find out.
-
Towards Data Science reposted this
I spent weeks benchmarking Qdrant's new TurboQuant and my honest take is: it's not a silver bullet, but it's the most thoughtful quantization I've seen in production vector search.

Most engineers treat quantization as a simple tradeoff: compress more, lose recall. TurboQuant asks a different question: what if the compression itself was smarter?

The idea behind it (from a Google Research paper presented at ICLR 2026): rotate the vector before compressing it. That rotation spreads energy evenly across all dimensions, so no single dimension carries too much signal or too much noise. Then apply one codebook to everything equally.

Scalar quantization applies the same fixed grid to every dimension regardless of variance. Binary quantization throws away almost everything except the sign. TurboQuant changes the shape of the problem first, then spends bits on a better-prepared vector.

Here's what I actually measured across 10K / 50K / 100K vectors on the DBpedia OpenAI embeddings dataset (1536-dim, high variance ratio of 233x):
→ TQ 4-bit reached 0.965 recall@10 at 100K vectors, only 1.5 points below Scalar Quantization, at 8× compression vs Scalar's 4×
→ Binary Quantization fell from 0.916 to 0.78 recall as the dataset doubled. TQ variants held much more stable
→ Adding rescoring, TQ 4-bit hit 0.996, effectively matching Float32 recall at half the memory
→ Latency with TQ 4-bit + rescore: 6.4ms vs Float32's 7.6ms

The thing that surprised me most? TurboQuant's recall doesn't degrade as fast as the corpus grows. That's the rotation step doing its job.

My practical conclusion after all this:
→ TQ 4-bit is the most balanced starting point. Better compression than Scalar, similar recall.
→ TQ 1.5-bit + rescoring is the move when you're storage-constrained but can't sacrifice retrieval quality.
→ TQ 1-bit: skip it unless you've tested it hard on your own embeddings.
→ Still prefer Binary Quantization if throughput is the only goal. TurboQuant costs more per query.
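The rotate-then-quantize idea described in the post can be sketched in a few lines of NumPy. This is a toy illustration, not Qdrant's or the paper's implementation. The random orthogonal matrix from a QR decomposition, the single uniform grid standing in for the codebook, and every function name here are assumptions made for the example. Codes are stored one per byte rather than bit-packed, and the compression figure ignores any per-vector metadata.

```python
import numpy as np


def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


def variance_ratio(x: np.ndarray) -> float:
    """Largest per-dimension variance divided by the smallest."""
    v = x.var(axis=0)
    return float(v.max() / v.min())


def quantize_uniform(x: np.ndarray, bits: int):
    """Map every value onto one shared grid of 2**bits evenly spaced levels."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    codes = np.round((x - lo) / (hi - lo) * levels).astype(np.uint8)
    return codes, lo, hi


def dequantize_uniform(codes: np.ndarray, lo: float, hi: float, bits: int) -> np.ndarray:
    """Approximate reconstruction of the values from their grid codes."""
    levels = 2 ** bits - 1
    return codes.astype(np.float32) / levels * (hi - lo) + lo


def compression_ratio(bits: float) -> float:
    """Compression relative to float32 storage (32 bits per dimension)."""
    return 32.0 / bits


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim = 64
    # Toy data whose per-dimension variance spans roughly 1x to 233x (hypothetical).
    x = rng.standard_normal((2000, dim)) * np.sqrt(np.geomspace(1.0, 233.0, dim))

    rot = random_rotation(dim)
    x_rot = x @ rot  # rotate first, then quantize

    print("variance ratio before rotation:", variance_ratio(x))
    print("variance ratio after rotation: ", variance_ratio(x_rot))

    for name, data in (("raw", x), ("rotated", x_rot)):
        codes, lo, hi = quantize_uniform(data, bits=4)
        err = np.mean((dequantize_uniform(codes, lo, hi, 4) - data) ** 2)
        print(f"4-bit mean squared error ({name}):", err)

    print("4-bit compression vs float32:", compression_ratio(4))
```

Running the script prints the variance ratio and the 4-bit reconstruction error with and without the rotation, which is one way to see why a single shared grid suits a rotated vector better. The ratio of 32 bits to 4 bits is where the 8× figure in the post comes from, just as 8-bit scalar quantization gives 4×.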
One important caveat worth calling out: TurboQuant launched May 11, 2026. Real production experience is still limited. The geometry preservation works great for L2/cosine/dot product. For Manhattan distance, it needs full vector reconstruction, so stick with Scalar Quantization there.

I wrote up the full technical breakdown of the pipeline, the benchmarks, and a decision flowchart for when to use each method, shared in Towards Data Science. Link in comments.

If you're running Qdrant in production and have tried TurboQuant on real data, I'm genuinely curious whether your recall numbers held up at larger scales.

Link to the full article: https://lnkd.in/g6QUXJtZ

#VectorDatabase #Qdrant #MachineLearning #VectorSearch #NLP
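The distance-metric caveat in the post above has a simple geometric reading, sketched here under the assumption that the rotation is a plain orthogonal matrix. Such a rotation leaves L2 distance, dot product, and cosine similarity unchanged, but not Manhattan (L1) distance. The function name and the toy dimensions are hypothetical, and nothing here reflects how Qdrant handles the metric internally.

```python
import numpy as np


def compare_metrics(dim: int = 8, seed: int = 0) -> dict:
    """Return (before, after) pairs for several metrics under a random rotation."""
    rng = np.random.default_rng(seed)
    # Orthogonal matrix from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    a = rng.standard_normal(dim)
    b = rng.standard_normal(dim)
    ra, rb = q @ a, q @ b  # the same two vectors after rotation

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    return {
        "l2": (float(np.linalg.norm(a - b)), float(np.linalg.norm(ra - rb))),
        "dot": (float(a @ b), float(ra @ rb)),
        "cosine": (cosine(a, b), cosine(ra, rb)),
        "manhattan": (float(np.abs(a - b).sum()), float(np.abs(ra - rb).sum())),
    }


if __name__ == "__main__":
    for name, (before, after) in compare_metrics().items():
        print(f"{name:>9}: before={before:.4f} after={after:.4f}")
```

The first three metrics can be computed directly in the rotated space. The Manhattan value changes under rotation, which is consistent with the post's advice to use Scalar Quantization for that metric.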