The dense fog of RAG: navigating dense retrieval's blind spots
Dense retrieval — an essential component of Retrieval-Augmented Generation (RAG) pipelines — faces distinct lexical and contextual challenges that can significantly affect how well RAG systems perform. RAG aims to connect user queries to knowledge bases in a way that closely reflects user intent, rather than merely returning broadly related documents. While dense retrieval often improves contextual alignment, it can still miss nuances or fail in ambiguous scenarios.
The key takeaways of this post are as follows:
- Embeddings have inherent limits. They can’t easily differentiate fine-grained meanings or handle domain-specific jargon without additional strategies.
- Load-time vs. query-time fixes. We’ll explore how to mitigate issues at indexing/load time (via chunking, domain tagging, or word-sense disambiguation) and at query time (via re-ranking, expansions, or clarifying prompts).
- Where to begin? Practitioners often need quick wins — like text preprocessing and domain-aware embeddings — before moving on to more advanced multi-step retrieval or specialised fine-tuning.
This post delves into challenges of dense retrieval such as polysemy, domain nuance, noise sensitivity, query ambiguity, and training pitfalls. The focus is on the nuances and pitfalls practitioners need to keep in mind before deploying dense retrieval at scale.
This post is written for technical professionals who have built at least a basic RAG chatbot. It is both a blessing and a curse that RAG chatbots seem to work almost out of the box; in reality, achieving and sustaining good long-tail performance requires more heavy lifting, a realization that mostly comes with experience.
For a more introductory treatment of RAG, please see Reliable RAG: preprocessing is all you need.
The Core Problem: dense retrieval and its obstacles
In many RAG designs, dense retrieval acts as the first filter: before generative models can produce meaningful answers, it must locate truly relevant chunks of data. While other approaches (like re-ranking or hybrid BM25+Dense) can refine these results, this initial retrieval step is crucial for providing strong context. However, evidence from both research (Chen et al., 2023) and real-world deployments indicates that metrics like recall and precision often drop when:
- Queries include polysemous terms (words with multiple meanings)
- Domain-specific details are present
- Linguistic “noise” appears, such as unexpected phrasing
In a multi-turn or ‘agentic’ RAG environment, where the system may autonomously retrieve information across multiple steps, every detail matters. Minor retrieval errors can propagate and compound over each generation cycle and derail the entire process.
The challenge is further complicated by several factors:
- The black-box nature of embeddings
- Errors tend to stay hidden or become guesswork to diagnose
- Contextual nuances proliferate through:
  - Timeline references
  - Multi-turn queries
  - Partial or elliptical user inputs
  - Within-dataset domain shifts
All these elements test the limits of dense retrieval systems. In the sections ahead, we’ll examine how these challenges play out.
Many dense retrieval systems rely on dual-encoder architectures that encode queries and documents into a shared vector space. By ‘alignment,’ we mean that semantically similar content should form nearby clusters in this space, while unrelated content stays further apart. When the encoders drift, whether from parameter mismatch or data imbalance, the system can fail to match semantically related content.
This issue often manifests as a ‘many‑to‑one’ encoding problem, where a single vector must encapsulate multiple thematic or factual facets of a document. Consequently, even if a query touches on a minor yet crucial detail, the dense retriever might fail to match it appropriately due to the averaging of multiple signals. The fundamental disparity in length and detail between short queries and comprehensive documents means that even advanced encoding techniques struggle to completely resolve the alignment gap.
Such misalignment undercuts retrieval precision and can cause downstream issues. For example, a generative model may hallucinate or return incomplete details if the retrieved passages only partially match the query’s real intent.
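To make the shared-vector-space idea concrete, here is a minimal dual-encoder retrieval sketch using the sentence-transformers library; the checkpoint name and example texts are illustrative assumptions, and a production system would add an approximate-nearest-neighbour index rather than brute-force scoring.

```python
# Minimal dual-encoder retrieval sketch; the model name is an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any bi-encoder checkpoint works here

documents = [
    "Riverbank erosion accelerates after heavy rainfall.",
    "The central bank raised interest rates to curb inflation.",
]
query = "bank erosion effects"

# Queries and documents are embedded into the same vector space...
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# ...and ranked by cosine similarity; nothing below 'knows' which sense of 'bank' we mean.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```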
Semantic ambiguity and polysemy
These alignment and representation challenges are most visible in situations involving polysemy, ambiguous phrases, or words that vary in meaning depending on context.
1 Navigating contextual ambiguity
In typical dual-encoder setups, queries are encoded as standalone text without incorporating prior conversation or user context, causing the system to lose any additional disambiguating signals that might clarify user intent. This isolation can be problematic when user queries are vague, for example, “python installation issues”, or when they rely on implicit contextual information.
Consider the query “What did he do in the final scene?” Without additional context—like who ‘he’ is or what work we’re discussing—the model often defaults to the interpretation it saw most frequently during training, which may not match the user’s unique context. Similarly, a query like ‘Jaguar release date’ might be referencing a Jaguar automobile launch or Apple’s ‘macOS Jaguar’ (released in 2002)—two very different domains.
Because a pure dense retrieval layer doesn’t typically ask follow-up questions, many systems rely on external processes like query expansion or user prompts to refine ambiguous requests.
Polysemy, where a single word has multiple distinct senses, remains a stumbling block. While older static embeddings (Word2Vec, GloVe) collapse all senses into one vector, even modern contextual models struggle when context is too vague or minimal, making it difficult to differentiate between “bank” in a financial sense and “bank” in a geographical sense. The mismatch can cause a user searching for “bank erosion effects” to retrieve policy documents instead of content focused on riverbank erosion.
This challenge deepens with acronyms like ‘IRA’ (Individual Retirement Account, Irish Republican Army, or Inflation Reduction Act). Even BERT-based embeddings may fail to pick the right sense if the surrounding context doesn’t clarify which one applies. Technical fields face similar issues: “ML” might refer to machine learning, maximum likelihood, or milliliters, depending on the domain.
Context itself can shift mid-passage. A sustainability report might discuss riverbank conservation before pivoting to banking institutions’ environmental policies. Dense embeddings tend to blur these distinct meanings into a single vector, losing the nuanced relationship between them. Even within a single sentence, multiple meanings can coexist: “The bank (which started as a community effort near the riverbank) expanded its services.” Such nested meanings often confuse embedding models.
Time-based shifts also matter.
Words evolve—“cloud” has shifted from weather to computing, and “banking” from physical locations to digital services. When training data spans decades, embeddings can merge historical and modern contexts, leading to retrieval errors. A search for “modern banking regulations” might surface outdated documents simply because the embedding model treats all instances of “banking” as equivalent.
Because dense retrieval lacks interactive clarification mechanisms, systems must rely on supplementary processes such as query expansion, clarifying questions, or conversational query rewriting to resolve such ambiguity.
As we have seen, missing contextual cues lead not only to ambiguity but also to the dilution of distinct meanings, a precursor to the challenge of differentiating subtle semantics.
Several tactics can enhance dense embeddings to handle polysemy more effectively:
- Load-time pre-processing:
  - Dictionary-based word sense disambiguation
  - Domain-specific tagging (e.g. labeling “bank” as financial/geographical) before retrieval, using WordNet or domain lexicons
- Query-time:
  - Knowledge-based query expansion
  - Re-ranking results using domain context
These approaches help reduce ambiguity and improve retrieval precision.
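To illustrate the load-time side, the sketch below rewrites ambiguous terms with a coarse sense tag before embedding; the sense dictionary and the toy domain classifier are hypothetical stand-ins for WordNet synsets or a curated domain lexicon.

```python
import re

# Hypothetical sense lexicon: ambiguous term -> {domain: disambiguated form}.
# A real system might derive these from WordNet synsets or a domain glossary.
SENSE_TAGS = {
    "bank": {"finance": "bank (financial institution)", "geography": "bank (river bank)"},
    "ira":  {"finance": "IRA (Individual Retirement Account)", "policy": "IRA (Inflation Reduction Act)"},
}

def classify_domain(chunk_text: str) -> str:
    """Toy domain classifier; replace with a lexicon- or ML-based classifier."""
    return "geography" if re.search(r"\briver|erosion|sediment\b", chunk_text, re.I) else "finance"

def tag_ambiguous_terms(chunk_text: str) -> str:
    """Rewrite ambiguous terms with their disambiguated form before embedding."""
    domain = classify_domain(chunk_text)
    for term, senses in SENSE_TAGS.items():
        if domain in senses:
            chunk_text = re.sub(rf"\b{term}\b", senses[domain], chunk_text, flags=re.I)
    return chunk_text

print(tag_ambiguous_terms("Bank erosion along the river worsened this year."))
# -> "bank (river bank) erosion along the river worsened this year."
```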
2 Multi-interpretation queries
Dense retrieval systems are constrained by the fact that they generate only one representation per query. This single vector acts as a centroid, averaging all potential meanings and diluting distinct interpretations. How does one handle queries that genuinely span multiple interpretations? “Apple innovations” could refer to Apple Inc. or orchard farming breakthroughs. Many dense retrieval setups rely on broad semantic overlap. For instance, they may surface both tech- and agriculture-focused documents among the top results if both mention ‘Apple’ and ‘innovations,’ confusing user intent. This shortfall complicates advanced agentic RAG scenarios where user context, role, or environment should help disambiguate meaning.
In practice, context-limited query parsing also plays a significant part. When the system treats “apple innovations” as a short two- or three-word query, it lacks enough information to differentiate corporate from horticultural intent. Embeddings then fall back on broader statistical patterns (e.g. Apple Inc. references might dominate a tech-heavy dataset), overshadowing orchard-related content. For example, ‘Apple innovations in 2020’ may be treated as separate tokens (‘Apple,’ ‘innovations,’ and ‘2020’), so the model can’t always link ‘2020’ specifically to Apple’s product timeline or orchard research from that same year. Dense retrieval alone may not capture that temporal layer, resulting in outdated or irrelevant references.
If the system knows the user’s domain or role, it can filter out orchard-based documents. The concept of “session context” or user historical queries often plays a big role in real production systems.
The challenge of averaging multiple meanings highlights the importance of contextual cues, which dense retrieval systems must also handle, particularly when faced with underspecified queries.
Without dedicated processes, such as generating multiple query embeddings or incorporating disambiguation routines, the resulting representation fails to capture any one specific intent accurately. This not only reduces retrieval precision but also affects user satisfaction, since the returned results may not align with the user’s intended meaning.
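One such dedicated process is to generate several candidate interpretations of the query, retrieve with each of them, and fuse the ranked lists. The sketch below assumes a `rewrite_query` helper (for example, an LLM prompt that proposes alternative readings) and an existing `vector_search` function; both names are hypothetical.

```python
from collections import defaultdict

def multi_interpretation_search(query, rewrite_query, vector_search, k=5):
    """Retrieve with several query interpretations and merge via reciprocal rank fusion.

    `rewrite_query(query) -> list[str]` and `vector_search(text, k) -> list[(doc_id, score)]`
    are assumed to be supplied by the surrounding pipeline.
    """
    # e.g. "Apple Inc. innovations" and "apple orchard innovations" for "Apple innovations"
    interpretations = [query] + rewrite_query(query)
    fused = defaultdict(float)
    for interp in interpretations:
        for rank, (doc_id, _score) in enumerate(vector_search(interp, k)):
            fused[doc_id] += 1.0 / (60 + rank)   # reciprocal rank fusion with the usual k = 60
    return sorted(fused.items(), key=lambda item: -item[1])[:k]
```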
3 Shifting sense within a single passage
By default, some systems compute only a single embedding per passage, even if it shifts topics partway. More advanced solutions often chunk passages further or use multi-vector approaches to capture separate themes. This design forces the encoder to blend multiple semantic signals into a single vector. As a result, if a passage begins by covering a historical event and then transitions to a biographical note, the dominant portion may overshadow later, equally relevant details.
Researchers (Khattab et al., 2020; Freymuth et al., 2025) have investigated approaches such as multi-vector (ColBERT) representations, which store token-level embeddings and preserve more fine-grained distinctions within a single document, or hierarchical retrieval pipelines (CHARM) that first retrieve documents at a coarse level and then refine at a more granular level. Without these enhancements, dense retrieval often underperforms when dealing with content-rich texts where the central theme shifts.
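For intuition, ColBERT’s late-interaction scoring keeps one embedding per token and sums, for each query token, its best match against the document’s tokens. Below is a minimal NumPy sketch of that MaxSim operator only, not the full ColBERT model or index.

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token, take the max similarity
    over all document tokens, then sum. Inputs are L2-normalised token-embedding
    matrices of shape (num_tokens, dim)."""
    sim = query_tokens @ doc_tokens.T        # (q_tokens, d_tokens) cosine similarities
    return float(sim.max(axis=1).sum())      # best document token per query token, then sum

# Toy example with random vectors standing in for encoder output.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(20, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```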
This limitation reinforces the importance of more advanced indexing and retrieval strategies, because it compounds the issues of subtle semantic differences, contextual ambiguity, and multi-interpretation queries. Addressing topic shifts within a single passage is another step towards creating retrieval pipelines that can deliver context-specific responses with fewer omissions.
This inability to differentiate subtle semantic nuances, compounded by queries that encapsulate multiple interpretations, is a key contributor to errors in downstream retrieval, and it is the challenge we examine next.
4 Failure to differentiate subtle semantics
When queries are very specific, like “Research on renewable energy subsidies in Europe,” the system might retrieve broad documents on worldwide energy subsidies. Often, the terms ‘renewable energy’ and ‘subsidies’ appear so frequently that the model prioritises them over qualifiers like ‘in Europe,’ retrieving global studies instead of Europe-specific research. Dense retrieval naturally rewards global thematic overlap over pinpoint details, leading to general answers from the generative model or missing location-specific insights. The outcome? Frustrated users who feel the system ignores the specifics.
Although dense retrieval excels at capturing broad semantic similarity, it often lacks the finesse to distinguish between nuanced yet essential differences in text. When two passages discuss the same general topic and differ only by a negation, temporal shift, or a single qualifying word, the encoder’s compression can produce nearly identical vectors. This flattening effect overlooks the fine-grained details that may be determinative in responding accurately to user queries.
A significant contributor to this gap is the clash between general and specific terminology. Queries that blend generic terms (e.g. “energy subsidies”) with precise qualifiers (“in Europe,” “by 2023”) can confuse the model, luring it into picking documents with strong general overlap but ignoring critical contextual elements such as time or region. Unless specialised fine-tuning or domain adaptation explicitly penalises ignoring qualifiers, the embedding space may blur these distinctions and fail to capture the boundaries that users care about, like “Europe” vs. “global,” or “by 2023” vs. “any time period.”
The subtle difference between semantic approximation and exact intent compounds these challenges. Similar text in an embedding space does not guarantee that the retrieved passage matches the user’s direct objective. Because wind energy is a subset of renewables, the model may lump ‘wind subsidies’ under general ‘renewable subsidies,’ returning documents on solar or hydropower instead of wind-specific content. Sometimes, a user wants comparative details, like differences between wind and solar subsidies—yet a dense embedding might lump everything under a broad “clean energy” category, missing that comparative nuance.
Further granularity comes into play when distinguishing between an entity and an action: “carbon capture” (the technology) vs. “capturing carbon” (the process), for instance. A single embedding might merge both under the same conceptual heading, losing how the text frames them. Even straightforward measurement disparities can be tossed into a single vector space: “energy subsidies” could refer to kilowatt-hour pricing or annual budget allocations, and retrieval might deliver content about energy production metrics when the user specifically wants data on fiscal outlays.
Finally, micro-context disconnect amplifies these pitfalls. Within the same source, one paragraph might address solar power in Asia, while the next pivots to wind in Europe. A chunk-based retrieval could lump both topics together as “renewables,” diluting the exact details the user is after. This phenomenon is not an outright error, but it diminishes the relevance that a user seeking “research on wind subsidies in Europe” expects from a refined retrieval pipeline.
When a query requires multiple precise details, is a single embedding enough? For example, ‘Who chaired the EU summit that proposed new carbon taxes in 2021?’ might need a two-step retrieval: first, find the summit details, then identify the chair. A single embedding can struggle to capture both pieces in one go.
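A hedged sketch of such a two-step flow is shown below; the `vector_search` and `llm` callables and the prompt wording are illustrative assumptions about the surrounding pipeline rather than any particular framework’s API.

```python
def two_step_answer(question, vector_search, llm, k=5):
    """Decompose a compound question into two retrieval hops.

    `vector_search(query, k)` returns text passages; `llm(prompt)` returns a string.
    Both are assumptions about the host pipeline, not a specific library API.
    """
    # Hop 1: retrieve context for the event itself.
    hop1_passages = vector_search("EU summit that proposed new carbon taxes in 2021", k)

    # Hop 2: use hop-1 context to form a narrower follow-up query.
    followup = llm(
        "Given these passages:\n" + "\n".join(hop1_passages) +
        "\nWrite a short search query asking who chaired this summit."
    )
    hop2_passages = vector_search(followup, k)

    # The final answer is generated from the combined evidence of both hops.
    return llm(
        "Answer the question using only the passages below.\n"
        f"Question: {question}\nPassages:\n" + "\n".join(hop1_passages + hop2_passages)
    )
```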
Query expansion adds synonyms or related phrases to the query, e.g. augmenting “renewable energy subsidies in Europe” with “green energy” or “EU renewable policies.” While this can enhance retrieval coverage, it also risks introducing noise.
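As a minimal sketch of that trade-off, the expansion terms below come from a hand-written synonym map (in practice they might come from WordNet, a domain glossary, or an LLM); each variant can then be retrieved separately so noisy expansions are easier to down-weight later.

```python
# Illustrative synonym map; the entries here are assumptions for the example.
EXPANSIONS = {
    "renewable energy": ["green energy", "clean energy"],
    "europe": ["EU", "European Union"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus expanded variants (which may add noise)."""
    variants = [query]
    lowered = query.lower()
    for term, synonyms in EXPANSIONS.items():
        if term in lowered:
            variants += [lowered.replace(term, syn) for syn in synonyms]
    return variants

print(expand_query("renewable energy subsidies in Europe"))
# ['renewable energy subsidies in Europe',
#  'green energy subsidies in europe',
#  'clean energy subsidies in europe',
#  'renewable energy subsidies in EU',
#  'renewable energy subsidies in European Union']
```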
Losing fine-grained detail
Dense retrievers compress text into fixed-dimensional embeddings, a lossy transformation that tends to drop fine nuances: precise numeric values, rare or unique entity names, and subtle semantic distinctions, unless training, fine-tuning, or pre-processing data augmentation specifically highlights them. Reichman and Heck (2024) demonstrate that the pre-trained model’s internal knowledge (its vocabulary, factual grounding, and learned entity relationships) ultimately sets the ceiling on what the retrieval layer can encode accurately. In other words, if a fact or concept was not sufficiently highlighted during pre-training, the dense retriever is unlikely to encode it accurately, causing subtle yet critical factual details to vanish.
These limitations prove particularly problematic in fields where precision is decisive. In legal contexts, important distinctions between terms like “limited liability” and “unlimited liability” can produce embeddings so similar that the system fails to distinguish them, despite representing fundamentally different concepts with unique compliance requirements and risk profiles. Similarly, variations such as “materially” versus “significantly” may be treated as near‑synonymous, even though they establish entirely different thresholds for liability or compliance. In securities law, “material” carries a specific criterion linked to investor decision-making, while “significant” typically indicates a lower level of importance. The retriever’s reliance on its pre‑training effectively sets a hard boundary on its representational capacity, further compounding the loss of critical details during the conversion to vector space.
The opaque nature of dense embeddings exacerbates these challenges. Unlike traditional lexical retrieval, where matches can be traced to specific terms, dense methods operate through hidden dimensions that resist straightforward interpretation. This opacity surfaces when unrelated content, such as a contract termination clause and a change-of-ownership provision, receives similar vector representations. Troubleshooting becomes complex without clear visibility into the embedding process. Issues may stem from:
- Training data anomalies
- Domain-specific language patterns
- Context window limitations
- Vector space compression effects
Fortunately, several strategies can help preserve these meaningful distinctions. Domain-specific embeddings, such as BioBERT for medical texts, Legal-BERT for legal documents, and FinBERT for financial materials, provide a foundation attuned to field-specific terminology and concepts. When combined with propositional chunking (breaking text into self-contained, meaningful units), these approaches help maintain the integrity of important phrases and concepts. This combination ensures that distinct notions like “statute of limitations” and “limitation of liability” maintain their separate identities in vector space, leading to more accurate and reliable retrieval results.
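In practice, adopting a domain-specific encoder is often just a matter of changing the checkpoint your embedding wrapper loads and pooling its token outputs. The checkpoint identifier below is an assumption to verify for your deployment, and such masked-LM checkpoints usually benefit from additional sentence-level fine-tuning.

```python
# Sketch: mean-pooled embeddings from a domain-specific encoder.
# The checkpoint name is an assumption; verify the exact Hugging Face ID and whether
# the model needs sentence-level fine-tuning before relying on its similarities.
import torch
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "nlpaueb/legal-bert-base-uncased"   # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)

def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding in the mean
    pooled = (hidden * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(pooled, dim=-1)

vecs = embed(["statute of limitations", "limitation of liability"])
print(torch.matmul(vecs[0], vecs[1]).item())             # cosine similarity of the two phrases
```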
Sensitivity to noise and variations
Real users introduce noise and mistakes: typos, irregular formatting, hyphenation, and unexpected synonyms. Non-contextual embeddings (e.g. Word2Vec, GloVe) treat ‘renewable-energy’ and ‘renewable energy’ as different tokens, often missing exact matches. Even contextual transformers can fail on significantly malformed input, e.g. ‘renuable enrg’, words that scarcely resemble the well-formed text seen during training. These failures throttle RAG pipelines, especially in high-stakes settings (such as legal or financial retrieval) where incomplete matches lead to incomplete answers.
In addition, specialised fields often contain uncommon synonyms and technical neologisms—words like “kerfuffles” in politics or “polymorphisms” in genetics—that appear rarely in everyday language. Dense embeddings might represent them poorly or group them with unrelated synonyms due to limited training data. Mixed-language queries add another layer of complexity: a single request might interleave words from multiple languages (for example, “Seguro de vida (life insurance) for contractors in the U.S.”), yet many embedding models lack robust cross-lingual alignment. Ambiguous user inputs also pose difficulties. Queries such as “the big merger docs” or “that 2021 case” provide minimal context, and dense retrieval can falter unless specifically trained on partial or elliptical phrases.
These nuances also raise questions of overfitting to “clean” corpora. Many training datasets consist of carefully curated text, whereas real-world environments include abbreviations, slang, and mobile-typed errors. Such mismatches can degrade performance in ways that standard training procedures do not anticipate. Effective pipelines account for these variations by deploying preprocessors that correct or unify formatting, as well as embeddings trained or fine-tuned on diverse, perhaps less pristine data sources. Hence, load-time augmentation with noisy or domain-specific data can help.
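A minimal sketch of such a load-time normaliser is shown below; the regexes cover hyphenation across line breaks, typographic dashes, and whitespace, while spell correction is left as an optional hook because off-the-shelf correctors tend to mangle domain jargon.

```python
import re
import unicodedata

def normalise_text(text: str, spell_correct=None) -> str:
    """Load-time normalisation before embedding.

    `spell_correct` is an optional callable (word -> word), left as a hook here
    because domain-aware spell correction is easy to get wrong on jargon.
    """
    text = unicodedata.normalize("NFKC", text)            # unify unicode variants
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)    # re-join words hyphenated across line breaks
    text = re.sub(r"[\u2010-\u2015]", "-", text)          # map typographic dashes to plain hyphens
    text = re.sub(r"\s+", " ", text).strip()              # collapse whitespace
    if spell_correct:
        text = " ".join(spell_correct(word) for word in text.split())
    return text

print(normalise_text("renew-\n able   energy subsidies in  Europe"))
# -> "renewable energy subsidies in Europe"
```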
Training complexities
Dense retrieval requires heavy lifting at training time too. Models often need large-scale fine-tuning (on millions of query-document pairs) and specialised sampling strategies, making the training phase computationally expensive. Current solutions often fail to generalise smoothly across supervised and zero-shot tasks (How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval). Synthetic queries or pseudo-relevance labels used for data augmentation can be too formal or unrealistic, diverging from how actual users phrase real queries; the resulting skewed model behaviour makes it harder to capture the variability found in production.
In addition, training introduces further, often hidden, complexities with no direct solutions:
- Inconsistent or noisy labels (Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization): pseudo-relevance labels, especially in large corpora, may incorrectly mark borderline documents as “relevant.” The model then “learns” partial or spurious correlations, embedding them incorrectly. This shows up later in real usage when those spurious connections surface.
- Catastrophic forgetting with ongoing updates (Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives): periodic re-training or fine-tuning for new terms (for example, a newly introduced brand name or fresh legal statute) can cause the system to forget older associations, known as ‘catastrophic forgetting’. Over time, these mini-updates degrade overall embedding consistency, creating retrieval anomalies that are tough to trace.
- Sparse domains (Domain Adaptation for Dense Retrieval and Conversational Dense Retrieval through Self-Supervision by Meticulous Pseudo-Relevance Labeling): some specialised fields (for instance, marine biology or rare historical archives) contain few training documents. The embedding model might overfit or fail to separate subtle topics if it lacks enough data variety. This leads to the illusion that the model can handle specialised queries, until real testing reveals that it lumps unrelated niche topics together.
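For orientation, the heart of most dense-retriever fine-tuning is a contrastive objective over query-document pairs with in-batch negatives. The minimal PyTorch sketch below shows that loss in isolation (a generic illustration, not the training recipe of any paper cited above) and makes it easy to see how a mislabeled pair silently becomes a false negative.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_vecs: torch.Tensor,
                              doc_vecs: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss with in-batch negatives.

    query_vecs[i] and doc_vecs[i] are assumed to be a relevant pair; every other
    document in the batch is treated as a negative, which is exactly where false
    negatives and noisy labels quietly corrupt the embedding space.
    """
    q = F.normalize(query_vecs, dim=-1)
    d = F.normalize(doc_vecs, dim=-1)
    logits = q @ d.T / temperature                       # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)    # the diagonal holds the true pairs
    return F.cross_entropy(logits, labels)

# Toy usage with random vectors standing in for encoder outputs.
loss = in_batch_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```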
Strategies to strengthen dense retrieval in RAG
Despite these issues, dense retrieval remains a backbone for RAG pipelines. So, how can we fix or mitigate the key pain points?
Below is a quick mapping from the problems to practical fixes, distinguishing between Load/Index-Time and Query-Time strategies:
| Challenge | Load/Index-Time Fixes | Query-Time Fixes |
|---|---|---|
| Polysemy & Domain Ambiguity | Tag acronyms/terms (WordNet); use domain-specific embeddings | Query expansion; domain re-ranking |
| Shifting Topics Within a Passage | Semantic chunking (optimal chunk size); multi-vector indexing | Multi-hop retrieval; step-by-step generative queries |
| Sensitivity to Typos/Noise | Normalise text formats; train on “dirty” corpora for robust embeddings | Spelling correction; clarification prompts (if multi-turn) |
| Training Pitfalls (spurious labels…) | Carefully curated or validated training sets | Confidence-based re-ranking; reinforcement from user signals |
| Failing on Fine-Grained Details | Propositional transformations; strict domain embeddings | Post-retrieval re-ranking for exact values or numeric matches |
Some enhancements apply at query time, while others apply at load time.
In general, query-time interventions have the benefit of closer proximity to user intent, since they can react to the precise query from the user. Load-time interventions, on the other hand, apply before actual usage, but they win on unit economics: a load-time intervention incurs a one-off cost that amortizes over multiple queries.
Loading a knowledge base into a retrievable index is itself a one-off cost that needs to be amortized over multiple queries, and as we’ve explored in a previous post, it may not take many queries to recover the cost of indexing. Load-time enhancements that do not significantly increase the risk of performance degradation are therefore likely to be worthwhile.
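As a back-of-the-envelope illustration (every number here is invented for the example), the break-even point is simply the one-off load-time cost divided by the per-query saving the enhancement delivers:

```python
def break_even_queries(one_off_indexing_cost: float, per_query_saving: float) -> float:
    """Queries needed before a load-time enhancement pays for itself."""
    return one_off_indexing_cost / per_query_saving

# Illustrative numbers only: $40 of extra tagging/embedding compute at load time,
# saving $0.002 per query in re-ranking or follow-up LLM calls.
print(break_even_queries(40.0, 0.002))   # -> 20000.0 queries
```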
1 Propositional transformations
Propositional transformations help preserve essential details that might otherwise get lost in embedding space. By normalising context across text sections, this technique amplifies key linguistic features including provenance, entities, temporal markers, quantitative values, confidence levels, obligations, and thematic elements. This structured approach ensures critical meaning remains intact throughout the embedding process. See our guide on propositional chunking for more details.
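A hedged sketch of the idea: each chunk is rewritten into self-contained propositions by an LLM before embedding. The prompt wording and the `llm` callable are illustrative, not the exact pipeline from the propositional chunking guide.

```python
PROPOSITION_PROMPT = """Rewrite the passage below as a list of short, self-contained
propositions. Resolve pronouns, and keep entity names, dates, quantities, obligations,
and confidence qualifiers explicit. Return one proposition per line.

Passage:
{passage}
"""

def to_propositions(passage: str, llm) -> list[str]:
    """`llm(prompt) -> str` is an assumed completion function."""
    raw = llm(PROPOSITION_PROMPT.format(passage=passage))
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

# Each proposition is then embedded and indexed on its own, usually with a pointer
# back to the parent chunk for provenance.
```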
2 Text preprocessing
Text preprocessing involves deserialising and annotating content in structured formats while preserving hierarchy. This includes breaking PDFs and HTML into logical sections, identifying footnotes, disclaimers, and headings, and maintaining relationships between elements. The goal is to create a unified representation that retains context and references through the embedding process.
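As one concrete example, the sketch below uses BeautifulSoup to flatten HTML into blocks that remember their heading path; footnote, disclaimer, and table handling would be layered on top of this in a fuller pipeline.

```python
from bs4 import BeautifulSoup

def html_to_sections(html: str) -> list[dict]:
    """Flatten HTML into text blocks that keep their heading hierarchy as metadata."""
    soup = BeautifulSoup(html, "html.parser")
    sections, heading_path = [], []
    for el in soup.find_all(["h1", "h2", "h3", "p", "li"]):
        if el.name in ("h1", "h2", "h3"):
            level = int(el.name[1])
            heading_path = heading_path[: level - 1] + [el.get_text(strip=True)]
        else:
            sections.append({
                "heading_path": " > ".join(heading_path),   # retained for retrieval context
                "text": el.get_text(" ", strip=True),
            })
    return sections
```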
3 Semantic chunking
Building on the ‘shifting sense’ discussion above, split larger documents into semantically self-contained excerpts that preserve contextual integrity by grouping coherent content into smaller units. This prevents one vector from mixing multiple topics and preserves clarity in retrieval.
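A minimal sketch of one common approach: embed each sentence, then start a new chunk wherever the similarity between consecutive sentences drops below a threshold. The `embed` function is assumed to return L2-normalised vectors, and the threshold is illustrative.

```python
import numpy as np

def semantic_chunks(sentences: list[str], embed, threshold: float = 0.6) -> list[str]:
    """Group consecutive sentences into chunks, starting a new chunk when the
    topic (as measured by embedding similarity) shifts.

    `embed(list[str]) -> np.ndarray` of L2-normalised vectors is assumed.
    """
    vecs = embed(sentences)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(vecs[i] @ vecs[i - 1]) < threshold:   # similarity drop => likely topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```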
4 Agentic access controls
Conditioning retrieval on user roles and related signals can refine results, especially in tightly regulated industries. For example, at load time you could store role-based or domain metadata; at query time, you can filter or re-rank on that metadata, e.g. if the system knows a user is in the “environmental science” domain, it can weight results differently than for a “banking” role.
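A sketch of the query-time half, assuming a vector store whose search call accepts a metadata filter; the `store.search` signature and the role-to-domain map are illustrative, not a specific product’s API.

```python
ROLE_TO_DOMAINS = {
    "environmental_scientist": ["geography", "sustainability"],
    "banking_analyst": ["finance", "regulation"],
}

def role_aware_search(store, query: str, user_role: str, k: int = 5):
    """Filter (or at least re-weight) retrieval by role-scoped domain metadata.

    `store.search(query, k, metadata_filter=...)` is an assumed interface;
    the domain tags are written to each chunk's metadata at load time.
    """
    allowed = ROLE_TO_DOMAINS.get(user_role)
    if allowed is None:
        return store.search(query, k)                    # no role info: fall back to plain search
    return store.search(query, k, metadata_filter={"domain": {"$in": allowed}})
```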
Which to implement first?
- If your corpus is large and domain-specific, start with load-time improvements (e.g. chunking, domain tagging) so your embeddings have solid grounding from the outset.
- If your queries are short or ambiguous, consider query-time expansions or re-ranking to refine results.
```mermaid
flowchart TD
    subgraph X[ ]
        VectorDB
    end
    style X fill:none,stroke:#333,stroke-width:0px
    subgraph Query Path
        UserQuery{User query} --> SpellCheck[Spelling correction]
        SpellCheck --> QueryPlanner[Query Planner]
        VectorDB -- retrieved chunks --> Reranker
        QueryPlanner -- application context --> Reranker
        Reranker -- reranked chunks --> QueryPlanner
        QueryPlanner -- prompt --> LLM --> Response{Response}
        QueryPlanner -- query --> QueryExpander
        QueryPlanner -- application context --> QueryExpander -- query' --> VectorDB
        QueryPlanner <--> MultihopEngine[Multihop Reasoning Engine]
    end
    subgraph Load Path
        Document{Document} -- text extraction --> Chunker
        Chunker --> AcronymTagger[Domain-specific acronym tagger]
        AcronymTagger --> EmbeddingSelector[Embedding selector]
        EmbeddingSelector --> SemanticChunker
        SemanticChunker --> VectorDB
    end
    style SemanticChunker fill:#f9f,stroke:#333,stroke-width:2px
    style EmbeddingSelector fill:#f9f,stroke:#333,stroke-width:2px
    style AcronymTagger fill:#f9f,stroke:#333,stroke-width:2px
    style SpellCheck fill:#f9f,stroke:#333,stroke-width:2px
    style MultihopEngine fill:#f9f,stroke:#333,stroke-width:2px
    style QueryExpander fill:#f9f,stroke:#333,stroke-width:2px
    style Reranker fill:#f9f,stroke:#333,stroke-width:2px
```
Why it all matters
While Google sets the bar for speed, RAG pipelines often address multi-turn, domain-specific queries where accuracy and context fidelity can matter as much as raw speed. Many RAG systems operate in specialised or enterprise environments, handling interactive, conversation-like queries after a user has exhausted simpler web searches and now needs deeper context. Dense retrieval often acts as a ‘gatekeeper’ in RAG, though it can be complemented by hybrid or re-ranking approaches. Moreover, in naive RAG implementations with minimal domain adaptation, the gatekeeping problem is even more pronounced. If early retrieval is off-target, the generative component can produce inaccurate or speculative answers. And in advanced agentic systems that deliver role-sensitive or policy-bound responses, precision is paramount.
In practice, domain-specific jargon and noisy queries underscore how quickly dense retrieval can yield inconsistent or low-recall results when the model lacks explicit disambiguation. Implementing semantic preprocessing and domain-aware embedding strategies can significantly reduce these risks. Likewise, propositional chunking ensures that critical data points remain intact—no matter how large or intricate the source documents might be.
Ultimately, bridging these gaps isn’t just an operational concern; it’s about user trust and system reliability. When a system consistently returns contextually accurate results, users gain confidence in its answers. Users often expect the AI to ‘understand’ their queries in a human sense, not just match a few keywords. While large language models approximate understanding via patterns, strong retrieval alignment ensures those patterns align with user intent. By combining semantic chunking (addressing topic shifts), robust preprocessing (handling formatting and noise), specialised embeddings (mitigating domain ambiguity), and thoughtful indexing (improving retrieval precision), you can significantly boost the accuracy and reliability of your RAG pipeline.
References
- Chen et al. (2023) — Empirical drop in recall/precision for dense retrieval under polysemy
- Reichman and Heck (2024) — Pre-trained model bounds for embedding capacity https://arxiv.org/abs/2402.11035v2
- Khattab et al. (2020) — ColBERT multi-vector retrieval architecture https://arxiv.org/abs/2004.12832
- How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval (2023) https://arxiv.org/abs/2302.07452
- Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization (2024) https://arxiv.org/abs/2401.00165
- Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives (2024) https://arxiv.org/abs/2210.17167
- Domain Adaptation for Dense Retrieval (2024) https://arxiv.org/abs/2403.08970