An overview of Retrieval Augmented Generation with Vector Similarity Search
“oh, it looks like you’ve screwed the rear leg backwards”, it would be something like:

“seems like you’re assembling chair CX-184. You may have skipped step 8 in the assembly process, since the rear leg is screwed backwards. Here is a step-by-step solution from the assembly guide: …”

Note how both answers recognized the issue correctly, but since the LLM had additional context in the second case, it was also able to provide a solution and more specific details. That’s the gist of RAG: LLMs provide higher-quality responses when they are given more context surrounding a query. While the core concept itself is quite intuitive, the complexity lies in how to retrieve the correct information effectively. In the following sections, we explain one way to perform RAG effectively, based on the concepts of vector embeddings and similarity search (we’ll explain what these mean!).
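The core idea, supplying retrieved context alongside the query, fits in a few lines of code. The sketch below is illustrative only: `build_rag_prompt` is a hypothetical helper name, and the prompt format is just one reasonable choice.

```python
def build_rag_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine the user's query with retrieved context into one prompt.

    Hypothetical helper for illustration; real systems vary the template.
    """
    # Join the retrieved text chunks into a single context section.
    context = "\n".join(retrieved_chunks)
    return (
        f"Context from documentation:\n{context}\n\n"
        f"User query: {query}\n"
        "Answer the query using the context above."
    )

prompt = build_rag_prompt(
    "Why is the rear leg backwards?",
    ["Step 8: attach the rear leg with the notch facing inward."],
)
```

The resulting string is what gets sent to the LLM in place of the bare query, which is why the second answer above could cite the specific assembly step.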
Why do we use cosine similarity?
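Cosine similarity compares the angle between two vectors rather than their magnitudes, so two embeddings pointing in the same direction score as maximally similar even if one is a scaled version of the other. A minimal pure-Python sketch:

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms (vector lengths).
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    # Cosine of the angle between a and b, in [-1, 1].
    return dot / (norm_a * norm_b)

# Scaling a vector keeps its direction, so similarity stays 1.0:
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Orthogonal vectors score 0.0:
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

This scale invariance is one common argument for cosine similarity over raw dot products or Euclidean distance when comparing embeddings, since it focuses on the direction that encodes meaning rather than vector length.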
“Why does the chair feel unstable?”

The retrieval might return chunks like:

“User query: ‘Why does the chair feel unstable?’ Context from manual: ‘Ensure screws (B3) from step 5 are tightened completely. Instability may result if the cross-bar (part D) from step 7 is incorrectly positioned.’”

This detailed context enables the LLM to generate a highly informed, actionable response:

“The chair’s instability is likely caused by loose screws (B3) from step 5 or an incorrectly positioned cross-bar (part D). Verify these areas, tightening screws fully and checking that part D matches the orientation shown in step 7 of the assembly manual.”

This demonstrates how RAG improves response quality by leveraging external knowledge effectively.
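Putting the pieces together, retrieval can be sketched as ranking stored chunk embeddings by cosine similarity to the query embedding and keeping the best match. The vectors below are made-up toy values, not the output of a real embedding model:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy "pre-computed" embeddings; in practice an embedding model produces these.
chunks = {
    "Ensure screws (B3) from step 5 are tightened completely.": [0.9, 0.1, 0.2],
    "Attach the backrest cushion as shown in step 10.": [0.1, 0.8, 0.3],
}
# Toy embedding of the query "Why does the chair feel unstable?"
query_embedding = [0.85, 0.15, 0.25]

# Rank chunks by similarity to the query and keep the best match.
best_chunk = max(chunks, key=lambda text: cosine(query_embedding, chunks[text]))

# Assemble the augmented prompt, as in the example above.
prompt = (
    "User query: 'Why does the chair feel unstable?' "
    f"Context from manual: '{best_chunk}'"
)
```

Real systems store millions of such embeddings in a vector database and use approximate nearest-neighbor search instead of this linear scan, but the ranking principle is the same.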