This post is co-written with Elliott Choi from Cohere.
The ability to quickly access relevant information is a key differentiator in today’s competitive landscape. As user expectations for search accuracy continue to rise, traditional keyword-based search methods often fall short of delivering truly relevant results. In the rapidly evolving landscape of AI-powered search, organizations are looking to integrate large language models (LLMs) and embedding models with Amazon OpenSearch Service. In this post, we dive into the scenarios where Cohere Rerank 3.5 improves search results over Best Matching 25 (BM25), a keyword-based algorithm that performs lexical search, as well as over semantic search. We also cover how businesses can significantly improve user experience, increase engagement, and ultimately drive better search outcomes by implementing a reranking pipeline.
Amazon OpenSearch Service
Amazon OpenSearch Service is a fully managed service that simplifies the deployment, operation, and scaling of OpenSearch in the AWS Cloud to provide powerful search and analytics capabilities. OpenSearch Service offers robust search capabilities, including URI searches for simple queries and request body searches using a domain-specific language for complex queries. It supports advanced features such as result highlighting, flexible pagination, and k-nearest neighbor (k-NN) search for vector and semantic search use cases. The service also provides multiple query languages, including SQL and Piped Processing Language (PPL), along with customizable relevance tuning and machine learning (ML) integration for improved result ranking. These features make OpenSearch Service a versatile solution for implementing sophisticated search functionality, including the search mechanisms used to power generative AI applications.
Overview of traditional lexical search and semantic search using bi-encoders and cross-encoders
Two important techniques for handling end-user search queries are lexical search and semantic search. OpenSearch Service natively supports BM25. Lexical search relies on exact keyword matching between the query and documents. For a natural language query searching for “super hero toys,” it retrieves documents containing those exact terms. While this method is fast and works well for queries targeted at specific terms, it lacks the ability to recognize the intent or context behind a query, so it can miss relevant results that use different words such as “action figures of superheroes.”

Bi-encoders are embedding models designed to encode two pieces of text independently. Documents are encoded into embeddings offline, and queries are encoded online at search time with the same embedding model. The query’s encoding is then compared to the pre-computed document embeddings, and the similarity between the query and each document is measured by their relative distance. Although the texts are encoded separately, this allows the system to recognize synonyms and related concepts, for example that “action figures” relates to “toys” and “comic book characters” to “super heroes.”
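As a concrete illustration of the bi-encoder flow, the sketch below compares a query embedding against pre-computed document embeddings with cosine similarity. The vectors here are made-up toy values, not outputs of any real embedding model.

```python
import math

def cosine_similarity(a, b):
    # Bi-encoders compare independently produced embeddings by angle/distance.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pre-computed document embeddings (the offline step); values are invented.
doc_embeddings = {
    "action figures of superheroes": [0.9, 0.8, 0.1],
    "garden hose fittings": [0.1, 0.0, 0.95],
}

# Query embedding produced online at search time with the same model.
query_embedding = [0.85, 0.75, 0.05]  # "super hero toys"

ranked = sorted(
    doc_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
print(ranked[0][0])  # the semantically related document ranks first
```

Even though “super hero toys” shares no keywords with “action figures of superheroes,” their embeddings land close together, which is exactly what lexical matching cannot capture.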
By contrast, processing the same query, “super hero toys,” with cross-encoders involves first retrieving a set of candidate documents using methods such as lexical search or bi-encoders. Each query-document pair is then jointly evaluated by the cross-encoder, which takes the combined text as input to deeply model the interactions between the query and the document. This approach allows the cross-encoder to understand context, disambiguate meanings, and capture nuances by analyzing every word in relation to the others. It assigns a precise relevance score to each pair and reranks the documents so that those most closely matching the user’s intent (in this case, toys depicting superheroes) are prioritized. This significantly enhances search relevancy compared to methods that encode queries and documents independently.
It’s important to note that the effectiveness of semantic search pipelines, such as two-stage retrieval, depends heavily on the quality of the initial retrieval stage. The primary goal of a robust first-stage retrieval is to efficiently recall a subset of potentially relevant documents from a large collection, setting the foundation for more sophisticated ranking in later stages. The quality of the first-stage results directly impacts the performance of subsequent ranking stages: the goal is to maximize recall and capture as many relevant documents as possible, because the later ranking stage has no way to recover excluded documents. A poor initial retrieval can limit the effectiveness of even the most sophisticated reranking algorithms.
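The recall constraint described above can be seen in a toy two-stage pipeline. The first-stage and cross-encoder scores below are hypothetical stand-ins for BM25 and a reranking model.

```python
# Stage 1: cheap first-stage scores (stand-in for BM25). Stage 2 can only
# reorder what stage 1 recalls, so documents cut here are lost for good.
first_stage_scores = {"doc_a": 2.1, "doc_b": 1.7, "doc_c": 0.4, "doc_d": 0.2}
TOP_K = 2
candidates = sorted(first_stage_scores, key=first_stage_scores.get, reverse=True)[:TOP_K]

# Stage 2: expensive cross-encoder relevance scores (made-up values).
cross_encoder_scores = {"doc_a": 0.30, "doc_b": 0.92, "doc_c": 0.95, "doc_d": 0.10}
reranked = sorted(candidates, key=cross_encoder_scores.get, reverse=True)

print(reranked)  # ['doc_b', 'doc_a']
# doc_c has the highest cross-encoder score of all, but it was excluded in
# stage 1, so no amount of reranking can surface it: first-stage recall matters.
```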
Overview of Cohere Rerank 3.5
Cohere is an AWS third-party model provider partner that provides advanced language AI models, including embeddings, language models, and reranking models. See Cohere Rerank 3.5 now generally available on Amazon Bedrock to learn more about accessing Cohere’s state-of-the-art models using Amazon Bedrock. The Cohere Rerank 3.5 model focuses on enhancing search relevance by reordering initial search results based on a deeper semantic understanding of the user query. Rerank 3.5 uses a cross-encoder architecture where the input of the model always consists of a data pair (for example, a query and a document) that is processed jointly by the encoder. The model outputs an ordered list of results, each with an assigned relevance score, as shown in the following GIF.
Cohere Rerank 3.5 with OpenSearch Service search
Many organizations rely on OpenSearch Service for their lexical search needs, benefiting from its robust and scalable infrastructure. When these organizations want to enhance their search capabilities to match the sophistication of semantic search, they face the challenge of overhauling their existing systems, which is often a difficult engineering task and may not be feasible at all. Now, through a single Rerank API call in Amazon Bedrock, you can integrate Rerank into existing systems at scale. For financial services firms, this means more accurate matching of complex queries with relevant financial products and information. E-commerce businesses can improve product discovery and recommendations, potentially boosting conversion rates. The ease of integration through a single API call with Amazon OpenSearch Service enables quick implementation, offering a competitive edge in user experience without significant disruption or resource allocation.
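As a sketch of what that single API call looks like, the snippet below builds a rerank request and offers a helper that sends it through boto3’s bedrock-agent-runtime client. The model ARN, Region, and helper names are illustrative assumptions; check the Amazon Bedrock documentation for the exact values and field names in your account.

```python
def build_rerank_request(query, documents, model_arn, num_results=5):
    # Shape an inline-text request for the Bedrock Rerank API. Field names
    # reflect our reading of the API; verify them against the AWS docs.
    return {
        "queries": [{"type": "TEXT", "textQuery": {"text": query}}],
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc},
                },
            }
            for doc in documents
        ],
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": min(num_results, len(documents)),
                "modelConfiguration": {"modelArn": model_arn},
            },
        },
    }


def rerank_with_bedrock(query, documents, model_arn, region="us-west-2"):
    # Live call: requires AWS credentials and access to the rerank model.
    import boto3  # imported here so the request builder stays dependency-free

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.rerank(**build_rerank_request(query, documents, model_arn))
    # Results come back ordered by relevance, referencing source indices.
    return [(r["index"], r["relevanceScore"]) for r in response["results"]]


# Hypothetical ARN; look up the Rerank 3.5 model ARN for your Region.
MODEL_ARN = "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
request = build_rerank_request(
    "super hero toys",
    ["action figures of superheroes", "garden hose fittings"],
    MODEL_ARN,
)
```

Because the reranker only reorders the documents you pass it, the existing BM25 query stays untouched; its hits simply become the `documents` argument.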
In benchmarks conducted by Cohere, Rerank 3.5 improved normalized Discounted Cumulative Gain (nDCG) compared to Cohere’s previous Rerank 3 model, as well as to BM25 and hybrid search, across financial, e-commerce, and project management datasets. nDCG is a metric used to evaluate the quality of a ranking system by assessing how well the ranked items align with their actual relevance, prioritizing relevant results at the top. In this study, @10 indicates that the metric was calculated considering only the top 10 items in the ranked list. nDCG is helpful because metrics such as precision, recall, and the F-score measure predictive performance without taking into account the position of ranked results, whereas nDCG normalizes scores and discounts relevant results that are returned lower in the list. The following figures show the performance improvements of Cohere Rerank 3.5 on the financial domain evaluation as well as an e-commerce evaluation consisting of external datasets.
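The nDCG@10 metric described above can be computed in a few lines. The graded relevance labels in this sketch are invented for illustration.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: later positions are logarithmically discounted.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize against the ideal (descending-relevance) ordering, so a
    # perfect ranking scores 1.0 regardless of how many relevant items exist.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the results a ranker returned, in returned order.
returned = [3, 2, 3, 0, 1, 2, 0, 0, 1, 0]
print(round(ndcg_at_k(returned, k=10), 3))
```

Note how swapping a relevant item into position 1 raises the score more than fixing position 10, which is exactly why nDCG rewards rankers that put the best results first.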
Also, Cohere Rerank 3.5, when integrated with OpenSearch Service, can significantly enhance existing project management workflows by improving the relevance and accuracy of search results across engineering tickets, issue tracking systems, and open-source repository issues. This enables teams to quickly surface the most pertinent information from their extensive knowledge bases, boosting productivity. The following figure demonstrates the performance improvements of Cohere Rerank 3.5 for the project management evaluation.
Combining reranking with BM25 for enterprise search is supported by studies from other organizations. For instance, Anthropic, an artificial intelligence startup founded in 2021 that focuses on developing safe and reliable AI systems, conducted a study that found using reranked contextual embeddings and contextual BM25 reduced the top-20-chunk retrieval failure rate by 67%, from 5.7% to 1.9%. Combining BM25’s strength in exact matching with the semantic understanding of reranking models addresses the limitations of each approach used alone and delivers a more effective search experience for users.
As organizations strive to improve their search capabilities, many find that traditional keyword-based methods such as BM25 have limitations in understanding context and user intent. This leads customers to explore hybrid search approaches that combine the strengths of keyword-based algorithms with the semantic understanding of modern AI models. OpenSearch Service 2.11 and later supports the creation of hybrid search pipelines using normalization processors directly within the OpenSearch Service domain. By transitioning to a hybrid search system, organizations can use the precision of BM25 while benefiting from the contextual awareness and relevance ranking capabilities of semantic search.
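As a sketch, a hybrid search pipeline on OpenSearch Service 2.11+ is defined with a normalization processor like the one below. The pipeline description, normalization technique, and subquery weights are illustrative choices, not recommendations.

```python
import json

# Search-phase results processor that normalizes BM25 and k-NN subquery
# scores onto a common scale, then combines them with a weighted mean.
hybrid_pipeline = {
    "description": "Post-processor for hybrid (lexical + vector) search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    # Illustrative weights: 30% BM25, 70% vector similarity.
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ],
}

# Register it on the domain, e.g. with the opensearch-py client:
#   client.transport.perform_request(
#       "PUT", "/_search/pipeline/hybrid-pipeline", body=hybrid_pipeline
#   )
print(json.dumps(hybrid_pipeline, indent=2))
```

Queries of type `hybrid` that run through this pipeline then return one fused result list instead of two incomparable score distributions.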
Cohere Rerank 3.5 acts as a final refinement layer, analyzing the semantic and contextual aspects of both the query and the initial search results. These models excel at understanding nuanced relationships between queries and potential results, considering factors like customer reviews, product images, or detailed descriptions to further refine the top results. This progression from keyword search to semantic understanding, and then applying advanced reranking, allows for a dramatic improvement in search relevance.
How to integrate Cohere Rerank 3.5 with OpenSearch Service
There are several options available for integrating Cohere Rerank 3.5 with OpenSearch Service. Teams can use OpenSearch Service ML connectors, which facilitate access to models hosted on third-party ML platforms. Every connector is specified by a connector blueprint, which defines all the parameters you need to provide when creating the connector.
In addition to the Bedrock Rerank API, teams can use the Amazon SageMaker connector blueprint for Cohere Rerank hosted on Amazon SageMaker for flexible deployment and fine-tuning of Cohere models. This connector option works with other AWS services for comprehensive ML workflows and allows teams to use the tools built into Amazon SageMaker for model performance monitoring and management. There is also a Cohere native connector option that provides direct integration with Cohere’s API, offering immediate access to the latest models; it is suitable for users with models fine-tuned on Cohere.
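To give a feel for the blueprint format, the following is a rough sketch of a SageMaker connector payload. The endpoint name, Region, IAM role ARN, and request-body fields are all placeholders; start from the official connector blueprint for the exact shape your deployment expects.

```python
import json

SAGEMAKER_ENDPOINT = "my-cohere-rerank-endpoint"  # hypothetical endpoint name
REGION = "us-west-2"

connector_blueprint = {
    "name": "Cohere Rerank via SageMaker",
    "description": "Connector to a Cohere rerank model on a SageMaker endpoint",
    "version": "1.0",
    "protocol": "aws_sigv4",
    # On managed OpenSearch Service, an IAM role is used instead of static keys.
    "credential": {"roleArn": "arn:aws:iam::123456789012:role/opensearch-sagemaker-role"},
    "parameters": {"region": REGION, "service_name": "sagemaker"},
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": (
                f"https://runtime.sagemaker.{REGION}.amazonaws.com"
                f"/endpoints/{SAGEMAKER_ENDPOINT}/invocations"
            ),
            "headers": {"content-type": "application/json"},
            # Request-body template; field names must match what the deployed
            # Cohere rerank container actually expects.
            "request_body": json.dumps(
                {"query": "${parameters.query}", "documents": "${parameters.documents}"}
            ),
        }
    ],
}
# The blueprint is registered with: POST /_plugins/_ml/connectors/_create
```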
See this general reranking pipeline guide for OpenSearch Service 2.12 and later or this tutorial to configure a search pipeline that uses Cohere Rerank 3.5 to improve a first-stage retrieval system that can run on the native OpenSearch Service vector engine.
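For orientation, a rerank search pipeline in OpenSearch 2.12 and later generally takes the shape below. The model ID and document field are placeholders for the values produced when you register the connector-backed model; follow the tutorial above for the authoritative steps.

```python
# Response processor that sends first-stage hits to a registered remote
# (connector-backed) rerank model and reorders them by its scores.
rerank_pipeline = {
    "description": "Rerank first-stage hits with a remote cross-encoder",
    "response_processors": [
        {
            "rerank": {
                # ID returned when the remote model is registered (placeholder).
                "ml_opensearch": {"model_id": "my-rerank-model-id"},
                # Document field(s) whose text is sent to the reranker.
                "context": {"document_fields": ["passage_text"]},
            }
        }
    ],
}

# Attach per query with: GET /my-index/_search?search_pipeline=rerank-pipeline
# The query text for the reranker travels in the request's ext block:
search_body = {
    "query": {"match": {"passage_text": "super hero toys"}},
    "ext": {"rerank": {"query_context": {"query_text": "super hero toys"}}},
}
```

The first-stage `query` clause can be BM25, k-NN, or hybrid; the rerank processor only sees and reorders whatever that stage returns.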
Conclusion
Integrating Cohere Rerank 3.5 with OpenSearch Service is a powerful way to enhance your search functionality and deliver a more meaningful and relevant search experience for your users. We covered the added benefits a rerank model could bring to various businesses and how a reranker can enhance search. By tapping into the semantic understanding of Cohere’s models, you can surface the most pertinent results, improve user satisfaction, and drive better business outcomes.
About the Authors
Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting healthcare and life science (HCLS) customers. She is passionate about supporting customers to use generative AI on AWS and evangelizing model adoption for 1P and 3P models. Breanne is also on the Women@Amazon board as co-director of Allyship, with the goal of fostering an inclusive and diverse culture at Amazon. Breanne holds a Bachelor of Science in Computer Engineering from the University of Illinois at Urbana Champaign (UIUC).
Karan Singh is a generative AI Specialist for 3P models at AWS, where he works with top-tier 3P foundation model providers to define and execute joint go-to-market (GTM) motions that help customers train, deploy, and scale models to enable transformative business applications and use cases across industry verticals. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA candidate at the Haas School of Business at the University of California, Berkeley.
Hugo Tse is a Solutions Architect at Amazon Web Services supporting independent software vendors. He strives to help customers use technology to solve challenges and create business opportunities, especially in the domains of generative AI and storage. Hugo holds a Bachelor of Arts in Economics from the University of Chicago and a Master of Science in Information Technology from Arizona State University.
Elliott Choi is a Staff Product Manager at Cohere working on the Search and Retrieval Team. Elliott holds a Bachelor of Engineering and a Bachelor of Arts from the University of Western Ontario.