From Keyword Search to Semantic (Embeddings) Search
Boxalino is proud to announce the integration of advanced Text Embedding capabilities into its suite of core technologies. This leap forward in AI-driven semantic intelligence significantly enhances e-commerce and lead-generation websites. The implementation particularly revolutionizes the in-site search experience by understanding and interpreting customer queries with unprecedented accuracy. By converting text into numerical vectors, Boxalino's system can grasp the nuanced meanings behind search terms, differentiate between synonyms and related terms, and ultimately provide more relevant search results. This not only streamlines the shopping experience for users but also aligns with business goals by connecting customers to the products they're most likely to purchase. Boxalino's commitment to innovation empowers your website to deliver smarter, faster, and more intuitive search functions, thus elevating the overall customer journey.
What are Text Embeddings and why should you care?
Text embeddings are a method used in machine learning to represent words, phrases, or entire documents as numerical vectors in a high-dimensional space. The example in the image showcases how text embeddings work by illustrating the concept of semantic similarity.
When a user searches for the word "cat," the search system needs to understand that "kitty," a synonym, is highly relevant even though the letters in the two words are completely different. Through the process of text embedding, both "cat" and "kitty" are translated into numerical vectors that are very close to each other in the vector space. This proximity reflects their semantic similarity.
On the other hand, a word like "dog," while still an animal, is less semantically similar to "cat" or "kitty." Therefore, its vector representation would be positioned farther away from "cat" and "kitty" in the embedding space.
This method enables the system to recognize that a search for "kitty" should also return products and content related to "young cats," and vice versa, even if the exact search term isn't present in the product's name or description. The ability of text embeddings to capture this semantic similarity is what makes them so powerful for enhancing search functions in AI-driven applications like e-commerce and lead-generation websites.
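To make the idea concrete, here is a minimal sketch using cosine similarity, one common way to measure how close two embeddings are. The three-dimensional vectors are made up purely for illustration; real embeddings produced by an NLP model have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 means the two vectors point in the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-made vectors purely for illustration -- real embeddings
# come from an NLP model and have far more dimensions.
cat   = np.array([0.90, 0.80, 0.10])
kitty = np.array([0.85, 0.75, 0.15])
dog   = np.array([0.40, 0.90, 0.60])

print(cosine_similarity(cat, kitty))  # ~0.999 -> semantically very close
print(cosine_similarity(cat, dog))    # ~0.82  -> related, but clearly farther away
```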
How do we process your product and content data to empower it with Text Embeddings?
The data processing workflow within the data platform is a structured and systematic approach aimed at enhancing the search capabilities of an e-commerce platform through the use of embeddings.
The process begins with the data platform collecting detailed information about every new product or piece of content exported from the e-commerce website. This data encompasses a variety of attributes, including product names, descriptions, features, categories, and relevant tags.
Once this information is gathered, a Natural Language Processing (NLP) model takes center stage. This model utilizes sophisticated text processing techniques to convert the rich textual information of product descriptions into high-dimensional numerical vectors, known as embeddings. These embeddings capture the semantic essence of the products.
After the embeddings are created, they are treated as a new attribute for each piece of content or product. This embedding attribute encapsulates the semantic properties of the items in a form that machines can swiftly process.
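As an illustrative sketch of this step, the snippet below concatenates a product's textual fields and attaches the resulting vector as a new embedding attribute. It assumes the OpenAI Ada 002 model mentioned in the conclusion; the product fields and attribute name are hypothetical, not the exact export format.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def add_embedding_attribute(product: dict) -> dict:
    """Concatenate the product's textual fields and attach an embedding attribute."""
    text = " ".join([
        product.get("name", ""),
        product.get("description", ""),
        " ".join(product.get("tags", [])),
    ])
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    product["embedding"] = response.data[0].embedding  # a list of 1536 floats
    return product

product = {
    "name": "Premium kitten food",
    "description": "Dry food for young cats, rich in protein.",
    "tags": ["cat", "kitten", "food"],
}
product = add_embedding_attribute(product)
```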
The newly minted embedding attribute is then dispatched to the real-time platform. Here, the attribute is processed and indexed in the same way as any other property, such as brand, category, or recent sales count. However, the indexing process is specialized for dense vector search, optimizing the search engine to recognize and respond to the semantic relationships within the data.
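The real-time index itself is Boxalino-internal, but as a rough sketch of what declaring an attribute for dense vector search can look like, here is an Elasticsearch-style mapping. The engine choice, index name, and field names are assumptions made purely for illustration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# The embedding attribute is declared as a dense vector so the engine can run
# nearest-neighbour (semantic) search on it, alongside ordinary properties
# such as brand, category, or recent sales count.
es.indices.create(
    index="products",
    mappings={
        "properties": {
            "name":      {"type": "text"},
            "brand":     {"type": "keyword"},
            "category":  {"type": "keyword"},
            "sales_30d": {"type": "integer"},
            "embedding": {
                "type": "dense_vector",
                "dims": 1536,            # must match the embedding model's output size
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)
```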
In summary, the data platform not only collects and processes new content but also enhances it with embeddings. This attribute is then meticulously indexed to power a robust and semantically intelligent search experience for the users of the e-commerce platform.
How do we use Text Embeddings in our Real-Time Platform?
In the real-time platform, embeddings play a critical role in enhancing the search and personalization experience on e-commerce sites. Here's how it works:
Real-Time Vector Search:
As products and content are added to the e-commerce platform, each item's textual information is transformed into an embedding vector using NLP techniques.
These embedding vectors are then indexed in the real-time platform, optimized for dense vector search.
When a user performs a search, the real-time platform retrieves the embedding of the search query. If the query's embedding is already known, it is reused directly for performance.
The platform calculates the semantic distance between the search query's embedding and the product embeddings in real-time. This helps to determine which products are most relevant to the search term, even if there isn't an exact text match.
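Conceptually, this query-time step boils down to the sketch below: reuse the query's embedding when it is already known, otherwise compute it, then score every product by cosine similarity. The cache and the stubbed embedding call are illustrative placeholders, not the platform's actual implementation.

```python
import numpy as np

query_embedding_cache: dict[str, np.ndarray] = {}  # hypothetical cache of known queries

def compute_embedding(text: str) -> np.ndarray:
    # Placeholder: in production this is the NLP model / external API call
    # shown earlier; stubbed here so the sketch stays self-contained.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=1536)

def embed_query(query: str) -> np.ndarray:
    """Reuse a known query embedding when available, otherwise compute and cache it."""
    if query not in query_embedding_cache:
        query_embedding_cache[query] = compute_embedding(query)
    return query_embedding_cache[query]

def semantic_scores(query: str, product_embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between the query and every indexed product embedding."""
    q = embed_query(query)
    q = q / np.linalg.norm(q)
    p = product_embeddings / np.linalg.norm(product_embeddings, axis=1, keepdims=True)
    return p @ q  # one score per product; higher means semantically closer
```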
Combining with Other Factors:
The real-time platform doesn't rely solely on embeddings. It combines these with other personalization and marketing rules to compute a dynamic score for each product.
This dynamic score is used to select and sort product listings, ensuring that each user sees the most relevant products based on their unique search and their profile.
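A hedged sketch of how such a dynamic score might be composed is shown below; the factor names and weights are illustrative assumptions, not Boxalino's actual scoring model.

```python
def dynamic_score(semantic: float, personalization: float, marketing_rules: float,
                  w_semantic: float = 0.6, w_personal: float = 0.3, w_rules: float = 0.1) -> float:
    """Blend semantic relevance with personalization and marketing rules into one ranking score."""
    return w_semantic * semantic + w_personal * personalization + w_rules * marketing_rules

# Product listings are then selected and sorted by this score, highest first.
products = [
    {"sku": "A", "semantic": 0.92, "personalization": 0.40, "rules": 0.10},
    {"sku": "B", "semantic": 0.81, "personalization": 0.95, "rules": 0.50},
]
ranked = sorted(
    products,
    key=lambda p: dynamic_score(p["semantic"], p["personalization"], p["rules"]),
    reverse=True,
)
```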
A/B Testing:
To measure the effectiveness of using embeddings, A/B testing is conducted. Some users are shown results influenced by embeddings, while others are shown results without the use of embeddings.
This allows for a comparison of user engagement and sales conversions, providing insights into the impact of embeddings on user experience and business metrics.
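One common way to split traffic deterministically for such a test is to hash a stable user identifier, as in the sketch below; this is an illustrative assignment scheme, not the platform's actual testing logic.

```python
import hashlib

def ab_variant(user_id: str, experiment: str = "embeddings-search") -> str:
    """Deterministically assign a user to the 'embeddings' or 'control' variant (50/50 split)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "embeddings" if int(digest, 16) % 2 == 0 else "control"

print(ab_variant("customer-42"))  # stable across sessions, so each user keeps one experience
```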
Beyond Search:
Embeddings are also leveraged for functions other than search. They are used to calculate the similarity distance between products, enabling recommendations of similar items.
They can assess the relationship between products and content, facilitating the display of relevant articles or videos to customers.
Embeddings contribute to real-time personalization by adjusting what the user sees based on their immediate interactions with the platform.
Performance and Flexibility:
The platform can call external AI APIs in real time when new data is needed, ensuring that the embeddings are always up-to-date and relevant.
This flexibility allows the platform to maintain high performance while adapting to the changing behaviors and preferences of users.
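As an illustration of that trade-off, the sketch below calls an external embedding API with a tight time budget and falls back gracefully when no answer arrives in time; the timeout value and fallback strategy are assumptions for the example, not the platform's configuration.

```python
from openai import OpenAI

# Tight budget for real-time use (illustrative values).
client = OpenAI(timeout=0.5, max_retries=0)

def embed_or_none(text: str) -> list[float] | None:
    """Call the external embedding API on demand; return None if it is unavailable or too slow."""
    try:
        response = client.embeddings.create(model="text-embedding-ada-002", input=text)
        return response.data[0].embedding
    except Exception:
        return None  # the caller can fall back to keyword-only ranking
```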
In essence, the use of embeddings within a real-time platform is a sophisticated approach that enhances user experience by delivering personalized, relevant, and semantically intelligent search results and product recommendations on e-commerce sites.
1st Use Case: Semantic Site Search
The image illustrates the relevance of semantic search for e-commerce sites. By using embeddings, a search for "hard disk" not only returns exact matches but also semantically similar products like "SSD" and "Micro SD". This happens because, in the vector space, the embeddings of "SSD", "HDD" and "Micro SD" sit closer to "hard disk" than those of unrelated products such as "processor" or "fan". Semantic search enhances the user's experience by finding a range of relevant products, even if the search term isn't explicitly mentioned in the product title or description, thus improving product discovery and potentially increasing sales.
2nd Use Case: Similar (Content<->Product) Suggestions
The second use-case for embeddings is to establish similarities between products and between products and content. When a user reads an article like "Feeding the kittens," the AI analyzes the article's embedding and compares it with the embeddings of other products and content. Products such as kitten food or accessories, or related articles with the closest embeddings, are then recommended to the user. This process eliminates the need for manual rule-setting for product and content associations, streamlining the process of providing relevant recommendations and enhancing the user experience on e-commerce platforms.
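A minimal sketch of this kind of content-to-product matching is shown below; the anchor article, the candidate items, and the top-k choice are hypothetical, and in production the embeddings come from the indexed attribute described earlier.

```python
import numpy as np

def top_similar(anchor_embedding: np.ndarray,
                candidate_embeddings: np.ndarray,
                candidate_ids: list[str],
                k: int = 3) -> list[str]:
    """Return the k candidates whose embeddings are closest (by cosine similarity) to the anchor item."""
    a = anchor_embedding / np.linalg.norm(anchor_embedding)
    c = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    scores = c @ a
    best = np.argsort(-scores)[:k]
    return [candidate_ids[i] for i in best]

# e.g. anchor = embedding of the article "Feeding the kittens",
#      candidates = embeddings of kitten food, cat accessories, dog toys, ...
```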
3rd Use Case: Advanced (Newsletter) Personalization
For advanced newsletter personalization, the AI system aggregates embeddings of a customer's purchases, as well as their recent searches and clicks. By analyzing the vector space where these embeddings reside, the AI can identify patterns and preferences, determining both enduring interests and emerging trends in the customer's behavior. This insight allows the AI to personalize content, recommending new products that align closely with the customer's profile, which enhances engagement and potentially increases sales through a tailored and relevant newsletter experience.
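A sketch of this aggregation step is shown below: the customer's recent interactions are collapsed into a single profile vector (here a recency-weighted mean, an illustrative choice rather than the production formula), and newsletter candidates are then ranked against it exactly as in the earlier similarity sketches.

```python
import numpy as np

def profile_vector(interaction_embeddings: list[np.ndarray],
                   recency_weights: list[float]) -> np.ndarray:
    """Aggregate the embeddings of a customer's purchases, searches and clicks
    into one profile vector using a recency-weighted mean."""
    stacked = np.stack(interaction_embeddings)
    weights = np.asarray(recency_weights)[:, None]
    profile = (stacked * weights).sum(axis=0) / weights.sum()
    return profile / np.linalg.norm(profile)

# Newsletter candidates are then scored by cosine similarity between this
# profile vector and the embeddings of new products.
```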
Conclusion
In conclusion, we currently utilize OpenAI's second-generation Ada 002 embedding model (text-embedding-ada-002), which has served us well thus far. However, we are excited to announce our plans to upgrade to the new generation 3 models. The new models promise enhanced accuracy, particularly for languages like German, which is crucial for providing a more nuanced and effective service for our diverse user base.
Furthermore, the new generation 3 models offer the capability for lower-dimensional embeddings, which is a significant advancement. This feature allows for more streamlined and efficient real-time performance, essential for maintaining the responsiveness our users expect from our platform.
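For instance, OpenAI's third-generation embedding models accept a dimensions parameter, so a shorter vector can be requested directly at generation time; the sketch below uses an illustrative value, not our production setting.

```python
from openai import OpenAI

client = OpenAI()

# Request a reduced-dimension embedding to trade a little fidelity
# for faster real-time vector search.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Premium kitten food, rich in protein",
    dimensions=256,  # illustrative; the model's full output is 3072 dimensions
)
embedding = response.data[0].embedding  # len(embedding) == 256
```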
By incorporating these new models, as well as models from other suppliers such as Google, we aim to strike an optimal balance between the fidelity of the embeddings and the computational efficiency required for fast, real-time applications. This upgrade underscores our commitment to staying at the forefront of AI-powered solutions, constantly improving the user experience, and providing the best possible service.