Using Azure Cognitive Search Capabilities for B2B eCommerce Catalog

In the first article about the B2B portal catalog search service, we discussed the benefits of the Elasticsearch engine. It is a popular, fast search engine that works well with structured data. In B2B, however, a lot of unstructured data is associated with the products listed in the catalog. Manuals, photos, and even handwritten notes can enrich product descriptions, improve the user experience, and help customers make their choice.

In this article, we explore the Azure Cognitive Search capabilities that leverage hosted AI services. For a B2B ecommerce portal, Azure Cognitive Search lets you quickly index unstructured data in many formats, so that your clients can mine knowledge from previously untapped information inside various client-oriented databases.

Azure Cognitive Search is a Platform as a Service (PaaS) solution that lets you build sophisticated search capabilities into your applications, on top of your own data. Search is easy to integrate into your line-of-business ecommerce applications.

The real advantage of the Azure Cognitive Search platform is that it lets you bring in data in many different formats and index it so that it becomes discoverable and searchable using AI.

Developers are offered enrichment extensions called "cognitive skills" that extract information from different types of media, covering vision, language, and speech. Custom machine learning models can also be plugged in to extract information from all types of content. In addition, Azure Cognitive Search offers Semantic Search, which uses advanced machine learning techniques to understand a user's area of interest and applies contextual ranking to show the most relevant search results first.

Most B2B vendors that have been in business for a while estimate that at least 80% of their data is unstructured and sits in documents of various types that contain valuable information. Because the data is unstructured, it was hard to use this information for product catalog enrichment before the release of Azure Cognitive Search.

Azure Cognitive Search AI-driven capabilities

The Azure Cognitive Search service allows you to quickly discover, enrich, and explore your data. You can automatically pull data from Azure data stores such as Azure Blob Storage, Azure SQL Database, Azure Cosmos DB, and more. Supported sources include not only databases (Azure Cosmos DB, Azure SQL Database, SQL Server hosted in an Azure VM) but also storage services (Azure Blob Storage, Azure Table Storage). The service is flexible enough that you can optionally push data directly into the search index from other locations using the push API.
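To make the push option more concrete, here is a minimal sketch of how a push request could be assembled in Python. It assumes a hypothetical service called my-search-service and an index called products with a key field named id; these names and the document fields are ours, not part of any real catalog. The sketch only builds the URL and JSON body for the REST "index documents" operation and leaves the actual HTTP call (which also needs an api-key header) to the client of your choice.

```python
import json

# Hypothetical values for illustration only; substitute your own service and index.
SERVICE_NAME = "my-search-service"
INDEX_NAME = "products"
API_VERSION = "2020-06-30"


def build_push_request(documents, action="mergeOrUpload"):
    """Build the URL and JSON body for a push (index documents) request."""
    url = (
        f"https://{SERVICE_NAME}.search.windows.net"
        f"/indexes/{INDEX_NAME}/docs/index?api-version={API_VERSION}"
    )
    # Each document carries an @search.action that tells the service
    # whether to upload, merge, mergeOrUpload, or delete it.
    body = {"value": [{"@search.action": action, **doc} for doc in documents]}
    return url, json.dumps(body)


url, payload = build_push_request(
    [{"id": "sku-1001", "name": "Hydraulic pump", "material": "cast iron"}]
)
print(url)
print(payload)
```

The mergeOrUpload action is used as the default here because it updates a document when the key already exists and creates it otherwise, which suits repeated catalog synchronization.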

The prebuilt cognitive skills integrated into the Azure Cognitive Search platform add further enrichment in the form of searchable metadata. These skills explore multimedia sources to extract key phrases, generate image tags, detect the language of a text, and more. Different file formats are supported, including PDF, Word documents, images, and JSON files. For example, a language detection skill can be added to the pipeline to help return relevant search results.

In addition to the out-of-the-box skills, custom skills can be defined and backed by your own machine learning model. For example, for a B2B portal in manufacturing, the materials used in a specific product could be extracted from vendor documentation and returned in response to a search query.

Combined, these capabilities give you a powerful search experience. Imagine you use Azure Cognitive Search to develop a search application in which a product search should return the technical manuals associated with the product. As documents are ingested into the Azure platform, they are processed and categorized according to the search index.
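As an illustration of the manual lookup scenario, the sketch below builds the JSON body for a search request that returns manuals for a given product. The field names (id, title, productId, keyPhrases) are hypothetical and would have to exist in your own index, with productId marked as filterable. The body is intended for a POST to the /indexes/{index}/docs/search endpoint of the REST API.

```python
import json


def build_manual_search(query, product_id=None, top=5):
    """Build a search request body that returns manuals matching a product query."""
    body = {
        "search": query,
        "searchMode": "all",  # every query term must match
        "select": "id,title,productId,keyPhrases",  # hypothetical index fields
        "top": top,
        "count": True,  # also return the total number of matches
    }
    if product_id is not None:
        # OData filter; single quotes inside string literals are escaped by doubling.
        safe_id = product_id.replace("'", "''")
        body["filter"] = f"productId eq '{safe_id}'"
    return body


print(json.dumps(build_manual_search("seal replacement", product_id="sku-1001"), indent=2))
```

Keeping the product restriction in a filter rather than in the search text means the filter narrows the candidate set exactly, while the free-text part is still ranked by relevance.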

Azure’s out-of-the-box cognitive skills, such as key phrase extraction and named entity recognition, generate a rich corpus of metadata. Beyond the out-of-the-box skills, you can customize the cognitive search pipeline to extract and enrich metadata specific to your business. This process can be done programmatically, as the Azure Cognitive Search service is well integrated with Azure databases.
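The following sketch shows what a skillset definition with a few built-in skills might look like when assembled programmatically. The skillset name and the target field names are assumptions made for this example, and the skill type identifiers reflect the built-in skill reference as we understand it, so check them against the current documentation before use. The resulting dictionary would be sent as JSON to the skillsets endpoint of the REST API.

```python
import json


def build_skillset(name="catalog-enrichment"):
    """Return a skillset definition that chains three built-in text skills."""
    text_input = {"name": "text", "source": "/document/content"}
    return {
        "name": name,  # hypothetical skillset name
        "description": "Language, key phrases, and entities for catalog documents",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
                "context": "/document",
                "inputs": [text_input],
                "outputs": [{"name": "languageCode", "targetName": "languageCode"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                # The detected language from the first skill feeds this one.
                "inputs": [
                    text_input,
                    {"name": "languageCode", "source": "/document/languageCode"},
                ],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
                "context": "/document",
                "categories": ["Organization", "Location"],
                "inputs": [text_input],
                "outputs": [
                    {"name": "organizations", "targetName": "organizations"},
                    {"name": "locations", "targetName": "locations"},
                ],
            },
        ],
    }


print(json.dumps(build_skillset(), indent=2))
```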

If you don't need an ML model, deploying an Azure Function is the quickest and easiest way to create a custom skill. Creating tags for domain-specific terms, such as product materials, is an example of what such a function can do. You then add the function to a skillset as a custom skill, which connects it programmatically to the document enrichment pipeline.
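The core of such a material-tagging skill could look roughly like the sketch below. It covers only the request and response handling that a custom Web API skill is expected to follow (a list of values, each with a recordId and a data object); the Azure Functions hosting code is omitted. The material vocabulary, the input name text, and the output name materials are all assumptions made for this example.

```python
# Hypothetical domain vocabulary; a real catalog would load this from its own data.
MATERIALS = {"steel", "aluminum", "copper", "brass", "pvc"}


def run_material_skill(request_body):
    """Tag each incoming record with the known materials mentioned in its text."""
    results = []
    for record in request_body.get("values", []):
        text = record.get("data", {}).get("text") or ""
        # Naive tokenization: split on whitespace and trim common punctuation.
        words = {w.strip(".,;:()").lower() for w in text.split()}
        results.append(
            {
                "recordId": record["recordId"],  # must echo the incoming record ID
                "data": {"materials": sorted(MATERIALS & words)},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}


sample = {
    "values": [
        {"recordId": "1", "data": {"text": "Housing: stainless Steel, brass fittings."}}
    ]
}
print(run_material_skill(sample))
```

In a deployed function, the HTTP trigger would parse the request JSON, pass it to a function like this one, and return the result as the response body. The skillset would then reference the function URL through a custom Web API skill entry.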

For files uploaded to Azure Blob Storage, OCR (Optical Character Recognition) can be applied. It recognizes both printed and handwritten text (handwriting so far only in English). With the help of Cognitive Services, it is also possible to identify various objects in a photo, for example, famous places or celebrities.
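A rough sketch of how OCR might be wired into a blob indexing pipeline is shown below. It defines an OCR skill that reads the images the indexer extracts from each blob, plus an indexer definition that turns image extraction on. The indexer, data source, index, and skillset names, as well as the ocrText target field, are placeholders chosen for this example.

```python
import json

# OCR skill that runs once per image extracted from a document.
OCR_SKILL = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "detectOrientation": True,
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}


def build_ocr_indexer(name, data_source, index, skillset):
    """Return an indexer definition that extracts images so the OCR skill can run."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        "parameters": {
            "configuration": {
                "dataToExtract": "contentAndMetadata",
                # Without this setting no normalized images are produced for OCR.
                "imageAction": "generateNormalizedImages",
            }
        },
        "outputFieldMappings": [
            {
                # One text value per image, so the target field should be a
                # string collection (ocrText is a hypothetical field name).
                "sourceFieldName": "/document/normalized_images/*/text",
                "targetFieldName": "ocrText",
            }
        ],
    }


indexer = build_ocr_indexer("manuals-indexer", "manuals-blobs", "products", "catalog-enrichment")
print(json.dumps(indexer, indent=2))
```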

To summarize, consider using the built-in cognitive skills if your original content consists of unstructured text, images, or content that requires language detection and translation. Consider adding a custom skill if you have open source, third-party, or in-house code that you want to integrate into the search pipeline, for example, classification models that identify the characteristics of different types of documents.

A few practical links to learn about Azure Cognitive Search

Azure Cognitive Search has a free plan that allows you to create indexes of a limited size. The free plan does not include any advanced features, but it is quite suitable for getting started, especially for learning the Azure Cognitive Search capabilities.

To learn more about how to create ML-driven custom skills, please visit https://docs.microsoft.com/en-us/azure/search/cognitive-search-create-custom-skill-example.

Visit Azure samples on GitHub to get code and manuals leveraging Azure Cognitive Search: https://github.com/Azure-Samples/azure-search-knowledge-mining/tree/main/00%20-%20Resource%20Deployment.

To learn how Azure-driven catalog search works in Virto Commerce, including its architectural details, please visit the Search Fundamentals section of the platform documentation.

Oleg Zhuk
Technical Product Owner