A.I. Transformer models for SMB E-Commerce
In this post we present a few possible directions for using the new A.I. Transformer in the field of E-Commerce.
The sections are in no particular order, as we are not yet sure where the realistic impact will be the biggest:
Better personalized product recommendations than NCF?
Data scientists like Denis Rothman make the case that Transformer models could be applied not to natural language, but to behavior data from e-shops, to better predict what visitors are likely to buy.
https://www.youtube.com/watch?v=QByLSKFSk0w
Other large platforms seem to do it already : https://paperswithcode.com/paper/behavior-sequence-transformer-for-e-commerce or apply it for specific cases like size : https://www.researchgate.net/publication/351342516_PreSizE_Predicting_Size_in_E-Commerce_using_Transformers or https://medium.com/nvidia-merlin/winning-the-sigir-ecommerce-challenge-on-session-based-recommendation-with-transformers-v2-793f6fac2994
We have been using Neural Collaborative Filtering (NCF) for a while as a standard AI recommendation algorithm in Boxalino: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/387121333
However, this algorithm has mainly shown good results for predicting the next purchase of logged-in customers, with less clear results for cross-selling, up-selling, and behavior-based recommendations for non-logged-in customers (which we call VJP).
It would be interesting to see whether a Transformer model could provide better results for these cases, which fall outside the typical personalized recommendations based on the purchase history (which we call CJP).
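To make the idea of treating behavior data like language more concrete, here is a minimal sketch of the input preparation only (no model). It assumes, hypothetically, that each session is already available as an ordered list of `(event_name, item_id)` pairs; the token format, the `<pad>`/`<unk>` symbols and the function names are illustrative choices, not part of any Boxalino API.

```python
from collections import Counter

# Hypothetical sessions: ordered (event_name, item_id) pairs.
sessions = [
    [("view_item", "p1"), ("view_item", "p2"), ("add_to_basket", "p2"), ("purchase", "p2")],
    [("view_item", "p2"), ("view_item", "p3")],
]

PAD, UNK = "<pad>", "<unk>"


def to_token(event_name, item_id):
    """Fuse the event type and the item into one 'word', e.g. 'view_item:p1'."""
    return f"{event_name}:{item_id}"


def build_vocab(sessions, min_count=1):
    """Map each behavior token to an integer id; rarer tokens fall back to UNK."""
    counts = Counter(to_token(*event) for session in sessions for event in session)
    vocab = {PAD: 0, UNK: 1}
    for token, count in sorted(counts.items()):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def training_pairs(session, vocab, max_len=4):
    """Yield (left-padded prefix ids, next token id), as in next-word prediction."""
    ids = [vocab.get(to_token(*event), vocab[UNK]) for event in session]
    for i in range(1, len(ids)):
        prefix = ids[max(0, i - max_len):i]
        prefix = [vocab[PAD]] * (max_len - len(prefix)) + prefix
        yield prefix, ids[i]


vocab = build_vocab(sessions)
pairs = [pair for session in sessions for pair in training_pairs(session, vocab)]
```

Each pair is one training example of the form "given the behavior so far, predict the next behavior token", which is the same objective a language model uses for the next word.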
Key opportunities:
WELCOME: Product suggestions on start pages - VJP : based on the prior visited pages and clicks
https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/384729167
UP-SELL: Similar Product Suggestions on PDP - VJP & PCC: could we find what other products / content are similar using these behaviors?
CROSS-SELL: Product Suggestions on Overlay and Basket Page - VJP & PCC: could we find what other products are good to add to what is already in the basket using these behaviors?
https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/384761963
INDIVIDUALIZE: Product Suggestions in E-Mails - VJP : same as WELCOME, but here preprocessing is less of a problem as we have the time to generate them
PROMOTE: Banner Suggestions in E-shop and E-Mail - VJP : same as WELCOME but with marketing banners and not with products
READ: Promote Blog and Editorial Content - VJP : same as WELCOME but with blog/magazine/… content and not with products
Key challenges / questions:
How to run the prediction in real time, which is optimal for VJP?
If that is not possible, which cases should be precomputed, and how should they be detected in real time so that the right one is applied?
Is the approach usable in a non-personalized way for up-selling and cross-selling? or only for VJP?
What should the Transformer predict? Is it more a short list of products to highlight (recommend or put at the top of a listing), or more generalizable rules (which would be less precise, but affect more than a short list of products), or a mix of both?
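One possible answer to the preprocessing question is sketched below, purely as an assumption: the slow model runs offline and stores its output per "context" (the last few steps of a session), and the real-time part only looks up the longest context that matches, backing off to shorter ones. The table content, the step labels such as `pdp:p1` and the function name `lookup` are all hypothetical.

```python
# Hypothetical precomputed table: key = the last steps of a session,
# value = product ids prepared offline by the (slow) model.
PRECOMPUTED = {
    ("category:cat-food", "pdp:p1"): ["p7", "p8"],
    ("pdp:p1",): ["p2", "p3"],
    (): ["p9"],  # global fallback, e.g. bestsellers
}


def lookup(recent_steps, table=PRECOMPUTED, max_context=2):
    """Return (matched context, recommendations) for the longest precomputed
    context matching the end of the session, backing off to shorter contexts."""
    context = tuple(recent_steps[-max_context:]) if max_context > 0 else ()
    while True:
        if context in table:
            return context, table[context]
        if not context:
            return context, []  # nothing precomputed at all
        context = context[1:]  # drop the oldest step and retry


matched, products = lookup(["home", "category:cat-food", "pdp:p1"])
```

The trade-off is the usual one: longer contexts are more specific but match less often, so the size of the table and the hit rate would need to be measured on real traffic.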
“Best Next Click”: Visitor Pathways Optimization
Identifying and reporting on the key patterns (positive or negative, or simply important) of the visitor journey is not easy to do in a way which shows valuable and usable insight to our clients.
Maybe a Transformer model trained on the online behaviors could highlight important navigation flows (sequences / patterns) and make suggestions where the flows are failing (dead-ends, …) or are working well and should be amplified (with more traffic).
We consider this a key opportunity for the visual guidance in the e-shop, as Boxalino can not only show products (listing and recommendations), but also suggest “next pages” either as a filtering or a change of the current context of the page.
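Before training any Transformer, a simple counting baseline can already expose dead-ends and candidate "next pages". The sketch below is not a Transformer: it only counts first-order page-to-page transitions, and it assumes sessions have been reduced to ordered lists of page URLs. The `<exit>` marker and the function names are illustrative.

```python
from collections import Counter, defaultdict

EXIT = "<exit>"  # artificial marker for "the session ended here"


def transition_stats(sessions):
    """Count page -> next page transitions; the end of a session counts as EXIT."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, list(pages[1:]) + [EXIT]):
            transitions[current][following] += 1
    return transitions


def exit_rate(transitions, page):
    """Share of visits to `page` that were the last step of their session."""
    counts = transitions.get(page, Counter())
    total = sum(counts.values())
    return counts[EXIT] / total if total else 0.0


def best_next_click(transitions, page):
    """Most frequent non-exit next page, or None if visitors only ever left."""
    candidates = [(n, c) for n, c in transitions.get(page, Counter()).items() if n != EXIT]
    return max(candidates, key=lambda pair: pair[1])[0] if candidates else None


example_sessions = [
    ["/home", "/cat", "/p1"],
    ["/home", "/cat", "/p2"],
    ["/home", "/cat", "/p1"],
    ["/home", "/search"],
]
stats = transition_stats(example_sessions)
```

In this toy data `/search` has an exit rate of 1.0, which is the kind of dead-end signal a learned sequence model would be expected to find as well, ideally with more context than a single previous page.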
As an example, a session (or a visit) can be viewed as a sequence of steps (events) mainly, but not exclusively, corresponding to a sequence of page views.
To keep it simple, we consider a page view (the event) here to be a visited URL of a website.
Each of these events represents a step in the session journey and has the following properties:
name | format | description |
---|---|---|
session_id | STRING | A unique string identifier for the session |
page_view_number | INT64 | a number from 1 to n defining the step in the session (so the first page view is 1, the second is 2, etc.) |
event_timestamp | TIMESTAMP | the timestamp of the current event |
next_event_timestamp | TIMESTAMP | the timestamp of the next event |
event | EVENT (RECORD) | details about the event |
page_location | STRING | url of the page for this event |
related_requests | ARRAY<REQUEST (RECORD)> | additional information giving us more details about what the page is and what the user has seen. Requests define what type of filters were applied (for instance a brand if the user is on a brand page) and what type of products were shown on the page |
related_events | ARRAY<EVENT (RECORD)> | additional information giving us more details about what the page is and what the user has seen. These other events give more information about the type of page (category page, search page, etc.) as well as the engagement of the user with products (and content) on the page (scroll when the content/product is displayed, or click when a content/product is clicked) |
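As an illustration of how these rows could be consumed, the following sketch models a simplified step and rebuilds ordered sessions from it. It keeps only the scalar fields of the table above (the nested `event`, `related_requests` and `related_events` records are omitted), treats `page_location` as a flat field, and assumes that `next_event_timestamp` is empty on the last step of a session; these are assumptions, not confirmed properties of the data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SessionStep:
    session_id: str
    page_view_number: int
    event_timestamp: datetime
    next_event_timestamp: Optional[datetime]  # assumed empty on the last step
    page_location: str

    def dwell_seconds(self) -> Optional[float]:
        """Time spent on this step, or None when there is no next event."""
        if self.next_event_timestamp is None:
            return None
        return (self.next_event_timestamp - self.event_timestamp).total_seconds()


def group_sessions(steps):
    """Group steps by session_id and order them by page_view_number."""
    sessions = defaultdict(list)
    for step in steps:
        sessions[step.session_id].append(step)
    for ordered in sessions.values():
        ordered.sort(key=lambda s: s.page_view_number)
    return dict(sessions)


start = datetime(2023, 1, 1, 12, 0, 0)
later = start + timedelta(seconds=30)
steps = [
    SessionStep("s1", 2, later, None, "https://shop.example/p1"),
    SessionStep("s1", 1, start, later, "https://shop.example/"),
]
by_session = group_sessions(steps)
```

The dwell time derived from the two timestamps is one example of a feature that could be attached to each step before feeding the sequence to a model.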
Detailed information about the Records (event and request):
Event
name | format | description |
---|---|---|
event_timestamp | TIMESTAMP | timestamp of the event |
event_name | STRING | name of the event |
event_params | RECORD | associative array of parameters about the event |
visitor | RECORD | information about the visitor |
device | RECORD | information about the device |
geo | RECORD | information about the geo-ip location |
traffic_source | RECORD | information about the traffic source |
ecommerce | RECORD | only for purchase event, indicate the information of the purchase revenue and other sales parameters |
items | ARRAY<RECORD> | for purchase, view_item and add_to_basket, information about the product(s) |
system | RECORD | some technical parameters |
creation_tm | TIMESTAMP | technical debugging field, please ignore |
client_id | STRING | technical debugging field, please ignore |
src_sys_id | STRING | technical debugging field, please ignore |
Request:
Name | Format | Description |
---|---|---|
request_id | STRING | a unique identifier of the request |
variant_id | STRING | the identifier of the algorithm test variant which was applied |
bundle_id | STRING | a grouping identifier (technical debugging field, please ignore) |
request_ts | TIMESTAMP | the request timestamp |
session_id | STRING | the request session id |
request_order_num | INT64 | the numeric identifier for bundling (technical debugging field, please ignore) |
choice_id | STRING | the name of the logic used (for example ‘search’ for a search result page, ‘navigation’ for a category listing page, etc.) see widget here : Narrative API - Technical Reference | Base parameters |
scenario_id | STRING | a specific scenario of the test variant algorithm which was applied |
language_cd | STRING | the language of the request, as here: Narrative API - Technical Reference | Base parameters |
groupby_cd | STRING | what field the results of the request were grouped by |
products_group_id | STRING | see details here : Narrative API - Technical Reference | Typical parameters |
offset | INT64 | see details here : Narrative API - Technical Reference | Typical parameters |
query_txt | STRING | see details here : Narrative API - Technical Reference | Typical parameters |
final_query_txt | STRING | see details here : Narrative API - Technical Reference | Typical parameters |
sort_field_name | STRING | see details here : Narrative API - Technical Reference | Typical parameters |
sort_direction_cd | STRING | see details here : Narrative API - Technical Reference | Typical parameters |
items | LIST<RECORD> | see details here : Narrative API - Technical Reference | UP SELL / CROSS SELL REQUEST |
filters | LIST<RECORD> | see details here : Narrative API - Technical Reference | Typical parameters |
facets | LIST<RECORD> | see details here : Narrative API - Technical Reference | Facets without selected Value by with additional parameters: |
parameters | LIST<RECORD> | see details here : Narrative API - Technical Reference | Typical parameters |
others | LIST<RECORD> | see details here : Narrative API - Technical Reference | Typical parameters |
response | LIST<RECORD> | the products (and other content) which were returned and shown on the page |
creation_tm | TIMESTAMP | technical debugging field, please ignore |
client_id | STRING | technical debugging field, please ignore |
src_sys_id | STRING | technical debugging field, please ignore |
Similarity between products, contents and across both
Transformer models are well adapted to NLP tasks, for which the textual descriptions of products as well as the textual content of blog articles / magazines are natural inputs.
Such content could be used for several reasons (as highlighted in other sections of the post) but one of them could be the creation of similarities / related contents between products, between contents and between products and content.
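The mechanics of such a similarity can be sketched without any model. In the example below the `embed` function is a deliberate placeholder (plain word counts) standing in for a Transformer sentence embedding; only the cosine-similarity ranking around it is the point. The document ids and texts are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Placeholder embedding: lowercase word counts. A sentence-embedding
    Transformer would return a dense vector here instead."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_id, documents, top_k=2):
    """Rank all other documents (products or content) by similarity to query_id."""
    vectors = {doc_id: embed(text) for doc_id, text in documents.items()}
    scores = [
        (cosine(vectors[query_id], vector), doc_id)
        for doc_id, vector in vectors.items()
        if doc_id != query_id
    ]
    return [doc_id for _, doc_id in sorted(scores, reverse=True)[:top_k]]


documents = {
    "product:1": "Dry food for adult cats with chicken",
    "product:2": "Chew toy for dogs",
    "blog:1": "How to choose dry food for adult cats",
}
related = most_similar("product:1", documents)
```

Because products and blog articles share the same vector space, the same ranking covers product-to-product, content-to-content and product-to-content relations.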
Generate Texts for SEO
Our clients use SEO tools like SEMRush which could be (but are not yet) exported to Google BigQuery (https://www.semrush.com/kb/5-api ).
Our clients who use such tools are able to identify many (small) opportunities, each of which they have to address manually.
Generating SEO improvements (new texts, better texts based on suggested keywords, etc.) could be a massive win.
Boxalino could automate the implementation of these SEO texts on the website and thereby provide a feedback loop: if the SEO data is exported to BigQuery, the results can be checked and fine-tuned automatically.
The logic could also simply improve / enrich the product textual description as described here:
It is also important to mention that we (sometimes) have the product descriptions of competitors, so comparing the texts and making the client's text more unique could be a strong advantage.
A/B test different texts
Boxalino can easily provide different textual content for a product page or a content page in a personalized or a/b tested way.
Currently, this option is not used because of the challenge of automatically creating an alternative version of the textual description of thousands of products and/or contents.
Being able to “change the tone”, “change the style”, or apply other types of systematic changes to the textual content could be very interesting for A/B testing and for personalization.
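Once alternative texts exist, each visitor needs to see the same variant consistently. One common way to do this, shown here as a generic sketch and not as Boxalino's actual mechanism, is to hash the visitor and experiment identifiers into a variant index. The identifiers and variant names are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment, variants):
    """Deterministically map a visitor to one text variant of an experiment."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


variants = ["original", "casual-tone"]
first = assign_variant("visitor-42", "pdp-text-tone", variants)
second = assign_variant("visitor-42", "pdp-text-tone", variants)
seen = {assign_variant(f"visitor-{i}", "pdp-text-tone", variants) for i in range(1000)}
```

Including the experiment name in the hash keeps assignments independent between experiments, so a visitor in the "casual" group of one test is not systematically in the same group of the next.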
Generate Texts for Landing Pages
Boxalino generates many landing pages for which a list of products is displayed. Adding automatically generated texts to these landing pages would make them directly more effective for SEO and SEA.
Here is an example showing a list of products and 2 random banners on top (which is not optimal):
https://www.mcdrogerie.ch/t-Eisen
Generate Small Netflix-like label from filter
Boxalino mechanically generates small prompts on the home page of e-shops like Qualipet based on filters.
For example, “NEUHEITEN VON HARMONY FÜR IHRE KATZE” (“New from Harmony for your cat”)
is generated from a filter on the property “new”, the brand “Harmony” and the animal “cat”.
The mechanism is very limited and can easily generate non-optimal prompts.
It would be interesting to see if a Transformer model would be able to generate better prompts.
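To show why a mechanical approach is limited, here is a hypothetical template-based generator that reproduces the example above. The filter keys, the phrase table and the fallback word are assumptions made for illustration; the actual mechanism is not documented here.

```python
# Illustrative phrase table; a real one would have to cover every filter value.
ANIMAL_PHRASES = {"cat": "FÜR IHRE KATZE", "dog": "FÜR IHREN HUND"}


def build_label(filters):
    """Concatenate fixed text fragments based on the active filters."""
    parts = ["NEUHEITEN" if filters.get("new") else "PRODUKTE"]
    if "brand" in filters:
        parts.append(f"VON {filters['brand'].upper()}")
    if filters.get("animal") in ANIMAL_PHRASES:
        parts.append(ANIMAL_PHRASES[filters["animal"]])
    return " ".join(parts)


label = build_label({"new": True, "brand": "Harmony", "animal": "cat"})
incomplete = build_label({"brand": "Harmony", "animal": "rabbit"})
```

The second call silently drops the animal because "rabbit" is missing from the phrase table, which is the kind of non-optimal output a generative model might handle more gracefully.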
Emotional analysis of user comments
Better understanding for which products (or groups of products) user comments are positive or negative could be valuable in many respects (as described here):
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0247984