A.I. Transformer models for SMB E-Commerce

In this post we present a few possible directions for using the new A.I. Transformer models in the field of e-commerce.

The sections are in no particular order, as we are not yet sure where the realistic impact will be the biggest:

Better personalized product recommendations than NCF?

Data scientists like Denis Rothman make the case that Transformer models could be applied not to natural language but to behavior data from e-shops, to better predict what visitors are likely to buy.

https://www.youtube.com/watch?v=QByLSKFSk0w

Large platforms already seem to do this (https://paperswithcode.com/paper/behavior-sequence-transformer-for-e-commerce ), and others apply it to specific cases such as size prediction (https://www.researchgate.net/publication/351342516_PreSizE_Predicting_Size_in_E-Commerce_using_Transformers ) or session-based recommendation (https://medium.com/nvidia-merlin/winning-the-sigir-ecommerce-challenge-on-session-based-recommendation-with-transformers-v2-793f6fac2994 ).

We have been using Neural Collaborative Filtering (NCF) for a while as a standard AI recommendation algorithm at Boxalino: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/387121333

However, this algorithm has mainly shown good results for predicting the next purchase of logged-in customers. Results were less clear for cross-selling, up-selling, and behavior-based recommendations for visitors who are not logged in (which we call VJP).

It would be interesting to see whether a Transformer model could provide better results for cases other than the typical personalized recommendations based on the purchase history (which we call CJP).
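To make the idea concrete: a sequence model would treat each visitor's behavior as an ordered list of product ids, much as a language model treats a sentence as a list of tokens. The sketch below, in plain Python, only prepares such data as (context, next item) training pairs and does not train any model. The function names, the `max_context` parameter and the sample product ids are hypothetical and not taken from any Boxalino system.

```python
from typing import Dict, List, Tuple

PAD = 0  # token id reserved for padding; product tokens start at 1


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Map each product id to an integer token, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for product_id in seq:
            if product_id not in vocab:
                vocab[product_id] = len(vocab) + 1
    return vocab


def next_item_pairs(
    seq: List[str], vocab: Dict[str, int], max_context: int = 3
) -> List[Tuple[List[int], int]]:
    """Turn one behavior sequence into (left-padded context, next item) pairs."""
    tokens = [vocab[p] for p in seq]
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - max_context):i]
        context = [PAD] * (max_context - len(context)) + context
        pairs.append((context, tokens[i]))
    return pairs


# Hypothetical sample data: two short behavior sequences.
sequences = [["sku_a", "sku_b", "sku_c"], ["sku_b", "sku_d"]]
vocab = build_vocab(sequences)
pairs = next_item_pairs(sequences[0], vocab)
```

Each pair asks the model to predict the next product from the preceding ones, which is the same framing as next-token prediction in language modeling.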

Key opportunities:

Key challenges / questions:

  • How can the prediction run in real time, which would be optimal for VJP?

  • If that is not possible, which cases should be precomputed, and how should they be detected in real time so that they can be applied?

  • Is the approach usable in a non-personalized way for up-selling and cross-selling, or only for VJP?

  • What should the Transformer predict: a short list of products to highlight (recommend or put at the top of a listing), more generalizable rules (less precise, but affecting more than a short list of products), or a mix of both?
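One possible shape for the precomputation option raised in the questions above is a lookup table built offline and queried in real time by the visitor's most recent views. The following is a minimal sketch under that assumption; the table contents, the `recommend` function, the key length and the fallback list are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical output of an offline batch job: recommendations keyed by the
# most recent products a visitor viewed (oldest first).
PRECOMPUTED: Dict[Tuple[str, ...], List[str]] = {
    ("sku_a", "sku_b"): ["sku_c", "sku_d"],
    ("sku_b",): ["sku_e"],
}
FALLBACK: List[str] = ["sku_top1", "sku_top2"]  # e.g. non-personalized bestsellers


def recommend(recent_views: List[str], max_key_len: int = 2) -> List[str]:
    """Match the longest suffix of the session first, then shorter ones."""
    for n in range(min(max_key_len, len(recent_views)), 0, -1):
        key = tuple(recent_views[-n:])
        if key in PRECOMPUTED:
            return PRECOMPUTED[key]
    return FALLBACK
```

The real-time part is then a dictionary lookup, so the cost of running the model is paid offline. The trade-off is coverage: any session whose recent views were not precomputed falls back to the non-personalized list.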

“Best Next Click”: Visitor Pathways Optimization

Identifying and reporting on the key patterns of the visitor journey (positive, negative, or simply important) is hard to do in a way that gives our clients valuable and usable insight.

Maybe a Transformer model trained on online behavior could highlight important navigation flows (sequences / patterns) and suggest where flows are failing (dead-ends, …) or are working well and should be amplified (with more traffic).
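Before training any model, a simple counting baseline can already surface candidate dead-ends, and it gives a reference point to compare a Transformer against. The sketch below is such a baseline and not a Transformer: it computes the exit rate per page and the number of page-to-page transitions. The function names and the sample paths are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def exit_rates(sessions: List[List[str]]) -> Dict[str, float]:
    """Share of visits to each page that were the last step of a session."""
    visits: Counter = Counter()
    exits: Counter = Counter()
    for pages in sessions:
        visits.update(pages)
        if pages:
            exits[pages[-1]] += 1
    return {page: exits[page] / visits[page] for page in visits}


def transition_counts(sessions: List[List[str]]) -> Counter:
    """Count how often each (page, next page) pair occurs."""
    counts: Counter = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical sample sessions, each an ordered list of visited paths.
sessions = [
    ["/home", "/cat/dogs", "/p/1"],
    ["/home", "/search"],
    ["/home", "/search"],
    ["/home", "/cat/dogs", "/cart"],
]
rates = exit_rates(sessions)
transitions = transition_counts(sessions)
```

In this sample every visit to `/search` ends the session, which would flag it for review. A high exit rate is only a hint, since some pages (an order confirmation, for instance) are natural end points.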

We consider this a key opportunity for the visual guidance in the e-shop, as Boxalino can not only show products (listing and recommendations), but also suggest “next pages” either as a filtering or a change of the current context of the page.

As an example, a session (or visit) can be viewed as a sequence of steps (events) that mostly, but not always, correspond to a sequence of page views.

To keep the description simple, we treat a page view (the event) as a visited URL of a website.

Each of these events represents a step in the session journey and has the following properties:

| name | format | description |
| --- | --- | --- |
| session_id | STRING | a unique string identifier for the session |
| page_view_number | INT64 | a number from 1 to n defining the step in the session (the first page view is 1, the second is 2, etc.) |
| event_timestamp | TIMESTAMP | the timestamp of the current event |
| next_event_timestamp | TIMESTAMP | the timestamp of the next event |
| event | EVENT (RECORD) | details about the event |
| page_location | STRING | URL of the page for this event |
| related_requests | ARRAY<REQUEST (RECORD)> | additional information about what the page is and what the user has seen. Requests describe which filters were applied (for instance a brand, if the user is on a brand page) and which products were shown on the page |
| related_events | ARRAY<EVENT (RECORD)> | additional information about what the page is and what the user has seen. These other events describe the type of page (category page, search page, etc.) as well as the user's engagement with products (and content) on the page (scroll when a content/product is displayed, click when a content/product is clicked) |
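As an illustration, the scalar properties listed above can be modeled as a small Python dataclass, from which the time spent on each step follows directly. This is only a sketch: the class and method names are hypothetical, the nested records are left out, and it assumes that `next_event_timestamp` is empty on the last step of a session, which the description above does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SessionStep:
    session_id: str
    page_view_number: int
    event_timestamp: datetime
    # Assumption: empty (None) when the step is the last one of the session.
    next_event_timestamp: Optional[datetime]
    page_location: str

    def dwell_seconds(self) -> Optional[float]:
        """Seconds between this event and the next one, if there is a next one."""
        if self.next_event_timestamp is None:
            return None
        return (self.next_event_timestamp - self.event_timestamp).total_seconds()


# Hypothetical sample step.
step = SessionStep(
    session_id="s1",
    page_view_number=1,
    event_timestamp=datetime(2023, 1, 1, 10, 0, 0, tzinfo=timezone.utc),
    next_event_timestamp=datetime(2023, 1, 1, 10, 0, 45, tzinfo=timezone.utc),
    page_location="https://example.com/",
)
dwell = step.dwell_seconds()
```

The time spent per step is one candidate input feature for a sequence model, alongside the page itself.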

 

Detailed information about the Records (event and request):

Event

| name | format | description |
| --- | --- | --- |
| event_timestamp | TIMESTAMP | timestamp of the event |
| event_name | STRING | name of the event: page_view (visit a page); add_to_cart (add a product to the cart); purchase (make a purchase); view_search (visit the search page, with listing of results); scroll (some content/product appears on the screen); click (some content/product is clicked); login (the visitor logged in); view_item (visited a product page); view_listing (visited a category listing page) |
| event_params | RECORD | associative array of parameters about the event |
| visitor | RECORD | information about the visitor |
| device | RECORD | information about the device |
| geo | RECORD | information about the geo-ip location |
| traffic_source | RECORD | information about the traffic source |
| ecommerce | RECORD | only for the purchase event: the purchase revenue and other sales parameters |
| items | ARRAY<RECORD> | for purchase, view_item and add_to_cart: information about the product(s) |
| system | RECORD | some technical parameters |
| creation_tm | TIMESTAMP | technical debugging field, please ignore |
| client_id | STRING | technical debugging field, please ignore |
| src_sys_id | STRING | technical debugging field, please ignore |
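Since the related events carry the page type, one way to turn a step into a compact token for a sequence model is to derive a coarse page type from the event names. The sketch below assumes that a step's related events are available as a list of `event_name` values. The mapping labels ("search", "product", "listing", "other") and the function name are hypothetical.

```python
from typing import Dict, Iterable

# Event names from the list above that identify the kind of page.
# The labels on the right are hypothetical.
PAGE_TYPE_EVENTS: Dict[str, str] = {
    "view_search": "search",
    "view_item": "product",
    "view_listing": "listing",
}


def page_type(related_event_names: Iterable[str]) -> str:
    """Return the page type given by the first page-type event, else 'other'."""
    for name in related_event_names:
        if name in PAGE_TYPE_EVENTS:
            return PAGE_TYPE_EVENTS[name]
    return "other"


# Hypothetical related events of one step.
example_type = page_type(["scroll", "view_listing", "click"])
```

A session then becomes a short sequence such as listing, product, product, search, which is far smaller than the space of raw URLs and may generalize better across shops.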

Request:

| Name | Format | Description |
| --- | --- | --- |
| request_id | STRING | a unique identifier of the request |
| variant_id | STRING | the identifier of the algorithm test variant which was applied |
| bundle_id | STRING | a grouping identifier (technical debugging field, please ignore) |
| request_ts | TIMESTAMP | the request timestamp |
| session_id | STRING | the request session id |
| request_order_num | INT64 | the numeric identifier for bundling (technical debugging field, please ignore) |
| choice_id | STRING | the name of the logic used (for example ‘search’ for a search result page, ‘navigation’ for a category listing page, etc.); see "widget" under "Base parameters" in the Narrative API - Technical Reference |
| scenario_id | STRING | a specific scenario of the test variant algorithm which was applied |
| language_cd | STRING | the language of the request; see "Base parameters" in the Narrative API - Technical Reference |
| groupby_cd | STRING | the field by which the results of the request were grouped |
| products_group_id | STRING | see "Typical parameters" in the Narrative API - Technical Reference |
| offset | INT64 | see "Typical parameters" in the Narrative API - Technical Reference |
| query_txt | STRING | see "Typical parameters" in the Narrative API - Technical Reference |
| final_query_txt | STRING | see "Typical parameters" in the Narrative API - Technical Reference |
| sort_field_name | STRING | see "Typical parameters" in the Narrative API - Technical Reference |
| sort_direction_cd | STRING | see "Typical parameters" in the Narrative API - Technical Reference |
| items | LIST<RECORD> | see "UP SELL / CROSS SELL REQUEST" in the Narrative API - Technical Reference |
| filters | LIST<RECORD> | see "Typical parameters" in the Narrative API - Technical Reference |
| facets | LIST<RECORD> | see "Facets without selected Value by with additional parameters" in the Narrative API - Technical Reference |
| parameters | LIST<RECORD> | see "Typical parameters" in the Narrative API - Technical Reference |
| others | LIST<RECORD> | see "Typical parameters" in the Narrative API - Technical Reference |
| response | LIST<RECORD> | the products (and other content) which were returned and shown on the page |
| creation_tm | TIMESTAMP | technical debugging field, please ignore |
| client_id | STRING | technical debugging field, please ignore |
| src_sys_id | STRING | technical debugging field, please ignore |

Similarity between products, contents and across both

Transformer models are well suited to NLP, so they can work with the textual descriptions of products as well as the textual content of blog or magazine articles.

Such content could be used for several purposes (as highlighted in other sections of this post). One of them is the creation of similarities / related contents between products, between contents, and between products and contents.
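Whatever model produces the text vectors, the similarity step itself usually comes down to comparing vectors, for example with cosine similarity. In the sketch below a plain word-count vector stands in for a Transformer embedding, so that the example runs without any model. The function names, the ids and the sample texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Stand-in for a Transformer encoder: a simple word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(query_id: str, texts: Dict[str, str]) -> str:
    """Return the id of the text most similar to the query text."""
    query = embed(texts[query_id])
    scores = {k: cosine(query, embed(v)) for k, v in texts.items() if k != query_id}
    return max(scores, key=scores.get)


# Hypothetical mix of product descriptions and an article.
texts = {
    "product:1": "dry food for adult cats",
    "product:2": "leather leash for large dogs",
    "article:7": "how to choose food for cats",
}
closest = most_similar("product:1", texts)
```

Because products and articles share one vector space, the same function covers product-to-product, content-to-content and product-to-content similarity. Swapping `embed` for a real sentence encoder would let it match texts that share meaning but not words.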

Generate Texts for SEO

Our clients use SEO tools like SEMRush, whose data could be (but is not yet) exported to Google BigQuery (https://www.semrush.com/kb/5-api ).

Clients who use such tools can identify many (small) opportunities, but they have to act on each one manually.

Automatically generating SEO improvements (new texts, better texts based on suggested keywords, etc.) could be a massive win.

Boxalino could automate the publication of these SEO texts on the website. If the SEO data is exported to BigQuery, this would also provide a feedback loop to check and fine-tune the results automatically.

The logic could also simply improve / enrich the product textual description as described here:

https://machine-learning-company.nl/technisch/transformer-based-language-models-for-large-scale-e-commerce-product-data-enrichment/

It is also worth mentioning that we (sometimes) have the product descriptions of competitors. Comparing the texts and making ours more unique could be a strong advantage.

A/B test different texts

Boxalino can easily provide different textual content for a product page or a content page in a personalized or a/b tested way.

Currently, this option is not used because of the challenge of automatically creating an alternative version of the textual description of thousands of products and/or contents.

Being able to “change the tone”, “change the style”, or apply other systematic changes to the textual content could be very interesting for A/B testing and for personalization.
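Once alternative texts exist, each visitor has to see the same variant on every visit for the test to be meaningful. A common way to achieve this without storing state is to hash the visitor id. The sketch below shows that technique in general terms and is not a description of how Boxalino assigns variants; the function name, the test name and the variant labels are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(visitor_id: str, test_name: str, variants: Sequence[str]) -> str:
    """Deterministically map a visitor to one text variant of a given test."""
    # Including the test name keeps assignments independent between tests.
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical variants of a product description.
variants = ["original", "casual_tone"]
chosen = assign_variant("visitor-42", "description-tone", variants)
```

The same visitor and test name always yield the same variant, and a large population is split roughly evenly across the variants.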

Generate Texts for Landing Pages

Boxalino generates many landing pages for which a list of products is displayed. Adding automatically generated texts to these landing pages would make them directly more effective for SEO and SEA.

Here is an example showing a list of products and 2 random banners on top (which is not optimal):

https://www.mcdrogerie.ch/t-Eisen

Generate Small Netflix-like label from filter

Boxalino mechanically generates small prompts, based on filters, on the home page of e-shops like Qualipet:

https://www.qualipet.ch/

For example, “NEUHEITEN VON HARMONY FÜR IHRE KATZE” is generated from a filter on the property “new”, the brand “Harmony” and the animal “cat”.

The mechanism is very limited and can easily generate non-optimal prompts.

It would be interesting to see if a Transformer model would be able to generate better prompts.
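To show why a mechanical approach is limited, here is a minimal template-based sketch that reproduces the example above. It is an illustration only and not Boxalino's actual implementation; the filter keys, the German fragments and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical German fragments per filter value. Every new value needs a
# hand-written entry, which is one reason the approach does not scale well.
PROPERTY_TEXT: Dict[str, str] = {"new": "NEUHEITEN"}
ANIMAL_TEXT: Dict[str, str] = {"cat": "FÜR IHRE KATZE", "dog": "FÜR IHREN HUND"}


def build_label(filters: Dict[str, str]) -> str:
    """Concatenate one fixed fragment per known filter, in a fixed order."""
    parts: List[str] = []
    if filters.get("property") in PROPERTY_TEXT:
        parts.append(PROPERTY_TEXT[filters["property"]])
    if "brand" in filters:
        parts.append("VON " + filters["brand"].upper())
    if filters.get("animal") in ANIMAL_TEXT:
        parts.append(ANIMAL_TEXT[filters["animal"]])
    return " ".join(parts)


label = build_label({"property": "new", "brand": "Harmony", "animal": "cat"})
brand_only = build_label({"brand": "Harmony"})
```

With all three filters the template yields the label quoted above, but with only the brand filter it yields the fragment "VON HARMONY", which is not a usable label. A generative model would be expected to handle such combinations more gracefully.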

Emotional analysis of user comments

Better understanding for which products (or groups of products) user comments are positive or negative could be valuable in many respects, as described here:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0247984