...
Product properties
the item brand and categories are indicated in GA4 view_item events, but Boxalino has a lot more product attribute from the e-shop, including (the properties would be provided as an associate array with property names and values: ARRAY<STRUCT<name, ARRAY<string> values>>):brand
category (several levels)
price (before or after discount)
number of stars
state of discount (in discount and how much)
pictures (number of pictures, urls, …)
description (to see if the length or other aspects could play a role)
delivery state (in stock, …)
…
in example after: product_properties : have string properties for the recommended products
GA4 data
view_item indicating that it is a PDP
traffic source
device
geo
…
in the data : all the ga4 view_item event are therepage_views (to see what page view happens before / after, because it is maybe very different if it is a landing page)
traffic source
device
geo
…
in the data: array_length of prior_pages
fyi- the sample data after are computed (except for the product_properties) fully on the basis of GA4 data.
Performance metrics:
Engagement rate / Bounce rate
in the data: next_pages would be emptyNumber of page views
in the data: array_length of next_pagessession conversion rate
in the data: count(distinct session_id) / count(distinct next_purchases.ecommerce.transaction_Id)in scope conversion rate (not only there is a conversion, but there is a conversion with this product)
session $ in the data: compare the main item with the items of the next purchase itemssession $ value (average value per session)
in the data: next_purchases.ecommerce.purchase_revenue and session_idAOV (averge order value)
Factor analysis of parameters affecting the performance of product recommendations on the PDP
Identify automatically what are the factors of a good performance or a bad performance of a product recommendation
Example: recommending products which have bad ratings might be a bad idea
Example: recommending products of the same brand than the one of the currently viewed product might be a good idea for some brands and a bad idea for other brands
These parameters can be related to the product data of either the PDP or of the recommended products, as well as the session data (device, …) or the combination of them (e.g.: on products of the brand X recommending products of the brand Y doesn’t work)
Data Available:
Product properties (for both the PDP and the recommended products)
the item brand and categories are indicated in GA4 view_item events, but Boxalino has a lot more product attribute from the e-shop, including (the properties would be provided as an associate array with property names and values: ARRAY<STRUCT<name, ARRAY<string> values>>):number of stars
state of discount (in discount and how much)
pictures (number of pictures, urls, …)
description (to see if the length or other aspects could play a role)
delivery state (in stock, …)
…
Boxalino tracking data
the display of each recommended product
the click on each recommended product
the add-to-basket following a click on the product
the purchase following a click and an add-to-basket on the product
test variant (it is often the case that there are several algorithms tested as different test variant, we could use the test variant as a factor to see if it makes a different (positive or negative) for specific products)
GA4 data
view_item indicating that it is a PDP
page_views (to see what page view happens before / after, because it is maybe very different if it is a landing page)
traffic source
device
geo
…
Performance metrics:
Click-Through-Rate (the number of times the recommended product is clicked compared to the number of times it is shown)
Add-To-Basket-RAte (the number of times the recommended product is clicked and then added to the basket compared to the number of times it is shown)
Buy-Through-Rate (the number of times the recommended product is clicked and then added to the basket and then bought compared to the number of times it is shown)
Display $ Value (the average value bought by the Buy-Throughs compared to the number of times the product is shown, so the average value of showing the product as a recommendation)
Engagement rate / Bounce rate
Number of page views
session conversion rate
session $ value (average value per session)
AOV (averge order value)in the data: next_purchases.ecommerce.purchase_revenue / next_purchases.ecommerce.transaction_id
Expand | |||||
---|---|---|---|---|---|
| |||||
|
First attempts
As the first attempt, we did a logistic regression based on the basic item properties of GA4 (brand, category, and price):
Code Block |
---|
CREATE OR REPLACE TABLE `@reports.Data_PDPPrediction_logistic_reg_v1_11M`
as
SELECT
count(distinct vi.session_id) as sessions
, sum(case when p.session_id is null then 0 else 1 end) as purchases
, vi.items[safe_offset(0)].item_brand
, vi.items[safe_offset(0)].item_category
, vi.items[safe_offset(0)].item_category2
, vi.items[safe_offset(0)].item_category3
, vi.items[safe_offset(0)].item_category4
, vi.items[safe_offset(0)].item_category5
, vi.items[safe_offset(0)].price
FROM
`@core.ga_events_enriched` as vi
left join `@core.ga_events_enriched`as p on vi.session_id = p.session_id and p.date >= date_sub(current_date(), interval 6 month) and p.event_name = 'purchase'
WHERE vi.date >= date_sub(current_date(), interval 6 month) and vi.event_name = 'view_item'
group by item_brand, item_category, item_category2, item_category3, item_category4, item_category5, price
;
CREATE OR REPLACE MODEL `@reports.Model_PDPPrediction_logistic_reg_v1_11M`
OPTIONS(
model_type='logistic_reg'
, CATEGORY_ENCODING_METHOD = 'DUMMY_ENCODING'
, CALCULATE_P_VALUES = TRUE,
ENABLE_GLOBAL_EXPLAIN = TRUE
) AS
SELECT
(IF(purchases >= it,1,0)) AS label
, item_brand
, item_category
, item_category2
, item_category3
, item_category4
, item_category5
, price
FROM
`@reports.Data_PDPPrediction_logistic_reg_v1_11M`
join unnest(GENERATE_ARRAY(1,sessions)) as it
; |
The results showed a quite limited predictive power (which was to be expected with such a basic set of parameters):
...
The Interpretability tab shows some information about the attribution to each parameter, showing that the brand and the category have much higher attribution than the price (but this could be partially because the price is a numeric field):
...
Radical Tracking
As we have released our Radical Tracking on the PDP of some of our customers like https://www.nahrin.ch/de/raclette-mix we did a second attempt considering these visual markers:
Code Block |
---|
CREATE OR REPLACE TABLE `live_nahrin_reports.Data_PDPRadicalPrediction_logistic_reg_v1_11M`
as
SELECT
count(distinct session_id) as sessions
, count(distinct order_id) as purchases
, ifnull((select string_agg(v order by v) from unnest(feature_1) as v where v != ''),'none') as feature_1
, ifnull((select string_agg(v order by v) from unnest(feature_2) as v where v != ''),'none') as feature_2
, ifnull((select string_agg(v order by v) from unnest(feature_3) as v where v != ''),'none') as feature_3
, ifnull((select string_agg(v order by v) from unnest(feature_4) as v where v != ''),'none') as feature_4
, ifnull((select string_agg(v order by v) from unnest(feature_5) as v where v != ''),'none') as feature_5
, ifnull((select string_agg(v order by v) from unnest(feature_6) as v where v != ''),'none') as feature_6
, ifnull((select string_agg(v order by v) from unnest(feature_7) as v where v != ''),'none') as feature_7
, ifnull((select string_agg(v order by v) from unnest(feature_8) as v where v != ''),'none') as feature_8
, ifnull((select string_agg(v order by v) from unnest(feature_9) as v where v != ''),'none') as feature_9
, ifnull((select string_agg(v order by v) from unnest(feature_10) as v where v != ''),'none') as feature_10
from
(
SELECT
s.session_id
, any_value(s.order_id) as order_id
, array_agg(distinct case when feature = 1 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_1
, array_agg(distinct case when feature = 2 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_2
, array_agg(distinct case when feature = 3 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_3
, array_agg(distinct case when feature = 4 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_4
, array_agg(distinct case when feature = 5 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_5
, array_agg(distinct case when feature = 6 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_6
, array_agg(distinct case when feature = 7 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_7
, array_agg(distinct case when feature = 8 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_8
, array_agg(distinct case when feature = 9 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_9
, array_agg(distinct case when feature = 10 then ifnull(concat(v.name, '-', v.value), '-') else '' end) as feature_10
FROM `live_nahrin_views.pdp_radical_tracking_markers`
join unnest(values) as v
join unnest(sessions) as sid
join `live_nahrin_reports.session_primary` as s on s.session_id = sid
group by session_id, feature
)
group by feature_1, feature_2, feature_3, feature_4, feature_5, feature_6, feature_7, feature_8, feature_9, feature_10
;
CREATE OR REPLACE MODEL `live_nahrin_reports.Model_PDPRadicalPrediction_logistic_reg_v1_11M`
OPTIONS(
model_type='logistic_reg'
, CATEGORY_ENCODING_METHOD = 'DUMMY_ENCODING'
, CALCULATE_P_VALUES = TRUE,
ENABLE_GLOBAL_EXPLAIN = TRUE
) AS
SELECT
(IF(purchases >= it,1,0)) AS label
, feature_1, feature_2, feature_3, feature_4, feature_5, feature_6, feature_7, feature_8, feature_9, feature_10
FROM
`live_nahrin_reports.Data_PDPRadicalPrediction_logistic_reg_v1_11M`
join unnest(GENERATE_ARRAY(1,sessions)) as it
; |
However, it is still not very clear how this will work
...
and the features indicated as most important as simply the ones which are detected as so by the feature number logic, so no value so far there in the outcome…
Factor analysis of parameters affecting the performance of product recommendations on the PDP
Identify automatically what are the factors of a good performance or a bad performance of a product recommendation
Example: recommending products which have bad ratings might be a bad idea
Example: recommending products of the same brand than the one of the currently viewed product might be a good idea for some brands and a bad idea for other brands
These parameters can be related to the product data of either the PDP or of the recommended products, as well as the session data (device, …) or the combination of them (e.g.: on products of the brand X recommending products of the brand Y doesn’t work)
Data Available:
Product properties (for both the PDP and the recommended products)
the item brand and categories are indicated in GA4 view_item events, but Boxalino has a lot more product attribute from the e-shop, including (the properties would be provided as an associate array with property names and values: ARRAY<STRUCT<name, ARRAY<string> values>>):number of stars
state of discount (in discount and how much)
pictures (number of pictures, urls, …)
description (to see if the length or other aspects could play a role)
delivery state (in stock, …)
…
in example data :
string_properties : have string properties for the recommended products
numeric_properties : have numeric properties for the recommended products
source_string_properties : have string properties for the current PDP
source_numeric_properties : have numeric properties for the current PDP
Other data
test variant (it is often the case that there are several algorithms tested as different test variant, we could use the test variant as a factor to see if it makes a different (positive or negative) for specific products)
in example data : variant_id
Performance metrics:
Click-Through-Rate (the number of times the recommended product is clicked compared to the number of times it is shown)
in example data : KPI_defined.click / KPI_defined.mainImpressionAdd-To-Basket-RAte (the number of times the recommended product is clicked and then added to the basket compared to the number of times it is shown)
in example data : KPI_defined.click_n_atb/ KPI_defined.mainImpression
Buy-Through-Rate (the number of times the recommended product is clicked and then added to the basket and then bought compared to the number of times it is shown)
in example data : KPI_defined.click_n_buy / KPI_defined.mainImpression
Display $ Value (the average value bought by the Buy-Throughs compared to the number of times the product is shown, so the average value of showing the product as a recommendation)
in example data : KPI_defined.click_n_revenue / KPI_defined.mainImpression
Expand | |||||
---|---|---|---|---|---|
| |||||
|
First attempts
We did the first attempt for the case of accessories recommendations (cross-selling on the PDP) as follows:
Code Block |
---|
CREATE OR REPLACE TABLE `@reports.Data_AccessoriesPrediction_logistic_reg_v1_11M`
as
SELECT
sum(KPI_defined.mainImpressions) as mainImpressions
, sum(KPI_defined.clicks) as clicks
, sum(KPI_defined.clicks) / sum(KPI_defined.mainImpressions) as ctr
, case when source_exact_price > 0 and abs(exact_price / source_exact_price) between 0.8 and 1.2 then 1 else 0 end as same_price_20
, case when source_exact_price > 0 and abs(exact_price / source_exact_price) between 0.9 and 1.1 then 1 else 0 end as same_price_10
, case when source_exact_price > 0 and abs(exact_price / source_exact_price) between 0.7 and 1.3 then 1 else 0 end as same_price_30
, case when ifnull((select property_values[safe_offset(0)] from unnest(product_properties) where property_name = 'category_leaf_de' limit 1),'-') in unnest((select ifnull(source_property_values, []) from unnest(source_product_properties) where source_property_name = 'category_leaf_de' limit 1)) then 1 else 0 end as same_leaf_category
, case when ifnull((select property_values[safe_offset(0)] from unnest(product_properties) where property_name = 'products_brand' limit 1),'-') in unnest((select ifnull(source_property_values, []) from unnest(source_product_properties) where source_property_name = 'products_brand' limit 1)) then 1 else 0 end as same_brand
, case when ifnull((select property_values[safe_offset(0)] from unnest(product_properties) where property_label = 'Produktlinie' limit 1),'-') in unnest((select ifnull(source_property_values, []) from unnest(source_product_properties) where source_property_label = 'Produktlinie' limit 1)) then 1 else 0 end as same_product_line
FROM
`@reports.widget_context_item_product_analytics`
WHERE date(day) >= date_sub(current_date(), interval 6 month)
and widget = 'accessories'
group by same_brand, same_leaf_category, same_price_10, same_price_20, same_price_30, same_product_line
;
CREATE OR REPLACE MODEL `@reports.Model_AccessoriesPrediction_logistic_reg_v1_11M`
OPTIONS(
model_type='logistic_reg'
, CATEGORY_ENCODING_METHOD = 'DUMMY_ENCODING'
, CALCULATE_P_VALUES = TRUE,
ENABLE_GLOBAL_EXPLAIN = TRUE
) AS
SELECT
(IF(clicks >= it,1,0)) AS label
, same_price_10
, same_price_20
, same_price_30
, same_leaf_category
, same_brand
, same_product_line
FROM
`@reports.Data_AccessoriesPrediction_logistic_reg_v1_11M`
join unnest(GENERATE_ARRAY(1,mainImpressions)) as it
; |
As a result, the predictive power only based on these few similar parameters is quite low (as it was expected), but could grow with more parameters.
...
Also, the Explainable AI tab shows possible usage of the model as factor analysis, showing here that the attribution is highest on the same product line and brand and only after the category and price.
...