There are a few options available for designing automated checks of the account's data integration status.

  1. By the SYNC REQUEST response

  2. By an endpoint check


SYNC Request Response

If desired, your data integration flow/process can read the response from the SYNC REQUEST.

The response will be of type JSON:

  • in case of error: the 400 BAD REQUEST HTTP code + an error message

  • in case of success: 200 OK HTTP code + a key-value pair: {"taskId":"task value"}

    • use the <taskId> to make a request to <endpoint>/task/status/<taskId> in order to check the data synchronization status in the account's data index
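
For example, a minimal sketch of such an automated check, assuming bash with jq, that the SYNC REQUEST response is stored in $SYNC_RESPONSE, and that the task status service answers a plain GET:

Code Block
# Hedged sketch: read the taskId from the SYNC REQUEST response and ask the
# task status service for the synchronization status.
# Assumptions: $SYNC_RESPONSE holds the SYNC REQUEST JSON response;
# <endpoint> is your DI endpoint; GET is assumed for the status service.
TASK_ID=$(echo "$SYNC_RESPONSE" | jq -r '.taskId')
curl -s "<endpoint>/task/status/$TASK_ID"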

Account Review (WEB)

You can review the status of the triggered events on the Account page (<endpoint>/account), for example:

https://boxalino-di-process-krceabfwya-ew.a.run.app/account

Account Review (CLI)

Endpoint

https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review

Method

POST

Headers

Content-Type: application/json

Body

  • key: DATASYNC API key

  • client: account name

  • limit: the number of logs to return (ordered by most recent)

  • index: dev / prod (default: none)

  • mode: D for delta, I for instant update, F for full (default: none)

  • type: product, user, content, user_content, order (default: none)

  • status (default: none):
    SYNC - sync requests and their status
    LOAD - load requests and status
    OK - successful processes
    FAIL - failed statuses
    none - all logs

For example, this request will return the last SYNC OK (successful sync request):

Code Block
curl https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "client": "BOXALINO_ACCOUNT",
    "key": "BOXALINO_ACCOUNT_ADMIN_KEY",
    "index": "prod",
    "mode": "F",
    "type": "product",
    "status": "SYNC OK",
    "limit": 1
  }'


The API response for a request with status: "SYNC OK" and limit: 1 would be a JSON list, like:

Code Block
[
  {
    "ID": "UUID-FOR-THE-SYNC-REQUEST",
    "RequestReceivedAt": "Y-m-d H:i:s",
    "Status": "SYNC OK",
    "Message": null,
    "Timestamp": "YmdHis",
    "VersionTs": "TIME-IN-UNIX-MS",
    "Project": null,
    "Dataset": null,
    "Document": null,
    "Default": "[]"
  }
]
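
Such a response can drive an automated freshness check. A minimal sketch, assuming bash with jq; the 24-hour threshold and the account credentials are illustrative placeholders:

Code Block
# Hedged sketch: fail if the last successful full product sync is older than 24 hours.
LAST=$(curl -s https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"client": "BOXALINO_ACCOUNT", "key": "BOXALINO_ACCOUNT_ADMIN_KEY", "index": "prod", "mode": "F", "type": "product", "status": "SYNC OK", "limit": 1}')

# VersionTs is the sync version in UNIX milliseconds (see the response example above).
VERSION_MS=$(echo "$LAST" | jq -r '(.[0].VersionTs // 0) | tonumber')
NOW_MS=$(( $(date +%s) * 1000 ))

if [ $(( NOW_MS - VERSION_MS )) -gt $(( 24 * 3600 * 1000 )) ]; then
  echo "No successful full product sync within the last 24 hours" >&2
  exit 1
fi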

Data Integration Statuses

STATUS CODES DURING SYNC REQUESTS

  • SYNC OK: the data content was exported to SOLR; (for doc_product / doc_content) the status of the process is available by calling the <endpoint>/task/status/<taskId> service

  • SYNC REQUEST: a SYNC REQUEST was made (<endpoint>/sync); once the SYNC REQUEST is received, the compute process starts

  • SYNC FAIL: the data update failed

  • SYNCPRODUCT FAIL / SYNCCONTENT FAIL / SYNCINDEX FAIL / FAIL CORPUS COMPUTED / FAIL COMPUTED FIELDS / FAIL COMPUTED: failure of the SYNC REQUEST during compute

  • STOP SYNC: the SYNC REQUEST was stopped (for ex: the content quota was not reached: min X products to be synced, OR the doc_X table is empty)

  • DENIED SYNC: appears in the case of product delta sync requests when the doc_X content is too large (ex: over 1GB BQ table size)

  • BIG SOLR CONTENT / SOLRCOMPUTE REQUEST / SOLRCOMPUTE OK: generating the SOLR file for SOLR export from the doc_X_<mode>_<tm> file

  • SOLRSYNC REQUEST: exporting the solr-compute file (above) to SOLR for sync

  • DISPATCHED SYNC REQUEST / SYNCCOMPUTE REQUEST: the BQ compute process log (for dispatched requests)

  • RESYNCACCOUNT REQUEST / RESYNCACCOUNT OK: re-sync request (triggered internally, on client request); triggers the /sync request for the given tm/index/type/mode

  • SYNCCHECK OK: a sync check request was made (<endpoint>/sync/check); this is done, for example, for D/I to access the last SYNC OK status for the account and type

  • FAIL AUTH: the authentication headers are invalid / not a match for the account

  • FAIL SOLR EXPORT: the export of the generated file failed (data index not updated)

STATUS CODES DURING LOAD REQUESTS

  • LOAD OK: the doc_X data structure was loaded successfully in BQ

  • LOAD REQUEST: a LOAD REQUEST was received (<endpoint>/load); once the LOAD REQUEST is received, it:
    1. creates the GCS bucket for the account (if needed)
    2. creates the BQ dataset for the account & mode (if needed)
    3. loads the content in a GCS file (doc_<type>_<mode>_<tm>.json)
    4. loads the GCS file in BQ

  • FAIL BQ LOAD: the BQ load step failed

  • LOADBYCHUNK REQUEST: a LOAD BY CHUNK request was received; when this happens, it loads the content in a GCS file (doc_<type>_<mode>_<tm>-<chunk>.json)

  • LOADBYCHUNK OK: the GCS file (doc_<type>_<mode>_<tm>-<chunk>.json) was created

  • LOADBYCHUNK FAIL: the GCS file was not properly loaded

  • FAIL GCS: the GCS bucket / content failed to generate

  • LOADBQ REQUEST: loads all the chunk files in BQ

  • LOADBQ OK: successfully loaded the doc_<type> content in BQ; the table <index>_<mode>.doc_<type>_<mode>_<tm> is available

  • LOADBQ FAIL: the BQ table could not be generated from the available doc_<type>_<mode>_<tm>-*.json content
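
Combined with the status filter of the /account/review service, these codes can back a simple failure monitor. A minimal sketch, assuming bash with jq; the field names follow the /account/review response example above:

Code Block
# Hedged sketch: list the five most recent failure logs for the account.
curl -s https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"client": "BOXALINO_ACCOUNT", "key": "BOXALINO_ACCOUNT_ADMIN_KEY", "status": "FAIL", "limit": 5}' \
  | jq -r '.[] | "\(.RequestReceivedAt)  \(.Status)  \(.Message // "-")"'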

Data (Item) Review

The item review is available as a web service at https://boxalino-di-process-krceabfwya-ew.a.run.app/


The request link has the following structure:
<di-process-endpoint>/item/<API-Key-admin>/<data-index>/<type>/<mode>/<field>/<value>

  • <di-process-endpoint>: https://boxalino-di-process-krceabfwya-ew.a.run.app/

  • <API-Key-admin>: the API key with the ADMIN role from Intelligence Admin (the API key used for the DI SYNC REQUEST)

  • <data-index>: dev | prod

  • <type>: the value of the type parameter in the SYNC REQUEST (ex: product | user | order | content)

  • <mode>: F | f

  • <field>: id | sku | products_group_id

  • <value>: the value for the given field

In the web browser or CLI, the content is returned as JSON. You can use any JSON formatter to structure it for an easier view.
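
For example, a minimal sketch of fetching a product by SKU from the prod full (F) index; the placeholders follow the structure above, and jq is only used for formatting:

Code Block
# Hedged sketch: review a single product by SKU in the prod data index.
curl -s "https://boxalino-di-process-krceabfwya-ew.a.run.app/item/<API-Key-admin>/prod/product/F/sku/<value>" | jq .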

Using BigQuery

The client's team has read & view access to the data integration BigQuery datasets in the rtux-data-integration GCP project.

Note

In order to execute BQ queries in the BQ view https://console.cloud.google.com/bigquery, the integrator/client:

  1. must be authenticated with a Google account;

  2. must use the scope of a project other than bx-bdp-53322 or rtux-data-integration in order to run BigQuery jobs.
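
If you prefer the command line over the BQ web view, the same checks can be run with the bq CLI under the scope of your own project; a minimal sketch, where <your-project>, <index>, and <tm> are placeholders:

Code Block
# Hedged sketch: count the exported products, billing the job to your own project.
bq --project_id=<your-project> query --use_legacy_sql=false \
  'SELECT COUNT(*) AS products FROM `rtux-data-integration.<index>_F.doc_product_F_<tm>`'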


The following SQL queries can be used to check both the exported data and the computed data:

Check the exported product data by SKU:

Code Block
SELECT pg.* FROM `rtux-data-integration.<index>_F.doc_product_F_<tm>`
JOIN UNNEST(product_line.product_groups) pg
JOIN UNNEST(pg.skus) sku
WHERE sku.sku = "<sku>"

The same SQL can also be used to check other properties at the level of the SKU (sku) or the product group (pg).

Check the exported product data by ID:

Code Block
SELECT pg.* FROM `rtux-data-integration.<index>_F.doc_product_F_<tm>`
JOIN UNNEST(product_line.product_groups) pg
JOIN UNNEST(pg.skus) sku
WHERE sku.internal_id = "<id>"

Check final product attributes:

Code Block
SELECT DISTINCT(property_name) FROM `rtux-data-integration.<index>_F.doc_computed_F_<tm>`
-- WHERE id = "<product-id>"  -- optional: filter by product id

Identify the values for a specific property (and the number of matching products):

Code Block
SELECT pv, COUNT(*) AS products_match FROM `rtux-data-integration.<index>_F.doc_computed_F_<tm>`
JOIN UNNEST(property_values) pv
WHERE property_name = "<property_name>"
GROUP BY pv