Status Review

There are a few ways to design automated checks for the account's data integration status:

By reading the SYNC REQUEST response
By checking a status endpoint

1 SYNC REQUEST Response
2 Account Review (WEB)
3 Account Review (CLI)
- 3.1 Data Integration Statuses
4 Data (Item) Review
5 Using BigQuery

SYNC REQUEST Response

Your data integration flow/process can read the response of the SYNC REQUEST to check its outcome.

The response is a JSON payload:

in case of error: an error HTTP code (400 BAD REQUEST, or another code depending on the failure; see the Data Integration Statuses section) + a JSON payload like {"status":"SYNC FAIL", "code":400, "tm": YmdHis, "payload":[], "scope":"<logged status - for failure>", "errors":["error messages"]}
in case of success: 200 OK HTTP code + a JSON payload like {"status":"SYNC OK", "code":200, "tm": YmdHis, "payload":[<list of loaded resources>], "errors":[], "taskId":"solr-load-task-id"}
- use the taskId to call <endpoint>/task/status/<taskId> and check the data synchronization status in the account's data index (for product and content).
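A minimal Python sketch of such a check, assuming only the response shapes listed above: it returns the task status URL when the reply is SYNC OK and raises otherwise. The names `read_sync_response` and `SyncFailed` are hypothetical helpers, not part of a Boxalino SDK, and the HTTP call itself is left to your own client.

```python
import json


class SyncFailed(Exception):
    """Raised when a SYNC REQUEST response does not report SYNC OK."""

    def __init__(self, http_code, scope, errors):
        super().__init__(f"SYNC FAIL (HTTP {http_code}, scope={scope}): {errors}")
        self.http_code = http_code
        self.scope = scope
        self.errors = errors


def read_sync_response(http_code, body_text, endpoint):
    """Return the task status URL of a successful SYNC REQUEST, else raise SyncFailed."""
    # Guard the parse in case a failure reply (e.g. HTTP 204) arrives without a body.
    body = json.loads(body_text) if body_text else {}
    if http_code == 200 and body.get("status") == "SYNC OK":
        # Poll this URL to follow the synchronization of the data index.
        return f"{endpoint.rstrip('/')}/task/status/{body['taskId']}"
    raise SyncFailed(http_code, body.get("scope"), body.get("errors", []))
```

The `scope` carried by the exception is the logged status, so it can be matched directly against the status table of this page.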

The SYNC REQUEST can fail at several points; each failure status is described in the Data Integration Statuses section.

Account Review (WEB)

You can review the status of the triggered events on the Account page (<endpoint>/account):

(Screenshot: Boxalino DI Account Review)

Account Review (CLI)

Endpoint	https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review
Method	POST
Headers	Content-Type: application/json
Body	JSON object with the following parameters:
	key	DATASYNC API key
	client	account name
	created	filter on the statuses' created timestamp (ordered by most recent); max: `1 MONTH`, default: `1 DAY`. The data is available in BQ for up to 15 days for full, 7 days for delta and 1 day for instant syncs, and in SOLR (the data index) for up to 3 months.
	limit	number of logs returned (ordered by most recent) (default: none)
	index	account name (with the _dev suffix for dev: true) (default: none)
	mode	D for delta, I for instant update, F for full (default: none)
	type	product, user, content, user_content or order (default: none)
	status	SYNC: sync requests and their statuses; LOAD: load requests and their statuses; OK: successful processes; FAIL: failed statuses; none: all logs (default: none)

For example, the following request returns the SYNC OK logs (successful sync requests) of the last day, most recent first:

curl https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"client": "BOXALINO_ACCOUNT", "key": "BOXALINO_ACCOUNT_ADMIN_KEY", "index": "prod", "mode": "F", "type": "product", "status": "SYNC OK", "created": "1 DAY"}'
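The same request can be sent from Python with only the standard library. This is a sketch: `build_review_request` and `fetch_review` are hypothetical names, the mode/type checks simply mirror the parameter table above, and passing `limit` as a JSON number is an assumption.

```python
import json
import urllib.request

REVIEW_URL = "https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review"
VALID_MODES = {"D", "I", "F"}
VALID_TYPES = {"product", "user", "content", "user_content", "order"}


def build_review_request(client, key, **filters):
    """Build (without sending) the POST request for the account review endpoint.

    Optional filters: index, mode, type, status, created, limit.
    """
    if "mode" in filters and filters["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if "type" in filters and filters["type"] not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    body = {"client": client, "key": key, **filters}
    return urllib.request.Request(
        REVIEW_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_review(request, timeout=30):
    """Send the request and decode the JSON list of status logs."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_review(build_review_request("BOXALINO_ACCOUNT", "BOXALINO_ACCOUNT_ADMIN_KEY", index="prod", mode="F", type="product", status="SYNC OK", created="1 DAY", limit=1))` returns the decoded list of logs. Building the request separately from sending it keeps the body easy to inspect in tests.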

The API response for a request with "status": "SYNC OK" and "limit": 1 is a JSON list like:

[
  {
    "ID": "UUID-FOR-THE-SYNC-REQUEST",
    "RequestReceivedAt": "Y-m-d H:i:s",
    "Status": "SYNC OK",
    "Message": null,
    "Timestamp": "YmdHis",
    "VersionTs": "TIME-IN-UNIX-MS",
    "Project": null,
    "Dataset": null,
    "Document": null,
    "Default": "[]"
  }
]
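To pick the latest successful sync from such a list in an automated check, a small helper is enough. The sketch below (hypothetical `latest_sync_ok`) assumes `VersionTs` always holds the Unix time in milliseconds as a numeric string, as the sample suggests, and selects by that value instead of relying on the list order.

```python
def latest_sync_ok(logs):
    """Return the most recent SYNC OK entry from an account review response, or None."""
    sync_ok = [entry for entry in logs if entry.get("Status") == "SYNC OK"]
    if not sync_ok:
        return None
    # VersionTs is assumed to be Unix milliseconds (a string in the sample response),
    # so taking the maximum does not depend on the order of the returned list.
    return max(sync_ok, key=lambda entry: int(entry["VersionTs"]))
```

The decoded response of the review request can be passed in directly, e.g. `latest_sync_ok(json.loads(response_text))`.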

Data Integration Statuses

Status Code	Meaning
STATUS CODES DURING SYNC REQUESTS
SYNC REQUEST / SYNCR REQUEST	a SYNC REQUEST was received (<endpoint>/sync); once it is received, the compute process starts
SYNC OK	the data content was exported to SOLR; for doc_product / doc_content, the status of the process is available by calling <endpoint>/task/status/<taskId>. API response: HTTP code `200`, JSON body with `payload` (created BQ resources), `errors`, `tm`, `code`, `status` (SYNC OK)
SYNC FAIL	the data update failed. API response: HTTP code `400`, JSON body with `scope` (the logged status, i.e. the reason for the failure), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
SYNC REQUEST FREQUENCY REACH	the SYNC REQUEST was denied: too many requests in the last hour. The hourly limits are: product: max 2 F, max 20 D, max 30 I; other types: max 10 F, max 30 D/I. API response: HTTP code `429`, JSON body with `scope` (this logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
FAIL AUTH	the authentication headers are invalid or do not match the account. API response: HTTP code `401`, JSON body with `scope` (this logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
EMPTY CONTENT	there is no content to be synchronized. API response: HTTP code `204`, JSON body with `scope` (this logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
STOP SYNC	the SYNC REQUEST was stopped (e.g. the content quota was not reached: min X products to be synced, or the doc_X table is empty). API response: HTTP code `412`, JSON body with `scope` (the logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
DENIED SYNC	returned for product delta sync requests when the doc_X content is too large (e.g. a BQ table size over 1 GB). API response: HTTP code `400`, JSON body with `scope` (this logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
FAIL SOLR EXPORT	the export of the generated file failed (the data index was not updated). API response: HTTP code `400`, JSON body with `scope` (this logged status), `payload`, `errors`, `tm`, `code`, `status` (SYNC FAIL)
BIG SOLR CONTENT / SOLRCOMPUTE REQUEST / SOLRCOMPUTE OK	generating the SOLR file for the SOLR export from the doc_X_<mode>_<tm> file (results in a SYNC OK response)
SOLRSYNC REQUEST	exporting the SOLR-compute file (above) to SOLR for sync
DISPATCHED SYNC REQUEST / SYNCCOMPUTE REQUEST	the BQ compute process log for dispatched requests (results in a SYNC OK response)
RESYNCACCOUNT REQUEST / RESYNCACCOUNT OK	re-sync request (triggered on client request, or in additional flows after an internal service failure); it triggers the /sync request for the given tm/index/type/mode
SYNCCHECK OK	a sync check request was made (<endpoint>/sync/check); this is done, for example, for D/I syncs to access the last SYNC OK status for the account and type
STATUS CODES DURING LOAD REQUESTS
LOAD REQUEST	a LOAD REQUEST was received (<endpoint>/load); once it is received, the process: 1. creates the GCS bucket for the account (if needed); 2. creates the BQ dataset for the account and mode (if needed); 3. loads the content into a GCS file (doc_<type>_<mode>_<tm>.json); 4. loads the GCS file into BQ
LOAD OK	the doc_X data structure was loaded successfully in BQ. API response: HTTP code `200`, JSON body with `payload` (created resource), `status` (LOAD OK), `code`, `tm`
FAIL BQ LOAD	the BQ load step failed. API response: HTTP code `400`, JSON body with `payload` (created resource), `status`, `code`, `errors`, `tm`
LOADBYCHUNK REQUEST	a LOAD BY CHUNK request was received; it loads the content into a GCS file (doc_<type>_<mode>_<tm>-<chunk>.json). API response: HTTP code `200`, STRING response with the GCS public URL
LOADBYCHUNK OK	the GCS file (doc_<type>_<mode>_<tm>-<chunk>.json) was created
LOADBYCHUNK FAIL	the GCS file was not properly loaded. API response: HTTP code `400`, JSON body with `payload` (created resource), `status`, `code`, `errors`, `tm`
FAIL GCS	the GCS bucket / content failed to generate. API response: HTTP code `400`, JSON body with `payload` (created resource), `status`, `code`, `errors`, `tm`
LOADBQ REQUEST	loads all the chunk files into BQ
LOADBQ OK	the doc_<type> content was loaded successfully in BQ; the table <index>_<mode>.doc_<type>_<mode>_<tm> is available. API response: HTTP code `200`, JSON body with `payload` (created resource), `status` (LOAD OK), `code`, `tm`
LOADBQ FAIL	the BQ table could not be generated from the available doc_<type>_<mode>_<tm>-.json content. API response: HTTP code `400`, JSON body with `payload` (created resource), `status`, `code`, `errors`, `tm`
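When an automated check only has the HTTP code of a failed SYNC REQUEST, the sync part of the table above can be turned into a lookup. This is a sketch with a hypothetical helper name; it prefers the `scope` field from the JSON body, because several statuses share code `400`.

```python
# HTTP codes documented for a failed SYNC REQUEST and the status(es) logged for each.
SYNC_FAILURE_STATUSES = {
    204: ("EMPTY CONTENT",),
    400: ("SYNC FAIL", "DENIED SYNC", "FAIL SOLR EXPORT"),
    401: ("FAIL AUTH",),
    412: ("STOP SYNC",),
    429: ("SYNC REQUEST FREQUENCY REACH",),
}


def explain_sync_failure(http_code, scope=None):
    """Name the logged status behind a failed SYNC REQUEST.

    The `scope` field of the JSON body is authoritative; the HTTP code is only
    a fallback and may map to several candidate statuses.
    """
    if scope:
        return scope
    candidates = SYNC_FAILURE_STATUSES.get(http_code)
    return " | ".join(candidates) if candidates else "unknown"
```

Keeping the mapping as data makes it easy to update if the documented codes change.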

Data (Item) Review

The item review is available as a web service in Boxalino DI.

To avoid the web form, you can directly access (in the browser or via the CLI) the content exported for a given item SKU / ID or products group.

The requested link has the following structure:
<di-process-endpoint>/item/<API-Key-admin>/<data-index>/<type>/<mode>/<field>/<value>

<di-process-endpoint>	the Boxalino DI process endpoint
<API-Key-admin>	the API key with the `ADMIN` role from Intelligence Admin (the API key used for the DI SYNC REQUEST)
<data-index>	dev | prod
<type>	the value of the `type` parameter in the SYNC REQUEST, e.g. product | user | order | content
<mode>	F | f
<field>	id | sku | products_group_id
<value>	the value for the given field
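The link can also be assembled programmatically. In this sketch, `item_review_url` is a hypothetical helper that validates the documented values and percent-encodes each path segment; whether the service decodes an encoded `/` inside a SKU is an assumption worth verifying against your data.

```python
from urllib.parse import quote

ITEM_REVIEW_VALUES = {
    "data_index": {"dev", "prod"},
    "mode": {"F", "f"},
    "field": {"id", "sku", "products_group_id"},
}


def item_review_url(endpoint, admin_key, data_index, item_type, mode, field, value):
    """Build <endpoint>/item/<API-Key-admin>/<data-index>/<type>/<mode>/<field>/<value>."""
    for name, given in (("data_index", data_index), ("mode", mode), ("field", field)):
        if given not in ITEM_REVIEW_VALUES[name]:
            raise ValueError(f"{name} must be one of {sorted(ITEM_REVIEW_VALUES[name])}")
    segments = [admin_key, data_index, item_type, mode, field, value]
    # Encode every segment fully so spaces or "/" in a value stay inside one segment.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{endpoint.rstrip('/')}/item/{path}"
```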

In both the browser and the CLI, the content is returned as JSON; any JSON formatter can be used to make it easier to read.

Using BigQuery

The client's team has read and view access to the data integration BigQuery datasets in the rtux-data-integration GCP project.

To run queries in the BigQuery console (https://console.cloud.google.com/bigquery), the integrator/client:

must be authenticated with a Google account;
must run the BigQuery jobs in the scope of a project other than bx-bdp-53322 or rtux-data-integration.

Select a GCP project in which you have permission to run BQ jobs.

Before continuing, make sure to identify:

index value: if dev=true → <account>_dev; if dev=false (or unset) → <account>
tm: the tm of the last SYNC OK (or LOADBQ OK) status (use the Account Review (CLI) request above to retrieve it)
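Both values map directly onto the BigQuery table name (`<index>_<mode>.doc_<type>_<mode>_<tm>` in the rtux-data-integration project, as listed for LOADBQ OK). A sketch with hypothetical helper names; checking that `tm` is a 14-digit YmdHis value is an assumption based on the timestamps shown in the responses.

```python
import re


def index_name(account, dev=False):
    """<account>_dev when dev=true, otherwise <account>."""
    return f"{account}_dev" if dev else account


def doc_table_id(account, tm, dev=False, doc_type="product", mode="F"):
    """Return rtux-data-integration.<index>_<mode>.doc_<type>_<mode>_<tm>."""
    # Assumption: tm is the 14-digit YmdHis timestamp (e.g. 20240131235959).
    if not re.fullmatch(r"\d{14}", tm):
        raise ValueError("tm is expected in YmdHis format, e.g. 20240131235959")
    index = index_name(account, dev)
    return f"rtux-data-integration.{index}_{mode}.doc_{doc_type}_{mode}_{tm}"
```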

The following SQL query can be used to check both the exported data and the computed data:

SELECT pg.* FROM `rtux-data-integration.<index>_F.doc_product_F_<tm>`
JOIN UNNEST(product_line.product_groups) pg
JOIN UNNEST(pg.skus) sku
WHERE sku.sku = "<sku>"
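To run the same query from code, the SKU is better passed as a query parameter than pasted into the SQL. The sketch below assumes the `google-cloud-bigquery` package (`bigquery.Client`, `QueryJobConfig`, `ScalarQueryParameter`) and default Google credentials; `sku_query` and `run_sku_query` are hypothetical names, and `billing_project` must be a project in which you are allowed to run BigQuery jobs.

```python
def sku_query(table_id):
    """Return the product group query with the SKU bound as the @sku parameter."""
    if "`" in table_id:
        raise ValueError("table_id must not contain backticks")
    return (
        f"SELECT pg.* FROM `{table_id}`\n"
        "JOIN UNNEST(product_line.product_groups) pg\n"
        "JOIN UNNEST(pg.skus) sku\n"
        "WHERE sku.sku = @sku"
    )


def run_sku_query(table_id, sku, billing_project):
    """Execute the query; requires `pip install google-cloud-bigquery`."""
    from google.cloud import bigquery  # imported lazily: only needed to execute

    client = bigquery.Client(project=billing_project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("sku", "STRING", sku)]
    )
    return list(client.query(sku_query(table_id), job_config=job_config).result())
```

Binding the SKU as a parameter avoids quoting problems for SKUs that contain quotes or other special characters.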

The same query can also be used to check other properties at the SKU level (sku) or at the product group level (pg).