Sync Request

At this point, the following steps should already have been completed by the Data Integration (DI) team:

  1. Export JSONL files https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252149803/Data+Integration#1.-Export-JSONL-files

  2. Load the files to GCS https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252149803/Data+Integration#2.-Load-files-to-Google-Storage

  3. Load the files to BigQuery https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252149803/Data+Integration#3.-Load-files-to-BigQuery-tables (following the required Data Structure)

The sync request is made once all documents required for your Data Integration process are available in your BigQuery dataset, following the required Data Structure.

For a product data sync, make sure the following tables have been prepared and provided: doc_product, doc_attributes, doc_attribute_values and doc_languages.

 

Request Definition

Endpoint

  • full data sync: https://boxalino-di-full-krceabfwya-ew.a.run.app

  • delta data sync: https://boxalino-di-delta-krceabfwya-ew.a.run.app

  • instant-update data sync: https://boxalino-di-instant-update-krceabfwya-ew.a.run.app

Action: /sync

Method: POST

Body: none

Headers

  • Authorization: Basic base64<<DATASYNC API key : DATASYNC API Secret>>

  • Content-Type: application/json

  • client: account name

  • dev: (optional) use this to mark the content for the dev data index (1 or true)

  • mode: data sync mode: D for delta, I for instant update, F for full. Technical: the constants from the GcpClientInterface https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClientInterface.php#L20

  • type: product, user, content, user_content, order

  • tm: time, in the format YmdHis. Technical: used to identify the documents version

  • ts: UNIX timestamp in milliseconds. Technical: calculated from the time the data has been exported to BQ; the update requests will be applied in version ts order; https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClient.php#L254 It must match the ts sent with the LOAD request.

  • outsource: bool (1/true) (optional) (for product delta updates only). When set to true, the doc_product content is the only requirement for the load; the doc_attribute_value/doc_attribute/doc_language content is inherited from the latest FULL

  • fields: (for instant-update requests) a comma-separated list of field names to be updated as part of that instant sync request (e.g. stock,visibility). internal_id (matching your content ID) is always part of the exported data

  • threshold: numeric (optional) (for full exports). Recommended if you want to set a limit to validate that the exported full content is within a safety scope (e.g. the data index is not updated if there are fewer than THRESHOLD items)

  • project: (optional) the project name (where the dataset is available)

  • dataset: (optional) the dataset in which the doc_X tables are stored; if not provided, the service will check the <index>_<mode> dataset in the provided project

  • default: (optional) JSON with the structure {<doc-type>: <table name>}, e.g. {"doc_language":"default-table-for-doc_language"}. If project and dataset are provided, the <table name> must exist in that context

  • dispatch (deprecated): bool (1 / true) (optional). Recommended for processes that take a long time to compute in BQ. Boxalino will confirm if/when it is required for the integration

 

A SYNC REQUEST code-sample is available in the data-integration-doc-php library: https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClient.php#L165
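For readers not using the PHP library, the header set from the Request Definition can be assembled as in the minimal Python sketch below. It is not the library's implementation: the function names are hypothetical, the Authorization value assumes standard HTTP Basic encoding (base64 of key:secret, without spaces around the colon), and tm is formatted from whatever time zone the supplied datetime carries, which is also an assumption.

```python
import base64
from datetime import datetime, timezone

# Endpoints per sync mode, as listed in the Request Definition.
SYNC_ENDPOINTS = {
    "F": "https://boxalino-di-full-krceabfwya-ew.a.run.app",
    "D": "https://boxalino-di-delta-krceabfwya-ew.a.run.app",
    "I": "https://boxalino-di-instant-update-krceabfwya-ew.a.run.app",
}


def basic_auth(api_key, api_secret):
    """Assumed standard HTTP Basic value: base64 of "key:secret"."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_sync_request(client, api_key, api_secret, mode, doc_type,
                       exported_at, dev=False, fields=None, threshold=None):
    """Return (url, headers) for a SYNC request; nothing is sent here.

    exported_at is the time the data was exported to BQ; the same value
    must be used for the ts of the matching LOAD request.
    """
    if mode not in SYNC_ENDPOINTS:
        raise ValueError("mode must be one of F, D, I")
    # Millisecond UNIX timestamp, computed with integer arithmetic.
    ts = int(exported_at.timestamp()) * 1000 + exported_at.microsecond // 1000
    headers = {
        "Authorization": basic_auth(api_key, api_secret),
        "Content-Type": "application/json",
        "client": client,
        "mode": mode,
        "type": doc_type,
        "tm": exported_at.strftime("%Y%m%d%H%M%S"),  # YmdHis
        "ts": str(ts),
    }
    if dev:
        headers["dev"] = "true"
    if fields:  # instant-update requests only
        headers["fields"] = ",".join(fields)
    if threshold is not None:  # full exports only
        headers["threshold"] = str(threshold)
    return SYNC_ENDPOINTS[mode] + "/sync", headers


if __name__ == "__main__":
    url, headers = build_sync_request(
        "boxalino_client", "key", "secret", "F", "product",
        datetime.now(timezone.utc), threshold=10000)
    print(url)
    print(headers)
```

The returned pair can be passed to any HTTP client as a POST with an empty body, for example urllib.request.Request(url, data=b"", headers=headers, method="POST").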

Optionally, if you use an advanced REST client, you can configure it as follows:

Advanced REST Client SYNC request structure

 

If desired, your process can read the response from the SYNC REQUEST. The response is in JSON format.

Request REVIEW

You can review the status of the triggered events on the Account page: https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review/account

More information about it on the

 

SYNC CHECK

The <endpoint>/sync/check service identifies the date of the latest successful data sync for a given type (product, order, user, etc.) and mode (full, delta).

The following POST request retrieves this information:

curl "https://boxalino-di-process-krceabfwya-ew.a.run.app/sync/check" \
  -X POST \
  -H "client: <boxalino account name>" \
  -H "Authorization: Basic base64<<DATASYNC API key : DATASYNC API Secret>>" \
  -H "Content-Type: text/plain" \
  -H "type: order" \
  -H "mode: F" \
  -H "dev: false"
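The same SYNC CHECK call can be prepared from a script. The sketch below only builds the request object with Python's standard library; the function name is hypothetical, the Authorization value assumes standard HTTP Basic encoding (base64 of key:secret), and the response body is not parsed because its structure is not described here.

```python
import base64
import urllib.request

SYNC_CHECK_URL = "https://boxalino-di-process-krceabfwya-ew.a.run.app/sync/check"


def build_sync_check_request(client, api_key, api_secret, doc_type, mode,
                             dev=False):
    """Build (but do not send) the POST request for the sync/check service."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8"))
    headers = {
        "client": client,
        "Authorization": "Basic " + token.decode("ascii"),
        "Content-Type": "text/plain",
        "type": doc_type,
        "mode": mode,
        "dev": "true" if dev else "false",
    }
    # Empty body, explicit POST, mirroring the curl call.
    return urllib.request.Request(
        SYNC_CHECK_URL, data=b"", headers=headers, method="POST")


if __name__ == "__main__":
    request = build_sync_check_request(
        "boxalino_client", "key", "secret", "order", "F")
    print(request.get_method(), request.full_url)
    # To execute it: urllib.request.urlopen(request).read()
```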

 

Threshold

Use this service to identify the total number of items synchronized as part of the latest full sync.
This value (with some error margin, at the integrator's discretion) can be configured as the threshold property in the SYNC REQUEST.

When the threshold is set, a received full export updates the account data index only if it contains at least the number of items declared as the threshold.

curl "https://boxalino-di-process-krceabfwya-ew.a.run.app/threshold" \
  -X POST \
  -H "client: <boxalino account name>" \
  -H "Authorization: Basic base64<<DATASYNC API key : DATASYNC API Secret>>" \
  -H "Content-Type: text/plain" \
  -H "type: product"
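As an illustration of the error margin mentioned above, the sketch below derives a threshold from the item count of the latest full sync. The function name and the 10% default are hypothetical choices, not Boxalino recommendations, and the count is assumed to have been read from the threshold service response beforehand.

```python
def threshold_with_margin(last_full_count, margin_percent=10):
    """Lower the last full item count by a safety margin (in whole percent).

    Integer arithmetic is used so the result is always rounded down.
    """
    if last_full_count < 0:
        raise ValueError("last_full_count must not be negative")
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in the range 0..99")
    return last_full_count * (100 - margin_percent) // 100


if __name__ == "__main__":
    # With 12000 items in the latest full and a 10% margin, a full export
    # with fewer than 10800 items would not update the data index.
    print(threshold_with_margin(12000))
```

The resulting number is what would be sent as the threshold header of the next full SYNC REQUEST.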

 

Using private GCP resources for DI

If the client wants to use their own GCP resources for the Data Integration with Boxalino, the following requirements must be met:

BigQuery Permissions

To give Boxalino read access to your private dataset, please grant the following permissions to the Data Integration Service Account (DISA) 55483703770-compute@developer.gserviceaccount.com and to our organization email organization@boxalino.com (so that we can identify errors in the processing):

  1. BigQuery Data Viewer

  2. BigQuery Metadata Viewer

BigQuery Dataset Location

When creating your dataset, set the Data Location to EU.

BigQuery Dataset Name (recommended)

 

The BigQuery dataset in which your account documents are loaded can be named after the data index and data sync mode it is meant to synchronize to:

  1. if the request is for the dev data index - <client>_dev_<mode>

  2. if the request is for the production data index - <client>_<mode>

where:

  • <client> is the account name provided by Boxalino.

  • <mode> is the process mode: F (for full), D (for delta), I (for instant).

For example, for an account boxalino_client, the following datasets must exist (depending on your integration use-cases):

  • for dev index: boxalino_client_dev_I, boxalino_client_dev_F, boxalino_client_dev_D

  • for production data index: boxalino_client_I, boxalino_client_F, boxalino_client_D
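The naming convention above can be expressed as a small helper. This is an illustrative sketch with a hypothetical function name; it only reproduces the <client>_dev_<mode> and <client>_<mode> patterns described in this section.

```python
VALID_MODES = ("F", "D", "I")  # full, delta, instant


def dataset_name(client, mode, dev=False):
    """Return the recommended BigQuery dataset name for an account and mode."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of F, D, I")
    if dev:
        return f"{client}_dev_{mode}"
    return f"{client}_{mode}"


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(dataset_name("boxalino_client", mode, dev=True),
              dataset_name("boxalino_client", mode))
```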