Sync Request
At this point, the following steps should already have been completed by the Data Integration (DI) team:
Export JSONL files (see Data Integration | 1. Export JSONL files)
Load the files to GCS (see Data Integration | 2. Load files to Google Storage)
Load the files to BigQuery (see Data Integration | 3. Load files to BigQuery tables), following the required Data Structure
The sync request is sent once all documents required for your Data Integration process are available in your BigQuery dataset, following the required Data Structure.
For a product data sync, make sure the following tables have been prepared and are provided: doc_product, doc_attributes, doc_attribute_values and doc_languages
Request Definition
| Property | Name | Value / description |
|---|---|---|
| Endpoint | full data sync | |
| | delta data sync | |
| | instant-update data sync | |
| Action | /sync | |
| Method | POST | |
| Body | none | |
| Headers | Authorization | Basic base64<<DATASYNC API key : DATASYNC API Secret>> |
| | Content-Type | application/json |
| | client | account name |
| | dev | (optional) use this to mark the content for the dev data index (1 or true) |
| | mode | data sync mode: D for delta, I for instant update, F for full. Technical: the constants from the GcpClientInterface, https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClientInterface.php#L20 |
| | type | product, user, content, user_content, order |
| | tm | time, in the format YmdHis. Technical: used to identify the version of the documents |
| | ts | UNIX timestamp in milliseconds. Technical: calculated from the time the data was exported to BQ; update requests are applied in version (ts) order; https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClient.php#L254 |
| | outsource | bool (1 / true) |
| | fields | (for instant-update requests) internal_id (matching your content ID) is always part of the exported data |
| | threshold | numeric |
| | project | (optional) the project name (where the dataset is available) |
| | dataset | (optional) the dataset in which the doc_X tables are stored |
| | default | (optional) JSON with the structure {<doc-type>: <table name>}; e.g. {"doc_language":"default-table-for-doc_language"}. If project and dataset are provided, the <table name> must exist in that context. |
| | dispatch | bool (1 / true) (optional) recommended for processes that take a long time to compute in BQ. Boxalino will confirm if/when it is required for the integration. |
A SYNC REQUEST code-sample is available in the data-integration-doc-php library: https://github.com/boxalino/data-integration-doc-php/blob/master/src/Service/GcpClient.php#L165
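For readers not working in PHP, the required headers can be assembled as in the following minimal Python sketch. It rests on several assumptions that are not confirmed by this page: the Authorization value uses standard HTTP Basic encoding (base64 of `key:secret`, no spaces around the colon), `tm` and `ts` are derived from the same UTC instant, and the function name `build_sync_headers` and its parameters are purely illustrative. The linked PHP library remains the reference.

```python
import base64
from datetime import datetime, timezone


def build_sync_headers(api_key, api_secret, client, mode, doc_type, now=None, dev=False):
    """Assemble SYNC REQUEST headers (illustrative sketch, not the official client)."""
    if mode not in ("F", "D", "I"):
        raise ValueError("mode must be F (full), D (delta) or I (instant update)")
    # Assumption: both tm and ts are taken from the same UTC instant.
    now = now or datetime.now(timezone.utc)
    # Assumption: standard HTTP Basic encoding of "key:secret".
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        "client": client,
        "mode": mode,
        "type": doc_type,
        "tm": now.strftime("%Y%m%d%H%M%S"),      # YmdHis
        "ts": str(int(now.timestamp() * 1000)),  # UNIX timestamp in milliseconds
    }
    if dev:
        headers["dev"] = "1"  # mark the content for the dev data index
    return headers
```

The optional headers (threshold, project, dataset, default, dispatch) can be added to the returned dictionary in the same way before the POST to the /sync action is sent.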
Optionally, if you use an advanced REST client, you can configure it with the same action, method and headers.
If desired, your process can read the response from the SYNC REQUEST. The response is of type JSON:
in case of error: the message
in case of success: a key-value pair: {"taskId":"task value"}. With the <taskId>, you can make a request to https://boxalino-di-process-krceabfwya-ew.a.run.app//task/status/<taskId>
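The response handling described above could look like the following Python sketch. The exact shape of the error payload is not specified here, so the sketch treats any body without a `taskId` key as an error; the function name `task_status_url` is hypothetical, and the status URL is copied verbatim from the address quoted above.

```python
import json

# Task status address as quoted in this document.
TASK_STATUS_URL = "https://boxalino-di-process-krceabfwya-ew.a.run.app//task/status/"


def task_status_url(response_body):
    """Return the task status URL for a successful SYNC REQUEST response.

    Raises RuntimeError carrying the response content when no taskId is present.
    """
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        raise RuntimeError(f"sync request failed: {response_body}")
    if isinstance(payload, dict) and "taskId" in payload:
        return TASK_STATUS_URL + str(payload["taskId"])
    # Assumption: anything else is an error message from the service.
    raise RuntimeError(f"sync request failed: {payload}")
```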
Request REVIEW
You can review the status of the triggered events on the Account page: https://boxalino-di-process-krceabfwya-ew.a.run.app/account/review/account
More information is available on the Status Review page.
SYNC CHECK
The <endpoint>/sync/check service can be used to identify the date of the latest successful data sync for a given type (product, order, user, etc.) and mode (full, delta).
The following POST request returns this information:
curl "https://boxalino-di-process-krceabfwya-ew.a.run.app/sync/check" \
-X POST \
-H "client: <boxalino account name>" \
-H "Authorization: Basic base64<<DATASYNC API key : DATASYNC API Secret>>" \
-H "Content-Type: text/plain" \
-H "type: order" \
-H "mode: F" \
-H "dev: false"
Threshold
Use this service to identify the total number of items synchronized as part of the latest full data sync.
This value (with some error margin, at the integrator's discretion) can be configured as the threshold property in the SYNC REQUEST.
When the threshold is set, a received full export updates the account data index only if it contains at least the number of items declared in your threshold configuration.
curl "https://boxalino-di-process-krceabfwya-ew.a.run.app/threshold" \
-X POST \
-H "client: <boxalino account name>" \
-H "Authorization: Basic base64<<DATASYNC API key : DATASYNC API Secret>>" \
-H "Content-Type: text/plain" \
-H "type: product"
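As a worked example of applying an error margin to the item count, the Python sketch below lowers the count of the latest full sync by a percentage. The response format of the threshold service is not documented on this page, so the count is passed in as a plain integer; the function name `threshold_with_margin` and the 10% default are illustrative assumptions only.

```python
def threshold_with_margin(last_full_count, margin_percent=10):
    """Lower the last full-sync item count by a percentage margin.

    Integer arithmetic is used so that the result is always rounded down.
    """
    if last_full_count < 0:
        raise ValueError("last_full_count must not be negative")
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in the range 0..99")
    return last_full_count * (100 - margin_percent) // 100
```

With 20000 items in the latest full sync and a 10% margin, the threshold header would be set to 18000, so a full export containing fewer than 18000 items would not update the data index.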
Using private GCP resources for DI
If the client wants to use their own GCP resources for the Data Integration with Boxalino, the following requirements must be met:
BigQuery Permissions
To give us read access to your private dataset, please grant the following permissions to the Data Integration Service Account (DISA) 55483703770-compute@developer.gserviceaccount.com and to our organization email organization@boxalino.com (so that we can identify errors in the processing):
BigQuery Data Viewer
BigQuery Metadata Viewer
BigQuery Dataset Location
When creating your dataset, please set the Data Location to EU.
BigQuery Dataset Name (recommended)
The BigQuery dataset in which your account documents are loaded can be named after the data index and data sync mode it synchronizes to:
if the request is for the dev data index: <client>_dev_<mode>
if the request is for the production data index: <client>_<mode>
where:
<client> is the account name provided by Boxalino
<mode> is the data sync mode: F (for full), D (for delta), I (for instant)
For example, for an account boxalino_client, the following datasets must exist (depending on your integration use cases):
for dev index: boxalino_client_dev_I, boxalino_client_dev_F, boxalino_client_dev_D
for production data index: boxalino_client_I, boxalino_client_F, boxalino_client_D
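The naming convention above can be expressed as a small Python helper. The function name `dataset_name` is hypothetical; the pattern itself is the recommended one described in this section.

```python
def dataset_name(client, mode, dev=False):
    """Build the recommended BigQuery dataset name for an account and sync mode."""
    if mode not in ("F", "D", "I"):
        raise ValueError("mode must be F (full), D (delta) or I (instant)")
    # dev data index: <client>_dev_<mode>; production data index: <client>_<mode>
    return f"{client}_dev_{mode}" if dev else f"{client}_{mode}"
```

For instance, `dataset_name("boxalino_client", "D", dev=True)` yields `boxalino_client_dev_D`, matching the dev index example above.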