At this point, the following steps should already have been completed by the Data Integration (DI) team:

Create the JSONL files https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252149803/Data+Integration#1.-Export-JSONL-files

...

saving your JSONL files (one for every data type required for the data sync type) in a Google Cloud Storage (GCS) bucket in the Boxalino project
1. the files are available for 30 days in the client's GCS bucket
loading every JSONL file into a BigQuery table in the Boxalino project
1. for Instant Updates: the tables have a retention policy of 1 day
2. for Delta Updates: the tables have a retention policy of 7 days
3. for Full Updates: the tables have a retention policy of 30 days

...

Endpoint	full data sync	https://boxalino-di-full-krceabfwya-ew.a.run.app
	delta data sync	https://boxalino-di-delta-krceabfwya-ew.a.run.app
	instant-update data sync	https://boxalino-di-instant-update-krceabfwya-ew.a.run.app
	stage / testing	https://boxalino-di-stage-krceabfwya-ew.a.run.app
Action	/load
Method	POST
Body	the document JSONL
Headers	Authorization	Basic base64<<DATASYNC API key : DATASYNC API Secret>>. Note: only API credentials with role=ADMIN are valid. https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/713785345/API+Credentials#Roles
	Content-Type	application/json
	project	(optional) the project name where the documents are to be stored
	dataset	(optional) the dataset in which the doc_X tables must be stored; if not provided, the service will check the <index>_<mode> dataset in the Boxalino project, to which you will have access
	bucket	(optional) the storage bucket where the doc_X will be loaded; if not provided, the service will use the Boxalino project
	doc	the document type (as described in Data Structure)
	client	your Boxalino account name
	dev	only add it if the dev data index must be updated
	mode	D for delta, I for instant update, F for full. Technical: the constants from the GcpRequestInterface https://github.com/boxalino/data-integration-doc-php/blob/3.0.0/src/Service/GcpRequestInterface.php#L18
	tm	time, in the format YmdHis. Requirement: the same tm value must be used from the beginning of the DI process until the end, for all files. Technical: used to identify the version of the documents and create the content.
	ts	timestamp, as a UNIX timestamp in milliseconds. Technical: calculated from the time the data has been exported to BQ; the update requests will be applied in version ts order; https://github.com/boxalino/data-integration-doc-php/blob/3.0.0/src/Service/Flow/DiRequestTrait.php#L140
	type	integration type (product, order, etc.) https://github.com/boxalino/data-integration-doc-php/blob/3.0.0/src/Service/GcpRequestInterface.php#L22
	chunk	(optional) for loading content by chunks, see https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Load-By-Chunks
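
As a minimal sketch, the request described in the table could be assembled as follows in Python. The helper names build_load_headers and build_load_request, the sample credentials, the doc value and the placeholder body are hypothetical; the header names, the YmdHis format for tm and the millisecond-based ts come from the table. The optional headers (project, dataset, bucket, chunk) and the dev header are left out, and using the local clock for tm is an assumption. The request is only constructed here, not sent.

```python
import base64
import time
import urllib.request
from datetime import datetime

# Full data sync endpoint, taken from the table above.
FULL_ENDPOINT = "https://boxalino-di-full-krceabfwya-ew.a.run.app"


def build_load_headers(api_key, api_secret, client, doc, mode, data_type, tm, ts):
    """Build the required /load headers (hypothetical helper)."""
    # Standard HTTP Basic auth: base64 of "key:secret".
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        "client": client,
        "doc": doc,
        "mode": mode,        # F, D or I
        "type": data_type,   # product, order, ...
        "tm": tm,            # YmdHis, identical for all files of one DI process
        "ts": str(ts),       # UNIX timestamp in milliseconds
    }


def build_load_request(endpoint, headers, jsonl_body):
    """Create (but do not send) the POST request to the /load action."""
    return urllib.request.Request(
        url=endpoint + "/load", data=jsonl_body, headers=headers, method="POST"
    )


# Compute tm once and reuse it for every file of the same DI process.
tm = datetime.now().strftime("%Y%m%d%H%M%S")  # assumption: local time
ts = int(time.time() * 1000)

headers = build_load_headers(
    "my-key", "my-secret",        # placeholder credentials (role=ADMIN required)
    "boxalino_client",            # placeholder account name
    "doc_language", "F", "product", tm, ts,
)
# Placeholder body; a real body is the JSONL content of the document.
request = build_load_request(FULL_ENDPOINT, headers, b'{"placeholder": true}\n')
```

The resulting request object can then be passed to urllib.request.urlopen together with a timeout value.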

...

doc_attribute, doc_attribute_value and doc_language can be synchronized with a call to the /load endpoint
- because GCP has a size limit of 32 MB for POST requests, do not use the /load service for big data exports
doc_product, doc_order, doc_user, doc_content and any other content bigger than 32 MB must be loaded in batches (maximum size of 32 MB each) or through a GCS public signed URL, using the /load/chunk endpoint
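
One way to stay under the size limit is to split the JSONL rows into batches before sending them. The sketch below is an illustration only: split_jsonl is a hypothetical helper, and treating 32 MB as 32 * 1024 * 1024 bytes is an assumption. It never splits a row across batches, so every batch remains valid JSONL. How each batch is then submitted (for example with the chunk header) is described on the Load By Chunks page and is not covered here.

```python
MAX_POST_BYTES = 32 * 1024 * 1024  # assumption: 32 MB read as 32 MiB


def split_jsonl(lines, max_bytes=MAX_POST_BYTES):
    """Yield JSONL batches (as bytes) that each stay within max_bytes."""
    batch, size = [], 0
    for line in lines:
        encoded = line.rstrip("\n").encode("utf-8") + b"\n"
        if len(encoded) > max_bytes:
            raise ValueError("a single JSONL row exceeds the batch size limit")
        # Close the current batch when the next row would not fit anymore.
        if batch and size + len(encoded) > max_bytes:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(encoded)
        size += len(encoded)
    if batch:
        yield b"".join(batch)


# Tiny demonstration with an artificially small limit of 30 bytes.
rows = ['{"id": "%d"}' % i for i in range(5)]
batches = list(split_jsonl(rows, max_bytes=30))
```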

...

The /load, /load/chunk and /load/bq services return a successful response in 10 seconds on average (so the wait time is minimal). However, for your fallback/retry policy, we recommend a timeout configuration of 3-5 minutes.

Depending on your system, the timeout is set through different parameters of the HTTP request.
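
In Python's standard library, for instance, the timeout is the timeout argument of urllib.request.urlopen. The following is a hedged sketch of a retry policy built around it; the function name send_with_retry, the number of attempts and the pause between attempts are assumptions and not part of the service contract. The send parameter exists so that the transport can be replaced, for example in tests.

```python
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 300  # upper end of the recommended 3-5 minutes


def send_with_retry(request, send=None, attempts=3, pause_seconds=5.0):
    """Send a request, retrying on timeouts, network errors and 5xx responses."""
    if send is None:
        # Default transport: urlopen with the recommended timeout.
        send = lambda req: urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(request)
        except urllib.error.HTTPError as error:
            if error.code < 500:
                raise  # client errors (e.g. invalid credentials) will not succeed on retry
            last_error = error
        except OSError as error:  # covers URLError, timeouts and connection errors
            last_error = error
        if attempt < attempts:
            time.sleep(pause_seconds)
    raise last_error
```

Calling send_with_retry(request) with a prepared urllib.request.Request would make up to three attempts, each bounded by the 5-minute timeout.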

...

The BigQuery dataset in which your account documents are loaded can be named after the data index and process it is meant to synchronize to:

if the request is for the dev data index - <client>_dev_<mode>
if the request is for the production data index - <client>_<mode>

where:

<client> is the account name provided by Boxalino.
<mode> is the process: F (for full), D (for delta), I (for instant)

For example, for an account named boxalino_client, the following datasets must exist (depending on your integration use cases):

for dev index: boxalino_client_dev_I, boxalino_client_dev_F, boxalino_client_dev_D
for production data index: boxalino_client_I, boxalino_client_F, boxalino_client_D

The above datasets must exist in your project.
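
The naming rule above can be expressed as a small helper. This is only an illustration; dataset_name is a hypothetical function, not part of any Boxalino library.

```python
VALID_MODES = ("F", "D", "I")  # full, delta, instant


def dataset_name(client, mode, dev=False):
    """Return the dataset name for an account, process mode and data index."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{client}_dev_{mode}" if dev else f"{client}_{mode}"


print(dataset_name("boxalino_client", "I", dev=True))  # boxalino_client_dev_I
print(dataset_name("boxalino_client", "F"))            # boxalino_client_F
```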

...
