
At this point, the following steps should already have been completed by the Data Integration (DI) team:

  1. Create the JSONL files https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252149803/Data+Integration#1.-Export-JSONL-files

...

  1. saving your JSONL files (one for every data type required by the data sync type) in a Google Cloud Storage bucket in the Boxalino project

    1. for Instant Updates: the files have a retention policy of 1 day

    2. for Delta Updates: the files have a retention policy of 7 days

    3. for Full Updates: the files have a retention policy of 30 days

  2. loading every JSONL file into a BigQuery table in the Boxalino project

Boxalino will provide access to the Google Cloud Storage buckets and BigQuery datasets that store your data.
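The retention policies above can be summarized in a small lookup; this is a minimal sketch for reference, and the function name is illustrative, not part of the Boxalino API:

```python
# Retention policy (in days) of JSONL files in the GCS bucket,
# keyed by sync mode: I (instant), D (delta), F (full).
RETENTION_DAYS = {"I": 1, "D": 7, "F": 30}

def retention_days(mode: str) -> int:
    """Return how many days JSONL files are kept for a given sync mode."""
    if mode not in RETENTION_DAYS:
        raise ValueError(f"unknown sync mode: {mode!r}")
    return RETENTION_DAYS[mode]

print(retention_days("I"))  # 1
print(retention_days("F"))  # 30
```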

...

The BigQuery dataset into which your account documents are loaded is named after the data index and the process it is meant to synchronize to:

  1. if the request is for the dev data index - <client>_dev_<mode>

  2. if the request is for the production data index - <client>_<mode>

where:

  • <client> is the account name provided by Boxalino.

  • <mode> is the process: F (full), D (delta), or I (instant)

For example, for an account boxalino_client, the following datasets must exist (depending on your integration use cases):

  • for dev index: boxalino_client_dev_I, boxalino_client_dev_F, boxalino_client_dev_D

  • for production data index: boxalino_client_I, boxalino_client_F, boxalino_client_D

The above datasets must exist in your project.
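The naming convention above can be sketched as a small helper; the function name is illustrative and not part of the Boxalino API, and boxalino_client is the example account name from the text:

```python
def dataset_name(client: str, mode: str, dev: bool = False) -> str:
    """Build the BigQuery dataset name for an account and sync mode.

    mode: "F" (full), "D" (delta) or "I" (instant).
    dev:  True for the dev data index, False for production.
    """
    if mode not in {"F", "D", "I"}:
        raise ValueError(f"unknown sync mode: {mode!r}")
    return f"{client}_dev_{mode}" if dev else f"{client}_{mode}"

print(dataset_name("boxalino_client", "F", dev=True))  # boxalino_client_dev_F
print(dataset_name("boxalino_client", "I"))            # boxalino_client_I
```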

...

  • doc_attribute, doc_attribute_value, doc_language can be synchronized with a call to the load endpoint

  • doc_product / doc_order / doc_user must be loaded in batches / via public GCS link

    • GCP has a size limit of 32 MB for POST requests

    • we recommend avoiding this limit by requesting a public GCS load URL (steps below)
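A minimal sketch of the decision this implies, assuming the 32 MB limit applies to the request body; the helper name is hypothetical:

```python
import os

POST_LIMIT_BYTES = 32 * 1024 * 1024  # GCP's 32 MB limit on POST requests

def needs_gcs_load_url(path: str) -> bool:
    """Return True when the JSONL file is too large for a direct call to
    the load endpoint and should instead be sent via a public GCS load URL."""
    return os.path.getsize(path) > POST_LIMIT_BYTES
```

In practice this means doc_attribute, doc_attribute_value and doc_language files usually fit a direct load call, while doc_product, doc_order and doc_user exports typically exceed the limit.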

...