Table of Contents

...

  1. You can check that your JSONL is valid by doing a test load in your own GCP project: https://github.com/boxalino/data-integration-doc-schema#are-you-an-integrator

  2. You can check that your JSONL is valid by using the generator: https://github.com/boxalino/data-integration-doc-schema/blob/master/schema/generator.html (guidelines in the repository README.md)

For certain headless CMSs, Boxalino has designed a Transformer service (see Transformer).

...

  1. The content is exported as the body of your POST request

  2. The content is exported with the help of a public GCS Signed URL (https://cloud.google.com/storage/docs/access-control/signed-urls)

Option #1 is recommended for data volumes of less than 32 MB.

Option #2 is allowed for any data size.
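
The choice between the two options can be made from the size of the export file. The following is a minimal sketch, not part of the service: the function name `choose_load_path` and the variable `MAX_DIRECT_BYTES` are hypothetical, and reading 32 MB as 32 * 1024 * 1024 bytes is an assumption.

```shell
# Threshold from the text above; 32 MB is interpreted as 32 * 1024 * 1024 bytes (assumption).
MAX_DIRECT_BYTES=$((32 * 1024 * 1024))

# choose_load_path FILE
# Prints "/load" when FILE is smaller than the threshold, "/load/chunk" otherwise.
choose_load_path() {
  [ -f "$1" ] || return 1
  # tr removes the padding some wc implementations print before the number.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -lt "$MAX_DIRECT_BYTES" ]; then
    echo "/load"
  else
    echo "/load/chunk"
  fi
}
```

For example, `choose_load_path products.jsonl` prints the path to append to the endpoint for that file.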

...

A. Loading content less than 32 MB

For this use-case, there is a single request to be made: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Request-Definition

Code Block
 curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "chunk: <batch nr>" \
  -H "doc: product|language|attribute_value|attribute|order|user|communication_history|communication_planning|user_generated_content" \
  -d "<JSONL>" \
  -H "Authorization: Basic <encode of the account>" 
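
When the JSONL is read from a file instead of being passed inline, curl's `-d @file` strips newlines, which would merge the JSONL records; `--data-binary @file` sends the file unchanged. The sketch below illustrates this together with one way to generate the `tm` value. It rests on assumptions: `make_tm`, `load_file`, `ACCOUNT` and `AUTH` are hypothetical names, the `YYYYmmddHHiiss` placeholder is read as year, month, day, hour, minute, second, UTC is an arbitrary choice because the text does not state a timezone, and the header values `dev: true`, `type: product`, `mode: F`, `doc: product` are simply picked from the options listed above.

```shell
# make_tm: prints the current time as 14 digits (year, month, day, hour, minute, second).
# Assumption: this matches the YYYYmmddHHiiss placeholder; UTC is chosen here.
make_tm() {
  date -u +%Y%m%d%H%M%S
}

# load_file FILE: posts FILE as the request body to the stage /load endpoint.
# ACCOUNT and AUTH are hypothetical environment variables holding the account
# name and the already-encoded Basic credentials.
load_file() {
  curl --connect-timeout 60 --max-time 300 \
    "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "client: ${ACCOUNT}" \
    -H "dev: true" \
    -H "tm: $(make_tm)" \
    -H "type: product" \
    -H "mode: F" \
    -H "doc: product" \
    -H "Authorization: Basic ${AUTH}" \
    --data-binary "@$1"   # --data-binary keeps the line breaks between JSONL records
}
```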

...

Warning

If the service responds with an error such as 413 Request Entity Too Large, please use the 2nd flow (B. Loading undefined data size).
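
A script can detect this case from the HTTP status code. A minimal sketch, assuming the status is captured with curl's `-w '%{http_code}'` option; the function name `is_too_large` is hypothetical.

```shell
# is_too_large STATUS: succeeds when the HTTP status code is 413.
is_too_large() {
  [ "$1" = "413" ]
}

# Possible use (commented out because it performs a real request):
# status=$(curl -s -o /dev/null -w '%{http_code}' <your /load request arguments>)
# if is_too_large "$status"; then echo "switching to the /load/chunk flow"; fi
```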

...

B. Loading undefined data size

...

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "chunk: <id>" \
  -H "doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content" \
  -H "Authorization: Basic <encode of the account>"

...

Panel

The chunk header is only required if the data is exported in batches.
Repeat steps 1+2 for every data batch loaded in GCS.
Make sure to increment the value of the chunk property for each /load/chunk request.

Only after the full content is available in GCS can you move on to step #3.
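
The batch loop described in this panel can be sketched as follows. This is an illustration under assumptions: `load_batches` is a hypothetical helper, the handler passed to it stands for whatever performs steps 1+2 for one batch, and starting the chunk counter at 1 is a guess (the text only requires that the value is incremented).

```shell
# load_batches HANDLER FILE...
# Runs HANDLER once per file with two arguments: the chunk number and the file path.
# The chunk number starts at 1 (assumption) and increases by one per batch.
# Stops at the first failing batch, so the caller does not go on to step #3.
load_batches() {
  handler=$1
  shift
  chunk=1
  for batch in "$@"; do
    "$handler" "$chunk" "$batch" || return 1
    chunk=$((chunk + 1))
  done
}
```

A call such as `load_batches upload_one part-1.jsonl part-2.jsonl && trigger_bq_load` (both command names are placeholders) would only reach the /load/bq request after every batch succeeded. Because plain sh has no local variables, the handler should not reuse the names `chunk`, `batch` or `handler`.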

...

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/bq" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content" \
  -H "Authorization: Basic <encode of the account>"

...

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/sync" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "project: <client GCP project>" \
  -H "dataset: <client GCP dataset>" \
  -H "Authorization: Basic <encode of the account>"

For the preceding test scenario (product data synchronization), the following SYNC request can be made once the documents have been loaded to BQ:

...

Info

If your project uses its own private GCP project & resources, please also include the project and dataset headers.
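
One way to add these two headers only when they apply is to generate them conditionally. A minimal sketch; the function name `extra_sync_headers` and the variables `DI_PROJECT` and `DI_DATASET` are hypothetical.

```shell
# extra_sync_headers: prints the "project" and "dataset" header lines when both
# DI_PROJECT and DI_DATASET are non-empty; prints nothing otherwise.
extra_sync_headers() {
  if [ -n "${DI_PROJECT:-}" ] && [ -n "${DI_DATASET:-}" ]; then
    printf 'project: %s\n' "$DI_PROJECT"
    printf 'dataset: %s\n' "$DI_DATASET"
  fi
}
```

Since curl 7.55.0 accepts `-H @file`, the output can be written to a file (`extra_sync_headers > extra-headers.txt`) and passed to the SYNC request with `-H @extra-headers.txt`.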

Tip

For more options, always review the Sync Request definition: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761/Sync+Request#Request-Definition

Tip

After making the SYNC request, the data is computed and updated in the relevant feeds (data index, real-time injections, reports, etc.).

Panel

We encourage you to have a stable fallback & retry policy.
We also recommend reviewing the status of your requests as they happen: Status Review
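
A retry policy can be as small as a wrapper that re-runs a failing command with a growing delay. The sketch below is one possible shape, not a Boxalino recommendation: the function name `retry`, the doubling delay and the attempt limit are all assumptions to be adapted to your project.

```shell
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# The delay between attempts starts at BASE_DELAY_SECONDS and doubles each time.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

When the wrapped command is curl, adding `--fail` makes curl exit with a non-zero status on HTTP errors, so that `retry 3 5 curl --fail ...` re-runs the request after an error response.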

Panel

In the technical samples, the stage endpoint https://boxalino-di-stage-krceabfwya-ew.a.run.app was used (for testing purposes). Once your integration flow is ready for production, make sure to use the appropriate endpoint:

  1. For mode: F (full) data pushes: https://boxalino-di-full-krceabfwya-ew.a.run.app

  2. For mode: D (delta) data pushes: https://boxalino-di-delta-krceabfwya-ew.a.run.app

  3. For mode: I (instant) data pushes: https://boxalino-di-instant-krceabfwya-ew.a.run.app
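
In a script, the production endpoint can be derived from the same value that is sent in the mode header. A minimal sketch using the three endpoints listed above; the function name `endpoint_for_mode` is hypothetical.

```shell
# endpoint_for_mode MODE: prints the production endpoint for mode F, D or I.
# Returns a non-zero status for any other value.
endpoint_for_mode() {
  case "$1" in
    F) echo "https://boxalino-di-full-krceabfwya-ew.a.run.app" ;;
    D) echo "https://boxalino-di-delta-krceabfwya-ew.a.run.app" ;;
    I) echo "https://boxalino-di-instant-krceabfwya-ew.a.run.app" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, `"$(endpoint_for_mode D)/load"` gives the URL for a delta load request.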