Overview

...

  1. You can test that your JSONL is valid by doing a test load in your own GCP project: https://github.com/boxalino/data-integration-doc-schema#are-you-an-integrator

  2. You can test that your JSONL is valid by using the generator: https://github.com/boxalino/data-integration-doc-schema/blob/master/schema/generator.html (guidelines are in the repository README.md)

For certain headless CMSs, Boxalino has designed a Transformer service (see Transformer).

...

  1. The content is exported as the body of your POST request

  2. The content is exported with the help of a public GCS Signed URL (https://cloud.google.com/storage/docs/access-control/signed-urls)

Option #1 is recommended for data volumes of less than 32 MB.

Option #2 is allowed for any data size.
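
The choice between the two options can be sketched as a size check. This is a minimal illustration, assuming that "32 MB" means 32 * 1024 * 1024 bytes; the helper name choose_load_flow and its output labels are hypothetical.

```shell
# Assumption: "32 MB" is treated as 32 * 1024 * 1024 bytes.
MAX_BODY_BYTES=$((32 * 1024 * 1024))

# Prints "body" (option #1) or "signed-url" (option #2) for a size in bytes.
# choose_load_flow is a hypothetical helper name.
choose_load_flow() {
  if [ "$1" -lt "$MAX_BODY_BYTES" ]; then
    echo "body"
  else
    echo "signed-url"
  fi
}

# Example with a 1 KiB payload.
choose_load_flow 1024
```

For a real export, the size could be read with `wc -c < your-file.jsonl` and passed to the function.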

...

A. Loading content less than 32 MB

This use case requires a single request: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Request-Definition

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "chunk: <batch nr>" \
  -H "doc: product|language|attribute_value|attribute|order|user|communication_history|communication_planning|user_generated_content" \
  -d "<JSONL>" \
  -H "Authorization: Basic <encode of the account>" 
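
The tm and Authorization header values from the request above could be prepared as follows. This is a sketch under assumptions: the helper names are hypothetical, local time is assumed to be acceptable for tm, and the credentials are assumed to be a user/password pair encoded with the standard HTTP Basic scheme (base64 of "user:password"). Check the Load Request definition for the exact credentials to use.

```shell
# Timestamp in the YYYYmmddHHiiss layout used by the tm header (14 digits).
# Assumption: local time is acceptable; use `date -u` if UTC is required.
make_tm() {
  date +%Y%m%d%H%M%S
}

# Value for "Authorization: Basic ...", built as base64("user:password").
# Assumption: your credentials map to a user/password pair.
basic_auth_value() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

TM="$(make_tm)"
echo "tm: $TM"
echo "Authorization: Basic $(basic_auth_value "<account>" "<password>")"
```

Alternatively, curl can build the same header itself when given `--user "<account>:<password>"`.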

...

Warning

If the service responds with an error such as 413 Request Entity Too Large, please use the second flow (B).
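
The fallback in this warning can be sketched as a small status check. Only the 413 rule comes from the text above; the helper name next_load_action, its output labels, and the treatment of 2xx codes as success are assumptions.

```shell
# Map an HTTP status code to the next action.
# next_load_action is a hypothetical helper; only the 413 rule is from the warning.
next_load_action() {
  case "$1" in
    2??) echo "done" ;;              # assumption: any 2xx means the load was accepted
    413) echo "switch-to-flow-B" ;;  # body too large: use the signed-URL flow
    *)   echo "inspect-error" ;;
  esac
}

# Example: the status returned for an oversized body.
next_load_action 413
```

With curl, the status code alone can be captured by adding `-s -o /dev/null -w '%{http_code}'` to the request.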

...

B. Loading content of undefined size

...

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "chunk: <id>" \
  -H "doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content" \
  -H "Authorization: Basic <encode of the account>"

...

The chunk header is only required if the data is exported in batches.
Repeat steps 1+2 for every data batch loaded in GCS.
Make sure to increment the value of the chunk property for each /load/chunk request.

Move on to step #3 only after the full content is available in GCS.
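
The batching rule above can be sketched as a small loop. This is only an illustration: the helper name plan_chunks, the starting value 1, and the file names are assumptions, and the actual requests for steps 1+2 are left as a comment.

```shell
# Print one "<chunk> <file>" pair per batch file, incrementing the chunk value.
# Assumption: numbering starts at 1; plan_chunks is a hypothetical helper name.
plan_chunks() {
  local chunk=1
  for batch_file in "$@"; do
    printf '%s %s\n' "$chunk" "$batch_file"
    chunk=$((chunk + 1))
  done
}

# Usage with two hypothetical batch files.
plan_chunks products-a.jsonl products-b.jsonl | while read -r chunk batch_file; do
  echo "would send header 'chunk: $chunk' for $batch_file"
  # steps 1+2 for this batch go here
done
```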

...

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/bq" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content" \
  -H "Authorization: Basic <encode of the account>"

...

3. How to register a compute request

Warning

This step is required. It triggers the computation of the exported data and makes it available throughout the Boxalino Data Warehouse Ecosystem for future use.

For this step, there is a single request to be made: Sync Request

Code Block
curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/sync" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "project: <client GCP project>" \
  -H "dataset: <client GCP dataset>" \
  -H "Authorization: Basic <encode of the account>"

...

Info

If your project uses its own private GCP project and resources, please also include the project and dataset headers. For more options, always review the Sync Request definition: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761/Sync+Request#Request-Definition

Tip

After the SYNC request is made, the data is computed and updated in the relevant feeds (data index, real-time injections, reports, etc.).

We encourage you to implement a stable fallback and retry policy.
We also recommend reviewing the status of your requests as they happen: Status Review
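
One possible shape for such a retry policy is a wrapper with exponential backoff. This is a minimal sketch, not a prescribed policy: the helper name retry_with_backoff, the attempt count, and the delays are assumptions to adapt to your project.

```shell
# Run a command until it succeeds or max_attempts is reached,
# doubling the delay (in seconds) after each failed attempt.
# retry_with_backoff is a hypothetical helper name.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: a command that succeeds immediately.
retry_with_backoff 3 0 true && echo "ok"
```

When wrapping a curl request this way, adding curl's `--fail` option makes HTTP error responses return a non-zero exit code, so that they trigger a retry.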

In the technical samples, the stage endpoint https://boxalino-di-stage-krceabfwya-ew.a.run.app was used (for testing purposes). Once your integration flow is ready for production, make sure to use the appropriate endpoint:

  1. For mode: F (full) data pushes: https://boxalino-di-full-krceabfwya-ew.a.run.app

  2. For mode: D (delta) data pushes: https://boxalino-di-delta-krceabfwya-ew.a.run.app

  3. For mode: I (instant) data pushes: https://boxalino-di-instant-krceabfwya-ew.a.run.app
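
The mapping above can be kept in one place so that the mode header and the endpoint never drift apart. This is a sketch: the helper name endpoint_for_mode is hypothetical, and it assumes the production endpoints expose the same request paths as the stage endpoint used in the samples.

```shell
# Return the production endpoint for a given mode (F, D or I).
# endpoint_for_mode is a hypothetical helper name.
endpoint_for_mode() {
  case "$1" in
    F) echo "https://boxalino-di-full-krceabfwya-ew.a.run.app" ;;
    D) echo "https://boxalino-di-delta-krceabfwya-ew.a.run.app" ;;
    I) echo "https://boxalino-di-instant-krceabfwya-ew.a.run.app" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Example: build the load URL for a delta push.
MODE="D"
echo "$(endpoint_for_mode "$MODE")/load"
```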