Integration Flow

Overview

As an integrator, you are tasked with exposing the client's data in the Boxalino Data Structure. The following steps can be used as guidelines for a technical approach:

  1. Transform your data source into the Boxalino Data Structure https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252280881 as JSONL content

  2. Load the transformed content to GCS https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770

  3. Load the content to BQ

  4. Tell our system to compute your content https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761

The following sections present sample cURL requests, which can be translated into the programming language of your choice.

For clients who use their own Google Cloud Platform project to store the documents, the documented rules and naming patterns for the BigQuery resources must be used. In such scenarios, only step #4 is of interest.

1. Transform your data source

The Boxalino Data Structure is publicly available in our git repository:

You can use the repository to identify the data formats & data elements expected for each document.

  1. You can test that your JSONL is valid by doing a test load in your own GCP project https://github.com/boxalino/data-integration-doc-schema#are-you-an-integrator

  2. You can test that your JSONL is valid by following the guidelines in the repository README.md

For certain headless CMSs, Boxalino has designed a Transformer service.
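Before running either test, a quick local syntax check can catch malformed lines early. The sketch below only verifies that every line is a standalone JSON object, which is what JSONL requires; it does not check the content against the Boxalino document schemas. The function name `validate_jsonl` is hypothetical.

```python
import json


def validate_jsonl(text):
    """Return a list of (line_number, message) for lines that are not JSON objects."""
    errors = []
    # Ignore trailing newlines at the end of the file, then check line by line.
    for number, line in enumerate(text.rstrip("\n").split("\n"), start=1):
        if not line.strip():
            errors.append((number, "empty line"))
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            errors.append((number, "line is not a JSON object"))
    return errors


sample = '{"language": "en"}\n{"language": "de"}\n'
print(validate_jsonl(sample))
```

An empty result means the text is syntactically valid JSONL; schema validity still has to be confirmed with one of the two tests above.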

2. Loading content to GCS and BigQuery

There are 2 available flows, based on the size of your data:

  1. The content is exported as the body of your POST request

  2. The content is exported with the help of a public GCS Signed URL

Option #1 is recommended for data volumes smaller than 32 MB.

Option #2 is allowed for any data size.
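The choice between the two options can be made from the size of the serialized payload. The following is a minimal sketch, assuming that "32 MB" means 32 × 1024 × 1024 bytes (the exact limit is not specified here); `choose_load_path` is a hypothetical helper, and the two paths are the endpoints used in sections A and B.

```python
# Assumption: the 32 MB threshold is interpreted as 32 MiB.
MAX_BODY_BYTES = 32 * 1024 * 1024


def choose_load_path(size_in_bytes):
    """Pick the endpoint path for a payload of the given size in bytes."""
    if size_in_bytes < MAX_BODY_BYTES:
        return "/load"        # option #1: content sent as the POST body
    return "/load/chunk"      # option #2: content uploaded via a signed URL


payload = '{"language":"en"}'.encode("utf-8")
print(choose_load_path(len(payload)))
```

Since option #2 is allowed for any size, always using `/load/chunk` is also a valid, simpler policy.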

A. Loading content less than 32 MB

For this use-case, there is a single request to be made:

curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "doc: product|language|attribute_value|attribute|order|user|communication_history|communication_planning|user_generated_content" \
  -d "<JSONL>" \
  -H "Authorization: Basic <encode of the account>"
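The generic request above could be translated into Python roughly as follows. This is a sketch using only the standard library: `build_load_request` and its parameters are hypothetical names, and the Basic token is passed through unchanged because its encoding is not described in this section. The request is only built here, not sent.

```python
import urllib.request

LOAD_URL = "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load"


def build_load_request(account, auth_token, tm, data_type, mode, doc, jsonl, dev=False):
    """Build the POST request that carries the JSONL content as its body."""
    headers = {
        "Content-Type": "application/json",
        "client": account,
        "dev": "true" if dev else "false",
        "tm": tm,
        "type": data_type,
        "mode": mode,
        "doc": doc,
        # Assumption: auth_token is the already encoded account credential.
        "Authorization": f"Basic {auth_token}",
    }
    return urllib.request.Request(
        LOAD_URL, data=jsonl.encode("utf-8"), headers=headers, method="POST"
    )


request = build_load_request(
    "<account>", "<encode of the account>", "20230301161554",
    "product", "F", "language", '{"language":"en"}',
)
# urllib stores header names capitalized ("tm" becomes "Tm").
print(request.get_method(), request.full_url, request.get_header("Tm"))
# To send it: urllib.request.urlopen(request, timeout=300)
```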

For example, the request below would create a doc_language_F_20230301161554.json file in your client's GCS bucket. This document is required for the type:product integration.

curl --connect-timeout 30 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "tm: 20230301161554" \
  -H "type: product" \
  -H "mode: F" \
  -H "doc: language" \
  -d "{\"language\":\"en\",\"country_code\":\"en-GB\",\"creation_tm\":\"2023-03-01 16:15:54\",\"client_id\":0,\"src_sys_id\":0}\n{\"language\":\"de\",\"country_code\":\"de-CH\",\"creation_tm\":\"2023-03-01 16:15:54\",\"client_id\":0,\"src_sys_id\":0}" \
  -H "Authorization: Basic <encode of the account>"
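Escaping the JSONL body by hand, as in the cURL example, is error-prone. As a sketch, the same two-line body could be produced from plain records with the standard `json` module; `to_jsonl` is a hypothetical helper name, and the records repeat the values from the example above.

```python
import json


def to_jsonl(records):
    """Serialize each record as compact JSON and join them with newlines."""
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)


languages = [
    {"language": "en", "country_code": "en-GB",
     "creation_tm": "2023-03-01 16:15:54", "client_id": 0, "src_sys_id": 0},
    {"language": "de", "country_code": "de-CH",
     "creation_tm": "2023-03-01 16:15:54", "client_id": 0, "src_sys_id": 0},
]

body = to_jsonl(languages)
print(body)
```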

The same tm value must be used across your other requests. This identifies the timestamp of your computation process.
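One way to honor this is to generate the `tm` value once at the start of the process and pass it to every request. The sketch below formats a timestamp in the YYYYmmddHHiiss pattern; using UTC is an assumption (the expected time zone is not stated here), and `make_tm` is a hypothetical helper.

```python
from datetime import datetime, timezone


def make_tm(moment=None):
    """Format a datetime as YYYYmmddHHiiss (14 digits)."""
    # Assumption: UTC is used when no explicit moment is given.
    moment = moment or datetime.now(timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


# Generate once, then reuse for the load and compute requests of this process.
tm = make_tm()
print(tm)
```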

B. Loading content of any size

This is the generic POST request:

curl --connect-timeout 60 --max-time 300 "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "dev: true|false" \
  -H "tm: YYYYmmddHHiiss" \
  -H "type: product|content|order|user|communication_history|communication_planning|user_generated_content" \
  -H "mode: F|D|I" \
  -H "chunk: <id>" \
  -H "doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content" \
  -H "Authorization: Basic <encode of the account>"

The response is an upload link that can only be used to create the document doc_<doc>_<mode>_<tm>-<chunk>.json in the client's GCS bucket. The link is valid for 30 minutes.

Let's say we need a public link to upload the product JSONL. The following request can be used to generate the link:
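The naming pattern and the link lifetime can be captured in two small helpers. This is a sketch under stated assumptions: `doc` is taken as the bare document name (for example `product`, without a `doc_` prefix), and both function names are hypothetical.

```python
from datetime import datetime, timedelta

# The upload link is documented as valid for 30 minutes.
LINK_LIFETIME = timedelta(minutes=30)


def chunk_file_name(doc, mode, tm, chunk):
    """File created in GCS, following doc_<doc>_<mode>_<tm>-<chunk>.json."""
    return f"doc_{doc}_{mode}_{tm}-{chunk}.json"


def link_expired(issued_at, now):
    """True once 30 minutes or more have passed since the link was issued."""
    return now - issued_at >= LINK_LIFETIME


print(chunk_file_name("product", "F", "20230301161554", 1))
```

Checking `link_expired` before a long upload lets the caller request a fresh link instead of failing midway.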
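A minimal Python sketch of such a request, and of the upload that follows it, is given here. Several points are assumptions and are marked in the comments: that the upload link accepts an HTTP PUT with the raw JSONL as its body, the chunk id `1`, and the function names. Neither request is sent.

```python
import urllib.request

CHUNK_URL = "https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk"


def build_chunk_link_request(account, auth_token, tm, chunk):
    """Build the POST that asks for an upload link for a full product load."""
    headers = {
        "Content-Type": "application/json",
        "client": account,
        "tm": tm,
        "type": "product",
        "mode": "F",
        "chunk": str(chunk),
        "doc": "doc_product",
        # Assumption: auth_token is the already encoded account credential.
        "Authorization": f"Basic {auth_token}",
    }
    return urllib.request.Request(CHUNK_URL, headers=headers, method="POST")


def build_upload_request(signed_url, jsonl):
    """Build the upload to the returned link."""
    # Assumption: the signed URL expects a PUT with the JSONL as the body.
    return urllib.request.Request(signed_url, data=jsonl.encode("utf-8"), method="PUT")


link_request = build_chunk_link_request(
    "<account>", "<encode of the account>", "20230301161554", 1
)
print(link_request.get_method(), link_request.full_url)
```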

3. Loading content to BigQuery

Once the content has been created in GCS, it can be imported into BigQuery.

After request #2, the file doc_product_F_20230301161554.json is available to be loaded into the BQ table <client>_F.doc_product_F_20230301161554:

4. How to register a compute request
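The table name in this example appears to follow the same pattern as the file name. The sketch below generalizes it; that the dataset suffix always equals the mode (`<client>_F` for mode `F`) is an assumption drawn from this single example, and `bq_table_name` is a hypothetical helper.

```python
def bq_table_name(client, doc, mode, tm):
    """Dataset and table for a loaded document, e.g. <client>_F.doc_product_F_<tm>."""
    # Assumption: the dataset is named <client>_<mode>.
    dataset = f"{client}_{mode}"
    table = f"doc_{doc}_{mode}_{tm}"
    return f"{dataset}.{table}"


print(bq_table_name("<client>", "product", "F", "20230301161554"))
```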

For this step, there is a single request to be made:

For the earlier test scenario (product data synchronization), the following SYNC request can be made once the documents have been loaded to BQ: