Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Tip

At Boxalino we are strong advisors for ELT flows (vs ETL). The major benefit is speed, transparency and maintenence: data is loaded directly into a destination system (BQ), and transformed in-parallel (Dataform).

...

The DI-SAAS SYNC request

The DI request will use the same headers (client, tm, mode, type, authorization)and a JSON request body that would provide mapping details between the loaded .jsonl files and data meaning.

...

Expand
titleSample request with Boxalino connector (ex: data is already available in Boxalino GCP project, in client`s GCS bucket)
Code Block
[
  {
    "connector": {
      "type": "boxalino",
       "load": {
        "options": {
          "format": "NEWLINE_DELIMITED_JSON",
          "autodetect": true,
          "schema": "",
          "skipRows": 0,
          "max_bad_records": 0,
          "write_disposition": "WRITE_TRUNCATE"
        },
        "doc_product": {
          "entity": [
            {
              "source": "product.jsonl",
              "autodetect": 0,
              "schema": "ProductId:INT64,Name:STRING,ProductTypeId:INT64,Sku:STRING,GroupId:STRING,Price:FLOAT64,SalePrice:FLOAT64,created:DATETIME"
            }
          ],
          "product_relations": [
            {
              "source": "crosssell.jsonl",
              "name": "crosssell"
            }
          ],
          "link": [
            {
              "source": "urlrecord.jsonl"
            }
          ]
        }
      }
    },
    "di": {
      "configuration": {
        "languages": [
          "de",
          "fr"
        ],
        "currencies": [
          "CHF"
        ],
        "mapping": {
          "languagesCountryCode": {"de":"CH_de", "fr":"CH_fr"}
        },
        "default": {
          "language": "de",
          "currency": "CHF"
        }
      }
    }
  }
]

The DI-SAAS CORE request

...

Use this flow if the purpose is to expose data -as is- from your system to Boxalino. The data will be available in the clients` _core dataset

...

in the Boxalino ecosystem.

In order to achieve this, there are 2 requests to be made:

  1. load data to GCS https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#Loading-content-to-Boxalino-GCS-(connector%3A-boxalino)

  2. load data from GCS to <client>_core.<destination> table

Note

The data payload (CSV or JSONL) must have a field to be used as primary field (ex: id).

REQUEST DEFINITION

As an integrator, please create the bellow request to the provided endpoint.

...

Expand
titleLoading data to _core dataset

For the integration bellow, the rti JSON payload is loaded in the client`s dataset.

Note

The same tm value is used.The jsonl payload data should have a primary field (ex: id)

Code Block
curl  "https://boxalino-di-stage-krceabfwya-ew.a.run.app/transformer/load" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "client: <account>" \
  -H "tm: 20231106000000" \
  -H "type: content" \
  -H "mode: T" \
  -H "doc: <filename>.json" \
  -d "<JSONL data>" \
  -H "Authorization: Basic <encode<apiKey:apiSecret>>"
Code Block
curl "https://boxalino-di-saas-krceabfwya-ew.a.run.app/core" \
  -X POST \
  -d "[{\"connector\":{\"type\":\"boxalino\",\"load\":{\"options\":{\"format\":\"NEWLINE_DELIMITED_JSON\",\"field_delimiter\":\"\",\"autodetect\":true,\"schema\":\"\",\"skipRows\":0,\"max_bad_records\":0,\"quote\":\"\",\"write_disposition\":\"WRITE_TRUNCATE\",\"create_disposition\":\"\",\"encoding\":\"\"},\"doc_content\":{\"rti\":[{\"source\":\"<filename>*.json\",\"autodetect\":1,\"destination\":\"<table name from core dataset>\",\"primary_field\":\"<primary field in your table>\"}]}}}}]" \
  -H "Content-Type: application/json" \
  -H "mode: F" \
  -H "dev: 0" \
  -H "tm: 20231106000000" \
  -H "type: content" \
  -H "client: <client>" \
  -H "Authorization: Basic <encode<apiKey:apiSecret>>"
Note

Note the JSON payload for table definition
"doc_content": {
"rti": [
{
"source": "rti*.jsonl",
"autodetect": 1,
"destination": "doc_content_rti",
"primary_field": "id"
}
]
}

The final output table is <client>_core.doc_content_rti.
Depending on the write_disposition property, the data is either rewritten in the table or appended.

...

Panel
panelIconId1f44c
panelIcon:ok_hand:
panelIconText👌
bgColor#FFEBE6

The use of the header chunk is required if the same file/document is exported in batches/sections.
Repeat steps 1+2 for every data batch loaded in GCS.
Make sure to increment the value of the chunkproperty for each /transformer/load/url request.

Tip

Step #1 - #2 must be repeated for every file that is required to be added for the given process (same tm, mode & type)

Only after all the files are available in GCS, you can move on to step#3.

Tip

After all required documents (doc) for the given type data sync (ex: product, order, etc) have been made available in GCS, the DI-SAAS request can be called https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request , assigning connector → type : boxalino.

...