Tip |
---|
At Boxalino, we strongly recommend ELT flows (rather than ETL). The main benefits are speed, transparency and ease of maintenance: data is loaded directly into the destination system (BigQuery) and transformed in parallel (Dataform).
...
The DI-SAAS SYNC request
The DI request uses the same headers (client, tm, mode, type, authorization) and a JSON request body that maps the loaded .jsonl files to their data meaning (e.g. which file holds the product entity, its relations or its links).
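Two of these headers are easy to get wrong by hand: tm and authorization. The sketch below assumes that authorization is standard HTTP Basic auth (base64 of apiKey:apiSecret, as the "Authorization: Basic" header in the requests of this guide suggests) and that tm is a YYYYMMDDhhmmss timestamp like 20231106000000. The helper names and the UTC choice are illustrative, not part of the Boxalino API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the headers shared by the DI requests.

# Print the value of the Authorization header for an API key/secret pair.
# Assumption: standard HTTP Basic auth, i.e. "Basic " + base64("key:secret").
basic_auth_header() {
  local key="$1" secret="$2" encoded
  # printf adds no trailing newline, so only "key:secret" is encoded;
  # tr removes the line wrapping some base64 implementations add.
  encoded="$(printf '%s:%s' "$key" "$secret" | base64 | tr -d '\n')"
  printf 'Basic %s\n' "$encoded"
}

# Print a tm value in the YYYYMMDDhhmmss format used in the examples.
# Assumption: UTC; generate it once per process and reuse it in every request.
new_tm() {
  date -u +%Y%m%d%H%M%S
}
```

For example, set TM="$(new_tm)" once and pass -H "tm: $TM" -H "Authorization: $(basic_auth_header "$API_KEY" "$API_SECRET")" to every curl call of the same process. curl's own -u "$API_KEY:$API_SECRET" option produces an equivalent Basic header.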
...
Sample request with the Boxalino connector (e.g. the data is already available in the Boxalino GCP project, in the client's GCS bucket):
Code Block |
---|
[
  {
    "connector": {
      "type": "boxalino",
      "load": {
        "options": {
          "format": "NEWLINE_DELIMITED_JSON",
          "autodetect": true,
          "schema": "",
          "skipRows": 0,
          "max_bad_records": 0,
          "write_disposition": "WRITE_TRUNCATE"
        },
        "doc_product": {
          "entity": [
            {
              "source": "product.jsonl",
              "autodetect": 0,
              "schema": "ProductId:INT64,Name:STRING,ProductTypeId:INT64,Sku:STRING,GroupId:STRING,Price:FLOAT64,SalePrice:FLOAT64,created:DATETIME"
            }
          ],
          "product_relations": [
            {
              "source": "crosssell.jsonl",
              "name": "crosssell"
            }
          ],
          "link": [
            {
              "source": "urlrecord.jsonl"
            }
          ]
        }
      }
    },
    "di": {
      "configuration": {
        "languages": [
          "de",
          "fr"
        ],
        "currencies": [
          "CHF"
        ],
        "mapping": {
          "languagesCountryCode": {"de": "CH_de", "fr": "CH_fr"}
        },
        "default": {
          "language": "de",
          "currency": "CHF"
        }
      }
    }
  }
]
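Escaping a body like the one above inside a curl argument is error-prone. A minimal sketch, assuming the JSON is saved to a file and that DI_SAAS_SYNC_URL holds the DI-SAAS sync endpoint provided to you (DI_SAAS_SYNC_URL, API_KEY, API_SECRET, send_sync_request and the CURL override are all hypothetical names):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: POST the sync request body from a file.
# Expects DI_SAAS_SYNC_URL, API_KEY and API_SECRET to be set;
# set CURL=echo for a dry run that only prints the arguments.

# Usage: send_sync_request <body.json> <client> <tm> <mode> <type>
send_sync_request() {
  local body_file="$1" client="$2" tm="$3" mode="$4" type="$5"
  [ -r "$body_file" ] || { echo "cannot read $body_file" >&2; return 1; }
  [ -n "${DI_SAAS_SYNC_URL:-}" ] || { echo "DI_SAAS_SYNC_URL is not set" >&2; return 1; }
  [ -n "${API_KEY:-}" ] && [ -n "${API_SECRET:-}" ] \
    || { echo "API_KEY/API_SECRET are not set" >&2; return 1; }
  # --data-binary @file sends the file unchanged (no shell escaping needed);
  # -u builds the "Authorization: Basic ..." header from key:secret.
  ${CURL:-curl} "$DI_SAAS_SYNC_URL" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "client: $client" \
    -H "tm: $tm" \
    -H "mode: $mode" \
    -H "type: $type" \
    -u "$API_KEY:$API_SECRET" \
    --data-binary "@$body_file"
}
```

Usage would look like send_sync_request sync-request.json <client> 20231106000000 F product, where the mode and type values are illustrative; use the values agreed for your account.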
The DI-SAAS CORE request
...
Use this flow to expose data as-is from your system to Boxalino. The data will be available in the client's <client>_core dataset in the Boxalino ecosystem.
To achieve this, two requests are required:
- load the data to GCS: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#Loading-content-to-Boxalino-GCS-(connector%3A-boxalino)
- load the data from GCS into the <client>_core.<destination> table
Note |
---|
The data payload (CSV or JSONL) must include a field that can serve as the primary field (e.g. id).
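Before uploading, a quick local check can catch records that lack the primary field. This is a minimal sketch for JSONL only: check_primary_field is a hypothetical helper, and because it does a textual match on "<field>": rather than parsing JSON, a nested key with the same name would also satisfy it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does every JSONL record contain the primary field?
# Textual match on "<field>":, not a JSON parser (nested keys also match).

# Usage: check_primary_field <file.jsonl> [field]   (field defaults to id)
check_primary_field() {
  local file="$1" field="${2:-id}"
  awk -v key="\"$field\":" '
    NF == 0 { next }                    # skip blank lines
    {
      line = $0
      gsub(/[ \t\r]/, "", line)         # accept "id" : 1 as well as "id":1
      if (index(line, key) == 0) {
        printf "line %d: missing %s\n", NR, key > "/dev/stderr"
        bad++
      }
    }
    END { exit (bad > 0) }              # non-zero status if any record failed
  ' "$file"
}
```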
REQUEST DEFINITION
As an integrator, please send the requests below to the provided endpoints.
...
Loading data to the _core dataset
In the integration below, the rti JSON payload is loaded into the client's dataset.
Note |
---|
The same tm value is used in both requests. The JSONL payload data should have a primary field (e.g. id).
Code Block |
---|
curl "https://boxalino-di-stage-krceabfwya-ew.a.run.app/transformer/load" \
-X POST \
-H "Content-Type: application/json" \
-H "client: <account>" \
-H "tm: 20231106000000" \
-H "type: content" \
-H "mode: T" \
-H "doc: <filename>.json" \
-d "<JSONL data>" \
-H "Authorization: Basic <base64(apiKey:apiSecret)>"
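When the JSONL data comes from a file rather than an inline string, note that curl's -d @file strips newlines, which would merge all JSONL records into one line; --data-binary @file sends the file unchanged. A minimal sketch of the load request above for a local file (upload_jsonl, LOAD_URL and the CURL override are hypothetical names, and deriving the doc header from the file name is an assumption):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: send a local JSONL file with the load request shown above.
# Expects API_KEY and API_SECRET to be set; CURL=echo gives a dry run.
LOAD_URL="${LOAD_URL:-https://boxalino-di-stage-krceabfwya-ew.a.run.app/transformer/load}"

# Usage: upload_jsonl <file.jsonl> <client> <tm> <type> [mode]   (mode defaults to T)
upload_jsonl() {
  local file="$1" client="$2" tm="$3" type="$4" mode="${5:-T}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  [ -n "${API_KEY:-}" ] && [ -n "${API_SECRET:-}" ] \
    || { echo "API_KEY/API_SECRET are not set" >&2; return 1; }
  # --data-binary keeps the newlines that separate JSONL records;
  # "-d @file" would strip them and join all records into one line.
  ${CURL:-curl} "$LOAD_URL" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "client: $client" \
    -H "tm: $tm" \
    -H "type: $type" \
    -H "mode: $mode" \
    -H "doc: $(basename "$file")" \
    -u "$API_KEY:$API_SECRET" \
    --data-binary "@$file"
}
```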
Code Block |
---|
curl "https://boxalino-di-saas-krceabfwya-ew.a.run.app/core" \
-X POST \
-d '[{"connector":{"type":"boxalino","load":{"options":{"format":"NEWLINE_DELIMITED_JSON","field_delimiter":"","autodetect":true,"schema":"","skipRows":0,"max_bad_records":0,"quote":"","write_disposition":"WRITE_TRUNCATE","create_disposition":"","encoding":""},"doc_content":{"rti":[{"source":"<filename>*.json","autodetect":1,"destination":"<table name from core dataset>","primary_field":"<primary field in your table>"}]}}}}]' \
-H "Content-Type: application/json" \
-H "mode: F" \
-H "dev: 0" \
-H "tm: 20231106000000" \
-H "type: content" \
-H "client: <client>" \
-H "Authorization: Basic <base64(apiKey:apiSecret)>"
Note |
---|
In the JSON payload, the table definition is "doc_content": { "rti": [ { "source": "rti*.jsonl", "autodetect": 1, "destination": "doc_content_rti", "primary_field": "id" } ] }. With this definition, the final output table is <client>_core.doc_content_rti. Depending on the write_disposition property, the existing table data is either replaced (as with WRITE_TRUNCATE in the examples) or the new data is appended.
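Hand-escaping this payload inside a curl argument, as in the /core request above, is error-prone. A minimal sketch that prints the same body from a few parameters (core_payload is a hypothetical helper; values are inserted verbatim, so they must not contain double quotes or backslashes). Another disposition, such as BigQuery's WRITE_APPEND, can be passed as the fourth argument, assuming the endpoint forwards it to BigQuery.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the /core request body for one rti document.
# Values are inserted verbatim: they must not contain " or \ characters.

# Usage: core_payload <source> <destination> <primary_field> [write_disposition]
core_payload() {
  local src="$1" destination="$2" primary_field="$3"
  local disposition="${4:-WRITE_TRUNCATE}"
  # Whitespace and line breaks between JSON tokens are insignificant.
  cat <<EOF
[{"connector":{"type":"boxalino","load":{
  "options":{"format":"NEWLINE_DELIMITED_JSON","field_delimiter":"","autodetect":true,
    "schema":"","skipRows":0,"max_bad_records":0,"quote":"",
    "write_disposition":"$disposition","create_disposition":"","encoding":""},
  "doc_content":{"rti":[{"source":"$src","autodetect":1,
    "destination":"$destination","primary_field":"$primary_field"}]}}}}]
EOF
}
```

For example, the /core request could take --data-binary "$(core_payload 'rti*.jsonl' doc_content_rti id)" instead of the escaped -d string; quote the source pattern so the shell does not expand the *.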
...
Panel |
---|
The chunk header is required if the same file/document is exported in batches (sections). Repeat steps 1+2 for every data batch loaded to GCS, and increment the value of the chunk header for each /transformer/load/url request.
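The chunk bookkeeping can be sketched as a loop that hands each batch an increasing chunk value. for_each_chunk and the callback are hypothetical: the callback is where you would send the /transformer/load/url request with -H "chunk: <value>" and upload the batch (steps 1+2). Starting at 0 is an assumption; set CHUNK_START if your setup expects another start value.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run steps 1+2 once per batch of the same document,
# passing an increasing chunk value to each round.
# The callback receives <chunk> <batch-file> and must return non-zero on failure.
# Assumption: numbering starts at 0 unless CHUNK_START is set.

# Usage: for_each_chunk <callback> <batch-file>...
for_each_chunk() {
  local callback="$1"; shift
  local chunk="${CHUNK_START:-0}" file
  for file in "$@"; do
    # Stop at the first failed batch so later chunks are not sent out of order.
    "$callback" "$chunk" "$file" || return 1
    chunk=$((chunk + 1))
  done
}
```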
Tip |
---|
Steps #1 and #2 must be repeated for every file that needs to be added for the given process (using the same tm, mode and type). Only after all the files are available in GCS can you move on to step #3.
Tip |
---|
After all required documents (doc) for the given data sync type (e.g. product, order) have been made available in GCS, the DI-SAAS request can be called (https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request), with connector → type set to boxalino.
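The ordering described in these tips (every file first, all with one shared tm, mode and type; the DI-SAAS request last) can be sketched as follows. run_sync, upload_doc and trigger_sync are hypothetical names: upload_doc would wrap steps #1 and #2 for one file, and trigger_sync would send the DI-SAAS request.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ordering: load every document of one data sync
# with the same tm, mode and type, and send the DI-SAAS request only if all
# uploads succeeded. upload_doc and trigger_sync are callbacks you provide.

# Usage: run_sync <tm> <mode> <type> <doc-file>...
run_sync() {
  local tm="$1" mode="$2" type="$3"; shift 3
  local doc
  for doc in "$@"; do
    if ! upload_doc "$doc" "$tm" "$mode" "$type"; then
      echo "upload failed for $doc; DI-SAAS request not sent" >&2
      return 1
    fi
  done
  # Reached only when every document has been loaded to GCS.
  trigger_sync "$tm" "$mode" "$type"
}
```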
...