Tip |
---|
At Boxalino we strongly advocate ELT flows (vs ETL). The major benefits are speed, transparency and maintenance: data is loaded directly into a destination system (BQ) and transformed in parallel (Dataform). |
...
The DI-SAAS SYNC request
The DI request uses the same headers (client, tm, mode, type, authorization) and a JSON request body that maps the loaded .jsonl files to their data meaning.
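As a minimal sketch of how these headers could be assembled programmatically: the helper names `basic_auth_value` and `build_di_headers` are hypothetical, and the `Basic` prefix on the Authorization value is an assumption taken from the other curl samples on this page (the full product sync sample shows the encoded value without a prefix and names the account header `account` rather than `client`).

```python
import base64


def basic_auth_value(api_key: str, api_secret: str) -> str:
    """Return 'Basic <base64(api_key:api_secret)>'.

    Assumption: the service expects the HTTP Basic scheme.
    """
    raw = f"{api_key}:{api_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_di_headers(client, tm, mode, data_type, api_key, api_secret):
    """Assemble the headers listed above (client, tm, mode, type, authorization)."""
    return {
        "Content-Type": "application/json",
        "client": client,   # Boxalino account name
        "tm": tm,           # timestamp identifying the sync, e.g. "202303112000"
        "mode": mode,       # e.g. "F" for a full sync
        "type": data_type,  # e.g. "product"
        "Authorization": basic_auth_value(api_key, api_secret),
    }
```

The same dictionary can then be reused for every request that belongs to one sync, which keeps tm, mode and type consistent across them.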
...
Info |
---|
There should be a process within your own project that triggers the data sync between a 3rd party source (connector) and Boxalino. |
Expand |
---|
|
This is a sample of a triggered full product sync (minimal data): Code Block |
---|
curl "https://boxalino-di-saas-krceabfwya-ew.a.run.app/sync" \
-X POST \
-d "[{\"connector\":{\"type\":\"gcs\",\"options\":{\"source\":{\"bucket\":\"my-account-bucket\",\"pattern\":\"boxalino/product/__NODASHDATE__/\"}},\"load\":{\"options\":{\"format\":\"NEWLINE_DELIMITED_JSON\",\"autodetect\":true,\"schema\":\"\",\"skipRows\":0,\"max_bad_records\":0,\"write_disposition\":\"WRITE_TRUNCATE\"},\"doc_product\":{\"entity\":[{\"source\":\"product.jsonl\",\"autodetect\":0,\"schema\":\"ProductId:INT64,Name:STRING,ProductTypeId:INT64,Sku:STRING,GroupId:STRING,Price:FLOAT64,SalePrice:FLOAT64,created:DATETIME\"}],\"product_relations\":[{\"source\":\"crosssell.jsonl\",\"name\":\"crosssell\"}],\"link\":[{\"source\":\"urlrecord.jsonl\"}]}}},\"di\":{\"configuration\":{\"languages\":[\"de\",\"fr\"],\"currencies\":[\"CHF\"],\"mapping\":{\"languagesCountryCode\":{\"de\":\"CH_de\",\"fr\":\"CH_fr\"}},\"default\":{\"language\":\"de\",\"currency\":\"CHF\"}}}}]" \
-H "Content-Type: application/json" \
-H "account: <boxalino-account-name>" \
-H "mode: F" \
-H "tm: 202303112000" \
-H "type: product" \
-H "Authorization: <base64_encode(api_key:api_secret)>" |
The request above created the following resources:
GCS (raw data, as migrated from the connector):
gs://prod_rtux-data-integration_<account>/product/202303112000/F_product.jsonl
gs://prod_rtux-data-integration_<account>/product/202303112000/F_crosssell.jsonl
gs://prod_rtux-data-integration_<account>/product/202303112000/F_urlrecord.jsonl
BQ (the T dataset, raw data as loaded into BQ):
rtux-data-integration.<account>_T.202303112000-doc_product-entity
rtux-data-integration.<account>_T.202303112000-doc_product-product_relations-crosssell
rtux-data-integration.<account>_T.202303112000-doc_product-link
GCS (transformed doc_X JSONL):
gs://prod_rtux-data-integration_<account>/doc_product_F_202303112000.jsonl
gs://prod_rtux-data-integration_<account>/doc_language_F_202303112000.jsonl
BQ (the F dataset, data transformed to the doc_X data structure):
rtux-data-integration.<account>_F.doc_product_F_202303112000
rtux-data-integration.<account>_F.doc_language_F_202303112000
|
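The raw-data resource names listed above appear to follow a regular pattern built from the account, tm, mode, type and the entries of the load definition. The sketch below reproduces that pattern; it is inferred only from this one sample, so treat it as an aid for locating resources rather than a documented naming contract. The function name `expected_raw_resources` is hypothetical.

```python
def expected_raw_resources(account, tm, mode, data_type, load):
    """Derive raw GCS paths and T-dataset table names from a load definition.

    Assumption: the naming pattern is inferred from the sample listing only.
    `load` is the "load" object of the request body.
    """
    bucket = f"gs://prod_rtux-data-integration_{account}"
    gcs, bq = [], []
    for doc, sections in load.items():
        if doc == "options":
            continue  # loader options, not a document definition
        for section, entries in sections.items():
            for entry in entries:
                # Raw file: <bucket>/<type>/<tm>/<mode>_<source>
                gcs.append(f"{bucket}/{data_type}/{tm}/{mode}_{entry['source']}")
                # Raw table: <tm>-<doc>-<section>[-<name>]
                table = f"{tm}-{doc}-{section}"
                if "name" in entry:
                    table += f"-{entry['name']}"
                bq.append(f"rtux-data-integration.{account}_T.{table}")
    return {"gcs": gcs, "bq": bq}
```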
Expand |
---|
title | Sample request with Boxalino connector (e.g. data is already available in the Boxalino GCP project, in the client's GCS bucket) |
---|
|
Code Block |
---|
[
  {
    "connector": {
      "type": "boxalino",
      "load": {
        "options": {
          "format": "NEWLINE_DELIMITED_JSON",
          "autodetect": true,
          "schema": "",
          "skipRows": 0,
          "max_bad_records": 0,
          "write_disposition": "WRITE_TRUNCATE"
        },
        "doc_product": {
          "entity": [
            {
              "source": "product.jsonl",
              "autodetect": 0,
              "schema": "ProductId:INT64,Name:STRING,ProductTypeId:INT64,Sku:STRING,GroupId:STRING,Price:FLOAT64,SalePrice:FLOAT64,created:DATETIME"
            }
          ],
          "product_relations": [
            {
              "source": "crosssell.jsonl",
              "name": "crosssell"
            }
          ],
          "link": [
            {
              "source": "urlrecord.jsonl"
            }
          ]
        }
      }
    },
    "di": {
      "configuration": {
        "languages": [
          "de",
          "fr"
        ],
        "currencies": [
          "CHF"
        ],
        "mapping": {
          "languagesCountryCode": {"de": "CH_de", "fr": "CH_fr"}
        },
        "default": {
          "language": "de",
          "currency": "CHF"
        }
      }
    }
  }
] |
|
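Hand-escaping this body for a curl `-d` argument is error-prone. A small sketch of generating it from native data structures follows; the function name `build_sync_body` and its parameters are hypothetical, and the default loader options are simply copied from the samples above.

```python
import json

# Loader options copied from the sample requests on this page.
LOAD_OPTIONS = {
    "format": "NEWLINE_DELIMITED_JSON",
    "autodetect": True,
    "schema": "",
    "skipRows": 0,
    "max_bad_records": 0,
    "write_disposition": "WRITE_TRUNCATE",
}


def build_sync_body(doc_product, di_configuration,
                    connector_type="boxalino", connector_options=None):
    """Serialize a sync request body shaped like the samples above."""
    connector = {"type": connector_type}
    if connector_options is not None:
        # e.g. {"source": {"bucket": ..., "pattern": ...}} for the gcs connector
        connector["options"] = connector_options
    connector["load"] = {
        "options": dict(LOAD_OPTIONS),
        "doc_product": doc_product,
    }
    body = [{"connector": connector, "di": {"configuration": di_configuration}}]
    return json.dumps(body)
```

`json.dumps` takes care of quoting and of rendering `True` as `true`, so the resulting string can be sent as the request body unchanged.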
The DI-SAAS CORE request
The CORE request loads data (as is) into the client's _core dataset of the Boxalino ecosystem.
To achieve this, 2 requests must be made:
load data to GCS (see https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#Loading-content-to-Boxalino-GCS-(connector%3A-boxalino) )
load data from GCS to the <client>_core.<destination> table
REQUEST DEFINITION
As an integrator, please send the request below to the provided endpoint.
Info |
---|
There should be a process within your own project that triggers the data sync between a 3rd party source (connector) and Boxalino. |
Expand |
---|
title | Loading data to _core dataset |
---|
|
For the integration below, the rti JSON payload is loaded into the client's dataset. Note |
---|
The same tm value is used in both requests. The JSONL payload data should have a primary field (e.g. id).
Code Block |
---|
curl "https://boxalino-di-stage-krceabfwya-ew.a.run.app/transformer/load" \
-X POST \
-H "Content-Type: application/json" \
-H "client: <account>" \
-H "tm: 20231106000000" \
-H "type: content" \
-H "mode: T" \
-H "doc: <filename>.json" \
-d "<JSONL data>" \
-H "Authorization: Basic <encode<apiKey:apiSecret>>" |
Code Block |
---|
curl "https://boxalino-di-saas-krceabfwya-ew.a.run.app/core" \
-X POST \
-d "[{\"connector\":{\"type\":\"boxalino\",\"load\":{\"options\":{\"format\":\"NEWLINE_DELIMITED_JSON\",\"field_delimiter\":\"\",\"autodetect\":true,\"schema\":\"\",\"skipRows\":0,\"max_bad_records\":0,\"quote\":\"\",\"write_disposition\":\"WRITE_TRUNCATE\",\"create_disposition\":\"\",\"encoding\":\"\"},\"doc_content\":{\"rti\":[{\"source\":\"<filename>*.json\",\"autodetect\":1,\"destination\":\"<table name from core dataset>\",\"primary_field\":\"<primary field in your table>\"}]}}}}]" \
-H "Content-Type: application/json" \
-H "mode: F" \
-H "dev: 0" \
-H "tm: 20231106000000" \
-H "type: content" \
-H "client: <client>" \
-H "Authorization: Basic <encode<apiKey:apiSecret>>" |
Note |
---|
Note the JSON payload for the table definition: "doc_content": { "rti": [ { "source": "rti*.jsonl", "autodetect": 1, "destination": "doc_content_rti", "primary_field": "id" } ] }. The final output table is <client>_core.doc_content_rti. Depending on the write_disposition property, the data in the table is either overwritten or appended. |
|
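The two CORE requests can be sketched together to make the shared values explicit. The example below only builds the requests and does not send them. The function name `build_core_requests` is hypothetical, the endpoints and headers are copied from the curl samples above, and deriving the `source` pattern from the `doc` header value (`rti.json` becomes `rti*.json`) is an assumption based on those samples.

```python
import json
import os
import urllib.request

# Endpoints copied from the curl samples above.
LOAD_URL = "https://boxalino-di-stage-krceabfwya-ew.a.run.app/transformer/load"
CORE_URL = "https://boxalino-di-saas-krceabfwya-ew.a.run.app/core"


def build_core_requests(client, tm, doc, jsonl_data, destination,
                        primary_field, authorization):
    """Build (without sending) the two requests of the CORE flow."""
    shared = {
        "Content-Type": "application/json",
        "client": client,
        "tm": tm,  # the same tm value is used by both requests
        "type": "content",
        "Authorization": authorization,
    }
    # Request 1: load the JSONL payload to GCS.
    load_request = urllib.request.Request(
        LOAD_URL,
        data=jsonl_data.encode("utf-8"),
        headers={**shared, "mode": "T", "doc": doc},
        method="POST",
    )
    # Request 2: load from GCS into <client>_core.<destination>.
    stem, ext = os.path.splitext(doc)
    body = [{"connector": {"type": "boxalino", "load": {
        "options": {
            "format": "NEWLINE_DELIMITED_JSON", "field_delimiter": "",
            "autodetect": True, "schema": "", "skipRows": 0,
            "max_bad_records": 0, "quote": "",
            "write_disposition": "WRITE_TRUNCATE",
            "create_disposition": "", "encoding": "",
        },
        "doc_content": {"rti": [{
            "source": f"{stem}*{ext}",  # assumed pattern, see lead-in
            "autodetect": 1,
            "destination": destination,
            "primary_field": primary_field,
        }]},
    }}}]
    core_request = urllib.request.Request(
        CORE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={**shared, "mode": "F", "dev": "0"},
        method="POST",
    )
    return load_request, core_request
```

Each request can then be sent with `urllib.request.urlopen(...)`, the load request first and the core request only after the load has succeeded.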
Loading content to Boxalino GCS (connector: boxalino)
...
Panel |
---|
|
The use of the header chunk is required if the same file/document is exported in batches/sections. Repeat steps 1+2 for every data batch loaded in GCS. Make sure to increment the value of the chunk property for each /transformer/load/url request. |
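A minimal sketch of batching one document and numbering the batches for the chunk header follows. The helper name `iter_chunks` is hypothetical, and starting the chunk value at 1 is an assumption, since the text only requires that the value is incremented for each request.

```python
def iter_chunks(lines, batch_size, first_chunk=1):
    """Yield (chunk, payload) pairs for one document exported in batches.

    `lines` are JSONL rows; `payload` is the body of one batch and `chunk`
    is the value to send in the chunk header for that batch.
    Assumption: chunk numbering starts at `first_chunk` (1 by default).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    chunk = first_chunk
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield chunk, "\n".join(batch)
            batch = []
            chunk += 1
    if batch:
        # Remaining rows form the last, possibly smaller, batch.
        yield chunk, "\n".join(batch)
```

Each yielded pair corresponds to one repetition of steps 1 and 2, with `str(chunk)` used as the chunk header value.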
Tip |
---|
Steps #1 and #2 must be repeated for every file that has to be added for the given process (same tm, mode & type). Only after all the files are available in GCS can you move on to step #3. |
Tip |
---|
After all required documents (doc) for the given data sync type (e.g. product, order, etc.) have been made available in GCS, the DI-SAAS request can be called (https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request), assigning connector → type : boxalino. |
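The ordering constraint described above (every document in GCS first, the DI-SAAS request last) can be sketched as a small orchestration function. The names `run_data_sync`, `upload_doc` and `trigger_sync` are hypothetical; the two callables stand in for whatever code performs steps 1 and 2 and the final DI-SAAS request in your project.

```python
def run_data_sync(docs, upload_doc, trigger_sync):
    """Upload every document, then trigger the DI-SAAS request once.

    `upload_doc(doc)` performs steps 1 and 2 for one document and is
    expected to raise on failure; `trigger_sync(uploaded)` performs the
    final request. Both are placeholders supplied by the caller.
    """
    uploaded = []
    for doc in docs:
        upload_doc(doc)  # an exception here stops the flow before the sync
        uploaded.append(doc)
    # Reached only when all documents are available in GCS.
    return trigger_sync(uploaded)
```

Because a failed upload raises before `trigger_sync` is reached, the sync request is never sent against an incomplete set of files.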
...