...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
For the purpose of account launch and best-practices configuration, the following data must be synchronized fully once:
...
product
...
order (online and/or offline)
...
Tip |
---|
At Boxalino, we strongly advocate ELT flows (over ETL). The major benefits are speed, transparency and maintenance: data is loaded directly into the destination system (BQ) and transformed in parallel (Dataform). |
The ELT flow is recommended when you want to load your data as is (no transformation) into BQ and then implement the transformation logic (SQL) in Dataform. The integration effort is minimal: push all data to GCS (Boxalino-owned or client-owned), load the data into BQ, and write the SQL in Dataform.
More information about the Integration Strategies is available on Confluence.
Integration Strategies
...
The Generic Data Integration (DI) Flow expects the client to:
...
create each JSONL document
...
load to Boxalino GCS
...
load the files to Boxalino BQ
...
execute the steps described in Data Integration.
For certain platforms (Shopware6, Magento2), step #1 can be done with a Boxalino Data Integration plugin.
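Step #1 (creating each JSONL document) can be sketched as follows. This is a minimal illustration, assuming each record is a plain dictionary. The helper `write_jsonl` and the field names used in the usage note are hypothetical; the actual document structure is defined by the Boxalino data structure documentation.

```python
import json
from datetime import date
from pathlib import Path


def write_jsonl(records, directory, data_type, day):
    """Write one JSON object per line to <data_type>_<YYYYMMDD>.jsonl."""
    path = Path(directory) / f"{data_type}_{day:%Y%m%d}.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False keeps non-ASCII characters (e.g. umlauts) readable
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

For example, `write_jsonl([{"id": "1", "title": "Stuhl"}], ".", "product", date(2022, 12, 6))` would produce a file named `product_20221206.jsonl` in the current directory.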
...
Tip |
---|
As an integrator, you can complete steps #1-#4 of the Data Integration by following our recommended Integration Flow. |
...
Code Block |
---|
[
  {
    "connector": {
      "type": "sftp|gcs|plentymarket|boxalino",
      "options": {
        // specific for each connector type
      },
      "load": {
        "options": {
          // for loading the data in BQ (BQ parameters)
          "format": "CSV|NEWLINE_DELIMITED_JSON",
          "field_delimiter": ";",
          "autodetect": true,
          "schema": "",
          "skipRows": 1(CSV)|0(JSONL),
          "max_bad_records": 0,
          "quote": "",
          "write_disposition": "WRITE_TRUNCATE",
          "create_disposition": "",
          "encoding": ""
        },
        "doc_X": {
          "property_node_from_doc_data_structure": [
            {
              "source": "file___NODASHDATE__.jsonl",
              "name": "<used to create suffix for BQ table>",
              "schema": ""
            }
          ]
        }
      }
    },
    "di": {
      "configuration": {
        "languages": [ "de", "fr" ],
        "currencies": [ "CHF" ],
        "mapping": {
          "languagesCountryCode": {"de":"CH_de", "fr":"CH_fr"}
        },
        "default": {
          "language": "de",
          "currency": "CHF"
        }
      }
    }
  }
] |
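The request template above can also be assembled programmatically. The sketch below builds the same structure as a Python object for a single JSONL source and serializes it to JSON. The helper name `build_di_request`, the `doc_product` and `entity` node names in the usage line, and the empty connector options are illustrative assumptions; the real values depend on the connector type and on your data structure.

```python
import json


def build_di_request(connector_type, doc_name, node_name, source, table_suffix):
    """Assemble a DI request body that mirrors the template structure."""
    return [
        {
            "connector": {
                "type": connector_type,
                "options": {},  # connector-specific; left empty in this sketch
                "load": {
                    "options": {
                        "format": "NEWLINE_DELIMITED_JSON",
                        "field_delimiter": ";",
                        "autodetect": True,
                        "schema": "",
                        "skipRows": 0,  # 0 for JSONL, 1 for CSV
                        "max_bad_records": 0,
                        "quote": "",
                        "write_disposition": "WRITE_TRUNCATE",
                        "create_disposition": "",
                        "encoding": "",
                    },
                    doc_name: {
                        node_name: [
                            {"source": source, "name": table_suffix, "schema": ""}
                        ]
                    },
                },
            },
            "di": {
                "configuration": {
                    "languages": ["de", "fr"],
                    "currencies": ["CHF"],
                    "mapping": {
                        "languagesCountryCode": {"de": "CH_de", "fr": "CH_fr"}
                    },
                    "default": {"language": "de", "currency": "CHF"},
                }
            },
        }
    ]


# Hypothetical usage: one product file pushed through the "boxalino" connector.
body = build_di_request(
    "boxalino", "doc_product", "entity", "product___NODASHDATE__.jsonl", "entity"
)
print(json.dumps(body, indent=2))
```

Building the body from a function keeps the language, currency and load options in one place when several data types are synchronized.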
1. Data Connector & Data Load
Access to the data is managed by the client. The following guidelines apply to access and naming patterns:
The relevant data sources are available in .csv or JSONL (preferred) format
The files have a timestamp in the name or in the access path (e.g. product_20221206.jsonl)
this helps automate the integration
The files required to update a given data type (e.g. product, order) are available in the same path
The files are available on an endpoint (SFTP, GCS, public) to which Boxalino has access
for GCS / GCP sources: access is shared with Boxalino's Service Account
boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com
for AWS / SFTP: a Boxalino user and credentials must be provided
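The naming guidelines above can be checked before an export is handed over. The sketch assumes file names of the form `<type>_<YYYYMMDD>.<csv|jsonl>`, as in the `product_20221206.jsonl` example; the exact pattern and the helper `parse_export_name` are illustrative assumptions, not a Boxalino requirement.

```python
import re
from datetime import datetime

# Assumed pattern: <data type>_<YYYYMMDD>.<csv|jsonl>
EXPORT_NAME = re.compile(r"^(?P<type>[a-z_]+)_(?P<date>\d{8})\.(?P<ext>csv|jsonl)$")


def parse_export_name(filename):
    """Return (data_type, date, extension), or None if the name does not match."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    try:
        day = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date
    return match["type"], day, match["ext"]
```

Files that return `None` can be flagged before they reach the shared endpoint, which keeps the automated load from picking up misnamed exports.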
...
Panel | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
The requirements specified above (#1-#4) are necessary if the data is accessed from a remote scope (outside Boxalino). If your integration exports the data directly to Boxalino (as described in https://boxalino-internal.atlassian.net/wiki/spaces/DOC/pages/2606792705/Boxalino+Data+Integration+DI-SAAS+-+ELT+Flow#Loading-content-to-Boxalino-GCS-(connector%3A-boxalino) ), please continue with the Data Transformation step. |
2. Data Transformation
Once the data is loaded in GCS and BQ, it is time to transform it into the required data structure.
...
Endpoint | production | | |
---|---|---|---|
Endpoint | test / stage | https://boxalino-di-transformer-stage-krceabfwya-ew.a.run.app | |
1 | Action | /di sync | |
2 | Method | POST | |
3 | Body | | |
4 | Headers | Authorization | Basic base64<<DATASYNC API key : DATASYNC API Secret>> note: use the API credentials from your Boxalino account that have the ADMIN role assigned |
5 | | Content-Type | application/json |
6 | | client | account name |
7 | | mode | data sync mode: F for full, D for delta |
8 | | type | product, user, content, user_content, order. If left empty, all tables with the given tm are checked |
9 | | tm | (optional) time, in the format YmdHis; technical: used to identify the document version |
10 | | ts | (optional) timestamp; must be a UNIX timestamp in milliseconds |
11 | | dev | (optional) use this to mark the content for the dev data index |
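The headers listed in the table above can be assembled as in the following sketch. It assumes the standard HTTP Basic scheme (`key:secret`, base64-encoded), derives `tm` and `ts` from the same moment, and leaves the full URL (endpoint plus action) to the caller. The helper names are hypothetical, and the request is only constructed, not sent.

```python
import base64
import json
import urllib.request
from datetime import datetime


def build_sync_headers(api_key, api_secret, client, mode, data_type, now=None):
    """Build the request headers for a sync request."""
    now = now or datetime.now()
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
        "client": client,  # account name
        "mode": mode,  # "F" for full, "D" for delta
        "type": data_type,  # e.g. "product"
        "tm": now.strftime("%Y%m%d%H%M%S"),  # YmdHis
        "ts": str(int(now.timestamp() * 1000)),  # UNIX timestamp in milliseconds
    }


def build_sync_request(url, headers, body):
    """Create (but do not send) the POST request; `url` is endpoint + action."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Passing the resulting object to `urllib.request.urlopen` would send the request; keeping construction and sending separate makes the headers easy to inspect during testing.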
Expand | ||
---|---|---|
| ||
This is a sample of a triggered full product sync (minimal data):
The request above created the following resources:
|
...
|
Loading content to Boxalino GCS (connector: boxalino)
Note |
---|
By following these steps, you can push your data (JSONL or CSV) directly into your client's GCS bucket in the Boxalino scope. After all data has been loaded into GCS, the DI-SAAS request ( https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request ) can be called with connector → type : boxalino. |
There are 2 available flows, based on the size of your data:
...
Expand | |||
---|---|---|---|
| |||
For example, the request below would create a
|
Note |
---|
The same |
...
Panel | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
The use of the header |
Tip |
---|
Steps #1 and #2 must be repeated for every file that needs to be added for the given process (same tm, mode and type). Only after the full content is available in GCS can you move on to step #3. |
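The ordering described in the tip above can be expressed as a simple loop. The step implementations themselves are not shown here: `step_1`, `step_2` and `step_3` are hypothetical callables standing in for the corresponding requests, so this is only a sketch of the control flow.

```python
def run_load(files, step_1, step_2, step_3):
    """Run steps #1 and #2 for every file, then step #3 exactly once."""
    if not files:
        raise ValueError("at least one file is required")
    for name in files:
        step_1(name)
        step_2(name)
    # Reached only if every file passed steps #1 and #2 without raising.
    return step_3()
```

Because an exception in either per-file step aborts the loop, step #3 is never triggered on partially loaded content.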
Tip |
---|
After all required documents (doc) for the given |
Technical notes
Every request to our nodes returns a response (200 or 400). This can be used internally for testing purposes as well.
The status of the requests can be reviewed at any time: Status Review
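Since every request returns either 200 or 400, a client can branch on the status code. The following is a minimal sketch; the helper name and the messages are illustrative, and any other status code is treated as unexpected.

```python
def check_response(status_code):
    """Return True for an accepted request (200); raise for anything else."""
    if status_code == 200:
        return True
    if status_code == 400:
        raise ValueError("request rejected (400): review the headers and body")
    raise RuntimeError(f"unexpected status code: {status_code}")
```

The same check can be reused in internal tests to assert that a request was accepted before the next step is started.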
...