Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Tip

At Boxalino we are strong advisors for ELT flows (vs ETL). The major benefit is speed, transparency and maintenence: data is loaded directly into a destination system (BQ), and transformed in-parallel (Dataform).

...

  1. The client has access to a GCP project

  2. The client will create a Dataform repository https://cloud.google.com/dataform/docs/repositories

  3. The client has access to a GitHub or GitLab repository (to connect it to the Dataform repository) https://cloud.google.com/dataform/docs/connect-repository

  4. The client has given “Dataform Admin” permission to Boxalino Service Account boxalino-dataform@rtux-data-integration.iam.gserviceaccount.com

When using dataform for transforming the exported data (your custom CSV/JSONL files) into Boxalino Data Structure, the JSON body (for the SYNC REQUEST) has the following information (next to connector and di):

Code Block
"transform": {
    "vars": {
      "var1": "value"
    },
    "tags": [
      "order"
    ],
    "dataform": {
      "project": "rtux-data-integration",
      "location": "europe-west1",
      "repository": "boxalino-di-saas-elt",
      "workspace": "dataform"
    }
  }
Note

The values for dataform are available directly in your dataform project link:
https://console.cloud.google.com/bigquery/dataform/locations/<location>/repositories/<repository>/workspaces/<workspace>?project=<project>

The DI-SAAS SYNC request

The DI request will use the same headers (client, tm, mode, type, authorization)and a JSON request body that would provide mapping details between the loaded .jsonl files and data meaning.

...

Panel
panelIconId1f44c
panelIcon:ok_hand:
panelIconText👌
bgColor#FFEBE6

The use of the header chunk is required if the same file/document is exported in batches/sections.
Repeat steps 1+2 for every data batch loaded in GCS.
Make sure to increment the value of the chunkproperty for each /transformer/load/url request.

Tip

Step #1 - #2 must be repeated for every file that is required to be added for the given process (same tm, mode & type)

Only after all the files are available in GCS, you can move on to step#3.

Tip

After all required documents (doc) for the given type data sync (ex: product, order, etc) have been made available in GCS, the DI-SAAS request can be called https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request , assigning connector → type : boxalino.

...