Tip

At Boxalino we are strong advocates of ELT flows (vs. ETL). The major benefits are speed, transparency and maintainability: data is loaded directly into the destination system (BQ) and transformed in parallel (Dataform).

...

The DI-SAAS SYNC request

The DI request will use the same headers (client, tm, mode, type, authorization) and a JSON request body that provides the mapping between the loaded .jsonl files and their data meaning.
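A minimal sketch of such a SYNC request, assuming Python with the requests library. Only the header names and the connector → type assignment come from this page; the endpoint host, the tm/mode sample values and the body layout beyond connector are illustrative assumptions, not the authoritative schema:

```python
# Hedged sketch of the DI-SAAS SYNC request (assumed: Python + requests).
# The endpoint host and the exact body schema are placeholders.
import requests

SYNC_ENDPOINT = "https://<di-saas-endpoint>/sync"  # placeholder host

headers = {
    "client": "<account-name>",
    "tm": "20241119133935",     # timestamp shared by all files of this sync
    "mode": "F",                # sync mode used by your flow (illustrative)
    "type": "product",          # data sync type (product, order, etc.)
    "Authorization": "Basic <base64(api_key:api_secret)>",
}

body = {
    "connector": {"type": "boxalino"},
    # ... mapping between the loaded .jsonl files and their data meaning ...
}

response = requests.post(SYNC_ENDPOINT, headers=headers, json=body)
response.raise_for_status()
print(response.text)
```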

...

Panel

The use of the chunk header is required if the same file/document is exported in batches/sections:
Repeat steps 1+2 for every data batch loaded in GCS.
Make sure to increment the value of the chunk property for each /transformer/load/url request (see the sketch below).
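A sketch of the chunked load, assuming Python with requests, and assuming (as the load/url name suggests) that step #1 returns a signed upload URL and step #2 is an HTTP PUT of the batch content to that URL; the host, file names and starting chunk value are placeholders:

```python
# Hedged sketch: repeat steps 1+2 per batch, incrementing the chunk header.
import requests

LOAD_ENDPOINT = "https://<di-endpoint>/transformer/load/url"  # placeholder host

common_headers = {
    "client": "<account-name>",
    "tm": "20241119133935",   # same tm for every batch of this sync
    "mode": "F",
    "type": "product",
    "doc": "product.jsonl",   # same document, exported in batches
    "Authorization": "Basic <base64(api_key:api_secret)>",
}

for chunk, batch_path in enumerate(["product-0.jsonl", "product-1.jsonl"]):
    headers = {**common_headers, "chunk": str(chunk)}  # increment per request
    # Step #1: request the (assumed) signed upload URL for this chunk
    upload_url = requests.post(LOAD_ENDPOINT, headers=headers).text
    # Step #2: upload the batch content to the returned URL
    with open(batch_path, "rb") as fh:
        requests.put(upload_url, data=fh).raise_for_status()
```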

Tip

Steps #1 - #2 must be repeated for every file that needs to be added to the given process (same tm, mode & type).

Only after all the files are available in GCS can you move on to step #3.

Tip

After all required documents (doc) for the given data sync type (ex: product, order, etc.) have been made available in GCS, the DI-SAAS request can be called (https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request), assigning connector → type: boxalino.

...

The DI-SAAS flow can also be used to access privately owned files in GCS and to update or create privately owned BigQuery resources.

Tip

If your project has already shared access with your Boxalino users group (bi-with-ai.<client>@boxalino.com), the sharing steps below are already covered.

Google Cloud Storage (GCS)

If your connector is gcs and you want to push data to and read data from a privately owned resource, share access with boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com or bi-with-ai.<client>@boxalino.com (your Boxalino users group).

Required permissions (a sharing sketch follows this list):

  1. storage.objects.get

  2. storage.buckets.get

  3. storage.objects.list
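A sketch of granting these permissions at bucket level, assuming the Python google-cloud-storage client; the bucket name is a placeholder, and the role choice is an assumption (roles/storage.objectViewer covers storage.objects.get/list, roles/storage.legacyBucketReader adds storage.buckets.get):

```python
# Hedged sketch: share a private GCS bucket with the Boxalino service account.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-private-di-bucket")  # placeholder bucket name

policy = bucket.get_iam_policy(requested_policy_version=3)
member = "serviceAccount:boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com"
# objectViewer -> storage.objects.get/list; legacyBucketReader -> storage.buckets.get
for role in ("roles/storage.objectViewer", "roles/storage.legacyBucketReader"):
    policy.bindings.append({"role": role, "members": {member}})
bucket.set_iam_policy(policy)
```

To share with the users group instead, use the member string group:bi-with-ai.<client>@boxalino.com.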

...

If the goal is to load the raw GCS file into a privately owned BQ resource (defined as: load → options → load_dataset_stage), you have to ensure that the dataset EXISTS and that it uses location EU.

We recommend creating a dedicated dataset for this purpose. To avoid BQ storage costs, set an expiration time on its tables (ex: 7 days), as in the sketch below.
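A sketch of creating such a dataset, assuming the Python google-cloud-bigquery client; the dataset name is a placeholder:

```python
# Hedged sketch: dedicated EU dataset with a 7-day default table expiration.
from google.cloud import bigquery

client = bigquery.Client()
dataset = bigquery.Dataset(f"{client.project}.boxalino_saas_stage")  # placeholder name
dataset.location = "EU"                                     # the dataset must use location EU
dataset.default_table_expiration_ms = 7 * 24 * 3600 * 1000  # expire tables after 7 days
client.create_dataset(dataset, exists_ok=True)
```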


The same service account (boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com) or your Boxalino users group (bi-with-ai.<client>@boxalino.com) must have read/write access to the dataset where the GCS data is loaded. For simplicity, the BigQuery Admin role is suggested.

Note

Share access only on the resources that Boxalino is allowed to access. For example, if your integrations only work on a temporary dataset (boxalino_saas_stage or boxalino_saas), you can share access at resource level instead of project level, as sketched below.
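A sketch of resource-level sharing on such a dataset, again assuming the Python google-cloud-bigquery client; granting WRITER at dataset level is one way to provide the read/write access mentioned above without a project-level role:

```python
# Hedged sketch: grant the Boxalino users group read/write on one dataset only.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset(f"{client.project}.boxalino_saas_stage")  # placeholder name

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",                 # read/write on this dataset only
        entity_type="groupByEmail",
        entity_id="bi-with-ai.<client>@boxalino.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```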

...