Tip

At Boxalino we strongly advocate ELT flows (vs. ETL). The major benefits are speed, transparency and maintenance: data is loaded directly into a destination system (BQ) and transformed in parallel (Dataform).

...

  1. The relevant data sources are available in .csv or JSONL (preferred) format

  2. The files have a timestamp in the file name or in the access path (ex: product_20221206.jsonl)

    1. this helps automate the integration

  3. The files required to update a certain data type (ex: product, order, etc) are available in the same path

  4. The files are available on an endpoint (SFTP, GCS, 3rd party API, public URL) to which Boxalino has access

    1. for GCS/GCP sources: access is shared with Boxalino's Service Account boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com (storage.objects.get, storage.buckets.get and storage.objects.list, which are granted by the Storage Legacy Bucket Reader & Storage Legacy Object Reader roles)

    2. for AWS/SFTP: the client's own AWS/SFTP environment with a Boxalino user & credentials
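The naming convention from point 2 can be sketched as follows. This is a minimal illustration, assuming the `<type>_<YYYYMMDD>.jsonl` pattern seen in the example product_20221206.jsonl; the helper names (`timestamped_name`, `parse_name`, `to_jsonl`) are hypothetical and not part of any Boxalino tooling.

```python
import json
import re
from datetime import date, datetime

# Assumed pattern, derived from the example "product_20221206.jsonl".
_NAME_PATTERN = re.compile(r"^(?P<type>[a-z_]+)_(?P<day>\d{8})\.jsonl$")


def timestamped_name(data_type: str, day: date) -> str:
    """Build an export file name that carries the export date."""
    return f"{data_type}_{day:%Y%m%d}.jsonl"


def parse_name(filename: str) -> tuple[str, date]:
    """Recover the data type and export date from a file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match["type"], datetime.strptime(match["day"], "%Y%m%d").date()


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one JSON document per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

With a date in the name, an automated job can pick the most recent export by parsing the names rather than relying on file modification times.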

...

The DI-SAAS SYNC request

The DI request uses the same headers (client, tm, mode, type, authorization) and a JSON request body that provides the mapping details between the loaded .jsonl files and the meaning of their data.
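As a rough sketch, the headers and body could be assembled as below. Only the header names and the connector type `boxalino` come from this page. The function name, the nesting of `connector.type`, and all sample values are assumptions for illustration. The mapping fields of the real body are not shown here and must be taken from the DI request documentation.

```python
import json


def build_di_request(client: str, tm: str, mode: str, data_type: str,
                     authorization: str) -> tuple[dict, str]:
    """Return (headers, JSON body) for a DI-SAAS SYNC request sketch."""
    headers = {
        "client": client,
        "tm": tm,                # same value as used when loading the files
        "mode": mode,
        "type": data_type,       # ex: product, order
        "authorization": authorization,
    }
    # Assumed shape: only connector.type is known from this page; the
    # file-to-data mapping details are intentionally left out.
    body = {"connector": {"type": "boxalino"}}
    return headers, json.dumps(body)
```

Keeping the header construction in one place makes it easier to guarantee that the load requests and the final DI request share the same tm, mode and type.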

...


The chunk header is required if the same file/document is exported in batches/sections.
Repeat steps 1 and 2 for every data batch loaded in GCS.
Make sure to increment the value of the chunk property for each /transformer/load/url request.
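The chunk increment can be sketched as a generator that yields one header set per batch. This is only an illustration: the function name is hypothetical, and the starting value of 1 and the string encoding of the chunk value are assumptions that this page does not confirm.

```python
from typing import Iterator


def chunked_headers(base_headers: dict, batch_count: int,
                    first_chunk: int = 1) -> Iterator[dict]:
    """Yield one header dict per batch, each with an incremented chunk."""
    for offset in range(batch_count):
        headers = dict(base_headers)  # copy, so the base headers stay unchanged
        headers["chunk"] = str(first_chunk + offset)  # assumed: sent as text
        yield headers
```

Each yielded dict would accompany one /transformer/load/url request for the corresponding batch, while client, tm, mode and type stay identical across batches.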

Tip

Steps #1 and #2 must be repeated for every file that is required for the given process (same tm, mode & type).

Only after all the files are available in GCS can you move on to step #3.

Tip

After all required documents (doc) for the given data type sync (ex: product, order, etc) are available in GCS, the DI-SAAS request can be called (see https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/928874497/DI-SAAS+ELT+Flow#The-DI-request), with connector → type set to boxalino.

...