Tip |
---|
At Boxalino we are strong advisors for ELT flows (vs ETL). The major benefit is speed, transparency and maintenence: data is loaded directly into a destination system (BQ), and transformed in-parallel (Dataform). |
...
The DI-SAAS SYNC request
The DI request will use the same headers (client, tm, mode, type, authorization)and a JSON request body that would provide mapping details between the loaded .jsonl files and data meaning.
...
Panel | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
The use of the header |
Tip |
---|
Step #1 - #2 must be repeated for every file that is required to be added for the given process (same tm, mode & type) Only after all the files are available in GCS, you can move on to step#3. |
Tip |
---|
After all required documents (doc) for the given |
...
The DI-SAAS flow can be used to access privately owned files in GCS and to update/create privately owned BigQuery resources as well.
Tip |
---|
If your project already shared access to your Boxalino users group, |
Google Cloud Storage GCS
If your connector is gcs and you want to push and read data to a privately owned resource, share access to boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com
or bi-with-ai.<client>@boxalino.com
(your Boxalino users group)
Required permissions:
storage.objects.get
,storage.buckets.get
storage.objects.list
...
If the goal is to load the raw GCS file to a privately owned BQ resource (defined as: load
→ options
→ load_dataset_stage
), you have to ensure that the dataset EXISTS and it is in using location EU.
We recommend to create a dedicated dataset for this purpose. To avoid BQ storage costs, we recommend to set an expiration date for the tables (ex: 7 days)
The same Service Account boxalino-di-api@rtux-data-integration.iam.gserviceaccount.com
or bi-with-ai.<client>@boxalino.com
(your Boxalino users group) must have read/write access to the dataset where the GCS data is loaded. For simplicity, BigQuery Admin is suggested.
Note |
---|
Share access only on the resources that Boxalino is allowed to access. For example, if your integrations only work on a temporary dataset - |
...