
Data Integration


Overview

To integrate your data with Boxalino, you can either follow all 4 steps below (where you load your data into your own BigQuery Dataset) or only steps 1 and 4 (where you send the JSONL files directly to the Boxalino Process):

1. Export JSONL files

Export your data in JSONL for each selected Data Type of the Data Structure.

JSONL = newline-delimited JSON (https://en.wikipedia.org/wiki/JSON_streaming)

Make sure to name each file starting with the Data Type (e.g. "doc_product"), followed by the mode in the middle ('F': full, 'D': delta) and ending with the Datetime (YYYYMMDDHHMMSS) of when the export started. It is important to use the start time: if a newer delta starts later but finishes earlier, the deltas will need to be replayed in the order of their Datetime.
As a result, file names should look like: "doc_product_F_YYYYMMDDHHMMSS.json"
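For illustration, a minimal Python sketch of this export and naming convention (the export_jsonl helper, its parameters, and the sample records are placeholders, not part of the Boxalino specification):

import json
from datetime import datetime, timezone

def export_jsonl(records, data_type, mode="F", export_started_at=None):
    # Build the file name: <data_type>_<mode>_<YYYYMMDDHHMMSS of export start>.json
    started = export_started_at or datetime.now(timezone.utc)
    file_name = f"{data_type}_{mode}_{started:%Y%m%d%H%M%S}.json"
    # JSONL: one JSON object per line
    with open(file_name, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return file_name

# Example: a full ("F") export of two placeholder product records
export_jsonl([{"id": "1"}, {"id": "2"}], "doc_product", mode="F")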

2. Load files to Google Storage

This step is optional; you can also decide to push the files directly to the Boxalino Process.

Load your files into a Google Cloud Storage bucket that you set up in your GCP environment.

More information about how to set up your GCP environment and how to load files into Google Cloud Storage can be found here: Integrate your Data in Boxalino BigQuery Data Science Eco-System

You can set an automated expiration on the files so they don't accumulate over time.
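As an illustration, a minimal sketch using the google-cloud-storage client (the project id, bucket name, file name, and the 7-day expiration are placeholder assumptions):

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="your-gcp-project")        # placeholder project id
bucket = client.get_bucket("your-di-bucket")               # placeholder bucket name

# Upload one exported JSONL file under the same name
blob = bucket.blob("doc_product_F_20240101000000.json")
blob.upload_from_filename("doc_product_F_20240101000000.json")

# Optional: let objects expire automatically after 7 days (placeholder retention)
bucket.add_lifecycle_delete_rule(age=7)
bucket.patch()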

3. Load files to BigQuery tables

This step is optional; you can also decide to push the files directly to the Boxalino Process.

For each of the files, create a new table with exactly the same name in a BigQuery Dataset that you set up in your GCP environment, using the Field Schema provided at the end of each Data Structure (e.g. here for the doc_product table).

More information about how to set up your GCP environment and how to load files from Google Cloud Storage to BigQuery can be found here: Integrate your Data in Boxalino BigQuery Data Science Eco-System

Make sure to use the full provided Field Schema and not schema auto-detection from your JSON; otherwise only part of the table structure will be created, and Boxalino processing will fail because it expects the full table schema to be available.
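As an illustration, a minimal sketch using the google-cloud-bigquery client (the project id, dataset, bucket, and the two schema fields shown are placeholders; use the full Field Schema from the relevant Data Structure page):

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="your-gcp-project")  # placeholder project id

# Placeholder schema: replace with the full Field Schema of the Data Structure (e.g. doc_product)
schema = [
    bigquery.SchemaField("id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("creation", "TIMESTAMP"),
]

job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=False,  # never auto-detect: the full provided schema must be used
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# The table name matches the file name (placeholder dataset and file)
load_job = client.load_table_from_uri(
    "gs://your-di-bucket/doc_product_F_20240101000000.json",
    "your-gcp-project.your_dataset.doc_product_F_20240101000000",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish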

4. Trigger Boxalino Process

If you only did step 1 and want to send your files directly, first call the Load Request for each of your files.

Whether or not you called the Load Request first, you then need to call the Sync Request (once for all the files).
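For illustration only, a sketch of the call order (the endpoint URLs, authentication, and headers below are placeholders; take the real values and parameters from the Load Request and Sync Request documentation):

import requests  # pip install requests

# Placeholder endpoints and credentials, not the real Boxalino API values
LOAD_URL = "https://<boxalino-load-endpoint>"
SYNC_URL = "https://<boxalino-sync-endpoint>"
AUTH = ("your-account", "your-api-key")

files = [
    "doc_product_F_20240101000000.json",
    "doc_language_F_20240101000000.json",
]

# 1. Load Request: one call per file (only needed if you skipped steps 2 and 3)
for file_name in files:
    with open(file_name, "rb") as f:
        requests.post(LOAD_URL, auth=AUTH, data=f,
                      headers={"doc": file_name}  # placeholder header
                      ).raise_for_status()

# 2. Sync Request: a single call once all files are available
requests.post(SYNC_URL, auth=AUTH).raise_for_status()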
