{"type":"doc","content":[{"type":"extension","attrs":{"layout":"default","extensionType":"com.atlassian.confluence.macro.core","extensionKey":"toc","parameters":{"macroParams":{},"macroMetadata":{"macroId":{"value":"ff77f0a0-35ad-4bbf-b13f-accd125ab275"},"schemaVersion":{"value":"1"},"title":"Table of Contents"}},"localId":"33366bc3-31f8-448e-8f51-2ec069224847"}},{"type":"heading","attrs":{"level":1},"content":[{"text":"Overview","type":"text"}]},{"type":"paragraph","content":[{"text":"As an integrator, you are tasked with exposing the client's data to the Boxalino Data Structure. The following steps can be used as guidelines for a technical approach:","type":"text"}]},{"type":"orderedList","attrs":{"order":1},"content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Transform your data source to the Boxalino Data Structure ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/252280881"}},{"text":" ","type":"text"},{"text":"as JSONL content","type":"text","marks":[{"type":"link","attrs":{"href":"https://jsonlines.org/"}}]}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Load the transformed content to GCS ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770"}},{"text":" ","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Load the content to BQ","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Tell our system to compute your content ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761"}},{"text":" ","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"In the upcoming sections, we will present sample cURL requests that, further on, can be translated into your own programming language of 
choice.","type":"text"}]},{"type":"panel","attrs":{"panelType":"note"},"content":[{"type":"paragraph","content":[{"text":"For clients who use their own Google Cloud Platform project for storage of the documents, the documented rules/naming patterns of the BigQuery resources must be used ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Using-private-GCP-resources-for-DI"}},{"text":" . In such scenarios, only the step #4 is of interest.","type":"text"}]}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"1. Transform your data source ","type":"text"}]},{"type":"paragraph","content":[{"text":"The Boxalino Data Structure is publicly available in our git repository: ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://github.com/boxalino/data-integration-doc-schema"}},{"text":" ","type":"text"}]},{"type":"paragraph","content":[{"text":"You can use the repository to identify the data formats & data elements expected for each document.","type":"text"}]},{"type":"orderedList","attrs":{"order":1},"content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"You can test that your JSONL is valid by doing a test load in your own GCP project ","type":"text"},{"text":"https://github.com/boxalino/data-integration-doc-schema#are-you-an-integrator","type":"text","marks":[{"type":"link","attrs":{"href":"https://github.com/boxalino/data-integration-doc-schema#are-you-an-integrator"}}]}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"You can test that your JSONL is valid by using the ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://github.com/boxalino/data-integration-doc-schema/blob/master/schema/generator.html"}},{"text":" (guidelines in the repository README.md)","type":"text"}]}]}]},{"type":"panel","attrs":{"panelType":"note"},"content":[{"type":"paragraph","content":[{"text":"For certain headless CMS, Boxalino has designed 
a Transformer service ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/662536193"}},{"text":" ","type":"text"}]}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"2. Loading content to GCS and BigQuery","type":"text"}]},{"type":"paragraph","content":[{"text":"There are 2 available flows, based on the size of your data:","type":"text"}]},{"type":"orderedList","attrs":{"order":1},"content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"The content is exported as the ","type":"text"},{"text":"body","type":"text","marks":[{"type":"code"}]},{"text":" of your ","type":"text"},{"text":"POST","type":"text","marks":[{"type":"code"}]},{"text":" request","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"The content is exported with the help of a public GCS Signed URL (","type":"text"},{"type":"inlineCard","attrs":{"url":"https://cloud.google.com/storage/docs/access-control/signed-urls"}},{"text":" )","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Option #1 is recommended for data volume ","type":"text"},{"text":"less than 32MB.","type":"text","marks":[{"type":"strong"}]}]},{"type":"paragraph","content":[{"text":"Option #2 is allowed for any data size.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"A. 
Loading content less than 32 MB","type":"text"}]},{"type":"paragraph","content":[{"text":"For this use-case, there is ","type":"text"},{"text":"a single request","type":"text","marks":[{"type":"strong"}]},{"text":" to be made: ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Request-Definition"}},{"text":" ","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"dev: true|false\" \\\n -H \"tm: YYYYmmddHHiiss\" \\\n -H \"type: product|content|order|user|communication_history|communication_planning|user_generated_content\" \\\n -H \"mode: F|D|I\" \\\n -H \"doc: product|language|attribute_value|attribute|order|user|communication_history|communication_planning|user_generated_content\" \\\n -d \"\" \\\n -H \"Authorization: Basic \" ","type":"text"}]},{"type":"paragraph","content":[{"text":"For example, the request below would create a ","type":"text"},{"text":"doc_language_F_20230301161554.json","type":"text","marks":[{"type":"code"}]},{"text":" in your client's GCS bucket. 
This document is required for the ","type":"text"},{"text":"type:product","type":"text","marks":[{"type":"code"}]},{"text":" integration.","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 30 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"tm: 20230301161554\" \\\n -H \"type: product\" \\\n -H \"mode: F\" \\\n -H \"doc: language\" \\\n -d \"{\\\"language\\\":\\\"en\\\",\\\"country_code\\\":\\\"en-GB\\\",\\\"creation_tm\\\":\\\"2023-03-01 16:15:54\\\",\\\"client_id\\\":0,\\\"src_sys_id\\\":0}\\n{\\\"language\\\":\\\"de\\\",\\\"country_code\\\":\\\"de-CH\\\",\\\"creation_tm\\\":\\\"2023-03-01 16:15:54\\\",\\\"client_id\\\":0,\\\"src_sys_id\\\":0}\" \\\n -H \"Authorization: Basic \" ","type":"text"}]},{"type":"panel","attrs":{"panelType":"warning"},"content":[{"type":"paragraph","content":[{"text":"The same ","type":"text"},{"text":"tm","type":"text","marks":[{"type":"code"}]},{"text":" value must be used across all your other requests. It is the timestamp that identifies your computation process.","type":"text"}]}]},{"type":"panel","attrs":{"panelType":"success"},"content":[{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"/load","type":"text","marks":[{"type":"code"}]},{"text":" endpoint also loads the content to BigQuery.","type":"text"}]}]},{"type":"panel","attrs":{"panelType":"error"},"content":[{"type":"paragraph","content":[{"text":"If the service responds with an error such as ","type":"text"},{"text":"413 Request Entity Too Large","type":"text","marks":[{"type":"code"}]},{"text":", please use the second flow (B).","type":"text"}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"B. 
Loading undefined data size","type":"text"}]},{"type":"panel","attrs":{"panelType":"note"},"content":[{"type":"paragraph","content":[{"text":"This flow is also described on another page ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/415432770/Load+Request#Load-in-Batches-%2F-data-%3E-32-MB"}},{"text":" ","type":"text"}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"1. Make a request for public upload link","type":"text"}]},{"type":"paragraph","content":[{"text":"This is the generic POST request:","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"dev: true|false\" \\\n -H \"tm: YYYYmmddHHiiss\" \\\n -H \"type: product|content|order|user|communication_history|communication_planning|user_generated_content\" \\\n -H \"mode: F|D|I\" \\\n -H \"chunk: \" \\\n -H \"doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content\" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"paragraph","content":[{"text":"The response will be an upload link that can only be used to create the document ","type":"text"},{"text":"doc___-.json","type":"text","marks":[{"type":"code"}]},{"text":" in the client's GCS bucket. The link is valid for 30 minutes.","type":"text"}]},{"type":"paragraph","content":[{"text":"Let's say we need a public link to upload the product JSONL. 
The following request can be used to generate the link:","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/chunk\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"tm: 20230301161554\" \\\n -H \"type: product\" \\\n -H \"mode: F\" \\\n -H \"chunk: 1\" \\\n -H \"doc: product\" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"2. Upload the content on the public link","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 0 \\\n -X PUT \\\n -H \"Content-Type: application/octet-stream\" \\\n -d \"\"","type":"text"}]},{"type":"panel","attrs":{"panelType":"note"},"content":[{"type":"paragraph","content":[{"text":"Read more about Google Cloud Signed URLs ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://cloud.google.com/storage/docs/access-control/signed-urls"}},{"text":" (response samples, uses, etc.)","type":"text"}]}]},{"type":"panel","attrs":{"panelIconId":"1f44c","panelIcon":":ok_hand:","panelIconText":"👌","panelColor":"#FFEBE6","panelType":"custom"},"content":[{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"chunk","type":"text","marks":[{"type":"code"}]},{"text":" header is required if the data is exported in batches. ","type":"text"},{"type":"hardBreak"},{"text":"Repeat steps 1+2 for every data batch loaded in GCS. 
","type":"text"},{"type":"hardBreak"},{"text":"Make sure to increment the value of the ","type":"text","marks":[{"type":"textColor","attrs":{"color":"#ff5630"}},{"type":"strong"}]},{"text":"chunk","type":"text","marks":[{"type":"code"}]},{"text":" ","type":"text","marks":[{"type":"textColor","attrs":{"color":"#ff5630"}}]},{"text":"property for each ","type":"text","marks":[{"type":"textColor","attrs":{"color":"#ff5630"}},{"type":"strong"}]},{"text":"/load/chunk","type":"text","marks":[{"type":"code"}]},{"text":" request.","type":"text","marks":[{"type":"textColor","attrs":{"color":"#ff5630"}},{"type":"strong"}]},{"type":"hardBreak"},{"type":"hardBreak"},{"text":"Only after the full content is available in GCS, you can move on to step#3. ","type":"text"}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"3. Loading content to BigQuery","type":"text"}]},{"type":"paragraph","content":[{"text":"Once the content was created in GCS, it is time to import it in BigQuery.","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/bq\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"dev: true|false\" \\\n -H \"tm: YYYYmmddHHiiss\" \\\n -H \"type: product|content|order|user|communication_history|communication_planning|user_generated_content\" \\\n -H \"mode: F|D|I\" \\\n -H \"doc: doc_product|doc_language|doc_attribute_value|doc_attribute|doc_order|doc_user|communication_history|communication_planning|doc_user_generated_content\" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"paragraph","content":[{"text":"After the request #2, the file doc_product_F_20230301161554.json is available to load in the BQ table ","type":"text"},{"text":"_F.doc_product_F_20230301161554","type":"text","marks":[{"type":"code"}]},{"text":" :","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 
\"https://boxalino-di-stage-krceabfwya-ew.a.run.app/load/bq\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"tm: 20230301161554\" \\\n -H \"type: product\" \\\n -H \"mode: F\" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"panel","attrs":{"panelType":"success"},"content":[{"type":"paragraph","content":[{"text":"After all required documents (doc) for the given ","type":"text"},{"text":"type","type":"text","marks":[{"type":"code"}]},{"text":" data sync (ex: product, order, etc) have been made available in BigQuery, the computation request can be called for.","type":"text"}]}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"3. How to register a compute request","type":"text"}]},{"type":"panel","attrs":{"panelIconId":"26a0","panelIcon":":warning:","panelIconText":"⚠️","panelColor":"#FF8F73","panelType":"custom"},"content":[{"type":"paragraph","content":[{"text":"This step is required. With this step, the data exported is being computed and made available throughout the Boxalino Data Warehouse Ecosystem for future uses.","type":"text"}]}]},{"type":"paragraph","content":[{"text":"For this step, there is ","type":"text"},{"text":"a single request","type":"text","marks":[{"type":"strong"}]},{"text":" to be made: ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761"}},{"text":" ","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/sync\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"dev: true|false\" \\\n -H \"tm: YYYYmmddHHiiss\" \\\n -H \"type: product|content|order|user|communication_history|communication_planning|user_generated_content\" \\\n -H \"mode: F|D|I\" \\\n -H \"project: \" \\\n -H \"dataset: \" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"paragraph","content":[{"text":"For the 
test scenario above (product data synchronization), the following SYNC request can be made once the documents have been loaded to BQ:","type":"text"}]},{"type":"codeBlock","content":[{"text":"curl --connect-timeout 60 --max-time 300 \"https://boxalino-di-stage-krceabfwya-ew.a.run.app/sync\" \\\n -X POST \\\n -H \"Content-Type: application/json\" \\\n -H \"client: \" \\\n -H \"tm: 20230301161554\" \\\n -H \"type: product\" \\\n -H \"mode: F\" \\\n -H \"Authorization: Basic \"","type":"text"}]},{"type":"panel","attrs":{"panelType":"info"},"content":[{"type":"paragraph","content":[{"text":"If your client uses its own private GCP project & resources, please include the ","type":"text"},{"text":"project","type":"text","marks":[{"type":"code"}]},{"text":" and ","type":"text"},{"text":"dataset","type":"text","marks":[{"type":"code"}]},{"text":" headers. For more options, always review the ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/394559761/Sync+Request#Request-Definition"}},{"text":" ","type":"text"}]}]},{"type":"panel","attrs":{"panelType":"success"},"content":[{"type":"paragraph","content":[{"text":"After the SYNC request is made, the data is computed and updated in the relevant feeds (data index, real-time injections, reports, etc.)","type":"text"}]}]},{"type":"panel","attrs":{"panelIconId":"1f913","panelIcon":":nerd:","panelIconText":"🤓","panelColor":"#FFFAE6","panelType":"custom"},"content":[{"type":"paragraph","content":[{"text":"We encourage you to have a stable fallback & retry policy.","type":"text"},{"type":"hardBreak"},{"text":"We also recommend reviewing the status of your requests as they happen: ","type":"text"},{"type":"inlineCard","attrs":{"url":"https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/747208705"}},{"text":" 
","type":"text"}]}]},{"type":"panel","attrs":{"panelIconId":"1f9d0","panelIcon":":face_with_monocle:","panelIconText":"🧐","panelColor":"#FFBDAD","panelType":"custom"},"content":[{"type":"paragraph","content":[{"text":"In the technical samples, the ","type":"text"},{"text":"stage","type":"text","marks":[{"type":"code"}]},{"text":" endpoint ","type":"text"},{"text":"https://boxalino-di-stage-krceabfwya-ew.a.run.app","type":"text","marks":[{"type":"code"}]},{"text":" was used (for testing purposes). Once your integration flow is ready for production, make sure to use the adequate endpoints:","type":"text"}]},{"type":"orderedList","attrs":{"order":1},"content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"For ","type":"text"},{"text":"mode: F","type":"text","marks":[{"type":"code"}]},{"text":" (full) data pushes: ","type":"text"},{"text":"https://boxalino-di-full-krceabfwya-ew.a.run.app","type":"text","marks":[{"type":"code"}]}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"For ","type":"text"},{"text":"mode: D","type":"text","marks":[{"type":"code"}]},{"text":" (delta) data pushes: ","type":"text"},{"text":"https://boxalino-di-delta-krceabfwya-ew.a.run.app","type":"text","marks":[{"type":"code"}]}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"For ","type":"text"},{"text":"mode: I","type":"text","marks":[{"type":"code"}]},{"text":" (instant) data pushes: ","type":"text"},{"text":"https://boxalino-di-instant-krceabfwya-ew.a.run.app","type":"text","marks":[{"type":"code"}]}]}]}]}]}],"version":1}
