GCP Project Deployment

The purpose of the GCP Project Deployment Request is to give our clients' Data Science teams access to Boxalino datasets, with the goal of running jupyter notebook processes in the designated anaconda environments.

Environment Details

On a deployed application, the following stack is available:

  1. Python 3.7

  2. git

  3. Anaconda3

  4. pip / pip3

  5. setuptools

  6. pyarrow

  7. papermill

  8. jupyter

  9. google-api-python-client & Google SDK libraries
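Once connected to the VM, you can verify the pre-installed stack; a minimal check, assuming the tools are on the default PATH:

    # Verify the pre-installed stack (illustrative; assumes default PATH)
    python3 --version      # Python 3.7.x
    git --version
    conda --version        # Anaconda3
    pip3 --version
    jupyter --version
    papermill --version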

Integration steps

  1. Make a GCP Project Deployment Request with the Required Information (see below). Shortly after, Boxalino will provide the project.

  2. Your user email (as the requestor) will be given the editor role.

  3. Prepare the Required Files (project structure) and upload them to a GCS bucket in the project.

  4. Provide Boxalino with the information for the Application Launch.

  5. The application is launched on a VM in the project and the commands from commands.txt are executed. Additionally, you can SSH into the VM to update or check content (see the example after this list).

Further tools will be provided that you can use to update the code running on the VM.
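For example, assuming the gcloud CLI is configured locally, SSH access to the VM could look as follows (the instance name, project and zone are placeholders; use the values from your deployment):

    # Hypothetical instance name, project and zone
    gcloud compute ssh my-app-vm \
        --project=my-gcp-project \
        --zone=europe-west1-b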

Integration Access

Because the application is launched in the scope of the project, the following Google Cloud tools can be used:

  1. Compute Engine, to launch more applications

  2. the project's BigQuery dataset, to store results (if required)

  3. Google Cloud Storage (GCS), to load files & store logs

  4. Cloud Scheduler, to create events (for automatic runs)
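As an illustration, a Cloud Scheduler job for a daily automatic run could be created as follows (the job name, schedule and target are hypothetical; the actual trigger mechanism depends on how your application is exposed):

    # Hypothetical example: trigger an HTTP endpoint every day at 06:00
    gcloud scheduler jobs create http daily-notebook-run \
        --project=my-gcp-project \
        --schedule="0 6 * * *" \
        --uri="https://example.com/run" \
        --http-method=POST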

Required Information

When contacting Boxalino with a GCP project deployment request, please provide the following information:

  1. project name: as it will appear in your project list

  2. email: the requestor manages the applications running on the project; this email will receive messages (alerts and notifications), e.g. when the project/application is ready to be used

  3. client name: also known as the Boxalino account name; this ensures that access to the views, core & reports datasets is shared

  4. labels: optional; labels are used as project meta-information (see Labels)

  5. permissions: optional; by default, the requestor will have full access (see Permissions)
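For orientation, GCP labels are lowercase key/value pairs; a hypothetical label set could look like:

    # Hypothetical labels (key=value pairs)
    team=data-science
    environment=production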

Once the project is created (2-3 minutes), the requestor will have access to it in their Google Cloud Console.

As an editor on the project, the requestor will be able to:

  • add new permissions from the IAM Admin panel

  • create GCS buckets
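For example, a bucket for the Required Files could be created with gsutil (the bucket name and location are placeholders):

    # Hypothetical bucket name and location
    gsutil mb -p my-gcp-project -l europe-west1 gs://my-gcp-project-app/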

Required Files

  1. instance.txt: properties for the VM (name, size, root path, etc.) (see Instance)

  2. env.yml: the anaconda environment file

  3. requirements.txt: environment requirements (for pip/anaconda install) (see Requirements)

  4. commands.txt: a list of commands to be executed as part of your application run process

  5. your jupyter files: the jupyter notebook(s) to run

Files #1-#4 must be named exactly as listed above; they are used as the base for the application run.

instance.txt, requirements.txt and commands.txt must end with an empty line (they are parsed dynamically).
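For orientation, here are minimal sketches of the standard-format files; all package names, versions and commands are illustrative, and instance.txt follows the Boxalino-specific format described on the Instance page.

env.yml (standard conda environment format):

    name: app-env
    channels:
      - defaults
    dependencies:
      - python=3.7
      - pip

requirements.txt (standard pip format, one package per line):

    pandas
    pyarrow
    papermill

commands.txt (hypothetical run command; the exact commands depend on your application):

    papermill your_notebook.ipynb output.ipynb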

Application Launch

Once the Required Files are uploaded to a GCS bucket in the project, contact Boxalino.
Provide the following information:

  1. project name

  2. GCS bucket path: the path where the Required Files are located (ex: gs://project-name/app); the contents will be made available on the application as well (see the upload example below)

  3. launch date: optional; projects can be scheduled for launch at a later date (tomorrow, etc.)
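For example, the Required Files could be uploaded with gsutil (the local folder and bucket path are placeholders):

    # Hypothetical local folder and bucket path
    gsutil cp -r ./app/* gs://project-name/app/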

BigQuery access

  1. As a data scientist, chances are that you have been provided with a Service Account (SA) to access the client's private projects.

  2. The application is run by the project's own Compute Engine Service Account (CE SA).
    Because the project is in the scope of Boxalino, it will have direct read access to the client's datasets.

In order for the launched application to be able to access the client's own project datasets, the CE SA has to "impersonate" the SA in the client's project; a sketch of this is shown below.

The output of the application can be stored directly in the deployed GCP project scope (BigQuery, GCS, etc.).
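A minimal sketch of such impersonation from Python, assuming the google-cloud-bigquery package is installed (e.g. via requirements.txt) and using a hypothetical target SA email and query:

    # Minimal sketch: query the client's data by impersonating their SA.
    # The target SA email and the query are hypothetical placeholders.
    import google.auth
    from google.auth import impersonated_credentials
    from google.cloud import bigquery

    # On the VM, the default credentials resolve to the CE SA
    source_credentials, project_id = google.auth.default()

    target_credentials = impersonated_credentials.Credentials(
        source_credentials=source_credentials,
        target_principal="client-sa@client-project.iam.gserviceaccount.com",
        target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )

    # Queries now run with the permissions of the client's SA
    client = bigquery.Client(project=project_id, credentials=target_credentials)
    rows = client.query("SELECT 1").result()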
