The purpose of the GCP Deployment Request is to give our client's Data Science team access to Boxalino datasets, so that they can run Jupyter notebook processes in purpose-built Anaconda environments.

Environment Details

On a deployed application, the following stack is available:

Python 3.7
git
Anaconda3
pip / pip3
setuptools
papermill
jupyter
google-api-python-client & google SDK libraries

Integration steps

1. Make a GCP Project Deployment Request with the Required Information. Boxalino will provide the project shortly afterwards. Your user email (as the requestor) will be given the editor role.
2. Prepare the Required Files (project structure) and upload them to a GCS bucket in the project.
3. Prepare the content for the Application Launch.
4. Launch the application.

Tip
The application is launched on a VM in the project, where the commands from commands.txt are executed. You can also SSH into the VM to check or update its content.

Further tools will be provided which you can use to update the code running in the VM.

Integration Access

Because the application is launched in the scope of the project, the following Google Cloud tools can be used:

Compute Engine, to launch more applications: https://console.cloud.google.com/compute/instances
the project's BigQuery dataset, to store results if required: https://console.cloud.google.com/bigquery
Google Cloud Storage (GCS), to load files and store logs: https://console.cloud.google.com/storage/browser
Cloud Scheduler, to create events for automatic runs: https://console.cloud.google.com/cloudscheduler/start

BigQuery Datasets Access

Additionally, the project's Compute Engine Default Service Account will have the following permissions on the client's datasets in the Boxalino scope:

BigQuery Data Editor : <client>_lab, <client>_views
BigQuery Data Viewer : <client>_core, <client>_stage, <client>_reports, <client>_intelligence
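
As a quick reference, the access list above can be expressed as a lookup. This is only a sketch: the helper names and the client name "acme" are hypothetical, and the roles are copied from the list above rather than read from GCP.

```python
# Role granted to the Compute Engine default service account per dataset suffix,
# copied from the list above (not queried from GCP).
DATASET_ROLES = {
    "lab": "BigQuery Data Editor",
    "views": "BigQuery Data Editor",
    "core": "BigQuery Data Viewer",
    "stage": "BigQuery Data Viewer",
    "reports": "BigQuery Data Viewer",
    "intelligence": "BigQuery Data Viewer",
}


def dataset_access(client):
    """Map each <client>_<suffix> dataset name to the role granted on it."""
    return {f"{client}_{suffix}": role for suffix, role in DATASET_ROLES.items()}


def writable_datasets(client):
    """Datasets where the application can write results (Data Editor role)."""
    access = dataset_access(client)
    return sorted(name for name, role in access.items() if role.endswith("Editor"))


if __name__ == "__main__":
    # "acme" is a made-up client name used for illustration.
    print(writable_datasets("acme"))
```

In this reading, results should be written to <client>_lab or <client>_views, while the other datasets are read-only.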

Required Information

When contacting Boxalino with a GCP project deployment request, please provide the following information:

1	project name	as it will appear in your project list; naming requirements: spaces, - and _ are allowed
2	email	the requestor is the one managing the applications running on the project; this email will receive messages (alerts and notifications) when the project is ready to be used; the email for VM / application run alerts is set in the instance.txt file, separately for every application launch
3	client name	(also known as the Boxalino account name) this ensures access to the views, core & reports datasets (see BigQuery Datasets Access: https://boxalino.atlassian.net/wiki/spaces/BPKB/pages/303792129/GCP+Project+Deployment#BigQuery-Datasets-Access)
4	labels	optional; the labels are used as project meta-information (see Labels)
5	permissions	optional; by default, the requestor will have full access and can further share with others (see Permissions)

Once the project is created (2-3 min), the requestor will have access to it, in their Google Cloud Console.

Tip

As an editor on the project, the requestor will be able to:

access the project from Google Console https://console.cloud.google.com/
share access to the project with other members in the IAM Admin panel
create GCS buckets and load different applications, which can be triggered

Labels (optional)

Labels are key-value pairs meant to better organize the projects.

Imagine a scenario in which the client has multiple data scientists working on different projects. Labeling the projects keeps them organized.

Code Block
team:data-science
component:<main-application-name>
environment:production

More information on labels: https://cloud.google.com/resource-manager/docs/creating-managing-labels
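
Before sending labels with the request, it can help to check them locally. The sketch below parses whitespace-separated key:value pairs as in the sample above. The validation rules are an assumption based on Google's published label requirements, restricted to ASCII (keys start with a lowercase letter; keys and values use lowercase letters, digits, _ and -, up to 63 characters). The function name is hypothetical.

```python
import re

# Assumed ASCII subset of Google's label rules; international characters,
# which Google also allows, are rejected by this simplified check.
KEY_RE = re.compile(r"^[a-z][a-z0-9_-]{0,62}$")
VALUE_RE = re.compile(r"^[a-z0-9_-]{0,63}$")


def parse_labels(text):
    """Parse whitespace-separated key:value labels into a dict, validating each."""
    labels = {}
    for item in text.split():
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in label {item!r}")
        if not KEY_RE.match(key) or not VALUE_RE.match(value):
            raise ValueError(f"invalid label {item!r}")
        labels[key] = value
    return labels


if __name__ == "__main__":
    print(parse_labels("team:data-science environment:production"))
```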

Permissions (optional)

The permissions are added when the project is created.

By default, the requestor's email has the project editor role.
Once the project is released, the requestor can add more emails / users to the IAM policies of the project.

Here is a sample of provided permissions (optional):

Code Block

user:dana@boxalino.com:roles/editor
user:dana@boxalino.com:roles/resourcemanager.projectIamAdmin
user:dana@boxalino.com:roles/compute.osLogin
user:dana@boxalino.com:roles/compute.osAdminLogin
user:dana@boxalino.com:roles/bigquery.dataOwner
serviceAccount:service-account-from-other-projects:roles/iam.serviceAccountUser
serviceAccount:service-account-from-other-projects:roles/bigquery.dataOwner
serviceAccount:service-account-from-other-projects:roles/bigquery.dataEditor

More information on permissions: https://cloud.google.com/iam/docs/understanding-roles
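
To see how the sample lines relate to an IAM policy, the sketch below groups member:role lines into the role/members binding structure that IAM policies use. How Boxalino actually processes the lines is not documented here, so treat the function (a hypothetical helper) as an illustration of the line format only.

```python
from collections import OrderedDict


def to_bindings(lines):
    """Group '<member-type>:<member-id>:<role>' lines into IAM-style bindings."""
    grouped = OrderedDict()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # The role is everything after the last ':'; the member keeps its type prefix.
        member, sep, role = line.rpartition(":")
        if not sep or ":" not in member or not role.startswith("roles/"):
            raise ValueError(f"unexpected permission line: {line!r}")
        grouped.setdefault(role, []).append(member)
    return [{"role": role, "members": members} for role, members in grouped.items()]


if __name__ == "__main__":
    sample = [
        "user:dana@boxalino.com:roles/editor",
        "user:dana@boxalino.com:roles/bigquery.dataOwner",
    ]
    for binding in to_bindings(sample):
        print(binding)
```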

Required Files

1	instance.txt	properties of the VM (name, size, root path, etc.) (see instance.txt)
2	requirements.txt	environment requirements (for pip/anaconda install) (see requirements.txt); `pip freeze > requirements.txt` creates the file from a tested environment
3	commands.txt	a list of commands to be executed as part of your application run process (see commands.txt); can be left empty if you choose to SSH into the VM and run your own processes from the project's scope
4	env.yml	(optional) anaconda environment file; if no file is provided, the environment is not created
5	your jupyter/python/application files	the content of your application (Python files, Jupyter notebooks, etc.)

Note
The #1-#4 files must be named as instructed. They are used as a base for the application start-up.

Note
instance.txt, requirements.txt and commands.txt must end with an empty line (they are parsed dynamically)

Info

These files (and other required structures) must be uploaded in a GCS bucket.

The GCS bucket name is provided for the Application Launch Request
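
The naming and trailing-line requirements above can be checked locally before uploading. This is a minimal sketch with a hypothetical helper name; it assumes that "end with an empty line" means the file's last character is a newline, and it lets empty files pass because commands.txt may be left empty.

```python
import os

# env.yml is optional, so only the three .txt files are checked.
REQUIRED = ("instance.txt", "requirements.txt", "commands.txt")


def check_project_dir(path):
    """Return a list of problems found in a local project directory."""
    problems = []
    for name in REQUIRED:
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            problems.append(f"{name}: missing")
            continue
        with open(full, "rb") as handle:
            data = handle.read()
        # Empty files pass; otherwise the last byte must be a newline.
        if data and not data.endswith(b"\n"):
            problems.append(f"{name}: does not end with a newline")
    return problems


if __name__ == "__main__":
    for problem in check_project_dir("."):
        print(problem)
```

An empty result means the directory passes these checks; it does not validate the content of the files.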

instance.txt

#	Property	Default	Required	Description
1	instance-name	project name	yes	the instance name is the VM name as it appears in the Compute Engine view
2	machine-type	e2-micro	yes	the value depends on what the application needs (more CPU or more RAM); for options, please check the Google Cloud documentation
3	email-to		yes	the email is used once, to receive an alert when the VM is ready
4	home	\/home\/project-name	no	the path on the server where the content of the GCS bucket is uploaded; it is also used by the commands from commands.txt to launch/trigger your application. Alternatives: \/home\/<your-gcs-bucket>, \/srv\/app. When you SSH into the machine (e.g. your email is data-science-guru@boxalino-client.com), the VM creates a directory /home/data-science-guru (the default on any server), so this is your local path.
5	image-family	ubuntu-2004-lts	no	
6	boot-disk-size	30	no	
7	zone	europe-west1-b	no	this property can be left empty

Note
use a zone which is in Europe.

Code Block
instance-name:application-name
machine-type:e2-micro
email-to:data-science-guru@boxalino-client.com
home:\/home\/project-name
image-family:ubuntu-2004-lts
boot-disk-size:30
zone:europe-west1-b
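
To sanity-check an instance.txt before uploading, the properties can be parsed and merged with the defaults from the table. The sketch below assumes one key:value property per line and that an empty value falls back to the default; the real parser used at launch is not documented here and may behave differently. The function name is hypothetical.

```python
# Defaults as listed in the properties table (values kept as strings).
DEFAULTS = {
    "machine-type": "e2-micro",
    "home": r"\/home\/project-name",
    "image-family": "ubuntu-2004-lts",
    "boot-disk-size": "30",
    "zone": "europe-west1-b",
}
REQUIRED = ("instance-name", "machine-type", "email-to")


def parse_instance(text):
    """Parse key:value lines, check required keys, then fill in defaults."""
    parsed = {}
    for line in text.splitlines():
        # Split on the first ':' only, so values may contain further colons.
        key, sep, value = line.strip().partition(":")
        if sep and value:
            parsed[key] = value
    missing = [key for key in REQUIRED if key not in parsed]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return {**DEFAULTS, **parsed}


if __name__ == "__main__":
    sample = "instance-name:application-name\nmachine-type:e2-micro\nemail-to:user@example.com\nzone:\n"
    print(parse_instance(sample))
```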

requirements.txt

This is a typical requirements.txt file for a jupyter/python application.

Info
If your setup has been tested locally, you can save all the packages to the file with the command `pip freeze > requirements.txt`. Keep in mind that, in this case, requirements.txt will list every package installed in the virtual environment, regardless of where it came from.

The packages will be installed in the conda environment.

Code Block
google-api-core==1.20.0
google-api-python-client==1.9.3
google-auth==1.17.2
google-auth-httplib2==0.0.3
[..]
google-cloud-core==1.3.0
google-cloud-storage==1.29.0
google-pasta==0.2.0
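
If the file is written by hand rather than with pip freeze, it may be worth checking that every package is pinned to an exact version, as in the sample above. This is a small sketch with a hypothetical helper name; pinning is a recommendation inferred from the sample, not a documented requirement of the launcher.

```python
def unpinned(requirements_text):
    """Return the requirement lines that do not pin an exact version with '=='."""
    loose = []
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line and "==" not in line:
            loose.append(line)
    return loose


if __name__ == "__main__":
    sample = "google-auth==1.17.2\npapermill\n# local tooling\n"
    print(unpinned(sample))
```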

commands.txt

You can include the commands to run the jupyter process on application launch.

You can also include any other commands you used when testing the application in your local environment.

Code Block
chmod -R 777 <home value from instance.txt>/*
papermill <home value from instance.txt>/process.ipynb <home value from instance.txt>/process-output.ipynb
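
The placeholder in the sample has to be replaced by hand with the home value from instance.txt. As an illustration, the sketch below generates the two sample lines for a given path. The function name and its parameters are hypothetical, and the path is assumed to be given unescaped (for example /home/project-name).

```python
import shlex


def render_commands(home, notebook="process.ipynb", output="process-output.ipynb"):
    """Fill the home placeholder of the two sample commands.txt lines."""
    source = shlex.quote(f"{home}/{notebook}")
    target = shlex.quote(f"{home}/{output}")
    return [
        # The glob stays outside the quoted path so the shell can expand it.
        f"chmod -R 777 {shlex.quote(home)}/*",
        f"papermill {source} {target}",
    ]


if __name__ == "__main__":
    # Print the file content; the final print adds the required trailing newline.
    print("\n".join(render_commands("/home/project-name")))
```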

env.yml

The env.yml file is used in the setup step in order to create the anaconda environment.

Code Block

name: gcp-application-name
channels:
  - defaults
dependencies:
  - ca-certificates=2020.1.1=0
  - <a list of dependencies>
  - pip:
      - google-api-core==1.22.2
      - google-api-python-client==1.9.3
      - google-auth==1.17.2
      - <more-libraries required for the application>
prefix: /opt/conda/envs/gcp-application-env

Application Launch

Note
Before launching the application, make sure that the Required Files are uploaded in a GCS bucket.

To launch the application, complete the form in the Application Launch service https://gcp-deploy-du3do2ydza-ew.a.run.app/application.

Provide the following information:

1	project ID	the project ID is unique; it is displayed on the dashboard of your project https://console.cloud.google.com/home/dashboard
2	GCS bucket name	the name of the bucket where the Required Files are located (ex: gs://<project-name>-<app-name>); the contents will be made available on the application as well. Note: the bucket must be located in Europe; use either EU (multi-region) or europe-west1 (single region). Note: the bucket name must be unique; for this reason, we recommend that every bucket name starts with your project name.
3	access code	as provided by Boxalino

Tip
Once the application has been launched, a script will initialize the environment and load all your content from the GCS bucket.
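
The bucket naming recommendation above can be sketched as a small helper that builds <project-name>-<app-name> and checks it against the basic Cloud Storage naming rules (3-63 characters; lowercase letters, digits, -, _ and .; starting and ending with a letter or digit). The helper name is hypothetical, turning spaces into dashes is an assumption, and the check is not exhaustive (for example, it ignores reserved prefixes).

```python
import re

# Basic Cloud Storage bucket naming rules, simplified.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def bucket_name(project, app):
    """Build '<project>-<app>' in lowercase, with whitespace turned into dashes."""
    name = "-".join(f"{project} {app}".lower().split())
    if not BUCKET_RE.match(name):
        raise ValueError(f"not a valid bucket name: {name!r}")
    return name


if __name__ == "__main__":
    # Project and application names are made up for illustration.
    print("gs://" + bucket_name("My Project", "forecast"))
```

Uniqueness across all of Cloud Storage can only be confirmed when the bucket is created.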

Checking out the application state

1. SSH on the application VM

Go to your project Compute Engine page: https://console.cloud.google.com/compute/instances

From there, you can SSH into the virtual machine to check the output or inspect the contents.

...

If you want to track the process live, as it happens, you can run the following command:

Code Block
tail -f -n 2000 /var/log/syslog

2. Monitoring tools from GCP

Go to your project Compute Engine page (https://console.cloud.google.com/compute/instances) and click on the running application instance. Switch to the Monitoring tab.

In the MONITORING view of your Application, you are able to track the resources available & consumed:

CPU utilization
Memory Utilization
Disk Space Utilization

...

3. GCP Logs Explorer

Google Cloud Platform provides a series of tools for monitoring the project resources.

One of these tools is the Logging Client.

Go to your Logs Explorer https://console.cloud.google.com/logs/query? and review the application logs.

Updating the application

Restart the application

Upon application restart (stop and start), the GCS bucket content is re-synchronized:

application files are synchronized with the content of the GCS bucket (from instance.txt)
requirements.txt is updated
commands.txt is run

Note
Restarting the application will not update your anaconda environment.

Note
Restarting the application will not update the instance properties (the ones defined in instance.txt) - unless you manually edit them.

Resynchronize the bucket

If you do not want to delete the application and re-run the application launch, you can re-synchronize the content of the GCS bucket with your application home directory:

Code Block
sudo gsutil rsync -r gs://<BUCKET>/ <APPLICATION-PATH>

Note

Replace <BUCKET> with your storage bucket name (where the application files have been loaded).

Replace <APPLICATION-PATH> with the path to your application (default: /home/project-name).
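
If the resynchronization is triggered from a Python script or notebook on the VM, the same command can be assembled and run with subprocess. This is a sketch under the assumption that gsutil and sudo are available on the VM, as the command above implies; the function names are hypothetical.

```python
import subprocess


def rsync_command(bucket, app_path, use_sudo=True):
    """Build the gsutil rsync command shown above as an argument list."""
    command = ["gsutil", "rsync", "-r", f"gs://{bucket}/", app_path]
    return ["sudo"] + command if use_sudo else command


def resync(bucket, app_path):
    """Run the synchronization; raises CalledProcessError if gsutil fails."""
    subprocess.run(rsync_command(bucket, app_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call resync() on the VM to execute it.
    print(" ".join(rsync_command("<BUCKET>", "/home/project-name")))
```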

Application Delete

If you no longer need the application, you can delete it from your Compute Engine view, or use the form provided by Boxalino: https://gcp-deploy-du3do2ydza-ew.a.run.app/instance

...

Version	Old Version 25	New Version 26
Changes made by	Dana Negrescu	Dana Negrescu
Saved on	Feb 08, 2021	Feb 08, 2021
