Setting up a database and API server

This page describes how to set up your own database and storage for the models, which puts you in full control of who has access to the experiment data. For the moment, Studio supports only Firebase (https://firebase.google.com/) as a database backend, and Firebase Storage, Google Cloud Storage (GCS), or Amazon S3 as storage backends.

Introduction

Firebase and Firebase Storage provide only fairly simple rules for controlling user access to experiments. In addition, GCS and S3 do not work directly with Firebase authentication tokens, so one cannot create access rules for the storage. To provide more rigorous rules, we employ an API server that proxies database requests and can enforce arbitrarily complex access rules. (At the moment the access rules are still very simple: anyone can read any experiment, and only the user who created an experiment can delete or overwrite it.) The API server should also make it possible to swap database backends (e.g., from Firebase to DynamoDB) seamlessly for users, without even updating their config files. Yet another reason to use the API server is that GCS and S3 are much cheaper than Firebase Storage for large amounts of data.
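The current access rules are simple enough to state as two predicates. The following is a minimal sketch, not Studio's actual server code: the `Experiment` class, its `owner` field, and the functions `can_read` and `can_modify` are hypothetical names chosen for illustration. It only encodes the two rules stated above (anyone reads, only the creator deletes or overwrites).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    key: str
    owner: str  # hypothetical field: id of the user who created the experiment


def can_read(user_id: Optional[str], experiment: Experiment) -> bool:
    # Rule 1: anyone can read any experiment.
    return True


def can_modify(user_id: Optional[str], experiment: Experiment) -> bool:
    # Rule 2: only the creator may delete or overwrite an experiment.
    # An unauthenticated request (user_id is None) is always refused.
    return user_id is not None and user_id == experiment.owner


exp = Experiment(key="exp1", owner="alice")
print(can_read("bob", exp), can_modify("alice", exp), can_modify("bob", exp))
```

Because these checks run on the API server rather than in Firebase rules, they can later be replaced by arbitrarily complex logic without any change on the client side.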

In general, the API server, database, and storage interact as follows:

  1. The API server has read/write access to the database and storage.
  2. When reading or writing experiment data, the user signs the HTTP request with a Firebase authentication token. The API server then validates that the user has permission to perform the operation, and either returns the experiment data (for the /api/get_experiment method) or writes the experiment data.
  3. Artifacts are read and written by communicating with storage directly, using signed URLs generated by the API server.
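On the client side, step 2 amounts to attaching the user's Firebase token to each request sent to the API server. The sketch below only builds the request and does not send it. The /api/get_experiment path comes from the text above; the `Authorization: Bearer` header, the JSON body shape `{"key": ...}`, and the helper name `build_get_experiment_request` are assumptions for illustration, not Studio's documented wire format.

```python
import json
import urllib.request


def build_get_experiment_request(server_url: str, token: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/get_experiment.

    Assumed format: the Firebase token travels in an Authorization: Bearer
    header and the experiment key is sent as a JSON body.
    """
    body = json.dumps({"key": key}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/get_experiment",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # the signature the server validates
        },
        method="POST",
    )


req = build_get_experiment_request("https://my-studio.appspot.com/", "<firebase-token>", "exp1")
print(req.get_method(), req.get_full_url())
```

Sending it would be `urllib.request.urlopen(req)`. Artifact uploads and downloads (step 3) would not go through this request at all; the client would use a signed URL returned by the server to talk to GCS or S3 directly.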

Detailed instructions for setting up the API server follow. We use Google App Engine (GAE), but the steps can be trivially adapted for Heroku or for running the API server on a dedicated instance.

Prerequisites

If deploying to Google App Engine, you'll need the Google Cloud SDK installed (https://cloud.google.com/sdk/downloads).

In what follows, "deployment machine" means either the local machine (when deploying to GAE) or the instance on which you plan to run the API server.

Deploying the API server

  1. Create a new Firebase project: go to https://firebase.google.com, sign in, click “Add project”, and specify the project name.

  2. Enable authentication using Google accounts (in the left-hand pane select Authentication, then the “Sign-in method” tab, click “Google”, and select “Enabled”).

  3. Go to the project settings (the small cogwheel next to “Overview” in the left-hand pane), tab “General”.

  4. Copy the Web API key and paste it into the apiKey field in the database section of studio/apiserver_config.yaml.

  5. Copy the project ID and paste it into the projectId field in the database section of the same file.

  6. Go to the “Service Accounts” tab and generate a new private key for the Firebase service account. This key is a JSON file that gives the API server admin access to the database. Save it on the deployment machine.

  7. Adjust the remaining entries of apiserver_config.yaml to your needs (e.g., storage type and bucket).

  8. On the deployment machine, in the folder studio/studio, run

    ./deploy_apiserver.sh gae
    

    for GAE and

    ./deploy_apiserver.sh local <port>
    

    when running on a dedicated instance, where <port> is the port on which the server will listen. When prompted, enter the path to the Firebase admin credentials JSON file generated in step 6.
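Before running the deploy script, it can save a failed deployment to check that steps 4 to 6 were completed consistently. The sketch below is a hypothetical helper (`check_setup` is not part of Studio). It assumes you have already parsed the database section of apiserver_config.yaml into a dict (for example with a YAML library) and read the key file from step 6 as a string. The apiKey and projectId names come from the steps above; the "type", "project_id", "private_key", and "client_email" fields are standard in Google service account key files.

```python
import json

# Fields found in a Google service account JSON key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_setup(database_section: dict, key_file_contents: str) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # Steps 4 and 5: apiKey and projectId must be filled in.
    for field in ("apiKey", "projectId"):
        if not database_section.get(field):
            problems.append(f"database.{field} is empty")
    # Step 6: the key file must be a JSON object.
    try:
        key = json.loads(key_file_contents)
    except json.JSONDecodeError:
        return problems + ["service account key is not valid JSON"]
    if not isinstance(key, dict):
        return problems + ["service account key is not a JSON object"]
    missing = [f for f in REQUIRED_KEY_FIELDS if f not in key]
    if missing:
        problems.append("key file is missing: " + ", ".join(missing))
    if key.get("type") != "service_account":
        problems.append('key file "type" is not "service_account"')
    # The key must belong to the same Firebase project as the config.
    if key.get("project_id") != database_section.get("projectId"):
        problems.append("key file project_id does not match database.projectId")
    return problems
```

An empty result means the config and the key file at least refer to the same project. It does not prove the key is still valid, so authentication errors can still surface at deploy time.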

Configuring studio to work with the API server

For clients to work with the API server, you’ll need to modify their config.yaml files as follows:

  1. Remove the storage section.
  2. In the database section, set type: http and serverUrl: <url of your deployed server>. When deploying to GAE, the URL has the format https://<project_name>.appspot.com. When deploying on a dedicated instance, don't forget to include the port.
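The two client-side changes can be expressed as a small transformation of an already parsed config. This is a minimal sketch with hypothetical helper names (`gae_server_url`, `instance_server_url`, `to_http_client_config`). It assumes the config has been loaded into a plain dict and that any other keys in the database section should be kept unchanged. The plain-http default for a dedicated instance is also an assumption; use https if the instance terminates TLS.

```python
def gae_server_url(project_name: str) -> str:
    # URL format for GAE deployments, as given in the text.
    return f"https://{project_name}.appspot.com"


def instance_server_url(host: str, port: int, scheme: str = "http") -> str:
    # Dedicated instance: the port must be part of the URL.
    # The "http" default is an assumption, not something the text specifies.
    return f"{scheme}://{host}:{port}"


def to_http_client_config(config: dict, server_url: str) -> dict:
    """Return a copy of a client config rewired to use the API server."""
    new_config = dict(config)
    new_config.pop("storage", None)  # step 1: remove the storage section
    database = dict(new_config.get("database", {}))
    database["type"] = "http"  # step 2: talk to the API server over HTTP
    database["serverUrl"] = server_url
    new_config["database"] = database
    return new_config


old = {"database": {"type": "FireBase", "apiKey": "k"}, "storage": {"type": "gcloud"}}
print(to_http_client_config(old, gae_server_url("my-studio")))
```

The input values ("FireBase", "gcloud") are placeholders for whatever the client config contained before. Only the removal of storage and the new type and serverUrl values follow from the steps above.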