Lomas-server: CLI administration

This notebook showcases how a data owner can set up the server and make their data available to selected users. It walks through each of the required steps.

Start the server

Create a docker volume

The first step is to create a Docker volume for MongoDB, which will hold all the “admin” data of the server. Docker volumes are persistent storage spaces managed by Docker that can be mounted inside containers. Creating the volume must be done only once. The server itself uses bind mounts, so no volume needs to be created for it.

In a terminal, run docker volume create mongodata. The output should simply be mongodata.

Start server

The second step is to start the server. For this, the config file configs/example_config.yaml has to be adapted: the data owner must make sure to set the develop mode to False and to specify the database type and ports. For this notebook, we keep the defaults and use a MongoDB instance on port 27017. Note: if the configuration file is modified, the docker-compose file has to be adapted accordingly; this is out of scope for this notebook.

In a terminal, run docker compose up. This will start the server and the MongoDB database, each running in its own Docker container. It will also start a client session container for demonstration purposes and a Streamlit container (more on those later).

To check that all containers are indeed running, run docker ps. You should see a container for the server (lomas_server_dev), for the client (lomas_client_dev), for the Streamlit app (lomas_streamlit_dev) and one for the mongo database (mongodb).
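This check can also be scripted. The sketch below is purely illustrative (it is not part of the repository): it parses the output of docker ps --format '{{.Names}}' and reports which of the expected containers are missing, using the container names quoted above.

```python
# Expected container names, taken from the text above.
EXPECTED = {"lomas_server_dev", "lomas_client_dev", "lomas_streamlit_dev", "mongodb"}

def missing_containers(ps_output: str, expected=EXPECTED) -> set:
    """Given the output of `docker ps --format '{{.Names}}'`, return the
    expected container names that are not currently running."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return set(expected) - running

# Example: the Streamlit container is not up yet.
print(missing_containers("lomas_server_dev\nlomas_client_dev\nmongodb\n"))
# -> {'lomas_streamlit_dev'}
```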

Access the server to administer the MongoDB

To interact with MongoDB, we first need to access the server Docker container, from which we will run the commands. To do that from inside this Jupyter notebook, we use the Docker client library for Python. Let's first install it.

[ ]:
!pip install docker

We can now import the library, create the client allowing us to interact with Docker, and finally, access the server container.

[1]:
import docker
client = docker.DockerClient()
server_container = client.containers.get("lomas_server_dev")

To execute commands inside that Docker container, you can use the exec_run method, which returns an ExecResult object from which you can retrieve the output of the command, as in the following example:

[2]:
response = server_container.exec_run("ls")
print(response.output.decode('utf-8'))
__init__.py
__pycache__
admin_database
administration
app.py
constants.py
dataset_store
dp_queries
mongodb_admin.py
mongodb_admin_cli.py
private_dataset
tests
utils
uvicorn_serve.py

Now, you are ready to interact with the database and add users.

Prepare the database

Visualise all options

You can visualise all the options offered by the CLI by running the command python mongodb_admin_cli.py --help. We will go through each of them in the rest of the notebook.

We define a helper function run to get a cleaner output of the commands in the notebook.

[3]:
from ast import literal_eval

def run(command, to_dict=False):
    """Execute a command in the server container and clean up its output."""
    response = server_container.exec_run(command)
    # Normalise quotes so that the printed literal can be parsed below.
    output = response.output.decode('utf-8').replace("'", '"')
    # Strip the logging prefix (timestamp, file name, function) when present.
    if "] -" in output:
        output = output.split("] -")[1].strip()
    if to_dict and len(output):
        # Parse the printed Python literal into a dict/list and return it.
        return literal_eval(output)
    print(output)
[4]:
run("python mongodb_admin_cli.py --help")
usage: mongodb_admin_cli.py [-h]
                            {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,get_user,add_users_via_yaml,get_archives,get_users,get_user_datasets,add_dataset,add_datasets_via_yaml,del_dataset,get_dataset,get_metadata,get_datasets,drop_collection,get_collection}
                            ...

MongoDB administration script for the database

options:
  -h, --help            show this help message and exit

subcommands:
  {add_user,add_user_with_budget,del_user,add_dataset_to_user,del_dataset_to_user,set_budget_field,set_may_query,get_user,add_users_via_yaml,get_archives,get_users,get_user_datasets,add_dataset,add_datasets_via_yaml,del_dataset,get_dataset,get_metadata,get_datasets,drop_collection,get_collection}
                        user database administration operations
    add_user            add user to users collection
    add_user_with_budget
                        add user with budget to users collection
    del_user            delete user from users collection
    add_dataset_to_user
                        add dataset with initialized budget values for a user
    del_dataset_to_user
                        delete dataset for user in users collection
    set_budget_field    set budget field to given value for given user and
                        dataset
    set_may_query       set may query field to given value for given user
    get_user            show all metadata of user
    add_users_via_yaml  create users collection from yaml file
    get_archives        show all previous queries from a user
    get_users           get the list of all users in "users" collection
    get_user_datasets   get the list of all datasets from a user
    add_dataset         set in which database the dataset is stored
    add_datasets_via_yaml
                        create dataset to database type collection
    del_dataset         delete dataset and metadata from datasets and metadata
                        collection
    get_dataset         show a dataset from the dataset collection
    get_metadata        show metadata from the metadata collection
    get_datasets        get the list of all datasets in "datasets" collection
    drop_collection     delete collection from database
    get_collection      print a collection

And finally, let's delete all existing data from the database to start clean:

[5]:
run("python mongodb_admin_cli.py drop_collection --collection datasets")
run("python mongodb_admin_cli.py drop_collection --collection metadata")
run("python mongodb_admin_cli.py drop_collection --collection users")
Deleted collection datasets.
Deleted collection metadata.
Deleted collection users.

Datasets (add and drop)

We first need to set the dataset meta-information. For each dataset, two pieces of information are required:
- the type of database in which the dataset is stored
- a path to the metadata of the dataset (stored as a yaml file)

Metadata are required to later perform queries on the dataset. The server expects the metadata to follow the SmartnoiseSQL dictionary format, which describes, among other things, all the available columns, their types and their bounds (see the Smartnoise page for more details). The metadata is also expected to be stored in a yaml file.

This information (dataset name, database type and metadata path) is stored in the datasets collection. For each dataset, the metadata is then fetched from its yaml file and stored in a collection named metadata.
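To make the expected shapes concrete, here is a minimal sketch with invented sample values: a datasets entry mirroring the fields shown later in this notebook, and a few checks one might run on a SmartnoiseSQL-style metadata dictionary. These checks are illustrative only; they are not what the server actually performs.

```python
# Invented sample entry, mirroring the fields shown later in this notebook.
dataset_entry = {
    "dataset_name": "IRIS",
    "database_type": "PATH_DB",
    "dataset_path": "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv",
    "metadata": {"database_type": "PATH_DB",
                 "metadata_path": "../data/collections/metadata/iris_metadata.yaml"},
}

def check_metadata(meta: dict) -> list:
    """Return a list of problems found in a SmartnoiseSQL-style metadata dict
    (hypothetical sanity checks, not the server's own validation)."""
    problems = []
    for name, col in meta.get("columns", {}).items():
        if "type" not in col:
            problems.append(f"{name}: missing type")
        if col.get("type") == "float" and not ("lower" in col and "upper" in col):
            problems.append(f"{name}: float column without bounds")
    return problems

meta = {"max_ids": 1, "row_privacy": True,
        "columns": {"petal_length": {"type": "float", "lower": 0.5, "upper": 10.0},
                    "species": {"type": "string", "cardinality": 3}}}
print(check_metadata(meta))  # -> []
```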

We then check that there is indeed no data in the datasets and metadata collections yet:

[6]:
run("python mongodb_admin_cli.py get_collection --collection datasets")
[]
[7]:
run("python mongodb_admin_cli.py get_collection --collection metadata")
[]

We can add one dataset with its name, database type and path to its metadata file:

[8]:
run("python mongodb_admin_cli.py add_dataset -d IRIS -db PATH_DB -d_path https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv -m_db PATH_DB -mp ../data/collections/metadata/iris_metadata.yaml")
Added dataset IRIS with database PATH_DB and associated metadata.

We can now see the datasets and metadata collections populated with the Iris dataset:

[9]:
run("python mongodb_admin_cli.py get_collection --collection datasets", to_dict=True)
[9]:
[{'dataset_name': 'IRIS',
  'database_type': 'PATH_DB',
  'dataset_path': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv',
  'metadata': {'database_type': 'PATH_DB',
   'metadata_path': '../data/collections/metadata/iris_metadata.yaml'}}]
[10]:
run("python mongodb_admin_cli.py get_collection --collection metadata", to_dict=True)
[10]:
[{'IRIS': {'max_ids': 1,
   'row_privacy': True,
   'columns': {'petal_length': {'type': 'float', 'lower': 0.5, 'upper': 10.0},
    'petal_width': {'type': 'float', 'lower': 0.05, 'upper': 5.0},
    'sepal_length': {'type': 'float', 'lower': 2.0, 'upper': 10.0},
    'sepal_width': {'type': 'float', 'lower': 1.0, 'upper': 6.0},
    'species': {'type': 'string',
     'cardinality': 3,
     'categories': ['setosa', 'versicolor', 'virginica']}}}}]

Alternatively, a yaml file containing all this information can be used to add multiple datasets in one command:

[11]:
run("python mongodb_admin_cli.py add_datasets_via_yaml -yf ../data/collections/dataset_collection.yaml -c")
Cleaning done.

The argument -c or --clean allows you to clear the current dataset collection before adding your collection.

By default, add_datasets_via_yaml will only add the new datasets found in the provided collection.

[12]:
run("python mongodb_admin_cli.py add_datasets_via_yaml -yf ../data/collections/dataset_collection.yaml")
Metadata already exist. Use the command -om to overwrite with new values.

Arguments:

-od / --overwrite_datasets: overwrite the values of existing datasets with the values provided in the yaml.

-om / --overwrite_metadata: overwrite the values of existing metadata with the values provided in the yaml.

[13]:
# Add new datasets/metadata, update existing datasets
run("python mongodb_admin_cli.py add_datasets_via_yaml -yf ../data/collections/dataset_collection.yaml -od")
Existing datasets updated with new collection
[14]:
# Add new datasets/metadata, update existing metadata
run("python mongodb_admin_cli.py add_datasets_via_yaml -yf ../data/collections/dataset_collection.yaml -om")
Metadata updated for dataset : IRIS.
[15]:
# Add new datasets/metadata, update existing datasets & metadata
run("python mongodb_admin_cli.py add_datasets_via_yaml -yf ../data/collections/dataset_collection.yaml -od -om")
Existing datasets updated with new collection

Let's look at the full datasets collection:

[16]:
run("python mongodb_admin_cli.py get_collection --collection datasets", to_dict=True)
[16]:
[{'dataset_name': 'IRIS',
  'database_type': 'PATH_DB',
  'metadata': {'database_type': 'PATH_DB',
   'metadata_path': '../data/collections/metadata/iris_metadata.yaml'},
  'dataset_path': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'},
 {'dataset_name': 'PENGUIN',
  'database_type': 'PATH_DB',
  'metadata': {'database_type': 'PATH_DB',
   'metadata_path': '../data/collections/metadata/penguin_metadata.yaml'},
  'dataset_path': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv'},
 {'dataset_name': 'TITANIC',
  'database_type': 'S3_DB',
  'metadata': {'database_type': 'S3_DB',
   'bucket': 'example',
   'key': 'metadata/titanic_metadata.yaml',
   'endpoint_url': 'https://api-lomas-minio.lab.sspcloud.fr',
   'aws_access_key_id': 'admin',
   'aws_secret_access_key': 'admin123'},
  'bucket': 'example',
  'key': 'data/titanic.csv',
  'endpoint_url': 'https://api-lomas-minio.lab.sspcloud.fr',
  'aws_access_key_id': 'admin',
  'aws_secret_access_key': 'admin123'},
 {'dataset_name': 'FSO_INCOME_SYNTHETIC',
  'database_type': 'PATH_DB',
  'metadata': {'database_type': 'PATH_DB',
   'metadata_path': '../data/collections/metadata/fso_income_synthetic_metadata.yaml'},
  'dataset_path': '../data/datasets/income_synthetic_data.csv'}]

Finally let’s have a look at the stored metadata:

[17]:
run("python mongodb_admin_cli.py get_collection --collection metadata", to_dict=True)
[17]:
[{'IRIS': {'max_ids': 1,
   'row_privacy': True,
   'columns': {'petal_length': {'type': 'float', 'lower': 0.5, 'upper': 10.0},
    'petal_width': {'type': 'float', 'lower': 0.05, 'upper': 5.0},
    'sepal_length': {'type': 'float', 'lower': 2.0, 'upper': 10.0},
    'sepal_width': {'type': 'float', 'lower': 1.0, 'upper': 6.0},
    'species': {'type': 'string',
     'cardinality': 3,
     'categories': ['setosa', 'versicolor', 'virginica']}}}},
 {'PENGUIN': {'max_ids': 1,
   'row_privacy': True,
   'censor_dims': False,
   'columns': {'species': {'type': 'string',
     'cardinality': 3,
     'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
    'island': {'type': 'string',
     'cardinality': 3,
     'categories': ['Torgersen', 'Biscoe', 'Dream']},
    'bill_length_mm': {'type': 'float', 'lower': 30.0, 'upper': 65.0},
    'bill_depth_mm': {'type': 'float', 'lower': 13.0, 'upper': 23.0},
    'flipper_length_mm': {'type': 'float', 'lower': 150.0, 'upper': 250.0},
    'body_mass_g': {'type': 'float', 'lower': 2000.0, 'upper': 7000.0},
    'sex': {'type': 'string',
     'cardinality': 2,
     'categories': ['MALE', 'FEMALE']}}}},
 {'TITANIC': {'': {'Schema': {'Table': {'max_ids': 1,
      'PassengerId': {'type': 'int', 'lower': 1},
      'Pclass': {'type': 'int', 'lower': 1, 'upper': 3},
      'Name': {'type': 'string'},
      'Sex': {'type': 'string',
       'cardinality': 2,
       'categories': ['male', 'female']},
      'Age': {'type': 'float', 'lower': 0.1, 'upper': 100.0},
      'SibSp': {'type': 'int', 'lower': 0},
      'Parch': {'type': 'int', 'lower': 0},
      'Ticket': {'type': 'string'},
      'Fare': {'type': 'float', 'lower': 0.0},
      'Cabin': {'type': 'string'},
      'Embarked': {'type': 'string',
       'cardinality': 3,
       'categories': ['C', 'Q', 'S']},
      'Survived': {'type': 'boolean'},
      'row_privacy': True}}},
   'engine': 'csv'}},
 {'FSO_INCOME_SYNTHETIC': {'max_ids': 1,
   'columns': {'region': {'type': 'int'},
    'eco_branch': {'type': 'int'},
    'profession': {'type': 'int'},
    'education': {'type': 'int'},
    'age': {'type': 'int'},
    'sex': {'type': 'int'},
    'income': {'type': 'float', 'lower': 1000, 'upper': 100000}}}}]

If we are interested in a specific dataset, we can also show its entry:

[18]:
run("python mongodb_admin_cli.py get_dataset --dataset IRIS", to_dict=True)
[18]:
{'dataset_name': 'IRIS',
 'database_type': 'PATH_DB',
 'metadata': {'database_type': 'PATH_DB',
  'metadata_path': '../data/collections/metadata/iris_metadata.yaml'},
 'dataset_path': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'}

And its associated metadata:

[19]:
run("python mongodb_admin_cli.py get_metadata --dataset IRIS", to_dict=True)
[19]:
{'max_ids': 1,
 'row_privacy': True,
 'columns': {'petal_length': {'type': 'float', 'lower': 0.5, 'upper': 10.0},
  'petal_width': {'type': 'float', 'lower': 0.05, 'upper': 5.0},
  'sepal_length': {'type': 'float', 'lower': 2.0, 'upper': 10.0},
  'sepal_width': {'type': 'float', 'lower': 1.0, 'upper': 6.0},
  'species': {'type': 'string',
   'cardinality': 3,
   'categories': ['setosa', 'versicolor', 'virginica']}}}

We can also get the list of all datasets in the 'datasets' collection:

[20]:
run("python mongodb_admin_cli.py get_datasets", to_dict=True)
[20]:
['IRIS', 'PENGUIN', 'TITANIC', 'FSO_INCOME_SYNTHETIC']

Users

Add user

Let's see which users are already loaded:

[21]:
run("python mongodb_admin_cli.py get_collection --collection users")
[]

And now let's add a few users.

[22]:
run("python mongodb_admin_cli.py add_user_with_budget --user 'Mrs. Daisy' --dataset 'IRIS' --epsilon 10.0 --delta 0.001")
Added access to user Mrs. Daisy with dataset IRIS, budget epsilon 10.0 and delta 0.001.
[23]:
run("python mongodb_admin_cli.py add_user_with_budget --user 'Mr. Coldheart' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")
Added access to user Mr. Coldheart with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.
[24]:
run("python mongodb_admin_cli.py add_user_with_budget --user 'Lord McFreeze' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")
Added access to user Lord McFreeze with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.

All users must have different names; otherwise an error is raised and nothing is changed:

[25]:
run("python mongodb_admin_cli.py add_user_with_budget --user 'Lord McFreeze' --dataset 'IRIS' --epsilon 10.0 --delta 0.001")
Traceback (most recent call last):
  File "/code/mongodb_admin_cli.py", line 461, in <module>
    function_map[args.func.__name__](args)
  File "/code/mongodb_admin_cli.py", line 396, in <lambda>
    "add_user_with_budget": lambda args: add_user_with_budget(
                                         ^^^^^^^^^^^^^^^^^^^^^
  File "/code/mongodb_admin.py", line 50, in wrapper_decorator
    raise ValueError(
ValueError: User Lord McFreeze already exists in user collection

If you want to grant an existing user access to another dataset, simply use the add_dataset_to_user command.

[26]:
run("python mongodb_admin_cli.py add_dataset_to_user --user 'Lord McFreeze' --dataset 'IRIS' --epsilon 5.0 --delta 0.005")
Added access to dataset IRIS to user Lord McFreeze with budget epsilon 5.0 and delta 0.005.

Alternatively, you can create a user without any assigned dataset and then add datasets in separate commands.

[27]:
run("python mongodb_admin_cli.py add_user --user 'Madame Frostina'")
Added user Madame Frostina.

Let’s see the default parameters after the user creation:

[28]:
run("python mongodb_admin_cli.py get_user --user 'Madame Frostina'", to_dict=True)
[28]:
{'user_name': 'Madame Frostina', 'may_query': True, 'datasets_list': []}

Let’s give her access to a dataset with a budget:

[29]:
run("python mongodb_admin_cli.py add_dataset_to_user --user 'Madame Frostina' --dataset 'IRIS' --epsilon 5.0 --delta 0.005")
Added access to dataset IRIS to user Madame Frostina with budget epsilon 5.0 and delta 0.005.
[30]:
run("python mongodb_admin_cli.py add_dataset_to_user --user 'Madame Frostina' --dataset 'PENGUIN' --epsilon 5.0 --delta 0.005")
Added access to dataset PENGUIN to user Madame Frostina with budget epsilon 5.0 and delta 0.005.

Now let's look at Madame Frostina's details to check that all is in order:

[31]:
run("python mongodb_admin_cli.py get_user --user 'Madame Frostina'", to_dict=True)
[31]:
{'user_name': 'Madame Frostina',
 'may_query': True,
 'datasets_list': [{'dataset_name': 'IRIS',
   'initial_epsilon': 5.0,
   'initial_delta': 0.005,
   'total_spent_epsilon': 0.0,
   'total_spent_delta': 0.0},
  {'dataset_name': 'PENGUIN',
   'initial_epsilon': 5.0,
   'initial_delta': 0.005,
   'total_spent_epsilon': 0.0,
   'total_spent_delta': 0.0}]}
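The fields above encode a per-dataset privacy budget: what remains for a user is simply the initial value minus the total spent. A small illustrative helper (not part of the CLI) makes this explicit:

```python
def remaining_budget(user: dict, dataset: str) -> tuple:
    """Return (remaining epsilon, remaining delta) for one of the user's
    datasets, computed from the fields shown in the user document above."""
    for entry in user["datasets_list"]:
        if entry["dataset_name"] == dataset:
            return (entry["initial_epsilon"] - entry["total_spent_epsilon"],
                    entry["initial_delta"] - entry["total_spent_delta"])
    raise KeyError(f"{user['user_name']} has no access to {dataset}")

# Sample user document, shaped like the output above.
user = {"user_name": "Madame Frostina", "may_query": True,
        "datasets_list": [{"dataset_name": "IRIS", "initial_epsilon": 5.0,
                           "initial_delta": 0.005, "total_spent_epsilon": 0.0,
                           "total_spent_delta": 0.0}]}
print(remaining_budget(user, "IRIS"))  # -> (5.0, 0.005)
```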

We can also modify the total budget of an existing user:

[32]:
run("python mongodb_admin_cli.py add_user_with_budget --user 'Dr. Antartica' --dataset 'PENGUIN' --epsilon 10.0 --delta 0.001")
Added access to user Dr. Antartica with dataset PENGUIN, budget epsilon 10.0 and delta 0.001.
[33]:
run("python mongodb_admin_cli.py set_budget_field --user 'Dr. Antartica' --dataset 'PENGUIN' --field initial_epsilon --value 20.0")
Set budget of Dr. Antartica for dataset PENGUIN of initial_epsilon to 20.0.

Let’s see the current state of the database:

[34]:
run("python mongodb_admin_cli.py get_collection --collection users", to_dict=True)
[34]:
[{'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Mr. Coldheart',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Lord McFreeze',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Madame Frostina',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 20.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]}]

Do not hesitate to re-run this command after every other command to ensure that everything runs as expected.

Remove user

You have just heard that the penguin named Coldheart might have malicious intentions and decide to suspend his access until an investigation has been carried out. To ensure that he cannot make any more queries, run the following command:

[35]:
run("python mongodb_admin_cli.py set_may_query --user 'Mr. Coldheart' --value False")
Set user Mr. Coldheart may query to False.

Now he won't be able to make any queries (unless you re-run the command with --value True).

A few days later, the investigation reveals that he was aiming to do unethical research. You can revoke his dataset access with:

[36]:
run("python mongodb_admin_cli.py del_dataset_to_user --user 'Mr. Coldheart' --dataset 'PENGUIN'")
Remove access to dataset PENGUIN from user Mr. Coldheart.

Or delete him completely from the database:

[37]:
run("python mongodb_admin_cli.py del_user --user 'Mr. Coldheart'")
Deleted user Mr. Coldheart.

Let’s see the resulting users:

[38]:
run("python mongodb_admin_cli.py get_collection --collection users", to_dict=True)
[38]:
[{'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Lord McFreeze',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Madame Frostina',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 20.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]}]

Change budget

You also change your mind about the budget allocated to Lord McFreeze and give him a bit more on the penguin dataset.

[39]:
run("python mongodb_admin_cli.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' --field initial_epsilon --value 15.0")
Set budget of Lord McFreeze for dataset PENGUIN of initial_epsilon to 15.0.
[40]:
run("python mongodb_admin_cli.py set_budget_field --user 'Lord McFreeze' --dataset 'PENGUIN' --field initial_delta --value 0.005")
Set budget of Lord McFreeze for dataset PENGUIN of initial_delta to 0.005.

Let’s check all our changes by looking at the state of the database:

[41]:
run("python mongodb_admin_cli.py get_collection --collection users", to_dict=True)
[41]:
[{'user_name': 'Mrs. Daisy',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Lord McFreeze',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 15.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Madame Frostina',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 5.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 20.0,
    'initial_delta': 0.001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]}]

Finally, everything can be loaded directly from a file.

Let’s delete the existing user collection first:

[42]:
run("python mongodb_admin_cli.py drop_collection --collection users")
Deleted collection users.

It is now empty:

[43]:
run("python mongodb_admin_cli.py get_collection --collection users")
[]

We add the users from a yaml file:

[44]:
run("python mongodb_admin_cli.py add_users_via_yaml -yf ../data/collections/user_collection.yaml")
Added user data from yaml.

By default, add_users_via_yaml will only add new users to the database.

[45]:
run("python mongodb_admin_cli.py add_users_via_yaml -yf ../data/collections/user_collection.yaml")
No new users added, they already exist in the server

If you want to clean the current users collection and replace it, you can use the argument --clean.

[46]:
run("python mongodb_admin_cli.py add_users_via_yaml -yf ../data/collections/user_collection.yaml --clean")
Cleaning done.

If you want to add new users and update the existing ones in your collection, you can use the argument --overwrite. This adds users that do not yet exist and replaces the values of existing users with those from the provided collection.

[47]:
run("python mongodb_admin_cli.py add_users_via_yaml -yf ../data/collections/user_collection.yaml --overwrite")
Existing users updated.
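The add-only and overwrite modes boil down to a merge keyed on user_name. A minimal sketch of the semantics (an illustration, not the server's actual implementation):

```python
def merge_users(existing: list, incoming: list, overwrite: bool = False) -> list:
    """Add users from `incoming` that are new; optionally replace existing ones."""
    by_name = {u["user_name"]: u for u in existing}
    for user in incoming:
        if user["user_name"] not in by_name or overwrite:
            by_name[user["user_name"]] = user
    return list(by_name.values())

existing = [{"user_name": "Alice", "may_query": True}]
incoming = [{"user_name": "Alice", "may_query": False},
            {"user_name": "Bob", "may_query": True}]
print(merge_users(existing, incoming))        # Alice kept as-is, Bob added
print(merge_users(existing, incoming, True))  # Alice replaced, Bob added
```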

And let’s see the resulting collection:

[48]:
run("python mongodb_admin_cli.py get_collection --collection users", to_dict=True)
[48]:
[{'user_name': 'Alice',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0001,
    'total_spent_epsilon': 1.0,
    'total_spent_delta': 1e-06},
   {'dataset_name': 'PENGUIN',
    'initial_epsilon': 5.0,
    'initial_delta': 0.0005,
    'total_spent_epsilon': 0.2,
    'total_spent_delta': 1e-07}]},
 {'user_name': 'Dr. Antartica',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'PENGUIN',
    'initial_epsilon': 10.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Dr. FSO',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'FSO_INCOME_SYNTHETIC',
    'initial_epsilon': 45.0,
    'initial_delta': 0.005,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Bob',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'IRIS',
    'initial_epsilon': 10.0,
    'initial_delta': 0.0001,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]},
 {'user_name': 'Jack',
  'may_query': True,
  'datasets_list': [{'dataset_name': 'TITANIC',
    'initial_epsilon': 45.0,
    'initial_delta': 0.2,
    'total_spent_epsilon': 0.0,
    'total_spent_delta': 0.0}]}]

To get a list of all users in the ‘users’ collection:

[49]:
run("python mongodb_admin_cli.py get_users")
["Alice", "Dr. Antartica", "Dr. FSO", "Bob", "Jack"]

We can also get a list of all datasets allocated to a user:

[50]:
run("python mongodb_admin_cli.py get_user_datasets --user Alice")
["IRIS", "PENGUIN"]

Archives of queries

Finally, the get_archives subcommand shows all the queries previously made by a user:

[51]:
run("python mongodb_admin_cli.py get_archives --user Alice")
[]

Stop the server: do not do it now!

To tear down the service, first press Ctrl+C in the terminal where you ran docker compose up. Wait for the command to finish and then run docker compose down. This deletes all the containers, but the volume stays in place.