{ "cells": [ { "cell_type": "markdown", "id": "3f18d338", "metadata": {}, "source": [ "# Lomas Client Side: Using Smartnoise-Synth" ] }, { "cell_type": "markdown", "id": "1582a2ae", "metadata": {}, "source": [ "This notebook showcases how a researcher could use the Secure Data Disclosure system. It explains the different functionalities provided by the `lomas-client` library to interact with the secure server.\n", "\n", "The secure data is never visible to researchers. They can only access differentially private responses via queries to the server.\n", "\n", "Each user has access to one or more projects and, for each dataset, has a limited budget expressed in $\\epsilon$ and $\\delta$ values." ] }, { "cell_type": "markdown", "id": "01ae30d2", "metadata": {}, "source": [ "## Step 1: Install the library\n", "To interact with the secure server on which the data is stored, Dr. Antartica first needs to install the `lomas-client` library in her local development environment.\n", "\n", "It can be installed via the pip command:" ] }, { "cell_type": "code", "execution_count": 1, "id": "dc563050-fcc0-4c11-9e63-46eaefa63ce7", "metadata": {}, "outputs": [], "source": [ "# !pip install lomas_client" ] }, { "cell_type": "markdown", "id": "c5df0c8f-ca9c-4af1-8c60-fb1d30d6283d", "metadata": {}, "source": [ "Or use a local version of the client:" ] }, { "cell_type": "code", "execution_count": 2, "id": "36d508bf-6cc3-4034-8e11-fffe858552f9", "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "sys.path.append(os.path.abspath(os.path.join('..')))" ] }, { "cell_type": "code", "execution_count": 3, "id": "9535e92e-620e-4df4-92dd-4ea2c653e4ab", "metadata": {}, "outputs": [], "source": [ "from lomas_client import Client\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "9c63718b", "metadata": {}, "source": [ "## Step 2: Initialise the client\n", "\n", "Once the library is installed, a Client object must be created. 
It is responsible for sending requests to the server and processing responses in the local environment. It enables seamless interaction with the server.\n", "\n", "To create the client, Dr. Antartica needs to give it a few parameters:\n", "- a url: the root application endpoint of the remote secure server.\n", "- user_name: her name as registered in the database (Dr. Alice Antartica)\n", "- dataset_name: the name of the dataset that she wants to query (PENGUIN)\n", "\n", "She will only be able to query the real dataset if Queen Icergina has previously created an account for her in the database, given her access to the PENGUIN dataset, and granted her some epsilon and delta budget (as is done in the Admin Notebook for Users and Datasets management)." ] }, { "cell_type": "code", "execution_count": null, "id": "f4c18a1e", "metadata": {}, "outputs": [], "source": [ "DATASET_NAME = \"PENGUIN\"" ] }, { "cell_type": "code", "execution_count": null, "id": "38b3eb04", "metadata": {}, "outputs": [], "source": [ "# The following would usually be set in the environment by a system administrator\n", "# and be transparent to lomas users.\n", "APP_URL = \"http://localhost:48080\" # For local devenv setup\n", "# APP_URL = \"http://lomas_server:48080\" # For local docker compose setup\n", "# APP_URL = \"http://lomas-server.example.com:80\" # For Kubernetes deployment\n", "USER_NAME = \"Dr.Antartica\"\n", "\n", "import os\n", "os.environ[\"LOMAS_CLIENT_ID\"] = USER_NAME\n", "os.environ[\"LOMAS_CLIENT_SECRET\"] = USER_NAME.lower()\n", "os.environ[\"LOMAS_KEYCLOAK_ADDRESS\"] = \"localhost\" # For local devenv setup\n", "# os.environ[\"LOMAS_KEYCLOAK_ADDRESS\"] = \"keycloak\" # For local docker compose setup\n", "# os.environ[\"LOMAS_KEYCLOAK_ADDRESS\"] = \"lomas-keycloak.example.com\" # For Kubernetes deployment\n", "os.environ[\"LOMAS_KEYCLOAK_PORT\"] = \"80\" # For local deployments\n", "# os.environ[\"LOMAS_KEYCLOAK_PORT\"] = \"443\" # For Kubernetes deployment\n", 
"os.environ[\"LOMAS_KEYCLOAK_USE_TLS\"] = \"0\" # For local deployments\n", "# os.environ[\"LOMAS_KEYCLOAK_USE_TLS\"] = \"1\" # For Kubernetes deployments\n", "os.environ[\"LOMAS_REALM\"] = \"lomas\"" ] }, { "cell_type": "code", "execution_count": null, "id": "d11725be", "metadata": {}, "outputs": [], "source": [ "client = Client(url=APP_URL, dataset_name=DATASET_NAME)" ] }, { "cell_type": "markdown", "id": "0ec400c8", "metadata": {}, "source": [ "And that's it for the preparation. She is now ready to use the various functionalities offered by `lomas-client`." ] }, { "cell_type": "markdown", "id": "9b9a5f13", "metadata": {}, "source": [ "## Step 3: Metadata and dummy dataset" ] }, { "cell_type": "markdown", "id": "c7cb5531", "metadata": {}, "source": [ "### Getting dataset metadata\n", "\n", "Dr. Antartica has never seen the data and, as a first step to understand what is available to her, she would like to check the metadata of the dataset. To do so, she just needs to call the `get_dataset_metadata()` method of the client. As this is public information, it does not cost any budget.\n", "\n", "This method returns metadata in a format based on the [SmartnoiseSQL dictionary format](https://docs.smartnoise.org/sql/metadata.html#dictionary-format), which includes, among other things, information about all the available columns, their types and their bounds (see the Smartnoise page for more details). Any metadata required for Smartnoise-SQL is also required here, and additional information, such as the different categories in a string-type column, can be added." 
] }, { "cell_type": "code", "execution_count": 5, "id": "0fdebac9-57fc-4410-878b-5a77425af634", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'max_ids': 1,\n", " 'rows': 344,\n", " 'row_privacy': True,\n", " 'censor_dims': False,\n", " 'columns': {'species': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'string',\n", " 'cardinality': 3,\n", " 'categories': ['Adelie', 'Chinstrap', 'Gentoo']},\n", " 'island': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'string',\n", " 'cardinality': 3,\n", " 'categories': ['Torgersen', 'Biscoe', 'Dream']},\n", " 'bill_length_mm': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 30.0,\n", " 'upper': 65.0},\n", " 'bill_depth_mm': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 13.0,\n", " 'upper': 23.0},\n", " 'flipper_length_mm': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 150.0,\n", " 'upper': 250.0},\n", " 'body_mass_g': {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 2000.0,\n", " 'upper': 7000.0},\n", " 'sex': {'private_id': False,\n", " 
 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'string',\n", " 'cardinality': 2,\n", " 'categories': ['MALE', 'FEMALE']}}}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "penguin_metadata = client.get_dataset_metadata()\n", "penguin_metadata" ] }, { "cell_type": "markdown", "id": "9e7ca7ae-bf17-40c8-aa75-2d72fcdd3088", "metadata": {}, "source": [ "## Step 3: Create a Synthetic Dataset keeping all default parameters" ] }, { "cell_type": "markdown", "id": "2de1389c-53a7-4098-bc3c-397c12a4b869", "metadata": {}, "source": [ "We want to get a synthetic model to represent the private data.\n", "\n", "Therefore, we use a Smartnoise-Synth synthesizer." ] }, { "cell_type": "markdown", "id": "3423d410-2501-4eaa-bea4-6b31fba8c869", "metadata": {}, "source": [ "Let's list the available options. Their respective parameters are described in the Smartnoise-Synth documentation [here](https://docs.smartnoise.org/synth/synthesizers/index.html)." 
] }, { "cell_type": "code", "execution_count": 6, "id": "cf6dd9f4-a9ca-4805-a597-553a26604430", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['mwem', 'dpctgan', 'patectgan', 'mst', 'pacsynth', 'dpgan', 'pategan', 'aim']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from snsynth import Synthesizer\n", "Synthesizer.list_synthesizers()" ] }, { "cell_type": "markdown", "id": "a06365e9-4076-4592-871a-31af91d6a05d", "metadata": {}, "source": [ "### AIM: Adaptive Iterative Mechanism" ] }, { "cell_type": "markdown", "id": "4f83dffe-f5b6-42fc-a74c-f3f00dc6c257", "metadata": {}, "source": [ "We start by executing a query on the dummy dataset without specifying any special parameters for AIM (all optional parameters are kept at their defaults).\n", "AIM only works on categorical columns, so we select the \"species\" and \"island\" columns to create a synthetic dataset of these two columns." ] }, { "cell_type": "code", "execution_count": 7, "id": "a17ef2a7-1a70-440a-b11f-9867e1e9dd70", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species island\n", "0 Gentoo Biscoe\n", "1 Adelie Biscoe\n", "2 Gentoo Dream\n", "3 Chinstrap Dream\n", "4 Gentoo Biscoe\n", ".. ... ...\n", "195 Gentoo Torgersen\n", "196 Chinstrap \n", "197 Chinstrap Torgersen\n", "198 Adelie Biscoe\n", "199 Gentoo Dream\n", "\n", "[200 rows x 2 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"aim\",\n", " epsilon=1.0,\n", " delta=0.0001,\n", " select_cols = [\"species\", \"island\"],\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples" ] }, { "cell_type": "markdown", "id": "f12ed311-3622-4cb8-b5e5-585cf20c91a8", "metadata": {}, "source": [ "The algorithm works and returns a synthetic dataset. We now estimate the cost of running this command:" ] }, { "cell_type": "code", "execution_count": 8, "id": "51063e79-0809-49ee-b7f0-c19b190571c5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CostResponse(epsilon=1.0, delta=0.0001)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_cost = client.smartnoise_synth.cost(\n", " synth_name=\"aim\",\n", " epsilon=1.0,\n", " delta=0.0001,\n", " select_cols = [\"species\", \"island\"],\n", ")\n", "res_cost" ] }, { "cell_type": "markdown", "id": "0f582e93-ca3b-4a9d-b24a-8c26996cab64", "metadata": {}, "source": [ "Executing such a query on the private dataset would cost 1.0 epsilon and 0.0001 delta. Dr. Antartica decides to run it, this time with the `dummy` flag set to False and specifying that she wants the AIM synthesizer model in return (with `return_model = True`).\n", "\n", "NOTE: if she does not set `return_model = True`, it defaults to False and she directly receives a synthetic dataframe as the response." 
] }, { "cell_type": "code", "execution_count": 9, "id": "8160a5ab-dd53-4d0d-9f6d-ff31c39831c9", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.12/site-packages/mbi/__init__.py:15: UserWarning: MixtureInference disabled, please install jax and jaxlib\n", " warnings.warn('MixtureInference disabled, please install jax and jaxlib')\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = client.smartnoise_synth.query(\n", " synth_name=\"aim\",\n", " epsilon=1.0,\n", " delta=0.0001,\n", " select_cols = [\"species\", \"island\"],\n", " dummy=False,\n", " return_model = True\n", ")\n", "res.result.model" ] }, { "cell_type": "markdown", "id": "20d8db1d-6fe2-4bf0-9b12-e9e25a9df235", "metadata": {}, "source": [ "She can now get the model and sample results with it. She chooses to draw 10 samples." ] }, { "cell_type": "code", "execution_count": 10, "id": "1add6713-1906-4d57-bd90-9e54b0a883d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species island\n", "0 Gentoo Torgersen\n", "1 Gentoo Biscoe\n", "2 Chinstrap Biscoe\n", "3 Gentoo Dream\n", "4 Adelie Dream\n", "5 Chinstrap Biscoe\n", "6 Gentoo Dream\n", "7 Adelie Torgersen\n", "8 Chinstrap Dream\n", "9 Gentoo Torgersen" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "synth = res.result.model\n", "synth.sample(10)" ] }, { "cell_type": "markdown", "id": "9b9837d3-11a5-49d9-aaaf-0637061cf2f5", "metadata": {}, "source": [ "She now wants to specify some specific parameters to the AIM model. Therefore, she needs to set some parameters in `synth_params` based on the Smartnoise-Synth documentation [here](https://docs.smartnoise.org/synth/synthesizers/aim.html#parameters). She decides that she wants to modify the `max_model_size` to 50 (the default was 80) and tries on the dummy." ] }, { "cell_type": "code", "execution_count": 11, "id": "8be9943c-4cb3-41ef-bb31-02cbb6e773c6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"aim\",\n", " epsilon=1.0,\n", " delta=0.0001,\n", " select_cols = [\"species\", \"island\"],\n", " dummy=True,\n", " return_model = True,\n", " synth_params = {\"max_model_size\": 50}\n", ")\n", "res_dummy.result.model" ] }, { "cell_type": "code", "execution_count": 12, "id": "e0e958c9-76de-499c-bdeb-658f7052d0b0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species island\n", "0 Gentoo Torgersen\n", "1 Chinstrap Dream\n", "2 Gentoo Biscoe\n", "3 Chinstrap Biscoe\n", "4 Adelie Dream" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "synth = res_dummy.result.model\n", "synth.sample(5)" ] }, { "cell_type": "markdown", "id": "5b656a6c-2199-465e-b109-818c369b2798", "metadata": {}, "source": [ "Now that the workflow is understood for AIM, she wants to experiment with various synthesizers on the dummy dataset." ] }, { "cell_type": "markdown", "id": "69cab29c-a882-4821-a0ef-eeb863e03071", "metadata": {}, "source": [ "### MWEM: Multiplicative Weights Exponential Mechanism " ] }, { "cell_type": "markdown", "id": "036bb9fe-29e1-42c1-bf7b-e684c7c37336", "metadata": {}, "source": [ "She tries MWEM on all columns with all default parameters. As `return_model` is not specified, she will directly receive a synthetic dataframe back. " ] }, { "cell_type": "code", "execution_count": 13, "id": "002a9a17-3e75-427b-8293-0fbd5188f762", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Adelie Dream 56.25 22.5 165.0 \n", "1 Gentoo Dream 35.25 18.5 205.0 \n", "2 Chinstrap Biscoe 49.25 13.5 215.0 \n", "3 Adelie Biscoe 35.25 13.5 225.0 \n", "4 Adelie Dream 56.25 22.5 165.0 \n", "\n", " body_mass_g sex \n", "0 4750.0 FEMALE \n", "1 2750.0 FEMALE \n", "2 2250.0 FEMALE \n", "3 4750.0 MALE \n", "4 4750.0 FEMALE " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mwem\",\n", " epsilon=1.0,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "bf112c1d-2313-451e-8107-966a7b731283", "metadata": {}, "source": [ "She now specifies 3 columns and some parameters explained [here](https://docs.smartnoise.org/synth/synthesizers/mwem.html#snsynth.mwem.MWEMSynthesizer)." ] }, { "cell_type": "code", "execution_count": 14, "id": "4f7303b7-77e0-4023-8f59-6b30e503567b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species island sex\n", "0 Chinstrap Torgersen FEMALE\n", "1 Adelie Torgersen FEMALE\n", "2 Adelie Torgersen FEMALE\n", "3 Chinstrap Torgersen FEMALE\n", "4 Gentoo Biscoe MALE" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mwem\",\n", " epsilon=1.0,\n", " select_cols = [\"species\", \"island\", \"sex\"],\n", " synth_params = {\"measure_only\": False, \"max_retries_exp_mechanism\": 5},\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "184114e9-1cf0-4b53-a2e4-5a14d787562a", "metadata": {}, "source": [ "Finally, with MWEM, she wants to go more in depth and create her own data preparation pipeline. To do so, she can use the Smartnoise-Synth \"Data Transformers\" explained [here](https://docs.smartnoise.org/synth/transforms/index.html) and send her own constraints dictionary for specific steps. This is intended for advanced users.\n", "\n", "By default, if no constraints are specified, the server automatically creates a data transformer based on the selected columns, the synthesizer and the metadata.\n", "\n", "Here she wants to add a clamping transformation on the continuous columns before training the synthesizer. She sets the bounds based on the metadata." 
] }, { "cell_type": "code", "execution_count": 15, "id": "3c3bf0ec-ca04-4c91-8742-c31f79633191", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "({'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 30.0,\n", " 'upper': 65.0},\n", " {'private_id': False,\n", " 'nullable': False,\n", " 'max_partition_length': None,\n", " 'max_influenced_partitions': None,\n", " 'max_partition_contributions': None,\n", " 'type': 'float',\n", " 'precision': 64,\n", " 'lower': 13.0,\n", " 'upper': 23.0})" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bl_bounds = penguin_metadata[\"columns\"][\"bill_length_mm\"]\n", "bd_bounds = penguin_metadata[\"columns\"][\"bill_depth_mm\"]\n", "bl_bounds, bd_bounds" ] }, { "cell_type": "code", "execution_count": 16, "id": "c3215ff4-aafb-4c1e-adf0-50dc383cd133", "metadata": {}, "outputs": [], "source": [ "from snsynth.transform import BinTransformer, ClampTransformer, ChainTransformer, LabelTransformer\n", "\n", "my_own_constraints = {\n", " \"bill_length_mm\": ChainTransformer(\n", " [\n", " ClampTransformer(lower = bl_bounds[\"lower\"] + 10, upper = bl_bounds[\"upper\"] - 10),\n", " BinTransformer(bins = 20, lower = bl_bounds[\"lower\"] + 10, upper = bl_bounds[\"upper\"] - 10),\n", " ]\n", " ),\n", " \"bill_depth_mm\": ChainTransformer(\n", " [\n", " ClampTransformer(lower = bd_bounds[\"lower\"] + 2, upper = bd_bounds[\"upper\"] - 2),\n", " BinTransformer(bins=20, lower = bd_bounds[\"lower\"] + 2, upper = bd_bounds[\"upper\"] - 2),\n", " ]\n", " ),\n", " \"species\": LabelTransformer(nullable=True)\n", "}" ] }, { "cell_type": "code", "execution_count": 17, "id": "ffba8bcd-0b6b-4cc0-98a3-b25e8bdd1786", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bill_length_mm bill_depth_mm species\n", "0 47.875 15.15 Chinstrap\n", "1 48.625 15.45 Gentoo\n", "2 55.000 20.85 Adelie\n", "3 47.875 15.15 Chinstrap\n", "4 47.875 15.15 Chinstrap" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mwem\",\n", " epsilon=1.0,\n", " select_cols = [\"bill_length_mm\", \"bill_depth_mm\", \"species\"],\n", " constraints = my_own_constraints,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "8d680bdc-e99f-4a40-a9ff-c2d1d053a573", "metadata": {}, "source": [ "Also a subset of constraints can be specified for certain columns and the server will automatically generate those for the missing columns." ] }, { "cell_type": "code", "execution_count": 18, "id": "21cc6269-5a6f-40c6-b081-52e59a62d903", "metadata": {}, "outputs": [], "source": [ "my_own_constraints = {\n", " \"bill_length_mm\": ChainTransformer(\n", " [\n", " ClampTransformer(lower = bl_bounds[\"lower\"] + 10, upper = bl_bounds[\"upper\"] - 10),\n", " BinTransformer(bins = 20, lower = bl_bounds[\"lower\"] + 10, upper = bl_bounds[\"upper\"] - 10),\n", " ]\n", " )\n", "}" ] }, { "cell_type": "markdown", "id": "2b590d77-3cde-42ef-b55b-107cd253aad4", "metadata": {}, "source": [ "In this case, only the bill_length will be clamped." ] }, { "cell_type": "code", "execution_count": 19, "id": "432e076e-8411-49f0-8250-100e0940313a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bill_length_mm bill_depth_mm species\n", "0 54.625 14.5 Adelie\n", "1 40.375 13.5 Gentoo\n", "2 54.625 14.5 Adelie\n", "3 54.625 14.5 Adelie\n", "4 54.625 14.5 Adelie" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mwem\",\n", " epsilon=1.0,\n", " select_cols = [\"bill_length_mm\", \"bill_depth_mm\", \"species\"],\n", " constraints = my_own_constraints,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "035b69c3-4819-4fc1-b4ac-1c2bac1b31fc", "metadata": {}, "source": [ "### MST: Maximum Spanning Tree" ] }, { "cell_type": "markdown", "id": "21190b72-3089-4783-addd-09590814b94f", "metadata": {}, "source": [ "She now experiments with MST. As this synthesizer is computationally demanding, she selects a subset of columns for it. See the MST documentation [here](https://docs.smartnoise.org/synth/synthesizers/mst.html)." ] }, { "cell_type": "code", "execution_count": 20, "id": "08d3f738-7dd0-4c73-9db6-ea91ee188968", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species sex\n", "0 Chinstrap FEMALE\n", "1 \n", "2 Chinstrap \n", "3 \n", "4 Gentoo MALE" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mst\",\n", " epsilon=1.0,\n", " select_cols = [\"species\", \"sex\"],\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "249615d7-14e6-4f9c-8368-3fc00b4832c9", "metadata": {}, "source": [ "She can also specify a specific number of samples to get (if return_model is not True):" ] }, { "cell_type": "code", "execution_count": 21, "id": "7ea90e9d-7190-43d7-8ab8-c58214cd4198", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " species sex\n", "0 FEMALE\n", "1 Gentoo MALE\n", "2 Gentoo FEMALE\n", "3 Chinstrap MALE" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mst\",\n", " epsilon=1.0,\n", " select_cols = [\"species\", \"sex\"],\n", " nb_samples = 4,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples" ] }, { "cell_type": "markdown", "id": "c1b4c876-11a5-4466-8831-d94786debe00", "metadata": {}, "source": [ "And a condition on these samples. For instance, here, she only wants female samples." ] }, { "cell_type": "code", "execution_count": 22, "id": "e1d50939-1fc7-4fcb-84d2-d7611435800b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sex species\n", "0 Gentoo\n", "1 Gentoo\n", "2 Gentoo\n", "3 Gentoo" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"mst\",\n", " epsilon=1.0,\n", " select_cols = [\"sex\", \"species\"],\n", " nb_samples = 4,\n", " condition = \"sex = FEMALE\",\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples" ] }, { "cell_type": "markdown", "id": "54809d01-9dd4-4bef-a575-3bc61d52b842", "metadata": {}, "source": [ "### DPCTGAN: Differentially Private Conditional Tabular GAN" ] }, { "cell_type": "markdown", "id": "8e0cb6b4-f539-4613-86f1-031b646e2376", "metadata": {}, "source": [ "She now tries DPCTGAN. A first warning lets her know that the random noise generation for this model is not cryptographically secure; if that is not acceptable to her, she can decide to stop using this synthesizer. Then, instead of a response, she gets a 422 error with an explanation." ] }, { "cell_type": "code", "execution_count": 23, "id": "7b508f33-7294-4928-8a25-27e0f0c702d3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/code/lomas_client/utils.py:62: UserWarning: Warning:dpctgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).\n", " warnings.warn(\n" ] }, { "ename": "ExternalLibraryException", "evalue": "('smartnoise_synth', 'Error fitting model: sample_rate=5.0 is not a valid value. Please provide a float between 0 and 1. 
Try decreasing batch_size in synth_params (default batch_size=500).')", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mExternalLibraryException\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m res_dummy \u001b[38;5;241m=\u001b[39m \u001b[43mclient\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msmartnoise_synth\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mquery\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 2\u001b[0m \u001b[43m \u001b[49m\u001b[43msynth_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdpctgan\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3\u001b[0m \u001b[43m \u001b[49m\u001b[43mepsilon\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m1.0\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4\u001b[0m \u001b[43m \u001b[49m\u001b[43mdummy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m)\u001b[49m\n\u001b[1;32m 6\u001b[0m res_dummy\n", "File \u001b[0;32m/code/lomas_client/libraries/smartnoise_synth.py:203\u001b[0m, in \u001b[0;36mSmartnoiseSynthClient.query\u001b[0;34m(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)\u001b[0m\n\u001b[1;32m 200\u001b[0m r_model \u001b[38;5;241m=\u001b[39m QueryResponse\u001b[38;5;241m.\u001b[39mmodel_validate_json(data)\n\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m r_model\n\u001b[0;32m--> 203\u001b[0m \u001b[43mraise_error\u001b[49m\u001b[43m(\u001b[49m\u001b[43mres\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 204\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n", "File 
\u001b[0;32m/code/lomas_client/utils.py:38\u001b[0m, in \u001b[0;36mraise_error\u001b[0;34m(response)\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidQueryException(error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalidQueryException\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n\u001b[1;32m 37\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response\u001b[38;5;241m.\u001b[39mstatus_code \u001b[38;5;241m==\u001b[39m status\u001b[38;5;241m.\u001b[39mHTTP_422_UNPROCESSABLE_ENTITY:\n\u001b[0;32m---> 38\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ExternalLibraryException(\n\u001b[1;32m 39\u001b[0m error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlibrary\u001b[39m\u001b[38;5;124m\"\u001b[39m], error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExternalLibraryException\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[1;32m 40\u001b[0m )\n\u001b[1;32m 41\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response\u001b[38;5;241m.\u001b[39mstatus_code \u001b[38;5;241m==\u001b[39m status\u001b[38;5;241m.\u001b[39mHTTP_403_FORBIDDEN:\n\u001b[1;32m 42\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m UnauthorizedAccessException(error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUnauthorizedAccessException\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n", "\u001b[0;31mExternalLibraryException\u001b[0m: ('smartnoise_synth', 'Error fitting model: sample_rate=5.0 is not a valid value. Please provide a float between 0 and 1. Try decreasing batch_size in synth_params (default batch_size=500).')" ] } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"dpctgan\",\n", " epsilon=1.0,\n", " dummy=True,\n", ")\n", "res_dummy" ] }, { "cell_type": "markdown", "id": "240b3ad1-bc17-48cd-877e-19a05dc18b67", "metadata": {}, "source": [ "The default parameters of DPCTGAN do not work for PENGUIN dataset. 
Hence, as advised in the error message, she decreases the batch_size (she also checks the documentation [here](https://docs.smartnoise.org/synth/synthesizers/dpctgan.html))." ] }, { "cell_type": "code", "execution_count": 24, "id": "8b122e7f-9cd4-42ca-b175-f26d34609646", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
0AdelieDream43.41410817.841402180.2846425016.072102FEMALE
1GentooBiscoe43.29885216.777365222.2253405162.192479MALE
2AdelieDream50.62239419.280649209.8938675275.184557FEMALE
3ChinstrapBiscoe41.49321617.206660233.3231572938.070863FEMALE
4AdelieBiscoe46.74927817.139504204.0606085795.609772MALE
\n", "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Adelie Dream 43.414108 17.841402 180.284642 \n", "1 Gentoo Biscoe 43.298852 16.777365 222.225340 \n", "2 Adelie Dream 50.622394 19.280649 209.893867 \n", "3 Chinstrap Biscoe 41.493216 17.206660 233.323157 \n", "4 Adelie Biscoe 46.749278 17.139504 204.060608 \n", "\n", " body_mass_g sex \n", "0 5016.072102 FEMALE \n", "1 5162.192479 MALE \n", "2 5275.184557 FEMALE \n", "3 2938.070863 FEMALE \n", "4 5795.609772 MALE " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"dpctgan\",\n", " epsilon=1.0,\n", " synth_params = {\"batch_size\": 50},\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "d0275fe1-7af3-4c1a-b54c-f054b9fc9658", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6372e6e4-26d7-401d-b11b-46d1313a9b1f", "metadata": {}, "source": [ "## PATEGAN: Private Aggregation of Teacher Ensembles" ] }, { "cell_type": "markdown", "id": "547e950a-0070-42b4-819a-f8527b4e24f1", "metadata": {}, "source": [ "Unfortunatelly, she is not able to train the pategan synthetizer on the PENGUIN dataset. Hence, she must try another one." 
] }, { "cell_type": "code", "execution_count": 25, "id": "5a9859d6-5bd7-4300-9232-95314bee37f6", "metadata": {}, "outputs": [ { "ename": "ExternalLibraryException", "evalue": "('smartnoise_synth', 'pategan not reliable with this dataset.')", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mExternalLibraryException\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[25], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m res_dummy \u001b[38;5;241m=\u001b[39m \u001b[43mclient\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msmartnoise_synth\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mquery\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 2\u001b[0m \u001b[43m \u001b[49m\u001b[43msynth_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpategan\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3\u001b[0m \u001b[43m \u001b[49m\u001b[43mepsilon\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m1.0\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4\u001b[0m \u001b[43m \u001b[49m\u001b[43mdummy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m)\u001b[49m\n\u001b[1;32m 6\u001b[0m res_dummy\n", "File \u001b[0;32m/code/lomas_client/libraries/smartnoise_synth.py:203\u001b[0m, in \u001b[0;36mSmartnoiseSynthClient.query\u001b[0;34m(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)\u001b[0m\n\u001b[1;32m 200\u001b[0m r_model \u001b[38;5;241m=\u001b[39m QueryResponse\u001b[38;5;241m.\u001b[39mmodel_validate_json(data)\n\u001b[1;32m 201\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m r_model\n\u001b[0;32m--> 203\u001b[0m 
\u001b[43mraise_error\u001b[49m\u001b[43m(\u001b[49m\u001b[43mres\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 204\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n", "File \u001b[0;32m/code/lomas_client/utils.py:38\u001b[0m, in \u001b[0;36mraise_error\u001b[0;34m(response)\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m InvalidQueryException(error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInvalidQueryException\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n\u001b[1;32m 37\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response\u001b[38;5;241m.\u001b[39mstatus_code \u001b[38;5;241m==\u001b[39m status\u001b[38;5;241m.\u001b[39mHTTP_422_UNPROCESSABLE_ENTITY:\n\u001b[0;32m---> 38\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ExternalLibraryException(\n\u001b[1;32m 39\u001b[0m error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlibrary\u001b[39m\u001b[38;5;124m\"\u001b[39m], error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExternalLibraryException\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[1;32m 40\u001b[0m )\n\u001b[1;32m 41\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response\u001b[38;5;241m.\u001b[39mstatus_code \u001b[38;5;241m==\u001b[39m status\u001b[38;5;241m.\u001b[39mHTTP_403_FORBIDDEN:\n\u001b[1;32m 42\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m UnauthorizedAccessException(error_message[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUnauthorizedAccessException\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n", "\u001b[0;31mExternalLibraryException\u001b[0m: ('smartnoise_synth', 'pategan not reliable with this dataset.')" ] } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"pategan\",\n", " epsilon=1.0,\n", " dummy=True,\n", ")\n", "res_dummy" ] }, { "cell_type": "markdown", "id": "248f8cdc-a7d0-46e2-9b3a-61b78da606b0", "metadata": {}, "source": [ "## PATECTGAN: Conditional tabular GAN using Private Aggregation of Teacher Ensembles" ] }, { "cell_type": "code", 
"execution_count": 26, "id": "87dcdd69-8bf0-4eb4-88c1-f6bc9c87ea09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
0AdelieTorgersen44.96522321.050138197.0112863798.269078MALE
1ChinstrapBiscoe54.78471118.795483189.3396034936.383002MALE
2ChinstrapBiscoe58.83641514.854715201.5414734849.516831MALE
3GentooDream49.26064119.661433245.8453954142.061740FEMALE
4GentooTorgersen48.66270817.788002177.3742485917.481452FEMALE
\n", "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Adelie Torgersen 44.965223 21.050138 197.011286 \n", "1 Chinstrap Biscoe 54.784711 18.795483 189.339603 \n", "2 Chinstrap Biscoe 58.836415 14.854715 201.541473 \n", "3 Gentoo Dream 49.260641 19.661433 245.845395 \n", "4 Gentoo Torgersen 48.662708 17.788002 177.374248 \n", "\n", " body_mass_g sex \n", "0 3798.269078 MALE \n", "1 4936.383002 MALE \n", "2 4849.516831 MALE \n", "3 4142.061740 FEMALE \n", "4 5917.481452 FEMALE " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"patectgan\",\n", " epsilon=1.0,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "code", "execution_count": 27, "id": "06eb0b95-1265-422c-9087-121b7091f37c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
islandbill_length_mmbody_mass_g
0Dream51.2955504649.196619
1Biscoe38.3691724301.166393
2Biscoe52.1367794498.011571
3Torgersen58.9008254223.040946
4Dream40.4921663707.417592
\n", "
" ], "text/plain": [ " island bill_length_mm body_mass_g\n", "0 Dream 51.295550 4649.196619\n", "1 Biscoe 38.369172 4301.166393\n", "2 Biscoe 52.136779 4498.011571\n", "3 Torgersen 58.900825 4223.040946\n", "4 Dream 40.492166 3707.417592" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"patectgan\",\n", " epsilon=1.0,\n", " select_cols = [\"island\", \"bill_length_mm\", \"body_mass_g\"],\n", " synth_params = {\n", " \"embedding_dim\": 256, \n", " \"generator_dim\": (128, 128), \n", " \"discriminator_dim\": (256, 256),\n", " \"generator_lr\": 0.0003, \n", " \"generator_decay\": 1e-05, \n", " \"discriminator_lr\": 0.0003, \n", " \"discriminator_decay\": 1e-05, \n", " \"batch_size\": 500\n", " },\n", " nb_samples = 100,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "7fcda9d1-4137-4ff3-9af7-eead85057dd5", "metadata": {}, "source": [ "## DPGAN: DIfferentially Private GAN" ] }, { "cell_type": "markdown", "id": "084ea436-f47e-4da2-95ea-7a068b9f1510", "metadata": {}, "source": [ "For DPGAN, there is the same warning as for DPCTGAN with the cryptographically secure random noise generation." ] }, { "cell_type": "code", "execution_count": 28, "id": "03c70909-e5f6-4c34-a787-37f5615ed600", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/code/lomas_client/utils.py:62: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
0GentooBiscoe61.08430017.778250202.4042614074.338235MALE
1AdelieDream45.14312723.000000250.0000004078.621872MALE
2GentooBiscoe63.31005016.944589215.1555673999.723613FEMALE
3GentooDream65.00000022.198413218.9262387000.000000MALE
4AdelieDream65.00000023.000000191.2997804249.239404MALE
\n", "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Gentoo Biscoe 61.084300 17.778250 202.404261 \n", "1 Adelie Dream 45.143127 23.000000 250.000000 \n", "2 Gentoo Biscoe 63.310050 16.944589 215.155567 \n", "3 Gentoo Dream 65.000000 22.198413 218.926238 \n", "4 Adelie Dream 65.000000 23.000000 191.299780 \n", "\n", " body_mass_g sex \n", "0 4074.338235 MALE \n", "1 4078.621872 MALE \n", "2 3999.723613 FEMALE \n", "3 7000.000000 MALE \n", "4 4249.239404 MALE " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"dpgan\",\n", " epsilon=1.0,\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "db5329b4-6a21-4b38-abeb-68a68195109e", "metadata": {}, "source": [ "One final time she samples with conditions:" ] }, { "cell_type": "code", "execution_count": 29, "id": "17651dfc-d3b7-4030-a4be-1767de3767d1", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/code/lomas_client/utils.py:62: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
0GentooTorgersen48.61425217.252423250.0000007000.000000FEMALE
1GentooTorgersen62.44352717.991540250.0000007000.000000FEMALE
2ChinstrapDream65.00000023.000000226.9080197000.000000MALE
3GentooDream60.14164616.770572246.7242725726.566434MALE
4AdelieTorgersen46.26025516.974378250.0000006849.641472MALE
\n", "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Gentoo Torgersen 48.614252 17.252423 250.000000 \n", "1 Gentoo Torgersen 62.443527 17.991540 250.000000 \n", "2 Chinstrap Dream 65.000000 23.000000 226.908019 \n", "3 Gentoo Dream 60.141646 16.770572 246.724272 \n", "4 Adelie Torgersen 46.260255 16.974378 250.000000 \n", "\n", " body_mass_g sex \n", "0 7000.000000 FEMALE \n", "1 7000.000000 FEMALE \n", "2 7000.000000 MALE \n", "3 5726.566434 MALE \n", "4 6849.641472 MALE " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"dpgan\",\n", " epsilon=1.0,\n", " condition = \"body_mass_g > 5000\",\n", " dummy=True,\n", ")\n", "res_dummy.result.df_samples.head()" ] }, { "cell_type": "markdown", "id": "177c8a4b-4eb7-444c-836d-cce81327e5c6", "metadata": {}, "source": [ "And now on the real dataset" ] }, { "cell_type": "code", "execution_count": 30, "id": "799c7574-0eb9-49c1-bc01-b051782d7b62", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/code/lomas_client/utils.py:62: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsex
0AdelieBiscoe44.27591723.000000194.3869865710.911572
1ChinstrapBiscoe45.76153619.180464190.2286065585.222661FEMALE
2ChinstrapBiscoe51.91834320.711846250.0000005547.108099
3AdelieDream65.00000023.000000193.7611427000.000000
4ChinstrapDream65.00000023.000000244.2202066518.389255MALE
5AdelieTorgersen61.53313220.927101186.0779875242.271543MALE
6ChinstrapDream46.06600020.364783198.2498765248.364478
7AdelieTorgersen63.79151217.969750199.1375647000.000000
8AdelieDream65.00000016.838814180.9559056145.199358
9AdelieBiscoe65.00000019.727768183.6110007000.000000
\n", "
" ], "text/plain": [ " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n", "0 Adelie Biscoe 44.275917 23.000000 194.386986 \n", "1 Chinstrap Biscoe 45.761536 19.180464 190.228606 \n", "2 Chinstrap Biscoe 51.918343 20.711846 250.000000 \n", "3 Adelie Dream 65.000000 23.000000 193.761142 \n", "4 Chinstrap Dream 65.000000 23.000000 244.220206 \n", "5 Adelie Torgersen 61.533132 20.927101 186.077987 \n", "6 Chinstrap Dream 46.066000 20.364783 198.249876 \n", "7 Adelie Torgersen 63.791512 17.969750 199.137564 \n", "8 Adelie Dream 65.000000 16.838814 180.955905 \n", "9 Adelie Biscoe 65.000000 19.727768 183.611000 \n", "\n", " body_mass_g sex \n", "0 5710.911572 \n", "1 5585.222661 FEMALE \n", "2 5547.108099 \n", "3 7000.000000 \n", "4 6518.389255 MALE \n", "5 5242.271543 MALE \n", "6 5248.364478 \n", "7 7000.000000 \n", "8 6145.199358 \n", "9 7000.000000 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_dummy = client.smartnoise_synth.query(\n", " synth_name=\"dpgan\",\n", " epsilon=1.0,\n", " condition = \"body_mass_g > 5000\",\n", " nb_samples = 10,\n", " dummy=False,\n", ")\n", "res_dummy.result.df_samples" ] }, { "cell_type": "markdown", "id": "94eaf59b-c108-424c-8978-b1c86e141ccb", "metadata": {}, "source": [ "## Step 6: See archives of queries" ] }, { "cell_type": "markdown", "id": "64003c53-de56-4bdc-a3c2-0c3e40031919", "metadata": {}, "source": [ "She now wants to verify all the queries that she did on the real data. It is possible because an archive of all queries is kept in a secure database. With a function call she can see her queries, budget and associated responses." 
] }, { "cell_type": "code", "execution_count": 31, "id": "008fd230-cdfd-4e03-91ce-5a60b06c106d", "metadata": {}, "outputs": [], "source": [ "previous_queries = client.get_previous_queries()" ] }, { "cell_type": "markdown", "id": "f2a34bc3-d1a5-4124-983f-ddc09dd1af7b", "metadata": {}, "source": [ "Let's check the last query" ] }, { "cell_type": "code", "execution_count": 32, "id": "1795a54b-d04e-4687-8649-93982c84ad30", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'user_name': 'Dr. Antartica',\n", " 'dataset_name': 'PENGUIN',\n", " 'dp_library': 'smartnoise_synth',\n", " 'client_input': {'dataset_name': 'PENGUIN',\n", " 'synth_name': 'dpgan',\n", " 'epsilon': 1.0,\n", " 'delta': None,\n", " 'select_cols': [],\n", " 'synth_params': {},\n", " 'nullable': True,\n", " 'constraints': '',\n", " 'return_model': False,\n", " 'condition': 'body_mass_g > 5000',\n", " 'nb_samples': 10},\n", " 'response': {'epsilon': 1.0,\n", " 'delta': 0.00015673368198174188,\n", " 'requested_by': 'Dr. Antartica',\n", " 'result': res_type \\\n", " index sn_synth_samples \n", " columns sn_synth_samples \n", " data sn_synth_samples \n", " index_names sn_synth_samples \n", " column_names sn_synth_samples \n", " \n", " df_samples \n", " index [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] \n", " columns [species, island, bill_length_mm, bill_depth_m... \n", " data [[Adelie, Biscoe, 44.27591737359762, 23.0, 194... 
\n", " index_names [None] \n", " column_names [None] },\n", " 'timestamp': 1728461702.0089684}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "last_query = previous_queries[-1]\n", "last_query" ] }, { "cell_type": "code", "execution_count": null, "id": "08253456-fae7-424d-ac63-52dfc539c1e4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }