Lomas Client Side: Using Smartnoise-Synth

This notebook showcases how a researcher could use the Secure Data Disclosure system. It explains the different functionalities provided by the lomas-client library to interact with the secure server.

The secure data are never visible to researchers. They can only access differentially private responses via queries to the server.

Each user has access to one or more projects and, for each dataset, has a limited budget defined by \(\epsilon\) and \(\delta\) values.

Step 1: Install the library

To interact with the secure server on which the data is stored, Dr.Antartica first needs to install the lomas-client library in her local development environment.

It can be installed via the pip command:

[ ]:
# !pip install lomas_client

Or use a local version of the client:

[ ]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..')))
[ ]:
from lomas_client import Client
import numpy as np

Step 2: Initialise the client

Once the library is installed, a Client object must be created. It is responsible for sending requests to the server and processing responses in the local environment, enabling seamless interaction with the server.

The client needs a few parameters to be created. Usually, these would be set in the environment by the system administrator (queen Icebergina) and be transparent to lomas users. In this instance, the following code snippet sets a few of these parameters that are specific to this notebook.

She will only be able to query the real dataset if queen Icebergina has previously created an account for her in the database, granted her access to the PENGUIN dataset and allocated her some epsilon and delta budget.

[ ]:
# The following would usually be set in the environment by a system administrator
# and be transparent to lomas users.
# Uncomment them if you are running against a Kubernetes deployment.
# They have already been set for you if you are running locally within a devenv or the Jupyter lab set up by Docker compose.

import os
# os.environ["LOMAS_CLIENT_APP_URL"] = "https://lomas.example.com:443"
# os.environ["LOMAS_CLIENT_KEYCLOAK_URL"] = "https://keycloak.example.com:443"
# os.environ["LOMAS_CLIENT_TELEMETRY__ENABLED"] = "false"
# os.environ["LOMAS_CLIENT_TELEMETRY__COLLECTOR_ENDPOINT"] = "http://otel.example.com:445"
# os.environ["LOMAS_CLIENT_TELEMETRY__COLLECTOR_INSECURE"] = "true"
# os.environ["LOMAS_CLIENT_TELEMETRY__SERVICE_ID"] = "my-app-client"
# os.environ["LOMAS_CLIENT_REALM"] = "lomas"

# We set these ones because they are specific to this notebook.

USER_NAME = "Dr.Antartica"
os.environ["LOMAS_CLIENT_CLIENT_ID"] = USER_NAME
os.environ["LOMAS_CLIENT_CLIENT_SECRET"] = USER_NAME.lower()
os.environ["LOMAS_CLIENT_DATASET_NAME"] = "PENGUIN"

# Note that all client settings can also be passed as keyword arguments to the Client constructor.
# eg. client = Client(client_id = "Dr.Antartica") takes precedence over setting the "LOMAS_CLIENT_CLIENT_ID"
# environment variable.
[ ]:
client = Client()
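The comment at the end of the configuration cell states that keyword arguments passed to Client take precedence over the corresponding LOMAS_CLIENT_* environment variables. The sketch below illustrates that lookup order with a hypothetical resolve_setting helper; it is not part of lomas_client, and the client's real settings code may resolve values differently (for example, how nested names such as TELEMETRY__ENABLED are handled).

```python
import os

# Hypothetical helper, not part of lomas_client: it only illustrates the
# precedence described above (keyword argument > environment variable > default).
ENV_PREFIX = "LOMAS_CLIENT_"


def resolve_setting(name, explicit=None, default=None):
    """Return the explicit value if given, else LOMAS_CLIENT_<NAME>, else the default."""
    if explicit is not None:
        return explicit
    return os.environ.get(ENV_PREFIX + name.upper(), default)


os.environ["LOMAS_CLIENT_CLIENT_ID"] = "Dr.Antartica"  # same value as set above
print(resolve_setting("client_id"))                       # Dr.Antartica (from the environment)
print(resolve_setting("client_id", explicit="Dr.Other"))  # Dr.Other (keyword argument wins)
```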

And that’s it for the preparation. She is now ready to use the various functionalities offered by lomas-client.

Step 3: Metadata and dummy dataset

Getting dataset metadata

Dr. Antartica has never seen the data and as a first step to understand what is available to her, she would like to check the metadata of the dataset. Therefore, she just needs to call the get_dataset_metadata() function of the client. As this is public information, this does not cost any budget.

This function returns metadata in a format based on the Smartnoise-SQL dictionary format, which includes, among other things, information about all available columns, their types and their bounds (see the Smartnoise page for more details). Any metadata required by Smartnoise-SQL is also required here, and additional information, such as the categories of a string-type column, can be added.

[ ]:
penguin_metadata = client.get_dataset_metadata()
penguin_metadata
{'max_ids': 1,
 'rows': 344,
 'row_privacy': True,
 'censor_dims': False,
 'columns': {'species': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
  'island': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Torgersen', 'Biscoe', 'Dream']},
  'bill_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 30.0,
   'upper': 65.0},
  'bill_depth_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 13.0,
   'upper': 23.0},
  'flipper_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 150.0,
   'upper': 250.0},
  'body_mass_g': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 2000.0,
   'upper': 7000.0},
  'sex': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 2,
   'categories': ['MALE', 'FEMALE']}}}
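Because the metadata is returned as a plain Python dictionary, the bounds of the numeric columns can be read programmatically, which is handy later when choosing clamping bounds. A minimal sketch with a hypothetical numeric_bounds helper, run on an excerpt copied from the output above (in the notebook, penguin_metadata itself can be passed instead):

```python
# Excerpt of the metadata printed above (three columns only, to stay short).
bounds_excerpt = {
    "columns": {
        "species": {"type": "string", "cardinality": 3,
                    "categories": ["Adelie", "Chinstrap", "Gentoo"]},
        "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
        "body_mass_g": {"type": "float", "lower": 2000.0, "upper": 7000.0},
    }
}


def numeric_bounds(metadata):
    """Map each numeric column to its (lower, upper) bounds; hypothetical helper."""
    return {
        name: (col["lower"], col["upper"])
        for name, col in metadata["columns"].items()
        if col["type"] in ("int", "float")
    }


print(numeric_bounds(bounds_excerpt))
# {'bill_length_mm': (30.0, 65.0), 'body_mass_g': (2000.0, 7000.0)}
```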

Step 4: Create a Synthetic Dataset keeping all default parameters

We want to get a synthetic model to represent the private data.

To do so, we use Smartnoise-Synth synthesizers.

Let’s list the available options. Their respective parameters are described in the Smartnoise-Synth documentation here.

[ ]:
from snsynth import Synthesizer
Synthesizer.list_synthesizers()
['mwem', 'dpctgan', 'patectgan', 'mst', 'pacsynth', 'dpgan', 'pategan', 'aim']

AIM: Adaptive Iterative Mechanism

We start by executing a query on the dummy dataset without specifying any special parameters for AIM (all optional parameters are kept at their defaults). AIM only works on categorical columns, so we select the “species” and “island” columns to create a synthetic dataset of these two columns.
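Instead of typing the column names by hand, the categorical columns can be picked out of the metadata. The sketch below treats columns whose metadata type is "string" (or "boolean", an assumption since PENGUIN has none) as categorical; categorical_columns is a hypothetical helper and the dictionary is an excerpt of the metadata shown above.

```python
# Excerpt of the PENGUIN metadata shown above (types only).
types_excerpt = {
    "columns": {
        "species": {"type": "string", "cardinality": 3},
        "island": {"type": "string", "cardinality": 3},
        "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
        "sex": {"type": "string", "cardinality": 2},
    }
}


def categorical_columns(metadata):
    """Names of columns assumed to be categorical, in metadata order."""
    return [name for name, col in metadata["columns"].items()
            if col["type"] in ("string", "boolean")]


print(categorical_columns(types_excerpt))  # ['species', 'island', 'sex']
```

Any subset of the returned list, such as ["species", "island"], can then be passed as select_cols.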

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
)
res_dummy.result.df_samples
species island
0 Chinstrap Biscoe
1 Gentoo Biscoe
2 Gentoo Dream
3 Adelie Dream
4 Gentoo Biscoe
... ... ...
195 Gentoo Torgersen
196 Adelie Biscoe
197 Gentoo Biscoe
198 Adelie Torgersen
199 Gentoo Torgersen

200 rows × 2 columns

The algorithm works and returned a synthetic dataset. We now estimate the cost of running this command:

[ ]:
res_cost = client.smartnoise_synth.cost(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
)
res_cost
CostResponse(epsilon=1.0, delta=0.0001)
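Before switching to the real dataset, the estimated cost can be compared with the remaining budget. The sketch assumes the budget is consumed additively and checks both components; the Budget class, the can_afford helper and the remaining-budget values are hypothetical, and the server remains the authority that accepts or rejects a query. Since the CostResponse above exposes epsilon and delta fields, it should also work directly as the cost argument.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    epsilon: float
    delta: float


def can_afford(remaining, cost):
    """True if both the epsilon and the delta cost fit into the remaining budget."""
    return cost.epsilon <= remaining.epsilon and cost.delta <= remaining.delta


aim_cost = Budget(epsilon=1.0, delta=0.0001)          # values returned by cost() above
remaining_budget = Budget(epsilon=10.0, delta=0.005)  # hypothetical remaining budget
print(can_afford(remaining_budget, aim_cost))                  # True
print(can_afford(Budget(epsilon=0.5, delta=0.005), aim_cost))  # False: epsilon too small
```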

Executing such a query on the private dataset would cost 1.0 epsilon and 0.0001 delta. Dr. Antartica decides to run it, this time with the dummy flag set to False, and specifies that she wants the AIM synthesizer model in return (with return_model = True).

NOTE: return_model is False by default; if she does not set return_model = True, she directly receives a synthetic dataframe as the response.

[ ]:
res = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=False,
    return_model = True
)
res.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x75e5e1044500>

She can now take the model and sample from it. She chooses to draw 10 samples.

[ ]:
synth = res.result.model
synth.sample(10)
species island
0 Chinstrap Torgersen
1 Adelie Torgersen
2 Chinstrap Torgersen
3 Adelie Dream
4 Chinstrap Biscoe
5 Gentoo Biscoe
6 Gentoo Dream
7 Gentoo Biscoe
8 Chinstrap Biscoe
9 Chinstrap Torgersen
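Depending on return_model, res.result holds either a fitted model or a dataframe of samples. A small hypothetical helper can hide that difference; it relies only on the model and df_samples attributes used in this notebook, and uses getattr because it is not documented here whether a dataframe result also carries a model attribute. The stand-in objects only demonstrate both branches without contacting the server.

```python
from types import SimpleNamespace


def draw_samples(result, n):
    """Return n synthetic rows from a query result, whatever its form (hypothetical helper)."""
    model = getattr(result, "model", None)
    if model is not None:
        return model.sample(n)         # fitted synthesizer: draw fresh samples locally
    return result.df_samples.head(n)  # pre-sampled dataframe: cannot exceed its length


# Stand-in objects mimicking the two result shapes.
fake_model_result = SimpleNamespace(model=SimpleNamespace(sample=lambda n: ["row"] * n))
fake_frame_result = SimpleNamespace(df_samples=SimpleNamespace(head=lambda n: list(range(n))))
print(draw_samples(fake_model_result, 3))  # ['row', 'row', 'row']
print(draw_samples(fake_frame_result, 2))  # [0, 1]
```

In the notebook, draw_samples(res.result, 10) would reproduce the call above. Sampling from a returned model runs locally, so it should not consume additional budget (differential privacy is preserved under post-processing).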

She now wants to pass specific parameters to the AIM model. To do so, she sets them in synth_params, following the Smartnoise-Synth documentation here. She decides to change max_model_size to 50 (the default is 80) and tries it on the dummy dataset.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
    return_model = True,
    synth_params = {"max_model_size": 50}
)
res_dummy.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x75e5e106e930>
[ ]:
synth = res_dummy.result.model
synth.sample(5)
species island
0 Adelie Biscoe
1 Gentoo Torgersen
2 Chinstrap Biscoe
3 Chinstrap Torgersen
4 Gentoo Dream
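To check which synth_params a synthesizer accepts and what their defaults are, its constructor signature can be inspected locally with Python's inspect module, for example on snsynth.aim.AIMSynthesizer if snsynth is installed. The sketch uses a hypothetical stand-in class so that it runs without snsynth; apart from max_model_size=80, which the text above gives as AIM's default, its parameters are illustrative. It also assumes that user-supplied synth_params simply override the defaults.

```python
import inspect


class ToySynthesizer:
    """Hypothetical stand-in; only max_model_size=80 mirrors the text above."""

    def __init__(self, max_model_size=80, verbose=False):
        self.max_model_size = max_model_size
        self.verbose = verbose


def constructor_defaults(cls):
    """Map each constructor parameter that has a default to that default."""
    params = inspect.signature(cls.__init__).parameters
    return {name: p.default for name, p in params.items()
            if p.default is not inspect.Parameter.empty}


def effective_params(cls, synth_params):
    """Overlay user synth_params on the defaults, rejecting unknown names."""
    defaults = constructor_defaults(cls)
    unknown = set(synth_params) - set(defaults)
    if unknown:
        raise ValueError(f"Unknown synthesizer parameters: {sorted(unknown)}")
    return {**defaults, **synth_params}


print(effective_params(ToySynthesizer, {"max_model_size": 50}))
# {'max_model_size': 50, 'verbose': False}
```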

Now that she understands the workflow for AIM, she wants to experiment with various synthesizers on the dummy dataset.

MWEM: Multiplicative Weights Exponential Mechanism

She tries MWEM on all columns with all default parameters. As return_model is not specified, she directly receives a synthetic dataframe.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Biscoe 56.25 20.5 155.0 6250.0 MALE
1 Gentoo Biscoe 56.25 20.5 245.0 4750.0 MALE
2 Adelie Dream 56.25 21.5 185.0 3250.0 FEMALE
3 Gentoo Dream 52.75 13.5 245.0 5250.0 FEMALE
4 Gentoo Dream 63.25 13.5 245.0 3750.0 FEMALE
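For intuition about how MWEM spends its budget, here is a toy, from-scratch version on a one-column histogram. It is not the Smartnoise-Synth implementation, which handles several binned columns and exposes extra options such as max_retries_exp_mechanism. In simplified form, each round spends half of its share of epsilon on the exponential mechanism to pick the query the synthetic data currently answers worst, spends the other half on a Laplace-noised answer to that query, and then applies a multiplicative-weights update. It returns the last iterate rather than an average and uses Python's non-cryptographic random module, so it is for illustration only; the counts are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mwem(true_hist, queries, epsilon, iterations, rng):
    """Toy MWEM; each query is a 0/1 indicator list over the histogram's cells."""
    n = sum(true_hist)
    size = len(true_hist)
    synth = [n / size] * size                # start from the uniform distribution
    eps_step = epsilon / (2 * iterations)    # per round: half selection, half measurement

    def answer(q, hist):
        return sum(w * h for w, h in zip(q, hist))

    for _ in range(iterations):
        # Exponential mechanism (sensitivity 1): favour queries with large error.
        errors = [abs(answer(q, synth) - answer(q, true_hist)) for q in queries]
        max_err = max(errors)  # subtracted only for numerical stability
        weights = [math.exp(eps_step * (e - max_err) / 2) for e in errors]
        q = rng.choices(queries, weights=weights, k=1)[0]
        # Laplace mechanism: noisy answer to the selected counting query.
        measured = answer(q, true_hist) + laplace(1.0 / eps_step, rng)
        # Multiplicative weights update towards the noisy measurement.
        diff = measured - answer(q, synth)
        synth = [s * math.exp(w * diff / (2 * n)) for s, w in zip(synth, q)]
        total = sum(synth)
        synth = [s * n / total for s in synth]  # renormalise to n records
    return synth


hypothetical_counts = [50, 20, 30]
toy_queries = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
print(mwem(hypothetical_counts, toy_queries, epsilon=1.0, iterations=10, rng=random.Random(0)))
```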

She now selects three columns and sets some parameters, explained here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["species", "island", "sex"],
    synth_params = {"measure_only": False, "max_retries_exp_mechanism": 5},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island sex
0 Chinstrap Dream FEMALE
1 Gentoo Dream MALE
2 Chinstrap Biscoe FEMALE
3 Chinstrap Biscoe FEMALE
4 Gentoo Dream MALE

Finally, with MWEM, she wants to go more in depth and create her own data preparation pipeline. To do so, she can use the Smartnoise-Synth “Data Transformers” explained here and send her own constraints dictionary for specific columns. This is intended for advanced users.

By default, if no constraints are specified, the server automatically creates a data transformer based on the selected columns, the synthesizer and the metadata.

Here she wants to add a clamping transformation (followed by binning) on two continuous columns before training the synthesizer. She derives the bounds from the metadata.

[ ]:
bl_bounds = penguin_metadata["columns"]["bill_length_mm"]
bd_bounds = penguin_metadata["columns"]["bill_depth_mm"]
bl_bounds, bd_bounds
({'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 30.0,
  'upper': 65.0},
 {'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 13.0,
  'upper': 23.0})
[ ]:
from snsynth.transform import BinTransformer, ClampTransformer, ChainTransformer, LabelTransformer

my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    ),
    "bill_depth_mm": ChainTransformer(
        [
            ClampTransformer(lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
            BinTransformer(bins=20, lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
        ]
    ),
    "species": LabelTransformer(nullable=True)
}
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 50.875 19.95 Chinstrap
1 44.875 20.85 Gentoo
2 41.875 16.05 Adelie
3 47.125 21.00 Gentoo
4 44.875 20.85 Gentoo
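The bill_length_mm chain above clamps values to [30 + 10, 65 - 10] = [40, 55] and then cuts that range into 20 bins, each 0.75 wide. The sketch reimplements the idea in plain Python without snsynth, assuming a binned value is reported as its bin midpoint. That assumption is consistent with sample values above such as 44.875 and 50.875 (midpoints of 0.75-wide bins starting at 40), but BinTransformer's exact behaviour may differ; clamp and bin_midpoint are hypothetical helpers.

```python
def clamp(value, lower, upper):
    """Restrict value to the closed interval [lower, upper]."""
    return min(max(value, lower), upper)


def bin_midpoint(value, bins, lower, upper):
    """Clamp, map to one of `bins` equal-width bins, and return that bin's midpoint."""
    width = (upper - lower) / bins
    value = clamp(value, lower, upper)
    index = min(int((value - lower) / width), bins - 1)  # the upper bound joins the last bin
    return lower + width * (index + 0.5)


bl_lower, bl_upper = 30.0 + 10, 65.0 - 10  # bounds used for bill_length_mm above
for raw in (44.9, 50.9, 70.0):
    print(raw, "->", bin_midpoint(raw, 20, bl_lower, bl_upper))
# 44.9 -> 44.875
# 50.9 -> 50.875
# 70.0 -> 54.625
```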

Constraints can also be specified for only a subset of the selected columns; the server then automatically generates them for the remaining columns.

[ ]:
my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    )
}

In this case, only bill_length_mm is clamped with her custom bounds; the transformers for the other selected columns are generated by the server.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 53.125 20.5 Chinstrap
1 50.125 15.5 Adelie
2 50.125 15.5 Adelie
3 50.125 15.5 Adelie
4 54.625 17.5 Gentoo
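The server-side completion of partial constraints can be pictured as a dictionary merge: user-supplied transformers are kept and a default is generated for every other selected column. In this sketch the transformers are represented by strings, and default_transformer is a hypothetical placeholder; which transformers the server actually generates is not specified here.

```python
def default_transformer(column_meta):
    """Hypothetical placeholder for the server's automatic choice."""
    if column_meta["type"] == "string":
        return "LabelTransformer()"
    return f"BinTransformer(lower={column_meta['lower']}, upper={column_meta['upper']})"


def complete_constraints(user_constraints, select_cols, metadata):
    """Keep the user's transformers and fill in defaults for the remaining columns."""
    return {
        col: user_constraints[col] if col in user_constraints
        else default_transformer(metadata["columns"][col])
        for col in select_cols
    }


meta_excerpt = {"columns": {
    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
    "bill_depth_mm": {"type": "float", "lower": 13.0, "upper": 23.0},
    "species": {"type": "string"},
}}
user_choice = {"bill_length_mm": "custom clamp + bin chain"}
completed = complete_constraints(user_choice, ["bill_length_mm", "bill_depth_mm", "species"], meta_excerpt)
for col, transformer in completed.items():
    print(col, "->", transformer)
```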

MST: Maximum Spanning Tree

She now experiments with MST. As this synthesizer is computationally demanding, she selects a subset of columns for it. See MST here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    dummy=True,
)
res_dummy.result.df_samples.head()
species sex
0 MALE
1 Gentoo FEMALE
2 FEMALE
3 Adelie MALE
4 Gentoo FEMALE
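The MST synthesizer builds its model from one-way marginals plus a set of two-way marginals chosen so that they form a maximum spanning tree over the columns; in the real algorithm the edge scores and the selection are computed under differential privacy. The sketch shows only the non-private tree-building step, Kruskal's algorithm with a union-find, on hypothetical dependence scores.

```python
def maximum_spanning_tree(cols, weights):
    """Kruskal's algorithm on edge weights {(a, b): w}; returns the chosen edges."""
    parent = {c: c for c in cols}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps the trees shallow
            c = parent[c]
        return c

    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # this edge joins two components, so no cycle is created
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


toy_cols = ["species", "island", "sex"]
# Hypothetical pairwise dependence scores; MST estimates its scores under DP.
toy_weights = {("species", "island"): 0.9, ("species", "sex"): 0.1, ("island", "sex"): 0.05}
print(maximum_spanning_tree(toy_cols, toy_weights))
# [('species', 'island'), ('species', 'sex')]
```

With all seven PENGUIN columns there are 7 × 6 / 2 = 21 candidate pairs instead of 1 for two columns, which gives one intuition for why restricting select_cols keeps MST tractable.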

She can also request a specific number of samples (only when return_model is not True):

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    nb_samples = 4,
    dummy=True,
)
res_dummy.result.df_samples
species sex
0 FEMALE
1 MALE
2 Chinstrap MALE
3 Adelie FEMALE

She can also set a condition on these samples. For instance, here she only wants female samples.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["sex", "species"],
    nb_samples = 4,
    condition = "sex = FEMALE",
    dummy=True,
)
res_dummy.result.df_samples
sex species
0 Gentoo
1
2 Adelie
3 Gentoo
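A condition such as "sex = FEMALE" can be satisfied by rejection sampling: draw candidate rows from the fitted synthesizer and keep those that match until enough are collected. Whether the server applies the condition exactly this way is an assumption; the helper, the max_tries limit and the toy row generator below are hypothetical. A rare condition can make this slow or make it fail, which is why the sketch caps the number of batches.

```python
import random


def sample_conditional(sample_batch, predicate, n_rows, max_tries=100):
    """Rejection sampling: draw batches and keep only rows satisfying `predicate`."""
    kept = []
    for _ in range(max_tries):
        kept.extend(row for row in sample_batch() if predicate(row))
        if len(kept) >= n_rows:
            return kept[:n_rows]
    raise RuntimeError(f"only {len(kept)} of {n_rows} rows matched after {max_tries} batches")


toy_rng = random.Random(0)  # toy generator standing in for a fitted synthesizer


def toy_batch(size=10):
    return [{"sex": toy_rng.choice(["MALE", "FEMALE"]),
             "species": toy_rng.choice(["Adelie", "Chinstrap", "Gentoo"])}
            for _ in range(size)]


rows = sample_conditional(toy_batch, lambda row: row["sex"] == "FEMALE", n_rows=4)
print(rows)
```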

DPCTGAN: Differentially Private Conditional Tabular GAN

She now tries DPCTGAN. A first warning lets her know that the random noise generation for this model is not cryptographically secure; if this is not acceptable to her, she can decide to stop using this synthesizer. Then, instead of a response, she gets an error 422 with an explanation.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:48: UserWarning: Warning:dpctgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[24], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="dpctgan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:97, in validate_model_response(client, response, response_model)
     95 if job.status == "failed":
     96     assert job.error is not None, "job {job_uid} failed without error !"
---> 97     raise_error_from_model(job.error)
     99 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'Error fitting model: sample_rate=5.0 is not a valid value. Please provide a float between 0 and 1. Try decreasing batch_size in synth_params (default batch_size=500).')

The default parameters of DPCTGAN do not work for the PENGUIN dataset. Hence, as advised in the error message, she decreases the batch_size (she also checks the documentation here).

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    synth_params = {"batch_size": 50},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 45.106190 16.716415 231.220016 4253.058255 MALE
1 Chinstrap Torgersen 48.932801 17.334574 202.085213 4730.876580 MALE
2 Chinstrap Torgersen 45.390894 15.489699 198.972954 4027.705349 FEMALE
3 Chinstrap Dream 56.003239 16.340220 210.331659 3981.057748 MALE
4 Adelie Torgersen 41.854952 15.144781 215.535502 3810.137480 FEMALE
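The error reported sample_rate=5.0 for the default batch_size=500. That value is consistent with the rate being batch_size divided by the number of training rows, with a dummy dataset of 100 rows (500 / 100 = 5.0); both the formula and the row count are inferred from the message, not documented here. Under that assumption, the default would also fail on the real PENGUIN data (344 rows according to the metadata), whereas batch_size=50 gives a valid rate in both cases. The helpers below are hypothetical.

```python
def sample_rate(batch_size, n_rows):
    """Fraction of training rows per batch, assuming rate = batch_size / n_rows."""
    return batch_size / n_rows


def is_valid_batch_size(batch_size, n_rows):
    """Conservative reading of the error message: the rate must lie strictly between 0 and 1."""
    return 0.0 < sample_rate(batch_size, n_rows) < 1.0


print(sample_rate(500, 100))          # 5.0, matching the error message above
print(is_valid_batch_size(500, 344))  # False: 500 / 344 is about 1.45
print(is_valid_batch_size(50, 344))   # True: 50 / 344 is about 0.145
```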

PATEGAN: Private Aggregation of Teacher Ensembles

Unfortunately, she is not able to train the pategan synthesizer on the PENGUIN dataset. Hence, she must try another one.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="pategan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[26], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="pategan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:97, in validate_model_response(client, response, response_model)
     95 if job.status == "failed":
     96     assert job.error is not None, "job {job_uid} failed without error !"
---> 97     raise_error_from_model(job.error)
     99 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'pategan not reliable with this dataset.')

PATECTGAN: Conditional tabular GAN using Private Aggregation of Teacher Ensembles

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 37.576655 16.970317 206.350563 4852.220871 MALE
1 Chinstrap Biscoe 41.743625 18.780999 206.831843 5129.978105 MALE
2 Chinstrap Biscoe 47.641487 18.473230 227.558169 3462.845579 MALE
3 Gentoo Dream 54.314414 18.642316 225.657928 3326.226145 FEMALE
4 Gentoo Torgersen 46.694295 18.423236 195.639025 5145.398423 FEMALE
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    select_cols = ["island", "bill_length_mm", "body_mass_g"],
    synth_params = {
        "embedding_dim": 256,
        "generator_dim": (128, 128),
        "discriminator_dim": (256, 256),
        "generator_lr": 0.0003,
        "generator_decay": 1e-05,
        "discriminator_lr": 0.0003,
        "discriminator_decay": 1e-05,
        "batch_size": 500
    },
    nb_samples = 100,
    dummy=True,
)
res_dummy.result.df_samples.head()
island bill_length_mm body_mass_g
0 Torgersen 62.282526 3478.341073
1 Biscoe 59.720804 2531.271100
2 Biscoe 46.183680 5444.812819
3 Torgersen 54.461237 2595.776290
4 Dream 41.082272 4234.085873

DPGAN: Differentially Private GAN

For DPGAN, she gets the same warning as for DPCTGAN: the random noise generation is not cryptographically secure.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:48: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Dream 59.408093 21.774501 182.858433 3574.388221 FEMALE
1 Gentoo Biscoe 45.653737 18.811784 197.755754 3584.595516 FEMALE
2 Gentoo Dream 46.935709 22.695824 184.080292 4085.711025 FEMALE
3 Gentoo Dream 47.613375 20.382118 192.039980 3633.892506 FEMALE
4 Gentoo Torgersen 47.486346 21.495789 244.985238 3500.759944 FEMALE
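The warning is about where the randomness comes from: a seeded pseudo-random generator (the warning names vanilla PyTorch; Python's random module behaves the same way) is reproducible and therefore predictable, while an operating-system CSPRNG is not. The sketch contrasts the two using only Python's standard library; laplace_noise is a hypothetical helper and is not how Smartnoise-Synth draws its noise. Swapping the generator alone does not make floating-point noise sampling fully safe, which is why some dedicated DP libraries use specialised samplers.

```python
import random
import secrets


def laplace_noise(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


prng = random.Random(42)         # Mersenne Twister: seedable, reproducible, predictable
csprng = secrets.SystemRandom()  # backed by os.urandom, cannot be seeded

print(laplace_noise(1.0, prng))    # identical on every run with seed 42
print(laplace_noise(1.0, csprng))  # differs from run to run
```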

Finally, she samples with a condition:

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:48: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 64.592287 17.889545 196.430311 5547.378704 FEMALE
1 Gentoo Torgersen 56.610777 17.608110 198.295114 5344.676420 MALE
2 Adelie Biscoe 47.453223 17.926415 246.210375 6746.744037 MALE
3 Chinstrap Biscoe 58.206975 17.540024 191.467018 6017.837495 MALE
4 Chinstrap Biscoe 47.606777 21.512008 188.292421 6610.772133 MALE

And now on the real dataset:

[ ]:
res = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    nb_samples = 10,
    dummy=False,
)
res.result.df_samples
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:48: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Biscoe 65.000000 17.650500 250.000000 5846.368641 FEMALE
1 Gentoo Biscoe 46.755033 17.386022 243.762323 6292.863309 FEMALE
2 Adelie Biscoe 65.000000 19.964333 234.881747 6435.244948 MALE
3 Gentoo Biscoe 65.000000 16.515368 229.168162 5154.040873 FEMALE
4 Chinstrap Biscoe 65.000000 17.283090 250.000000 6809.538275 MALE
5 Gentoo Biscoe 61.373030 17.146575 229.227242 6436.501563 FEMALE
6 Gentoo Torgersen 49.680814 19.886045 218.070625 6159.562886 MALE
7 Adelie Torgersen 52.848585 17.673031 203.913779 7000.000000 MALE
8 Gentoo Biscoe 46.311444 23.000000 241.793999 5256.193101 FEMALE
9 Gentoo Biscoe 55.132013 17.231155 233.941543 6587.419331 MALE

Step 5: See the archive of queries

She now wants to review all the queries she ran on the real data. This is possible because an archive of all queries is kept in a secure database. With a single function call, she can see her queries, the budget they consumed and the associated responses.

[ ]:
previous_queries = client.get_previous_queries()

Let’s check the last query

[ ]:
last_query = previous_queries[-1]
last_query
{'user_name': 'Dr.Antartica',
 'dataset_name': 'PENGUIN',
 'dp_library': 'smartnoise_synth',
 'client_input': {'dataset_name': 'PENGUIN',
  'synth_name': 'dpgan',
  'epsilon': 1.0,
  'delta': None,
  'select_cols': [],
  'synth_params': {},
  'nullable': True,
  'constraints': '',
  'return_model': False,
  'condition': 'body_mass_g > 5000',
  'nb_samples': 10},
 'response': {'epsilon': 1.0,
  'delta': 0.00015673368198174188,
  'requested_by': 'Dr.Antartica',
  'result':                       res_type  \
  index         sn_synth_samples
  columns       sn_synth_samples
  data          sn_synth_samples
  index_names   sn_synth_samples
  column_names  sn_synth_samples

                                                       df_samples
  index                            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  columns       [species, island, bill_length_mm, bill_depth_m...
  data          [[Gentoo, Biscoe, 65.0, 17.650499559938908, 25...
  index_names                                              [None]
  column_names                                             [None]  },
 'timestamp': 1747224709.1297455}
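Because each archived entry records the epsilon and delta of its response, the budget spent per dataset can be tallied from previous_queries. The sketch assumes simple additive (sequential) composition and that the response values are what was charged; spent_budget is a hypothetical helper, and the excerpt copies only the relevant fields of the last entry shown above.

```python
from collections import defaultdict


def spent_budget(queries):
    """Sum the epsilon and delta recorded in each archived response, per dataset."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for query in queries:
        response = query["response"]
        totals[query["dataset_name"]][0] += response["epsilon"]
        totals[query["dataset_name"]][1] += response["delta"] or 0.0  # treat a missing delta as 0
    return {name: (eps, delta) for name, (eps, delta) in totals.items()}


archive_excerpt = [
    {"dataset_name": "PENGUIN",
     "response": {"epsilon": 1.0, "delta": 0.00015673368198174188}},
]
print(spent_budget(archive_excerpt))
# In the notebook: spent_budget(previous_queries)
```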