Lomas Client Side: Using Smartnoise-Synth

This notebook showcases how a researcher can use the Secure Data Disclosure system. It explains the different functionalities provided by the lomas-client library to interact with the secure server.

The secure data are never visible to researchers: they can only access differentially private responses via queries to the server.

Each user has access to one or multiple projects and, for each dataset, a limited budget expressed as \(\epsilon\) and \(\delta\) values.
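Conceptually, the server keeps a per-user, per-dataset budget and debits it with every query on the real data. A minimal pure-Python sketch of this accounting (illustrative only, not the lomas implementation):

```python
class PrivacyBudget:
    """Illustrative (epsilon, delta) budget tracker -- not the lomas implementation."""

    def __init__(self, epsilon, delta):
        self.remaining_epsilon = epsilon
        self.remaining_delta = delta

    def charge(self, epsilon, delta):
        # Refuse any query that would exceed either remaining value.
        if epsilon > self.remaining_epsilon or delta > self.remaining_delta:
            raise ValueError("query would exceed the remaining budget")
        self.remaining_epsilon -= epsilon
        self.remaining_delta -= delta

budget = PrivacyBudget(epsilon=10.0, delta=0.005)
budget.charge(1.0, 0.0001)        # a query costing (1.0, 0.0001)
print(budget.remaining_epsilon)   # 9.0
```

Once the budget is exhausted, no further queries on the real data are accepted.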

Step 1: Install the library

To interact with the secure server on which the data is stored, Dr.Antartica first needs to install the lomas-client library in her local development environment.

It can be installed via the pip command:

[ ]:
# !pip install lomas_client

Or using a local version of the client

[ ]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..')))
[ ]:
from lomas_client import Client
import numpy as np

Step 2: Initialise the client

Once the library is installed, a Client object must be created. It is responsible for sending requests to the server and processing responses in the local environment, enabling seamless interaction with the server.

The client needs a few parameters to be created. Usually, these would be set in the environment by the system administrator (queen Icebergina) and be transparent to lomas users. In this instance, the following code snippet sets a few of these parameters that are specific to this notebook.

She will only be able to query the real dataset if Queen Icebergina has previously created an account for her in the database, granted her access to the PENGUIN dataset, and given her some epsilon and delta credit.

[ ]:
# The following would usually be set in the environment by a system administrator
# and be transparent to lomas users. We reset these here because they are specific to this notebook.

# Note that all client settings can also be passed as keyword arguments to the Client constructor.
# e.g. client = Client(client_id = "Dr.Antartica") takes precedence over setting the "LOMAS_CLIENT_CLIENT_ID"
# environment variable.

import os

USER_NAME = "Dr.Antartica"
os.environ["LOMAS_CLIENT_CLIENT_ID"] = USER_NAME
os.environ["LOMAS_CLIENT_CLIENT_SECRET"] = USER_NAME.lower()
os.environ["LOMAS_CLIENT_DATASET_NAME"] = "PENGUIN"
[ ]:
client = Client()

And that’s it for the preparation. She is now ready to use the various functionalities offered by lomas-client.

Step 3: Metadata and dummy dataset

Getting dataset metadata

Dr. Antartica has never seen the data. As a first step, to understand what is available to her, she would like to check the metadata of the dataset. For this, she just needs to call the get_dataset_metadata() function of the client. As this is public information, it does not cost any budget.

This function returns metadata in a format based on the SmartnoiseSQL dictionary format. Among other things, it describes all the available columns: their type, their bounds, etc. (see the Smartnoise page for more details). Any metadata required for Smartnoise-SQL is also required here, and additional information, such as the different categories of a string-typed column, can be added.

[ ]:
penguin_metadata = client.get_dataset_metadata()
penguin_metadata
{'max_ids': 1,
 'rows': 344,
 'row_privacy': True,
 'censor_dims': False,
 'columns': {'species': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
  'island': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Torgersen', 'Biscoe', 'Dream']},
  'bill_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 30.0,
   'upper': 65.0},
  'bill_depth_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 13.0,
   'upper': 23.0},
  'flipper_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 150.0,
   'upper': 250.0},
  'body_mass_g': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 2000.0,
   'upper': 7000.0},
  'sex': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 2,
   'categories': ['MALE', 'FEMALE']}}}
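The returned dictionary can be inspected programmatically, for instance to separate categorical from continuous columns. A small sketch over a hard-coded excerpt of the metadata above:

```python
# Excerpt of the metadata shown above (hard-coded so the snippet is self-contained).
columns = {
    "species": {"type": "string", "categories": ["Adelie", "Chinstrap", "Gentoo"]},
    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
    "body_mass_g": {"type": "float", "lower": 2000.0, "upper": 7000.0},
}

# Categorical columns carry a list of categories, continuous ones carry bounds.
categorical = [c for c, info in columns.items() if info["type"] == "string"]
bounds = {c: (info["lower"], info["upper"])
          for c, info in columns.items() if info["type"] == "float"}

print(categorical)  # ['species']
print(bounds)       # {'bill_length_mm': (30.0, 65.0), 'body_mass_g': (2000.0, 7000.0)}
```

These bounds will be reused later when building custom constraints.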

Step 4: Create a Synthetic Dataset keeping all default parameters

We want a synthetic model that represents the private data.

For this, we use one of the Smartnoise-Synth synthesizers.

Let’s list the available options. Their respective parameters are described in the Smartnoise-Synth documentation here.

[ ]:
from snsynth import Synthesizer
Synthesizer.list_synthesizers()
['mwem', 'dpctgan', 'patectgan', 'mst', 'pacsynth', 'dpgan', 'pategan', 'aim']

AIM: Adaptive Iterative Mechanism

We start by executing a query on the dummy dataset without specifying any special parameters for AIM (all optional parameters kept at their defaults). AIM only works on categorical columns, so we select the “species” and “island” columns to create a synthetic dataset of these two columns.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
)
res_dummy.result.df_samples
species island
0 Adelie Dream
1 Chinstrap Torgersen
2 Chinstrap Biscoe
3 Chinstrap Biscoe
4 Torgersen
... ... ...
195 Adelie Torgersen
196 Chinstrap Biscoe
197 Adelie Torgersen
198 Chinstrap Biscoe
199 Gentoo Dream

200 rows × 2 columns

The algorithm works and returns a synthetic dataset. We now estimate the cost of running this command:

[ ]:
res_cost = client.smartnoise_synth.cost(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
)
res_cost
CostResponse(epsilon=1.0, delta=0.0001)

Executing such a query on the private dataset would cost 1.0 epsilon and 0.0001 delta. Dr. Antartica decides to do it, now with the flag dummy set to False and specifying that she wants the aim synthesizer model in return (with return_model = True).

NOTE: if she does not set the parameter return_model = True, then it is False by default and she will get a synthetic dataframe as response directly.

[ ]:
res = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=False,
    return_model = True
)
res.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x7d2cb1dcbd10>

She can now get the model and sample from it. She chooses to draw 10 samples.

[ ]:
synth = res.result.model
synth.sample(10)
species island
0 Gentoo Biscoe
1 Chinstrap Torgersen
2 Gentoo Torgersen
3 Chinstrap Torgersen
4 Gentoo Biscoe
5 Adelie Dream
6 Adelie Biscoe
7 Chinstrap Biscoe
8 Chinstrap Dream
9 Gentoo Torgersen

She now wants to pass some specific parameters to the AIM model. For this, she needs to set some parameters in synth_params based on the Smartnoise-Synth documentation here. She decides to change max_model_size to 50 (the default is 80) and tries it on the dummy.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
    return_model = True,
    synth_params = {"max_model_size": 50}
)
res_dummy.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x7d2ca18b9100>
[ ]:
synth = res_dummy.result.model
synth.sample(5)
species island
0 Gentoo Biscoe
1 Gentoo Dream
2 Chinstrap Biscoe
3 Chinstrap Torgersen
4 Adelie Dream

Now that the workflow is understood for AIM, she wants to experiment with the various synthesizers on the dummy.

MWEM: Multiplicative Weights Exponential Mechanism

She tries MWEM on all columns with all default parameters. As return_model is not specified, she will directly receive a synthetic dataframe back.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Dream 49.25 18.5 185.0 2750.0 FEMALE
1 Gentoo Dream 45.75 18.5 185.0 2750.0 FEMALE
2 Adelie Biscoe 49.25 13.5 155.0 2250.0 MALE
3 Gentoo Dream 49.25 18.5 185.0 2750.0 FEMALE
4 Adelie Biscoe 49.25 13.5 155.0 2250.0 MALE

She now specifies 3 columns and some parameters explained here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["species", "island", "sex"],
    synth_params = {"measure_only": False, "max_retries_exp_mechanism": 5},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island sex
0 Gentoo Biscoe MALE
1 Chinstrap Dream MALE
2 Chinstrap Dream MALE
3 Chinstrap Dream MALE
4 Gentoo Torgersen FEMALE

Finally, for MWEM, she wants to go more in depth and create her own data preparation pipeline. For this, she can use the Smartnoise-Synth “Data Transformers” explained here and send her own constraints dictionary for specific steps. This is aimed at more advanced users.

By default, if no constraints are specified, the server automatically creates a data transformer based on the selected columns, the synthesizer and the metadata.
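The server-side default can be pictured as picking a transformer per column from the metadata. The following is an illustrative sketch only, not the actual lomas logic (which also depends on the synthesizer):

```python
def pick_transform(col_info):
    """Illustrative only: map a metadata entry to a transformer description.
    The real server logic lives in lomas and is synthesizer-dependent."""
    if col_info["type"] == "string":
        # Categorical column: encode the known categories as labels.
        return ("label", col_info["categories"])
    # Continuous column: clamp to the metadata bounds, then discretize.
    return ("clamp+bin", col_info["lower"], col_info["upper"])

meta = {
    "species": {"type": "string", "categories": ["Adelie", "Chinstrap", "Gentoo"]},
    "bill_depth_mm": {"type": "float", "lower": 13.0, "upper": 23.0},
}
plan = {c: pick_transform(info) for c, info in meta.items()}
print(plan["bill_depth_mm"])  # ('clamp+bin', 13.0, 23.0)
```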

Here she wants to add a clamping transformation on the continuous columns before training the synthesizer. She adds the bounds based on the metadata.

[ ]:
bl_bounds = penguin_metadata["columns"]["bill_length_mm"]
bd_bounds = penguin_metadata["columns"]["bill_depth_mm"]
bl_bounds, bd_bounds
({'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 30.0,
  'upper': 65.0},
 {'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 13.0,
  'upper': 23.0})
[ ]:
from snsynth.transform import BinTransformer, ClampTransformer, ChainTransformer, LabelTransformer

my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    ),
    "bill_depth_mm": ChainTransformer(
        [
            ClampTransformer(lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
            BinTransformer(bins=20, lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
        ]
    ),
    "species": LabelTransformer(nullable=True)
}
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 46.375 17.55 Gentoo
1 50.875 15.75 Chinstrap
2 41.875 15.15 Chinstrap
3 41.875 15.15 Adelie
4 46.375 17.55 Gentoo
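Note that the continuous values in the sampled dataframe are bin centres. A pure-Python sketch of what the clamp-then-bin chain does to a raw bill_length_mm value, mirroring the bounds used above (this is conceptual, not the snsynth internals):

```python
def clamp_then_bin(value, lower=40.0, upper=55.0, bins=20):
    """Clamp value into [lower, upper], then snap it to the centre of its bin."""
    clamped = min(max(value, lower), upper)
    width = (upper - lower) / bins                          # 0.75 here
    index = min(int((clamped - lower) / width), bins - 1)   # keep upper edge in last bin
    return lower + (index + 0.5) * width

print(clamp_then_bin(46.2))   # 46.375 (matches the bin centres in the samples above)
print(clamp_then_bin(120.0))  # 54.625 (clamped to the upper bound first)
```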

Alternatively, a subset of constraints can be specified for certain columns, and the server will automatically generate those for the missing columns.

[ ]:
my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    )
}

In this case, only bill_length_mm will be clamped.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 49.375 15.5 Gentoo
1 45.625 22.5 Adelie
2 46.375 20.5 Adelie
3 53.125 17.5 Chinstrap
4 54.625 14.5 Adelie

MST: Maximum Spanning Tree

She now experiments with MST. As this synthesizer is computationally demanding, she selects a subset of columns for it. See MST here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    dummy=True,
)
res_dummy.result.df_samples.head()
species sex
0 Gentoo FEMALE
1 MALE
2 Chinstrap MALE
3 Adelie FEMALE
4

She can also specify the number of samples to get (if return_model is not True):

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    nb_samples = 4,
    dummy=True,
)
res_dummy.result.df_samples
species sex
0 Chinstrap
1 Gentoo
2 MALE
3 FEMALE

She can also put a condition on these samples. For instance, here she only wants female samples.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["sex", "species"],
    nb_samples = 4,
    condition = "sex = FEMALE",
    dummy=True,
)
res_dummy.result.df_samples
sex species
0 Gentoo
1 Chinstrap
2 Gentoo
3 Gentoo
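A condition of the form column = value restricts the synthetic output; conceptually it is equivalent to filtering the sampled rows (a pure-Python sketch with hypothetical rows):

```python
# Hypothetical sampled rows, shaped like the synthetic output above.
rows = [
    {"sex": "FEMALE", "species": "Gentoo"},
    {"sex": "MALE", "species": "Adelie"},
    {"sex": "FEMALE", "species": "Chinstrap"},
]

# Keep only the rows matching the condition 'sex = FEMALE'.
females = [r for r in rows if r["sex"] == "FEMALE"]
print(len(females))  # 2
```

As the filtering happens on already-synthetic data, it costs no additional budget.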

DPCTGAN: Differentially Private Conditional Tabular GAN

She now tries DPCTGAN. A first warning lets her know that the random noise generation for this model is not cryptographically secure; if that is not acceptable to her, she can decide to stop using this synthesizer. Then she does not get a response but a 422 error with an explanation.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpctgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[25], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="dpctgan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:93, in validate_model_response(client, response, response_model)
     91 if job.status == "failed":
     92     assert job.error is not None, "job {job_uid} failed without error !"
---> 93     raise_error_from_model(job.error)
     95 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'Error fitting model: sample_rate=5.0 is not a valid value. Please provide a float between 0 and 1. Try decreasing batch_size in synth_params (default batch_size=500).')

The default parameters of DPCTGAN do not work for the PENGUIN dataset. Hence, as advised in the error message, she decreases the batch_size (she also checks the documentation here).
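The rejected value comes from the DP-SGD style sampling rate, which is roughly the batch size divided by the number of training rows and must lie in (0, 1]. The reported 5.0 together with batch_size=500 implies 100 rows of training data; the arithmetic is simply:

```python
def sample_rate(batch_size, n_rows):
    """Fraction of the data drawn per batch; must be in (0, 1] to be valid."""
    return batch_size / n_rows

n_rows = 100  # implied by the error message: 500 / 100 = 5.0
print(sample_rate(500, n_rows))  # 5.0 -> rejected by the library
print(sample_rate(50, n_rows))   # 0.5 -> accepted
```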

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    synth_params = {"batch_size": 50},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 45.833347 16.692103 194.082665 3149.535030 FEMALE
1 Chinstrap Biscoe 53.732724 18.273553 177.004233 5117.040396 FEMALE
2 Adelie Torgersen 49.115819 16.810560 219.699721 5106.081523 FEMALE
3 Adelie Biscoe 42.522341 16.397532 201.215174 5495.932743 MALE
4 Adelie Torgersen 39.654274 16.744885 228.313026 4522.405903 FEMALE

PATEGAN: Private Aggregation of Teacher Ensembles

Unfortunately, she is not able to train the pategan synthesizer on the PENGUIN dataset. Hence, she must try another one.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="pategan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[27], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="pategan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:93, in validate_model_response(client, response, response_model)
     91 if job.status == "failed":
     92     assert job.error is not None, "job {job_uid} failed without error !"
---> 93     raise_error_from_model(job.error)
     95 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'pategan not reliable with this dataset.')

PATECTGAN: Conditional tabular GAN using Private Aggregation of Teacher Ensembles

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 40.007473 14.863616 177.771713 4367.781503 MALE
1 Chinstrap Biscoe 47.799655 18.101346 182.233909 4781.415079 MALE
2 Chinstrap Biscoe 41.795687 16.121351 193.219110 3124.987453 MALE
3 Gentoo Dream 41.408596 21.911954 180.690348 4655.957984 FEMALE
4 Gentoo Biscoe 41.825240 17.597221 190.128309 2562.520325 FEMALE
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    select_cols = ["island", "bill_length_mm", "body_mass_g"],
    synth_params = {
        "embedding_dim": 256,
        "generator_dim": (128, 128),
        "discriminator_dim": (256, 256),
        "generator_lr": 0.0003,
        "generator_decay": 1e-05,
        "discriminator_lr": 0.0003,
        "discriminator_decay": 1e-05,
        "batch_size": 500
    },
    nb_samples = 100,
    dummy=True,
)
res_dummy.result.df_samples.head()
island bill_length_mm body_mass_g
0 Dream 62.184163 3563.350335
1 Biscoe 58.693441 2519.153178
2 Biscoe 45.244734 5277.579844
3 Torgersen 53.086722 2477.480292
4 Dream 39.586384 4253.510337

DPGAN: Differentially Private GAN

For DPGAN, there is the same warning as for DPCTGAN about the cryptographically secure random noise generation.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Dream 50.926630 23.000000 196.682433 4906.792127 FEMALE
1 Chinstrap Dream 43.686233 22.855870 186.157387 4108.924724 FEMALE
2 Adelie Biscoe 43.874988 22.465074 250.000000 4141.524814 FEMALE
3 Gentoo Dream 49.637254 19.829533 190.552057 3293.796897 FEMALE
4 Gentoo Biscoe 65.000000 23.000000 185.239148 4287.198659 FEMALE

One final time, she samples with a condition:

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 65.000000 17.500878 194.463225 5220.095709 FEMALE
1 Gentoo Torgersen 65.000000 17.846123 236.159381 7000.000000 FEMALE
2 Adelie Biscoe 62.844849 17.839089 195.672168 7000.000000 FEMALE
3 Adelie Dream 62.495059 23.000000 213.272040 7000.000000 MALE
4 Chinstrap Biscoe 65.000000 16.639676 228.477314 7000.000000 MALE

And now on the real dataset

[ ]:
res = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    nb_samples = 10,
    dummy=False,
)
res.result.df_samples
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Torgersen 43.037990 17.304015 242.335990 5220.064640 FEMALE
1 Gentoo Torgersen 54.582117 18.683831 196.131904 5238.780871 FEMALE
2 Gentoo Torgersen 47.120516 22.075984 181.180426 7000.000000
3 Chinstrap Torgersen 65.000000 17.883028 197.028062 5078.756407 MALE
4 Adelie Dream 65.000000 16.657900 231.202775 7000.000000 MALE
5 Adelie Dream 65.000000 17.988916 185.333617 7000.000000 MALE
6 Adelie Torgersen 65.000000 18.632700 250.000000 5856.899589 MALE
7 Adelie Biscoe 44.833169 15.574961 191.562866 6073.620439 FEMALE
8 Adelie Dream 65.000000 17.337221 199.819478 7000.000000
9 Adelie Torgersen 43.839323 21.674445 212.702855 7000.000000

Step 5: See the archives of queries

She now wants to verify all the queries that she did on the real data. This is possible because an archive of all queries is kept in a secure database. With a function call, she can see her queries, the budget spent and the associated responses.

[ ]:
previous_queries = client.get_previous_queries()

Let’s check the last query

[ ]:
last_query = previous_queries[-1]
last_query
{'user_name': 'Dr.Antartica',
 'dataset_name': 'PENGUIN',
 'dp_library': 'smartnoise_synth',
 'client_input': {'dataset_name': 'PENGUIN',
  'synth_name': 'dpgan',
  'epsilon': 1.0,
  'delta': None,
  'select_cols': [],
  'synth_params': {},
  'nullable': True,
  'constraints': '',
  'return_model': False,
  'condition': 'body_mass_g > 5000',
  'nb_samples': 10},
 'response': {'epsilon': 1.0,
  'delta': 0.00015673368198174188,
  'requested_by': 'Dr.Antartica',
  'result':                       res_type  \
  index         sn_synth_samples
  columns       sn_synth_samples
  data          sn_synth_samples
  index_names   sn_synth_samples
  column_names  sn_synth_samples

                                                       df_samples
  index                            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  columns       [species, island, bill_length_mm, bill_depth_m...
  data          [[Gentoo, Torgersen, 43.03798981010914, 17.304...
  index_names                                              [None]
  column_names                                             [None]  },
 'timestamp': 1746086943.2004273}
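The archive entries can also be aggregated locally, for example to tally the total budget spent so far. A sketch over entries shaped like the record above (hard-coded excerpt, hypothetical second entry):

```python
# Entries shaped like the archive record above.
previous_queries = [
    {"response": {"epsilon": 1.0, "delta": 0.00015673368198174188}},
    {"response": {"epsilon": 1.0, "delta": 0.0001}},  # hypothetical earlier query
]

# Sum the actual (epsilon, delta) charged for each archived query.
total_epsilon = sum(q["response"]["epsilon"] for q in previous_queries)
total_delta = sum(q["response"]["delta"] for q in previous_queries)
print(total_epsilon)  # 2.0
```

This gives her a local cross-check of the budget the server has deducted.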