Lomas Client Side: Using Smartnoise-Synth

This notebook showcases how a researcher can use the Secure Data Disclosure system. It explains the different functionalities provided by the lomas-client library to interact with the secure server.

The secure data are never visible to researchers: they can only access differentially private responses via queries to the server.

Each user has access to one or multiple projects and, for each dataset, a limited budget expressed as \(\epsilon\) and \(\delta\) values.
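Conceptually, the server keeps a per-user, per-dataset budget and debits it with every query on the real data. A minimal pure-Python sketch of this accounting (illustrative only, not the lomas implementation):

```python
class PrivacyBudget:
    """Illustrative (epsilon, delta) budget tracker -- not the lomas implementation."""

    def __init__(self, epsilon, delta):
        self.remaining_epsilon = epsilon
        self.remaining_delta = delta

    def charge(self, epsilon, delta):
        # Refuse any query that would exceed either remaining value.
        if epsilon > self.remaining_epsilon or delta > self.remaining_delta:
            raise ValueError("query would exceed the remaining budget")
        self.remaining_epsilon -= epsilon
        self.remaining_delta -= delta

budget = PrivacyBudget(epsilon=10.0, delta=0.005)
budget.charge(1.0, 0.0001)        # a query costing (1.0, 0.0001)
print(budget.remaining_epsilon)   # 9.0
```

Once the budget is exhausted, no further queries on the real data are accepted.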

Step 1: Install the library

To interact with the secure server on which the data is stored, Dr.Antartica first needs to install the lomas-client library in her local development environment.

It can be installed via the pip command:

[ ]:
# !pip install lomas_client

Or using a local version of the client

[ ]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..')))
[ ]:
from lomas_client import Client
import numpy as np

Step 2: Initialise the client

Once the library is installed, a Client object must be created. It is responsible for sending requests to the server and processing responses in the local environment, enabling seamless interaction with the server.

The client needs a few parameters to be created. Usually, these would be set in the environment by the system administrator (queen Icebergina) and be transparent to lomas users. In this instance, the following code snippet sets a few of these parameters that are specific to this notebook.

She will only be able to query the real dataset if Queen Icebergina has previously created an account for her in the database, granted her access to the PENGUIN dataset, and given her some epsilon and delta credit.

[ ]:
# The following would usually be set in the environment by a system administrator
# and be transparent to lomas users. We reset these here because they are specific to this notebook.

# Note that all client settings can also be passed as keyword arguments to the Client constructor.
# e.g. client = Client(client_id = "Dr.Antartica") takes precedence over setting the "LOMAS_CLIENT_CLIENT_ID"
# environment variable.

import os

USER_NAME = "Dr.Antartica"
os.environ["LOMAS_CLIENT_CLIENT_ID"] = USER_NAME
os.environ["LOMAS_CLIENT_CLIENT_SECRET"] = USER_NAME.lower()
os.environ["LOMAS_CLIENT_DATASET_NAME"] = "PENGUIN"
[ ]:
client = Client()

And that’s it for the preparation. She is now ready to use the various functionalities offered by lomas-client.

Step 3: Metadata and dummy dataset

Getting dataset metadata

Dr. Antartica has never seen the data. As a first step, to understand what is available to her, she would like to check the metadata of the dataset. For this, she just needs to call the get_dataset_metadata() function of the client. As this is public information, it does not cost any budget.

This function returns metadata in a format based on the SmartnoiseSQL dictionary format. Among other things, it describes all the available columns: their type, their bounds, etc. (see the Smartnoise page for more details). Any metadata required for Smartnoise-SQL is also required here, and additional information, such as the different categories of a string-typed column, can be added.

[ ]:
penguin_metadata = client.get_dataset_metadata()
penguin_metadata
{'max_ids': 1,
 'rows': 344,
 'row_privacy': True,
 'censor_dims': False,
 'columns': {'species': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Adelie', 'Chinstrap', 'Gentoo']},
  'island': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 3,
   'categories': ['Torgersen', 'Biscoe', 'Dream']},
  'bill_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 30.0,
   'upper': 65.0},
  'bill_depth_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 13.0,
   'upper': 23.0},
  'flipper_length_mm': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 150.0,
   'upper': 250.0},
  'body_mass_g': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'float',
   'precision': 64,
   'lower': 2000.0,
   'upper': 7000.0},
  'sex': {'private_id': False,
   'nullable': False,
   'max_partition_length': None,
   'max_influenced_partitions': None,
   'max_partition_contributions': None,
   'type': 'string',
   'cardinality': 2,
   'categories': ['MALE', 'FEMALE']}}}
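The returned dictionary can be inspected programmatically, for instance to separate categorical from continuous columns. A small sketch over a hard-coded excerpt of the metadata above:

```python
# Excerpt of the metadata shown above (hard-coded so the snippet is self-contained).
columns = {
    "species": {"type": "string", "categories": ["Adelie", "Chinstrap", "Gentoo"]},
    "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
    "body_mass_g": {"type": "float", "lower": 2000.0, "upper": 7000.0},
}

# Categorical columns carry a list of categories, continuous ones carry bounds.
categorical = [c for c, info in columns.items() if info["type"] == "string"]
bounds = {c: (info["lower"], info["upper"])
          for c, info in columns.items() if info["type"] == "float"}

print(categorical)  # ['species']
print(bounds)       # {'bill_length_mm': (30.0, 65.0), 'body_mass_g': (2000.0, 7000.0)}
```

These bounds will be reused later when building custom constraints.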

Step 4: Create a Synthetic Dataset keeping all default parameters

We want a synthetic model that represents the private data.

For this, we use one of the Smartnoise-Synth synthesizers.

Let’s list the available options. Their respective parameters are described in the Smartnoise-Synth documentation here.

[ ]:
from snsynth import Synthesizer
Synthesizer.list_synthesizers()
['mwem', 'dpctgan', 'patectgan', 'mst', 'pacsynth', 'dpgan', 'pategan', 'aim']

AIM: Adaptive Iterative Mechanism

We start by executing a query on the dummy dataset without specifying any special parameters for AIM (all optional parameters kept at their defaults). AIM only works on categorical columns, so we select the “species” and “island” columns to create a synthetic dataset of these two columns.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
)
res_dummy.result.df_samples
species island
0 Adelie Dream
1 Chinstrap Torgersen
2 Chinstrap Biscoe
3 Chinstrap Biscoe
4 Torgersen
... ... ...
195 Adelie Torgersen
196 Chinstrap Biscoe
197 Adelie Torgersen
198 Chinstrap Biscoe
199 Gentoo Dream

200 rows × 2 columns

The algorithm works and returns a synthetic dataset. We now estimate the cost of running this command:

[ ]:
res_cost = client.smartnoise_synth.cost(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
)
res_cost
CostResponse(epsilon=1.0, delta=0.0001)

Executing such a query on the private dataset would cost 1.0 epsilon and 0.0001 delta. Dr. Antartica decides to do it, now with the flag dummy set to False and specifying that she wants the aim synthesizer model in return (with return_model = True).

NOTE: if she does not set the parameter return_model = True, then it is False by default and she will get a synthetic dataframe as response directly.

[ ]:
res = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=False,
    return_model = True
)
res.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x7d2cb1dcbd10>

She can now get the model and sample from it. She chooses to draw 10 samples.

[ ]:
synth = res.result.model
synth.sample(10)
species island
0 Gentoo Biscoe
1 Chinstrap Torgersen
2 Gentoo Torgersen
3 Chinstrap Torgersen
4 Gentoo Biscoe
5 Adelie Dream
6 Adelie Biscoe
7 Chinstrap Biscoe
8 Chinstrap Dream
9 Gentoo Torgersen

She now wants to pass some specific parameters to the AIM model. For this, she needs to set some parameters in synth_params based on the Smartnoise-Synth documentation here. She decides to change max_model_size to 50 (the default is 80) and tries it on the dummy.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="aim",
    epsilon=1.0,
    delta=0.0001,
    select_cols = ["species", "island"],
    dummy=True,
    return_model = True,
    synth_params = {"max_model_size": 50}
)
res_dummy.result.model
<snsynth.aim.aim.AIMSynthesizer at 0x7d2ca18b9100>
[ ]:
synth = res_dummy.result.model
synth.sample(5)
species island
0 Gentoo Biscoe
1 Gentoo Dream
2 Chinstrap Biscoe
3 Chinstrap Torgersen
4 Adelie Dream

Now that the workflow is understood for AIM, she wants to experiment with the various synthesizers on the dummy.

MWEM: Multiplicative Weights Exponential Mechanism

She tries MWEM on all columns with all default parameters. As return_model is not specified, she will directly receive a synthetic dataframe back.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Dream 49.25 18.5 185.0 2750.0 FEMALE
1 Gentoo Dream 45.75 18.5 185.0 2750.0 FEMALE
2 Adelie Biscoe 49.25 13.5 155.0 2250.0 MALE
3 Gentoo Dream 49.25 18.5 185.0 2750.0 FEMALE
4 Adelie Biscoe 49.25 13.5 155.0 2250.0 MALE

She now specifies 3 columns and some parameters explained here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["species", "island", "sex"],
    synth_params = {"measure_only": False, "max_retries_exp_mechanism": 5},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island sex
0 Gentoo Biscoe MALE
1 Chinstrap Dream MALE
2 Chinstrap Dream MALE
3 Chinstrap Dream MALE
4 Gentoo Torgersen FEMALE

Finally, for MWEM, she wants to go more in depth and create her own data preparation pipeline. For this, she can use the Smartnoise-Synth “Data Transformers” explained here and send her own constraints dictionary for specific steps. This is aimed at more advanced users.

By default, if no constraints are specified, the server automatically creates a data transformer based on the selected columns, the synthesizer and the metadata.
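The server-side default can be pictured as picking a transformer per column from the metadata. The following is an illustrative sketch only, not the actual lomas logic (which also depends on the synthesizer):

```python
def pick_transform(col_info):
    """Illustrative only: map a metadata entry to a transformer description.
    The real server logic lives in lomas and is synthesizer-dependent."""
    if col_info["type"] == "string":
        # Categorical column: encode the known categories as labels.
        return ("label", col_info["categories"])
    # Continuous column: clamp to the metadata bounds, then discretize.
    return ("clamp+bin", col_info["lower"], col_info["upper"])

meta = {
    "species": {"type": "string", "categories": ["Adelie", "Chinstrap", "Gentoo"]},
    "bill_depth_mm": {"type": "float", "lower": 13.0, "upper": 23.0},
}
plan = {c: pick_transform(info) for c, info in meta.items()}
print(plan["bill_depth_mm"])  # ('clamp+bin', 13.0, 23.0)
```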

Here she wants to add a clamping transformation on the continuous columns before training the synthesizer. She adds the bounds based on the metadata.

[ ]:
bl_bounds = penguin_metadata["columns"]["bill_length_mm"]
bd_bounds = penguin_metadata["columns"]["bill_depth_mm"]
bl_bounds, bd_bounds
({'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 30.0,
  'upper': 65.0},
 {'private_id': False,
  'nullable': False,
  'max_partition_length': None,
  'max_influenced_partitions': None,
  'max_partition_contributions': None,
  'type': 'float',
  'precision': 64,
  'lower': 13.0,
  'upper': 23.0})
[ ]:
from snsynth.transform import BinTransformer, ClampTransformer, ChainTransformer, LabelTransformer

my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    ),
    "bill_depth_mm": ChainTransformer(
        [
            ClampTransformer(lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
            BinTransformer(bins=20, lower = bd_bounds["lower"] + 2, upper = bd_bounds["upper"] - 2),
        ]
    ),
    "species": LabelTransformer(nullable=True)
}
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 46.375 17.55 Gentoo
1 50.875 15.75 Chinstrap
2 41.875 15.15 Chinstrap
3 41.875 15.15 Adelie
4 46.375 17.55 Gentoo
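Note that the continuous values in the sampled dataframe are bin centres. A pure-Python sketch of what the clamp-then-bin chain does to a raw bill_length_mm value, mirroring the bounds used above (this is conceptual, not the snsynth internals):

```python
def clamp_then_bin(value, lower=40.0, upper=55.0, bins=20):
    """Clamp value into [lower, upper], then snap it to the centre of its bin."""
    clamped = min(max(value, lower), upper)
    width = (upper - lower) / bins                          # 0.75 here
    index = min(int((clamped - lower) / width), bins - 1)   # keep upper edge in last bin
    return lower + (index + 0.5) * width

print(clamp_then_bin(46.2))   # 46.375 (matches the bin centres in the samples above)
print(clamp_then_bin(120.0))  # 54.625 (clamped to the upper bound first)
```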

Alternatively, a subset of constraints can be specified for certain columns, and the server will automatically generate those for the missing columns.

[ ]:
my_own_constraints = {
    "bill_length_mm": ChainTransformer(
        [
            ClampTransformer(lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
            BinTransformer(bins = 20, lower = bl_bounds["lower"] + 10, upper = bl_bounds["upper"] - 10),
        ]
    )
}

In this case, only bill_length_mm will be clamped.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mwem",
    epsilon=1.0,
    select_cols = ["bill_length_mm", "bill_depth_mm", "species"],
    constraints = my_own_constraints,
    dummy=True,
)
res_dummy.result.df_samples.head()
bill_length_mm bill_depth_mm species
0 49.375 15.5 Gentoo
1 45.625 22.5 Adelie
2 46.375 20.5 Adelie
3 53.125 17.5 Chinstrap
4 54.625 14.5 Adelie

MST: Maximum Spanning Tree

She now experiments with MST. As this synthesizer is computationally demanding, she selects a subset of columns for it. See MST here.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    dummy=True,
)
res_dummy.result.df_samples.head()
species sex
0 Gentoo FEMALE
1 MALE
2 Chinstrap MALE
3 Adelie FEMALE
4

She can also specify the number of samples to get (if return_model is not True):

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["species", "sex"],
    nb_samples = 4,
    dummy=True,
)
res_dummy.result.df_samples
species sex
0 Chinstrap
1 Gentoo
2 MALE
3 FEMALE

She can also put a condition on these samples. For instance, here she only wants female samples.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="mst",
    epsilon=1.0,
    select_cols = ["sex", "species"],
    nb_samples = 4,
    condition = "sex = FEMALE",
    dummy=True,
)
res_dummy.result.df_samples
sex species
0 Gentoo
1 Chinstrap
2 Gentoo
3 Gentoo
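A condition of the form column = value restricts the synthetic output; conceptually it is equivalent to filtering the sampled rows (a pure-Python sketch with hypothetical rows):

```python
# Hypothetical sampled rows, shaped like the synthetic output above.
rows = [
    {"sex": "FEMALE", "species": "Gentoo"},
    {"sex": "MALE", "species": "Adelie"},
    {"sex": "FEMALE", "species": "Chinstrap"},
]

# Keep only the rows matching the condition 'sex = FEMALE'.
females = [r for r in rows if r["sex"] == "FEMALE"]
print(len(females))  # 2
```

As the filtering happens on already-synthetic data, it costs no additional budget.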

DPCTGAN: Differentially Private Conditional Tabular GAN

She now tries DPCTGAN. A first warning lets her know that the random noise generation for this model is not cryptographically secure; if that is not acceptable to her, she can decide to stop using this synthesizer. Then she does not get a response but a 422 error with an explanation.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpctgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[25], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="dpctgan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:93, in validate_model_response(client, response, response_model)
     91 if job.status == "failed":
     92     assert job.error is not None, "job {job_uid} failed without error !"
---> 93     raise_error_from_model(job.error)
     95 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'Error fitting model: sample_rate=5.0 is not a valid value. Please provide a float between 0 and 1. Try decreasing batch_size in synth_params (default batch_size=500).')

The default parameters of DPCTGAN do not work for the PENGUIN dataset. Hence, as advised in the error message, she decreases the batch_size (she also checks the documentation here).
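The rejected value comes from the DP-SGD style sampling rate, which is roughly the batch size divided by the number of training rows and must lie in (0, 1]. The reported 5.0 together with batch_size=500 implies 100 rows of training data; the arithmetic is simply:

```python
def sample_rate(batch_size, n_rows):
    """Fraction of the data drawn per batch; must be in (0, 1] to be valid."""
    return batch_size / n_rows

n_rows = 100  # implied by the error message: 500 / 100 = 5.0
print(sample_rate(500, n_rows))  # 5.0 -> rejected by the library
print(sample_rate(50, n_rows))   # 0.5 -> accepted
```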

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpctgan",
    epsilon=1.0,
    synth_params = {"batch_size": 50},
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 45.833347 16.692103 194.082665 3149.535030 FEMALE
1 Chinstrap Biscoe 53.732724 18.273553 177.004233 5117.040396 FEMALE
2 Adelie Torgersen 49.115819 16.810560 219.699721 5106.081523 FEMALE
3 Adelie Biscoe 42.522341 16.397532 201.215174 5495.932743 MALE
4 Adelie Torgersen 39.654274 16.744885 228.313026 4522.405903 FEMALE

PATEGAN: Private Aggregation of Teacher Ensembles

Unfortunately, she is not able to train the pategan synthesizer on the PENGUIN dataset. Hence, she must try another one.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="pategan",
    epsilon=1.0,
    dummy=True,
)
res_dummy
---------------------------------------------------------------------------
ExternalLibraryException                  Traceback (most recent call last)
Cell In[27], line 1
----> 1 res_dummy = client.smartnoise_synth.query(
      2     synth_name="pategan",
      3     epsilon=1.0,
      4     dummy=True,
      5 )
      6 res_dummy

File ~/work/sdd-poc-server/client/lomas_client/libraries/smartnoise_synth.py:195, in SmartnoiseSynthClient.query(self, synth_name, epsilon, delta, select_cols, synth_params, nullable, constraints, dummy, return_model, condition, nb_samples, nb_rows, seed)
    192 body = request_model.model_validate(body_dict)
    193 res = self.http_client.post(endpoint, body, SMARTNOISE_SYNTH_READ_TIMEOUT)
--> 195 return validate_model_response(self.http_client, res, QueryResponse)

File ~/work/sdd-poc-server/client/lomas_client/utils.py:93, in validate_model_response(client, response, response_model)
     91 if job.status == "failed":
     92     assert job.error is not None, "job {job_uid} failed without error !"
---> 93     raise_error_from_model(job.error)
     95 return response_model.model_validate(job.result)

File ~/work/sdd-poc-server/core/lomas_core/error_handler.py:150, in raise_error_from_model(error_model)
    148     raise InvalidQueryException(error_model.message)
    149 case ExternalLibraryExceptionModel():
--> 150     raise ExternalLibraryException(error_model.library, error_model.message)
    151 case UnauthorizedAccessExceptionModel():
    152     raise UnauthorizedAccessException(error_model.message)

ExternalLibraryException: (<DPLibraries.SMARTNOISE_SYNTH: 'smartnoise_synth'>, 'pategan not reliable with this dataset.')

PATECTGAN: Conditional tabular GAN using Private Aggregation of Teacher Ensembles

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 40.007473 14.863616 177.771713 4367.781503 MALE
1 Chinstrap Biscoe 47.799655 18.101346 182.233909 4781.415079 MALE
2 Chinstrap Biscoe 41.795687 16.121351 193.219110 3124.987453 MALE
3 Gentoo Dream 41.408596 21.911954 180.690348 4655.957984 FEMALE
4 Gentoo Biscoe 41.825240 17.597221 190.128309 2562.520325 FEMALE
[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="patectgan",
    epsilon=1.0,
    select_cols = ["island", "bill_length_mm", "body_mass_g"],
    synth_params = {
        "embedding_dim": 256,
        "generator_dim": (128, 128),
        "discriminator_dim": (256, 256),
        "generator_lr": 0.0003,
        "generator_decay": 1e-05,
        "discriminator_lr": 0.0003,
        "discriminator_decay": 1e-05,
        "batch_size": 500
    },
    nb_samples = 100,
    dummy=True,
)
res_dummy.result.df_samples.head()
island bill_length_mm body_mass_g
0 Dream 62.184163 3563.350335
1 Biscoe 58.693441 2519.153178
2 Biscoe 45.244734 5277.579844
3 Torgersen 53.086722 2477.480292
4 Dream 39.586384 4253.510337

DPGAN: Differentially Private GAN

For DPGAN, there is the same warning as for DPCTGAN about the cryptographically secure random noise generation.

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Dream 50.926630 23.000000 196.682433 4906.792127 FEMALE
1 Chinstrap Dream 43.686233 22.855870 186.157387 4108.924724 FEMALE
2 Adelie Biscoe 43.874988 22.465074 250.000000 4141.524814 FEMALE
3 Gentoo Dream 49.637254 19.829533 190.552057 3293.796897 FEMALE
4 Gentoo Biscoe 65.000000 23.000000 185.239148 4287.198659 FEMALE

One final time, she samples with a condition:

[ ]:
res_dummy = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    dummy=True,
)
res_dummy.result.df_samples.head()
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Biscoe 65.000000 17.500878 194.463225 5220.095709 FEMALE
1 Gentoo Torgersen 65.000000 17.846123 236.159381 7000.000000 FEMALE
2 Adelie Biscoe 62.844849 17.839089 195.672168 7000.000000 FEMALE
3 Adelie Dream 62.495059 23.000000 213.272040 7000.000000 MALE
4 Chinstrap Biscoe 65.000000 16.639676 228.477314 7000.000000 MALE

And now on the real dataset

[ ]:
res = client.smartnoise_synth.query(
    synth_name="dpgan",
    epsilon=1.0,
    condition = "body_mass_g > 5000",
    nb_samples = 10,
    dummy=False,
)
res.result.df_samples
/home/azureuser/work/sdd-poc-server/client/lomas_client/utils.py:44: UserWarning: Warning:dpgan synthesizer random generator for noise and shuffling is not cryptographically secure. (pseudo-rng in vanilla PyTorch).
  warnings.warn(
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Gentoo Torgersen 43.037990 17.304015 242.335990 5220.064640 FEMALE
1 Gentoo Torgersen 54.582117 18.683831 196.131904 5238.780871 FEMALE
2 Gentoo Torgersen 47.120516 22.075984 181.180426 7000.000000
3 Chinstrap Torgersen 65.000000 17.883028 197.028062 5078.756407 MALE
4 Adelie Dream 65.000000 16.657900 231.202775 7000.000000 MALE
5 Adelie Dream 65.000000 17.988916 185.333617 7000.000000 MALE
6 Adelie Torgersen 65.000000 18.632700 250.000000 5856.899589 MALE
7 Adelie Biscoe 44.833169 15.574961 191.562866 6073.620439 FEMALE
8 Adelie Dream 65.000000 17.337221 199.819478 7000.000000
9 Adelie Torgersen 43.839323 21.674445 212.702855 7000.000000

Step 5: See the archives of queries

She now wants to verify all the queries that she did on the real data. This is possible because an archive of all queries is kept in a secure database. With a function call, she can see her queries, the budget spent and the associated responses.

[ ]:
previous_queries = client.get_previous_queries()

Let’s check the last query

[ ]:
last_query = previous_queries[-1]
last_query
{'user_name': 'Dr.Antartica',
 'dataset_name': 'PENGUIN',
 'dp_library': 'smartnoise_synth',
 'client_input': {'dataset_name': 'PENGUIN',
  'synth_name': 'dpgan',
  'epsilon': 1.0,
  'delta': None,
  'select_cols': [],
  'synth_params': {},
  'nullable': True,
  'constraints': '',
  'return_model': False,
  'condition': 'body_mass_g > 5000',
  'nb_samples': 10},
 'response': {'epsilon': 1.0,
  'delta': 0.00015673368198174188,
  'requested_by': 'Dr.Antartica',
  'result':                       res_type  \
  index         sn_synth_samples
  columns       sn_synth_samples
  data          sn_synth_samples
  index_names   sn_synth_samples
  column_names  sn_synth_samples

                                                       df_samples
  index                            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  columns       [species, island, bill_length_mm, bill_depth_m...
  data          [[Gentoo, Torgersen, 43.03798981010914, 17.304...
  index_names                                              [None]
  column_names                                             [None]  },
 'timestamp': 1746086943.2004273}
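The archive entries can also be aggregated locally, for example to tally the total budget spent so far. A sketch over entries shaped like the record above (hard-coded excerpt, hypothetical second entry):

```python
# Entries shaped like the archive record above.
previous_queries = [
    {"response": {"epsilon": 1.0, "delta": 0.00015673368198174188}},
    {"response": {"epsilon": 1.0, "delta": 0.0001}},  # hypothetical earlier query
]

# Sum the actual (epsilon, delta) charged for each archived query.
total_epsilon = sum(q["response"]["epsilon"] for q in previous_queries)
total_delta = sum(q["response"]["delta"] for q in previous_queries)
print(total_epsilon)  # 2.0
```

This gives her a local cross-check of the budget the server has deducted.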