lomas_client.libraries package
Submodules
lomas_client.libraries.diffprivlib module
- class lomas_client.libraries.diffprivlib.DiffPrivLibClient(http_client: LomasHttpClient)[source]
Bases:
object
A client for executing and estimating the cost of DiffPrivLib queries.
- cost(pipeline: Pipeline, feature_columns: List[str] = [''], target_columns: List[str] = [''], test_size: float = 0.2, test_train_split_seed: int = 1, imputer_strategy: str = 'drop') CostResponse | None [source]
This function estimates the cost of executing a DiffPrivLib query.
- Parameters:
pipeline (sklearn.pipeline) –
DiffPrivLib pipeline, subject to three conditions:
  - The pipeline MUST start with a models.StandardScaler. Otherwise DiffPrivLib raises a PrivacyLeakWarning, which the lomas server treats as an error.
  - random_state fields can only be int (RandomState will not work).
  - accountant fields must be None.
Note: as in DiffPrivLib, avoid any DiffprivlibCompatibilityWarning to ensure that the pipeline does what is intended.
feature_columns (list[str]) – The list of feature columns to train on.
target_columns (list[str], optional) – The list of target columns to predict. May be None for certain models.
test_size (float, optional) – Proportion of the data held out as the test set. Defaults to 0.2.
test_train_split_seed (int, optional) – Seed for the random train/test split. Defaults to 1.
imputer_strategy (str, optional) – Imputation strategy. Defaults to “drop”.
  - “drop”: drops all rows with missing values.
  - “mean”: replaces missing values with the mean of the column values.
  - “median”: replaces missing values with the median of the column values.
  - “most_frequent”: replaces missing values with the most frequent value of the column.
- Returns:
A dictionary containing the estimated cost.
- Return type:
Optional[dict[str, float]]
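The four imputer_strategy values can be illustrated on a single column. The sketch below is a standalone illustration written with the standard library only; the helper name impute_column and the sample data are hypothetical, and it is not the code the lomas server runs. Note that “drop” on the server removes whole rows, whereas this single-column sketch can only remove the missing cells.

```python
from statistics import mean, median, mode


def impute_column(values, strategy="drop"):
    """Apply one imputer_strategy to a single column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        # Single-column analogue of dropping every row that has a missing value.
        return present
    # Pick the statistic named by the strategy and compute it on the known values.
    statistic = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
    fill = statistic(present)
    return [fill if v is None else v for v in values]


# Hypothetical sample column with two missing entries.
ages = [30, None, 40, 40, None, 70]
for strategy in ("drop", "mean", "median", "most_frequent"):
    print(strategy, impute_column(ages, strategy))
```

An unknown strategy name raises a KeyError in this sketch; how the server reports an invalid value is not covered here.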
- query(pipeline: Pipeline, feature_columns: List[str], target_columns: List[str] | None = None, test_size: float = 0.2, test_train_split_seed: int = 1, imputer_strategy: str = 'drop', dummy: bool = False, nb_rows: int = 100, seed: int = 42) QueryResponse | None [source]
Trains a DiffPrivLib pipeline and returns the trained pipeline.
- Parameters:
pipeline (sklearn.pipeline) –
DiffPrivLib pipeline, subject to three conditions:
  - The pipeline MUST start with a models.StandardScaler. Otherwise DiffPrivLib raises a PrivacyLeakWarning, which the lomas server treats as an error.
  - random_state fields can only be int (RandomState will not work).
  - accountant fields must be None.
Note: as in DiffPrivLib, avoid any DiffprivlibCompatibilityWarning to ensure that the pipeline does what is intended.
feature_columns (list[str]) – The list of feature columns to train on.
target_columns (list[str], optional) – The list of target columns to predict. May be None for certain models.
test_size (float, optional) – Proportion of the data held out as the test set. Defaults to 0.2.
test_train_split_seed (int, optional) – Seed for the random train/test split. Defaults to 1.
imputer_strategy (str, optional) – Imputation strategy. Defaults to “drop”.
  - “drop”: drops all rows with missing values.
  - “mean”: replaces missing values with the mean of the column values.
  - “median”: replaces missing values with the median of the column values.
  - “most_frequent”: replaces missing values with the most frequent value of the column.
dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.
nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS.
seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED.
- Returns:
A trained DiffPrivLib pipeline.
- Return type:
Optional[Pipeline]
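To make the roles of test_size and test_train_split_seed concrete, here is a minimal standard-library sketch of a seeded train/test split. The function split_rows is hypothetical and only mimics the idea; the server's actual splitting routine and its rounding rule are not described in this reference and may differ.

```python
import random


def split_rows(rows, test_size=0.2, seed=1):
    """Shuffle rows with a fixed seed and hold out a test_size fraction."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)  # assumption: simple rounding
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_rows(range(10), test_size=0.2, seed=1)
print(len(train), len(test))

# The same seed reproduces the same split, which is what makes results comparable.
assert split_rows(range(10), seed=1) == (train, test)
```

Keeping the seed fixed between a cost estimate and the corresponding query means both refer to the same partition of the data in this sketch.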
lomas_client.libraries.opendp module
- class lomas_client.libraries.opendp.OpenDPClient(http_client: LomasHttpClient)[source]
Bases:
object
A client for executing and estimating the cost of OpenDP queries.
- cost(opendp_pipeline: Measurement, fixed_delta: float | None = None) CostResponse | None [source]
This function estimates the cost of executing an OpenDP query.
- Parameters:
opendp_pipeline (dp.Measurement) – The OpenDP pipeline for the query.
fixed_delta (Optional[float], optional) – If the pipeline measurement is of type “ZeroConcentratedDivergence” (e.g. with make_gaussian), it is converted to “SmoothedMaxDivergence” with make_zCDP_to_approxDP (see Smartnoise-SQL postprocessing documentation). In that case, the user must provide a fixed_delta. Defaults to None.
- Returns:
A dictionary containing the estimated cost.
- Return type:
Optional[dict[str, float]]
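The reason fixed_delta is needed can be seen from the textbook conversion between the two privacy notions: a rho-zCDP guarantee corresponds to a whole curve of (epsilon, delta) pairs, so a delta has to be fixed before a single epsilon can be reported. The sketch below uses the classical bound epsilon = rho + 2 * sqrt(rho * ln(1 / delta)). It is an illustration only; the function name is hypothetical, and OpenDP's make_zCDP_to_approxDP may rely on a tighter bound and return different numbers.

```python
import math


def zcdp_to_approx_dp_epsilon(rho, delta):
    """Classical bound: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


# A smaller delta yields a larger epsilon for the same rho.
for delta in (1e-3, 1e-5, 1e-7):
    print(delta, zcdp_to_approx_dp_epsilon(0.05, delta))
```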
- query(opendp_pipeline: Measurement, fixed_delta: float | None = None, dummy: bool = False, nb_rows: int = 100, seed: int = 42) QueryResponse | None [source]
This function executes an OpenDP query.
- Parameters:
opendp_pipeline (dp.Measurement) – The OpenDP pipeline for the query.
fixed_delta (Optional[float], optional) – If the pipeline measurement is of type “ZeroConcentratedDivergence” (e.g. with make_gaussian), it is converted to “SmoothedMaxDivergence” with make_zCDP_to_approxDP (see Smartnoise-SQL postprocessing documentation). In that case, the user must provide a fixed_delta. Defaults to None.
dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.
nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS.
seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED.
- Raises:
Exception – If the server returns dataframes
- Returns:
A Pandas DataFrame containing the query results.
- Return type:
Optional[dict]
lomas_client.libraries.smartnoise_sql module
- class lomas_client.libraries.smartnoise_sql.SmartnoiseSQLClient(http_client: LomasHttpClient)[source]
Bases:
object
A client for executing and estimating the cost of SmartNoise SQL queries.
- cost(query: str, epsilon: float, delta: float, mechanisms: dict[str, str] = {}) CostResponse | None [source]
This function estimates the cost of executing a SmartNoise SQL query.
- Parameters:
query (str) – The SQL query to estimate the cost for. NOTE: the table name is df, the query must end with “FROM df”.
epsilon (float) – Privacy parameter (e.g., 0.1).
delta (float) – Privacy parameter (e.g., 1e-5).
mechanisms (dict[str, str], optional) – Dictionary of mechanisms for the query. See Smartnoise-SQL postprocessing documentation. Defaults to {}.
- Returns:
A dictionary containing the estimated cost.
- Return type:
Optional[dict[str, float]]
- query(query: str, epsilon: float, delta: float, mechanisms: dict[str, str] = {}, postprocess: bool = True, dummy: bool = False, nb_rows: int = 100, seed: int = 42) QueryResponse | None [source]
This function executes a SmartNoise SQL query.
- Parameters:
query (str) – The SQL query to execute. NOTE: the table name is df, the query must end with “FROM df”.
epsilon (float) – Privacy parameter (e.g., 0.1).
delta (float) – Privacy parameter (e.g., 1e-5).
mechanisms (dict[str, str], optional) –
Dictionary of mechanisms for the query. See Smartnoise-SQL postprocessing documentation.
Defaults to {}.
postprocess (bool, optional) –
Whether to postprocess the query results. See Smartnoise-SQL postprocessing documentation.
Defaults to True.
dummy (bool, optional) –
Whether to use a dummy dataset.
Defaults to False.
nb_rows (int, optional) –
The number of rows in the dummy dataset.
Defaults to DUMMY_NB_ROWS.
seed (int, optional) –
The random seed for generating the dummy dataset.
Defaults to DUMMY_SEED.
- Returns:
A Pandas DataFrame containing the query results.
- Return type:
Optional[dict]
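One possible way to combine cost and query is to estimate the cost first and then try the same query against the dummy dataset. The sketch below uses only the parameter names documented above (query, epsilon, delta, dummy, nb_rows, seed). The helper cost_then_dummy_query and the RecordingClient stand-in are hypothetical, introduced so the example runs without a lomas server; with the real library, a SmartnoiseSQLClient instance would be passed instead.

```python
def cost_then_dummy_query(client, query, epsilon, delta, nb_rows=100, seed=42):
    """Estimate the cost, then run the same query on a dummy dataset."""
    params = {"query": query, "epsilon": epsilon, "delta": delta}
    estimated_cost = client.cost(**params)
    # dummy=True asks for the dummy dataset; nb_rows and seed control how it is generated.
    dummy_result = client.query(**params, dummy=True, nb_rows=nb_rows, seed=seed)
    return estimated_cost, dummy_result


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def cost(self, **kwargs):
        self.calls.append(("cost", kwargs))
        return None

    def query(self, **kwargs):
        self.calls.append(("query", kwargs))
        return None


client = RecordingClient()
cost_then_dummy_query(client, "SELECT COUNT(*) AS n FROM df", epsilon=0.1, delta=1e-5)
for method_name, kwargs in client.calls:
    print(method_name, kwargs)
```

Both methods are declared as returning None in some cases, so real code should check the two results before using them.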
lomas_client.libraries.smartnoise_synth module
- class lomas_client.libraries.smartnoise_synth.SmartnoiseSynthClient(http_client: LomasHttpClient)[source]
Bases:
object
A client for executing and estimating the cost of SmartNoiseSynth queries.
- cost(synth_name: str, epsilon: float, delta: float | None = None, select_cols: List[str] = [], synth_params: dict = {}, nullable: bool = True, constraints: dict = {}) CostResponse | None [source]
This function estimates the cost of executing a SmartNoise Synth query.
- Parameters:
synth_name (str) – Name of the Synthesizer model to use.
  - Available synthesizers:
    - “aim”
    - “mwem”
    - “dpctgan”, with disabled_dp always forced to False and a warning because its random generator is not cryptographically secure
    - “patectgan”
    - “dpgan”, with a warning because its random generator is not cryptographically secure
  - Available under certain conditions:
    - “mst” if return_model=False
    - “pategan” if the dataset has enough rows
  - Not available:
    - “pacsynth”, due to a Rust panic error
    - “quail”, currently unavailable in Smartnoise Synth
  For further documentation on models, see https://docs.smartnoise.org/synth/index.html#synthesizers-reference
epsilon (float) – Privacy parameter (e.g., 0.1).
delta (Optional[float]) – Privacy parameter (e.g., 1e-5). Defaults to None.
select_cols (List[str]) – List of columns to select. Defaults to [].
synth_params (dict) – Keyword arguments to pass to the synthesizer constructor. Provide all parameters of the model except epsilon and delta; see https://docs.smartnoise.org/synth/synthesizers/index.html#. Defaults to {}.
nullable (bool) – True if some data cells may be null. Defaults to True.
constraints (dict) – Dictionary of custom table transformer constraints. Columns that are not specified will be inferred from the metadata. Defaults to {}. For further documentation on constraints, see https://docs.smartnoise.org/synth/transforms/index.html. Note: lambda functions in AnonymizationTransformer are not supported.
- Returns:
A dictionary containing the estimated cost.
- Return type:
Optional[dict[str, float]]
- query(synth_name: str, epsilon: float, delta: float | None = None, select_cols: List[str] = [], synth_params: dict = {}, nullable: bool = True, constraints: dict = {}, dummy: bool = False, return_model: bool = False, condition: str = '', nb_samples: int = 200, nb_rows: int = 100, seed: int = 42) QueryResponse | None [source]
This function executes a SmartNoise Synth query.
- Parameters:
synth_name (str) – Name of the Synthesizer model to use.
  - Available synthesizers:
    - “aim”
    - “mwem”
    - “dpctgan”, with disabled_dp always forced to False and a warning because its random generator is not cryptographically secure
    - “patectgan”
    - “dpgan”, with a warning because its random generator is not cryptographically secure
  - Available under certain conditions:
    - “mst” if return_model=False
    - “pategan” if the dataset has enough rows
  - Not available:
    - “pacsynth”, due to a Rust panic error
    - “quail”, currently unavailable in Smartnoise Synth
  For further documentation on models, see https://docs.smartnoise.org/synth/index.html#synthesizers-reference
epsilon (float) – Privacy parameter (e.g., 0.1).
delta (Optional[float]) – Privacy parameter (e.g., 1e-5). Defaults to None.
select_cols (List[str]) – List of columns to select. Defaults to [].
synth_params (dict) – Keyword arguments to pass to the synthesizer constructor. Provide all parameters of the model except epsilon and delta; see https://docs.smartnoise.org/synth/synthesizers/index.html#. Defaults to {}.
nullable (bool) – True if some data cells may be null. Defaults to True.
constraints (dict) – Dictionary of custom table transformer constraints. Columns that are not specified will be inferred from the metadata. Defaults to {}. For further documentation on constraints, see https://docs.smartnoise.org/synth/transforms/index.html. Note: lambda functions in AnonymizationTransformer are not supported.
return_model (bool) – True to get the Synthesizer model, False to get samples. Defaults to False.
condition (Optional[str]) – Sampling condition passed to model.sample (only relevant if return_model is False). Defaults to “”.
nb_samples (Optional[int]) – Number of samples to generate (only relevant if return_model is False). Defaults to SNSYNTH_DEFAULT_SAMPLES_NB.
dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.
nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS.
seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED.
- Returns:
A Pandas DataFrame containing the query results.
- Return type:
Optional[dict]
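The availability rules for synth_name listed above can be checked on the client side before a request is sent. The following is a minimal sketch that encodes only what this reference states; the helper check_synth_name and the three set names are hypothetical and not part of lomas_client. The “pategan” row-count condition cannot be checked locally, so it is left to the server.

```python
# Groups taken from the synth_name description in this reference.
AVAILABLE = {"aim", "mwem", "dpctgan", "patectgan", "dpgan"}
CONDITIONAL = {"mst", "pategan"}
UNAVAILABLE = {"pacsynth", "quail"}


def check_synth_name(synth_name, return_model=False):
    """Return synth_name if the documented rules allow it, else raise ValueError."""
    if synth_name in UNAVAILABLE:
        raise ValueError(f"{synth_name!r} is listed as not available")
    if synth_name == "mst" and return_model:
        raise ValueError("'mst' is only available with return_model=False")
    if synth_name not in AVAILABLE | CONDITIONAL:
        raise ValueError(f"unknown synthesizer name: {synth_name!r}")
    return synth_name


print(check_synth_name("aim"))
print(check_synth_name("mst", return_model=False))
```

Running such a check first avoids a round trip for requests that the documented rules already exclude.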