lomas_client package

Submodules

lomas_client.client module

class lomas_client.client.Client(url: str, user_name: str, dataset_name: str)

Bases: object

Client class to send requests to the server. Handles all serialisation and deserialisation steps.

diffprivlib_query(pipeline: Pipeline, feature_columns: List[str], target_columns: List[str] | None = None, test_size: float = 0.2, test_train_split_seed: int = 1, imputer_strategy: str = 'drop', dummy: bool = False, nb_rows: int = 100, seed: int = 42) → Pipeline

This function trains a DiffPrivLib pipeline on the sensitive data and returns a trained Pipeline.

Parameters:
  • pipeline (sklearn.pipeline) –

    DiffPrivLib pipeline with three conditions:

    • The pipeline MUST start with a models.StandardScaler. Otherwise a PrivacyLeakWarning is raised by the DiffPrivLib library and is treated as an error by the lomas server.

    • random_state fields can only be int (RandomState will not work).

    • accountant fields must be None.

    Note: as in DiffPrivLib, avoid any DiffprivlibCompatibilityWarning to ensure that the pipeline does what is intended.

  • feature_columns (list[str]) – The list of feature columns to train on.

  • target_columns (list[str], optional) – The list of target columns to predict. May be None for certain models.

  • test_size (float, optional) – Proportion of the data held out as the test set. Defaults to 0.2.

  • test_train_split_seed (int, optional) – Seed for the random train/test split. Defaults to 1.

  • imputer_strategy (str, optional) – Imputation strategy. Defaults to “drop”.

    • “drop”: drops all rows with missing values.

    • “mean”: replaces missing values with the mean of the column values.

    • “median”: replaces missing values with the median of the column values.

    • “most_frequent”: replaces missing values with the most frequent value of the column.

  • dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.

  • nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS (100).

  • seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED (42).

Returns:

A trained DiffPrivLib pipeline.

Return type:

Optional[Pipeline]
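The four imputer_strategy options can be illustrated on a single column. The sketch below is a plain-Python illustration of the documented behaviours only; the helper name impute_column is hypothetical, and this is not the server's implementation.

```python
from statistics import mean, median, mode
from typing import List, Optional


def impute_column(values: List[Optional[float]], strategy: str = "drop") -> List[float]:
    """Hypothetical helper mirroring the documented imputer_strategy options."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        # Entries (rows) with a missing value are removed.
        return present
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "most_frequent":
        fill = mode(present)
    else:
        raise ValueError(f"Unknown imputer strategy: {strategy}")
    # Missing values are replaced by the chosen statistic.
    return [fill if v is None else v for v in values]


column = [1.0, None, 2.0, 2.0, 7.0]
dropped = impute_column(column, "drop")
mean_filled = impute_column(column, "mean")
median_filled = impute_column(column, "median")
frequent_filled = impute_column(column, "most_frequent")
```

Note that “drop” shortens the data, while the other three strategies keep every row and only fill the gaps.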

estimate_diffprivlib_cost(pipeline: Pipeline, feature_columns: List[str] = [''], target_columns: List[str] = [''], test_size: float = 0.2, test_train_split_seed: int = 1, imputer_strategy: str = 'drop') → dict

This function estimates the cost of executing a DiffPrivLib query.

Parameters:
  • pipeline (sklearn.pipeline) –

    DiffPrivLib pipeline with three conditions:

    • The pipeline MUST start with a models.StandardScaler. Otherwise a PrivacyLeakWarning is raised by the DiffPrivLib library and is treated as an error by the lomas server.

    • random_state fields can only be int (RandomState will not work).

    • accountant fields must be None.

    Note: as in DiffPrivLib, avoid any DiffprivlibCompatibilityWarning to ensure that the pipeline does what is intended.

  • feature_columns (list[str]) – The list of feature columns to train on.

  • target_columns (list[str], optional) – The list of target columns to predict. May be None for certain models.

  • test_size (float, optional) – Proportion of the data held out as the test set. Defaults to 0.2.

  • test_train_split_seed (int, optional) – Seed for the random train/test split. Defaults to 1.

  • imputer_strategy (str, optional) – Imputation strategy. Defaults to “drop”.

    • “drop”: drops all rows with missing values.

    • “mean”: replaces missing values with the mean of the column values.

    • “median”: replaces missing values with the median of the column values.

    • “most_frequent”: replaces missing values with the most frequent value of the column.

Returns:

A dictionary containing the estimated cost.

Return type:

Optional[dict[str, float]]
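The roles of test_size and test_train_split_seed can be sketched without any library. The function below is a hypothetical stand-in, not the split the server performs: it only shows that test_size sets the share of rows held out and that a fixed seed makes the split reproducible.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[int], test_size: float = 0.2, seed: int = 1
) -> Tuple[List[int], List[int]]:
    """Hypothetical seeded split returning (train_rows, test_rows)."""
    shuffled = list(rows)
    # A dedicated generator keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


train_rows, test_rows = split_rows(range(10), test_size=0.2, seed=1)
train_again, test_again = split_rows(range(10), test_size=0.2, seed=1)
```

Using the same seed for the cost estimate and for the actual query keeps both calls on the same assumed partition of the data.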

estimate_opendp_cost(opendp_pipeline: Measurement, fixed_delta: float | None = None) → dict[str, float] | None

This function estimates the cost of executing an OpenDP query.

Parameters:
  • opendp_pipeline (dp.Measurement) – The OpenDP pipeline for the query.

  • fixed_delta (Optional[float], optional) – If the pipeline measurement is of type “ZeroConcentratedDivergence” (e.g. with make_gaussian) then it is converted to “SmoothedMaxDivergence” with make_zCDP_to_approxDP (See Smartnoise-SQL postprocessing documentation.). In that case a fixed_delta must be provided by the user. Defaults to None.

Returns:

A dictionary containing the estimated cost.

Return type:

Optional[dict[str, float]]
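Why a delta has to be fixed for a zero-concentrated measurement can be seen from the well-known textbook bound: a ρ-zCDP mechanism satisfies (ρ + 2·sqrt(ρ·ln(1/δ)), δ)-DP for every δ in (0, 1), so ε is only determined once δ is chosen. The sketch below evaluates that bound; the function name is hypothetical, and the conversion actually applied by make_zCDP_to_approxDP may be tighter than this one.

```python
import math


def zcdp_to_approx_dp_epsilon(rho: float, delta: float) -> float:
    """Textbook bound: rho-zCDP implies (rho + 2*sqrt(rho*ln(1/delta)), delta)-DP."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


# The same mechanism (rho = 0.5) yields a different epsilon for each fixed delta.
eps_loose_delta = zcdp_to_approx_dp_epsilon(rho=0.5, delta=1e-3)
eps_tight_delta = zcdp_to_approx_dp_epsilon(rho=0.5, delta=1e-5)
```

A smaller fixed_delta gives a larger epsilon for the same measurement, which is the trade-off the caller settles by passing fixed_delta.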

estimate_smartnoise_sql_cost(query: str, epsilon: float, delta: float, mechanisms: dict[str, str] = {}) → dict[str, float] | None

This function estimates the cost of executing a SmartNoise SQL query.

Parameters:
  • query (str) – The SQL query to estimate the cost for. NOTE: the table name is df, so the query must end with “FROM df”.

  • epsilon (float) – Privacy parameter (e.g., 0.1).

  • delta (float) – Privacy parameter (e.g., 1e-5).

  • mechanisms (dict[str, str], optional) – Dictionary of mechanisms for the query. See Smartnoise-SQL postprocessing documentation. Defaults to {}.

Returns:

A dictionary containing the estimated cost.

Return type:

Optional[dict[str, float]]
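The constraints on the arguments can be checked before anything is sent. The helper below is a hypothetical client-side sketch: it checks that the query reads from the table named df, applies the usual assumptions that epsilon is positive and delta lies in [0, 1) (neither range is stated in this reference), and returns keyword arguments matching the documented signature.

```python
import re
from typing import Any, Dict


def build_sql_cost_request(query: str, epsilon: float, delta: float) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and build the keyword arguments."""
    # The documented table name is df; reject queries that read from anything else.
    if re.search(r"\bFROM\s+df\b", query, flags=re.IGNORECASE) is None:
        raise ValueError("The query must read from the table named df")
    # Assumed ranges for the privacy parameters (not taken from this reference).
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 <= delta < 1:
        raise ValueError("delta must be in [0, 1)")
    return {"query": query, "epsilon": epsilon, "delta": delta, "mechanisms": {}}


kwargs = build_sql_cost_request("SELECT COUNT(*) AS n FROM df", epsilon=0.1, delta=1e-5)
```

With a connected client, the result could then be passed on as client.estimate_smartnoise_sql_cost(**kwargs).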

estimate_smartnoise_synth_cost(synth_name: str, epsilon: float, delta: float | None = None, select_cols: List[str] = [], synth_params: dict = {}, nullable: bool = True, constraints: dict = {}) → dict[str, float] | None

This function estimates the cost of executing a SmartNoise Synth query.

Parameters:
  • synth_name (str) –

    Name of the synthesizer model to use. Available synthesizers are:

    • “aim”,

    • “mwem”,

    • “dpctgan”, with disabled_dp always forced to False and a warning because its random generator is not cryptographically secure,

    • “patectgan”,

    • “dpgan”, with a warning because its random generator is not cryptographically secure.

    Available under certain conditions:

    • “mst” if return_model=False

    • “pategan” if the dataset has enough rows

    Not available:

    • “pacsynth” due to a Rust panic error

    • “quail”, currently unavailable in SmartNoise Synth

    For further documentation on models, please see here: https://docs.smartnoise.org/synth/index.html#synthesizers-reference

  • epsilon (float) – Privacy parameter (e.g., 0.1).

  • delta (float) – Privacy parameter (e.g., 1e-5).

  • select_cols (List[str]) – List of columns to select. Defaults to [].

  • synth_params (dict) – Keyword arguments to pass to the synthesizer constructor. See https://docs.smartnoise.org/synth/synthesizers/index.html# and provide all parameters of the model except epsilon and delta. Defaults to {}.

  • nullable (bool) – True if some data cells may be null. Defaults to True.

  • constraints (dict) – Dictionary of custom table transformer constraints. Columns that are not specified will be inferred based on metadata. Defaults to {}. For further documentation on constraints, please see here: https://docs.smartnoise.org/synth/transforms/index.html. Note: lambda functions in AnonymizationTransformer are not supported.

Returns:

A dictionary containing the estimated cost.

Return type:

Optional[dict[str, float]]

get_dataset_metadata() → Dict[str, int | bool | Dict[str, str | int]] | None

This function retrieves metadata for the dataset.

Returns:

A dictionary containing dataset metadata.

Return type:

Optional[Dict[str, Union[int, bool, Dict[str, Union[str, int]]]]]

get_dummy_dataset(nb_rows: int = 100, seed: int = 42) → DataFrame | None

This function retrieves a dummy dataset with optional parameters.

Parameters:
  • nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS (100).

  • seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED (42).

Returns:

A Pandas DataFrame representing the dummy dataset.

Return type:

Optional[pd.DataFrame]

get_initial_budget() → dict[str, float] | None

This function retrieves the initial budget.

Returns:

A dictionary containing the initial budget.

Return type:

Optional[dict[str, float]]

get_previous_queries() → List[dict] | None

This function retrieves the previous queries of the user.

Raises:

ValueError – If an unknown query type is encountered during deserialization.

Returns:

A list of dictionaries describing the queries previously run on the private dataset.

Return type:

Optional[List[dict]]
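The ValueError described above corresponds to a stored query whose type falls outside the known libraries. The sketch below shows one way such a check can look; the record layout (a "library" key) is hypothetical, and the enum is a local copy of the documented DPLibraries values written with a str mixin so that it also runs on Python versions without StrEnum.

```python
from enum import Enum
from typing import Dict, List


class Library(str, Enum):
    """Local copy of the documented DPLibraries values."""

    DIFFPRIVLIB = "diffprivlib"
    OPENDP = "opendp"
    SMARTNOISE_SQL = "smartnoise_sql"
    SMARTNOISE_SYNTH = "smartnoise_synth"


def count_by_library(records: List[Dict[str, str]]) -> Dict[str, int]:
    """Count records per library; 'library' is a hypothetical field name."""
    counts: Dict[str, int] = {}
    for record in records:
        try:
            # Enum lookup by value fails for names outside the four libraries.
            library = Library(record["library"])
        except ValueError:
            raise ValueError(f"Unknown query type: {record['library']}") from None
        counts[library.value] = counts.get(library.value, 0) + 1
    return counts


history = [{"library": "opendp"}, {"library": "smartnoise_sql"}, {"library": "opendp"}]
per_library = count_by_library(history)
```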

get_remaining_budget() → dict[str, float] | None

This function retrieves the remaining budget.

Returns:

A dictionary containing the remaining budget.

Return type:

Optional[dict[str, float]]

get_total_spent_budget() → dict[str, float] | None

This function retrieves the total spent budget.

Returns:

A dictionary containing the total spent budget.

Return type:

Optional[dict[str, float]]
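The three budget getters can be combined with a cost estimate to decide whether a query still fits. The sketch below is hypothetical: the key names of the returned dictionaries are not given in this reference, so it works on a small Budget pair that you would fill from those dictionaries yourself, and it assumes that the remaining budget equals the initial budget minus the total spent budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical (epsilon, delta) pair filled from the client's dictionaries."""

    epsilon: float
    delta: float


def remaining(initial: Budget, spent: Budget) -> Budget:
    """Assumed relation: remaining = initial - total spent."""
    return Budget(initial.epsilon - spent.epsilon, initial.delta - spent.delta)


def can_afford(left: Budget, cost: Budget) -> bool:
    """A query fits only if both its epsilon cost and its delta cost fit."""
    return cost.epsilon <= left.epsilon and cost.delta <= left.delta


left = remaining(Budget(10.0, 0.005), Budget(2.5, 0.001))
cheap_ok = can_afford(left, Budget(1.0, 1e-5))
expensive_ok = can_afford(left, Budget(8.0, 1e-5))
```

Checking both components matters because a query can be cheap in epsilon and still exceed what is left of delta.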

opendp_query(opendp_pipeline: Measurement, fixed_delta: float | None = None, dummy: bool = False, nb_rows: int = 100, seed: int = 42) → dict | None

This function executes an OpenDP query.

Parameters:
  • opendp_pipeline (dp.Measurement) – The OpenDP pipeline for the query.

  • fixed_delta (Optional[float], optional) – If the pipeline measurement is of type “ZeroConcentratedDivergence” (e.g. with make_gaussian) then it is converted to “SmoothedMaxDivergence” with make_zCDP_to_approxDP (See Smartnoise-SQL postprocessing documentation.). In that case a fixed_delta must be provided by the user. Defaults to None.

  • dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.

  • nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS (100).

  • seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED (42).

Raises:

Exception – If the server returns dataframes

Returns:

A dictionary containing the query results.

Return type:

Optional[dict]

smartnoise_sql_query(query: str, epsilon: float, delta: float, mechanisms: dict[str, str] = {}, postprocess: bool = True, dummy: bool = False, nb_rows: int = 100, seed: int = 42) → dict | None

This function executes a SmartNoise SQL query.

Parameters:
  • query (str) – The SQL query to execute. NOTE: the table name is df, so the query must end with “FROM df”.

  • epsilon (float) – Privacy parameter (e.g., 0.1).

  • delta (float) – Privacy parameter (e.g., 1e-5).

  • mechanisms (dict[str, str], optional) – Dictionary of mechanisms for the query. See Smartnoise-SQL postprocessing documentation. Defaults to {}.

  • postprocess (bool, optional) – Whether to postprocess the query results. See Smartnoise-SQL postprocessing documentation. Defaults to True.

  • dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.

  • nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS (100).

  • seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED (42).

Returns:

A dictionary containing the query results.

Return type:

Optional[dict]
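The dummy flag and the cost estimate suggest one possible workflow: try the query on the dummy dataset, ask for the estimated cost, and only then run it on the sensitive data. The sketch below uses only the documented method names and arguments; StubClient is a stand-in so that the example runs without a server, its return values are placeholders, and the approve callback is a hypothetical hook for your own budget policy.

```python
from typing import Callable, Dict, List, Optional, Tuple


def dry_run_then_query(
    client,
    query: str,
    epsilon: float,
    delta: float,
    approve: Callable[[Dict[str, float]], bool],
) -> Optional[dict]:
    """One possible workflow built from the documented method names."""
    # 1. Exercise the query on the dummy dataset first.
    client.smartnoise_sql_query(query, epsilon, delta, dummy=True)
    # 2. Ask for the estimated cost of the same query.
    cost = client.estimate_smartnoise_sql_cost(query, epsilon, delta)
    # 3. Run on the sensitive dataset only if the caller accepts the cost.
    if cost is None or not approve(cost):
        return None
    return client.smartnoise_sql_query(query, epsilon, delta, dummy=False)


class StubClient:
    """Stand-in for Client; it records calls and returns placeholder values."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, bool]] = []

    def smartnoise_sql_query(self, query, epsilon, delta, dummy=False):
        self.calls.append(("query", dummy))
        return {"placeholder": True}

    def estimate_smartnoise_sql_cost(self, query, epsilon, delta):
        self.calls.append(("estimate", False))
        return {"placeholder_cost": epsilon}


stub = StubClient()
result = dry_run_then_query(
    stub, "SELECT COUNT(*) AS n FROM df", 0.1, 1e-5, approve=lambda cost: True
)
```

Passing a real Client instance instead of the stub would send the same three calls to the server.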

smartnoise_synth_query(synth_name: str, epsilon: float, delta: float | None = None, select_cols: List[str] = [], synth_params: dict = {}, nullable: bool = True, constraints: dict = {}, dummy: bool = False, return_model: bool = False, condition: str = '', nb_samples: int = 200, nb_rows: int = 100, seed: int = 42) → dict | None

This function executes a SmartNoise Synth query.

Parameters:
  • synth_name (str) –

    Name of the synthesizer model to use. Available synthesizers are:

    • “aim”,

    • “mwem”,

    • “dpctgan”, with disabled_dp always forced to False and a warning because its random generator is not cryptographically secure,

    • “patectgan”,

    • “dpgan”, with a warning because its random generator is not cryptographically secure.

    Available under certain conditions:

    • “mst” if return_model=False

    • “pategan” if the dataset has enough rows

    Not available:

    • “pacsynth” due to a Rust panic error

    • “quail”, currently unavailable in SmartNoise Synth

    For further documentation on models, please see here: https://docs.smartnoise.org/synth/index.html#synthesizers-reference

  • epsilon (float) – Privacy parameter (e.g., 0.1).

  • delta (float) – Privacy parameter (e.g., 1e-5).

  • select_cols (List[str]) – List of columns to select. Defaults to [].

  • synth_params (dict) – Keyword arguments to pass to the synthesizer constructor. See https://docs.smartnoise.org/synth/synthesizers/index.html# and provide all parameters of the model except epsilon and delta. Defaults to {}.

  • nullable (bool) – True if some data cells may be null. Defaults to True.

  • constraints (dict) – Dictionary of custom table transformer constraints. Columns that are not specified will be inferred based on metadata. Defaults to {}. For further documentation on constraints, please see here: https://docs.smartnoise.org/synth/transforms/index.html. Note: lambda functions in AnonymizationTransformer are not supported.

  • return_model (bool) – True to get the synthesizer model, False to get samples. Defaults to False.

  • condition (Optional[str]) – Sampling condition passed to model.sample (only relevant if return_model is False). Defaults to “”.

  • nb_samples (Optional[int]) – Number of samples to generate (only relevant if return_model is False). Defaults to SNSYNTH_DEFAULT_SYMPLES_NB (200).

  • dummy (bool, optional) – Whether to use a dummy dataset. Defaults to False.

  • nb_rows (int, optional) – The number of rows in the dummy dataset. Defaults to DUMMY_NB_ROWS (100).

  • seed (int, optional) – The random seed for generating the dummy dataset. Defaults to DUMMY_SEED (42).

Returns:

A dictionary containing the query results.

Return type:

Optional[dict]

class lomas_client.client.DPLibraries(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

Enum of the DP libraries used in the server. WARNING: MUST match those of lomas_server.

DIFFPRIVLIB = 'diffprivlib'
OPENDP = 'opendp'
SMARTNOISE_SQL = 'smartnoise_sql'
SMARTNOISE_SYNTH = 'smartnoise_synth'

lomas_client.client.error_message(res: Response) → str

Generates an error message based on the HTTP response.

Parameters:

res (requests.Response) – The response object from an HTTP request.

Returns:

A formatted string describing the server error, including the status code and response text.

Return type:

str
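The behaviour described above can be sketched as follows. The exact wording produced by error_message is not given in this reference, so the format string below is an assumption, and FakeResponse is a minimal stand-in exposing only the status_code and text attributes of requests.Response so that the example runs without a network call.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Minimal stand-in for the two requests.Response attributes used below."""

    status_code: int
    text: str


def format_server_error(res) -> str:
    """Hypothetical formatter; the real error_message wording may differ."""
    return f"Server error status {res.status_code}: {res.text}"


message = format_server_error(FakeResponse(status_code=422, text="Invalid query"))
```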

lomas_client.utils module

class lomas_client.utils.SSynthGanSynthesizer(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

GAN synthesizer models for SmartNoise Synth.

DP_CTGAN = 'dpctgan'
DP_GAN = 'dpgan'
PATE_CTGAN = 'patectgan'
PATE_GAN = 'pategan'

class lomas_client.utils.SSynthMarginalSynthesizer(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

Marginal synthesizer models for SmartNoise Synth.

AIM = 'aim'
MST = 'mst'
MWEM = 'mwem'
PAC_SYNTH = 'pacsynth'

lomas_client.utils.validate_synthesizer(synth_name: str, return_model: bool = False)

Validate a SmartNoise synthesizer (some models are not accepted).

Parameters:
  • synth_name (str) – Name of the synthesizer model to use.

  • return_model (bool) – True to get the synthesizer model, False to get samples.

Raises:

ValueError – if a synthesizer or its parameters are not valid
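The availability rules listed for synth_name can be written down as a small check. The sketch below is not the library's implementation of validate_synthesizer: the function and class names are hypothetical, the enums are local copies of the documented values (with a str mixin instead of StrEnum), and the row-count condition for “pategan” is left out because its threshold is not stated here.

```python
from enum import Enum


class MarginalSynth(str, Enum):
    AIM = "aim"
    MST = "mst"
    MWEM = "mwem"
    PAC_SYNTH = "pacsynth"


class GanSynth(str, Enum):
    DP_CTGAN = "dpctgan"
    DP_GAN = "dpgan"
    PATE_CTGAN = "patectgan"
    PATE_GAN = "pategan"


def check_synthesizer(synth_name: str, return_model: bool = False) -> None:
    """Hypothetical check following the documented availability rules."""
    known = {m.value for m in MarginalSynth} | {g.value for g in GanSynth}
    if synth_name not in known:
        # Covers "quail", which appears in neither enum.
        raise ValueError(f"Unknown synthesizer: {synth_name}")
    if synth_name == MarginalSynth.PAC_SYNTH.value:
        raise ValueError("pacsynth is not available")
    if synth_name == MarginalSynth.MST.value and return_model:
        raise ValueError("mst is only available with return_model=False")


check_synthesizer("aim")
check_synthesizer("mst", return_model=False)
```

Running such a check on the client side fails fast, before a request that the server would reject is sent.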

Module contents