lomas_server.dp_queries.dp_libraries package

Submodules

lomas_server.dp_queries.dp_libraries.diffprivlib module

class lomas_server.dp_queries.dp_libraries.diffprivlib.DiffPrivLibQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[DiffPrivLibRequestModel, DiffPrivLibQueryModel, DiffPrivLibQueryResult]

Concrete implementation of the DPQuerier ABC for the DiffPrivLib library.

cost(query_json: DiffPrivLibRequestModel) → tuple[float, float]

Estimate cost of query.

Parameters:

query_json (DiffPrivLibRequestModel) – The request model object.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]
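As a hedged illustration of how a caller might consume the returned tuple, the sketch below unpacks an (epsilon, delta) pair and compares it with a remaining budget. The helper fits_in_budget and its parameters remaining_epsilon and remaining_delta are hypothetical names introduced here; they are not part of lomas_server.

```python
def fits_in_budget(
    cost: tuple[float, float],
    remaining_epsilon: float,
    remaining_delta: float,
) -> bool:
    """Return True if an (epsilon, delta) cost fits in the remaining budget.

    Hypothetical helper for illustration only, not a lomas_server function.
    """
    # The first value is the epsilon cost, the second is the delta value.
    epsilon_cost, delta_cost = cost
    return epsilon_cost <= remaining_epsilon and delta_cost <= remaining_delta


# Example values are made up for the sketch.
estimated_cost = (0.5, 1e-5)
print(fits_in_budget(estimated_cost, remaining_epsilon=1.0, remaining_delta=1e-4))
```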

fit_model_on_data(query_json: DiffPrivLibRequestModel) → tuple[Pipeline, DataFrame, DataFrame]

Perform necessary steps to fit the model on the data.

Parameters:

query_json (DiffPrivLibRequestModel) – The JSON request object for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

  • dpl_pipeline (dpl model) – the model fitted on the training data

  • x_test (pd.DataFrame) – test data features

  • y_test (pd.DataFrame) – test data target

Return type:

tuple[Pipeline, DataFrame, DataFrame]

query(query_json: DiffPrivLibQueryModel) → DiffPrivLibQueryResult

Perform the query and return the response.

Parameters:

query_json (DiffPrivLibQueryModel) – The request model object.

Returns:

The dictionary encoding of the resulting pd.DataFrame.

Return type:

DiffPrivLibQueryResult

lomas_server.dp_queries.dp_libraries.diffprivlib.split_train_test_data(df: DataFrame, query_json: DiffPrivLibRequestModel) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Split the data between train and test set.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • query_json (DiffPrivLibRequestModel) – user input query indicating:

      • feature_columns (list[str]) – columns from data to use as features

      • target_columns (list[str]) – columns from data to use as target (to predict)

      • test_size (float) – proportion of data in the test set

      • test_train_split_seed (int) – seed for the random train-test split

Returns:

  • x_train (pd.DataFrame) – training data features

  • x_test (pd.DataFrame) – testing data features

  • y_train (pd.DataFrame) – training data target

  • y_test (pd.DataFrame) – testing data target

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]
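The split described above can be sketched with pandas alone. This is a minimal stand-in, not the library's implementation: the real function reads feature_columns, target_columns, test_size and test_train_split_seed from query_json and may rely on a different splitting routine. The name split_sketch and the example data are assumptions made for illustration.

```python
import pandas as pd


def split_sketch(
    df: pd.DataFrame,
    feature_columns: list[str],
    target_columns: list[str],
    test_size: float,
    seed: int,
) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical pandas-only train/test split (illustration only)."""
    # Draw the test rows reproducibly; the remaining rows form the training set.
    test = df.sample(frac=test_size, random_state=seed)
    train = df.drop(test.index)
    # Same order as documented: x_train, x_test, y_train, y_test.
    return (
        train[feature_columns],
        test[feature_columns],
        train[target_columns],
        test[target_columns],
    )


# Made-up example data.
example_df = pd.DataFrame({"a": range(10), "b": range(10, 20), "y": [0, 1] * 5})
x_train, x_test, y_train, y_test = split_sketch(
    example_df, ["a", "b"], ["y"], test_size=0.2, seed=42
)
print(len(x_train), len(x_test))
```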

lomas_server.dp_queries.dp_libraries.factory module

lomas_server.dp_queries.dp_libraries.opendp module

lomas_server.dp_queries.dp_libraries.smartnoise_sql module

class lomas_server.dp_queries.dp_libraries.smartnoise_sql.SmartnoiseSQLQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[SmartnoiseSQLRequestModel, SmartnoiseSQLQueryModel, SmartnoiseSQLQueryResult]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSQL library.

cost(query_json: SmartnoiseSQLRequestModel) → tuple[float, float]

Estimate cost of query.

Parameters:

query_json (SmartnoiseSQLRequestModel) – JSON request object for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]

query(query_json: SmartnoiseSQLQueryModel) → SmartnoiseSQLQueryResult

Performs the query and returns the response.

Parameters:

query_json (SmartnoiseSQLQueryModel) – The request model object.

Returns:

The dictionary encoding of the result pd.DataFrame.

Return type:

SmartnoiseSQLQueryResult

query_with_iter(query_json: SmartnoiseSQLQueryModel, nb_iter: int = 0) → SmartnoiseSQLQueryResult

Perform the query and return the response.

Parameters:
  • query_json (SmartnoiseSQLQueryModel) – Request object for the query.

  • nb_iter (int, optional) – Number of trials if output is NaN. Defaults to 0.

Returns:

The dictionary encoding of the resulting pd.DataFrame.

Return type:

SmartnoiseSQLQueryResult
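One plausible reading of nb_iter is a counter of attempts already made when the noisy result contains NaN. The sketch below shows that retry pattern in isolation. It is an assumption about the behaviour, not the library's code: run_with_nan_retry, MAX_NAN_RETRIES and the list-of-floats result are all hypothetical stand-ins.

```python
import math
from typing import Callable

MAX_NAN_RETRIES = 5  # hypothetical limit, not taken from lomas_server


def run_with_nan_retry(
    run_query: Callable[[], list[float]], nb_iter: int = 0
) -> list[float]:
    """Re-run a query while its result contains NaN (illustration only)."""
    result = run_query()
    if any(math.isnan(value) for value in result):
        # Give up once the assumed retry limit is reached.
        if nb_iter >= MAX_NAN_RETRIES:
            raise RuntimeError("result still contains NaN after retries")
        # Otherwise try again, counting this attempt.
        return run_with_nan_retry(run_query, nb_iter + 1)
    return result


# Made-up query that yields NaN twice before returning a usable value.
attempts = iter([[float("nan")], [float("nan")], [3.0]])
print(run_with_nan_retry(lambda: next(attempts)))
```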

lomas_server.dp_queries.dp_libraries.smartnoise_sql.convert_to_smartnoise_metadata(metadata: Metadata) → dict

Convert Lomas metadata to smartnoise metadata format (for SQL).

Parameters:

metadata (Metadata) – Dataset metadata from admin database

Returns:

metadata of the dataset in smartnoise-sql format

Return type:

dict

lomas_server.dp_queries.dp_libraries.smartnoise_sql.set_mechanisms(privacy: Privacy, mechanisms: dict[str, str]) → Privacy

Set privacy mechanisms on the Privacy object.

For more information see: https://docs.smartnoise.org/sql/advanced.html#overriding-mechanisms

Parameters:
  • privacy (Privacy) – Privacy object.

  • mechanisms (dict[str, str]) – Mechanisms to set.

Returns:

The updated Privacy object.

Return type:

Privacy

lomas_server.dp_queries.dp_libraries.smartnoise_synth module

class lomas_server.dp_queries.dp_libraries.smartnoise_synth.SmartnoiseSynthQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[SmartnoiseSynthRequestModel, SmartnoiseSynthQueryModel, SmartnoiseSynthSamples | SmartnoiseSynthModel]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSynth library.

cost(query_json: SmartnoiseSynthRequestModel) → tuple[float, float]

Return cost of query_json.

Parameters:

query_json (SmartnoiseSynthRequestModel) – JSON request object for the query.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]

# TODO: verify and model.rho

query(query_json: SmartnoiseSynthQueryModel) → SmartnoiseSynthSamples | SmartnoiseSynthModel

Perform the query and return the response.

Parameters:

query_json (SmartnoiseSynthQueryModel) – The request object for the query.

Returns:

The resulting pd.DataFrame samples.

Return type:

SmartnoiseSynthSamples | SmartnoiseSynthModel

lomas_server.dp_queries.dp_libraries.smartnoise_synth.datetime_to_float(upper: datetime, lower: datetime) → float

Convert the upper date to a float: the distance between the upper date and the lower date.

Parameters:
  • upper (datetime) – date to convert

  • lower (datetime) – start date to convert from
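The conversion can be sketched with the standard library. The unit of the returned distance is not stated above, so seconds are an assumption here, and datetime_to_float_sketch is a hypothetical name rather than the library function.

```python
from datetime import datetime


def datetime_to_float_sketch(upper: datetime, lower: datetime) -> float:
    """Distance between two datetimes as a float (illustration only)."""
    # Assumption: the distance is expressed in seconds.
    return float((upper - lower).total_seconds())


# One day apart.
print(datetime_to_float_sketch(datetime(2020, 1, 2), datetime(2020, 1, 1)))  # 86400.0
```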

lomas_server.dp_queries.dp_libraries.utils module

lomas_server.dp_queries.dp_libraries.utils.handle_missing_data(df: DataFrame, imputer_strategy: str) → DataFrame

Impute missing data based on given imputation strategy for NaNs.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • imputer_strategy (str) – string indicating the imputation strategy for NaNs:

      • “drop” – drop all rows with missing values

      • “mean” – replace missing values with the mean of the column values

      • “median” – replace missing values with the median of the column values

      • “most_frequent” – replace missing values with the most frequent value of the column

Raises:

InvalidQueryException – If the “imputer_strategy” does not exist

Returns:

dataframe with the imputed data

Return type:

pd.DataFrame
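The four strategies can be sketched with pandas. This is an illustrative stand-in, not the library's implementation, which may use a different imputation routine. The name impute_sketch is hypothetical, and ValueError stands in for InvalidQueryException so that the example stays self-contained.

```python
import pandas as pd


def impute_sketch(df: pd.DataFrame, imputer_strategy: str) -> pd.DataFrame:
    """Impute NaNs according to the named strategy (illustration only)."""
    if imputer_strategy == "drop":
        # Remove every row that has at least one missing value.
        return df.dropna()
    if imputer_strategy == "mean":
        # Only numeric columns have a mean; other columns are left untouched.
        return df.fillna(df.mean(numeric_only=True))
    if imputer_strategy == "median":
        return df.fillna(df.median(numeric_only=True))
    if imputer_strategy == "most_frequent":
        # mode() may return several rows on ties; take the first one.
        return df.fillna(df.mode().iloc[0])
    # Stand-in for InvalidQueryException.
    raise ValueError(f"unknown imputer_strategy: {imputer_strategy!r}")


# Made-up example data with one missing value per column.
example = pd.DataFrame({"x": [1.0, None, 3.0], "c": ["a", "a", None]})
print(impute_sketch(example, "mean"))
```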

Module contents