lomas_server.dp_queries.dp_libraries package

Submodules

lomas_server.dp_queries.dp_libraries.diffprivlib module

class lomas_server.dp_queries.dp_libraries.diffprivlib.DiffPrivLibQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]

Bases: DPQuerier[DiffPrivLibRequestModel, DiffPrivLibQueryModel]

Concrete implementation of the DPQuerier ABC for the DiffPrivLib library.

cost(query_json: DiffPrivLibRequestModel) tuple[float, float][source]

Estimate the cost of the query.

Parameters:

query_json (DiffPrivLibRequestModel) – The request model object.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]
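
The (epsilon, delta) tuple returned by cost is the kind of value a server can compare against a remaining privacy budget before running a query. The following is a minimal sketch of such a check, assuming basic sequential composition (costs simply add up). The Budget class and the can_afford and spend helpers are hypothetical names for illustration and are not part of lomas_server.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical remaining privacy budget for one user and dataset."""

    epsilon: float
    delta: float


def can_afford(budget: Budget, cost: tuple[float, float]) -> bool:
    """Return True if the (epsilon, delta) cost fits in the remaining budget."""
    eps_cost, delta_cost = cost
    return eps_cost <= budget.epsilon and delta_cost <= budget.delta


def spend(budget: Budget, cost: tuple[float, float]) -> Budget:
    """Subtract a cost from the budget (assumes basic sequential composition)."""
    if not can_afford(budget, cost):
        raise ValueError("query cost exceeds remaining budget")
    eps_cost, delta_cost = cost
    return Budget(budget.epsilon - eps_cost, budget.delta - delta_cost)


# Example: a query costing epsilon=0.25 and delta=0 against a budget of (1.0, 1e-5).
remaining = spend(Budget(epsilon=1.0, delta=1e-5), (0.25, 0.0))
```

Keeping the check separate from the deduction lets a caller reject a query before any data is touched.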

fit_model_on_data(query_json: DiffPrivLibRequestModel) tuple[Pipeline, DataFrame, DataFrame][source]

Perform the steps necessary to fit the model on the data.

Parameters:

query_json (DiffPrivLibRequestModel) – The JSON request object for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:
  • dpl_pipeline (dpl model) – the fitted model on the training data

  • x_test (pd.DataFrame) – test data features

  • y_test (pd.DataFrame) – test data target

Return type:

tuple[Pipeline, DataFrame, DataFrame]

query(query_json: DiffPrivLibQueryModel) Dict[source]

Perform the query and return the response.

Parameters:

query_json (DiffPrivLibQueryModel) – The request model object.

Returns:

The dictionary encoding of the resulting pd.DataFrame.

Return type:

dict

lomas_server.dp_queries.dp_libraries.diffprivlib.split_train_test_data(df: DataFrame, query_json: DiffPrivLibRequestModel) tuple[DataFrame, DataFrame, DataFrame, DataFrame][source]

Split the data between train and test set.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • query_json (DiffPrivLibRequestModel) – user input query indication, with the fields:
    feature_columns (list[str]): columns from data to use as features
    target_columns (list[str]): columns from data to use as target (to predict)
    test_size (float): proportion of data in the test set
    test_train_split_seed (int): seed for the random train-test split

Returns:
  • x_train (pd.DataFrame) – training data features

  • x_test (pd.DataFrame) – testing data features

  • y_train (pd.DataFrame) – training data target

  • y_test (pd.DataFrame) – testing data target

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]
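
To illustrate the role of test_size and test_train_split_seed, here is a minimal pure-Python sketch of a seeded train-test split. It is not the lomas_server implementation: the real function operates on a pd.DataFrame, while this sketch uses a list of dicts as a stand-in, and split_rows is a hypothetical name.

```python
import random


def split_rows(rows, feature_columns, target_columns, test_size, seed):
    """Shuffle rows with a fixed seed, then split into train and test parts.

    `rows` is a list of dicts, used here as a stand-in for a DataFrame.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    n_test = round(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]

    def pick(part, columns):
        # Keep only the requested columns of each row.
        return [{c: r[c] for c in columns} for r in part]

    return (
        pick(train, feature_columns),  # x_train
        pick(test, feature_columns),   # x_test
        pick(train, target_columns),   # y_train
        pick(test, target_columns),    # y_test
    )


rows = [{"age": 20 + i, "income": 1000 * i, "label": i % 2} for i in range(10)]
x_train, x_test, y_train, y_test = split_rows(
    rows, ["age", "income"], ["label"], test_size=0.2, seed=42
)
```

Fixing the seed makes the split reproducible, so repeated calls with the same inputs evaluate a model on the same test rows.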

lomas_server.dp_queries.dp_libraries.factory module

lomas_server.dp_queries.dp_libraries.factory.querier_factory(lib: str, data_connector: DataConnector, admin_database: AdminDatabase) DPQuerier[source]

Builds the correct DPQuerier instance.

Parameters:
  • lib (str) – The library to build the querier for. One of DPLibraries.

  • data_connector (DataConnector) – The dataset to query.

  • admin_database (AdminDatabase) – An initialized instance of an AdminDatabase.

Raises:

InternalServerException – If the library is unknown.

Returns:

The built DPQuerier.

Return type:

DPQuerier
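
The factory behaviour described above (map a library name to a querier class, fail on unknown names) can be sketched as follows. Everything in this block is a stand-in: the enum values, the stub classes, and the use of RuntimeError in place of InternalServerException are assumptions made for illustration, not the lomas_server code.

```python
from enum import Enum


class DPLibrariesSketch(str, Enum):
    """Assumed library identifiers; the real DPLibraries values may differ."""

    OPENDP = "opendp"
    DIFFPRIVLIB = "diffprivlib"


class QuerierStub:
    """Stand-in for a DPQuerier: only stores its two dependencies."""

    def __init__(self, data_connector, admin_database):
        self.data_connector = data_connector
        self.admin_database = admin_database


class OpenDPQuerierStub(QuerierStub):
    pass


class DiffPrivLibQuerierStub(QuerierStub):
    pass


# Dispatch table from library identifier to querier class.
_QUERIERS = {
    DPLibrariesSketch.OPENDP: OpenDPQuerierStub,
    DPLibrariesSketch.DIFFPRIVLIB: DiffPrivLibQuerierStub,
}


def querier_factory_sketch(lib, data_connector, admin_database):
    """Build the querier matching `lib`, or raise if the library is unknown."""
    try:
        querier_cls = _QUERIERS[DPLibrariesSketch(lib)]
    except ValueError as exc:
        # RuntimeError stands in for InternalServerException here.
        raise RuntimeError(f"Unknown library: {lib}") from exc
    return querier_cls(data_connector, admin_database)


querier = querier_factory_sketch("opendp", data_connector=None, admin_database=None)
```

A dispatch table keeps the mapping in one place, so supporting another library means adding one entry rather than another branch.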

lomas_server.dp_queries.dp_libraries.opendp module

class lomas_server.dp_queries.dp_libraries.opendp.OpenDPQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]

Bases: DPQuerier[OpenDPRequestModel, OpenDPQueryModel]

Concrete implementation of the DPQuerier ABC for the OpenDP library.

cost(query_json: OpenDPRequestModel) tuple[float, float][source]

Estimate the cost of the query.

Parameters:

query_json (OpenDPRequestModel) – The request model object.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]

query(query_json: OpenDPQueryModel) List | int | float[source]

Perform the query and return the response.

Parameters:

query_json (OpenDPQueryModel) – The input model for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The query result.

Return type:

List | int | float

lomas_server.dp_queries.dp_libraries.opendp.get_output_measure(opendp_pipe: Measurement) str[source]

Get output measure type.

Parameters:

opendp_pipe (dp.Measurement) – Pipeline to get measure type.

Raises:

InternalServerException – If the measure type is unknown.

Returns:

One of OpenDPMeasurement.

Return type:

str

lomas_server.dp_queries.dp_libraries.opendp.has_dataset_input_metric(pipeline: Measurement) None[source]

Check that the input metric of the pipeline is a dataset metric.

Parameters:

pipeline (dp.Measurement) – The pipeline to check.

Raises:

InvalidQueryException – If the pipeline input metric is not a dataset input metric.

lomas_server.dp_queries.dp_libraries.opendp.is_measurement(pipeline: Measurement) None[source]

Check if the pipeline is a measurement.

Parameters:

pipeline (dp.Measurement) – The measurement to check.

Raises:

InvalidQueryException – If the pipeline is not a measurement.

lomas_server.dp_queries.dp_libraries.opendp.reconstruct_measurement_pipeline(pipeline: str) Measurement[source]

Reconstruct an OpenDP pipeline from its JSON representation.

Parameters:

pipeline (str) – The JSON string encoding of the pipeline.

Raises:

InvalidQueryException – If the pipeline is not a measurement.

Returns:

The reconstructed pipeline.

Return type:

dp.Measurement

lomas_server.dp_queries.dp_libraries.opendp.set_opendp_features_config(opendp_config: OpenDPConfig)[source]

Enable OpenDP features based on the config. See https://github.com/opendp/opendp/discussions/304

Parameters:

opendp_config (OpenDPConfig) – OpenDP configurations

lomas_server.dp_queries.dp_libraries.smartnoise_sql module

class lomas_server.dp_queries.dp_libraries.smartnoise_sql.SmartnoiseSQLQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]

Bases: DPQuerier[SmartnoiseSQLRequestModel, SmartnoiseSQLQueryModel]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSQL library.

cost(query_json: SmartnoiseSQLRequestModel) tuple[float, float][source]

Estimate the cost of the query.

Parameters:

query_json (SmartnoiseSQLRequestModel) – JSON request object for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]

query(query_json: SmartnoiseSQLQueryModel) dict[source]

Performs the query and returns the response.

Parameters:

query_json (SmartnoiseSQLQueryModel) – The request model object.

Returns:

The dictionary encoding of the result pd.DataFrame.

Return type:

dict

query_with_iter(query_json: SmartnoiseSQLQueryModel, nb_iter: int = 0) dict[source]

Perform the query and return the response.

Parameters:
  • query_json (SmartnoiseSQLQueryModel) – Request object for the query.

  • nb_iter (int, optional) – Number of trials if output is NaN. Defaults to 0.

Returns:

The dictionary encoding of the resulting pd.DataFrame.

Return type:

dict
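
The nb_iter parameter suggests a retry loop that re-runs the query while its output is NaN. The sketch below shows one way such a loop could look. It is an assumption about the general pattern, not the lomas_server implementation, which is not shown on this page; query_with_retries and run_query are hypothetical names, and how repeated attempts interact with the privacy budget is not covered here.

```python
import math


def query_with_retries(run_query, nb_iter=0):
    """Call run_query(); retry up to nb_iter more times while the result is NaN."""
    result = run_query()
    attempts_left = nb_iter
    while isinstance(result, float) and math.isnan(result) and attempts_left > 0:
        result = run_query()
        attempts_left -= 1
    return result


# Simulated query that yields NaN twice before producing a usable value.
outputs = iter([float("nan"), float("nan"), 3.5])
value = query_with_retries(lambda: next(outputs), nb_iter=5)
```

With nb_iter left at its default of 0, the first result is returned as-is, NaN or not.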

lomas_server.dp_queries.dp_libraries.smartnoise_sql.convert_to_smartnoise_metadata(metadata: Metadata) dict[source]

Convert Lomas metadata to smartnoise metadata format (for SQL).

Parameters:

metadata (Metadata) – Dataset metadata from admin database.

Returns:

metadata of the dataset in smartnoise-sql format

Return type:

dict

lomas_server.dp_queries.dp_libraries.smartnoise_sql.set_mechanisms(privacy: Privacy, mechanisms: dict[str, str]) Privacy[source]

Set privacy mechanisms on the Privacy object.

For more information see: https://docs.smartnoise.org/sql/advanced.html#overriding-mechanisms

Parameters:
  • privacy (Privacy) – Privacy object.

  • mechanisms (dict[str, str]) – Mechanisms to set.

Returns:

The updated Privacy object.

Return type:

Privacy

lomas_server.dp_queries.dp_libraries.smartnoise_synth module

class lomas_server.dp_queries.dp_libraries.smartnoise_synth.SmartnoiseSynthQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]

Bases: DPQuerier[SmartnoiseSynthRequestModel, SmartnoiseSynthQueryModel]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSynth library.

cost(query_json: SmartnoiseSynthRequestModel) tuple[float, float][source]

Return the cost of the query.

Parameters:

query_json (SmartnoiseSynthRequestModel) – JSON request object for the query.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta value.

Return type:

tuple[float, float]

# TODO: verify and model.rho

query(query_json: SmartnoiseSynthQueryModel) DataFrame | str[source]

Perform the query and return the response.

Parameters:

query_json (SmartnoiseSynthQueryModel) – The request object for the query.

Returns:

The resulting pd.DataFrame samples.

Return type:

pd.DataFrame

lomas_server.dp_queries.dp_libraries.smartnoise_synth.datetime_to_float(upper: datetime, lower: datetime) float[source]

Convert the upper date to a float, expressed as the distance between the upper date and the lower date.

Parameters:
  • upper (datetime) – date to convert

  • lower (datetime) – start date to convert from
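
A minimal sketch of turning a date into a float distance from a reference date is shown below. The unit is an assumption: this page does not state which unit datetime_to_float uses, and the sketch uses seconds. The name datetime_distance_seconds is hypothetical.

```python
from datetime import datetime


def datetime_distance_seconds(upper: datetime, lower: datetime) -> float:
    """Return the distance from `lower` to `upper` in seconds (assumed unit)."""
    return (upper - lower).total_seconds()


# One day apart.
span = datetime_distance_seconds(datetime(2024, 1, 2), datetime(2024, 1, 1))
```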

lomas_server.dp_queries.dp_libraries.utils module

lomas_server.dp_queries.dp_libraries.utils.handle_missing_data(df: DataFrame, imputer_strategy: str) DataFrame[source]

Impute missing data based on the given imputation strategy for NaNs.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • imputer_strategy (str) – string to indicate the imputation strategy for NaNs:
    “drop”: will drop all rows with missing values
    “mean”: will replace values by the mean of the column values
    “median”: will replace values by the median of the column values
    “most_frequent”: will replace values by the most frequent values

Raises:

InvalidQueryException – If the “imputer_strategy” does not exist

Returns:

The dataframe with the imputed data.

Return type:

pd.DataFrame
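
The four strategies can be illustrated on a single column with the standard library. This is a simplified sketch, not the lomas_server code: the real function works on a whole pd.DataFrame (and "drop" removes entire rows), None stands in for NaN, ValueError stands in for InvalidQueryException, and impute_column is a hypothetical name.

```python
import statistics


def impute_column(values, imputer_strategy):
    """Impute None entries in one column; "drop" removes them instead."""
    present = [v for v in values if v is not None]
    if imputer_strategy == "drop":
        return present
    fillers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }
    if imputer_strategy not in fillers:
        # Stand-in for InvalidQueryException.
        raise ValueError(f"Unknown imputer strategy: {imputer_strategy}")
    fill = fillers[imputer_strategy](present)
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 3.0]
median_filled = impute_column(column, "median")
dropped = impute_column(column, "drop")
```

The fill value is computed from the observed entries only, so the missing entries themselves never influence the statistic.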

lomas_server.dp_queries.dp_libraries.utils.serialise_model(model: Any) str[source]

Serialise a Python object (a fitted Smartnoise Synth synthesizer or a fitted DiffPrivLib pipeline) into a UTF-8 string.

Parameters:

model (Any) – An object to serialise

Returns:

string of serialised model

Return type:

str
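
One common way to turn an arbitrary Python object into a UTF-8 string is to pickle it and base64-encode the bytes. The sketch below shows that approach. Whether serialise_model uses this exact encoding is an assumption, since this page does not specify it; serialise_sketch and deserialise_sketch are hypothetical names.

```python
import base64
import pickle


def serialise_sketch(model) -> str:
    """Pickle an object and encode the bytes as a base64 UTF-8 string."""
    return base64.b64encode(pickle.dumps(model)).decode("utf-8")


def deserialise_sketch(serialised: str):
    """Inverse of serialise_sketch. Only use on data from a trusted source."""
    return pickle.loads(base64.b64decode(serialised.encode("utf-8")))


# Round trip with a plain dict standing in for a fitted model.
encoded = serialise_sketch({"coef": [0.1, 0.2], "intercept": 1.5})
decoded = deserialise_sketch(encoded)
```

Base64 keeps the payload safe to embed in JSON responses, at the cost of roughly a third more bytes than the raw pickle.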

Module contents