lomas_server.dp_queries.dp_libraries package
Submodules
lomas_server.dp_queries.dp_libraries.diffprivlib module
- class lomas_server.dp_queries.dp_libraries.diffprivlib.DiffPrivLibQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]
Bases: DPQuerier[DiffPrivLibRequestModel, DiffPrivLibQueryModel]
Concrete implementation of the DPQuerier ABC for the DiffPrivLib library.
- cost(query_json: DiffPrivLibRequestModel) tuple[float, float] [source]
Estimate cost of query
- Parameters:
query_json (DiffPrivLibRequestModel) – The request model object.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
- Returns:
The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.
- Return type:
tuple[float, float]
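The (epsilon, delta) tuple returned by cost can be compared with a remaining budget before a query is run. The following is a minimal sketch that assumes simple additive accounting; fits_budget and its parameters are hypothetical and not part of lomas_server.

```python
def fits_budget(cost: tuple[float, float], remaining: tuple[float, float]) -> bool:
    """Return True if both the epsilon and delta costs fit in the remaining budget."""
    epsilon_cost, delta_cost = cost
    epsilon_left, delta_left = remaining
    # Both components must fit; exceeding either one rejects the query.
    return epsilon_cost <= epsilon_left and delta_cost <= delta_left
```

A caller would pass the tuple from cost as the first argument and reject the query when the function returns False.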
- fit_model_on_data(query_json: DiffPrivLibRequestModel) tuple[Pipeline, DataFrame, DataFrame] [source]
Perform necessary steps to fit the model on the data
- Parameters:
query_json (BaseModel) – The JSON request object for the query.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
- Returns:
dpl_pipeline (dpl model): the fitted model on the training data
x_test (pd.DataFrame): test data features
y_test (pd.DataFrame): test data target
- Return type:
tuple[Pipeline, DataFrame, DataFrame]
- query(query_json: DiffPrivLibQueryModel) Dict [source]
Perform the query and return the response.
- Parameters:
query_json (DiffPrivLibQueryModel) – The request model object.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
InvalidQueryException – If the budget values are too small to perform the query.
- Returns:
The dictionary encoding of the resulting pd.DataFrame.
- Return type:
dict
- lomas_server.dp_queries.dp_libraries.diffprivlib.split_train_test_data(df: DataFrame, query_json: DiffPrivLibRequestModel) tuple[DataFrame, DataFrame, DataFrame, DataFrame] [source]
Split the data between train and test set.
- Parameters:
df (pd.DataFrame) – Dataframe with the data.
query_json (DiffPrivLibRequestModel) – User input query indication, with the fields:
feature_columns (list[str]): columns from data to use as features
target_columns (list[str]): columns from data to use as target (to predict)
test_size (float): proportion of data in the test set
test_train_split_seed (int): seed for the random train-test split
- Returns:
x_train (pd.DataFrame): training data features
x_test (pd.DataFrame): testing data features
y_train (pd.DataFrame): training data target
y_test (pd.DataFrame): testing data target
- Return type:
tuple[DataFrame, DataFrame, DataFrame, DataFrame]
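To illustrate how the test proportion and the seed interact, here is a dependency-free sketch of a seeded train-test split. It works on a list of dicts rather than a pd.DataFrame, and split_rows is a hypothetical name; the real function may delegate to a library routine instead.

```python
import random


def split_rows(rows, feature_columns, target_columns, test_size, seed):
    """Shuffle rows with a fixed seed, then split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    n_test = round(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]

    def pick(part, columns):
        # Keep only the requested columns of each row.
        return [{c: row[c] for c in columns} for row in part]

    # Same order as documented: x_train, x_test, y_train, y_test.
    return (
        pick(train, feature_columns),
        pick(test, feature_columns),
        pick(train, target_columns),
        pick(test, target_columns),
    )
```

Because features and targets are picked from the same shuffled rows, row i of the feature output always corresponds to row i of the target output.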
lomas_server.dp_queries.dp_libraries.factory module
- lomas_server.dp_queries.dp_libraries.factory.querier_factory(lib: str, data_connector: DataConnector, admin_database: AdminDatabase) DPQuerier [source]
Builds the correct DPQuerier instance.
- Parameters:
lib (str) – The library to build the querier for. One of DPLibraries.
data_connector (DataConnector) – The dataset to query.
admin_database (AdminDatabase) – An initialized instance of an AdminDatabase.
- Raises:
InternalServerException – If the library is unknown.
- Returns:
The built DPQuerier.
- Return type:
DPQuerier
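A factory of this kind is commonly written as a lookup from a library name to a querier class. The sketch below assumes that pattern; Library, StubQuerier, OpenDPStub, DiffPrivLibStub and build_querier are hypothetical stand-ins, and RuntimeError is used in place of InternalServerException so the example runs without lomas_server.

```python
from enum import Enum


class Library(str, Enum):
    """Hypothetical stand-in for DPLibraries; the real values may differ."""
    OPENDP = "opendp"
    DIFFPRIVLIB = "diffprivlib"


class StubQuerier:
    """Hypothetical querier that only stores its two dependencies."""

    def __init__(self, data_connector, admin_database):
        self.data_connector = data_connector
        self.admin_database = admin_database


class OpenDPStub(StubQuerier):
    pass


class DiffPrivLibStub(StubQuerier):
    pass


_QUERIERS = {Library.OPENDP: OpenDPStub, Library.DIFFPRIVLIB: DiffPrivLibStub}


def build_querier(lib, data_connector, admin_database):
    """Look up the querier class for `lib` and instantiate it."""
    try:
        querier_cls = _QUERIERS[Library(lib)]
    except ValueError as exc:
        # Library(lib) raises ValueError for an unknown name; the documented
        # factory raises InternalServerException in that case.
        raise RuntimeError(f"Unknown library: {lib}") from exc
    return querier_cls(data_connector, admin_database)
```

Keeping the mapping in one dictionary means that supporting a new library only requires a new enum member and a new entry.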
lomas_server.dp_queries.dp_libraries.opendp module
- class lomas_server.dp_queries.dp_libraries.opendp.OpenDPQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]
Bases: DPQuerier[OpenDPRequestModel, OpenDPQueryModel]
Concrete implementation of the DPQuerier ABC for the OpenDP library.
- cost(query_json: OpenDPRequestModel) tuple[float, float] [source]
Estimate cost of query
- Parameters:
query_json (OpenDPRequestModel) – The request model object.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
InternalServerException – For any other unforeseen exceptions.
InvalidQueryException – The pipeline does not contain a “measurement”, there is not enough budget or the dataset does not exist.
- Returns:
The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.
- Return type:
tuple[float, float]
- query(query_json: OpenDPQueryModel) List | int | float [source]
Perform the query and return the response.
- Parameters:
query_json (OpenDPQueryModel) – The input model for the query.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
- Returns:
The query result.
- Return type:
Union[List, int, float]
- lomas_server.dp_queries.dp_libraries.opendp.get_output_measure(opendp_pipe: Measurement) str [source]
Get output measure type.
- Parameters:
opendp_pipe (dp.Measurement) – Pipeline to get measure type.
- Raises:
InternalServerException – If the measure type is unknown.
- Returns:
One of OpenDPMeasurement.
- Return type:
str
- lomas_server.dp_queries.dp_libraries.opendp.has_dataset_input_metric(pipeline: Measurement) None [source]
Check that the input metric of the pipeline is a dataset metric
- Parameters:
pipeline (dp.Measurement) – The pipeline to check.
- Raises:
InvalidQueryException – If the pipeline input metric is not a dataset input metric.
- lomas_server.dp_queries.dp_libraries.opendp.is_measurement(pipeline: Measurement) None [source]
Check if the pipeline is a measurement.
- Parameters:
pipeline (dp.Measurement) – The measurement to check.
- Raises:
InvalidQueryException – If the pipeline is not a measurement.
- lomas_server.dp_queries.dp_libraries.opendp.reconstruct_measurement_pipeline(pipeline: str) Measurement [source]
Reconstruct OpenDP pipeline from json representation.
- Parameters:
pipeline (str) – The JSON string encoding of the pipeline.
- Raises:
InvalidQueryException – If the pipeline is not a measurement.
- Returns:
The reconstructed pipeline.
- Return type:
dp.Measurement
- lomas_server.dp_queries.dp_libraries.opendp.set_opendp_features_config(opendp_config: OpenDPConfig)[source]
Enable OpenDP features based on the config. See https://github.com/opendp/opendp/discussions/304
- Parameters:
opendp_config (OpenDPConfig) – OpenDP configurations
lomas_server.dp_queries.dp_libraries.smartnoise_sql module
- class lomas_server.dp_queries.dp_libraries.smartnoise_sql.SmartnoiseSQLQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]
Bases: DPQuerier[SmartnoiseSQLRequestModel, SmartnoiseSQLQueryModel]
Concrete implementation of the DPQuerier ABC for the SmartNoiseSQL library.
- cost(query_json: SmartnoiseSQLRequestModel) tuple[float, float] [source]
Estimate cost of query
- Parameters:
query_json (SmartnoiseSQLRequestModel) – JSON request object for the query.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
- Returns:
The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.
- Return type:
tuple[float, float]
- query(query_json: SmartnoiseSQLQueryModel) dict [source]
Performs the query and returns the response.
- Parameters:
query_json (SmartnoiseSQLQueryModel) – The request model object.
- Returns:
The dictionary encoding of the result pd.DataFrame.
- Return type:
dict
- query_with_iter(query_json: SmartnoiseSQLQueryModel, nb_iter: int = 0) dict [source]
Perform the query and return the response.
- Parameters:
query_json (SmartnoiseSQLQueryModel) – Request object for the query.
nb_iter (int, optional) – Number of trials if output is NaN. Defaults to 0.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
InvalidQueryException – If the budget values are too small to perform the query.
- Returns:
The dictionary encoding of the resulting pd.DataFrame.
- Return type:
dict
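The nb_iter parameter suggests that the query is retried when the noisy result contains NaN. A minimal sketch of such a retry loop follows, under stated assumptions: query_until_valid, run_once and MAX_NAN_RETRIES are hypothetical names, the cap of 5 is arbitrary, and ValueError stands in for InvalidQueryException.

```python
import math

MAX_NAN_RETRIES = 5  # hypothetical cap on the number of retries


def query_until_valid(run_once, nb_iter=0):
    """Re-run run_once() while its values contain NaN, counting trials in nb_iter."""
    values = run_once()
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        if nb_iter < MAX_NAN_RETRIES:
            # Try again with the trial counter incremented.
            return query_until_valid(run_once, nb_iter + 1)
        raise ValueError("Result is NaN; the budget is probably too small.")
    return values
```

Passing the counter as an argument keeps the function stateless, so concurrent queries do not share retry state.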
- lomas_server.dp_queries.dp_libraries.smartnoise_sql.convert_to_smartnoise_metadata(metadata: Metadata) dict [source]
Convert Lomas metadata to smartnoise metadata format (for SQL).
- Parameters:
metadata (Metadata) – Dataset metadata from admin database.
- Returns:
metadata of the dataset in smartnoise-sql format
- Return type:
dict
- lomas_server.dp_queries.dp_libraries.smartnoise_sql.set_mechanisms(privacy: Privacy, mechanisms: dict[str, str]) Privacy [source]
Set privacy mechanisms on the Privacy object.
For more information see: https://docs.smartnoise.org/sql/advanced.html#overriding-mechanisms
- Parameters:
privacy (Privacy) – Privacy object.
mechanisms (dict[str, str]) – Mechanisms to set.
- Returns:
The updated Privacy object.
- Return type:
Privacy
lomas_server.dp_queries.dp_libraries.smartnoise_synth module
- class lomas_server.dp_queries.dp_libraries.smartnoise_synth.SmartnoiseSynthQuerier(data_connector: DataConnector, admin_database: AdminDatabase)[source]
Bases:
DPQuerier[SmartnoiseSynthRequestModel, SmartnoiseSynthQueryModel]
Concrete implementation of the DPQuerier ABC for the SmartNoiseSynth library.
- cost(query_json: SmartnoiseSynthRequestModel) tuple[float, float] [source]
Return cost of query_json
- Parameters:
query_json (SmartnoiseSynthRequestModel) – JSON request object for the query.
- Returns:
The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.
- Return type:
tuple[float, float]
# TODO: verify and model.rho
- query(query_json: SmartnoiseSynthQueryModel) DataFrame | str [source]
Perform the query and return the response.
- Parameters:
query_json (SmartnoiseSynthQueryModel) – The request object for the query.
- Raises:
ExternalLibraryException – For exceptions from libraries external to this package.
InvalidQueryException – If the budget values are too small to perform the query.
- Returns:
The resulting pd.DataFrame samples.
- Return type:
pd.DataFrame
lomas_server.dp_queries.dp_libraries.utils module
- lomas_server.dp_queries.dp_libraries.utils.handle_missing_data(df: DataFrame, imputer_strategy: str) DataFrame [source]
Impute missing data based on the given imputation strategy for NaNs.
- Parameters:
df (pd.DataFrame) – Dataframe with the data.
imputer_strategy (str) – String indicating the imputation strategy for NaNs:
“drop”: will drop all rows with missing values
“mean”: will replace values by the mean of the column values
“median”: will replace values by the median of the column values
“most_frequent”: will replace values by the most frequent values
- Raises:
InvalidQueryException – If the “imputer_strategy” does not exist
- Returns:
df (pd.DataFrame): dataframe with the imputed data
- Return type:
pd.DataFrame
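The four strategies can be illustrated on a single column without any third-party dependency. This is a sketch only: impute_column is a hypothetical name, None stands in for NaN, a plain list stands in for a DataFrame column, and ValueError replaces InvalidQueryException.

```python
import statistics


def impute_column(values, strategy):
    """Apply one imputation strategy to a list in which None marks a missing value."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        # For a single column, dropping rows means dropping the missing entries.
        return present
    fillers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }
    if strategy not in fillers:
        raise ValueError(f"Unknown imputer strategy: {strategy}")
    fill = fillers[strategy](present)
    return [fill if v is None else v for v in values]
```

Note that "drop" changes the number of rows while the other strategies preserve it, which matters when features and targets must stay aligned.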
- lomas_server.dp_queries.dp_libraries.utils.serialise_model(model: Any) str [source]
Serialise a Python object (fitted Smartnoise Synth synthesizer or fitted DiffPrivLib pipeline) into a UTF-8 string.
- Parameters:
model (Any) – An object to serialise
- Returns:
string of serialised model
- Return type:
str
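This page does not state which encoding serialise_model uses. One common way to turn an arbitrary Python object into a UTF-8 string is to pickle it and base64-encode the bytes; the sketch below assumes that approach, and to_utf8_string and from_utf8_string are hypothetical names.

```python
import base64
import pickle


def to_utf8_string(obj):
    """Pickle obj and return the bytes as a base64 text string."""
    return base64.b64encode(pickle.dumps(obj)).decode("utf-8")


def from_utf8_string(text):
    """Inverse of to_utf8_string; only use on data from a trusted source."""
    return pickle.loads(base64.b64decode(text))
```

Base64 keeps the payload safe to embed in JSON responses, at the cost of roughly a third more bytes. Unpickling can execute arbitrary code, so the receiving side should only deserialise strings that come from a trusted server.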