lomas_server.dp_queries.dp_libraries package

Submodules

lomas_server.dp_queries.dp_libraries.diffprivlib module

class lomas_server.dp_queries.dp_libraries.diffprivlib.DiffPrivLibQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[DiffPrivLibRequestModel, DiffPrivLibQueryModel, DiffPrivLibQueryResult]

Concrete implementation of the DPQuerier ABC for the DiffPrivLib library.

complete_pipeline(feature_columns: list[str], target_columns: list[str] | None) → None

Finalize the DiffPrivLib pipeline by injecting accountant and privacy constraints.

Steps:
  1. Attach the shared budget accountant to all compatible steps.

  2. Add metadata-driven privacy constraints (data_norm, bounds, bounds_X, bounds_y) to the first pipeline step when supported.

Parameters:
  • feature_columns – List of feature columns used for training.

  • target_columns – Optional list of target columns (required if bounds_y is needed).
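The two steps can be sketched as follows. This is a minimal illustration, not the server's implementation: Accountant, Scaler and Classifier are hypothetical stand-ins for DiffPrivLib objects, and treating a step as "compatible" when its constructor accepts the parameter is an assumption.

```python
import inspect
from dataclasses import dataclass
from typing import Any


class Accountant:
    """Hypothetical stand-in for a shared budget accountant."""


@dataclass
class Scaler:
    # Hypothetical pipeline step that accepts an accountant and bounds.
    accountant: Any = None
    bounds: Any = None


@dataclass
class Classifier:
    # Hypothetical pipeline step that accepts an accountant and a data norm.
    accountant: Any = None
    data_norm: Any = None


def complete_pipeline_sketch(steps, accountant, constraints):
    """Attach the accountant to every compatible step, then set the
    metadata-driven constraints that the first step supports."""
    # Step 1: share one accountant across all steps that accept it.
    for step in steps:
        if "accountant" in inspect.signature(type(step)).parameters:
            step.accountant = accountant
    # Step 2: only the first step receives the privacy constraints.
    first = steps[0]
    supported = inspect.signature(type(first)).parameters
    for name, value in constraints.items():
        if name in supported:
            setattr(first, name, value)
    return steps
```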

Raises:

cost(query_json: DiffPrivLibRequestModel) → tuple[float, float]

Estimate the privacy budget cost of running a DiffPrivLib query.

Steps:
  1. Fit the model on the dataset (including accountant injection).

  2. Retrieve the total budget consumed from the accountant.

Parameters:

query_json – The request object describing the query (features, targets, pipeline JSON).

Raises:

ExternalLibraryException – If the pipeline fitting fails.

Returns:

A tuple of (epsilon, delta) costs.
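A minimal sketch of the fit-then-read-back pattern, assuming a hypothetical SpendAccountant that sums spends under basic sequential composition. DiffPrivLib's own BudgetAccountant may compose budgets differently, so the totals here are only illustrative.

```python
class SpendAccountant:
    """Hypothetical accountant that records each (epsilon, delta) spend."""

    def __init__(self):
        self.spends = []

    def spend(self, epsilon, delta):
        self.spends.append((epsilon, delta))

    def total(self):
        # Basic sequential composition: epsilons and deltas add up.
        return (sum(e for e, _ in self.spends), sum(d for _, d in self.spends))


def estimate_cost(fit, accountant):
    """Step 1: fit with the accountant attached. Step 2: read back the total spend."""
    fit(accountant)
    return accountant.total()
```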

fit_model_on_data(query_json: DiffPrivLibRequestModel) → None

Fit the DiffPrivLib pipeline on the dataset provided by the data connector.

Steps:
  1. Validate inputs (no overlap between feature and target columns).

  2. Select and preprocess relevant columns (handle missing data).

  3. Split data into training and test sets.

  4. Deserialize the pipeline and inject server parameters.

  5. Fit the pipeline while treating PrivacyLeakWarning as an error.

Parameters:

query_json – Request object describing feature/target columns, pipeline definition, and preprocessing options.
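Steps 1 and 5 can be sketched with the standard warnings module. PrivacyLeakWarning below is a local stand-in so the example runs without DiffPrivLib installed, and ValueError is an assumed substitute for the server's own exception type.

```python
import warnings


class PrivacyLeakWarning(RuntimeWarning):
    """Local stand-in for DiffPrivLib's PrivacyLeakWarning."""


def check_no_overlap(feature_columns, target_columns):
    # Step 1: a column must not be both a feature and a target.
    overlap = set(feature_columns) & set(target_columns or [])
    if overlap:
        raise ValueError(f"Columns used as both feature and target: {sorted(overlap)}")


def fit_strict(fit):
    # Step 5: escalate privacy-leak warnings to exceptions while fitting.
    with warnings.catch_warnings():
        warnings.simplefilter("error", PrivacyLeakWarning)
        return fit()
```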

Raises:

query(query_json: DiffPrivLibQueryModel) → DiffPrivLibQueryResult

Run the query on the fitted DiffPrivLib pipeline and return the results.

Parameters:

query_json – The request object describing the query parameters.

Raises:

Returns:

DiffPrivLibQueryResult containing:

  • score: Model accuracy on the test set.

  • model: The trained DiffPrivLib pipeline.

Return type:

DiffPrivLibQueryResult
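Assuming a classification pipeline, the test-set accuracy reported in score is the fraction of correctly predicted rows. A stdlib sketch of that metric (accuracy_sketch is a hypothetical helper):

```python
def accuracy_sketch(y_true, y_pred):
    """Fraction of test rows whose prediction matches the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```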

lomas_server.dp_queries.dp_libraries.diffprivlib.get_dpl_bounds(columns_metadata: dict, feature_columns: list[str]) → tuple[list[float], list[float]]

Format the metadata bounds of the feature columns in the format expected by DiffPrivLib.

Parameters:
  • columns_metadata (dict) – metadata of the dataset columns

  • feature_columns (list[str]) – list of feature columns

Returns:

tuple of lower and upper bounds as expected by DiffPrivLib
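A minimal sketch, assuming each entry of columns_metadata is a dict with hypothetical "lower" and "upper" keys. The result keeps the order of feature_columns, giving the (lower, upper) pair of per-feature vectors that DiffPrivLib's bounds parameter accepts.

```python
def get_bounds_sketch(columns_metadata, feature_columns):
    """Collect per-feature (lower, upper) bounds into two parallel lists."""
    # "lower"/"upper" key names are assumptions about the metadata layout.
    lower = [float(columns_metadata[col]["lower"]) for col in feature_columns]
    upper = [float(columns_metadata[col]["upper"]) for col in feature_columns]
    return lower, upper
```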

lomas_server.dp_queries.dp_libraries.diffprivlib.split_train_test_data(df: DataFrame, query_json: DiffPrivLibRequestModel) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]

Split the data into training and test sets.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • query_json (DiffPrivLibRequestModel) – user input query indicating:
      ◦ feature_columns (list[str]): columns from data to use as features
      ◦ target_columns (list[str]): columns from data to use as target (to predict)
      ◦ test_size (float): proportion of data in the test set
      ◦ test_train_split_seed (int): seed for the random train-test split

Returns:

  • x_train (pd.DataFrame): training data features

  • x_test (pd.DataFrame): testing data features

  • y_train (pd.DataFrame): training data target

  • y_test (pd.DataFrame): testing data target

Return type:

tuple[DataFrame, DataFrame, DataFrame, DataFrame]
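The split can be sketched without pandas by shuffling row indices with a seeded generator. Here rows is a hypothetical list of dicts standing in for the DataFrame, and rounding the test share up is an assumption about how test_size is applied.

```python
import math
import random


def split_sketch(rows, feature_columns, target_columns, test_size, seed):
    """Shuffle row indices with a fixed seed, then slice off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same split
    n_test = math.ceil(len(rows) * test_size)  # test share rounded up
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, cols):
        # Project the selected rows onto the requested columns.
        return [{c: rows[i][c] for c in cols} for i in idx]

    return (
        pick(train_idx, feature_columns),
        pick(test_idx, feature_columns),
        pick(train_idx, target_columns),
        pick(test_idx, target_columns),
    )
```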

lomas_server.dp_queries.dp_libraries.factory module

lomas_server.dp_queries.dp_libraries.factory.querier_factory(lib: str, data_connector: DataConnector, admin_database: AdminDatabase) → DPQuerier

Builds the correct DPQuerier instance.

Parameters:
  • lib (str) – The library to build the querier for. One of DPLibraries.

  • data_connector (DataConnector) – The dataset to query.

  • admin_database (AdminDatabase) – An initialized instance of an AdminDatabase.

Raises:

InternalServerException – If the library is unknown.

Returns:

The built DPQuerier.

Return type:

DPQuerier
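A minimal sketch of the factory as a dictionary lookup. The registry argument and the "opendp" key are hypothetical; the documented function dispatches on DPLibraries values and raises InternalServerException, approximated here by UnknownLibraryError.

```python
class UnknownLibraryError(Exception):
    """Hypothetical stand-in for InternalServerException."""


def querier_factory_sketch(lib, registry, data_connector, admin_database):
    """Look up the querier class registered for lib and instantiate it."""
    try:
        querier_cls = registry[lib]
    except KeyError:
        raise UnknownLibraryError(f"Unknown library: {lib}") from None
    return querier_cls(data_connector, admin_database)
```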

lomas_server.dp_queries.dp_libraries.opendp module

class lomas_server.dp_queries.dp_libraries.opendp.OpenDPQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[OpenDPRequestModel, OpenDPQueryModel, OpenDPQueryResult]

Concrete implementation of the DPQuerier ABC for the OpenDP library.

cost(query_json: OpenDPRequestModel) → tuple[float, float]

Estimate cost of query.

Parameters:

query_json (OpenDPRequestModel) – The request model object.

Raises:

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.

Return type:

tuple[float, float]

query(query_json: OpenDPQueryModel) → OpenDPQueryResult | OpenDPPolarsQueryResult

Perform the query and return the response.

Parameters:

query_json (OpenDPQueryModel) – The input model for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The query result, as an OpenDPQueryResult or an OpenDPPolarsQueryResult.

lomas_server.dp_queries.dp_libraries.opendp.get_output_measure(opendp_pipe: Measurement) → str

Get output measure type.

Parameters:

opendp_pipe (dp.Measurement) – Pipeline to get measure type.

Raises:

InternalServerException – If the measure type is unknown.

Returns:

One of OpenDPMeasurement.

Return type:

str

lomas_server.dp_queries.dp_libraries.opendp.has_dataset_input_metric(pipeline: Measurement) → None

Check that the input metric of the pipeline is a dataset metric.

Parameters:

pipeline (dp.Measurement) – The pipeline to check.

Raises:

InvalidQueryException – If the pipeline input metric is not a dataset input metric.

lomas_server.dp_queries.dp_libraries.opendp.is_measurement(pipeline: Measurement) → None

Check if the pipeline is a measurement.

Parameters:

pipeline (dp.Measurement) – The measurement to check.

Raises:

InvalidQueryException – If the pipeline is not a measurement.

lomas_server.dp_queries.dp_libraries.opendp.set_opendp_features_config(features: Sequence[Literal['contrib', 'floating-point', 'honest-but-curious']]) → None

Enable OpenDP features based on the config.

See https://github.com/opendp/opendp/discussions/304

Also sets the “OPENDP_POLARS_LIB_PATH” environment variable so that private LazyFrames can be created correctly from deserialized Polars plans.

lomas_server.dp_queries.dp_libraries.opendp.validate_measurement_pipeline(opendp_pipe: Measurement) → None

Verify that the pipeline is safe and valid.

Parameters:

opendp_pipe (dp.Measurement) – The pipeline to check.

Raises:

InvalidQueryException – If the pipeline does not meet the requirements.
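A sketch of how the two checks above (is_measurement and has_dataset_input_metric) could combine into a single validation, using a hypothetical FakePipe instead of a real dp.Measurement. The set of dataset metrics is an illustrative list of OpenDP metric names, not necessarily the one the server uses, and InvalidQuery stands in for InvalidQueryException.

```python
from dataclasses import dataclass


class InvalidQuery(Exception):
    """Hypothetical stand-in for InvalidQueryException."""


# Illustrative set of OpenDP dataset metrics; the server's actual list may differ.
DATASET_METRICS = {"SymmetricDistance", "InsertDeleteDistance", "ChangeOneDistance", "HammingDistance"}


@dataclass
class FakePipe:
    # Hypothetical stand-in for a deserialized OpenDP pipeline.
    kind: str  # "measurement" or "transformation"
    input_metric: str


def validate_pipeline_sketch(pipe):
    # Check 1: only measurements release privatized output.
    if pipe.kind != "measurement":
        raise InvalidQuery("The pipeline provided is not a measurement.")
    # Check 2: the input metric must measure distance between datasets.
    if pipe.input_metric not in DATASET_METRICS:
        raise InvalidQuery(f"{pipe.input_metric} is not a dataset input metric.")
```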

lomas_server.dp_queries.dp_libraries.smartnoise_sql module

class lomas_server.dp_queries.dp_libraries.smartnoise_sql.SmartnoiseSQLQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[SmartnoiseSQLRequestModel, SmartnoiseSQLQueryModel, SmartnoiseSQLQueryResult]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSQL library.

cost(query_json: SmartnoiseSQLRequestModel) → tuple[float, float]

Estimate cost of query.

Parameters:

query_json (SmartnoiseSQLRequestModel) – JSON request object for the query.

Raises:

ExternalLibraryException – For exceptions from libraries external to this package.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.

Return type:

tuple[float, float]

query(query_json: SmartnoiseSQLQueryModel) → SmartnoiseSQLQueryResult

Performs the query and returns the response.

Parameters:

query_json (SmartnoiseSQLQueryModel) – The request model object.

Returns:

The query result, containing the dictionary encoding of the resulting pd.DataFrame.

Return type:

SmartnoiseSQLQueryResult

query_with_iter(query_json: SmartnoiseSQLQueryModel, nb_iter: int = 0) → SmartnoiseSQLQueryResult

Perform the query and return the response.

Parameters:
  • query_json (SmartnoiseSQLQueryModel) – Request object for the query.

  • nb_iter (int, optional) – Number of trials if the output is NaN. Defaults to 0.

Raises:

Returns:

The dictionary encoding of the resulting pd.DataFrame.

Return type:

SmartnoiseSQLQueryResult
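A sketch of the retry logic, assuming nb_iter counts the trials already made and a hypothetical cap max_iter bounds them. The real limit and the exception raised when it is reached are not documented here; RuntimeError is a placeholder.

```python
import math


def query_with_iter_sketch(run_query, nb_iter=0, max_iter=5):
    """Re-run run_query while its output contains NaN, counting trials in nb_iter."""
    while nb_iter < max_iter:
        result = run_query()
        if not any(math.isnan(v) for v in result):
            return result
        nb_iter += 1  # one more NaN trial used up
    raise RuntimeError(f"Query still returned NaN after {max_iter} trials.")
```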

lomas_server.dp_queries.dp_libraries.smartnoise_sql.convert_to_smartnoise_metadata(metadata: Metadata, query_columns: list[str]) → dict

Convert Lomas metadata to smartnoise metadata format (for SQL).

Parameters:
  • metadata (Metadata) – Dataset metadata from admin database

  • query_columns (list[str]) – List of column names used in the query

Returns:

metadata of the dataset in smartnoise-sql format

Return type:

dict
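A minimal sketch, assuming a hypothetical input shape (table-level max_ids and row_privacy plus a columns dict) and placeholder schema and table names. The nesting follows smartnoise-sql's collection, schema, table layout, and only the columns used by the query are kept.

```python
def to_smartnoise_sketch(metadata, query_columns, schema="Schema", table="Table"):
    """Keep only the queried columns and nest them under collection/schema/table."""
    # Table-level options; the input key names are assumptions.
    table_dict = {
        "max_ids": metadata["max_ids"],
        "row_privacy": metadata["row_privacy"],
    }
    # Copy per-column metadata (type, bounds, ...) for queried columns only.
    for name in query_columns:
        table_dict[name] = dict(metadata["columns"][name])
    # Empty collection name and placeholder schema/table names are hypothetical.
    return {"": {schema: {table: table_dict}}}
```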

lomas_server.dp_queries.dp_libraries.smartnoise_sql.get_query_columns(query: str) → list[str]

Extract all column names used in a SQL query.

Traverses the query AST (Abstract Syntax Tree) to find every column reference across SELECT, WHERE, GROUP BY, ORDER BY, etc. Assumes only one table is present in the query.

Parameters:

query (str) – SQL query string.

Returns:

List of unique column names used in the query.

Return type:

list[str]
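The traversal can be illustrated on a hand-built AST. Node dicts with "type", "name" and "children" keys are hypothetical and do not reflect smartnoise-sql's parser classes; the sketch keeps each column name once, in first-seen order.

```python
def collect_columns(node, found=None):
    """Depth-first walk collecting the names of Column nodes, without duplicates."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("type") == "Column" and node["name"] not in found:
            found.append(node["name"])
        # Recurse into SELECT, WHERE, GROUP BY, ... sub-trees alike.
        for child in node.get("children", []):
            collect_columns(child, found)
    return found
```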

lomas_server.dp_queries.dp_libraries.smartnoise_sql.set_mechanisms(privacy: Privacy, mechanisms: dict[str, str]) → Privacy

Set privacy mechanisms on the Privacy object.

For more information see: https://docs.smartnoise.org/sql/advanced.html#overriding-mechanisms

Parameters:
  • privacy (Privacy) – Privacy object.

  • mechanisms (dict[str, str]) – Mechanisms to set.

Returns:

The updated Privacy object.

Return type:

Privacy

lomas_server.dp_queries.dp_libraries.smartnoise_synth module

class lomas_server.dp_queries.dp_libraries.smartnoise_synth.SmartnoiseSynthQuerier(data_connector: DataConnector, admin_database: AdminDatabase)

Bases: DPQuerier[SmartnoiseSynthRequestModel, SmartnoiseSynthQueryModel, SmartnoiseSynthSamples | SmartnoiseSynthModel]

Concrete implementation of the DPQuerier ABC for the SmartNoiseSynth library.

cost(query_json: SmartnoiseSynthRequestModel) → tuple[float, float]

Return cost of query_json.

Parameters:

query_json (SmartnoiseSynthRequestModel) – JSON request object for the query.

Returns:

The tuple of costs: the first value is the epsilon cost, the second value is the delta cost.

Return type:

tuple[float, float]


query(query_json: SmartnoiseSynthQueryModel) → SmartnoiseSynthSamples | SmartnoiseSynthModel

Perform the query and return the response.

Parameters:

query_json (SmartnoiseSynthQueryModel) – The request object for the query.

Raises:

Returns:

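Either the resulting samples (SmartnoiseSynthSamples) or the synthesizer model (SmartnoiseSynthModel).

Return type:

SmartnoiseSynthSamples | SmartnoiseSynthModel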

lomas_server.dp_queries.dp_libraries.smartnoise_synth.datetime_to_float(upper: datetime, lower: datetime) → float

Convert the upper date into a float given by its distance from the lower date.

Parameters:
  • upper (datetime) – date to convert

  • lower (datetime) – start date to convert from
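A minimal sketch, assuming the distance is measured in seconds; the documentation above does not state the unit.

```python
from datetime import datetime


def datetime_to_float_sketch(upper: datetime, lower: datetime) -> float:
    """Distance from lower to upper, in seconds (assumed unit)."""
    return (upper - lower).total_seconds()
```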

lomas_server.dp_queries.dp_libraries.utils module

lomas_server.dp_queries.dp_libraries.utils.handle_missing_data(df: DataFrame, imputer_strategy: str) → DataFrame

Impute missing data based on given imputation strategy for NaNs.

Parameters:
  • df (pd.DataFrame) – dataframe with the data

  • imputer_strategy (str) – strategy for imputing NaNs:
      ◦ “drop”: drop all rows with missing values
      ◦ “mean”: replace missing values by the mean of the column
      ◦ “median”: replace missing values by the median of the column
      ◦ “most_frequent”: replace missing values by the most frequent value of the column

Raises:

InvalidQueryException – If the “imputer_strategy” is not one of the supported strategies.

Returns:

dataframe with the imputed data

Return type:

pd.DataFrame
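A stdlib sketch of the four strategies on a hypothetical dict of column lists, with None marking missing values. The mean, median and most_frequent names match scikit-learn's SimpleImputer strategies, but this is not claimed to be how the server imputes, and ValueError stands in for InvalidQueryException.

```python
import statistics


def impute_sketch(columns, strategy):
    """Impute missing values (None) in a dict of equal-length column lists."""
    if strategy == "drop":
        # Keep only the rows in which every column has a value.
        n_rows = len(next(iter(columns.values()), []))
        keep = [i for i in range(n_rows) if all(col[i] is not None for col in columns.values())]
        return {name: [col[i] for i in keep] for name, col in columns.items()}
    fillers = {"mean": statistics.mean, "median": statistics.median, "most_frequent": statistics.mode}
    if strategy not in fillers:
        raise ValueError(f"Unknown imputer strategy: {strategy}")
    imputed = {}
    for name, col in columns.items():
        # Compute the fill value from the observed entries of this column only.
        fill = fillers[strategy]([v for v in col if v is not None])
        imputed[name] = [fill if v is None else v for v in col]
    return imputed
```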

Module contents