Differential Privacy Contributions

CSVW-EO defines metadata for Differential Privacy (DP) calibration.

These properties describe worst-case assumptions about how privacy units may contribute to datasets.

Warning

Contribution assumptions must only describe public, non-sensitive information. More detailed contribution metadata may increase privacy leakage risk and should always be manually reviewed before publication.

Privacy Unit

A privacy unit identifies the entity protected by DP.

Examples:

  • patient
  • user
  • household
  • hospital

Two datasets are neighbouring if and only if one can be obtained from the other by adding or removing all rows associated with a single privacy unit.
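As a minimal sketch of this definition, the snippet below builds a neighbouring dataset by dropping every row of one privacy unit. The toy rows and the helper name `remove_privacy_unit` are illustrative assumptions, not part of CSVW-EO.

```python
# Toy rows: (patient_id, diagnostic). Values are made up for illustration.
rows = [
    (1, "flu"),
    (1, "asthma"),
    (2, "flu"),
    (3, "fracture"),
]


def remove_privacy_unit(rows, unit_id):
    """Return the neighbouring dataset obtained by dropping every row of one unit."""
    return [row for row in rows if row[0] != unit_id]


# Removing patient 1 removes both of that patient's rows at once.
neighbour = remove_privacy_unit(rows, 1)
print(neighbour)
```

Here the two datasets differ by two rows, yet they count as neighbours because both rows belong to the same privacy unit.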

CSVW-EO currently assumes a single privacy unit per dataset.

| Property | Type | Meaning | Level |
|---|---|---|---|
| privacyUnit | string | Name of the privacy identifier column | Table |
| privacyId | boolean | Whether a column identifies privacy units | Column |

Example

{
  "@type": "csvw:Table",
  "name": "hospitalisations",
  "privacyUnit": "patient_id",
  "tableSchema": {
    "columns": [
        {
            "@type": "csvw:Column",
            "name": "patient_id",
            "privacyId": true,
            "datatype": "int"
        },
        {
            "@type": "csvw:Column",
            "name": "diagnostic",
            "privacyId": false,
            "datatype": "string"
        }
    ]
  },
  "additionalInformation": []
}
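Because a single privacy unit per dataset is assumed, one plausible sanity check is that `privacyUnit` names exactly the one column flagged with `privacyId`. The sketch below is an assumption about how a consumer might verify this. The helper names are hypothetical, and CSVW-EO tooling may validate differently or not at all.

```python
import json

# Trimmed copy of the table metadata, parsed with the standard library.
metadata = json.loads("""
{
  "privacyUnit": "patient_id",
  "tableSchema": {
    "columns": [
      {"name": "patient_id", "privacyId": true, "datatype": "int"},
      {"name": "diagnostic", "privacyId": false, "datatype": "string"}
    ]
  }
}
""")


def privacy_id_columns(table):
    """Names of all columns flagged with privacyId = true."""
    columns = table["tableSchema"]["columns"]
    return [col["name"] for col in columns if col.get("privacyId")]


def check_privacy_unit(table):
    """Hypothetical rule: privacyUnit is the only privacyId column."""
    return privacy_id_columns(table) == [table["privacyUnit"]]


print(check_privacy_unit(metadata))
```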

DP Contribution Properties

| Property | Meaning | Table | Partition | Column / ColumnGroup |
|---|---|---|---|---|
| maxContributions | Maximum rows contributed by one privacy unit ($l_\infty$) | Yes (1) | Yes (3) | No |
| maxLength | Maximum dataset or partition size | Yes (2) | Yes (4) | No |
| publicLength | Exact public size | Yes | Yes | No |
| maxGroupsPerUnit | Maximum groups affected by one privacy unit ($l_0$) | No | No | Yes |
| invariantPublicKeys | Whether keys are public independently of privacy units | No | No | Yes |

Required DP Fields

Some properties are mandatory for DP calibration depending on the query type.

| Requirement | Required For | Meaning |
|---|---|---|
| Yes (1) | Table-level queries | Maximum contributions in dataset |
| Yes (2) | Table-level queries except counts | Maximum dataset size |
| Yes (3) | GROUP BY queries | Maximum contributions per group |
| Yes (4) | GROUP BY queries except counts | Maximum group size |

Other properties are optional but may improve utility and reduce unnecessary DP noise.
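The requirement rules can be sketched as a small lookup. The scope labels (`"table"`, `"group_by"`) and the helper `missing_fields` are hypothetical names chosen for this illustration. Only the property names come from CSVW-EO.

```python
# Required properties per query type, transcribed from the requirement rules.
# For "table" the properties are read from table metadata,
# for "group_by" from the metadata of the relevant partition.
REQUIRED = {
    ("table", "count"): {"maxContributions"},
    ("table", "other"): {"maxContributions", "maxLength"},
    ("group_by", "count"): {"maxContributions"},
    ("group_by", "other"): {"maxContributions", "maxLength"},
}


def missing_fields(metadata, scope, aggregate):
    """Return the required DP properties that the metadata does not declare."""
    kind = "count" if aggregate == "count" else "other"
    return sorted(REQUIRED[(scope, kind)] - set(metadata))


# A table that only declares maxContributions supports counts but not sums.
table_metadata = {"maxContributions": 5}
print(missing_fields(table_metadata, "table", "count"))
print(missing_fields(table_metadata, "table", "sum"))
```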

maxContributions

Defines the maximum number of rows contributed by one privacy unit.

At:

  • table level → whole dataset
  • partition level → one group
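The difference between the two levels can be illustrated by counting rows per privacy unit, once over the whole table and once per group. This is only a sketch on made-up rows: the declared bound itself should come from public knowledge, and the helper `respects_bound` is a hypothetical name for a check a data holder might run privately.

```python
from collections import Counter

# Toy rows: (patient_id, month). Values are made up for illustration.
rows = [
    (1, "Jan"), (1, "Jan"), (1, "Feb"),
    (2, "Jan"), (2, "Feb"), (2, "Feb"), (2, "Feb"),
]

# Table level: rows per privacy unit across the whole dataset.
rows_per_unit = Counter(unit for unit, _ in rows)

# Partition level: rows per privacy unit within each group.
rows_per_unit_and_group = Counter(rows)


def respects_bound(counts, max_contributions):
    """True if no privacy unit exceeds the declared maxContributions."""
    return all(n <= max_contributions for n in counts.values())


print(respects_bound(rows_per_unit, 4))
print(respects_bound(rows_per_unit_and_group, 2))
```

In this toy data, patient 2 has four rows overall but at most three in any single month, so the partition-level bound can be tighter than the table-level one.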

maxGroupsPerUnit

Defines how many groups one privacy unit may affect.

Examples:

  • one patient may appear in 12 months
  • one user may appear in 3 regions

maxLength

Defines the theoretical maximum size of the dataset (table level) or of one partition (partition level).

This is useful for:

  • DP calibration
  • numerical stability
  • overflow prevention
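As one illustration of the overflow use case, a worst-case sum can be bounded by `maxLength` times the largest absolute value a row may hold. The value bound `max_abs_value` is an assumption introduced here (it is not a property described in this section), and the helper name is hypothetical.

```python
INT64_MAX = 2**63 - 1


def sum_fits_int64(max_length, max_abs_value):
    """Check that the worst-case |sum| = max_length * max_abs_value fits in int64."""
    return max_length * max_abs_value <= INT64_MAX


# 200 rows with values up to one million are far below the int64 limit.
print(sum_fits_int64(200, 10**6))

# Ten billion rows with values up to ten billion would overflow.
print(sum_fits_int64(10**10, 10**10))
```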

Partition-Level Contributions

Partitions may define finer contribution assumptions.

Example:

  • February → 28 contributions
  • July → 31 contributions
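A consumer could resolve the bound for a given partition by preferring the partition-level value and falling back to the table-level one, which remains a valid (if looser) bound because a unit cannot contribute more rows to one group than to the whole dataset. The dictionary layout with a `partitions` key and the value 365 are simplifying assumptions for this sketch, not the CSVW-EO serialisation.

```python
# Hypothetical in-memory layout: table-level bound plus per-partition overrides.
table_metadata = {
    "maxContributions": 365,
    "partitions": {
        "February": {"maxContributions": 28},
        "July": {"maxContributions": 31},
    },
}


def max_contributions_for(metadata, partition_value):
    """Partition-level bound if declared, otherwise the table-level bound."""
    partition = metadata["partitions"].get(partition_value, {})
    return partition.get("maxContributions", metadata["maxContributions"])


print(max_contributions_for(table_metadata, "February"))
print(max_contributions_for(table_metadata, "March"))
```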

Total Influence of a Privacy Unit

The total influence of a privacy unit is the maximum number of rows that may change when that privacy unit is added to or removed from the dataset.

It is defined as:

$$ l_1 = l_0 \cdot l_\infty $$

where:

  • $l_0$ = maxGroupsPerUnit
    (maximum number of groups a privacy unit may affect)

  • $l_\infty$ = maxContributions
    (maximum number of rows contributed within one group)

CSVW-EO does not define $l_1$ as a separate metadata property because its interpretation depends on the query structure and grouping context.
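As a worked sketch, the product can be computed from the two metadata values and used to scale noise for a grouped count. The Laplace mechanism and the privacy budget `epsilon` are assumptions brought in for illustration. This section does not prescribe a mechanism, and the function names are hypothetical.

```python
def total_influence(max_groups_per_unit, max_contributions):
    """l1 = l0 * l_inf: worst-case number of rows tied to one privacy unit."""
    return max_groups_per_unit * max_contributions


def laplace_scale_for_counts(l1, epsilon):
    """Laplace noise scale for grouped counts whose L1 sensitivity is at most l1."""
    return l1 / epsilon


# One patient may appear in 12 months, with at most 31 rows per month.
l1 = total_influence(max_groups_per_unit=12, max_contributions=31)
scale = laplace_scale_for_counts(l1, epsilon=1.0)
print(l1, scale)  # 372 372.0
```

A tighter `maxGroupsPerUnit` or `maxContributions` lowers the product and therefore the noise, which is why the optional properties can improve utility.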

Contribution Levels

CSVW-EO supports multiple granularity levels for contribution metadata.

| Level | Description | Privacy Risk |
|---|---|---|
| table | Table-level contributions only | Lowest |
| table_with_keys | Table-level + public keys | Medium |
| column | Per-column/group contributions | Medium |
| partition | Fine-grained partition contributions | Highest |

More detailed contribution assumptions may increase privacy risk.

General recommendation:

  • start with table
  • only increase granularity if required
  • always minimise disclosed metadata

Example

Example of partition-level DP contributions:

{
  "@type": "Partition",
  "predicate": {
    "partitionValue": "Adelie"
  },
  "maxLength": 200,
  "maxGroupsPerUnit": 3,
  "maxContributions": 1
}

Interpretation:

  • at most 200 rows in this partition
  • one privacy unit may affect at most 3 groups
  • one privacy unit contributes at most 1 row inside this partition