Dummy Modeling Properties¶
CSVW-EO defines properties useful for generating realistic dummy datasets.
These properties improve dummy data quality while remaining compatible with differential privacy workflows.
Warning
Do not add these properties if they would disclose private information.
Nullable Proportion¶
| Property | Meaning |
|---|---|
| nullableProportion | Approximate fraction of null values |
| maxNumPartitions | Maximum number of partitions (keys) |
Example:
{
  "nullableProportion": 0.15,
  "maxNumPartitions": 5,
  "publicKeys": ["January", "February"],
  "exhaustiveKeys": false
}
maxNumPartitions.
Note: nullableProportion may be approximate.
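As a rough illustration of how a dummy generator might consume nullableProportion, the sketch below replaces each value with a null with that probability. The helper name `apply_nulls` and the seeded generator are assumptions made for this example; they are not part of CSVW-EO.

```python
import random

def apply_nulls(values, nullable_proportion, seed=0):
    """Return a copy of `values` where each entry is replaced by None
    with probability `nullable_proportion` (hypothetical helper)."""
    if not 0.0 <= nullable_proportion <= 1.0:
        raise ValueError("nullable_proportion must be between 0 and 1")
    rng = random.Random(seed)
    return [None if rng.random() < nullable_proportion else v for v in values]

# Dummy column of 10,000 rows with roughly 15% nulls.
column = apply_nulls(list(range(10_000)), nullable_proportion=0.15)
null_fraction = sum(v is None for v in column) / len(column)
```

Because each row is nulled independently, the resulting fraction is only close to the declared value, which matches the approximate nature of the property.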
Dependencies¶
Dependencies describe relationships between columns.
| Property | Meaning |
|---|---|
| dependsOn | Source column |
| dependencyType | Type of dependency |
| valueMap | Mapping definition |
Dependency Types¶
bigger¶
Indicates that one column is always greater than another. Only useful for columns whose bounds overlap: if the bounds are disjoint, the ordering already follows from them.
Examples:
- date_treatment_2 > date_treatment_1
- date_of_death > date_of_birth
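One way a dummy generator could honour a bigger dependency is to draw the smaller value first and then draw the larger value from the part of its range that lies strictly above it. The sketch below does this for integer values such as day offsets. The function name `sample_bigger` and the inclusive `(min, max)` bound tuples are assumptions for illustration only.

```python
import random

def sample_bigger(smaller_bounds, bigger_bounds, n, seed=0):
    """Draw n (smaller, bigger) integer pairs with bigger > smaller.

    Both bounds are inclusive (min, max) tuples (hypothetical helper).
    """
    if smaller_bounds[0] >= bigger_bounds[1]:
        raise ValueError("no pair can satisfy bigger > smaller")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Cap the smaller value so that a strictly larger value still exists.
        smaller = rng.randint(smaller_bounds[0],
                              min(smaller_bounds[1], bigger_bounds[1] - 1))
        bigger = rng.randint(max(bigger_bounds[0], smaller + 1), bigger_bounds[1])
        pairs.append((smaller, bigger))
    return pairs

# Day offsets: birth in [0, 100], death in [50, 120]; the bounds overlap on [50, 100].
pairs = sample_bigger((0, 100), (50, 120), n=1000)
```

Sampling the two columns independently from these bounds would sometimes produce a death date before the birth date, which is the case the dependency is meant to rule out.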
mapping¶
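Defines deterministic or constrained mappings: in the examples below, each key of valueMap maps either to a single value (deterministic) or to a list of allowed values (constrained).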
Example:
{
  "dependencyType": "mapping",
  "valueMap": {
    "child": false,
    "adult": true
  }
}
Another example:
{
  "dependencyType": "mapping",
  "valueMap": {
    "medical": ["doctor", "nurse"],
    "engineer": ["civil", "mechanical"]
  }
}
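The two valueMap shapes above could be applied by a dummy generator roughly as follows. This is a minimal sketch that assumes the keys of valueMap are values of the source column and that a list target means "pick any of these". The helper `map_value` is hypothetical.

```python
import random

def map_value(source_value, value_map, rng):
    """Derive the dependent value from `source_value` using `value_map`.

    A scalar target is treated as a deterministic mapping; a list target is
    treated as a set of allowed values (assumed interpretation).
    """
    target = value_map[source_value]
    if isinstance(target, list):
        return rng.choice(target)
    return target

rng = random.Random(0)
is_adult = map_value("adult", {"child": False, "adult": True}, rng)
job = map_value(
    "medical",
    {"medical": ["doctor", "nurse"], "engineer": ["civil", "mechanical"]},
    rng,
)
```

Here `is_adult` is always `True`, while `job` is one of `"doctor"` or `"nurse"`.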
fixedPerEntity¶
Indicates that values remain constant across all rows belonging to the same privacy unit.
Examples:
- birth date per patient
- country of birth per individual
- height per (adult) person
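A dummy generator could implement this by drawing one value per entity and repeating it on every row of that entity. The sketch below assumes the privacy unit is identified by an id column. The helper `fixed_per_entity` and the `draw` callback are hypothetical names.

```python
import random

def fixed_per_entity(entity_ids, draw, seed=0):
    """Assign one value per distinct entity and repeat it on each of its rows."""
    rng = random.Random(seed)
    cache = {}
    column = []
    for entity in entity_ids:
        if entity not in cache:
            # First row seen for this entity: draw its value once.
            cache[entity] = draw(rng)
        column.append(cache[entity])
    return column

patient_ids = ["p1", "p2", "p1", "p3", "p2"]
birth_years = fixed_per_entity(patient_ids, lambda rng: rng.randint(1940, 2010))
```

Rows 1 and 3 (patient `p1`) then share the same birth year, as do rows 2 and 5 (patient `p2`).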
Public Keys¶
CSVW-EO can define known public domains.
| Property | Meaning |
|---|---|
| publicKeys | List of public values |
| exhaustiveKeys | Whether keys are exhaustive |
Example:
{
  "publicKeys": ["January", "February"],
  "exhaustiveKeys": false
}
Example with an exhaustive list of keys:
{
  "keyValues": [
    "January",
    "February",
    "March",
    "April",
    "May",
    "June",
    "July",
    "August",
    "September",
    "October",
    "November",
    "December"
  ],
  "exhaustiveKeys": true
}
When exhaustiveKeys=true, all possible keys are publicly listed.
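The sketch below shows one possible way for a dummy generator to build the set of keys it draws from. When the keys are exhaustive it uses the public list as is. Otherwise it pads the list with placeholder keys up to maxNumPartitions. The padding strategy, the `unknown_i` placeholder names, and the helper `dummy_key_domain` are assumptions for illustration; CSVW-EO does not prescribe them.

```python
def dummy_key_domain(public_keys, exhaustive_keys, max_num_partitions=None):
    """Build the list of keys a dummy generator may draw from (hypothetical helper)."""
    keys = list(public_keys)
    if exhaustive_keys or max_num_partitions is None:
        return keys
    # Non-exhaustive: pad with placeholder keys up to the declared maximum.
    n_missing = max(0, max_num_partitions - len(keys))
    return keys + [f"unknown_{i}" for i in range(1, n_missing + 1)]

domain = dummy_key_domain(
    ["January", "February"], exhaustive_keys=False, max_num_partitions=5
)
```

With two public keys and a maximum of five partitions, the domain holds the two public keys followed by three placeholders.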
Partitions¶
Partitions describe public regions of the domain.
| Property | Meaning |
|---|---|
| partition | List of csvw-eo:Partition objects |
| exhaustivePartitions | Whether partitions fully cover the domain |
Examples:
- categories
- numerical intervals
- grouped partitions
Partitions may be:
- exhaustive
- overlapping
- disjoint
Both csvw:Column and csvw:ColumnGroup may define partitions.
Example: Column Partitions¶
Example partitioning on the species column:
{
  "name": "species",
  "datatype": "string",
  "partitions": [
    {
      "@type": "Partition",
      "predicate": {
        "partitionValue": "Adelie"
      },
      "maxLength": 152
    },
    {
      "@type": "Partition",
      "predicate": {
        "partitionValue": "Gentoo"
      },
      "maxLength": 124
    }
  ],
  "keyValues": [
    "Adelie",
    "Gentoo"
  ],
  "exhaustiveKeys": true,
  "maxNumPartitions": 2
}
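A consumer of this metadata might want to sanity-check it before generating dummy data. The sketch below applies two checks that seem reasonable for the example above but are assumptions rather than rules stated by CSVW-EO: the number of partitions should not exceed maxNumPartitions, and with exhaustive keys every partition value should appear in keyValues. The helper `check_column_partitions` is hypothetical.

```python
column = {
    "name": "species",
    "partitions": [
        {"predicate": {"partitionValue": "Adelie"}, "maxLength": 152},
        {"predicate": {"partitionValue": "Gentoo"}, "maxLength": 124},
    ],
    "keyValues": ["Adelie", "Gentoo"],
    "exhaustiveKeys": True,
    "maxNumPartitions": 2,
}

def check_column_partitions(col):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    values = [p["predicate"]["partitionValue"] for p in col["partitions"]]
    if len(values) > col["maxNumPartitions"]:
        problems.append("more partitions than maxNumPartitions")
    if col.get("exhaustiveKeys"):
        unknown = sorted(set(values) - set(col["keyValues"]))
        if unknown:
            problems.append(f"partition values not in keyValues: {unknown}")
    return problems

problems = check_column_partitions(column)
```

For the species column above, `problems` is empty.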
Example: Column Group Partitions¶
Example partitioning on the (species, island) column group:
{
  "@type": "ColumnGroup",
  "columnsInGroup": [
    "species",
    "island"
  ],
  "partitions": [
    {
      "@type": "Partition",
      "predicate": {
        "species": {
          "partitionValue": "Adelie"
        },
        "island": {
          "partitionValue": "Dream"
        }
      },
      "maxLength": 100
    }
  ],
  "maxNumPartitions": 5
}
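To illustrate how a column group predicate like the one above could be evaluated, the sketch below tests whether a row falls into a partition by comparing each listed column against its partitionValue. It assumes that all conditions in a predicate must hold together. The helper `matches` is a hypothetical name.

```python
def matches(row, predicate):
    """True if every column named in `predicate` equals its partitionValue."""
    return all(
        row.get(col) == cond["partitionValue"] for col, cond in predicate.items()
    )

predicate = {
    "species": {"partitionValue": "Adelie"},
    "island": {"partitionValue": "Dream"},
}
rows = [
    {"species": "Adelie", "island": "Dream"},
    {"species": "Adelie", "island": "Biscoe"},
    {"species": "Gentoo", "island": "Dream"},
]
in_partition = [matches(r, predicate) for r in rows]
```

Only the first row satisfies both conditions, so only it belongs to the (Adelie, Dream) partition.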