TableMetadata (CSVW-EO)¶
Overview¶
TableMetadata is the top-level container for describing a CSVW-EO table in the library. It aggregates:
- Global privacy constraints
- Column-level metadata
- Optional multi-column (group) metadata
- JSON-LD context information
It supports:
- Serialization to JSON-LD via to_dict()
- Deserialization from JSON-LD via from_dict()
Fields¶
Core Privacy Parameters¶
| Field | Type | Description |
|---|---|---|
privacy_unit |
str \| None |
Identifier defining the unit of privacy (e.g., user ID). |
max_contributions |
int \| None |
Maximum number of contributions per privacy unit across the dataset. |
max_length |
int \| None |
Maximum number of rows per privacy unit. |
public_length |
int \| None |
Total number of rows in the released dataset (public size). |
Schema Definition¶
| Field | Type | Description |
|---|---|---|
columns |
list[ColumnMetadata] |
List of column metadata definitions forming the table schema. |
column_groups |
list[ColumnGroupMetadata] \| None |
Optional definitions for grouped columns sharing joint partitioning or constraints. |
JSON-LD Metadata¶
| Field | Type | Description |
|---|---|---|
context |
list[str] |
JSON-LD context. Defaults to [CSVW_CONTEXT, CSVW_SAFE_CONTEXT]. |
table_type |
str |
JSON-LD type of the object. Defaults to "Table". |
to_dict() and from_dict() Representation¶
The to_dict() method converts the TableMetadata instance into a CSVW-EO compliant JSON-LD dictionary.
The from_dict() method reconstructs a TableMetadata instance from a CSVW-EO compliant JSON-LD dictionary.
Field Mapping¶
| Python Attribute | JSON Key | Notes |
|---|---|---|
context |
@context |
JSON-LD context array |
table_type |
@type |
Typically "Table" |
privacy_unit |
privacyUnit |
May be null |
max_contributions |
maxContributions |
May be null |
max_length |
maxLength |
May be null |
public_length |
publicLength |
May be null |
columns |
tableSchema.columns |
Always present |
column_groups |
additionalInformation |
Only included if not None |
Serialization (to_dict())¶
- Produces a JSON-LD compliant dictionary
- Always includes:
@context@typetableSchema.columns- Includes optional fields even if their value is
None - Serializes nested objects via:
ColumnMetadata.to_dict()ColumnGroupMetadata.to_dict()
Deserialization (from_dict())¶
- Reconstructs a full
TableMetadataobject from a dictionary - Parses nested structures:
- Columns via
ColumnMetadata.from_dict() - Column groups via
ColumnGroupMetadata.from_dict() - Uses
.get()for optional fields, defaulting toNonewhen missing - Preserves JSON-LD metadata (
@context,@type) if provided
Parsing Details¶
tableSchema.columnsis required and always parsedadditionalInformationis optional- Missing optional fields do not raise errors (handled by Pydantic defaults)
- (For now): Input is assumed to already follow CSVW-EO structure (no deep validation layer beyond model constraints). TODO: be more strict here (like datatypes options).
Example¶
Python pydantic object:
metadata = TableMetadata(
privacy_unit="user_id",
max_contributions=5,
max_length=10,
public_length=1000,
columns=[...]
)
Serialized json-ld output after json_repr = metadata.to_dict():
json_repr = {
"@context": [
"http://www.w3.org/ns/csvw",
"path/to/csvw-eo-context.jsonld"
],
"@type": "Table",
"privacyUnit": "user_id",
"maxContributions": 5,
"maxLength": 10,
"publicLength": 1000,
"tableSchema": {
"columns": [
{
"@type": "Column",
"name": "...",
"datatype": "..."
}
]
}
}
metadata = TableMetadata.from_dict(json_repr)
Library usage¶
make_metadata_from_data.pygenerates a validTableMetadatabased on the input metadata and arguments and then serialise it withmetadata.to_dict().validate_metadata.pyvalidates the json representation with thefrom_dict()method.- Other scripts (
make_dummy_from_metadata.py,csvw_to_opendp_context.py,csvw_to_opendp_margins.py,csvw_to_smarntoise_sql.py) expect the json representation as input but it is recommended to usevalidate_metadata.pyon the json representation first to ensure that the scripts will work.
Summary¶
TableMetadata is the entry point for CSVW-EO metadata, providing:
- A structured way to define dataset privacy constraints
- Full schema description (columns + groups)
- Bidirectional conversion:
- to_dict() → JSON-LD serialization
- from_dict() → object reconstruction
This makes it suitable for validation, transformation, and interoperability workflows.