TableMetadata (CSVW-EO)

Overview

TableMetadata is the top-level container for describing a CSVW-EO table in the library. It aggregates:

  • Global privacy constraints
  • Column-level metadata
  • Optional multi-column (group) metadata
  • JSON-LD context information

It supports:

  • Serialization to JSON-LD via to_dict()
  • Deserialization from JSON-LD via from_dict()


Fields

Core Privacy Parameters

| Field | Type | Description |
|---|---|---|
| privacy_unit | str \| None | Identifier defining the unit of privacy (e.g., user ID). |
| max_contributions | int \| None | Maximum number of contributions per privacy unit across the dataset. |
| max_length | int \| None | Maximum number of rows per privacy unit. |
| public_length | int \| None | Total number of rows in the released dataset (public size). |

Schema Definition

| Field | Type | Description |
|---|---|---|
| columns | list[ColumnMetadata] | List of column metadata definitions forming the table schema. |
| column_groups | list[ColumnGroupMetadata] \| None | Optional definitions for grouped columns sharing joint partitioning or constraints. |

JSON-LD Metadata

| Field | Type | Description |
|---|---|---|
| context | list[str] | JSON-LD context. Defaults to [CSVW_CONTEXT, CSVW_SAFE_CONTEXT]. |
| table_type | str | JSON-LD type of the object. Defaults to "Table". |

to_dict() and from_dict() Representation

The to_dict() method converts the TableMetadata instance into a CSVW-EO compliant JSON-LD dictionary.

The from_dict() method reconstructs a TableMetadata instance from a CSVW-EO compliant JSON-LD dictionary.

Field Mapping

| Python Attribute | JSON Key | Notes |
|---|---|---|
| context | @context | JSON-LD context array |
| table_type | @type | Typically "Table" |
| privacy_unit | privacyUnit | May be null |
| max_contributions | maxContributions | May be null |
| max_length | maxLength | May be null |
| public_length | publicLength | May be null |
| columns | tableSchema.columns | Always present |
| column_groups | additionalInformation | Only included if not None |
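
The mapping can be pictured as a simple renaming step. The sketch below is only an illustration: ATTR_TO_JSON_KEY and rename_top_level are hypothetical names that do not come from the library, which may well implement the mapping differently (for example through Pydantic aliases).

```python
# Hypothetical lookup table transcribed from the field mapping above.
# It is not part of the library's API.
ATTR_TO_JSON_KEY = {
    "context": "@context",
    "table_type": "@type",
    "privacy_unit": "privacyUnit",
    "max_contributions": "maxContributions",
    "max_length": "maxLength",
    "public_length": "publicLength",
    "column_groups": "additionalInformation",
}


def rename_top_level(attrs: dict) -> dict:
    """Rename Python attribute names to their JSON keys (illustrative only)."""
    out = {}
    for attr, value in attrs.items():
        if attr == "columns":
            # columns is nested under tableSchema.columns
            out["tableSchema"] = {"columns": value}
        elif attr == "column_groups" and value is None:
            # additionalInformation is only included if not None
            continue
        else:
            out[ATTR_TO_JSON_KEY[attr]] = value
    return out


example = rename_top_level(
    {
        "table_type": "Table",
        "privacy_unit": "user_id",
        "max_length": None,
        "columns": [],
        "column_groups": None,
    }
)
print(example)
```

Note that max_length stays in the result with a None value, while column_groups disappears entirely, matching the "May be null" and "Only included if not None" notes.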

Serialization (to_dict())

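  • Produces a JSON-LD compliant dictionary
  • Always includes:
      • @context
      • @type
      • tableSchema.columns
  • Includes the optional privacy fields (privacyUnit, maxContributions, maxLength, publicLength) even when their value is None; additionalInformation is emitted only when column_groups is not None
  • Serializes nested objects via:
      • ColumnMetadata.to_dict()
      • ColumnGroupMetadata.to_dict()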
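
The serialization rules above can be sketched with plain dataclasses. ColumnSketch and TableSketch are stand-ins invented for this illustration, not the library's Pydantic models; the context values are placeholders, and emitting additionalInformation as a list of serialized groups is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder values; the library's CSVW_CONTEXT and CSVW_SAFE_CONTEXT
# constants are not reproduced here.
PLACEHOLDER_CONTEXT = ["http://www.w3.org/ns/csvw", "path/to/csvw-eo-context.jsonld"]


@dataclass
class ColumnSketch:
    """Stand-in for ColumnMetadata with two illustrative fields."""
    name: str
    datatype: str

    def to_dict(self) -> dict:
        return {"@type": "Column", "name": self.name, "datatype": self.datatype}


@dataclass
class TableSketch:
    """Stand-in for TableMetadata; not the library's implementation."""
    columns: list
    privacy_unit: Optional[str] = None
    max_contributions: Optional[int] = None
    max_length: Optional[int] = None
    public_length: Optional[int] = None
    column_groups: Optional[list] = None
    context: list = field(default_factory=lambda: list(PLACEHOLDER_CONTEXT))
    table_type: str = "Table"

    def to_dict(self) -> dict:
        out = {
            # Always present
            "@context": self.context,
            "@type": self.table_type,
            # Optional privacy fields are written even when None
            "privacyUnit": self.privacy_unit,
            "maxContributions": self.max_contributions,
            "maxLength": self.max_length,
            "publicLength": self.public_length,
            # Nested objects serialize themselves
            "tableSchema": {"columns": [c.to_dict() for c in self.columns]},
        }
        # Assumption: groups are emitted as a list, and only when set
        if self.column_groups is not None:
            out["additionalInformation"] = [g.to_dict() for g in self.column_groups]
        return out


serialized = TableSketch(
    columns=[ColumnSketch(name="age", datatype="integer")],
    privacy_unit="user_id",
).to_dict()
print(serialized)
```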

Deserialization (from_dict())

  • Reconstructs a full TableMetadata object from a dictionary
  • Parses nested structures:
      • Columns via ColumnMetadata.from_dict()
      • Column groups via ColumnGroupMetadata.from_dict()
  • Uses .get() for optional fields, defaulting to None when missing
  • Preserves JSON-LD metadata (@context, @type) if provided
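
A rough sketch of these steps on plain dictionaries follows. parse_table and parse_column are hypothetical helpers written for this page; the actual from_dict() returns a TableMetadata instance and delegates to the nested from_dict() methods.

```python
def parse_column(col: dict) -> dict:
    """Stand-in for ColumnMetadata.from_dict(); simply copies the dict."""
    return dict(col)


def parse_table(doc: dict) -> dict:
    """Sketch of the from_dict() steps, returning Python attribute names."""
    groups = doc.get("additionalInformation")
    parsed = {
        # Required: direct indexing raises KeyError if the key is missing
        "columns": [parse_column(c) for c in doc["tableSchema"]["columns"]],
        # Optional: .get() returns None when the key is missing
        "privacy_unit": doc.get("privacyUnit"),
        "max_contributions": doc.get("maxContributions"),
        "max_length": doc.get("maxLength"),
        "public_length": doc.get("publicLength"),
        "column_groups": None if groups is None else [dict(g) for g in groups],
    }
    # JSON-LD metadata is carried over only when provided; in this sketch
    # nothing is filled in otherwise (the real model has defaults).
    for json_key, attr in (("@context", "context"), ("@type", "table_type")):
        if json_key in doc:
            parsed[attr] = doc[json_key]
    return parsed


parsed = parse_table(
    {
        "@type": "Table",
        "privacyUnit": "user_id",
        "tableSchema": {"columns": [{"name": "age", "datatype": "integer"}]},
    }
)
print(parsed)
```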

Parsing Details

  • tableSchema.columns is required and always parsed
  • additionalInformation is optional
  • Missing optional fields do not raise errors (handled by Pydantic defaults)
  • For now, input is assumed to already follow the CSVW-EO structure; there is no deep validation layer beyond the model constraints. TODO: validate more strictly (for example, datatype options).
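
Until stricter validation exists, a caller could run a light structural check before calling from_dict(). The function below is one possible shape for such a check, not something the library provides; find_structural_problems and the integer rule it applies are assumptions based on the field types listed earlier.

```python
def find_structural_problems(doc: dict) -> list:
    """Return human-readable problems found in a CSVW-EO-like dict."""
    problems = []

    # tableSchema.columns is the one required structure
    schema = doc.get("tableSchema")
    if not isinstance(schema, dict) or not isinstance(schema.get("columns"), list):
        problems.append("tableSchema.columns is required and must be a list")

    # Optional integer fields may be absent or null, but not another type
    for key in ("maxContributions", "maxLength", "publicLength"):
        value = doc.get(key)
        # bool is a subclass of int in Python, so it is excluded explicitly
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} must be an integer or null")

    return problems


ok_doc = {"maxLength": 10, "tableSchema": {"columns": []}}
bad_doc = {"maxLength": "10"}

print(find_structural_problems(ok_doc))
print(find_structural_problems(bad_doc))
```

Collecting every problem in a list, rather than raising on the first one, lets a caller report all issues in a metadata file at once.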

Example

Python Pydantic object:

metadata = TableMetadata(
    privacy_unit="user_id",
    max_contributions=5,
    max_length=10,
    public_length=1000,
    columns=[...]
)

Serialized JSON-LD output after json_repr = metadata.to_dict():

json_repr = {
  "@context": [
    "http://www.w3.org/ns/csvw",
    "path/to/csvw-eo-context.jsonld"
  ],
  "@type": "Table",
  "privacyUnit": "user_id",
  "maxContributions": 5,
  "maxLength": 10,
  "publicLength": 1000,
  "tableSchema": {
    "columns": [
      {
        "@type": "Column",
        "name": "...",
        "datatype": "..."
      }
    ]
  }
}

The round-trip conversion then reconstructs the object:

metadata = TableMetadata.from_dict(json_repr)
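
Since the serialized form is a dictionary, it can be stored with Python's standard json module, assuming it contains only JSON-serializable values. The sketch below uses the example dictionary from above, with its placeholders left as they are; the file name metadata.json is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Same shape as the serialized example; "..." placeholders are kept as-is.
json_repr = {
    "@context": ["http://www.w3.org/ns/csvw", "path/to/csvw-eo-context.jsonld"],
    "@type": "Table",
    "privacyUnit": "user_id",
    "maxContributions": 5,
    "maxLength": 10,
    "publicLength": 1000,
    "tableSchema": {
        "columns": [{"@type": "Column", "name": "...", "datatype": "..."}]
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"  # arbitrary file name
    # Write the dictionary to disk as JSON text
    path.write_text(json.dumps(json_repr, indent=2), encoding="utf-8")
    # Read it back into a plain dictionary
    loaded = json.loads(path.read_text(encoding="utf-8"))

# The dictionary survives the trip through the file unchanged
print(loaded == json_repr)
```

The loaded dictionary is what would then be handed to TableMetadata.from_dict().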

Library usage

  • make_metadata_from_data.py generates a valid TableMetadata based on the input metadata and arguments, then serialises it with metadata.to_dict().
  • validate_metadata.py validates the JSON representation with the from_dict() method.
  • Other scripts (make_dummy_from_metadata.py, csvw_to_opendp_context.py, csvw_to_opendp_margins.py, csvw_to_smarntoise_sql.py) expect the JSON representation as input. Running validate_metadata.py on the JSON representation first is recommended, to ensure that these scripts will work.

Summary

TableMetadata is the entry point for CSVW-EO metadata, providing:

  • A structured way to define dataset privacy constraints
  • Full schema description (columns + groups)
  • Bidirectional conversion:
      • to_dict() → JSON-LD serialization
      • from_dict() → object reconstruction

This makes it suitable for validation, transformation, and interoperability workflows.