Metadata Validation¶
CSVW-EO provides two validation layers:
- Internal schema validation
- SHACL validation
Internal Validation¶
The validate_metadata.py utility validates metadata against the internal Pydantic schema.
It checks:
- Required fields
- Datatypes
- Structural consistency
- Metadata object validity
Usage¶
python validate_metadata.py metadata.json
SHACL Validation¶
The validate_metadata_shacl.py utility validates metadata against RDF SHACL constraints.
This is the strictest validation layer.
It checks:
- RDF consistency
- SHACL constraints
- Cross-field consistency
- Structural restrictions
TODO: As the dataset is not published yet, it is not functional for now (or put the .ttl within the library..).
Usage¶
python validate_metadata_shacl.py \
metadata.json \
csvw-eo-constraints.ttl
Validation Recommendations¶
| Validator | Purpose |
|---|---|
validate_metadata.py |
Fast schema validation |
validate_metadata_shacl.py |
Formal RDF validation |
Both validators should be used before publishing metadata.
Structure Validation¶
The assert_same_structure.py utility validates that a generated dummy dataset matches the structure of the original dataset.
It checks:
- Column names
- Column ordering
- Datatypes
- Nullable constraints
- Optional categorical domains
Usage¶
python assert_same_structure.py \
original.csv \
dummy.csv
Important Notes¶
Validation ensures structural correctness only.
Validation does not guarantee:
- Metadata safety
- Privacy preservation
- Statistical validity