Typical Workflows
CSVW-EO workflows usually follow four main steps:
- Generate metadata at the minimal detail level
- Review and validate metadata (remove non-public information)
- Generate dummy datasets
- Use metadata in DP systems
Jupyter notebook example
The CSVW-EO DEMO notebook provides a complete end-to-end demonstration of the workflow, including several practical usage examples.
The repository also includes a set of example metadata files in examples/metadata/. These examples show how metadata can be defined at different levels of precision for use in CSVW-EO workflows.
CLI Workflow
1. Generate Metadata
python make_metadata_from_data.py data.csv \
--privacy_unit user_id
2. Review Metadata
Review the generated metadata manually.
Important checks:
- Remove sensitive statistics
- Remove unnecessary keys
- Minimize disclosure
- Verify DP assumptions
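Part of the review checklist above can be scripted before the manual pass. The following is a minimal sketch, assuming the metadata is a JSON object loaded into a Python dict. The helper `strip_keys` and the key names (`example:mean`, `example:stddev`) are hypothetical and are not taken from the CSVW-EO vocabulary. It supports the human review and does not replace it.

```python
from typing import Any

# Hypothetical key names, for illustration only. Replace them with the keys
# you decide not to publish after reviewing your own metadata.
KEYS_TO_REMOVE = {"example:mean", "example:stddev"}


def strip_keys(node: Any, keys: set[str]) -> Any:
    """Return a copy of a JSON-like structure without the given keys."""
    if isinstance(node, dict):
        return {k: strip_keys(v, keys) for k, v in node.items() if k not in keys}
    if isinstance(node, list):
        return [strip_keys(item, keys) for item in node]
    return node


# Toy metadata standing in for a generated file
metadata = {
    "columns": [
        {"name": "age", "example:mean": 41.2, "datatype": "integer"},
        {"name": "city", "datatype": "string"},
    ],
    "example:stddev": 3.0,
}

# The input dict is left untouched; a cleaned copy is returned
reviewed = strip_keys(metadata, KEYS_TO_REMOVE)
```

Working on a copy keeps the original metadata available for comparison, which makes it easier to confirm exactly what was removed.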
3. Validate Metadata
Internal schema validation
python validate_metadata.py metadata.json
SHACL validation
python validate_metadata_shacl.py \
metadata.json \
csvw-eo-constraints.ttl
4. Generate Dummy Dataset
python make_dummy_from_metadata.py \
metadata.json \
--rows 1000 \
--output dummy.csv
5. Validate Dummy Structure
python assert_same_structure.py \
data.csv \
dummy.csv
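To illustrate one facet of what a structure check can cover, here is a minimal sketch that compares only the column names of two CSV files using the standard library. The helpers `read_header` and `same_columns` are hypothetical; this does not reproduce assert_same_structure.py, which may check more than column names.

```python
import csv


def read_header(path: str) -> list[str]:
    """Return the first row (the column names) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header instead of raising
        return next(csv.reader(handle), [])


def same_columns(path_a: str, path_b: str) -> bool:
    """True when both files have identical column names in the same order."""
    return read_header(path_a) == read_header(path_b)
```

Calling `same_columns("data.csv", "dummy.csv")` would return `True` only if the dummy file has the same columns, in the same order, as the original.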
Python API Workflow
import pandas as pd

from csvw_eo.make_metadata_from_data import make_metadata_from_data
from csvw_eo.validate_metadata import validate_metadata
from csvw_eo.make_dummy_from_metadata import make_dummy_from_metadata

# Load the source dataset
df = pd.read_csv("data.csv")

# Generate metadata; user_id identifies the individual
metadata = make_metadata_from_data(
    df,
    individual_col="user_id",
)

# Validate the metadata before using it
validate_metadata(metadata)

# Generate a 500-row dummy dataset from the metadata
dummy_df = make_dummy_from_metadata(
    metadata,
    nb_rows=500,
)
Workflow Recommendations
| Step | Recommendation |
|---|---|
| Metadata generation | Use the lowest contribution detail level possible |
| Validation | Always run both validators |
| Publication | Remove unnecessary information |
| Dummy generation | Use fixed random seeds for reproducibility |
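The reproducibility recommendation in the table can be illustrated with the standard library alone. This is a minimal sketch of the general principle, not of the CSVW-EO generator: the helper `make_dummy_rows` and its columns are hypothetical, and whether `make_dummy_from_metadata` accepts a seed is not stated here.

```python
import random


def make_dummy_rows(n_rows: int, seed: int) -> list[dict]:
    """Generate toy rows from a local, seeded random generator."""
    # A local generator avoids touching the global random state
    rng = random.Random(seed)
    return [
        {"user_id": rng.randint(1, 100), "age": rng.randint(18, 90)}
        for _ in range(n_rows)
    ]


# The same seed yields the same rows on every call
first = make_dummy_rows(5, seed=42)
second = make_dummy_rows(5, seed=42)
```

Recording the seed next to the generated file lets anyone regenerate the same dummy dataset later.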
Important Warning
Warning
Automatically generated metadata is not safe to publish as-is.
Human review is mandatory before publication or sharing.
When in doubt, remove information.