Project Types
All project types follow AEA Data Editor guidelines for reproducible research.
mintd organizes work around two core concepts:
- Data products (`data_*`) -- versioned, validated datasets with clear ownership and a reproducible pipeline. These are the building blocks that research projects consume.
- Research projects (`prj_*`) -- analysis projects that import data products as tracked dependencies and produce tables, figures, and estimates.
Data Products (data_*)
Each data product has a DVC pipeline (ingest, clean, validate), a Frictionless Table Schema, and governance metadata. The validated output in `data/final/` is the artifact that other projects import. Pipelines can be written in Python, R, or Stata.
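The schema under `schemas/v1/` follows the Frictionless Data Table Schema format, so `validate.*` can check `data/final/` field by field. A minimal sketch (the field names and constraints below are invented for illustration, not the actual hospital schema):

```json
{
  "fields": [
    {"name": "hospital_id", "type": "string", "constraints": {"required": true}},
    {"name": "year", "type": "integer", "constraints": {"required": true}},
    {"name": "beds", "type": "integer", "constraints": {"minimum": 0}}
  ],
  "primaryKey": ["hospital_id", "year"]
}
```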
Python:
```
data_hospital_project/
├── README.md            # Project documentation
├── metadata.json        # Product manifest
├── requirements.txt     # Python dependencies
├── data/
│   ├── raw/             # Original source data (DVC tracked)
│   ├── intermediate/    # Temporary processing results (DVC tracked)
│   └── final/           # Final processed data (DVC tracked)
├── schemas/
│   ├── generate_schema.py  # Schema generation script
│   └── v1/
│       └── schema.json  # Data schema
├── code/
│   ├── _mintd_utils.py  # Utilities (paths, schema generation)
│   ├── ingest.py        # Data acquisition
│   ├── clean.py         # Data cleaning
│   └── validate.py      # Data validation
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml        # DVC variables
└── dvc.yaml             # Pipeline configuration
```
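The `dvc.yaml` at the root wires the three stages into a DAG. A minimal sketch for the Python layout above (illustrative only; a real pipeline may pull parameters from `dvc_vars.yaml` and declare finer-grained outputs):

```yaml
stages:
  ingest:
    cmd: python code/ingest.py
    deps:
      - code/ingest.py
    outs:
      - data/raw
  clean:
    cmd: python code/clean.py
    deps:
      - code/clean.py
      - data/raw
    outs:
      - data/intermediate
  validate:
    cmd: python code/validate.py
    deps:
      - code/validate.py
      - data/intermediate
      - schemas/v1/schema.json
    outs:
      - data/final
```

Because each stage declares its dependencies and outputs, `dvc repro` re-runs only the stages whose inputs changed.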
R:
```
data_hospital_project/
├── README.md
├── metadata.json
├── DESCRIPTION          # R package description
├── renv.lock            # R environment snapshot
├── data/
│   ├── raw/
│   ├── intermediate/
│   └── final/
├── schemas/
│   ├── generate_schema.py  # Schema generation script
│   └── v1/
│       └── schema.json
├── code/
│   ├── _mintd_utils.R
│   ├── ingest.R
│   ├── clean.R
│   └── validate.R
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml
└── dvc.yaml
```
Stata:
```
data_hospital_project/
├── README.md
├── metadata.json
├── data/
│   ├── raw/
│   ├── intermediate/
│   └── final/
├── schemas/
│   ├── generate_schema.py  # Schema generation script
│   └── v1/
│       └── schema.json
├── code/
│   ├── _mintd_utils.do
│   ├── ingest.do
│   ├── clean.do
│   └── validate.do
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml
└── dvc.yaml
```
Research Projects (prj_*)
Research projects consume data products and produce analysis outputs. They import data products as tracked DVC dependencies, so updates to upstream data can be pulled with `mintd data update`.
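If the import is wired through DVC's standard import mechanism (an assumption; the source only says dependencies are DVC-tracked), each imported file is pinned by a `.dvc` stub recording the upstream repo and revision, which an update command can re-resolve. A hypothetical stub, with paths, URL, and hashes as placeholders:

```yaml
# data/raw/hospitals.dvc (illustrative)
md5: <stage-hash>
frozen: true
deps:
  - path: data/final/hospitals.parquet
    repo:
      url: https://github.com/org/data_hospital_project
      rev_lock: <commit-sha>
outs:
  - md5: <file-hash>
    path: hospitals.parquet
```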
Python:
```
prj_cost_study/
├── README.md            # AEA-compliant documentation
├── metadata.json        # Project metadata
├── citations.md         # Data and software citations
├── requirements.txt     # Python dependencies
├── run_all.py           # Master run script
├── data/
│   ├── raw/             # Original source data
│   ├── analysis/        # Processed data for analysis
│   └── enclave-out/     # Outputs downloaded from secure enclaves
├── code/
│   ├── config.py        # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.py  # Utilities
│   ├── 01_data_prep/    # Data preparation scripts
│   ├── 02_analysis/     # Main analysis scripts
│   │   └── __init__.py
│   ├── 03_tables/       # Table generation
│   └── 04_figures/      # Figure generation
├── results/
│   ├── figures/         # Generated plots
│   ├── tables/          # Generated tables
│   ├── estimates/       # Model outputs
│   └── presentations/   # Presentation materials
├── notebooks/           # Jupyter notebooks
├── docs/                # Documentation
├── references/          # Reference materials
├── tests/               # Test files
├── .gitignore
└── .dvcignore
```
R:
```
prj_cost_study/
├── README.md
├── metadata.json
├── citations.md
├── DESCRIPTION
├── renv.lock
├── run_all.R            # Master run script
├── .Rprofile
├── data/
│   ├── raw/
│   ├── analysis/
│   └── enclave-out/     # Outputs downloaded from secure enclaves
├── code/
│   ├── config.R         # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.R
│   ├── 01_data_prep/
│   ├── 02_analysis/
│   │   └── analysis.R
│   ├── 03_tables/
│   └── 04_figures/
├── results/
│   ├── figures/
│   ├── tables/
│   ├── estimates/
│   └── presentations/
├── notebooks/
├── docs/
├── references/
├── tests/
├── .gitignore
└── .dvcignore
```
Stata:
```
prj_cost_study/
├── README.md
├── metadata.json
├── citations.md
├── run_all.do           # Master run script
├── data/
│   ├── raw/
│   ├── analysis/
│   └── enclave-out/     # Outputs downloaded from secure enclaves
├── code/
│   ├── config.do        # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.do
│   ├── 01_data_prep/
│   ├── 02_analysis/
│   ├── 03_tables/
│   └── 04_figures/
├── results/
│   ├── figures/
│   ├── tables/
│   ├── estimates/
│   └── presentations/
├── notebooks/
├── docs/
├── references/
├── tests/
├── .gitignore
└── .dvcignore
```
Key Project Files
| File | Purpose |
|---|---|
| `config.{py,R,do}` | Centralized paths, random seeds, and lookup functions |
| `run_all.{py,R,do}` | Master script to run full analysis pipeline |
| `citations.md` | Data and software citations per AEA guidelines |
| `_mintd_utils.{py,R,do}` | Path utilities and schema generation helpers |
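A master script typically just executes the numbered `code/` directories in order. A minimal sketch of what `run_all.py` might look like (illustrative; the real script may add logging, timing, or environment checks):

```python
import subprocess
import sys
from pathlib import Path

# Numbered stages run in order, matching the code/ layout above.
STEPS = ["01_data_prep", "02_analysis", "03_tables", "04_figures"]

def run_pipeline(root):
    """Run every *.py script under code/<step>/ for each step, in order."""
    code = Path(root) / "code"
    for step in STEPS:
        for script in sorted((code / step).glob("*.py")):
            print(f"Running {script.name} ...")
            # check=True aborts the whole pipeline on the first failure.
            subprocess.run([sys.executable, str(script)], check=True)

if __name__ == "__main__":
    run_pipeline(Path(__file__).parent)
```

Failing fast on the first broken script keeps partial results from silently masquerading as a full run.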
Config Lookup Functions
The config file includes lookup functions for managing analysis specifications:
```python
# Python example
from config import case2tag, case2vars, pretty_name

tag = case2tag("baseline")      # Returns "base"
spec = case2vars("baseline")    # Returns {"depvar": "...", "controls": [...]}
label = pretty_name("outcome")  # Returns "Outcome Variable"
```
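These lookups are typically thin wrappers over a single specification table. A minimal sketch of how `config.py` might implement them (the case names, tags, and variables here are invented for illustration):

```python
# Hypothetical spec table: maps each analysis case to a short tag
# (used in output filenames) and its estimation variables.
_CASES = {
    "baseline": {
        "tag": "base",
        "depvar": "log_cost",
        "controls": ["beds", "ownership"],
    },
}

# Human-readable labels for variables, used in tables and figures.
_LABELS = {"outcome": "Outcome Variable"}

def case2tag(case):
    """Short filename tag for an analysis case."""
    return _CASES[case]["tag"]

def case2vars(case):
    """Dependent variable and control list for an analysis case."""
    spec = _CASES[case]
    return {"depvar": spec["depvar"], "controls": spec["controls"]}

def pretty_name(var):
    """Display label for a variable, falling back to the raw name."""
    return _LABELS.get(var, var)
```

Keeping the table in one place means adding a new specification is a one-line change that every script picks up.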
Code Projects (no prefix)
For libraries, packages, and tools that need governance tracking without directory scaffolding. Unlike the data and project types, `mintd create code` drops only a `metadata.json` -- no directories, no DVC, no templates. Use this when you want the registry to track a code repository for governance, mirroring, or discoverability while the repo keeps its own layout.
When to Use Code vs. Data
Code should live inside a data repo until there's a reason to extract it. The trigger for extraction is a second consumer:
- You build a data product with specialized code (e.g., HHI calculation)
- Another project needs the code, not just the output
- Extract the code into a standalone package and track it with `mintd create code`
Extraction Checklist
- [ ] Second consumer exists (not hypothetical)
- [ ] Code has clear API boundary (inputs/outputs well-defined)
- [ ] Can be versioned independently of the data pipeline
- [ ] Has tests that run without the full data pipeline
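As a toy version of such an extraction, the HHI code above could become a standalone function with a clear input/output boundary, testable without any pipeline data (a hypothetical `hhi_tools` package; the formula is the standard sum of squared market shares):

```python
def hhi(shares):
    """Herfindahl-Hirschman Index of a market.

    `shares` are market shares summing to 1; the result lies in (0, 1].
    Multiply by 10,000 for the points convention used by antitrust
    agencies.
    """
    shares = list(shares)
    if not shares:
        raise ValueError("shares must be non-empty")
    total = sum(shares)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1, got {total}")
    return sum(s * s for s in shares)
```

The function depends only on its arguments, so its tests run with no access to the data pipeline -- the last item on the checklist.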
Secure Enclave Projects (enclave_*)
For air-gapped environments requiring secure data transfer:
```
enclave_secure_workspace/
├── README.md                # Enclave documentation
├── metadata.json            # Project metadata
├── enclave_manifest.yaml    # Data transfer tracking
├── requirements.txt         # Dependencies
├── data/
│   └── .gitkeep
├── code/
│   ├── __init__.py
│   ├── registry.py          # Registry integration
│   ├── download.py          # Data pulling logic
│   ├── package.py           # Transfer packaging
│   └── verify.py            # Integrity verification
├── scripts/
│   ├── pull_data.sh         # Pull latest data
│   ├── package_transfer.sh  # Create transfer archive
│   ├── unpack_transfer.sh   # Unpack in enclave
│   └── verify_transfer.sh   # Verify checksums
├── transfers/               # Transfer archives
├── .gitignore
└── .dvcignore
```
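Integrity verification boils down to recomputing checksums against the manifest. A minimal sketch of the idea behind `verify.py` (function names are illustrative; the real script would read the expected hashes from `enclave_manifest.yaml`):

```python
import hashlib
from pathlib import Path

def sha256sum(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large transfer archives fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(expected):
    """Check files against a {path: sha256 hex digest} mapping.

    Returns the paths that are missing or whose checksum differs;
    an empty list means the transfer is intact.
    """
    failures = []
    for path, checksum in expected.items():
        p = Path(path)
        if not p.is_file() or sha256sum(p) != checksum:
            failures.append(path)
    return failures
```

Running the same check on both sides of the air gap confirms the archive survived the transfer unmodified.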