Project Types

All project types follow AEA Data Editor guidelines for reproducible research.

mintd organizes work around two core concepts:

  • Data products (data_*) -- versioned, validated datasets with clear ownership and a reproducible pipeline. These are the building blocks that research projects consume.
  • Research projects (prj_*) -- analysis projects that import data products as tracked dependencies and produce tables, figures, and estimates.

Data Products (data_*)

Each data product has a DVC pipeline (ingest, clean, validate), a Frictionless Table Schema, and governance metadata. The validated output in data/final/ is the product that other projects can import. Data products can be written in Python, R, or Stata.
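As a rough illustration of the validate stage, the sketch below checks CSV rows against the fields of a Table Schema descriptor using only the standard library. The function name validate_rows, the type mapping, and the sample schema contents are assumptions made for illustration; the validate.py that mintd generates may work differently (for example, by relying on a dedicated validation library).

```python
import csv
import io
import json

# Hypothetical mapping from Table Schema field types to Python casters.
CASTERS = {"integer": int, "number": float, "string": str}


def validate_rows(csv_text, schema):
    """Return a list of error messages; an empty list means the data passed."""
    fields = schema["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        return [f"header mismatch: expected {expected}, got {reader.fieldnames}"]

    errors = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in fields:
            field_type = field.get("type", "string")
            caster = CASTERS.get(field_type, str)
            value = row[field["name"]]
            try:
                caster(value)
            except (ValueError, TypeError):
                errors.append(
                    f"row {row_number}: {field['name']}={value!r} "
                    f"is not a valid {field_type}"
                )
    return errors


# Illustrative schema in the shape of schemas/v1/schema.json (contents assumed).
schema = json.loads(
    '{"fields": [{"name": "hospital_id", "type": "integer"},'
    ' {"name": "beds", "type": "integer"}]}'
)
good = "hospital_id,beds\n1,120\n2,85\n"
bad = "hospital_id,beds\n1,many\n"
```

Calling validate_rows(good, schema) yields an empty list, while validate_rows(bad, schema) reports the non-integer beds value. Returning messages rather than raising on the first problem lets a validate step report every failing row in one run.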

Python:

data_hospital_project/
├── README.md                 # Project documentation
├── metadata.json             # Product manifest
├── requirements.txt          # Python dependencies
├── data/
│   ├── raw/                  # Original source data (DVC tracked)
│   ├── intermediate/         # Temporary processing results (DVC tracked)
│   └── final/                # Final processed data (DVC tracked)
├── schemas/
│   ├── generate_schema.py    # Schema generation script
│   └── v1/
│       └── schema.json       # Data schema
├── code/
│   ├── _mintd_utils.py       # Utilities (paths, schema generation)
│   ├── ingest.py             # Data acquisition
│   ├── clean.py              # Data cleaning
│   └── validate.py           # Data validation
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml             # DVC variables
└── dvc.yaml                  # Pipeline configuration

R:

data_hospital_project/
├── README.md
├── metadata.json
├── DESCRIPTION               # R package description
├── renv.lock                 # R environment snapshot
├── data/
│   ├── raw/
│   ├── intermediate/
│   └── final/
├── schemas/
│   ├── generate_schema.py    # Schema generation script
│   └── v1/
│       └── schema.json
├── code/
│   ├── _mintd_utils.R
│   ├── ingest.R
│   ├── clean.R
│   └── validate.R
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml
└── dvc.yaml

Stata:

data_hospital_project/
├── README.md
├── metadata.json
├── data/
│   ├── raw/
│   ├── intermediate/
│   └── final/
├── schemas/
│   ├── generate_schema.py    # Schema generation script
│   └── v1/
│       └── schema.json
├── code/
│   ├── _mintd_utils.do
│   ├── ingest.do
│   ├── clean.do
│   └── validate.do
├── .gitignore
├── .dvcignore
├── dvc_vars.yaml
└── dvc.yaml

Research Projects (prj_*)

Research projects consume data products and produce analysis outputs. They import data products as tracked DVC dependencies, so updates to upstream data can be pulled with mintd data update.

Python:

prj_cost_study/
├── README.md                 # AEA-compliant documentation
├── metadata.json             # Project metadata
├── citations.md              # Data and software citations
├── requirements.txt          # Python dependencies
├── run_all.py                # Master run script
├── data/
│   ├── raw/                  # Original source data
│   ├── analysis/             # Processed data for analysis
│   └── enclave-out/          # Outputs downloaded from secure enclaves
├── code/
│   ├── config.py             # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.py       # Utilities
│   ├── 01_data_prep/         # Data preparation scripts
│   ├── 02_analysis/          # Main analysis scripts
│   │   └── __init__.py
│   ├── 03_tables/            # Table generation
│   └── 04_figures/           # Figure generation
├── results/
│   ├── figures/              # Generated plots
│   ├── tables/               # Generated tables
│   ├── estimates/            # Model outputs
│   └── presentations/        # Presentation materials
├── notebooks/                # Jupyter notebooks
├── docs/                     # Documentation
├── references/               # Reference materials
├── tests/                    # Test files
├── .gitignore
└── .dvcignore

R:

prj_cost_study/
├── README.md
├── metadata.json
├── citations.md
├── DESCRIPTION
├── renv.lock
├── run_all.R                 # Master run script
├── .Rprofile
├── data/
│   ├── raw/
│   ├── analysis/
│   └── enclave-out/          # Outputs downloaded from secure enclaves
├── code/
│   ├── config.R              # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.R
│   ├── 01_data_prep/
│   ├── 02_analysis/
│   │   └── analysis.R
│   ├── 03_tables/
│   └── 04_figures/
├── results/
│   ├── figures/
│   ├── tables/
│   ├── estimates/
│   └── presentations/
├── notebooks/
├── docs/
├── references/
├── tests/
├── .gitignore
└── .dvcignore

Stata:

prj_cost_study/
├── README.md
├── metadata.json
├── citations.md
├── run_all.do                # Master run script
├── data/
│   ├── raw/
│   ├── analysis/
│   └── enclave-out/          # Outputs downloaded from secure enclaves
├── code/
│   ├── config.do             # Configuration (paths, seeds, lookups)
│   ├── _mintd_utils.do
│   ├── 01_data_prep/
│   ├── 02_analysis/
│   ├── 03_tables/
│   └── 04_figures/
├── results/
│   ├── figures/
│   ├── tables/
│   ├── estimates/
│   └── presentations/
├── notebooks/
├── docs/
├── references/
├── tests/
├── .gitignore
└── .dvcignore

Key Project Files

File                      Purpose
config.{py,R,do}          Centralized paths, random seeds, and lookup functions
run_all.{py,R,do}         Master script to run full analysis pipeline
citations.md              Data and software citations per AEA guidelines
_mintd_utils.{py,R,do}    Path utilities and schema generation helpers
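To make the role of the master script concrete, here is a minimal sketch of how a run_all.py could execute the numbered stage directories in order. The helper names collect_scripts and run_all are hypothetical, and the assumption that each stage holds standalone .py scripts is ours; the generated run_all.py may be organized differently.

```python
import subprocess
import sys
from pathlib import Path

# Stage directories from the project layout, in execution order.
STAGES = ["01_data_prep", "02_analysis", "03_tables", "04_figures"]


def collect_scripts(code_dir):
    """List the .py scripts of each stage, sorted by name within a stage."""
    scripts = []
    for stage in STAGES:
        stage_dir = Path(code_dir) / stage
        if stage_dir.is_dir():
            scripts.extend(
                path
                for path in sorted(stage_dir.glob("*.py"))
                if path.name != "__init__.py"
            )
    return scripts


def run_all(code_dir="code"):
    """Run every stage script in order, stopping at the first failure."""
    for script in collect_scripts(code_dir):
        print(f"Running {script}")
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, str(script)], check=True)


if __name__ == "__main__":
    run_all()
```

Stopping at the first failure keeps later stages from running on stale or partial outputs, which matters when tables and figures depend on earlier estimates.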

Config Lookup Functions

The config file includes lookup functions for managing analysis specifications:

# Python example
from config import case2tag, case2vars, pretty_name

tag = case2tag("baseline")        # Returns "base"
spec = case2vars("baseline")      # Returns {"depvar": "...", "controls": [...]}
label = pretty_name("outcome")    # Returns "Outcome Variable"
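For readers wondering what sits behind these calls, the following is a minimal sketch of how such lookup functions could be defined in config.py. The CASES and LABELS dictionaries, the variable names in them, and the fallback behavior of pretty_name are all assumptions for illustration; only the three function names and the example return values come from the snippet above.

```python
# Hypothetical specification table; a real config.py would define its own cases.
CASES = {
    "baseline": {
        "tag": "base",
        "depvar": "outcome",
        "controls": ["age", "income"],
    },
}

# Hypothetical display labels for variables.
LABELS = {"outcome": "Outcome Variable"}


def case2tag(case):
    """Short tag for a specification, e.g. for use in output file names."""
    return CASES[case]["tag"]


def case2vars(case):
    """Dependent variable and controls for a specification."""
    spec = CASES[case]
    # Copy the list so callers cannot mutate the shared configuration.
    return {"depvar": spec["depvar"], "controls": list(spec["controls"])}


def pretty_name(variable):
    """Human-readable label, falling back to the raw variable name."""
    return LABELS.get(variable, variable)
```

Keeping specifications in one table means analysis, table, and figure scripts all resolve "baseline" to the same variables and the same file tag.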

Code Projects (no prefix)

Code projects are for libraries, packages, and tools that need governance tracking without directory scaffolding. Unlike the data product and research project types, mintd create code only drops a metadata.json: no directories, no DVC, no templates. The repo keeps its own layout.

mintd create code --name mylib --lang python

mylib/
└── metadata.json             # Governance, ownership, access control

Use this when you want the registry to track a code repository for governance, mirroring, or discoverability, but the repo manages its own structure.

When to Use Code vs. Data

Code should live inside a data repo until there's a reason to extract it. The trigger for extraction is a second consumer:

  1. You build a data product with specialized code (e.g., HHI calculation)
  2. Another project needs the code, not just the output
  3. Extract the code into a standalone package and track it with mintd create code

Extraction Checklist

  • [ ] Second consumer exists (not hypothetical)
  • [ ] Code has clear API boundary (inputs/outputs well-defined)
  • [ ] Can be versioned independently of the data pipeline
  • [ ] Has tests that run without the full data pipeline
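As an illustration of the checklist, the sketch below shows what an extracted HHI calculation with a clear API boundary and pipeline-independent tests might look like. The function name hhi, its input convention (shares as fractions summing to 1), and the test names are assumptions; the formula itself is the standard sum of squared percentage shares.

```python
def hhi(market_shares):
    """Herfindahl-Hirschman Index from market shares given as fractions.

    Shares must be non-negative and sum to 1. The result is on the
    conventional 0 to 10,000 scale (squared percentage shares).
    """
    shares = list(market_shares)
    if not shares:
        raise ValueError("at least one market share is required")
    if any(share < 0 for share in shares):
        raise ValueError("market shares must be non-negative")
    total = sum(shares)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"market shares must sum to 1, got {total}")
    return sum((100 * share) ** 2 for share in shares)


# Tests need only in-memory inputs, not the data pipeline.
def test_hhi_monopoly():
    assert hhi([1.0]) == 10_000


def test_hhi_equal_duopoly():
    assert hhi([0.5, 0.5]) == 5_000
```

Because the function takes plain numbers and returns a number, it can be versioned and tested on its own, which is what the last three checklist items ask for.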

Secure Enclave Projects (enclave_*)

For air-gapped environments requiring secure data transfer:

enclave_secure_workspace/
├── README.md                 # Enclave documentation
├── metadata.json             # Project metadata
├── enclave_manifest.yaml     # Data transfer tracking
├── requirements.txt          # Dependencies
├── data/
│   └── .gitkeep
├── code/
│   ├── __init__.py
│   ├── registry.py           # Registry integration
│   ├── download.py           # Data pulling logic
│   ├── package.py            # Transfer packaging
│   └── verify.py             # Integrity verification
├── scripts/
│   ├── pull_data.sh          # Pull latest data
│   ├── package_transfer.sh   # Create transfer archive
│   ├── unpack_transfer.sh    # Unpack in enclave
│   └── verify_transfer.sh    # Verify checksums
├── transfers/                # Transfer archives
├── .gitignore
└── .dvcignore
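
The integrity check performed after a transfer can be sketched as follows. This is a minimal example assuming SHA-256 digests and a plain mapping of relative paths to expected hashes; the function names sha256_of and verify_transfer, and the manifest shape, are hypothetical and not necessarily what verify.py or enclave_manifest.yaml use.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large transfer archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(directory, expected):
    """Compare files against expected digests.

    `expected` maps relative file paths to SHA-256 hex digests (an assumed
    format). Returns the relative paths that are missing or whose contents
    do not match; an empty list means the transfer verified cleanly.
    """
    failures = []
    for relative_path, digest in expected.items():
        target = Path(directory) / relative_path
        if not target.is_file() or sha256_of(target) != digest:
            failures.append(relative_path)
    return failures
```

Recording digests before packaging and recomputing them inside the enclave catches both corrupted and missing files without any network access on the receiving side.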