Mintd: A Lightweight Data Product Framework for Research Labs
mintd helps social science researchers build reproducible, governed data products and research projects. It handles the lifecycle from creation to catalog so you can focus on the research.
The Data Product Lifecycle
- Create a data product or research project with mintd create
- Build reproducible pipelines (ingest, clean, validate) with DVC
- Validate metadata and configuration with mintd check
- Push data to S3-compatible cloud storage with mintd data push
- Catalog your work in the Data Product Catalog with mintd registry register
- Reuse data products as tracked dependencies with mintd data import
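The ingest, clean, validate stages in the lifecycle above can be pictured as three small functions chained together. This is only an illustrative sketch, not the code mintd scaffolds: the sample CSV, column names, and function bodies are all hypothetical, and in a real project each stage would more likely be its own script wired up as a DVC stage.

```python
import csv
import io

# Hypothetical raw input; the columns are invented for illustration.
RAW_CSV = """hospital_id,year,beds
H001,2021, 120
H002,2021,
H001,2021, 120
"""


def ingest(text):
    """Ingest: parse raw CSV text into a list of dict rows, unchanged."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: trim whitespace, convert types, drop exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        beds = row["beds"].strip()
        record = (
            row["hospital_id"].strip(),
            int(row["year"]),
            int(beds) if beds else None,  # empty cell becomes a missing value
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(dict(zip(("hospital_id", "year", "beds"), record)))
    return cleaned


def validate(rows):
    """Validate: fail loudly if a basic expectation is violated."""
    if not rows:
        raise ValueError("no rows after cleaning")
    for row in rows:
        if not row["hospital_id"]:
            raise ValueError("empty hospital_id")
    return rows


result = validate(clean(ingest(RAW_CSV)))
print(result)
```

Keeping each stage a pure function of the previous stage's output is what makes the chain easy to rerun and to cache stage by stage.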
Why use mintd?
Built for Research Reproducibility
mintd automatically initializes version control for both your code (Git) and your data (DVC). Every data product has a versioned pipeline, a machine-readable schema, and governance metadata -- ensuring your results can be audited and replicated.
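To see why versioning data alongside code supports auditing, it may help to recall the general idea behind DVC: a large file is identified by a hash of its contents, and a small pointer record holding that hash is what Git tracks. The sketch below is a simplified illustration of that idea, not DVC's or mintd's implementation; the file path and the pointer layout are assumptions loosely modeled on a .dvc file.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the MD5 hex digest of the data, used here as a version ID."""
    return hashlib.md5(data).hexdigest()


# Two hypothetical versions of the same dataset, differing in one value.
v1 = b"hospital_id,year,beds\nH001,2021,120\n"
v2 = b"hospital_id,year,beds\nH001,2021,125\n"

# A small pointer record that could be committed to Git instead of the data.
# The path is invented for illustration.
pointer_v1 = {"path": "data/final.csv", "md5": content_hash(v1), "size": len(v1)}
pointer_v2 = {"path": "data/final.csv", "md5": content_hash(v2), "size": len(v2)}

print(pointer_v1)
print(pointer_v2)
print("data changed:", pointer_v1["md5"] != pointer_v2["md5"])
```

Because the pointer changes whenever the data changes, a Git commit pins both the code and the exact data it ran against.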
Data Products as First-Class Citizens
A data product is a versioned, validated, governed dataset with clear ownership. mintd scaffolds the pipeline (ingest -> clean -> validate), generates a Frictionless Table Schema, and tracks who produces and consumes each product.
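For readers unfamiliar with Frictionless Table Schema, the sketch below shows roughly what such a descriptor looks like and how rows might be checked against it. The keys (fields, name, type, constraints, primaryKey) follow the Table Schema specification, but the field names are hypothetical and the validator is a hand-rolled illustration covering only a few constraints, not the schema mintd generates or the check it runs.

```python
# Hypothetical descriptor; field names are invented for illustration.
SCHEMA = {
    "fields": [
        {"name": "hospital_id", "type": "string", "constraints": {"required": True}},
        {"name": "year", "type": "integer", "constraints": {"required": True, "minimum": 1900}},
        {"name": "beds", "type": "integer"},
    ],
    "primaryKey": ["hospital_id", "year"],
}

# Only a small subset of Table Schema types is handled in this sketch.
PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_rows(rows, schema):
    """Return a list of problems found in rows; an empty list means they pass."""
    problems = []
    primary_key = schema.get("primaryKey", [])
    seen_keys = set()
    for index, row in enumerate(rows):
        for field in schema["fields"]:
            name, kind = field["name"], field["type"]
            constraints = field.get("constraints", {})
            value = row.get(name)
            if value is None:
                if constraints.get("required"):
                    problems.append(f"row {index}: '{name}' is required")
                continue
            # bool is a subclass of int in Python, so treat it separately.
            is_bool = isinstance(value, bool)
            if kind == "boolean":
                type_ok = is_bool
            else:
                type_ok = not is_bool and isinstance(value, PYTHON_TYPES[kind])
            if not type_ok:
                problems.append(f"row {index}: '{name}' should be {kind}")
                continue
            if "minimum" in constraints and value < constraints["minimum"]:
                problems.append(f"row {index}: '{name}' is below {constraints['minimum']}")
        if primary_key:
            key = tuple(row.get(part) for part in primary_key)
            if key in seen_keys:
                problems.append(f"row {index}: duplicate primary key {key}")
            seen_keys.add(key)
    return problems


rows = [
    {"hospital_id": "H001", "year": 2021, "beds": 120},
    {"hospital_id": "H001", "year": 2021, "beds": 95},
]
for problem in validate_rows(rows, SCHEMA):
    print(problem)
```

A machine-readable schema like this lets consumers of a data product check incoming data automatically rather than relying on a written codebook.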
Multi-Tool Compatibility
Whether you prefer Stata, R, or Python, mintd has you covered. It generates language-specific templates and utilities, including native Stata commands and automated logging, so your workflow stays consistent across tools.
Data Product Catalog
Register your data products in the lab's catalog for discoverability and access control. mintd uses a tokenless GitOps architecture -- no personal access tokens to manage, just SSH keys and the GitHub CLI.
Get Started in Seconds
# Install mintd
uv tool install git+https://github.com/health-care-affordability-lab/mintd.git
# Create a data product
mintd create data --name my-research-project --lang python
# Import an existing data product into your research project
mintd data import aha-annual-survey
Next: Installation Guide | Quick Start | GitHub
Design Notes
The notes/ directory in the repository contains design documents and decision records for significant changes:
- notes/plan-metadata-cleanup.md: Rationale for the v1.0 → v1.1 metadata schema cleanup (10 redundancies removed)
- notes/plan-version-aware-lineage.md: Design for version-aware data lineage tracking
- notes/centralize-credential-injection.md: Migration from keyring to AWS profiles
- notes/dvc-remote-config-consolidation.md: DVC remote configuration simplification
- notes/guard-against-large-directories.md: Large directory detection for data add