Data Product Catalog
mintd integrates with a Data Product Catalog (the registry) for automatic cataloging, discoverability, and access control enforcement.
Prerequisites
Registry integration requires:
- SSH key configured for GitHub
- GitHub CLI (gh) installed and authenticated: gh auth login
- Push access to the registry repository
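The local prerequisites can be checked before attempting a registration. The following is a minimal sketch, not mintd's own code: the function name `missing_prerequisites` is hypothetical, and it relies only on `gh auth status` exiting non-zero when no account is logged in. Push access and the SSH key are not checked here, since both depend on the remote.

```python
import shutil
import subprocess


def missing_prerequisites(which=shutil.which, run=subprocess.run):
    """Return a list of human-readable problems; an empty list means ready.

    `which` and `run` are injectable so the check can be exercised
    without touching the real system (hypothetical helper, not mintd API).
    """
    problems = []
    for tool in ("git", "gh"):
        if which(tool) is None:
            problems.append(f"{tool} is not installed or not on PATH")
    if which("gh") is not None:
        # `gh auth status` exits non-zero when no account is logged in.
        result = run(["gh", "auth", "status"], capture_output=True, text=True)
        if result.returncode != 0:
            problems.append("gh is not authenticated; run: gh auth login")
    return problems
```

Calling `missing_prerequisites()` with no arguments runs the check against the real machine and returns the list of problems to print.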
Registry Configuration
# Set registry URL (required for registration)
mintd config setup --set registry.url https://github.com/your-org/data-product-catalog
# Or set via environment variable
export MINTD_REGISTRY_URL=https://github.com/your-org/data-product-catalog
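How the two settings interact is not stated here. The sketch below assumes the environment variable overrides the stored `registry.url` value, and assumes the configuration is available as a nested dictionary; both are illustrative assumptions, and `resolve_registry_url` is a hypothetical name.

```python
import os


def resolve_registry_url(config, environ=os.environ):
    """Pick the registry URL from the environment, falling back to config."""
    # Assumption: MINTD_REGISTRY_URL takes precedence over registry.url.
    url = environ.get("MINTD_REGISTRY_URL") or config.get("registry", {}).get("url")
    if not url:
        raise ValueError("registry.url is not set; registration requires it")
    # Normalize a trailing slash so later URL handling is predictable.
    return url.rstrip("/")
```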
Registration Workflow
# Create project with automatic registration
mintd create data --name hospital_project --lang python --register
# Behind the scenes:
# 1. Project scaffolding (Git/DVC setup)
# 2. Clone registry repository via SSH
# 3. Generate catalog entry YAML
# 4. Create feature branch: register-hospital_project
# 5. Commit catalog entry + push branch
# 6. Open PR via GitHub CLI
# 7. Return PR URL to user
# Output:
# Created: data_hospital_project
# Registration PR: https://github.com/org/registry/pull/123
Registry Management Commands
# Register existing projects
mintd registry register --path /path/to/project
# Check registration status
mintd registry status hospital_project
# Process registrations that were queued while offline
mintd registry sync
Registry Features
- Tokenless Operation: Uses SSH keys + GitHub CLI instead of personal tokens
- Offline Mode: Queues registrations when the network is unavailable
- Automatic Retry: Processes pending registrations on the next run
- PR Tracking: Provides links to registration pull requests
- Access Control: Automatic permission synchronization via GitHub Actions
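The offline mode and automatic retry described above can be pictured as a small persistent queue. This is a sketch under assumptions: the JSON file format, the function names, and the choice to treat `OSError` (which includes `ConnectionError`) as "still offline" are all hypothetical and not taken from mintd.

```python
import json
from pathlib import Path


def load_pending(queue_file):
    """Return the queued project paths, or an empty list if none are stored."""
    path = Path(queue_file)
    return json.loads(path.read_text()) if path.exists() else []


def queue_registration(queue_file, project_path):
    """Remember a project whose registration could not be sent yet."""
    pending = load_pending(queue_file)
    if project_path not in pending:
        pending.append(project_path)
    Path(queue_file).write_text(json.dumps(pending))


def sync_pending(queue_file, register):
    """Retry each queued registration and keep the ones that still fail."""
    still_pending = []
    for project_path in load_pending(queue_file):
        try:
            register(project_path)
        except OSError:
            # Network-level failure: leave the entry queued for the next run.
            still_pending.append(project_path)
    Path(queue_file).write_text(json.dumps(still_pending))
    return still_pending
```

Here `register` stands in for whatever performs the actual registration; passing it as a parameter keeps the queue logic independent of Git and the network.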
GitHub CLI & Git Commands Used
mintd uses a GitOps architecture where all GitHub operations happen via standard Git and the GitHub CLI (gh), eliminating the need for personal access tokens.
GitHub CLI Commands
| Command | Purpose |
|---|---|
| `gh auth login` | One-time authentication setup (required prerequisite) |
| `gh pr create --title "..." --body "..." --head <branch> --base main` | Creates pull requests for project registration |
| `gh pr list --state open --json title,url,headRefName` | Checks registration status by listing open PRs |
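The status check can be illustrated by filtering the JSON that `gh pr list --state open --json title,url,headRefName` prints. The sketch below is an assumption about how the match might be done (by the `register-<project_name>` branch name); `find_registration_pr` is a hypothetical helper and the sample data is made up for illustration, with the title left elided.

```python
import json


def find_registration_pr(gh_json, project_name):
    """Return the open PR whose head branch is register-<project_name>, or None."""
    branch = f"register-{project_name}"
    for pr in json.loads(gh_json):
        if pr["headRefName"] == branch:
            return pr
    return None


# Illustrative input shaped like the output of the gh command above.
sample = json.dumps([
    {
        "title": "...",
        "url": "https://github.com/org/registry/pull/123",
        "headRefName": "register-hospital_project",
    },
])

match = find_registration_pr(sample, "hospital_project")
```

In practice the JSON string would come from running the `gh` command with `subprocess.run(..., capture_output=True, text=True)` and reading its `stdout`.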
Git Commands (via subprocess)
| Command | Purpose |
|---|---|
| `git clone git@github.com:<org>/<repo>.git` | Clones registry repository via SSH |
| `git checkout -b register-<project_name>` | Creates feature branch for registration |
| `git add .` | Stages catalog entry changes |
| `git commit -m "Register new project: <name>"` | Commits the catalog entry |
| `git push -u origin <branch>` | Pushes branch to trigger PR workflow |
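The Git commands in the table can be assembled as argument lists suitable for `subprocess.run`. This is a sketch, not mintd's implementation: the helper names are hypothetical, and converting the configured HTTPS registry URL into the SSH clone form is an assumption about how the two notations relate.

```python
import re


def ssh_clone_url(registry_url):
    """Convert https://github.com/<org>/<repo> to git@github.com:<org>/<repo>.git."""
    match = re.fullmatch(
        r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", registry_url
    )
    if match is None:
        raise ValueError(f"not a GitHub repository URL: {registry_url}")
    org, repo = match.groups()
    return f"git@github.com:{org}/{repo}.git"


def registration_commands(registry_url, project_name):
    """Return the git argument lists in the order the table lists them."""
    branch = f"register-{project_name}"
    return [
        ["git", "clone", ssh_clone_url(registry_url)],
        # The remaining commands are meant to run inside the cloned directory.
        ["git", "checkout", "-b", branch],
        ["git", "add", "."],
        ["git", "commit", "-m", f"Register new project: {project_name}"],
        ["git", "push", "-u", "origin", branch],
    ]
```

Passing lists rather than a shell string avoids quoting problems if a project name ever contains spaces or shell metacharacters.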
GitOps Registration Flow
┌────────────────────────────────────────────────────────────────────────────┐
│                               User's Machine                               │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ mintd create data --name hospital_project --lang python --register   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                     │                                      │
│  1. Scaffold project                │                                      │
│  2. git clone (SSH)                 ▼                                      │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Clone registry repo → Create branch → Write YAML →                   │  │
│  │ git commit → git push → gh pr create                                 │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                     │                                      │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                           GitHub Actions Runner                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ validate-catalog.yml (on PR)                                         │  │
│  │ - Validate YAML schema                                               │  │
│  │ - Check naming conventions                                           │  │
│  │ - Verify access control requirements                                 │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ sync-permissions.yml (on merge to main)                              │  │
│  │ - Read access_control from catalog YAML                              │  │
│  │ - Sync GitHub team permissions to repository                         │  │
│  │ - Apply collaborator settings                                        │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
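The permission sync step in the diagram can be sketched as a translation from a catalog entry to GitHub API calls. Everything about the entry's shape is an assumption here: the `teams` mapping, the team name `data-engineers`, and the helper name are hypothetical. The endpoint is GitHub's REST route for adding or updating a team's repository permission, called through `gh api`.

```python
# Permission names accepted by GitHub's team-repository endpoint.
VALID_PERMISSIONS = {"pull", "triage", "push", "maintain", "admin"}


def permission_sync_commands(org, repo, access_control):
    """Build one `gh api` call per team listed in a catalog entry.

    Assumes a hypothetical shape: {"teams": {"<team-slug>": "<permission>"}}.
    """
    commands = []
    for team_slug, permission in sorted(access_control.get("teams", {}).items()):
        if permission not in VALID_PERMISSIONS:
            raise ValueError(f"unknown permission for {team_slug}: {permission}")
        commands.append([
            "gh", "api", "--method", "PUT",
            f"/orgs/{org}/teams/{team_slug}/repos/{org}/{repo}",
            "-f", f"permission={permission}",
        ])
    return commands


# Illustrative entry; the team name is made up.
example = permission_sync_commands(
    "your-org", "data_hospital_project", {"teams": {"data-engineers": "push"}}
)
```

Inside a workflow, each argument list would run with the credentials the workflow provides, so no user-held token is involved.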
Why Tokenless?
Traditional approaches require personal access tokens (PATs) that:
- Need manual rotation and secure storage
- Are tied to individual user accounts
- Can become security vulnerabilities if leaked
The GitOps approach instead uses:
- SSH Keys: Already configured for Git operations and managed by the user
- GitHub CLI: Handles the OAuth flow securely via gh auth login
- GitHub Actions: Workflows run with GITHUB_TOKEN (automatic, scoped, rotated)
This separation means users never handle long-lived tokens, and all sensitive operations happen in controlled GitHub Actions environments.