Data Product Catalog

mintd integrates with a Data Product Catalog (the registry) for automatic cataloging, discoverability, and access control enforcement.

Prerequisites

Registry integration requires:

  • SSH key configured for GitHub
  • GitHub CLI (gh) installed and authenticated: gh auth login
  • Push access to the registry repository
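Two of these prerequisites can be checked programmatically before attempting a registration. The following is a minimal sketch, not mintd's own code: `check_prerequisites` is a hypothetical helper that looks for `git` and `gh` on the PATH and relies on `gh auth status` exiting non-zero when no account is logged in. It does not verify the SSH key or push access.

```python
import shutil
import subprocess


def check_prerequisites(which=shutil.which, run=subprocess.run) -> list:
    """Return a list of human-readable problems; an empty list means ready.

    `which` and `run` are injectable so the check can be exercised
    without touching the real system (hypothetical design choice).
    """
    problems = []
    for tool in ("git", "gh"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if which("gh") is not None:
        # `gh auth status` exits non-zero when no account is logged in.
        result = run(["gh", "auth", "status"], capture_output=True, text=True)
        if result.returncode != 0:
            problems.append("gh is not authenticated; run: gh auth login")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```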

Registry Configuration

# Set registry URL (required for registration)
mintd config setup --set registry.url https://github.com/your-org/data-product-catalog

# Or set via environment variable
export MINTD_REGISTRY_URL=https://github.com/your-org/data-product-catalog
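
The registry URL is configured in HTTPS form, while the clone step runs over SSH (git@github.com:<org>/<repo>.git). How mintd maps one to the other is not documented here. As an illustration only, a hypothetical `to_ssh_url` helper could do the conversion like this:

```python
from urllib.parse import urlparse


def to_ssh_url(registry_url: str) -> str:
    """Map https://<host>/<org>/<repo> to git@<host>:<org>/<repo>.git.

    Hypothetical helper for illustration; not part of mintd's documented API.
    """
    parsed = urlparse(registry_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https URL, got {registry_url!r}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <org>/<repo> in {registry_url!r}")
    org, repo = parts
    return f"git@{parsed.hostname}:{org}/{repo}.git"


if __name__ == "__main__":
    print(to_ssh_url("https://github.com/your-org/data-product-catalog"))
```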

Registration Workflow

# Create project with automatic registration
mintd create data --name hospital_project --lang python --register

# Behind the scenes:
# 1. Project scaffolding (Git/DVC setup)
# 2. Clone registry repository via SSH
# 3. Generate catalog entry YAML
# 4. Create feature branch: register-hospital_project
# 5. Commit catalog entry + push branch
# 6. Open PR via GitHub CLI
# 7. Return PR URL to user

# Output:
# ✅ Created: data_hospital_project
# 📋 Registration PR: https://github.com/org/registry/pull/123

Registry Management Commands

# Register existing projects
mintd registry register --path /path/to/project

# Check registration status
mintd registry status hospital_project

# Process registrations that were queued while offline
mintd registry sync

Registry Features

  • ✅ Tokenless Operation: Uses SSH keys + GitHub CLI instead of personal tokens
  • ✅ Offline Mode: Queues registrations when network unavailable
  • ✅ Automatic Retry: Processes pending registrations on next run
  • ✅ PR Tracking: Provides links to registration pull requests
  • ✅ Access Control: Automatic permission synchronization via GitHub Actions
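
The offline queue and retry behaviour can be pictured as a small persistent list that is drained on the next run. The sketch below is purely illustrative: the JSON file format, the `enqueue` and `sync` function names, and the `register` callback are all assumptions, since mintd's actual queue storage is not described in this document.

```python
import json
import subprocess
from pathlib import Path


def _load(queue_file: Path) -> list:
    """Read the pending list, treating a missing file as an empty queue."""
    return json.loads(queue_file.read_text()) if queue_file.exists() else []


def enqueue(queue_file: Path, project_path: str) -> None:
    """Remember a project whose registration could not be completed."""
    pending = _load(queue_file)
    if project_path not in pending:
        pending.append(project_path)
    queue_file.write_text(json.dumps(pending))


def sync(queue_file: Path, register) -> list:
    """Retry every queued project; keep only the ones that still fail.

    `register` is a hypothetical callable that raises on failure.
    Returns the paths that remain queued.
    """
    remaining = []
    for project_path in _load(queue_file):
        try:
            register(project_path)
        except (OSError, subprocess.CalledProcessError):
            # Network errors and failed git/gh calls leave the entry queued.
            remaining.append(project_path)
    queue_file.write_text(json.dumps(remaining))
    return remaining
```

Keeping failed entries in the file means a later run can pick them up again without any user action.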

GitHub CLI & Git Commands Used

mintd uses a GitOps architecture where all GitHub operations happen via standard Git and the GitHub CLI (gh), eliminating the need for personal access tokens.

GitHub CLI Commands

Command                                                              Purpose
gh auth login                                                        One-time authentication setup (required prerequisite)
gh pr create --title "..." --body "..." --head <branch> --base main  Creates pull requests for project registration
gh pr list --state open --json title,url,headRefName                 Checks registration status by listing open PRs
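
To illustrate how a status check could work with the `gh pr list` command above, here is a hedged sketch. It assumes the registration branch follows the register-<project_name> pattern and matches on `headRefName`. The function names `list_open_prs` and `find_registration_pr` are hypothetical, and mintd's real status logic may differ.

```python
import json
import subprocess
from typing import Optional


def list_open_prs(registry_dir: str) -> list:
    """Run the documented `gh pr list` command inside a registry checkout."""
    result = subprocess.run(
        ["gh", "pr", "list", "--state", "open",
         "--json", "title,url,headRefName"],
        cwd=registry_dir, check=True, capture_output=True, text=True,
    )
    # With --json, gh prints a JSON array of objects with the requested fields.
    return json.loads(result.stdout)


def find_registration_pr(prs: list, project_name: str) -> Optional[dict]:
    """Return the open PR whose head branch is register-<project_name>."""
    branch = f"register-{project_name}"
    return next((pr for pr in prs if pr.get("headRefName") == branch), None)
```

A result of `None` only means that no open PR exists for that branch; the PR may already have been merged or closed.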

Git Commands (via subprocess)

Command                                       Purpose
git clone git@github.com:<org>/<repo>.git     Clones registry repository via SSH
git checkout -b register-<project_name>       Creates feature branch for registration
git add .                                     Stages catalog entry changes
git commit -m "Register new project: <name>"  Commits the catalog entry
git push -u origin <branch>                   Pushes branch to trigger PR workflow
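
The commands in the two tables can be assembled into one ordered sequence. The sketch below is an assumption about how they fit together, not mintd's implementation. `registration_commands` and `run_step` are hypothetical names, and reusing the commit message as the PR title is a guess. Building argument lists rather than shell strings avoids quoting problems when they are passed to `subprocess.run`.

```python
import subprocess


def registration_commands(ssh_url: str, project_name: str, pr_body: str) -> list:
    """Build the Git and gh invocations for one registration, in order."""
    branch = f"register-{project_name}"
    title = f"Register new project: {project_name}"  # assumed PR title
    return [
        ["git", "clone", ssh_url],
        ["git", "checkout", "-b", branch],
        # The catalog entry YAML is written into the clone at this point.
        ["git", "add", "."],
        ["git", "commit", "-m", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--title", title, "--body", pr_body,
         "--head", branch, "--base", "main"],
    ]


def run_step(argv: list, cwd: str) -> None:
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(argv, cwd=cwd, check=True)
```

Note that the clone runs in a parent directory, while every later step runs inside the cloned repository.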

GitOps Registration Flow

┌─────────────────────────────────────────────────────────────────┐
│                     User's Machine                              │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  mintd create data --name hospital_project \              │  │
│  │      --lang python --register                             │  │
│  └───────────────────────────────────────────────────────────┘  │
│                            │                                    │
│      1. Scaffold project   │                                    │
│      2. git clone (SSH)    ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Clone registry repo → Create branch → Write YAML →       │  │
│  │  git commit → git push → gh pr create                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                            │                                    │
└─────────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    GitHub Actions Runner                        │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  validate-catalog.yml  (on PR)                            │  │
│  │  - Validate YAML schema                                   │  │
│  │  - Check naming conventions                               │  │
│  │  - Verify access control requirements                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  sync-permissions.yml  (on merge to main)                 │  │
│  │  - Read access_control from catalog YAML                  │  │
│  │  - Sync GitHub team permissions to repository             │  │
│  │  - Apply collaborator settings                            │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Why Tokenless?

Traditional approaches require personal access tokens (PATs) that:

  • Need manual rotation and secure storage
  • Are tied to individual user accounts
  • Can become security vulnerabilities if leaked

The GitOps approach instead uses:

  • SSH Keys: Already configured for Git operations and managed by the user
  • GitHub CLI: Handles OAuth flow securely via gh auth login
  • GitHub Actions: Workflows run with GITHUB_TOKEN (automatic, scoped, rotated)

This separation means users never create, store, or rotate personal access tokens by hand, and sensitive operations such as permission synchronization happen in controlled GitHub Actions environments.