Setting up GitHub Actions for ISO 19115 validation

Setting up GitHub Actions for ISO 19115 validation automates structural compliance checks (and, with the extensions described in Step 4, semantic ones) for geospatial metadata before it reaches production. By validating XML files against the official ISO 19115-1:2014 XSD schema in a continuous integration pipeline, teams can block non-compliant merges, enforce consistent spatial reference fields, and reduce manual review bottlenecks. This workflow replaces ad-hoc validation scripts with deterministic, version-controlled checks that run on every commit that touches metadata or the validation tooling.

Architecture & Dependencies

A reliable implementation uses a Python runner with lxml for fast, standards-compliant XSD validation. The standard library's xml.etree.ElementTree cannot validate against XSD at all, whereas lxml builds on the libxml2 C library to handle complex namespace resolution, schema imports, and large metadata files efficiently. For official validation patterns, refer to the lxml validation documentation, which details how to instantiate XMLSchema objects and read their error logs.
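
As a minimal sketch of that pattern, the snippet below compiles a schema with etree.XMLSchema and reads schema.error_log after a failed validation. The toy schema and the check helper are assumptions made for illustration; they are not part of ISO 19115.

```python
from lxml import etree

# Toy schema (an assumption for illustration, not an ISO 19115 schema):
# a <record> element that must contain exactly one <title>.
TOY_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Compile the schema once; the resulting object can validate many documents.
schema = etree.XMLSchema(etree.fromstring(TOY_XSD))


def check(xml_bytes: bytes) -> list[str]:
    """Return one 'line: message' string per schema violation (hypothetical helper)."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    # error_log holds the entries produced by the most recent validation run.
    return [f"{entry.line}: {entry.message}" for entry in schema.error_log]


good = check(b"<record><title>Land cover 2020</title></record>")
bad = check(b"<record><name>Land cover 2020</name></record>")
print(good)
print(bad)
```

The same two calls, etree.XMLSchema(...) and schema.validate(...), carry over unchanged when the toy schema is swapped for the cached ISO files.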

To prevent network-dependent CI failures, the workflow caches the official ISO 19115 XSD files locally. Because the network is only touched on a cache miss, builds stay deterministic even when the ISO standards portal experiences downtime or rate limits automated requests. This caching strategy is a foundational practice in Spatial Data Schema Linting in CI, where predictable artifact resolution keeps pipeline execution times short and stable.
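
A cache is only as trustworthy as the files that were first written into it. One way to guard against a truncated download or an HTML error page being cached as a schema is to compare each file against a pinned SHA-256 digest. The sketch below assumes a hypothetical manifest (a dict of file name to digest) kept in the repository; neither the manifest nor the function names come from the workflow itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large schemas do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_cache(schema_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return a problem description for every missing or altered schema file.

    `expected` is a hypothetical manifest mapping file names to SHA-256 digests.
    """
    problems = []
    for name, want in expected.items():
        path = schema_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing from cache")
        elif sha256_of(path) != want:
            problems.append(f"{name}: checksum mismatch")
    return problems
```

Running such a check right after the cache step would let the job fail early with a clear message instead of surfacing later as a schema parse error.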

Step 1: Configure the GitHub Actions Workflow

Create .github/workflows/iso19115-validate.yml. The workflow triggers on pushes and pull requests that modify XML metadata or the validation tooling. It uses actions/cache@v4 to store downloaded schemas, falling back to a network fetch only on cache misses.

name: ISO 19115 Metadata Validation

on:
  push:
    paths:
      - '**/*.xml'
      - 'scripts/validate_iso19115.py'
      - '.github/workflows/iso19115-validate.yml'
  pull_request:
    paths:
      - '**/*.xml'
      - 'scripts/validate_iso19115.py'
      - '.github/workflows/iso19115-validate.yml'

jobs:
  validate-metadata:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          # No pip cache here: setup-python's cache option needs a requirements.txt or pyproject.toml to hash

      - name: Install dependencies
        run: pip install lxml

      - name: Cache ISO 19115 XSD schemas
        id: cache-xsd
        uses: actions/cache@v4
        with:
          path: .cache/schemas
          key: iso19115-xsd-v1-${{ runner.os }}

      - name: Download schemas (if cache miss)
        if: steps.cache-xsd.outputs.cache-hit != 'true'
        run: |
          mkdir -p .cache/schemas
          # -f makes curl exit non-zero on HTTP errors instead of saving an error page as the schema
          curl -fsSL https://standards.iso.org/ittf/PubliclyAvailableStandards/c063242_ISO_19115-1_2014_Schemas/19115-1_2014.xsd -o .cache/schemas/iso19115-1.xsd
          curl -fsSL https://standards.iso.org/ittf/PubliclyAvailableStandards/c063242_ISO_19115-1_2014_Schemas/gco.xsd -o .cache/schemas/gco.xsd

      - name: Run ISO 19115 validation
        run: python scripts/validate_iso19115.py --schema-dir .cache/schemas --metadata-dir ./metadata

Step 2: Implement the XSD Validation Script

Save this as scripts/validate_iso19115.py. The script recursively scans a target directory for .xml files, loads the cached schema, and validates each document. It exits with status 1 on failure, which GitHub Actions interprets as a failed check.

#!/usr/bin/env python3
"""ISO 19115-1:2014 XSD validator for CI pipelines."""
import argparse
import sys
from pathlib import Path
from lxml import etree

def load_schema(schema_dir: Path) -> etree.XMLSchema:
    """Compile the cached ISO 19115 schema once so every file reuses it."""
    schema_file = schema_dir / "iso19115-1.xsd"
    if not schema_file.exists():
        raise FileNotFoundError(f"Schema not found at {schema_file}")
    # Relative xs:import / xs:include resolve via each file's base URL
    return etree.XMLSchema(etree.parse(str(schema_file)))

def validate_xml(xml_path: Path, schema: etree.XMLSchema) -> list[str]:
    """Validate a single XML file and return one message per problem found."""
    try:
        xml_doc = etree.parse(str(xml_path))
    except etree.XMLSyntaxError as e:
        return [f"  Malformed XML in {xml_path.name}: {e}"]
    except OSError as e:
        return [f"  Could not read {xml_path.name}: {e}"]
    # Collect every validation failure from the schema error log
    if schema.validate(xml_doc):
        return []
    return [f"  Line {entry.line}: {entry.message}" for entry in schema.error_log]

def main():
    parser = argparse.ArgumentParser(description="Validate ISO 19115 XML metadata")
    parser.add_argument("--schema-dir", required=True, type=Path)
    parser.add_argument("--metadata-dir", required=True, type=Path)
    args = parser.parse_args()

    if not args.metadata_dir.is_dir():
        print(f"❌ Metadata directory not found: {args.metadata_dir}")
        sys.exit(1)

    xml_files = sorted(args.metadata_dir.rglob("*.xml"))
    if not xml_files:
        print("ℹ️ No XML files found in metadata directory. Skipping validation.")
        sys.exit(0)

    try:
        schema = load_schema(args.schema_dir)
    except (OSError, etree.XMLSyntaxError, etree.XMLSchemaParseError) as e:
        print(f"❌ Could not load schema: {e}")
        sys.exit(1)

    # Map each failing file to its messages so file headers are not counted as issues
    failures = {}
    for xml_file in xml_files:
        file_errors = validate_xml(xml_file, schema)
        if file_errors:
            failures[xml_file.relative_to(args.metadata_dir)] = file_errors

    if failures:
        issue_count = sum(len(errs) for errs in failures.values())
        print(f"\n❌ Validation failed with {issue_count} issue(s) in {len(failures)} file(s):")
        for rel_path, file_errors in failures.items():
            print(f"\n📄 {rel_path}")
            for err in file_errors:
                print(err)
        sys.exit(1)
    else:
        print(f"✅ All {len(xml_files)} metadata files passed ISO 19115 validation.")
        sys.exit(0)

if __name__ == "__main__":
    main()
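
Beyond the exit status, GitHub Actions can surface individual failures as inline annotations when a step prints workflow commands of the form ::error file=...,line=...::message. The helper names below are hypothetical and not part of the script above; the escaping follows the rules GitHub documents for workflow command data and properties. Treat it as a sketch of how each error-log entry could be reported.

```python
def escape_data(value: str) -> str:
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value: str) -> str:
    """Escape a property value such as the file path (also protects ':' and ',')."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def error_annotation(file: str, line: int, message: str) -> str:
    """Build an ::error workflow command for one validation failure."""
    return f"::error file={escape_property(file)},line={line}::{escape_data(message)}"


# Example: report a failure on line 12 of a metadata record.
print(error_annotation("metadata/a.xml", 12, "Element 'x': not expected."))
```

Printing one such line per entry in schema.error_log would attach each message to the offending line in the pull request's file view, provided the path is relative to the repository root.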

Step 3: Local Testing & CI Troubleshooting

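Before merging, verify the script locally. The schemas are not checked into the repository, so first populate .cache/schemas by running the same curl commands the workflow uses, then run: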

python scripts/validate_iso19115.py --schema-dir .cache/schemas --metadata-dir ./metadata

Common failure modes:

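  • Missing relative imports: ISO 19115 schemas pull in companion files such as gco.xsd and gmd.xsd through relative schemaLocation paths. Every imported file must exist in the cache at the relative location the importing schema expects, which may be a sibling directory rather than the same folder. The workflow above downloads only the root schema and gco.xsd; add a curl command for each further file the root schema imports, and for any extended profiles your metadata uses.
  • Namespace mismatches: lxml strictly enforces namespace URIs. The http://www.isotc211.org/2005/gmd and http://www.isotc211.org/2005/gco namespaces belong to the ISO 19139 encoding, whereas the 2014 edition is encoded by the ISO/TS 19115-3 schemas under http://standards.iso.org/iso/19115/-3/. If the root element of your XML is in a namespace other than the targetNamespace of the cached schema, validation fails. Align your XML namespace declarations with the schema edition you validate against.
  • CI cache staleness: The cache key is static, so a cached schema is never refreshed automatically. If ISO publishes an updated schema, delete the cache from the repository's Actions caches page or bump the key value in the workflow YAML (for example, from v1 to v2). For detailed cache management strategies, consult the GitHub caching documentation.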
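
The first two failure modes can be caught before lxml ever runs. The sketch below uses only the standard library: missing_schema_files follows xs:import and xs:include references from a root schema and lists files that are absent on disk, while root_namespace and target_namespace expose the two URIs that must agree. All three function names are hypothetical, and remote schemaLocation values are deliberately skipped.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XS = "{http://www.w3.org/2001/XMLSchema}"


def missing_schema_files(root_xsd: Path) -> list[Path]:
    """Follow relative schemaLocation references and return files not found on disk."""
    missing, seen, queue = [], set(), [root_xsd.resolve()]
    while queue:
        current = queue.pop()
        if current in seen:
            continue
        seen.add(current)
        if not current.is_file():
            missing.append(current)
            continue
        # xs:import and xs:include are direct children of the xs:schema root.
        for node in ET.parse(str(current)).getroot():
            if node.tag in (XS + "import", XS + "include"):
                location = node.get("schemaLocation")
                # Only relative paths are checked; http(s) locations are ignored.
                if location and "://" not in location:
                    queue.append((current.parent / location).resolve())
    return missing


def root_namespace(xml_path: Path) -> str:
    """Return the namespace URI of a document's root element ('' if none)."""
    tag = ET.parse(str(xml_path)).getroot().tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


def target_namespace(xsd_path: Path) -> str:
    """Return the targetNamespace declared by a schema ('' if none)."""
    return ET.parse(str(xsd_path)).getroot().get("targetNamespace", "")
```

A record whose root_namespace differs from the target_namespace of the cached root schema will not validate, which usually points to a mix of schema editions rather than a problem in the record's content.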

Step 4: Scaling to Policy Enforcement

Once the validation job passes consistently, integrate it into branch protection rules. Require the validate-metadata check to pass before merging pull requests. This transforms a technical validation step into a governance control.
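
Branch protection is usually configured in the repository settings, but it can also be scripted. The sketch below builds, without sending, a request for GitHub's REST endpoint PUT /repos/{owner}/{repo}/branches/{branch}/protection that requires the validate-metadata check. The function name and the chosen values are assumptions for illustration; note that this endpoint replaces the branch's existing protection settings, so the two fields set to None here would clear any review requirements or push restrictions already in place.

```python
import json
import urllib.request


def branch_protection_request(owner: str, repo: str, branch: str, token: str,
                              check_name: str = "validate-metadata") -> urllib.request.Request:
    """Build (but do not send) a request that makes `check_name` a required status check."""
    payload = {
        # strict=True also requires the branch to be up to date before merging.
        "required_status_checks": {"strict": True, "contexts": [check_name]},
        "enforce_admins": True,
        # These two keys must be present in the body; None leaves them unset.
        "required_pull_request_reviews": None,
        "restrictions": None,
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would apply the rule, given a token with administration rights on the repository. The check name matches the job id validate-metadata because the workflow does not give the job a separate name.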

When deployed across multiple repositories, the same pattern scales into a broader CI/CD Validation & Policy Enforcement for Spatial Data strategy. You can extend the Python script to enforce custom business rules alongside XSD validation: verify that gmd:MD_Metadata/gmd:fileIdentifier matches repository naming conventions, ensure gmd:CI_ResponsibleParty contains valid contact emails, or confirm that gmd:referenceSystemInfo aligns with approved EPSG codes. Such checks are simple XPath lookups, so they add little to the job's run time, and their findings can be written as structured JSON or SARIF reports for dashboard consumption.
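
A minimal sketch of two such rules follows, assuming records in the ISO 19139 gmd/gco encoding with gmd:MD_Metadata as the root element. The allow-list of EPSG codes, the naming pattern, and the check_rules function are all hypothetical placeholders for an organisation's own policy.

```python
import json
import re
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
APPROVED_EPSG = {"4326", "3857"}  # hypothetical allow-list
ID_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # hypothetical naming convention


def check_rules(xml_text: str) -> list[dict]:
    """Return one finding per violated business rule for a single metadata record."""
    root = ET.fromstring(xml_text)
    findings = []

    # Rule 1: the file identifier must follow the naming convention.
    file_id = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", default="", namespaces=NS
    ).strip()
    if not ID_PATTERN.match(file_id):
        findings.append({"rule": "file-identifier",
                         "message": f"Unexpected fileIdentifier: {file_id!r}"})

    # Rule 2: every reference system code must be on the allow-list.
    for code in root.findall("gmd:referenceSystemInfo//gmd:code/gco:CharacterString", NS):
        value = (code.text or "").strip()
        # Accept both 'EPSG:4326' and a bare '4326' by comparing the last segment.
        if value.rsplit(":", 1)[-1] not in APPROVED_EPSG:
            findings.append({"rule": "epsg-code",
                             "message": f"Unapproved reference system code: {value!r}"})
    return findings


def to_json(findings: list[dict]) -> str:
    """Serialise findings as JSON for downstream reporting."""
    return json.dumps(findings, indent=2)
```

Keeping each finding as a small dict with a rule id and a message makes it straightforward to serialise the same results into another report format later.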

Next Steps

  • Add pytest unit tests to validate the script against known-good and known-bad ISO 19115 samples.
  • Configure a GitHub Actions matrix strategy to validate against multiple schema versions (e.g., ISO 19139:2007 vs ISO/TS 19115-3:2016).
  • Export validation results to a centralized compliance dashboard using the GitHub Checks API or OpenTelemetry logging.