Setting up GitHub Actions for ISO 19115 validation
Setting up GitHub Actions for ISO 19115 validation automates structural and semantic compliance checks for geospatial metadata before it reaches production. By parsing XML files against the official ISO 19115-1:2014 XSD schema in a continuous integration pipeline, teams can block non-compliant merges, enforce consistent spatial reference fields, and eliminate manual review bottlenecks. This workflow replaces ad-hoc validation scripts with deterministic, version-controlled checks that run on every commit.
Architecture & Dependencies
The most reliable implementation uses a Python runner with lxml for fast, standards-compliant XSD validation. Unlike pure-Python XML parsers, lxml leverages the libxml2 C backend to handle complex namespace resolution, schema imports, and large metadata files efficiently. For official validation patterns, refer to the lxml validation documentation, which details how to instantiate XMLSchema objects and parse error logs.
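As a minimal sketch of that lxml pattern, the snippet below compiles a schema into an `XMLSchema` object, validates two documents, and reads the resulting error log. The toy schema, the sample documents, and the `schema_errors` helper are hypothetical stand-ins for illustration; a real run would load the ISO XSD files instead.

```python
from io import BytesIO

from lxml import etree

# Hypothetical toy schema standing in for the much larger ISO XSD set.
TOY_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_errors(xml_bytes: bytes, schema: etree.XMLSchema) -> list[str]:
    """Return one message per schema violation; an empty list means valid."""
    doc = etree.parse(BytesIO(xml_bytes))
    if schema.validate(doc):
        return []
    # error_log holds the failures recorded by the most recent validate() call
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]


# Compile the schema once, then reuse it for every document.
schema = etree.XMLSchema(etree.parse(BytesIO(TOY_XSD)))

good = schema_errors(b"<record><title>Land cover 2020</title></record>", schema)
bad = schema_errors(b"<record><name>oops</name></record>", schema)

print(good)
print(bad)
```

Compiling the schema once and reusing it matters more with the real ISO schemas, which pull in many imported files and take noticeably longer to compile than to apply.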
To prevent network-dependent CI failures, the workflow stores the official ISO 19115 XSD files in the GitHub Actions cache and downloads them only on a cache miss. Builds stay deterministic even when the ISO standards portal experiences downtime or rate-limits automated requests. This caching strategy is a foundational practice in Spatial Data Schema Linting in CI, where predictable artifact resolution keeps pipeline execution times under two minutes.
Step 1: Configure the GitHub Actions Workflow
Create .github/workflows/iso19115-validate.yml. The workflow triggers on pushes and pull requests that modify XML metadata or the validation tooling. It uses actions/cache@v4 to store downloaded schemas, falling back to a network fetch only on cache misses.
name: ISO 19115 Metadata Validation

on:
  push:
    paths:
      - '**/*.xml'
      - 'scripts/validate_iso19115.py'
      - '.github/workflows/iso19115-validate.yml'
  pull_request:
    paths:
      - '**/*.xml'
      - 'scripts/validate_iso19115.py'
      - '.github/workflows/iso19115-validate.yml'

jobs:
  validate-metadata:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
          # pip caching is omitted here: setup-python's cache option needs a
          # requirements.txt or pyproject.toml to build its cache key.

      - name: Install dependencies
        run: pip install lxml

      - name: Cache ISO 19115 XSD schemas
        id: cache-xsd
        uses: actions/cache@v4
        with:
          path: .cache/schemas
          key: iso19115-xsd-v1-${{ runner.os }}

      - name: Download schemas (if cache miss)
        if: steps.cache-xsd.outputs.cache-hit != 'true'
        run: |
          mkdir -p .cache/schemas
          # -f makes curl fail on HTTP errors instead of saving an error page as an XSD
          curl -fsSL https://standards.iso.org/ittf/PubliclyAvailableStandards/c063242_ISO_19115-1_2014_Schemas/19115-1_2014.xsd -o .cache/schemas/iso19115-1.xsd
          curl -fsSL https://standards.iso.org/ittf/PubliclyAvailableStandards/c063242_ISO_19115-1_2014_Schemas/gco.xsd -o .cache/schemas/gco.xsd

      - name: Run ISO 19115 validation
        run: python scripts/validate_iso19115.py --schema-dir .cache/schemas --metadata-dir ./metadata
Step 2: Implement the XSD Validation Script
Save this as scripts/validate_iso19115.py. The script recursively scans a target directory for .xml files, loads the cached schema, and validates each document. It exits with status 1 on failure, which GitHub Actions interprets as a failed check.
#!/usr/bin/env python3
"""ISO 19115-1:2014 XSD validator for CI pipelines."""
import argparse
import sys
from pathlib import Path

from lxml import etree


def load_schema(schema_dir: Path) -> etree.XMLSchema:
    """Compile the cached ISO 19115 schema once so every file reuses it."""
    schema_file = schema_dir / "iso19115-1.xsd"
    if not schema_file.exists():
        raise FileNotFoundError(f"Schema not found at {schema_file}")
    # Relative xs:import / xs:include resolve via each file's base URL
    return etree.XMLSchema(etree.parse(str(schema_file)))


def validate_xml(xml_path: Path, schema: etree.XMLSchema) -> list[str]:
    """Validate a single XML file and return one message per failure."""
    try:
        xml_doc = etree.parse(str(xml_path))
    except etree.XMLSyntaxError as e:
        return [f"  Malformed XML in {xml_path.name}: {e}"]
    except OSError as e:
        return [f"  Could not read {xml_path.name}: {e}"]
    if schema.validate(xml_doc):
        return []
    # Collect every validation failure from the schema error log
    return [f"  Line {entry.line}: {entry.message}" for entry in schema.error_log]


def main() -> None:
    parser = argparse.ArgumentParser(description="Validate ISO 19115 XML metadata")
    parser.add_argument("--schema-dir", required=True, type=Path)
    parser.add_argument("--metadata-dir", required=True, type=Path)
    args = parser.parse_args()

    if not args.metadata_dir.is_dir():
        print(f"❌ Metadata directory not found: {args.metadata_dir}")
        sys.exit(1)

    xml_files = sorted(args.metadata_dir.rglob("*.xml"))
    if not xml_files:
        print("ℹ️ No XML files found in metadata directory. Skipping validation.")
        sys.exit(0)

    try:
        schema = load_schema(args.schema_dir)
    except (OSError, etree.LxmlError) as e:
        # Covers a missing schema file, malformed XSD, and unresolved imports
        print(f"❌ Could not load schema: {e}")
        sys.exit(1)

    report = []
    issue_count = 0  # counts failures only, not the per-file header lines
    for xml_file in xml_files:
        file_errors = validate_xml(xml_file, schema)
        if file_errors:
            report.append(f"\n📄 {xml_file.relative_to(args.metadata_dir)}")
            report.extend(file_errors)
            issue_count += len(file_errors)

    if issue_count:
        print(f"\n❌ Validation failed with {issue_count} issue(s):")
        for line in report:
            print(line)
        sys.exit(1)

    print(f"✅ All {len(xml_files)} metadata files passed ISO 19115 validation.")


if __name__ == "__main__":
    main()
Step 3: Local Testing & CI Troubleshooting
Before merging, verify the script locally:
python scripts/validate_iso19115.py --schema-dir .cache/schemas --metadata-dir ./metadata
Common failure modes:
- Missing relative imports: ISO 19115 schemas reference gco.xsd and gmd.xsd using relative paths. Ensure every required XSD file exists in the cache directory at the relative path its schemaLocation attribute expects. The workflow above downloads only the root schema and gco.xsd; add further curl commands for gmd.xsd and for any extended profiles your metadata uses.
- Namespace mismatches: lxml matches elements by exact namespace URI. If your XML declares http://www.isotc211.org/2005/gmd but the cached schema defines a different target namespace, validation fails at the root element. Align the namespace declarations in your XML with the schema version you validate against.
- CI cache staleness: If ISO updates a schema, delete the cached entry via the GitHub Actions UI or bump the key value in the workflow YAML (for example, from iso19115-xsd-v1 to iso19115-xsd-v2). For detailed cache management strategies, consult the GitHub caching documentation.
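The missing-import failure mode can be checked before the validator runs. The sketch below follows `xs:import` and `xs:include` references from a root schema and lists the local files that do not exist. The function name `find_missing_schema_files` is hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree` rather than lxml because it only reads attributes and never validates anything.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XS = "{http://www.w3.org/2001/XMLSchema}"


def find_missing_schema_files(entry: Path) -> list[str]:
    """Follow xs:import / xs:include from `entry`; return unresolved local paths."""
    missing: list[str] = []
    seen: set[Path] = set()
    queue = [entry.resolve()]
    while queue:
        xsd = queue.pop()
        if xsd in seen:
            continue  # schemas often import each other; visit each file once
        seen.add(xsd)
        if not xsd.exists():
            missing.append(str(xsd))
            continue
        root = ET.parse(xsd).getroot()
        for tag in ("import", "include"):
            for ref in root.iter(f"{XS}{tag}"):
                location = ref.get("schemaLocation")
                # Skip references without a location and remote (URL) locations
                if location and "://" not in location:
                    queue.append((xsd.parent / location).resolve())
    return missing


if __name__ == "__main__":
    # Path matches the cache layout used by the workflow in this article
    for path in find_missing_schema_files(Path(".cache/schemas/iso19115-1.xsd")):
        print(f"missing: {path}")
```

Running a check like this as its own workflow step turns a cryptic schema compile error into a list of file names to add to the download step.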
Step 4: Scaling to Policy Enforcement
Once the validation job passes consistently, integrate it into branch protection rules. Require the validate-metadata check to pass before merging pull requests. This transforms a technical validation step into a governance control.
When deployed across multiple repositories, the same pattern scales into a broader CI/CD Validation & Policy Enforcement for Spatial Data strategy. You can extend the Python script to enforce custom business rules alongside XSD validation: verify that gmd:MD_Metadata/gmd:fileIdentifier matches repository naming conventions, ensure gmd:CI_ResponsibleParty contains valid contact emails, or confirm that gmd:referenceSystemInfo aligns with approved EPSG codes. Checks like these add little runtime compared with schema compilation, and they can emit structured JSON or SARIF reports for dashboard consumption.
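One possible shape for those business rules is sketched below. It assumes the document root is gmd:MD_Metadata in the http://www.isotc211.org/2005/gmd namespace; the approved EPSG set, the naming pattern, the email pattern, and the function name `check_business_rules` are all hypothetical policy choices, not part of any standard. It uses the standard library parser for brevity, and the same paths should also work with lxml.

```python
import json
import re
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical policy values; replace them with your organisation's rules.
APPROVED_EPSG = {"4326", "3857", "25832"}
FILE_ID_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

EMAIL_PATH = ".//gmd:electronicMailAddress/gco:CharacterString"
CRS_PATH = (
    ".//gmd:referenceSystemInfo/gmd:MD_ReferenceSystem"
    "/gmd:referenceSystemIdentifier/gmd:RS_Identifier"
    "/gmd:code/gco:CharacterString"
)


def check_business_rules(xml_text: str) -> list[dict]:
    """Return one finding per violated rule; an empty list means compliant."""
    root = ET.fromstring(xml_text)  # assumed to be gmd:MD_Metadata
    findings = []

    file_id = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", default="", namespaces=NS
    ).strip()
    if not FILE_ID_PATTERN.match(file_id):
        findings.append({"rule": "file-identifier", "value": file_id})

    for node in root.iterfind(EMAIL_PATH, NS):
        email = (node.text or "").strip()
        if not EMAIL_PATTERN.match(email):
            findings.append({"rule": "contact-email", "value": email})

    for node in root.iterfind(CRS_PATH, NS):
        code = (node.text or "").strip()
        # Accepts "4326", "EPSG:4326", or a URN ending in the numeric code
        match = re.search(r"(\d+)$", code)
        if not match or match.group(1) not in APPROVED_EPSG:
            findings.append({"rule": "epsg-code", "value": code})

    return findings


SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>land-cover-2020</gco:CharacterString></gmd:fileIdentifier>
  <gmd:contact><gmd:CI_ResponsibleParty><gmd:contactInfo><gmd:CI_Contact><gmd:address><gmd:CI_Address>
    <gmd:electronicMailAddress><gco:CharacterString>gis-team(at)example.org</gco:CharacterString></gmd:electronicMailAddress>
  </gmd:CI_Address></gmd:address></gmd:CI_Contact></gmd:contactInfo></gmd:CI_ResponsibleParty></gmd:contact>
  <gmd:referenceSystemInfo><gmd:MD_ReferenceSystem><gmd:referenceSystemIdentifier><gmd:RS_Identifier>
    <gmd:code><gco:CharacterString>EPSG:27700</gco:CharacterString></gmd:code>
  </gmd:RS_Identifier></gmd:referenceSystemIdentifier></gmd:MD_ReferenceSystem></gmd:referenceSystemInfo>
</gmd:MD_Metadata>"""

findings = check_business_rules(SAMPLE)
print(json.dumps(findings, indent=2))
```

The sample passes the naming rule but trips the email and EPSG rules, so the JSON report contains two findings. Converting the same list of findings to SARIF would be a separate serialization step that this sketch does not cover.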
Next Steps
- Add pytest unit tests to validate the script against known-good and known-bad ISO 19115 samples.
- Configure a GitHub Actions matrix strategy to validate against multiple schema versions (e.g., ISO 19115-1:2014 vs ISO 19115-3:2018).
- Export validation results to a centralized compliance dashboard using the GitHub Checks API or OpenTelemetry logging.