ISO 19115 Metadata Template Generation

Automating the creation of compliant geographic metadata is a foundational requirement for modern spatial data infrastructures. For GIS data managers, open-source maintainers, and government technology teams, manual metadata authoring introduces structural inconsistencies, delays publication cycles, and creates long-term compliance risks. ISO 19115 Metadata Template Generation addresses these challenges by providing a repeatable, code-driven approach to constructing structurally valid XML documents that align with international geospatial standards.

This guide outlines a production-ready workflow for generating ISO 19115 metadata templates in the ISO 19139 XML encoding using Python. The methodology emphasizes schema-aware construction, mandatory field enforcement, and seamless integration into broader Automated Metadata Generation & Schema Mapping architectures. By treating metadata as code, organizations can guarantee baseline compliance before dynamic dataset attributes are injected during ingestion or transformation pipelines.

Context and Automation Scope

ISO 19115 defines the abstract schema for describing geographic datasets, services, and related resources. The XML encoding of ISO 19115:2003 is specified by ISO/TS 19139, which defines the gmd and gco element rules used for machine-readable exchange and remains widely supported by catalog software. The revised ISO 19115-1:2014 model is encoded separately by ISO/TS 19115-3, which uses different namespaces; the templates in this guide follow the ISO 19139 encoding. A complete metadata record typically includes identification information, spatial reference details, distribution channels, data quality information, and responsible party contact points. The ISO 19115-1:2014 specification describes the current conceptual model in detail.

Template generation differs fundamentally from full metadata population. A template establishes the structural skeleton, namespace declarations, and mandatory element placeholders. Once validated against the XSD, the template can be programmatically populated with dataset-specific attributes during ingestion, transformation, or publication workflows. This decoupled approach reduces validation failures, prevents namespace collisions, and ensures that downstream catalog systems receive structurally sound XML before dynamic values are injected.

Organizations migrating legacy records often pair template generation with transformation logic. For example, teams converting North American legacy standards typically route records through FGDC to ISO 19115 Conversion Pipelines to normalize element mappings before applying ISO 19115 structural rules. Similarly, European open data portals frequently cross-reference DCAT-AP Spatial Profile Mapping to ensure interoperability between cataloging frameworks and ISO-compliant metadata stores.

Prerequisites and Environment Setup

Before implementing the template generator, ensure the following dependencies and configurations are in place:

  1. Python 3.9+ with venv or conda environment isolation.
  2. Core Libraries:
      • lxml>=4.9.0 for XML construction, namespace management, and XSD validation
      • pyproj>=3.4.0 for coordinate reference system (CRS) normalization
      • python-dateutil>=2.8.2 for ISO 8601 temporal formatting
  3. Schema Files: Download the official ISO 19139 XML schemas from the OGC schema registry or mirror them locally to avoid network-dependent validation failures. Keep the mirrored directory layout intact, because gmd.xsd imports sibling schemas such as gco, gts, gss, and gsr by relative path.
  4. Directory Structure: Maintain a clean separation between schema files, template generators, and output directories. Example:
  metadata-engine/
  ├── schemas/
  │   ├── gco/
  │   ├── gmd/
  │   └── ...        (gts/, gss/, gsr/ and other mirrored dependencies)
  ├── generators/
  │   └── iso19115_template.py
  └── output/

Core Template Architecture

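A compliant ISO 19139 template requires precise namespace declarations and a strict element hierarchy. The root element must be <gmd:MD_Metadata>. Only gmd and gco are needed for the baseline skeleton; the remaining prefixes are declared up front so that extents (gml), xlink references, gmx:Anchor values, and service metadata (srv) can be added later without redeclaring namespaces. Check that the GML namespace URI matches the one imported by the schema release you validate against.

  • gmd (http://www.isotc211.org/2005/gmd)
  • gco (http://www.isotc211.org/2005/gco)
  • gml (http://www.opengis.net/gml)
  • xlink (http://www.w3.org/1999/xlink)
  • gmx (http://www.isotc211.org/2005/gmx)
  • srv (http://www.isotc211.org/2005/srv)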

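The structural skeleton should include these baseline elements in the order defined by the schema's sequence:

  1. fileIdentifier (UUID or persistent identifier)
  2. language (ISO 639-2 code)
  3. characterSet (MD_CharacterSetCode, e.g. utf8)
  4. hierarchyLevel (MD_ScopeCode: dataset, service, series, etc.)
  5. contact (CI_ResponsibleParty for the metadata, including its mandatory role)
  6. dateStamp (metadata creation/modification date)
  7. metadataStandardName (e.g. ISO 19115:2003/19139)
  8. metadataStandardVersion (e.g. 1.0)
  9. identificationInfo (MD_DataIdentification with at least a citation, abstract, and language)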
graph TD
    classDef root fill:#d7efef,stroke:#0e7c86,color:#0a5d65;
    M["gmd:MD_Metadata"]:::root
    M --> FI["fileIdentifier"]
    M --> LANG["language: eng"]
    M --> CS["characterSet: utf8"]
    M --> HL["hierarchyLevel"]
    M --> CT["contact: CI_ResponsibleParty"]
    M --> DS["dateStamp"]
    M --> SN["metadataStandardName"]
    M --> SV["metadataStandardVersion"]
    M --> ID["identificationInfo: MD_DataIdentification"]
    M -. raster .-> SR["spatialRepresentationInfo"]
    M -. vector .-> FC["contentInfo: MD_FeatureCatalogueDescription"]

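Misordering any of these elements, or omitting the schema-mandatory contact, dateStamp, or identificationInfo, will cause XSD validation to fail. The other baseline elements are optional in the XSD but commonly required by catalog profiles. The template generator should enforce this sequence programmatically, using placeholder values that can be safely overwritten during population.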
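One way to enforce the sequence is a small post-generation check. The sketch below is a hypothetical helper (check_baseline_order is not part of any library) that uses only the standard library's xml.etree.ElementTree, so it can run in a lightweight lint step without lxml. It assumes the baseline list above plus identificationInfo, which the schema also requires, and reports missing or out-of-order children of gmd:MD_Metadata.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# Baseline MD_Metadata children in ISO 19139 schema order (a subset: the full
# sequence also allows elements such as parentIdentifier or locale in between).
BASELINE_ORDER = [
    "fileIdentifier",
    "language",
    "characterSet",
    "hierarchyLevel",
    "contact",
    "dateStamp",
    "metadataStandardName",
    "metadataStandardVersion",
    "identificationInfo",
]


def check_baseline_order(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the baseline order holds."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{GMD}}}MD_Metadata":
        return [f"unexpected root element {root.tag}"]
    # Local names of gmd children that belong to the baseline set, in document order
    seen = [
        child.tag.split("}", 1)[1]
        for child in root
        if child.tag.startswith(f"{{{GMD}}}")
        and child.tag.split("}", 1)[1] in BASELINE_ORDER
    ]
    problems = [f"missing {name}" for name in BASELINE_ORDER if name not in seen]
    # Repeated elements (e.g. several contacts) are fine as long as positions never decrease
    positions = [BASELINE_ORDER.index(name) for name in seen]
    if positions != sorted(positions):
        problems.append(f"out of order: {seen}")
    return problems
```

Because the check ignores non-baseline children such as spatialRepresentationInfo, it complements rather than replaces full XSD validation.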

Python Implementation Workflow

The following implementation uses lxml.etree to construct a namespace-aware template. It avoids string concatenation, preventing common XML injection and escaping vulnerabilities.

import uuid
from datetime import date
from lxml import etree

# Namespace URIs used by the ISO 19139 encoding
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
CODELIST_BASE = "http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml"
NSMAP = {
    "gmd": GMD,
    "gco": GCO,
    "gml": "http://www.opengis.net/gml",
    "xlink": "http://www.w3.org/1999/xlink",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "srv": "http://www.isotc211.org/2005/srv",
}


def add_element(parent, tag, ns=GMD):
    """Append an empty child element with a fully qualified {namespace}tag."""
    return etree.SubElement(parent, f"{{{ns}}}{tag}")


def add_string_element(parent, tag, value):
    """<gmd:tag><gco:CharacterString>value</gco:CharacterString></gmd:tag>"""
    child = add_element(parent, tag)
    add_element(child, "CharacterString", GCO).text = value
    return child


def add_date_element(parent, tag, value):
    """<gmd:tag><gco:Date>YYYY-MM-DD</gco:Date></gmd:tag>"""
    child = add_element(parent, tag)
    add_element(child, "Date", GCO).text = value
    return child


def add_code_list_element(parent, tag, code_list, value):
    """<gmd:tag><gmd:{code_list} codeList="...#{code_list}" codeListValue="value"/></gmd:tag>"""
    child = add_element(parent, tag)
    code = add_element(child, code_list)
    code.set("codeList", f"{CODELIST_BASE}#{code_list}")
    code.set("codeListValue", value)
    code.text = value
    return child


def generate_iso19115_template(output_path: str = "metadata_template.xml") -> None:
    root = etree.Element(f"{{{GMD}}}MD_Metadata", nsmap=NSMAP)
    today = date.today().isoformat()

    # Baseline elements, emitted in the order the ISO 19139 schema requires
    add_string_element(root, "fileIdentifier", str(uuid.uuid4()))
    add_string_element(root, "language", "eng")
    add_code_list_element(root, "characterSet", "MD_CharacterSetCode", "utf8")
    add_code_list_element(root, "hierarchyLevel", "MD_ScopeCode", "dataset")

    # Contact placeholder: properties sit inside CI_ResponsibleParty, and role is mandatory
    party = add_element(add_element(root, "contact"), "CI_ResponsibleParty")
    add_string_element(party, "organisationName", "Placeholder Organisation")
    add_string_element(party, "positionName", "GIS Data Manager")
    add_code_list_element(party, "role", "CI_RoleCode", "pointOfContact")

    add_date_element(root, "dateStamp", today)

    # Standard identifiers matching the gmd (ISO 19139) encoding used here
    add_string_element(root, "metadataStandardName", "ISO 19115:2003/19139")
    add_string_element(root, "metadataStandardVersion", "1.0")

    # identificationInfo is mandatory: citation (title + date), abstract, language
    data_id = add_element(add_element(root, "identificationInfo"), "MD_DataIdentification")
    citation = add_element(add_element(data_id, "citation"), "CI_Citation")
    add_string_element(citation, "title", "Placeholder title")
    ci_date = add_element(add_element(citation, "date"), "CI_Date")
    add_date_element(ci_date, "date", today)
    add_code_list_element(ci_date, "dateType", "CI_DateTypeCode", "creation")
    add_string_element(data_id, "abstract", "Placeholder abstract")
    add_string_element(data_id, "language", "eng")

    # Write to file with an XML declaration and pretty printing
    etree.ElementTree(root).write(output_path, xml_declaration=True, encoding="UTF-8", pretty_print=True)
    print(f"Template generated: {output_path}")


if __name__ == "__main__":
    generate_iso19115_template()

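This script keeps namespace declarations consistent, emits the baseline elements in schema order, and produces human-readable XML. Because every tag is created with a fully qualified {namespace}localname through lxml.etree.SubElement, lxml assigns the declared prefixes itself, which avoids the serialization bugs that arise when default and prefixed namespaces are mixed by hand. Code list values such as hierarchyLevel and role are written as gmd code list elements rather than plain strings, as the schema requires.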

Schema Validation and Quality Assurance

Generating a template is only half the workflow. Every output must pass strict XSD validation before entering production pipelines. The ISO 19139 schemas define a large element set with strict ordering and cardinality rules, making manual verification impractical.

Integrate validation directly into the generator using lxml.etree.XMLSchema:

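from lxml import etree

def validate_template(xml_path: str, xsd_path: str) -> bool:
    # Parsing from the path gives lxml the base URL it needs to resolve the
    # schema's relative xs:import/xs:include locations (gco, gts, gss, gsr, ...)
    schema = etree.XMLSchema(etree.parse(xsd_path))
    doc = etree.parse(xml_path)

    try:
        schema.assertValid(doc)
        return True
    except etree.DocumentInvalid:
        # error_log holds every violation, not just the first one
        for entry in schema.error_log:
            print(f"Validation failed (line {entry.line}): {entry.message}")
        return False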

Automating this step prevents malformed records from propagating into spatial data catalogs. Teams should pair validation with automated linting rules that flag missing spatial extents, empty distribution blocks, or deprecated code list values. This aligns with broader Metadata Schema Validation and Linting practices, ensuring that structural compliance is verified at every commit.
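The linting rules mentioned above can be prototyped in a few lines. This sketch is illustrative only: lint_metadata and its deprecated_codes mapping are hypothetical, the extent rule only checks that some gmd:EX_Extent exists under identificationInfo, and the deny-list of code list values is something you would supply from your own profile rather than a list defined by ISO.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def lint_metadata(xml_text: str, deprecated_codes: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """Return human-readable warnings; an empty list means no rule fired."""
    deprecated_codes = deprecated_codes or {}
    root = ET.fromstring(xml_text)
    warnings = []

    # Rule 1: an EX_Extent should exist under identificationInfo
    # (this does not check that the extent is geographic rather than temporal)
    if root.find(".//gmd:identificationInfo//gmd:EX_Extent", NS) is None:
        warnings.append("no spatial extent (gmd:EX_Extent) under identificationInfo")

    # Rule 2: distributionInfo that is empty or wraps an empty MD_Distribution
    for dist in root.findall("gmd:distributionInfo", NS):
        if len(dist) == 0 or len(dist[0]) == 0:
            warnings.append("empty distributionInfo block")

    # Rule 3: code list values found on the caller-supplied deny-list
    for elem in root.iter():
        value = elem.get("codeListValue")
        if value is None:
            continue
        list_name = elem.tag.split("}", 1)[-1]
        if value in deprecated_codes.get(list_name, set()):
            warnings.append(f"deprecated {list_name} value: {value}")
    return warnings
```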

Extending Templates for Raster and Vector Assets

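Base templates must be adapted to accommodate dataset-specific metadata blocks. Raster datasets typically need a spatialRepresentationInfo block (MD_GridSpatialRepresentation) describing grid dimensions and cell size, plus a contentInfo block (MD_CoverageDescription) for band descriptions. Vector datasets use MD_VectorSpatialRepresentation with geometricObjects definitions and reference their feature catalogue through contentInfo/MD_FeatureCatalogueDescription/featureCatalogueCitation.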

When working with raster formats, extracting header metadata directly from source files eliminates manual entry errors. For example, teams following the Generating ISO 19115 metadata from GeoTIFF headers workflow can parse TIFF tags with rasterio or GDAL, map them to ISO 19115 elements, and inject them into the pre-validated template. This keeps spatial reference systems, bounding coordinates, and acquisition dates consistent with the actual binary payload.
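As an illustration of the raster mapping, the sketch below builds a gmd:spatialRepresentationInfo block from grid properties. The function and parameter names are hypothetical; it assumes a two-dimensional grid, pixel-is-area cell geometry, and one unit of measure for both axes. With rasterio the inputs would typically come from an open dataset's width, height, and res attributes, but the function takes plain numbers so it does not depend on rasterio.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
CODELIST = "http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml"


def _el(parent, ns, tag):
    """Append a child element with a fully qualified {namespace}tag."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def _code(parent, list_name, value):
    """Append a gmd code list element pointing at the gmxCodelists dictionary."""
    code = _el(parent, GMD, list_name)
    code.set("codeList", f"{CODELIST}#{list_name}")
    code.set("codeListValue", value)
    code.text = value
    return code


def grid_representation(width, height, res_x, res_y, uom="m", cell_geometry="area"):
    """Build gmd:spatialRepresentationInfo/MD_GridSpatialRepresentation for a 2-D grid.

    uom is an assumption: use a degree unit instead for geographic CRSs.
    """
    sri = ET.Element(f"{{{GMD}}}spatialRepresentationInfo")
    grid = _el(sri, GMD, "MD_GridSpatialRepresentation")
    _el(_el(grid, GMD, "numberOfDimensions"), GCO, "Integer").text = "2"
    # One MD_Dimension per axis: columns follow width/x, rows follow height/y
    for name, size, res in (("column", width, res_x), ("row", height, res_y)):
        dim = _el(_el(grid, GMD, "axisDimensionProperties"), GMD, "MD_Dimension")
        _code(_el(dim, GMD, "dimensionName"), "MD_DimensionNameTypeCode", name)
        _el(_el(dim, GMD, "dimensionSize"), GCO, "Integer").text = str(size)
        measure = _el(_el(dim, GMD, "resolution"), GCO, "Measure")
        measure.set("uom", uom)
        measure.text = str(abs(res))  # north-up rasters often report a negative y step
    _code(_el(grid, GMD, "cellGeometry"), "MD_CellGeometryCode", cell_geometry)
    _el(_el(grid, GMD, "transformationParameterAvailability"), GCO, "Boolean").text = "false"
    return sri
```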

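Coordinate reference systems should be normalized with pyproj before insertion. Converting bare EPSG codes such as 4326 into OGC URNs (e.g., urn:ogc:def:crs:EPSG::4326) or the equivalent http URIs (e.g., http://www.opengis.net/def/crs/EPSG/0/4326) gives catalogs an unambiguous identifier and avoids interoperability failures between systems that parse CRS references differently.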
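A minimal sketch of that normalization, assuming the URN form is the target and that the CRS resolves to an authority code. CRS.from_user_input and CRS.to_authority are pyproj APIs; crs_to_urn and reference_system_info are hypothetical helpers. Placing the URN in RS_Identifier/code is one common convention; some profiles prefer gmx:Anchor or separate code and codeSpace values.

```python
import xml.etree.ElementTree as ET

from pyproj import CRS

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def crs_to_urn(user_input) -> str:
    """Normalize an EPSG integer, 'EPSG:xxxx' string, WKT or PROJ string to an OGC URN."""
    crs = CRS.from_user_input(user_input)
    authority = crs.to_authority()  # e.g. ("EPSG", "4326"), or None if no match is found
    if authority is None:
        raise ValueError(f"no authority code found for {user_input!r}")
    auth_name, code = authority
    return f"urn:ogc:def:crs:{auth_name}::{code}"


def reference_system_info(urn: str) -> ET.Element:
    """Wrap the URN as referenceSystemInfo/MD_ReferenceSystem/.../RS_Identifier/code."""
    rsi = ET.Element(f"{{{GMD}}}referenceSystemInfo")
    system = ET.SubElement(rsi, f"{{{GMD}}}MD_ReferenceSystem")
    ident_prop = ET.SubElement(system, f"{{{GMD}}}referenceSystemIdentifier")
    ident = ET.SubElement(ident_prop, f"{{{GMD}}}RS_Identifier")
    code = ET.SubElement(ident, f"{{{GMD}}}code")
    ET.SubElement(code, f"{{{GCO}}}CharacterString").text = urn
    return rsi
```

In the ISO 19139 sequence, referenceSystemInfo comes after spatialRepresentationInfo and before identificationInfo, so the population layer should insert it at that position rather than appending it.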

Deployment and CI/CD Integration

Template generation should run as a deterministic, idempotent step in continuous integration pipelines. Recommended deployment patterns include:

  1. Pre-commit Hooks: Run the generator and validator on every metadata file modification. Fail the commit if XSD validation fails.
  2. GitHub Actions / GitLab CI: Trigger template generation on new dataset uploads. Use matrix testing to validate against multiple ISO 19139 schema versions.
  3. Containerized Execution: Package the generator with pinned dependencies and local schema mirrors. This eliminates environment drift and ensures reproducible outputs across staging and production.
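The generator shown earlier is not deterministic as written, since it draws a random uuid4 and stamps today's date. One way to make reruns reproducible, sketched below under the assumption that each dataset has a stable key (for example its storage path), is to derive the fileIdentifier with uuid5 and take the date from the SOURCE_DATE_EPOCH environment variable used by reproducible-build tooling when it is set. RECORD_NAMESPACE and both helper names are hypothetical.

```python
import os
import uuid
from datetime import date, datetime, timezone

# Hypothetical, fixed namespace for this organisation's records. Deriving it
# from a URL you control keeps it stable; example.org is a placeholder.
RECORD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/metadata-records")


def deterministic_file_identifier(dataset_key: str) -> str:
    """Same dataset key -> same fileIdentifier on every run (name-based UUID v5)."""
    return str(uuid.uuid5(RECORD_NAMESPACE, dataset_key))


def metadata_date_stamp() -> str:
    """Use SOURCE_DATE_EPOCH (Unix seconds, UTC) when set, otherwise today's date."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc).date().isoformat()
    return date.today().isoformat()
```

With both values fixed, regenerating a template for an unchanged dataset yields byte-identical XML, which makes CI diffs meaningful.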

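Example CI step (GitHub Actions), assuming validate_template is defined in generators/iso19115_template.py alongside the generator:

- name: Generate and Validate ISO 19115 Template
  run: |
    python -m venv .venv
    source .venv/bin/activate
    pip install "lxml>=4.9.0" "pyproj>=3.4.0" "python-dateutil>=2.8.2"
    mkdir -p output
    python -c "from generators.iso19115_template import generate_iso19115_template; generate_iso19115_template('output/metadata_template.xml')"
    python -c "import sys; from generators.iso19115_template import validate_template; sys.exit(0 if validate_template('output/metadata_template.xml', 'schemas/gmd/gmd.xsd') else 1)"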

Production Best Practices

  1. Never Hardcode Placeholders: Use environment variables or configuration files for organization names, contact emails, and default languages. This prevents accidental data leakage and simplifies multi-tenant deployments.
  2. Version Control Templates: Treat template generators as infrastructure code. Track schema updates, namespace changes, and validation rule modifications in Git.
  3. Implement Fallback Logic: If a dynamic value cannot be extracted (e.g., missing CRS), emit the property element empty with a gco:nilReason attribute (for example missing or unknown) rather than omitting it. For schema-mandatory elements, omission fails validation outright.
  4. Log Validation Failures Verbosely: Capture XSD error paths in structured logs. This accelerates debugging when downstream systems reject records.
  5. Separate Generation from Population: Keep the template generator stateless. Use a separate population layer to merge dataset attributes, ensuring that structural compliance is never compromised by malformed input data.
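Items 3 and 5 can be combined in the population layer. The sketch below (populate_text is a hypothetical helper, using only the standard library) writes a value into a template's gco:CharacterString or, when no value is available, empties the property element and marks it with gco:nilReason so that the element stays in place.

```python
import xml.etree.ElementTree as ET
from typing import Optional

GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": GCO}
NIL_REASON = f"{{{GCO}}}nilReason"


def populate_text(root: ET.Element, path: str, value: Optional[str], nil_reason: str = "missing") -> ET.Element:
    """Fill the CharacterString under the property at `path`, or nil the property."""
    prop = root.find(path, NS)
    if prop is None:
        raise KeyError(f"template has no element at {path}")
    if value:
        # A real value replaces any earlier nilReason marker
        prop.attrib.pop(NIL_REASON, None)
        text_elem = prop.find("gco:CharacterString", NS)
        if text_elem is None:
            text_elem = ET.SubElement(prop, f"{{{GCO}}}CharacterString")
        text_elem.text = value
    else:
        # Keep the element, drop its content, and record why it is empty
        for child in list(prop):
            prop.remove(child)
        prop.set(NIL_REASON, nil_reason)
    return prop
```

Raising KeyError on an unknown path keeps population errors loud: a typo in a mapping should fail the pipeline rather than silently leave a placeholder in the published record.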

By adopting a disciplined, schema-first approach to ISO 19115 Metadata Template Generation, organizations can eliminate manual authoring bottlenecks, enforce baseline compliance, and build scalable metadata pipelines that adapt to evolving geospatial standards.