Validating FGDC Metadata Against XML Schemas

Validating FGDC metadata against XML schemas means checking a Content Standard for Digital Geospatial Metadata (CSDGM) document against the official fgdc-std-001-1998.xsd with a validating XML parser. The most reliable automation path uses Python’s lxml library with XMLSchema, which enforces structural rules, validates data types, and returns machine-readable error traces. Validation succeeds when the file is well-formed, conforms to the schema revision you load, and the schema’s relative includes all resolve locally; encoding problems such as a stray byte-order mark can stop parsing before validation even starts. For headless environments, xmllint --schema provides equivalent CLI validation.

Schema Acquisition & Local Caching

The official FGDC XML schemas are maintained by the Federal Geographic Data Committee and distributed through the FGDC Standards & Working Groups portal. Download fgdc-std-001-1998.xsd and any companion files (e.g., fgdc-std-001-1998-1.xsd) to a version-controlled directory.

Never rely on remote schema resolution in production pipelines. Network timeouts, deprecated endpoints, and HTTP redirects can break validation intermittently and in ways that are hard to reproduce. Store schemas alongside your validation runner and reference them via absolute or relative filesystem paths. Additionally, pin lxml>=4.9.0: lxml validates through libxml2, which implements XML Schema 1.0, and recent releases provide robust error reporting.

pip install "lxml>=4.9.0"
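
Before wiring the cached schema into a pipeline, it can help to confirm that every file it references actually resolves on disk. The following is a minimal sketch using only the standard library. The function name missing_schema_includes is hypothetical, and the sketch assumes companion files are referenced through schemaLocation attributes on xs:include, xs:import, or xs:redefine elements.

```python
from pathlib import Path
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
REFERENCE_TAGS = ("include", "import", "redefine")


def missing_schema_includes(schema_path):
    """Walk a schema's include graph and return unresolved schemaLocation values."""
    pending = [Path(schema_path).resolve()]
    seen, missing = set(), []
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # guard against circular includes
        seen.add(current)
        root = ET.parse(current).getroot()
        for tag in REFERENCE_TAGS:
            for node in root.iter(f"{{{XS_NS}}}{tag}"):
                location = node.get("schemaLocation")
                if not location:
                    continue  # e.g. xs:import that only names a namespace
                if urlparse(location).scheme in ("http", "https", "ftp"):
                    missing.append(location)  # remote reference, not cached locally
                    continue
                target = (current.parent / location).resolve()
                if target.is_file():
                    pending.append(target)
                else:
                    missing.append(location)
    return missing
```

An empty list suggests the schema directory is self-contained; any returned entry is a file to download or a remote reference to replace with a local path.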

Production-Ready Python Validator

The following script validates a single FGDC XML file against a local schema, captures validation errors, and returns a structured report suitable for CI/CD or batch processing. It explicitly handles schema parsing failures, malformed XML, and multi-line validation traces.

import sys
from pathlib import Path
from lxml import etree

def validate_fgdc(xml_path: str, schema_path: str) -> dict:
    """
    Validate an FGDC CSDGM XML file against the official XSD.
    Returns a dict with 'valid' (bool), 'errors' (list), and 'warnings' (list).
    """
    xml_file = Path(xml_path)
    schema_file = Path(schema_path)

    if not xml_file.is_file() or not schema_file.is_file():
        return {"valid": False, "errors": ["Missing XML or schema file."], "warnings": []}

    try:
        # Load schema with strict parsing
        schema_doc = etree.XMLSchema(etree.parse(str(schema_file)))
        doc = etree.parse(str(xml_file), parser=etree.XMLParser(recover=False, no_network=True))

        # Enforce validation
        schema_doc.assertValid(doc)
        return {"valid": True, "errors": [], "warnings": []}

    except etree.XMLSchemaParseError as e:
        return {"valid": False, "errors": [f"Schema parse error: {str(e)}"], "warnings": []}
    except etree.DocumentInvalid as e:
        # str(e) carries only the first violation; the attached error log lists them all
        cleaned = [f"line {entry.line}: {entry.message}" for entry in e.error_log]
        return {"valid": False, "errors": cleaned or [str(e)], "warnings": []}
    except etree.XMLSyntaxError as e:
        return {"valid": False, "errors": [f"Malformed XML: {str(e)}"], "warnings": []}
    except Exception as e:
        return {"valid": False, "errors": [f"Unexpected validation failure: {str(e)}"], "warnings": []}

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python validate_fgdc.py <xml_file> <xsd_file>")
        sys.exit(1)

    result = validate_fgdc(sys.argv[1], sys.argv[2])
    if result["valid"]:
        print("✅ Validation passed.")
    else:
        print(f"❌ Validation failed ({len(result['errors'])} error(s)):")
        for err in result["errors"]:
            print(f"  • {err}")
    sys.exit(0 if result["valid"] else 1)
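
For batch runs, compiling the schema once and reusing it across files avoids re-parsing the XSD for every document. A minimal sketch, assuming a hypothetical helper named validate_many that relies on lxml's XMLSchema.validate() and the schema object's error_log:

```python
from pathlib import Path
from lxml import etree


def validate_many(xml_paths, schema_path):
    """Validate several files against one compiled schema; return {path: [errors]}."""
    schema = etree.XMLSchema(etree.parse(str(schema_path)))  # compiled once
    parser = etree.XMLParser(recover=False, no_network=True)
    report = {}
    for path in map(Path, xml_paths):
        try:
            doc = etree.parse(str(path), parser)
        except etree.XMLSyntaxError as exc:
            report[str(path)] = [f"Malformed XML: {exc}"]
            continue
        if schema.validate(doc):
            report[str(path)] = []
        else:
            # error_log reflects the most recent validate() call on this schema
            report[str(path)] = [
                f"line {entry.line}: {entry.message}" for entry in schema.error_log
            ]
    return report
```

An empty list for a path means the document validated. Missing files raise OSError here rather than being reported, which a production runner would likely want to catch.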

Resolving Common CSDGM Validation Errors

FGDC metadata frequently fails strict XSD validation due to legacy authoring tools and inconsistent export practices. Address these patterns before scaling validation:

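  • Byte-Order Marks (BOM): Windows editors often prepend a UTF-8 BOM (the bytes EF BB BF, U+FEFF) to files. libxml2-based parsers generally tolerate a single leading BOM when reading raw bytes, but a duplicated BOM, or one that ends up after other content or inside an already-decoded string, can raise XMLSyntaxError on line 1. Strip it during ingestion, for example by reading the file with Python’s utf-8-sig codec; do not pass utf-8-sig as the lxml parser’s encoding argument, since it is a Python codec name rather than a libxml2 encoding.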
  • Missing xsi:noNamespaceSchemaLocation: CSDGM XML rarely declares schema hints. The Python validator above bypasses this by loading the XSD explicitly, but some legacy tools require injecting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="fgdc-std-001-1998.xsd" into the root <metadata> element.
  • Relative Path Resolution: FGDC schemas use <xs:include> for modular definitions. If you move fgdc-std-001-1998.xsd without its companion files, XMLSchemaParseError will trigger. Keep the original directory structure intact.
  • Strict Element Ordering: The 1998 standard enforces rigid sequence rules. Swapped <idinfo> and <dataqual> blocks, or misplaced <timeperd> children, will fail immediately. Use the error line numbers to locate structural drift.
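
The first two fixes in the list above can be sketched with the standard library alone. The helper names strip_utf8_bom and add_schema_hint are hypothetical, and the sketch assumes the document has no namespace on its root element, as is typical for CSDGM. Note that ElementTree discards comments and any DOCTYPE when it re-serializes, so treat this as an illustration rather than a lossless rewrite.

```python
import codecs
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove one or more leading UTF-8 byte-order marks from raw file bytes."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data


def add_schema_hint(data: bytes, xsd_name: str = "fgdc-std-001-1998.xsd") -> bytes:
    """Return the document with xsi:noNamespaceSchemaLocation set on the root."""
    ET.register_namespace("xsi", XSI_NS)  # serialize with the conventional prefix
    root = ET.fromstring(strip_utf8_bom(data))
    root.set(f"{{{XSI_NS}}}noNamespaceSchemaLocation", xsd_name)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Both functions operate on bytes, so read the source with open(path, "rb") and write the result back the same way.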

CLI Validation & CI/CD Integration

For shell-based pipelines or containerized runners, xmllint provides fast, dependency-light validation. Install via libxml2-utils (Debian/Ubuntu) or libxml2 (macOS/Homebrew):

xmllint --noout --schema fgdc-std-001-1998.xsd metadata.xml

The --noout flag suppresses the serialized document that xmllint would otherwise print to stdout, leaving only the validation messages on stderr and a non-zero exit code on failure. This behavior integrates cleanly with GitHub Actions, GitLab CI, or Jenkins:

# .github/workflows/validate-metadata.yml
name: Validate FGDC metadata
on: [push, pull_request]
jobs:
  validate-fgdc:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install libxml2
        run: sudo apt-get update && sudo apt-get install -y libxml2-utils
      - name: Run XSD Validation
        run: |
          find ./metadata -name "*.xml" -exec xmllint --noout --schema schemas/fgdc-std-001-1998.xsd {} +
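
If a pipeline needs the xmllint messages in structured form, the captured stderr can be parsed afterwards. This is a rough sketch that assumes the message layout xmllint commonly prints for schema violations (file:line: element name: Schemas validity error : message). That layout is not guaranteed across libxml2 versions, and parse_xmllint_errors is a hypothetical helper name.

```python
import re

# Assumed layout: "<file>:<line>: element <name>: Schemas validity error : <message>"
_XMLLINT_ERROR = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<element>.+?): "
    r"Schemas validity error : (?P<message>.*)$"
)


def parse_xmllint_errors(stderr_text):
    """Convert xmllint schema-validation stderr into a list of dicts."""
    records = []
    for raw_line in stderr_text.splitlines():
        match = _XMLLINT_ERROR.match(raw_line.strip())
        if match:  # summary lines such as "... fails to validate" are skipped
            record = match.groupdict()
            record["line"] = int(record["line"])
            records.append(record)
    return records
```

In a shell step, redirecting stderr to a file (2> xmllint-errors.log) and passing its contents to this function would yield one record per violation.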

When building Automated Metadata Generation & Schema Mapping workflows, wrap validation in a pre-commit hook or post-export gate. Catching structural violations before data publication prevents downstream catalog ingestion failures and maintains compliance with federal data standards.

Integrating Validation into Metadata Workflows

Schema validation is rarely an endpoint; it’s a quality gate. Once your pipeline reliably flags malformed CSDGM documents, route valid outputs to transformation stages. Validating FGDC metadata against XML schemas ensures structural integrity before field mapping, coordinate system translation, or crosswalk generation. This step often serves as a prerequisite for FGDC to ISO 19115 Conversion Pipelines, where strict source compliance prevents silent data loss during standard migration.
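
The quality-gate idea can be sketched as a small routing step that copies each document to an accepted or rejected directory depending on a validity check. The function gate_documents, its parameters, and the directory names are all hypothetical; is_valid is any callable that takes a path and returns a boolean.

```python
import shutil
from pathlib import Path


def gate_documents(xml_paths, is_valid, accepted_dir, rejected_dir):
    """Copy each file to accepted_dir or rejected_dir based on is_valid(path)."""
    accepted_dir, rejected_dir = Path(accepted_dir), Path(rejected_dir)
    accepted_dir.mkdir(parents=True, exist_ok=True)
    rejected_dir.mkdir(parents=True, exist_ok=True)
    routed = {"accepted": [], "rejected": []}
    for path in map(Path, xml_paths):
        bucket = "accepted" if is_valid(path) else "rejected"
        target_dir = accepted_dir if bucket == "accepted" else rejected_dir
        target = target_dir / path.name
        shutil.copy2(path, target)  # copy, so the source export stays untouched
        routed[bucket].append(target)
    return routed
```

With the validator from earlier in this guide, is_valid could be lambda p: validate_fgdc(str(p), "schemas/fgdc-std-001-1998.xsd")["valid"], so that only the accepted directory feeds later transformation stages.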

For advanced error triage, consider logging validation traces to a structured format (JSON/CSV) and aggregating failure patterns across datasets. Common recurring issues—like missing <metainfo> contact blocks or invalid <timeinfo> formats—can be patched programmatically or flagged for manual curator review. Pairing automated XSD checks with semantic validation (e.g., controlled vocabulary enforcement) creates a resilient metadata governance loop.
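
One way to aggregate failure patterns is to strip position details from each message and count what remains. The sketch below assumes error lists shaped like the 'errors' entries returned by the validator earlier in this guide, keyed by file name; summarize_failures and summary_to_json are hypothetical names.

```python
import json
import re
from collections import Counter

# Drop a leading "line N: " or a trailing ", line N" so that the same
# violation in different files collapses into one pattern.
_POSITION = re.compile(r"^line \d+:\s*|,\s*line \d+$")


def summarize_failures(results_by_file):
    """Count recurring error patterns across many validation results."""
    patterns = Counter()
    for errors in results_by_file.values():
        for message in errors:
            patterns[_POSITION.sub("", message).strip()] += 1
    return [{"pattern": p, "count": c} for p, c in patterns.most_common()]


def summary_to_json(summary):
    """Serialize the summary for logging or dashboards."""
    return json.dumps(summary, indent=2)
```

Sorting by count puts the most frequent violations first, which tends to point at the export tool or template responsible.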