Validating FGDC Metadata Against XML Schemas
Validating FGDC metadata means checking a Content Standard for Digital Geospatial Metadata (CSDGM) document against the official fgdc-std-001-1998.xsd with a schema-aware XML parser. The most reliable automation path uses Python’s lxml library with XMLSchema, which enforces structural rules, validates data types, and returns machine-readable error traces. Validation succeeds only when you validate against the correct schema revision, every file the schema includes resolves, and the metadata file is well-formed XML with a clean encoding (UTF-8 without a byte-order mark is the safest choice). For headless environments, xmllint --schema provides equivalent CLI validation.
Schema Acquisition & Local Caching
The official FGDC XML schemas are maintained by the Federal Geographic Data Committee and distributed through the FGDC Standards & Working Groups portal. Download fgdc-std-001-1998.xsd and any companion files (e.g., fgdc-std-001-1998-1.xsd) to a version-controlled directory.
Never rely on remote schema resolution in production pipelines. Network timeouts, deprecated endpoints, and HTTP redirects can stall validation or make it fail for reasons unrelated to the metadata. Store schemas alongside your validation runner and reference them via absolute or relative filesystem paths. Additionally, ensure your environment uses lxml>=4.9.0 for XML Schema 1.0 validation and robust error reporting. Quote the requirement so the shell does not treat >= as an output redirect:
pip install "lxml>=4.9.0"
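One way to confirm that a cached schema is self-contained is to list the files it pulls in and check that each one exists next to it. The sketch below is an illustration under stated assumptions, not part of any FGDC tooling: the function name unresolved_schema_refs is hypothetical, it uses only the standard library, and it inspects a single schema file without following includes recursively.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XS = "{http://www.w3.org/2001/XMLSchema}"


def unresolved_schema_refs(xsd_path):
    """Return schemaLocation values that are remote or missing on disk.

    Hypothetical helper: checks only the references declared directly in
    xsd_path, resolved relative to that file's directory.
    """
    xsd = Path(xsd_path)
    root = ET.parse(xsd).getroot()
    problems = []
    for tag in ("include", "import", "redefine"):
        for node in root.iter(f"{XS}{tag}"):
            location = node.get("schemaLocation")
            if location is None:
                continue  # xs:import may legally omit schemaLocation
            if "://" in location:
                problems.append(f"remote reference: {location}")
            elif not (xsd.parent / location).is_file():
                problems.append(f"missing local file: {location}")
    return problems
```

An empty list suggests the schema can be compiled without network access; running the same check on each companion file would cover nested includes.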
Production-Ready Python Validator
The following script validates a single FGDC XML file against a local schema, captures validation errors, and returns a structured report suitable for CI/CD or batch processing. It explicitly handles schema parsing failures, malformed XML, and multi-line validation traces.
import sys
from pathlib import Path

from lxml import etree


def validate_fgdc(xml_path: str, schema_path: str) -> dict:
    """
    Validate an FGDC CSDGM XML file against the official XSD.
    Returns a dict with 'valid' (bool), 'errors' (list), and 'warnings' (list).
    """
    xml_file = Path(xml_path)
    schema_file = Path(schema_path)
    if not xml_file.is_file() or not schema_file.is_file():
        return {"valid": False, "errors": ["Missing XML or schema file."], "warnings": []}
    try:
        # Compile the schema; a broken XSD or an unresolved include raises XMLSchemaParseError
        schema = etree.XMLSchema(etree.parse(str(schema_file)))
        # Parse strictly: no error recovery, no network access
        doc = etree.parse(str(xml_file), parser=etree.XMLParser(recover=False, no_network=True))
        if schema.validate(doc):
            return {"valid": True, "errors": [], "warnings": []}
        # error_log holds one entry per violation, each with its source line number
        errors = [f"line {entry.line}: {entry.message}" for entry in schema.error_log]
        return {"valid": False, "errors": errors, "warnings": []}
    except etree.XMLSchemaParseError as e:
        return {"valid": False, "errors": [f"Schema parse error: {e}"], "warnings": []}
    except etree.XMLSyntaxError as e:
        # Raised when either the metadata file or the XSD itself is not well-formed
        return {"valid": False, "errors": [f"Malformed XML: {e}"], "warnings": []}
    except Exception as e:
        return {"valid": False, "errors": [f"Unexpected validation failure: {e}"], "warnings": []}


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python validate_fgdc.py <xml_file> <xsd_file>")
        sys.exit(1)
    result = validate_fgdc(sys.argv[1], sys.argv[2])
    if result["valid"]:
        print("✅ Validation passed.")
    else:
        print(f"❌ Validation failed ({len(result['errors'])} error(s)):")
        for err in result["errors"]:
            print(f" • {err}")
    sys.exit(0 if result["valid"] else 1)
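For batch processing, the single-file validator can be applied to a whole directory tree. The following is a minimal sketch, assuming a validator callable that returns the same dict shape as validate_fgdc; the name validate_directory and the summary keys are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Callable


def validate_directory(metadata_dir: str, schema_path: str,
                       validator: Callable[[str, str], dict]) -> dict:
    """Run validator(xml_path, schema_path) on every *.xml file under metadata_dir.

    Hypothetical wrapper: validator is expected to return a dict with a
    boolean 'valid' key, as validate_fgdc does.
    """
    reports = {}
    for xml_file in sorted(Path(metadata_dir).rglob("*.xml")):
        reports[str(xml_file)] = validator(str(xml_file), schema_path)
    failed = [path for path, report in reports.items() if not report["valid"]]
    return {"total": len(reports), "failed": failed, "reports": reports}
```

Passing the validator in as an argument keeps the wrapper independent of lxml, so it can be exercised with a stub in unit tests. Note that each call recompiles the schema; for large batches it may be worth compiling it once and reusing it.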
Resolving Common CSDGM Validation Errors
FGDC metadata frequently fails strict XSD validation due to legacy authoring tools and inconsistent export practices. Address these patterns before scaling validation:
- Byte-Order Marks (BOM): Windows editors often prepend \ufeff to UTF-8 files. Strip it during ingestion to prevent XMLSyntaxError on line 1 in parsers that reject it. Note that utf-8-sig is a Python codec name, so use it with open() or bytes.decode() when rewriting the file rather than as the XML parser's encoding argument.
- Missing xsi:noNamespaceSchemaLocation: CSDGM XML rarely declares schema hints. The Python validator above bypasses this by loading the XSD explicitly, but some legacy tools require injecting xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="fgdc-std-001-1998.xsd" into the root <metadata> element.
- Relative Path Resolution: FGDC schemas use <xs:include> for modular definitions. If you move fgdc-std-001-1998.xsd without its companion files, XMLSchemaParseError will trigger. Keep the original directory structure intact.
- Strict Element Ordering: The 1998 standard enforces rigid sequence rules. Swapped <idinfo> and <dataqual> blocks, or misplaced <timeperd> children, will fail immediately. Use the error line numbers to locate structural drift.
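The first two fixes in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions: strip_utf8_bom and add_schema_hint are hypothetical helper names, and because xml.etree.ElementTree discards comments and any DOCTYPE when it re-serializes, the second helper suits normalizing copies rather than editing archival originals.

```python
import codecs
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove any leading UTF-8 byte-order mark(s) from raw file bytes."""
    while raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw


def add_schema_hint(raw: bytes, xsd_name: str = "fgdc-std-001-1998.xsd") -> bytes:
    """Return the document with xsi:noNamespaceSchemaLocation set on the root.

    Hypothetical helper for legacy tools that need the hint; xsd_name is
    written as-is, so it must be resolvable from where the tool runs.
    """
    ET.register_namespace("xsi", XSI)  # serialize the attribute with the xsi prefix
    root = ET.fromstring(strip_utf8_bom(raw))
    root.set(f"{{{XSI}}}noNamespaceSchemaLocation", xsd_name)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Working on bytes rather than decoded text avoids conflicts between a string's actual encoding and the encoding named in the XML declaration.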
CLI Validation & CI/CD Integration
For shell-based pipelines or containerized runners, xmllint provides fast, dependency-light validation. Install via libxml2-utils (Debian/Ubuntu) or libxml2 (macOS/Homebrew):
xmllint --noout --schema fgdc-std-001-1998.xsd metadata.xml
The --noout flag stops xmllint from echoing the parsed document to stdout, so the only output is the validation result on stderr, and the exit code is non-zero on failure. This behavior integrates cleanly with GitHub Actions, GitLab CI, or Jenkins:
# .github/workflows/validate-metadata.yml
on: [push, pull_request]
jobs:
  validate-fgdc:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install libxml2
        run: sudo apt-get update && sudo apt-get install -y libxml2-utils
      - name: Run XSD Validation
        run: |
          find ./metadata -name "*.xml" -exec xmllint --noout --schema schemas/fgdc-std-001-1998.xsd {} +
When building Automated Metadata Generation & Schema Mapping workflows, wrap validation in a pre-commit hook or post-export gate. Catching structural violations before data publication prevents downstream catalog ingestion failures and maintains compliance with federal data standards.
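A pre-commit hook or post-export gate of this kind could be sketched as a thin Python wrapper around the xmllint command shown earlier. The function names build_xmllint_cmd and run_gate are hypothetical, and the sketch assumes xmllint is on the PATH; if it is not, subprocess.run raises FileNotFoundError.

```python
import subprocess
import sys


def build_xmllint_cmd(schema_path, xml_files):
    """Assemble the xmllint invocation for one schema and many files."""
    return ["xmllint", "--noout", "--schema", str(schema_path), *map(str, xml_files)]


def run_gate(schema_path, xml_files):
    """Return xmllint's exit code, or 0 when there is nothing to check.

    Hypothetical gate: validation messages arrive on stderr and are
    forwarded only when the run fails.
    """
    if not xml_files:
        return 0
    completed = subprocess.run(
        build_xmllint_cmd(schema_path, xml_files),
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        sys.stderr.write(completed.stderr)
    return completed.returncode
```

A hook script would typically end with sys.exit(run_gate("schemas/fgdc-std-001-1998.xsd", sys.argv[1:])), so that a non-zero exit code blocks the commit or export.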
Integrating Validation into Metadata Workflows
Schema validation is rarely an endpoint; it’s a quality gate. Once your pipeline reliably flags malformed CSDGM documents, route valid outputs to transformation stages. Validating FGDC metadata against XML schemas ensures structural integrity before field mapping, coordinate system translation, or crosswalk generation. This step often serves as a prerequisite for FGDC to ISO 19115 Conversion Pipelines, where strict source compliance prevents silent data loss during standard migration.
For advanced error triage, consider logging validation traces to a structured format (JSON/CSV) and aggregating failure patterns across datasets. Common recurring issues—like missing <metainfo> contact blocks or invalid <timeinfo> formats—can be patched programmatically or flagged for manual curator review. Pairing automated XSD checks with semantic validation (e.g., controlled vocabulary enforcement) creates a resilient metadata governance loop.
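Aggregating failure patterns can start with something as simple as counting which elements appear in the error messages. The sketch below assumes messages in the style libxml2 emits for schema violations (for example, "Element 'timeperd': This element is not expected."); the function names, the "(unclassified)" bucket, and the report layout are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption: the offending element is quoted after the word "Element"
ELEMENT_RE = re.compile(r"Element '([^']+)'")


def summarize_failures(reports):
    """Count error messages per offending element across all reports.

    reports maps a file path to a dict with an 'errors' list of strings.
    Returns (element, count) pairs, most frequent first.
    """
    counts = Counter()
    for report in reports.values():
        for message in report["errors"]:
            match = ELEMENT_RE.search(message)
            counts[match.group(1) if match else "(unclassified)"] += 1
    return counts.most_common()


def to_json(reports):
    """Serialize the per-file reports plus the aggregate summary."""
    payload = {"reports": reports, "summary": summarize_failures(reports)}
    return json.dumps(payload, indent=2)
```

Elements that dominate the summary are natural candidates for programmatic patching, while the unclassified bucket points to messages that need manual review.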