FGDC to ISO 19115 Conversion Pipelines
Migrating legacy geospatial metadata from the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) to the international ISO 19115 standard and its ISO 19139 XML encoding is a foundational requirement for modern data portals, cross-agency interoperability, and automated cataloging. For GIS data managers, open-source maintainers, and Python automation builders, manual translation is unsustainable at scale. A robust FGDC to ISO 19115 conversion pipeline keeps schema alignment consistent, preserves data provenance, and supports publishing to enterprise catalogs and open data platforms.
This workflow sits at the core of Automated Metadata Generation & Schema Mapping, where programmatic transformation replaces error-prone manual editing. The following guide details a production-tested pipeline architecture, deterministic mapping logic, and validation routines tailored for government and open-source environments.
Prerequisites & Environment Setup
Before implementing the conversion pipeline, ensure your runtime environment meets baseline technical requirements. A reliable transformation stack depends on strict XML parsing, namespace resolution, and coordinate reference system (CRS) normalization.
- Python 3.9+ with lxml, pyproj, pyyaml, and xmlschema installed via pip
- FGDC XML inputs conforming to the FGDC CSDGM v2.0 specification (FGDC-STD-001-1998)
- ISO 19115-2/19139 XSD files for structural validation (available from OGC/ISO repositories)
- Namespace awareness: FGDC uses no default namespace, while the ISO 19139 encoding of ISO 19115 relies on http://www.isotc211.org/2005/gmd and http://www.isotc211.org/2005/gco
- CRS lookup table mapping legacy FGDC spref definitions to EPSG codes or WKT strings
Configure your environment with strict XML parsing settings to prevent entity expansion vulnerabilities and ensure deterministic output. Use lxml.etree.XMLParser(resolve_entities=False, recover=False) to maintain schema fidelity during ingestion.
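The parser settings above can be wrapped in a small factory so every stage ingests XML the same way. This is a minimal sketch: the function names make_strict_parser and parse_fgdc are hypothetical, and the extra no_network and remove_pis options are an assumption about what a hardened setup would want, not something the pipeline requires.

```python
from lxml import etree


def make_strict_parser() -> etree.XMLParser:
    """Build a parser that refuses to repair broken XML or expand entities."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        recover=False,           # fail on malformed input instead of guessing
        no_network=True,         # never fetch external DTDs or entities
        remove_pis=True,         # drop processing instructions such as xml-stylesheet
    )


def parse_fgdc(xml_bytes: bytes) -> etree._Element:
    """Parse raw FGDC bytes and return the root element (hypothetical helper)."""
    return etree.fromstring(xml_bytes, parser=make_strict_parser())
```

Creating a fresh parser per call keeps the helper stateless, which suits the stage-by-stage design described in the next section.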
Pipeline Architecture & Step-by-Step Workflow
A reliable conversion pipeline follows a deterministic, stateless ETL pattern. Each stage isolates transformation logic, enabling unit testing, incremental debugging, and horizontal scaling for batch processing.
flowchart TD
classDef ok fill:#d7efef,stroke:#0e7c86,color:#0a5d65;
classDef bad fill:#fde0dd,stroke:#c0392b,color:#922b21;
A(["FGDC CSDGM XML"]) --> B["Ingest & normalize (UTF-8, BOM)"]
B --> WF{"Well-formed?"}
WF -->|no| REJ["Reject & log"]
WF -->|yes| C["Extract & flatten via XPath"]
C --> D["Map & transform (crosswalk registry)"]
D --> E["Enrich & default ISO fields"]
E --> CRS{"CRS present?"}
CRS -->|no| FB["Fallback to EPSG:4326 + warn"]
CRS -->|yes| F["Serialize ISO 19139"]
FB --> F
F --> V{"Valid against XSD?"}
V -->|no| LOG["Log errors for review"]
V -->|yes| OUT["Export & route to staging"]
class REJ,LOG bad
class OUT ok
1. Ingest & Normalize
Load raw FGDC XML files, strip processing instructions, and normalize character encoding to UTF-8. Handle legacy BOM markers and non-standard line endings that frequently appear in archived metadata exports. Validate well-formedness before proceeding to prevent downstream XPath failures.
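A rough sketch of the normalization step follows. It assumes that files without a BOM are either UTF-8 or a single legacy code page (cp1252 here, purely as an illustrative default), and it uses the standard library's ElementTree only as a quick well-formedness probe. The helper names are hypothetical.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Byte-order marks that show up in archived exports, with their encodings.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def normalize_fgdc_bytes(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes with the BOM removed and line endings set to LF."""
    text = None
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            text = raw[len(bom):].decode(encoding)
            break
    if text is None:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 input uses the fallback code page.
            text = raw.decode(fallback_encoding, errors="replace")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace any stale declaration so the stated encoding matches the bytes.
    text = _XML_DECL.sub("", text, count=1).lstrip()
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + text.encode("utf-8")


def is_well_formed(xml_bytes: bytes) -> bool:
    """Cheap well-formedness check before any XPath work is attempted."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True
```

Files that fail is_well_formed would follow the "Reject & log" branch of the flowchart. For untrusted input, the probe should be swapped for a hardened parser, since the standard library parser is not designed to resist malicious XML.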
2. Extract & Flatten
Traverse the FGDC tree using targeted XPath expressions. Extract core elements (idinfo, dataqual, spdoinfo, metainfo, distinfo) and flatten them into an intermediate dictionary or JSON structure. This decoupling step is critical: it allows you to apply business logic without wrestling with nested XML namespaces during the mapping phase.
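The flattening step might look like the sketch below. It swaps hand-written XPath per field for a generic tree walk that produces dotted keys, and it uses the standard library's ElementTree API, which lxml elements also implement. The function name flatten_fgdc and the list-valued dictionary shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def flatten_fgdc(root: ET.Element) -> Dict[str, List[str]]:
    """Map dotted FGDC paths to the list of text values found at each path."""
    flat: Dict[str, List[str]] = defaultdict(list)

    def walk(node: ET.Element, path: str) -> None:
        children = list(node)
        text = (node.text or "").strip()
        if not children and text:
            # Leaf element: keep every occurrence so repeated nodes survive.
            flat[path].append(text)
        for child in children:
            walk(child, f"{path}.{child.tag}")

    # Start below the <metadata> root so keys begin with idinfo, dataqual, etc.
    for section in root:
        walk(section, section.tag)
    return dict(flat)
```

Keeping every value as a list, even for single-occurrence fields, means repeated elements such as themekey need no special casing in the mapping phase.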
3. Map & Transform
Apply a deterministic mapping registry to convert FGDC nodes to ISO 19115 equivalents. The mapping must explicitly handle cardinality mismatches:
- One-to-Many: A single FGDC origin node often expands into multiple ISO CI_ResponsibleParty entries (e.g., separating individualName, organisationName, and role).
- Many-to-One: FGDC themekey and placekey arrays consolidate into ISO MD_Keywords with distinct thesaurusName URIs.
- Type Coercion: Convert FGDC pubdate (YYYYMMDD or YYYY) to ISO 8601 CI_Date formats.
Use a declarative mapping dictionary rather than imperative if/else chains. This improves maintainability and enables schema versioning.
# Example mapping registry pattern
FGDC_TO_ISO_MAP = {
    "idinfo.citation.citeinfo.title": "identificationInfo.MD_DataIdentification.citation.CI_Citation.title",
    "idinfo.citation.citeinfo.pubdate": "identificationInfo.MD_DataIdentification.citation.CI_Citation.date.CI_Date.date",
    # procdesc is the free-text description inside each FGDC process step
    "dataqual.lineage.procstep.procdesc": "dataQualityInfo.DQ_DataQuality.lineage.LI_Lineage.processStep.LI_ProcessStep.description",
}
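One possible way to apply such a registry to a flattened record is sketched here. The Rule structure, the optional converter callable, and the apply_registry function are all hypothetical; the sketch also assumes the flattened record maps each dotted FGDC path to a list of strings.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """Target ISO path plus an optional value converter (hypothetical)."""
    iso_path: str
    convert: Optional[Callable[[str], str]] = None


# Illustrative registry using the dotted-path notation from this guide.
REGISTRY: Dict[str, Rule] = {
    "idinfo.citation.citeinfo.title": Rule(
        "identificationInfo.MD_DataIdentification.citation.CI_Citation.title",
        str.strip,
    ),
    "idinfo.descript.abstract": Rule(
        "identificationInfo.MD_DataIdentification.abstract"
    ),
}


def apply_registry(
    flat: Dict[str, List[str]], registry: Dict[str, Rule]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (ISO path -> values, sorted list of FGDC paths with no rule)."""
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for fgdc_path, values in flat.items():
        rule = registry.get(fgdc_path)
        if rule is None:
            unmapped.append(fgdc_path)
            continue
        converted = [rule.convert(v) if rule.convert else v for v in values]
        mapped.setdefault(rule.iso_path, []).extend(converted)
    return mapped, sorted(unmapped)
```

Returning the unmapped paths alongside the result makes gaps in the crosswalk visible in logs instead of silently dropping source fields.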
4. Enrich & Default Population
ISO 19115 mandates several fields that FGDC either omits or treats as optional. Populate mandatory ISO elements with deterministic defaults when source data is absent:
- MD_Metadata/fileIdentifier: Generate a UUID v4 or derive from dataset hash
- language & characterSet: Default to eng and utf8
- hierarchyLevel: Set to dataset unless FGDC spdom indicates series or feature
- contact: Map FGDC ptcontac to MD_Metadata/contact/CI_ResponsibleParty with role pointOfContact
Missing CRS definitions should trigger a fallback to WGS 84 (EPSG:4326) with an explicit warning logged for manual review.
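The defaulting rules can be expressed as a single pure function, sketched below. The flat record keys (fileIdentifier, referenceSystem, reviewFlags), the logger name, and the choice of a name-based UUID over a SHA-256 digest as the "derived from hash" identifier are all assumptions for illustration.

```python
import hashlib
import logging
import uuid
from typing import Optional

logger = logging.getLogger("fgdc2iso")  # hypothetical logger name

DEFAULTS = {"language": "eng", "characterSet": "utf8", "hierarchyLevel": "dataset"}
FALLBACK_CRS = "EPSG:4326"


def enrich(record: dict, source_bytes: Optional[bytes] = None) -> dict:
    """Return a copy of the record with deterministic defaults filled in."""
    enriched = dict(record)
    for key, value in DEFAULTS.items():
        if not enriched.get(key):
            enriched[key] = value
    if not enriched.get("fileIdentifier"):
        if source_bytes is not None:
            # Same input bytes always give the same identifier.
            digest = hashlib.sha256(source_bytes).hexdigest()
            enriched["fileIdentifier"] = str(
                uuid.uuid5(uuid.NAMESPACE_URL, f"urn:sha256:{digest}")
            )
        else:
            enriched["fileIdentifier"] = str(uuid.uuid4())
    if not enriched.get("referenceSystem"):
        enriched["referenceSystem"] = FALLBACK_CRS
        flags = list(enriched.get("reviewFlags", []))
        flags.append("crs-fallback")
        enriched["reviewFlags"] = flags
        logger.warning(
            "No CRS in source for %s; defaulted to %s",
            enriched["fileIdentifier"],
            FALLBACK_CRS,
        )
    return enriched
```

Deriving the identifier from the source bytes keeps re-runs stable, whereas a random UUID v4 would produce a new identifier on every conversion of the same file.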
5. Serialize & Validate
Generate ISO-compliant XML using lxml.builder or xmlschema serialization. Validate the output against the official 19139 XSD before committing to storage. Capture structural warnings, missing mandatory nodes, and type mismatches in a structured log (JSON or CSV). For detailed schema conformance checks, see Validating FGDC metadata against XML schemas.
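As a hedged illustration of this step, the sketch below builds a small namespaced fragment and collects schema errors as dictionaries ready for a JSON log. It uses plain lxml.etree calls and lxml's built-in etree.XMLSchema validator as a stand-in for xmlschema; the helper names are hypothetical, and the fragment is far from a complete ISO 19139 record.

```python
from typing import Dict, List

from lxml import etree

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NSMAP = {"gmd": GMD, "gco": GCO}


def _char_string(parent: etree._Element, tag: str, text: str) -> etree._Element:
    """Append <gmd:tag><gco:CharacterString>text</...></...> to parent."""
    wrapper = etree.SubElement(parent, f"{{{GMD}}}{tag}")
    value = etree.SubElement(wrapper, f"{{{GCO}}}CharacterString")
    value.text = text
    return wrapper


def build_iso_skeleton(file_identifier: str, title: str) -> etree._Element:
    """Build a partial MD_Metadata tree holding an identifier and a title."""
    root = etree.Element(f"{{{GMD}}}MD_Metadata", nsmap=NSMAP)
    _char_string(root, "fileIdentifier", file_identifier)
    ident = etree.SubElement(root, f"{{{GMD}}}identificationInfo")
    data_ident = etree.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
    citation = etree.SubElement(data_ident, f"{{{GMD}}}citation")
    ci_citation = etree.SubElement(citation, f"{{{GMD}}}CI_Citation")
    _char_string(ci_citation, "title", title)
    return root


def collect_schema_errors(doc: etree._Element, schema: etree.XMLSchema) -> List[Dict]:
    """Validate doc and return one dict per error, suitable for a JSON log."""
    if schema.validate(doc):
        return []
    return [
        {"line": entry.line, "level": entry.level_name, "message": entry.message}
        for entry in schema.error_log
    ]
```

In a real run the schema object would be created once from the locally stored 19139 XSD set and reused across records, since compiling the schema is much more expensive than validating a single document.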
6. Export & Route
Write validated XML to a staging directory with atomic file operations (write to .tmp, rename on success). Trigger downstream routing via webhooks or message queues. If your architecture requires spatial profile alignment for open data portals, integrate with DCAT-AP Spatial Profile Mapping to ensure cross-walk compatibility with European and federal open data frameworks.
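The write-then-rename pattern could be implemented roughly as follows. The function name write_atomic is hypothetical, and the sketch assumes the temporary file and the target live on the same filesystem, which is what makes the final rename atomic.

```python
import os
import tempfile
from pathlib import Path


def write_atomic(target: Path, payload: bytes) -> None:
    """Write payload to a .tmp file beside target, then rename it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # make sure bytes reach disk before renaming
        os.replace(tmp_name, target)   # replaces any existing file in one step
    except Exception:
        # Leave no partial file behind if the write or rename fails.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Downstream consumers watching the staging directory then only ever see complete files, never a half-written record.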
Handling Schema Divergence & Edge Cases
Legacy FGDC metadata frequently contains structural anomalies that break naive converters. Address these proactively:
- Bounding Box vs. Polygon: FGDC westbc, eastbc, northbc, southbc must be converted to ISO EX_GeographicBoundingBox. If FGDC provides spdom/dsgpoly with complex polygons, extract the ring coordinates and serialize them as EX_BoundingPolygon with gml:Polygon.
- Temporal Ambiguity: FGDC dates often lack precision (YYYY or YYYYMM). ISO 19115 requires full CI_Date objects. Implement a parser that defaults to YYYY-01-01 for year-only inputs and flags low-confidence dates.
- Contact Role Mapping: FGDC cntinfo lacks explicit ISO CI_RoleCode values. Map ptcontac → pointOfContact, origin → originator, and distrib → distributor. Unmapped roles should default to userDefined with a free-text note; because userDefined is not part of the standard CI_RoleCode codelist, treat it as a local extension.
- Cross-Reference Links: FGDC crossref elements map to ISO MD_AggregateInformation (MD_AssociatedResource in ISO 19115-1:2014) with associationType set to crossReference. Preserve original titles and URLs to maintain citation chains.
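Two of these edge cases, date precision and role mapping, lend themselves to small pure functions. The sketch below is one possible shape: the IsoDate tuple, the function names, and the FGDC element names used as role keys (ptcontac, origin, distrib) are assumptions about how the extractor labels its output.

```python
import re
from datetime import date
from typing import NamedTuple


class IsoDate(NamedTuple):
    value: str            # ISO 8601 calendar date
    precision: str        # "year", "month", or "day"
    low_confidence: bool  # True when month or day was defaulted


_FGDC_DATE = re.compile(r"^(\d{4})(\d{2})?(\d{2})?$")

ROLE_BY_FGDC_ELEMENT = {
    "ptcontac": "pointOfContact",
    "origin": "originator",
    "distrib": "distributor",
}


def parse_fgdc_date(raw: str) -> IsoDate:
    """Convert YYYY, YYYYMM, or YYYYMMDD to a full date, flagging defaults."""
    match = _FGDC_DATE.match(raw.strip().replace("-", ""))
    if not match:
        # Covers free-text values such as "unknown" or "present".
        raise ValueError(f"unrecognized FGDC date: {raw!r}")
    year, month, day = match.groups()
    # date() raises ValueError for impossible values such as month 13.
    parsed = date(int(year), int(month or 1), int(day or 1))
    precision = "day" if day else "month" if month else "year"
    return IsoDate(parsed.isoformat(), precision, precision != "day")


def map_role(fgdc_element: str) -> str:
    """Return the ISO role for an FGDC contact element, else the fallback."""
    return ROLE_BY_FGDC_ELEMENT.get(fgdc_element, "userDefined")
```

Carrying the precision and confidence flag alongside the value lets the structured log report which dates were padded, so reviewers can prioritize them.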
Validation, Linting & Continuous Integration
Production pipelines require automated quality gates. Integrate xmlschema validation into your CI/CD workflow to block non-compliant outputs. Implement a lightweight linter that checks:
- Mandatory ISO 19115-1/2 elements
- Controlled vocabulary compliance (e.g., MD_CharacterSetCode, MD_ScopeCode)
- Coordinate system validity, by constructing the CRS with pyproj.CRS.from_user_input() and treating a pyproj.exceptions.CRSError as a failure
- Date format conformance to ISO 8601
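A lightweight linter along these lines might operate on a flat summary of each converted record, as in the sketch below. The field names, the list of required fields (taken from the defaults discussed in step 4), and the abbreviated codelist subsets are illustrative assumptions; the CRS check is left out because it depends on pyproj.

```python
import re
from datetime import date
from typing import Dict, List

# Hypothetical project policy, not the full ISO 19115 mandatory set.
REQUIRED_FIELDS = ("fileIdentifier", "language", "characterSet", "hierarchyLevel", "contact")

# Abbreviated subsets of the ISO codelists, for illustration only.
CODELISTS = {
    "characterSet": {"utf8", "utf16", "ucs2", "ucs4", "8859part1", "usAscii"},
    "hierarchyLevel": {"dataset", "series", "feature", "service", "tile"},
}
DATE_FIELDS = ("dateStamp", "publicationDate")
_ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def _is_iso_date(value: str) -> bool:
    """Accept only extended-format calendar dates that actually exist."""
    if not _ISO_DATE.match(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def lint_record(record: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems: List[str] = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing mandatory field: {field}")
    for field, allowed in CODELISTS.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field}: {value!r} is not in the codelist")
    for field in DATE_FIELDS:
        value = record.get(field)
        if value and not _is_iso_date(value):
            problems.append(f"{field}: {value!r} is not an ISO 8601 date")
    return problems
```

In CI, a non-empty result from lint_record would fail the job, with the problem strings written to the build log.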
Pair validation with a templating engine to standardize boilerplate across datasets. When building reusable scaffolds for new projects, reference ISO 19115 Metadata Template Generation to enforce consistent header structures, contact blocks, and licensing statements.
Performance & Scaling Considerations
For agency-scale deployments processing thousands of FGDC records, optimize the pipeline for throughput and memory efficiency:
- Streaming Parsing: Use lxml.etree.iterparse() to avoid loading multi-megabyte XML files entirely into memory.
- Connection Pooling: If routing to remote catalogs (GeoNetwork, CKAN, ArcGIS Enterprise), batch API calls and implement exponential backoff.
- Idempotency: Design the pipeline to be re-runnable without duplicating records. Hash input FGDC files and skip processing if the target ISO output already matches the expected checksum.
- Parallel Execution: Use concurrent.futures.ProcessPoolExecutor for CPU-bound mapping stages, keeping I/O operations in a separate thread pool.
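The idempotency point can be illustrated with a small checksum manifest. In this sketch the manifest is a JSON file mapping source file names to SHA-256 digests of the input, and the output naming scheme (.iso.xml suffix) is invented for the example; tracking the input hash rather than the output checksum is a simplification.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large inputs are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pending_sources(
    sources: Iterable[Path], output_dir: Path, manifest: Dict[str, str]
) -> List[Path]:
    """Return the sources that are new, changed, or missing their output."""
    pending = []
    for source in sources:
        target = output_dir / f"{source.stem}.iso.xml"
        unchanged = manifest.get(source.name) == file_sha256(source)
        if not (unchanged and target.exists()):
            pending.append(source)
    return pending


def mark_done(source: Path, manifest: Dict[str, str], manifest_path: Path) -> None:
    """Record the source hash after a successful conversion."""
    manifest[source.name] = file_sha256(source)
    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )
```

The list returned by pending_sources is what would be handed to the process pool, so unchanged records cost one hash computation and nothing more.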
Conclusion & Next Steps
Implementing a structured FGDC to ISO 19115 conversion pipeline eliminates manual translation bottlenecks, enforces schema compliance, and future-proofs geospatial assets for modern cataloging ecosystems. By isolating ingestion, mapping, enrichment, and validation into discrete, testable stages, teams can scale metadata operations across legacy archives and real-time data feeds.
Next steps include integrating automated linting into your CI/CD pipeline, establishing cross-walks for downstream spatial profiles, and deploying monitoring dashboards to track conversion success rates and schema drift over time.