Integrating PyLint with spatial metadata validators

Integrating PyLint with spatial metadata validators requires building a custom PyLint checker plugin that runs geospatial validation logic and maps failures onto PyLint’s message reporting system. PyLint analyzes Python source through astroid’s abstract syntax tree (AST), so it does not understand XML, JSON, or YAML metadata out of the box. You bridge this gap by subclassing pylint.checkers.BaseRawFileChecker, intercepting target file paths, and routing them to dedicated spatial parsers. The result is a single linting pass that can enforce ISO 19115/19139 completeness, INSPIRE licensing declarations, and coordinate reference system (CRS) rules while returning PyLint’s standard exit status to automated CI pipelines.

Architecture & Message Mapping Strategy

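PyLint’s plugin architecture is extensible, but spatial metadata lives outside Python’s AST. To bring it into a PyLint run, your custom checker must:

  • Register message IDs as a category letter plus four digits, where the first two digits are a checker ID shared by every message in the checker; PyLint reserves checker IDs 51 to 99 for external plugins (this guide uses 90)
  • Subclass BaseRawFileChecker, whose process_module hook receives the astroid module node for each file and lets you read the file’s raw contents instead of walking the AST
  • Override process_module to capture the raw file path via node.file
  • Parse the file with json or xml.etree.ElementTree from the standard library, or with the third-party PyYAML package
  • Emit structured warnings or errors via self.add_message()
  • Expose a module-level register(linter) function so load-plugins can attach the checker to the linter

Note that PyLint only calls process_module for files that astroid can parse as Python. JSON usually qualifies because a JSON object is also a syntactically valid Python dict literal, but XML and most YAML files fail to parse before your checker runs, so this approach works best for JSON and GeoJSON metadata.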
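PyLint message IDs follow a convention that is easy to get wrong: one category letter (C, R, W, E, F, or I), then four digits whose first two form a checker ID shared by all of a checker’s messages, with 51 to 99 reserved for plugins. The helper below is a hypothetical standalone self-check (check_msg_ids is not a PyLint API). As far as we know, PyLint itself only rejects mixed checker IDs when a checker is registered; the 51 to 99 range is a convention this sketch checks on top.

```python
import re

# Category letter + 2-digit checker ID + 2-digit message number.
MSG_ID_PATTERN = re.compile(r"^[CRWEFI](\d{2})(\d{2})$")


def check_msg_ids(msgs):
    """Return a list of problems with a checker's message IDs (empty if none)."""
    problems = []
    checker_ids = set()
    for msg_id in msgs:
        match = MSG_ID_PATTERN.match(msg_id)
        if match is None:
            problems.append(f"{msg_id}: not a category letter plus four digits")
            continue
        checker_id = int(match.group(1))
        checker_ids.add(checker_id)
        # Plugin convention: checker IDs 51-99 are left free for external checkers.
        if not 51 <= checker_id <= 99:
            problems.append(f"{msg_id}: checker ID {checker_id:02d} is outside 51-99")
    # Every message in one checker should share the same two-digit checker ID.
    if len(checker_ids) > 1:
        problems.append(f"inconsistent checker IDs: {sorted(checker_ids)}")
    return problems


print(check_msg_ids({"W9001": (), "E9002": (), "E9003": (), "E9004": ()}))  # []
```

Running it against the MSGS dictionary in a unit test catches a typo such as E9102 before PyLint refuses to load the plugin.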

This architecture aligns with modern Spatial Data Schema Linting in CI practices, where metadata quality is treated like first-class code. Instead of maintaining fragmented XML schema checkers, JSON validators, and Python linters, you consolidate validation under a single toolchain. Teams can parse PyLint’s JSON output or rely on its exit status to gate deployments; that status is a bit mask (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error) rather than a single severity level, and 0 means a clean run. For deeper pipeline design patterns, see CI/CD Validation & Policy Enforcement for Spatial Data.

Step-by-Step Implementation

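  1. Create the plugin package: mkdir -p linters && touch linters/__init__.py
  2. Define the checker: Place the validation logic in a dedicated module such as linters/spatial_metadata_checker.py
  3. Register messages: Map each domain rule to an entry in the checker’s msgs dictionary
  4. Expose the plugin: Add a module-level register(linter) function that calls linter.register_checker()
  5. Configure .pylintrc: Load the plugin via load-plugins=linters.spatial_metadata_checker
  6. Execute: From the repository root, run PYTHONPATH=. pylint --rcfile=.pylintrc metadata.json so the local linters package is importable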
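The Execute step needs a metadata file to lint. The sketch below writes a hypothetical sample record that should satisfy every rule in the checker that follows; write_sample is a made-up helper name and the field values are placeholders, not an authoritative metadata profile.

```python
import json
from pathlib import Path

# Hypothetical sample record. Keys match the checker's REQUIRED_FIELDS;
# values are illustrative placeholders only.
SAMPLE_METADATA = {
    "title": "Example river network",
    "description": "Placeholder record used to smoke-test the spatial metadata checker.",
    "extent": [-10.5, 49.8, 2.1, 59.4],
    "crs": "EPSG:4326",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def write_sample(path="metadata.json"):
    """Write the sample record as pretty-printed UTF-8 JSON and return its Path."""
    target = Path(path)
    target.write_text(json.dumps(SAMPLE_METADATA, indent=2) + "\n", encoding="utf-8")
    return target
```

Deleting a key or changing "crs" to "4326" in the generated file is a quick way to confirm the checker reports the matching message.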

Below is a minimal custom PyLint checker that validates JSON/GeoJSON metadata files for mandatory spatial fields, license identifiers, and CRS declarations.

# linters/spatial_metadata_checker.py
import json
import re

from pylint.checkers import BaseRawFileChecker

MSGS = {
    "W9001": (
        "Missing required spatial metadata field: %s",
        "missing-spatial-field",
        "A mandatory spatial metadata key is absent."
    ),
    "E9002": (
        "Invalid or missing license identifier: %s",
        "invalid-license-id",
        "License must be an http(s) URI, an 'SPDX:'-prefixed identifier, or a 'CC-' identifier."
    ),
    "E9003": (
        "CRS declaration does not match EPSG standard format",
        "invalid-crs-format",
        "CRS must be specified as 'EPSG:XXXX' or 'urn:ogc:def:crs:EPSG::XXXX'."
    ),
    "E9004": (
        "Cannot read or parse metadata file: %s",
        "unparseable-metadata-file",
        "The metadata file could not be opened, is not valid JSON, or is not a JSON object."
    ),
}

REQUIRED_FIELDS = ["title", "description", "extent", "crs", "license"]
VALID_CRS_PATTERN = re.compile(r"^(EPSG:\d{4,}|urn:ogc:def:crs:EPSG::\d{4,})$")
LICENSE_PREFIXES = ("http://", "https://", "SPDX:", "CC-")


class SpatialMetadataChecker(BaseRawFileChecker):
    """Custom PyLint checker for spatial metadata files (JSON/GeoJSON)."""
    name = "spatial-metadata"
    msgs = MSGS
    priority = -1  # Run after core checks

    def process_module(self, node):
        # node is the astroid Module PyLint built for this file; node.file is its path.
        file_path = getattr(node, "file", None) or ""
        if not file_path.endswith((".json", ".geojson")):
            return  # Leave ordinary Python modules to the other checkers.

        try:
            with open(file_path, "r", encoding="utf-8") as f:
                metadata = json.load(f)
        except (ValueError, OSError) as exc:
            # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
            self.add_message("E9004", args=str(exc), line=1)
            return

        # The lookups below assume a top-level JSON object.
        if not isinstance(metadata, dict):
            self.add_message("E9004", args="top-level JSON value is not an object", line=1)
            return

        # Check required fields
        for field in REQUIRED_FIELDS:
            if field not in metadata:
                self.add_message("W9001", args=field, line=1)

        # Validate license ("".startswith(...) is False, so a missing value is flagged too)
        license_val = str(metadata.get("license", ""))
        if not license_val.startswith(LICENSE_PREFIXES):
            self.add_message("E9002", args=license_val or "empty", line=1)

        # Validate CRS
        crs_val = str(metadata.get("crs", ""))
        if not VALID_CRS_PATTERN.match(crs_val):
            self.add_message("E9003", line=1)


def register(linter):
    """Entry point PyLint calls for each module listed in load-plugins."""
    linter.register_checker(SpatialMetadataChecker(linter))
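Keeping the rules in a plain function makes them unit-testable without running PyLint. The sketch below applies the same rules as the checker above, plus a guard for non-object JSON; collect_violations is an illustrative name, and the (msgid, arg) return shape is an assumption chosen so process_module could loop over the result and call self.add_message(msgid, args=arg, line=1) for each pair.

```python
import re

# Same constants as the checker above, repeated so this sketch runs on its own.
REQUIRED_FIELDS = ["title", "description", "extent", "crs", "license"]
VALID_CRS_PATTERN = re.compile(r"^(EPSG:\d{4,}|urn:ogc:def:crs:EPSG::\d{4,})$")
LICENSE_PREFIXES = ("http://", "https://", "SPDX:", "CC-")


def collect_violations(metadata):
    """Return (msgid, arg) pairs for every rule the metadata object breaks."""
    if not isinstance(metadata, dict):
        return [("E9004", "top-level JSON value is not an object")]
    violations = [("W9001", field) for field in REQUIRED_FIELDS if field not in metadata]
    # An empty string fails startswith(), so a missing license is flagged here too.
    license_val = str(metadata.get("license", ""))
    if not license_val.startswith(LICENSE_PREFIXES):
        violations.append(("E9002", license_val or "empty"))
    if not VALID_CRS_PATTERN.match(str(metadata.get("crs", ""))):
        violations.append(("E9003", None))
    return violations


sample = {"title": "t", "description": "d", "extent": [0, 0, 1, 1], "crs": "4326", "license": "MIT"}
print(collect_violations(sample))  # [('E9002', 'MIT'), ('E9003', None)]
```

The example also shows a limitation of the prefix rule as written: a bare SPDX identifier such as MIT is rejected unless it carries the SPDX: prefix.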

CI/CD Pipeline Integration

Once the plugin is registered, wire it into your pipeline with PyLint’s --output-format=json flag. This produces a machine-readable report that CI systems such as GitHub Actions, GitLab CI, and Jenkins can consume, usually through a small conversion step that turns each message into an inline annotation.
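As an illustration of that conversion step, the sketch below turns entries from PyLint’s json reporter into GitHub Actions workflow commands (::error and ::warning lines). It assumes each entry carries the type, path, line, symbol, and message keys the json reporter emits; the gate and main names, and the choice to block only on error and fatal types, are assumptions rather than an existing tool.

```python
import json

# PyLint message types that should fail the job; everything else becomes a warning.
BLOCKING_TYPES = {"error", "fatal"}


def escape_data(text):
    """Escape characters GitHub Actions treats specially in workflow-command data."""
    # Replace "%" first so the escapes added afterwards are not re-escaped.
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def gate(messages):
    """Convert json-reporter entries into annotation lines plus a pass/fail flag."""
    annotations = []
    failed = False
    for msg in messages:
        level = "error" if msg["type"] in BLOCKING_TYPES else "warning"
        if level == "error":
            failed = True
        annotations.append(
            f"::{level} file={msg['path']},line={msg['line']}::"
            + escape_data(f"{msg['symbol']}: {msg['message']}")
        )
    return annotations, failed


def main(report_path="pylint-report.json"):
    """Print annotations for a saved report and return a shell exit status."""
    with open(report_path, encoding="utf-8") as f:
        annotations, failed = gate(json.load(f))
    for line in annotations:
        print(line)
    return 1 if failed else 0
```

A CI step could save this as a small script ending in sys.exit(main()) and run it after PyLint writes pylint-report.json, so warnings surface inline while only errors block the merge.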

.pylintrc configuration (list disable=all before enable, because PyLint applies these options in the order they appear):

[MASTER]
load-plugins=linters.spatial_metadata_checker

[MESSAGES CONTROL]
disable=all
enable=W9001,E9002,E9003,E9004

GitHub Actions example:

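- name: Lint Spatial Metadata
  run: |
    rc=0
    PYTHONPATH=. pylint --rcfile=.pylintrc --output-format=json data/metadata/*.json > pylint-report.json || rc=$?
    echo "Exit code: $rc"
    # PyLint's exit status is a bit mask: fail on fatal (1), error (2) or usage error (32).
    if (( rc & 35 )); then exit 1; fi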

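PyLint’s exit status is a bit mask, not a fixed severity code: 1 = fatal, 2 = error, 4 = warning, 8 = refactor, 16 = convention, 32 = usage error, combined with bitwise OR. A run in which the checker only emits missing-spatial-field therefore exits with 4, while any E900x message sets the 2 bit. Gate on bits rather than exact values, for example by failing when rc & 35 is non-zero (fatal, error, or usage error), to block merges with invalid CRS or missing licensing fields. For authoritative guidance on extending PyLint’s lifecycle hooks, consult the official plugin documentation.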
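Decoding the bit mask in a helper script avoids the mistakes that fixed comparisons invite. A minimal sketch, assuming PyLint’s documented bit values (1 fatal, 2 error, 4 warning, 8 refactor, 16 convention, 32 usage error); decode_exit_status and should_block are hypothetical names.

```python
# Bit values of PyLint's exit status.
EXIT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}
# Categories that should stop a merge; warnings and style messages do not.
BLOCKING = {"fatal", "error", "usage error"}


def decode_exit_status(rc):
    """Return the categories whose bits are set in a PyLint exit status."""
    return [name for bit, name in EXIT_BITS.items() if rc & bit]


def should_block(rc):
    """True when the run hit a fatal or error message, or a usage error."""
    return any(name in BLOCKING for name in decode_exit_status(rc))


print(decode_exit_status(6))  # ['error', 'warning']
print(should_block(4))        # False: warnings only
```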

Validation Rules & Compliance Scope

Spatial metadata validators must enforce domain-specific constraints that generic JSON schema tools miss. Focus your checker on:

  • ISO 19115/19139 completeness: Mandatory identification, lineage, and distribution elements
  • CRS normalization: Reject ambiguous strings like WGS84 or 4326 without prefixes; enforce EPSG:4326 or urn:ogc:def:crs:EPSG::4326
  • Licensing compliance: Require SPDX identifiers or OGC-compliant URIs for open data mandates
  • Bounding box validity: Ensure extent arrays contain exactly four numeric values with min < max
  • Temporal coverage: Validate ISO 8601 date ranges when temporal keys are present
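The bounding-box and temporal rules can be written as small pure functions before being wired into the checker. This sketch assumes extent is ordered [min_x, min_y, max_x, max_y] and that temporal coverage lives under hypothetical start and end keys holding calendar dates; on Python versions before 3.11, date.fromisoformat accepts only the YYYY-MM-DD form.

```python
import math
from datetime import date


def validate_extent(extent):
    """Check an extent assumed to be ordered [min_x, min_y, max_x, max_y]."""
    if not isinstance(extent, (list, tuple)) or len(extent) != 4:
        return ["extent must contain exactly four values"]
    # bool is a subclass of int, so exclude it explicitly.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in extent):
        return ["extent values must be numeric"]
    if not all(math.isfinite(v) for v in extent):
        return ["extent values must be finite"]
    min_x, min_y, max_x, max_y = extent
    problems = []
    if not min_x < max_x:
        problems.append("extent min_x must be less than max_x")
    if not min_y < max_y:
        problems.append("extent min_y must be less than max_y")
    return problems


def validate_temporal(temporal):
    """Check a hypothetical {"start": "YYYY-MM-DD", "end": "YYYY-MM-DD"} range."""
    try:
        start = date.fromisoformat(temporal["start"])
        end = date.fromisoformat(temporal["end"])
    except (KeyError, TypeError, ValueError) as exc:
        return [f"temporal coverage is not a valid ISO 8601 date range: {exc}"]
    if start > end:
        return ["temporal start must not be after end"]
    return []
```

Each returned string could be passed as the args of a new plugin message, following the same pattern as W9001.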

Government and agency teams benefit most from this unified approach because it eliminates context switching between XML validators, Python linters, and custom shell scripts. By treating metadata as code, you gain version-controlled rule definitions, reproducible validation, and audit-ready pipeline logs. For formal metadata specifications, reference the ISO 19115 standard.

Troubleshooting & Best Practices

  • Plugin not loading: Verify that load-plugins points to an importable Python module path; running from the repository root with PYTHONPATH=. makes the local linters package importable. Run pylint --rcfile=.pylintrc --list-msgs-enabled to confirm the spatial messages are registered.
  • False positives on non-metadata JSON: Add a namespace check or require a spatial_metadata_version key before running validation rules.
  • Line number mapping: The checker above reports every message on line=1. For precise locations, read the raw text (astroid’s node.stream() returns a binary stream of the file) and compute the line on which the offending key appears.
  • Performance: Cache parsed metadata in CI runners. PyLint lints files in a single process by default; parallelize large catalogs with pylint --jobs=4, or --jobs=0 to use every available CPU.
  • Rule versioning: Store validation thresholds in a spatial_rules.yaml file and load them dynamically. This keeps your checker stateless and simplifies compliance audits.
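The namespace gate and line-number mapping from the list above can be sketched as two helpers. The spatial_metadata_version marker is the hypothetical opt-in key from the false-positives bullet, and key_line is a textual approximation rather than a JSON-aware locator: it returns the line of the first matching key anywhere in the file, including nested objects, and falls back to line 1.

```python
import re

MARKER_KEY = "spatial_metadata_version"  # hypothetical opt-in key


def is_spatial_metadata(metadata):
    """Only validate JSON objects that opt in via the marker key."""
    return isinstance(metadata, dict) and MARKER_KEY in metadata


def key_line(text, key):
    """Return the 1-based line where '"key":' first appears, or 1 if absent."""
    pattern = re.compile(r'"' + re.escape(key) + r'"\s*:')
    match = pattern.search(text)
    if match is None:
        return 1
    # Count newlines before the match to turn a character offset into a line.
    return text.count("\n", 0, match.start()) + 1
```

Inside process_module, the text could be read once (for example with node.stream() and .decode("utf-8")), and key_line(text, "crs") passed as the line argument of self.add_message for E9003.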

By embedding spatial validation directly into your Python quality gates, you enforce data governance without fragmenting your developer workflow. The plugin scales with your catalog, integrates seamlessly with existing CI runners, and produces standardized outputs that satisfy both engineering and compliance teams.