Integrating PyLint with spatial metadata validators
Integrating PyLint with spatial metadata validators requires building a custom PyLint checker plugin that executes geospatial validation logic and maps failures directly to PyLint’s message reporting system. Because PyLint natively analyzes Python abstract syntax trees (AST), it does not parse XML, JSON, or YAML out of the box. You bridge this gap by subclassing pylint.checkers.BaseRawFileChecker, intercepting target file paths, and routing them to dedicated spatial parsers. The result is a unified linting pass that enforces ISO 19115/19139 completeness, INSPIRE licensing declarations, and coordinate reference system (CRS) validation rules while returning standardized exit codes for automated CI pipelines.
Architecture & Message Mapping Strategy
PyLint’s plugin architecture is highly extensible, but spatial metadata lives outside Python’s AST. To make them compatible, your custom checker must:
- Register user-defined message IDs in the W9000–E9999 range
- Subclass BaseRawFileChecker to read raw files instead of the Python AST
- Override process_module to capture the raw file path via node.file
- Parse the file using standard-library modules (json, xml.etree) or, for YAML, a third-party parser such as PyYAML
- Emit structured warnings or errors via self.add_message()
This architecture aligns with modern Spatial Data Schema Linting in CI practices, where metadata quality is treated as first-class code. Instead of maintaining fragmented XML schema checkers, JSON validators, and Python linters, you consolidate validation under a single toolchain. Teams can gate deployments by parsing PyLint’s JSON output or by inspecting its exit status, which is a bit mask rather than a simple severity level: 0 means no messages, 1 a fatal message, 2 an error, 4 a warning, 8 a refactor message, 16 a convention message, and 32 a usage error, with the bits combined when several categories occur. For deeper pipeline design patterns, see CI/CD Validation & Policy Enforcement for Spatial Data.
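As a rough illustration of gating on the JSON output, the sketch below counts messages by type and decides whether a run should block a deployment. The function names and the sample report are hypothetical; the only assumption about PyLint itself is that its JSON reporter emits a list of objects with a "type" field such as "warning" or "error".

```python
import json
from collections import Counter


def summarize_report(report_text):
    """Count messages in a Pylint JSON report by their "type" field."""
    messages = json.loads(report_text)
    return Counter(msg["type"] for msg in messages)


def should_block(report_text, blocking_types=("fatal", "error")):
    """Return True if the report contains any message of a blocking type."""
    counts = summarize_report(report_text)
    return any(counts[msg_type] > 0 for msg_type in blocking_types)


# Hypothetical report content, shaped like Pylint's JSON reporter output.
SAMPLE_REPORT = json.dumps([
    {
        "type": "warning",
        "message-id": "W9001",
        "symbol": "missing-spatial-field",
        "path": "data/metadata/roads.json",
        "line": 1,
        "message": "Missing required spatial metadata field: extent",
    },
    {
        "type": "error",
        "message-id": "E9003",
        "symbol": "invalid-crs-format",
        "path": "data/metadata/roads.json",
        "line": 1,
        "message": "CRS declaration does not match EPSG standard format",
    },
])

if __name__ == "__main__":
    print(dict(summarize_report(SAMPLE_REPORT)))
    print("block deployment:", should_block(SAMPLE_REPORT))
```

Filtering on message type rather than on raw counts lets a team tolerate warnings such as W9001 during a migration while still blocking on errors.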
Step-by-Step Implementation
- Create the plugin directory: mkdir -p linters && touch linters/__init__.py
- Define the checker: Place validation logic in a dedicated module
- Register messages: Map domain rules to PyLint’s MSGS dictionary
- Configure .pylintrc: Load the plugin via load-plugins=linters.spatial_metadata_checker
- Execute: Run pylint --rcfile=.pylintrc metadata.json
Below is a production-ready custom PyLint checker that validates GeoJSON/JSON metadata files for mandatory spatial fields, license compliance, and CRS declarations.
# linters/spatial_metadata_checker.py
import json
import re

from pylint.checkers import BaseRawFileChecker

# Message ID -> (message template, symbolic name, description)
MSGS = {
    "W9001": (
        "Missing required spatial metadata field: %s",
        "missing-spatial-field",
        "A mandatory spatial metadata key is absent.",
    ),
    "E9002": (
        "Invalid or missing license identifier: %s",
        "invalid-license-id",
        "License field must contain a valid SPDX, OGC, or Creative Commons URI.",
    ),
    "E9003": (
        "CRS declaration does not match EPSG standard format",
        "invalid-crs-format",
        "CRS must be specified as 'EPSG:XXXX' or a valid OGC URN.",
    ),
    "E9004": (
        "Cannot read or parse metadata file: %s",
        "unparseable-metadata-file",
        "The metadata file could not be opened or is not valid JSON.",
    ),
}

REQUIRED_FIELDS = ["title", "description", "extent", "crs", "license"]
VALID_CRS_PATTERN = re.compile(r"^(EPSG:\d{4,}|urn:ogc:def:crs:EPSG::\d{4,})$")


class SpatialMetadataChecker(BaseRawFileChecker):
    """Custom PyLint checker for spatial metadata files (JSON/GeoJSON)."""

    name = "spatial-metadata"
    msgs = MSGS
    priority = -1  # Run after core checks

    def process_module(self, node):
        file_path = getattr(node, "file", "") or ""
        if not file_path.endswith((".json", ".geojson")):
            return

        try:
            with open(file_path, "r", encoding="utf-8") as f:
                metadata = json.load(f)
        except (json.JSONDecodeError, OSError) as exc:
            self.add_message("E9004", args=str(exc), line=1)
            return

        # The rules below assume a JSON object at the top level
        if not isinstance(metadata, dict):
            self.add_message("E9004", args="top-level value is not an object", line=1)
            return

        # Check required fields
        for field in REQUIRED_FIELDS:
            if field not in metadata:
                self.add_message("W9001", args=field, line=1)

        # Validate license
        license_val = str(metadata.get("license", ""))
        if not license_val.startswith(("http://", "https://", "SPDX:", "CC-")):
            self.add_message("E9002", args=license_val or "empty", line=1)

        # Validate CRS
        crs_val = str(metadata.get("crs", ""))
        if not VALID_CRS_PATTERN.match(crs_val):
            self.add_message("E9003", line=1)


def register(linter):
    """Entry point that PyLint calls for every module listed in load-plugins."""
    linter.register_checker(SpatialMetadataChecker(linter))
CI/CD Pipeline Integration
Once the plugin is registered, wire it into your pipeline using PyLint’s --output-format=json flag. This produces machine-readable output that GitHub, GitLab, and Jenkins pipelines can turn into inline annotations, typically through a converter step or reporting plugin.
.pylintrc configuration:
[MASTER]
load-plugins=linters.spatial_metadata_checker
extension-pkg-whitelist=json,xml
[MESSAGES CONTROL]
disable=all
enable=W9001,E9002,E9003,E9004
GitHub Actions example:
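- name: Lint Spatial Metadata
  run: |
    status=0
    pylint --rcfile=.pylintrc --output-format=json data/metadata/*.json > pylint-report.json || status=$?
    echo "Exit code: $status"
    # Pylint's exit status is a bit mask: 1 = fatal, 2 = error
    if (( status & 3 )); then exit 1; fi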
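PyLint returns 0 for clean runs. Any other exit status is a bit mask: 1 for a fatal message, 2 for an error, 4 for a warning, 8 for a refactor message, 16 for a convention message, and 32 for a usage error, so a run that produces both errors and warnings exits with 6. Configure your pipeline to fail when the error bit (2) or the fatal bit (1) is set, rather than comparing against a threshold, to block merges with invalid CRS or missing licensing fields. For authoritative guidance on extending PyLint’s lifecycle hooks, consult the official plugin documentation.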
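For pipelines that post-process the result in Python rather than in shell, the exit-status check might look like the sketch below. The helper names and the default blocking mask are assumptions made for illustration; the bit values are the ones PyLint documents for its exit status.

```python
# Bit values documented for Pylint's exit status.
PYLINT_STATUS_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_exit_status(status):
    """Return the names of all message categories encoded in the status."""
    return [name for bit, name in PYLINT_STATUS_BITS.items() if status & bit]


def should_fail(status, blocking_mask=1 | 2 | 32):
    """Return True when any blocking bit (fatal, error, usage error) is set."""
    return bool(status & blocking_mask)


if __name__ == "__main__":
    for status in (0, 4, 6):
        print(status, decode_exit_status(status), should_fail(status))
```

Testing individual bits avoids a subtle trap: a run with warnings only exits with 4, which a naive "greater than or equal to 2" comparison would treat as an error.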
Validation Rules & Compliance Scope
Spatial metadata validators must enforce domain-specific constraints that generic JSON schema tools miss. Focus your checker on:
- ISO 19115/19139 completeness: Mandatory identification, lineage, and distribution elements
- CRS normalization: Reject ambiguous strings like WGS84 or 4326 without prefixes; enforce EPSG:4326 or urn:ogc:def:crs:EPSG::4326
- Licensing compliance: Require SPDX identifiers or OGC-compliant URIs for open data mandates
- Bounding box validity: Ensure extent arrays contain exactly four numeric values with min < max
- Temporal coverage: Validate ISO 8601 date ranges when temporal keys are present
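The bounding box and temporal rules in the list above are not covered by the example checker. A minimal sketch of how they could be expressed as plain functions follows; the function names, the [min_x, min_y, max_x, max_y] ordering, and the restriction to YYYY-MM-DD dates are assumptions for illustration, not part of any standard API.

```python
from datetime import date
from numbers import Real


def validate_extent(extent):
    """Return a list of problems for an assumed [min_x, min_y, max_x, max_y] extent."""
    if not isinstance(extent, (list, tuple)) or len(extent) != 4:
        return ["extent must contain exactly four values"]
    # bool is a subclass of int, so exclude it explicitly.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in extent):
        return ["extent values must be numeric"]
    min_x, min_y, max_x, max_y = extent
    problems = []
    if not min_x < max_x:
        problems.append("min_x must be less than max_x")
    if not min_y < max_y:
        problems.append("min_y must be less than max_y")
    return problems


def validate_temporal(start, end):
    """Return a list of problems for a start/end pair of YYYY-MM-DD strings."""
    try:
        start_date = date.fromisoformat(start)
        end_date = date.fromisoformat(end)
    except (TypeError, ValueError):
        return ["temporal bounds must be ISO 8601 dates (YYYY-MM-DD)"]
    if start_date > end_date:
        return ["temporal start must not be after temporal end"]
    return []


if __name__ == "__main__":
    print(validate_extent([-180, -90, 180, 90]))
    print(validate_extent([10, 0, 5, 1]))
    print(validate_temporal("2021-01-01", "2020-01-01"))
```

Each function returns a list of problem strings so a checker can emit one message per problem. Note that a strict min < max rule rejects bounding boxes that cross the antimeridian, which some formats encode with a western bound greater than the eastern one.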
Government and agency teams benefit most from this unified approach because it eliminates context switching between XML validators, Python linters, and custom shell scripts. By treating metadata as code, you gain version-controlled rule definitions, reproducible validation, and audit-ready pipeline logs. For formal metadata specifications, reference the ISO 19115 standard.
Troubleshooting & Best Practices
- Plugin not loading: Verify load-plugins points to an importable Python module path and that the module defines a register(linter) function. Run pylint --rcfile=.pylintrc --list-msgs and confirm that the custom message IDs appear in the output.
- False positives on non-metadata JSON: Add a namespace check or require a spatial_metadata_version key before running validation rules.
- Line number mapping: The example checker reports every message at line=1. If you need precise locations, read the raw text (for example through node.stream()), locate the offending key, and count the line breaks that precede it.
- Performance: Cache parsed metadata in CI runners. PyLint processes files sequentially by default; parallelize using pylint --jobs=4 for large catalogs.
- Rule versioning: Store validation thresholds in a spatial_rules.yaml file and load them dynamically. This keeps your checker stateless and simplifies compliance audits.
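For the line number mapping item, one simple approach is to search the raw text for the key and count the newlines before the match. The helper below is a hypothetical sketch of that idea; it finds the first textual occurrence of the key, so it can be fooled by nested objects that reuse a key name or by string values that happen to contain the same text.

```python
import re


def find_key_line(text, key):
    """Return the 1-based line of the first '"key":' occurrence, or 1 if absent."""
    match = re.search(r'"%s"\s*:' % re.escape(key), text)
    if match is None:
        return 1  # Fall back to the top of the file, as the example checker does.
    return text.count("\n", 0, match.start()) + 1


# Hypothetical metadata file content used only to exercise the helper.
SAMPLE = '{\n  "title": "Roads",\n  "crs": "4326"\n}\n'

if __name__ == "__main__":
    print(find_key_line(SAMPLE, "crs"))
    print(find_key_line(SAMPLE, "license"))
```

The returned value can be passed as the line argument to self.add_message() in place of the hard-coded 1.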
By embedding spatial validation directly into your Python quality gates, you enforce data governance without fragmenting your developer workflow. The plugin scales with your catalog, integrates seamlessly with existing CI runners, and produces standardized outputs that satisfy both engineering and compliance teams.