Blog

Insights on Spark cost optimization, PySpark anti-patterns, and production data engineering.

We Linted 5,046 PySpark Projects on GitHub. Here's What Static Analysis Can and Can't Tell You.

We scraped 5,046 PySpark repos, grouped them by code maturity, and scanned them for anti-patterns. The results split into two groups: patterns that decline with experience and patterns that concentrate in production code. We then show what becomes possible when static analysis is enriched with runtime context.