count-1: $188/month, 2 executions

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $132 | $57 |
| 2 | CY014 (linter) | WARN: Cache to eliminate 1 repeat run | $28 | $28 |
| | Total | | $160 | $28 |

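The Remaining column in the table above appears to be cumulative: each fix is subtracted from whatever is left after the previous fix. A minimal sketch of that reading, using the rounded dollar figures from the report (the function name `cascade` is hypothetical). The report shows $57 after the first fix where this arithmetic gives $56, presumably because the tool rounds from unrounded internal values.

```python
def cascade(monthly_cost, savings):
    """Return the cost remaining after each fix is applied in order."""
    remaining = []
    left = monthly_cost
    for saved in savings:
        left -= saved  # each fix reduces what the previous fix left behind
        remaining.append(left)
    return remaining


# Rounded figures from the count-1 table: $188/month, fixes save $132 and $28.
count_1_remaining = cascade(188, [132, 28])
print(count_1_remaining)  # [56, 28]
```

The last value matches the Total row: $188 minus $160 in combined savings leaves $28.
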
partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

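Because the flagged job is a Python file, the SQL form of the fix may be easier to parameterize from Python. The following is a sketch under assumptions, not the tool's own output: `partition_query` is a hypothetical helper, and the commented usage assumes an active SparkSession named `spark`.

```python
from datetime import date


def partition_query(table: str, partition_column: str, day: date) -> str:
    """Build a SELECT that filters on the partition column with a literal date."""
    # isoformat() yields YYYY-MM-DD, the same literal format the report uses.
    return f"SELECT * FROM {table} WHERE {partition_column} = '{day.isoformat()}'"


query = partition_query(
    "cluster_yield_sandbox.blog_demo.events", "event_date", date(2025, 1, 15)
)
print(query)

# Hypothetical usage, assuming an active SparkSession named `spark`:
#   df = spark.sql(query)
# PySpark DataFrame form of the same filter (note `==`, not Scala's `===`):
#   df = spark.table("cluster_yield_sandbox.blog_demo.events") \
#            .filter(col("event_date") == "2025-01-15")
```

Taking a `date` rather than a raw string keeps the literal well-formed. The table and column names are interpolated directly, so they should come from trusted code, not user input.
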
Source: act3_antipattern.py:301

CY014: result has 2 terminal actions (.count(), .saveAsTable()) without .cache(). Each action recomputes the full lineage from source. Add .cache() before the first action, and .unpersist() when done.

Source: act3_antipattern.py:301

write-saveAsTable-blog_demo.converted_channels: $183/month

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $128 | $55 |
| | Total | | $128 | $55 |

partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:351

write-saveAsTable-blog_demo.converted_quality: $110/month

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $77 | $33 |
| | Total | | $77 | $33 |

partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:361

write-saveAsTable-blog_demo.fiscal_events: $94/month

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $66 | $28 |
| | Total | | $66 | $28 |

partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:305

write-saveAsTable-blog_demo.converted_daily: $57/month

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $40 | $17 |
| | Total | | $40 | $17 |

partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:341

write-saveAsTable-blog_demo.high_value_summary: $57/month

| # | Rule | Fix | Savings | Remaining |
|---|---|---|---|---|
| 1 | partition-pruning (snapshot) | CRIT: Add partition filter on events | $40 | $17 |
| | Total | | $40 | $17 |

partition-pruning: cluster_yield_sandbox.blog_demo.events has no filter on partition column event_date

cluster_yield_sandbox.blog_demo.events is partitioned by event_date, but the scan has no partition filters, so Spark will read every partition of the table. The full table is 94.6 GB. No filter on event_date was found anywhere in the plan.

DataFrame API (Scala syntax):

    spark.table("cluster_yield_sandbox.blog_demo.events")
      .filter(col("event_date") === "2025-01-15") // enables partition pruning
      .select(...)

SQL:

    SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:321

Summary

| Metric | Value |
|---|---|
| Plans analyzed | 6 |
| Plans costed | 6 |
| Total monthly cost | $688 |
| Total monthly savings | $510 |
| Confidence | high |
| Snapshots matched | 1 |

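
The summary totals can be cross-checked against the six plan tables. A minimal sketch, using the rounded per-plan dollar figures as printed in the report: summing them gives $689 in cost and $511 in savings, one dollar above the reported $688 and $510. That gap is consistent with the tool summing unrounded values, which is an assumption, not something the report states.

```python
# (monthly cost, monthly savings, remaining) per plan, copied from the plan tables.
plans = {
    "count-1": (188, 160, 28),
    "write-saveAsTable-blog_demo.converted_channels": (183, 128, 55),
    "write-saveAsTable-blog_demo.converted_quality": (110, 77, 33),
    "write-saveAsTable-blog_demo.fiscal_events": (94, 66, 28),
    "write-saveAsTable-blog_demo.converted_daily": (57, 40, 17),
    "write-saveAsTable-blog_demo.high_value_summary": (57, 40, 17),
}

total_cost = sum(cost for cost, _, _ in plans.values())
total_savings = sum(saved for _, saved, _ in plans.values())
total_remaining = sum(left for _, _, left in plans.values())

print(total_cost, total_savings, total_remaining)  # 689 511 178
print(round(total_savings / total_cost, 2))  # 0.74
```

The summed Remaining column, $178, equals the reported $688 minus $510, so the summary and the per-plan tables agree to within rounding.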