6 critical, 1 warning

count-1 $188/month 2 executions

Savings Waterfall — Plan budget: $188/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $132 | $57
2 | CY014 (linter, WARN) | Cache to eliminate 1 repeat run | $28 | $28
Total | | | $160 | $28
CRITICAL [count-1] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $132/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:301
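
The recommendation above is written in Scala syntax, while the flagged source is a Python file. A minimal PySpark sketch of the same filter might look like the following. The helper name read_events_for_day is hypothetical, and the sketch assumes that event_date compares cleanly against an ISO 'YYYY-MM-DD' literal, as in the report's own example.

```python
from datetime import date

# Table name taken from the finding above.
EVENTS_TABLE = "cluster_yield_sandbox.blog_demo.events"


def read_events_for_day(spark, day: date):
    """Return the events DataFrame restricted to one event_date partition.

    `spark` is expected to be an active SparkSession (assumption: created
    elsewhere in the job). Taking a datetime.date rather than a raw string
    keeps the predicate literal well-formed.
    """
    # date.isoformat() yields 'YYYY-MM-DD', matching the literal in the report.
    predicate = f"event_date = '{day.isoformat()}'"
    # DataFrame.filter accepts a SQL expression string; a predicate on the
    # partition column lets Spark skip partitions that do not match.
    return spark.table(EVENTS_TABLE).filter(predicate)
```

A call such as read_events_for_day(spark, date(2025, 1, 15)) stands in for the unfiltered spark.table(...) read. Calling .explain() on the result and looking for a non-empty PartitionFilters entry on the scan node is one way to confirm that pruning took effect.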
WARNING DataFrame result has 2 terminal actions (.count(), .saveAsTable()) without .cache(). Each action recomputes the full lineage from source. Add .cache() before the first action, and .unpersist() when done.
linter rule CY014
Estimated savings: $28/month
Source: act3_antipattern.py:301
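
As a rough illustration of the CY014 fix, the sketch below caches the DataFrame once, runs both actions against the cached copy, and releases it afterwards. The function name count_and_save and the table_name parameter are hypothetical, the write uses the writer's default save mode because the report does not show the original write options, and caching is assumed to fit in executor memory or spill acceptably.

```python
def count_and_save(result, table_name: str) -> int:
    """Run .count() and .saveAsTable() against one cached computation.

    `result` is expected to be a PySpark DataFrame (assumption: built
    upstream in the job). Returns the row count.
    """
    # cache() is lazy: it only marks the DataFrame for caching. The first
    # action below materializes it, and the second action reuses it.
    cached = result.cache()
    try:
        row_count = cached.count()                # first action: fills the cache
        cached.write.saveAsTable(table_name)      # second action: reads the cache
        return row_count
    finally:
        # Release the cached blocks even if the write fails.
        cached.unpersist()
```

The try/finally is a design choice rather than something the linter asks for: it keeps a failed write from leaving the cached data pinned for the rest of the job.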

write-saveAsTable-blog_demo.converted_channels $183/month

Savings Waterfall — Plan budget: $183/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $128 | $55
Total | | | $128 | $55
CRITICAL [write-saveAsTable-blog_demo.converted_channels] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $128/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:351

write-saveAsTable-blog_demo.converted_quality $110/month

Savings Waterfall — Plan budget: $110/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $77 | $33
Total | | | $77 | $33
CRITICAL [write-saveAsTable-blog_demo.converted_quality] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $77/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:361

write-saveAsTable-blog_demo.fiscal_events $94/month

Savings Waterfall — Plan budget: $94/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $66 | $28
Total | | | $66 | $28
CRITICAL [write-saveAsTable-blog_demo.fiscal_events] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $66/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:305

write-saveAsTable-blog_demo.converted_daily $57/month

Savings Waterfall — Plan budget: $57/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $40 | $17
Total | | | $40 | $17
CRITICAL [write-saveAsTable-blog_demo.converted_daily] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $40/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:341

write-saveAsTable-blog_demo.high_value_summary $57/month

Savings Waterfall — Plan budget: $57/month
# | Rule | Fix | Savings | Remaining
1 | partition-pruning (snapshot, CRIT) | Add partition filter on events | $40 | $17
Total | | | $40 | $17
CRITICAL [write-saveAsTable-blog_demo.high_value_summary] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date
snapshot rule partition-pruning
Estimated savings: $40/month
Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.
Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:
SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'
Source: act3_antipattern.py:321

Summary

Plans analyzed: 6
Plans costed: 6
Total monthly cost: $688
Total monthly savings: $510
Confidence: high
Snapshots matched: 1
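
To cross-check the summary against the per-plan waterfalls, the sketch below recomputes each plan's remaining cost and the overall totals from the displayed dollar figures. The displayed figures sum to $689 and $511, one dollar above each reported total. That gap is presumably a rounding effect of whole-dollar display, which is an assumption and not something the report states.

```python
# (plan name, plan budget $/month, total savings $/month), as displayed above.
PLANS = [
    ("count-1", 188, 160),
    ("write-saveAsTable-blog_demo.converted_channels", 183, 128),
    ("write-saveAsTable-blog_demo.converted_quality", 110, 77),
    ("write-saveAsTable-blog_demo.fiscal_events", 94, 66),
    ("write-saveAsTable-blog_demo.converted_daily", 57, 40),
    ("write-saveAsTable-blog_demo.high_value_summary", 57, 40),
]

# Remaining cost per plan is the budget minus the savings of all its fixes.
remaining = {name: budget - savings for name, budget, savings in PLANS}

total_cost = sum(budget for _, budget, _ in PLANS)
total_savings = sum(savings for _, _, savings in PLANS)

for name, left in remaining.items():
    print(f"{name}: ${left} remaining")
print(f"sum of displayed budgets: ${total_cost}")     # 689 (report shows $688)
print(f"sum of displayed savings: ${total_savings}")  # 511 (report shows $510)
```

Each recomputed remaining value matches the Total row of the corresponding waterfall.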