Cluster Yield CI — Enrichment Report

6 critical, 1 warning

`count-1` $188/month 2 executions

Savings Waterfall — Plan budget: $188/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$132	$57
2	`CY014` linter	WARN Cache to eliminate 1 repeat run	$28	$28
		Total	$160	$28
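
The Remaining column is the plan budget minus the cumulative savings of the fixes, applied in order. A minimal sketch of that arithmetic, using the whole-dollar figures displayed in the table (the function name `waterfall` is hypothetical and not part of Cluster Yield):

```python
from itertools import accumulate


def waterfall(budget, savings):
    """Return the remaining monthly cost after each fix is applied in order."""
    # accumulate() yields the running total of the savings column.
    return [budget - applied for applied in accumulate(savings)]


# Displayed figures for count-1: $188 budget, then $132 and $28 in savings.
print(waterfall(188, [132, 28]))  # [56, 28]
```

The first step comes out at $56 here, while the table shows $57. A $1 gap of this kind is what one would expect if the report rounds each column from unrounded values, though that is an assumption about how the report is produced.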

CRITICAL [count-1] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $132/month

Table cluster_yield_sandbox.blog_demo.events is partitioned by event_date but the scan has no partition filters. Spark will read every partition of the table. The full table is 94.6 GB.

No filter on event_date was found anywhere in the plan.

AQE note: Adaptive Query Execution is enabled but cannot help here — partition pruning is a scan-time decision that occurs before AQE's runtime optimizations. This must be fixed in the query.

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:301
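
Because the flagged source is a Python file, the same fix can also be sketched in PySpark. This is a minimal sketch, not the job's actual code: the helper names `partition_predicate` and `load_events` are hypothetical, and the one-day window is an assumed example. It relies on `DataFrame.filter` accepting a SQL expression string, and it validates the dates so that only well-formed ISO dates reach the predicate.

```python
from datetime import date

EVENTS_TABLE = "cluster_yield_sandbox.blog_demo.events"


def partition_predicate(start, end):
    """Build a half-open [start, end) filter on the partition column."""
    # fromisoformat() rejects strings that are not valid ISO 8601 dates.
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    if lo >= hi:
        raise ValueError("start must be earlier than end")
    return f"event_date >= '{lo.isoformat()}' AND event_date < '{hi.isoformat()}'"


def load_events(spark, start, end):
    """Read only the event_date partitions inside the window."""
    # `spark` is an existing SparkSession. Filtering directly on the
    # partition column lets Spark prune partitions at scan time.
    return spark.table(EVENTS_TABLE).filter(partition_predicate(start, end))


print(partition_predicate("2025-01-15", "2025-01-16"))
```

A half-open date range is used instead of a single equality so the same helper covers multi-day windows; an equality filter on `event_date` prunes in the same way.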

WARNING DataFrame result has 2 terminal actions (.count(), .saveAsTable()) without .cache(). Each action recomputes the full lineage from source. Add .cache() before the first action, and .unpersist() when done.

linter rule CY014

Estimated savings: $28/month

Source: act3_antipattern.py:301
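
The CY014 recommendation can be sketched in PySpark as follows. This is a minimal sketch under assumptions: `count_then_save` is a hypothetical helper, and `df` stands for the DataFrame the linter refers to as `result`. Since `cache()` is lazy, the first action materializes the data and the second can reuse it.

```python
def count_then_save(df, table_name):
    """Run both terminal actions against one cached copy of the DataFrame."""
    cached = df.cache()  # marks the DataFrame for caching; nothing runs yet
    try:
        rows = cached.count()  # first action: computes the lineage and fills the cache
        cached.write.saveAsTable(table_name)  # second action: can read from the cache
        return rows
    finally:
        cached.unpersist()  # release executor memory even if an action fails
```

The `try`/`finally` is a design choice rather than a requirement: it ensures the cached data is released even when the write fails.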

`write-saveAsTable-blog_demo.converted_channels` $183/month

Savings Waterfall — Plan budget: $183/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$128	$55
		Total	$128	$55

CRITICAL [write-saveAsTable-blog_demo.converted_channels] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $128/month

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:351

`write-saveAsTable-blog_demo.converted_quality` $110/month

Savings Waterfall — Plan budget: $110/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$77	$33
		Total	$77	$33

CRITICAL [write-saveAsTable-blog_demo.converted_quality] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $77/month

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:361

`write-saveAsTable-blog_demo.fiscal_events` $94/month

Savings Waterfall — Plan budget: $94/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$66	$28
		Total	$66	$28

CRITICAL [write-saveAsTable-blog_demo.fiscal_events] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $66/month

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:305

`write-saveAsTable-blog_demo.converted_daily` $57/month

Savings Waterfall — Plan budget: $57/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$40	$17
		Total	$40	$17

CRITICAL [write-saveAsTable-blog_demo.converted_daily] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $40/month

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:341

`write-saveAsTable-blog_demo.high_value_summary` $57/month

Savings Waterfall — Plan budget: $57/month

#	Rule	Fix	Savings	Remaining
1	`partition-pruning` snapshot	CRIT Add partition filter on events	$40	$17
		Total	$40	$17

CRITICAL [write-saveAsTable-blog_demo.high_value_summary] Full scan on cluster_yield_sandbox.blog_demo.events — no filter on partition column event_date

snapshot rule partition-pruning

Estimated savings: $40/month

Recommendation: Add a WHERE clause on the partition column:

spark.table("cluster_yield_sandbox.blog_demo.events")
  .filter(col("event_date") === "2025-01-15")  // enables partition pruning
  .select(...)

In SQL:

SELECT * FROM cluster_yield_sandbox.blog_demo.events WHERE event_date = '2025-01-15'

Source: act3_antipattern.py:321

Summary

Plans analyzed	6
Plans costed	6
Total monthly cost	$688
Total monthly savings	$510
Confidence	high
Snapshots matched	1

count-1 $188/month 2 executions

write-saveAsTable-blog_demo.converted_channels $183/month

write-saveAsTable-blog_demo.converted_quality $110/month

write-saveAsTable-blog_demo.fiscal_events $94/month

write-saveAsTable-blog_demo.converted_daily $57/month

write-saveAsTable-blog_demo.high_value_summary $57/month
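
The summary totals can be cross-checked against the per-plan figures. A minimal sketch, using the whole-dollar costs and savings displayed for each plan earlier in the report:

```python
# Per-plan (monthly cost, estimated savings) in dollars, as displayed above.
plans = {
    "count-1": (188, 160),
    "write-saveAsTable-blog_demo.converted_channels": (183, 128),
    "write-saveAsTable-blog_demo.converted_quality": (110, 77),
    "write-saveAsTable-blog_demo.fiscal_events": (94, 66),
    "write-saveAsTable-blog_demo.converted_daily": (57, 40),
    "write-saveAsTable-blog_demo.high_value_summary": (57, 40),
}

total_cost = sum(cost for cost, _ in plans.values())
total_savings = sum(saved for _, saved in plans.values())

print(total_cost, total_savings)  # 689 511
print(f"{total_savings / total_cost:.0%}")  # 74%
```

The displayed figures sum to $689 and $511, each $1 above the reported totals of $688 and $510. That is consistent with per-plan rounding, although this is an assumption. Either way, the proposed fixes cover roughly 74% of the total monthly cost.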
