Optimize Dask EWA persist and prune fornav tasks#707

Open

Zaczero wants to merge 1 commit intopytroll:mainfrom

Zaczero:dask-ewa-persist-prune-upstream

Open

Optimize Dask EWA persist and prune fornav tasks#707
Zaczero wants to merge 1 commit intopytroll:mainfrom
Zaczero:dask-ewa-persist-prune-upstream

Conversation

Copy link

Contributor

Zaczero commented Mar 1, 2026

pyresample/ewa/dask_ewa.py: persist=True now actually enables overlap-aware task pruning by computing ll2cr once, caching per-block extents, and wiring persisted ll2cr blocks as dependencies so later computes do not re-run ll2cr.
pyresample/ewa/dask_ewa.py: fornav task generation avoids doing real work for input/output chunk pairs that provably cannot overlap.
pyresample/ewa/dask_ewa.py: combining results avoids repeated allocations (_sum_arrays) and returns early when only one contributor exists.

Benchmarks

Cold total (precompute + first compute): pruned median 0.177s (min 0.170, max 0.212) vs unpruned median 0.335s (min 0.303, max 0.373) = 1.89x faster (-47.1% wall).
Warm compute (second compute, ll2cr cache reused): pruned median 0.097s (min 0.092, max 0.100) vs unpruned median 0.270s (min 0.251, max 0.288) = 2.79x faster (-64.2% wall).
Peak memory usage unchanged.

Tests added
Tests passed
Fully documented


 Optimize Dask EWA persist and prune fornav tasks

8444f72

djhoese added the enhancement label Mar 2, 2026

djhoese self-assigned this Mar 2, 2026

Copy link

codecov bot commented Mar 4, 2026 *

edited

Loading

Codecov Report

Patch coverage is 99.38272% with 1 line in your changes missing coverage. Please review.
Project coverage is 93.80%. Comparing base (4da28b0) to head (8444f72).
Report is 32 commits behind head on main.

Files with missing lines	Patch %	Lines
pyresample/ewa/dask_ewa.py	98.64%	1 Missing

Additional details and impacted files

@@ Coverage Diff @@ ## main #707 +/- ## ========================================== + Coverage 93.67% 93.80% +0.13% ========================================== Files 89 89 Lines 13621 13764 +143 ========================================== + Hits 12759 12912 +153 + Misses 862 852 -10

Flag	Coverage D
unittests	`93.80% <99.38%> (+0.13%)`

Flags with carried forward coverage won't be shown. Click here to find out more.

View full report in Codecov by Sentry.
Have feedback on the report? Share it here.

New features to boost your workflow:

Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

djhoese reviewed

Mar 4, 2026

View reviewed changes

Copy link

Member

djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting pull request. Thanks for putting this together. I've written a couple comments inline, but otherwise I have some more general comments and concerns:

Just FYI that the persisted ll2cr functionality is something I've turned off in my own processing because I found that it only helped performance when a small amount of the input data overlapped the target area. It'll be interesting to try it again with a better implementation.
One concerning thing in dask in the last year is that they've restructured their dependency graph handling and it is not very bad for performance to mix array tasks and delayed tasks. I think I had fixed the main non-persist usage in this EWA code to use map_blocks instead, but it looks like the persist functionality still depends on it. In the long term this should be replaced if at all possible and may actually simplify some of the logic. For example, carry around a dask array of ll2cr chunks with the row/col built in to the structure.
I asked about how you made these PRs in another thread, but looking at this I want more detail. What AI did you use to analyze your processing chain? How/Why do you have PRs for bilinear resampling, nearest neighbor resampling, and EWA resampling? Are you really using all 3 of these in your processing? Why do you understand this EWA code? I mean, why bother? I've been the only one "maintaining" this so I'm just a little surprised that someone analyzed it enough to contribute this large of a rewrite.

pyresample/ewa/dask_ewa.py

Comment on lines +192 to +193

		if len(arrays) == 1:
		return arrays[0]

Copy link

Member

djhoese Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition isn't possible since you added the if len(valid_chunks) == 1 in the _combine_fornav function, right?

pyresample/ewa/dask_ewa.py

Comment on lines +318 to +323

		ll2cr_delayeds = ll2cr_result.to_delayed()
		flat_delayeds = [
		(in_row_idx, in_col_idx, delayed_block)
		for in_row_idx, delayed_row in enumerate(ll2cr_delayeds)
		for in_col_idx, delayed_block in enumerate(delayed_row)
		]

Copy link

Member

djhoese Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this work done outside of the if persist block? I see that flat_delayeds is used, but since you end up using ll2cr_result.name can't we use the chunks/chunk sizes of ll2cr_result to do the for-looping? It seems this code assumes that iterating over these chunks and the conversion to delayed objects is worth dropping the two num_row/col_blocks arguments.

pyresample/ewa/dask_ewa.py

		key = (ll2cr_result.name, in_row_idx, in_col_idx)
		if extent is None:
		continue
		block_cache[key] = persisted_delayed.key

Copy link

Member

djhoese Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between .name and .key on a dask delayed object?

pyresample/ewa/dask_ewa.py

Comment on lines +141 to +144

		return (
		y_slice.stop > row_min and y_slice.start <= row_max and
		x_slice.stop > col_min and x_slice.start <= col_max
		)

Copy link

Member

djhoese Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure I could be reading this wrong, but doesn't this only allow chunks to be processed that entirely encompass the output chunk? That's not what we want. We want to process the data if there is any overlap.

Labels

enhancement

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Dask EWA persist and prune fornav tasks#707

Optimize Dask EWA persist and prune fornav tasks#707
Zaczero wants to merge 1 commit intopytroll:mainfrom
Zaczero:dask-ewa-persist-prune-upstream

Conversation

Zaczero commented Mar 1, 2026

Benchmarks

Uh oh!

codecov bot commented Mar 4, 2026 *

edited

Loading

Uh oh!

Codecov Report

Uh oh!

djhoese left a comment

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Zaczero commented Mar 1, 2026

Benchmarks

Uh oh!

codecov bot commented Mar 4, 2026 * edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

djhoese left a comment

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

djhoese Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Mar 4, 2026 *

edited

Loading