Dark Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add dynamic shared memory allocation#41

Open
michael-kenzel wants to merge 1 commit intomasterfrom
mikey/smem
Open

Add dynamic shared memory allocation#41
michael-kenzel wants to merge 1 commit intomasterfrom
mikey/smem

Conversation

Copy link
Contributor

michael-kenzel commented Aug 4, 2023 *
edited
Loading

This adds a parameter to allocate a given amount of dynamic shared memory upon kernel launch. Wrapper functions that just pass 0 are provided for backwards compatibility with existing code. Currently implemented for CUDA only, other platforms will error.

corresponding thorin changes: AnyDSL/thorin#144

michael-kenzel requested a review from richardmembarth August 4, 2023 21:34
michael-kenzel marked this pull request as draft August 9, 2023 12:27
michael-kenzel force-pushed the mikey/smem branch from 27f8713 to 4c2b42b Compare August 10, 2023 20:12
michael-kenzel force-pushed the mikey/smem branch 2 times, most recently from d538558 to d8902e2 Compare August 24, 2023 11:56
michael-kenzel force-pushed the mikey/smem branch from d8902e2 to 7ad75e2 Compare August 1, 2024 04:11
michael-kenzel marked this pull request as ready for review August 1, 2024 17:59
Hugobros3 reviewed Aug 5, 2024
aql.kernarg_address = kernel_info.kernarg_segment;
aql.private_segment_size = kernel_info.private_segment_size;
aql.group_segment_size = kernel_info.group_segment_size;
aql.group_segment_size = (kernel_info.group_segment_size + 15) / 16 * kernel_info.group_segment_size + launch_params.lmem;
Copy link
Contributor

Hugobros3 Aug 5, 2024 *
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relevance of this change (the rounding part) ?

Copy link
Contributor Author

michael-kenzel Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to account for padding to align the start of the dynamic allocation.

stlemme reviewed Aug 5, 2024
#[import(cc = "thorin", name = "opencl")] fn opencl_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin", name = "amdgpu_hsa")] fn amdgpu_hsa_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin", name = "amdgpu_pal")] fn amdgpu_pal_with_lmem(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), i32, _body: fn() -> ()) -> ();
#[import(cc = "thorin")] fn local_memory() -> &mut addrspace(3)[u8];
Copy link
Member

stlemme Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename that to shared_memory_base to be more specific

Copy link
Contributor Author

michael-kenzel Aug 5, 2024 *
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to call it "local memory" since that's a more adequate/common moniker outside of CUDA. But we can call it local_memory_base() I guess.

Comment on lines +48 to +52
fn @@cuda(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { cuda_with_lmem(dev, grid, block, 0, body) }
fn @@nvvm(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { nvvm_with_lmem(dev, grid, block, 0, body) }
fn @@opencl(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { opencl_with_lmem(dev, grid, block, 0, body) }
fn @@amdgpu_hsa(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_hsa_with_lmem(dev, grid, block, 0, body) }
fn @@amdgpu_pal(dev: i32, grid: (i32, i32, i32), block: (i32, i32, i32), body: fn() -> ()) { amdgpu_pal_with_lmem(dev, grid, block, 0, body) }
Copy link
Member

richardmembarth Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ should be @ at the function

Copy link
Contributor Author

michael-kenzel Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

michael-kenzel force-pushed the mikey/smem branch from 7ad75e2 to 3c6e9a4 Compare August 5, 2024 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

Hugobros3 Hugobros3 left review comments

richardmembarth richardmembarth left review comments

stlemme stlemme left review comments

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants