

@RayChromium
Owner

First of all, it can now be compiled and run on dione after fixing some errors.
I also tried to eliminate the loop in calculateHistograms by mapping the one-dimensional thread ID onto two-dimensional indices. A common way to do this (thanks, good old Google 🙏) is to use division and modulus operations: here I used i = index / numD and j = index % numD to map the one-dimensional thread ID onto a two-dimensional pair (i, j).
Let's say maxInputLength is 4; then all the mapped indices are:

(0, 0), (0, 1), (0, 2), (0, 3),
(1, 0), (1, 1), (1, 2), (1, 3),
(2, 0), (2, 1), (2, 2), (2, 3),
(3, 0), (3, 1), (3, 2), (3, 3)

And it seems to work.

I also replaced the printf calls with fprintf using the outfil pointer, so that the results are saved in omega.out... but the results look a little bit weird, though...

I haven't figured out the weird values in omega.out around line 20,000 yet: the omega values there are very large (around 1000), and I don't know if that makes sense.

@RayChromium
Owner Author

But at least it gets the job done; it's a good "start", though. If this makes sense, I guess we can now think about how to improve the runtime.

@RayChromium
Owner Author

I also figured out how to build and run this on dione after its upgrade:

  1. After logging in to dione, run module load gcccuda and module load cuda; these are the modules that actually work.
  2. Go to the directory of this repo and run ./generate.sh make and then ./generate.sh run. I simply wrapped the build commands in the makefile and the run command in the shell script.

javipg32 merged commit 3be56a3 into main on Nov 26, 2023

3 participants