C++ Computation parallelization

Never parallelized before. Before I hop in I wanna make sure this situation is even relevant:
Every frame, I have to calculate 25 million addition expressions. (n^2) operations on 500 elements. Can this be improved with parallelization?



5000*

yes, see openmp
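A minimal sketch of what the OpenMP suggestion looks like on the O(n^2) pairwise-addition loop from the OP (function name and sizes are illustrative). The pragma is ignored unless you compile with -fopenmp (GCC/Clang) or /openmp (MSVC), so the code is correct either way:

```cpp
#include <vector>

// Each outer iteration writes only sums[i], so there is no data race
// and the i-loop can be split across threads with a single pragma.
std::vector<float> pairwise_sums(const std::vector<float>& x) {
    const int n = static_cast<int>(x.size());
    std::vector<float> sums(n, 0.0f);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += x[i] + x[j];   // one addition expression per (i, j) pair
        sums[i] = acc;
    }
    return sums;
}
```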

ty

use Rust and Rayon, problem solved

> waaaaah i have to do 25 million operations that take less than one cycle each on a multi ghz computer
> also for some reason i'm doing this on the cpu instead of the gpu
this is why /tech/ was better

you're an asshat
what the fuck is a frame?

Anyone have the handshake gif?

Use openCL

are you adding matrices? you should use a linear algebra library like Armadillo, it provides built-in support for parallel and vector processing.

cpp-taskflow
thank me later

the gpu usually can't share memory with the cpu
better to try in-process threads first
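A sketch of the plain std::thread version of the same loop, assuming the pairwise-addition workload from the OP (names are illustrative). Each thread owns a disjoint range of rows, so no locking is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Split the outer i-loop across hardware threads. Threads write to
// disjoint slices of `out`, so there is no shared-state contention.
std::vector<float> pairwise_sums_mt(const std::vector<float>& x) {
    const std::size_t n = x.size();
    std::vector<float> out(n, 0.0f);
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());

    auto worker = [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < n; ++j)
                acc += x[i] + x[j];
            out[i] = acc;
        }
    };

    std::vector<std::thread> pool;
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t lo = t * chunk, hi = std::min(n, lo + chunk);
        if (lo < hi) pool.emplace_back(worker, lo, hi);
    }
    for (auto& th : pool) th.join();
    return out;
}
```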

>calculating ints with gpu
imagine

Use C++
Use simd intrinsics in C++
software.intel.com/sites/landingpage/IntrinsicsGuide/
You can't do this stuff properly in any other language.
Not even rust, which considers all simd intrinsics unsafe.
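For reference, a minimal intrinsics sketch using baseline SSE (the 128-bit subset every x86-64 CPU supports; see the Intrinsics Guide linked above for the full catalog). The function name is illustrative, and the multiple-of-4 assumption keeps the sketch short:

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics (SSE/AVX); x86 only

// Add two float arrays 4 lanes at a time with 128-bit SSE registers.
// Assumes n is a multiple of 4; real code needs a scalar tail loop.
void add4(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // load 4 floats (unaligned ok)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 adds in one instruction
    }
}
```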

plz tell me about doing these on the gpu while i'm simultaneously doing 5000 draw calls for modern opengl

This is a perfect case for simd
I sure hope you didn't fall for the latest Jow Forums meme and bought an AMD processor which are notoriously terrible at SIMD
software.intel.com/en-us/articles/creating-a-particle-system-with-streaming-simd-extensions

not OP but actually thanks for this

SIMD

>nibbas first SIMD

stop using asm tier raw simd like a cave man lmao
at least use portable library like xsimd or eigen

SSBO, draw Indirect, bindless textures + dsa.
vulkan like performance 1mil draw class with ez

*calls

If you write a classical for-loop in C++, the compiler will vectorize it and emit the simd instructions for you anyway. Just check godbolt with -O3

>takes less than one cycle each
No. No they don't.

If you have simd, then yes.
If you can process 4 elements at a time with simd intrinsics, that's 0.25 cycles per element: 4 additions per cycle/uop.

You can use a C++ library called Thrust if you want to use the gpu for those calculations instead.

It can be, if you have an algorithm that can be parallelized.

prefix sums
en.wikipedia.org/wiki/Prefix_sum
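A quick sketch of the prefix-sum primitive from the link, using C++17's std::inclusive_scan. Unlike a plain accumulate, scan has a well-known O(log n)-depth parallel algorithm; std::inclusive_scan also accepts a std::execution::par policy to use it, which this sketch omits for portability:

```cpp
#include <numeric>
#include <vector>

// Running sums: out[i] = x[0] + x[1] + ... + x[i].
std::vector<int> prefix(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    std::inclusive_scan(x.begin(), x.end(), out.begin());
    return out;
}
```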

>NU-G CANT INTO SIMD

I'm pretty sure the compiler will vectorize it for you.

Write a for loop. Make sure you use a modern C++ compiler. Use the appropriate compiler flags (-mavx or /arch:AVX and the like). Look at the assembly code generated.
Watch in amazement how it is all auto-vectorized for you.
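For example, a loop shaped like this (names illustrative) is the kind compilers auto-vectorize at -O3; with -mavx you should see packed vaddps over ymm registers in the godbolt output. The trick is keeping the body simple and free of loop-carried dependencies:

```cpp
#include <cstddef>
#include <vector>

// Elementwise addition -- trivially auto-vectorizable: no intrinsics needed.
void add_all(const std::vector<float>& a, const std::vector<float>& b,
             std::vector<float>& out) {
    const std::size_t n = out.size();
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```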