C++ Computation parallelization

Never parallelized before. Before I hop in I wanna make sure this situation is even relevant:
Every frame, I have to calculate 25 million addition expressions. (n^2) operations on 500 elements. Can this be improved with parallelization?



5000*

yes, see openmp
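A minimal sketch of what the OpenMP suggestion looks like on the O(n^2) pairwise-addition loop from the OP (function name and sizes are illustrative). The pragma is ignored unless you compile with -fopenmp (GCC/Clang) or /openmp (MSVC), so the code is correct either way:

```cpp
#include <vector>

// Each outer iteration writes only sums[i], so there is no data race
// and the i-loop can be split across threads with a single pragma.
std::vector<float> pairwise_sums(const std::vector<float>& x) {
    const int n = static_cast<int>(x.size());
    std::vector<float> sums(n, 0.0f);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j)
            acc += x[i] + x[j];   // one addition expression per (i, j) pair
        sums[i] = acc;
    }
    return sums;
}
```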

ty

use Rust and Rayon, problem solved

> waaaaah i have to do 25 million operations that take less than one cycle each on a multi ghz computer
> also for some reason i'm doing this on the cpu instead of the gpu
this is why /tech/ was better

you're an asshat
what the fuck is a frame?

Anyone have the handshake gif?

Use openCL

are you adding matrices? you should use a linear algebra library like Armadillo, it provides built-in support for parallel and vector processing.

cpp-taskflow
thank me later

the gpu usually can't share memory with the cpu
better to try in-process threads first
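A sketch of the plain std::thread version of the same loop, assuming the pairwise-addition workload from the OP (names are illustrative). Each thread owns a disjoint range of rows, so no locking is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Split the outer i-loop across hardware threads. Threads write to
// disjoint slices of `out`, so there is no shared-state contention.
std::vector<float> pairwise_sums_mt(const std::vector<float>& x) {
    const std::size_t n = x.size();
    std::vector<float> out(n, 0.0f);
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());

    auto worker = [&](std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i < hi; ++i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < n; ++j)
                acc += x[i] + x[j];
            out[i] = acc;
        }
    };

    std::vector<std::thread> pool;
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t lo = t * chunk, hi = std::min(n, lo + chunk);
        if (lo < hi) pool.emplace_back(worker, lo, hi);
    }
    for (auto& th : pool) th.join();
    return out;
}
```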

>calculating ints with gpu
imagine

Use C++
Use simd intrinsics in C++
software.intel.com/sites/landingpage/IntrinsicsGuide/
You can't do this stuff properly in any other language.
Not even rust, which considers all simd intrinsics unsafe.
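For reference, a minimal intrinsics sketch using baseline SSE (the 128-bit subset every x86-64 CPU supports; see the Intrinsics Guide linked above for the full catalog). The function name is illustrative, and the multiple-of-4 assumption keeps the sketch short:

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics (SSE/AVX); x86 only

// Add two float arrays 4 lanes at a time with 128-bit SSE registers.
// Assumes n is a multiple of 4; real code needs a scalar tail loop.
void add4(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // load 4 floats (unaligned ok)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 adds in one instruction
    }
}
```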

plz tell me about doing these on the gpu while i'm simultaneously doing 5000 draw calls for modern opengl

This is a perfect case for simd
I sure hope you didn't fall for the latest Jow Forums meme and bought an AMD processor which are notoriously terrible at SIMD
software.intel.com/en-us/articles/creating-a-particle-system-with-streaming-simd-extensions

not OP but actually thanks for this

SIMD

>nibbas first SIMD

stop using asm tier raw simd like a cave man lmao
at least use portable library like xsimd or eigen

SSBO, draw Indirect, bindless textures + dsa.
vulkan like performance 1mil draw class with ez

*calls

If you write a classical for-loop in C++, the compiler will vectorize it and emit the simd instructions for you anyway. Just check godbolt with -O3

>takes less than one cycle each
No. No they don't.

If you have simd, then yes.
If you can process 4 elements at a time with simd intrinsics, that's 0.25 cycles per element: 4 additions per cycle/uop.

You can use a C++ library called Thrust if you want to use the gpu for those calculations instead.

It can be, if you have an algorithm that can be parallelized.

prefix sums
en.wikipedia.org/wiki/Prefix_sum
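A quick sketch of the prefix-sum primitive from the link, using C++17's std::inclusive_scan. Unlike a plain accumulate, scan has a well-known O(log n)-depth parallel algorithm; std::inclusive_scan also accepts a std::execution::par policy to use it, which this sketch omits for portability:

```cpp
#include <numeric>
#include <vector>

// Running sums: out[i] = x[0] + x[1] + ... + x[i].
std::vector<int> prefix(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    std::inclusive_scan(x.begin(), x.end(), out.begin());
    return out;
}
```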

>NU-G CANT INTO SIMD

I'm pretty sure the compiler will vectorize it for you.

Write a for loop. Make sure you use a modern C++ compiler. Use the appropriate compiler flags (-mavx or /arch:AVX and the like). Look at the assembly code generated.
Watch in amazement how it is all auto-vectorized for you.
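For example, a loop shaped like this (names illustrative) is the kind compilers auto-vectorize at -O3; with -mavx you should see packed vaddps over ymm registers in the godbolt output. The trick is keeping the body simple and free of loop-carried dependencies:

```cpp
#include <cstddef>
#include <vector>

// Elementwise addition -- trivially auto-vectorizable: no intrinsics needed.
void add_all(const std::vector<float>& a, const std::vector<float>& b,
             std::vector<float>& out) {
    const std::size_t n = out.size();
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```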