Storing and manipulating data that requires less than 1 byte

im not formally educated in programming matters, and as a hobbyist im just beginning to dabble in some of this lower-level stuff, so please be patient with me

say i have a large set of data pieces that each will never exceed 16 possible values, so each data piece only requires 4 bits to represent
a byte on most computers, of course, is usually 8 bits in size
bitwise operators can be used to store and manipulate 2 data pieces in a single byte
as i have a fairly large set of data, halving the memory and storage expenses of this data sounds like a great idea

would using bitwise operators in such a manner significantly impact speed of code execution?

obviously writing code to operate like this significantly increases code complexity in comparison to an implementation that simply stores each data piece in a byte
increased complexity increases the chances of bugs when writing and maintaining code
so on and so forth, but in this example scenario assume we have a perfect, bug-free implementation

would the additional overhead of bitwise operators introduce a significant decrease in speed when actively manipulating the data set?

Attached: 1543175008673.png (965x717, 981K)


Not really, as you would be able to implement a lot of SIMD/SWAR based methods, where you would treat a byte as simply a "2 element vector of 4-bit elements" and a 16-bit integer as a "4 element vector of 4-bit elements". And these days there are simd vector registers of 128 bits and 256 bits and even 512 bits.
I'm a C++ developer that regularly gets involved in SIMD code and I can already imagine an algorithm to "unpack" a large vector of 4-bit elements back into bytes for some further processing though it depends on exactly what kind of processing you plan to do with them.

In general though, the speedup in being able to manipulate your 4-bit data in parallel has greater returns over processing and unpacking 4-bit elements out of bytes in some serial manner. I hope that makes sense.

An example situation is: pack your 4-bit elements into 8-bit bytes for optimal storage, and maybe introduce a compression scheme depending on how your data behaves.
when you want to do some arithmetic on your data, reading and "unpacking" it can be done in a hugely parallel way, to where any "overhead" of unpacking your data is nothing compared to the returns of being able to process upwards of 32 or 64 elements in parallel.
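a minimal sketch of the storage side in C (the function names here are made up for illustration), packing two 4-bit pieces per byte:

```c
#include <stdint.h>

/* Pack two 4-bit values (each 0..15) into one byte:
   'lo' occupies bits 0-3, 'hi' occupies bits 4-7. */
static uint8_t pack_nibbles(uint8_t lo, uint8_t hi)
{
    return (uint8_t)((lo & 0x0F) | ((hi & 0x0F) << 4));
}

/* Unpack the low and high 4-bit elements again. */
static uint8_t unpack_lo(uint8_t b) { return b & 0x0F; }
static uint8_t unpack_hi(uint8_t b) { return (b >> 4) & 0x0F; }
```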

Care to share more context?

Attached: nasa.webm (360x696, 137K)

if you can get away with it just use 1 byte per data piece. while bitwise operations are pretty fast they aren't free.

also not sure what language you plan to use but i'd recommend the book "Hacker's Delight"

thanks for the reply

im not familiar with simd/swar, but if im reading what comes up from my quick search correctly its basically a cpu technology that allows explicit manipulation of specific data segments within the "contents" of a register
i dont know too much on the hardware side of things yet, so i will need to research that with more depth later

if im correct in my interpretation, one concern for me would be how many cpu architectures support this technology
code portability and all that
but it definitely sounds like the sort of thing i would be trying to do in my example

speaking of my example, thats all it is
i dont actually have anything im trying to do
it was simply a mechanism to frame my question with
im just trying to learn about some of the lower-level performance and efficiency factors that "real" programmers have to consider

i was thinking in the context of the c language, as i consider it to be one of the lowest common denominators between systems and as such a good starting place for discussion
c++ is very similar as far as my knowledge goes, and in most aspects is pretty much an extension of c, so that should be fine

>the speedup in being able to manipulate your 4-bit data in parallel has greater returns over processing and unpacking 4-bit elements out of bytes in some serial manner
if im reading this correctly, you mean that loading, say, a 32-bit int into a register and addressing the individual segments of the data is faster than, say, reading a single byte out of memory and manipulating it with bitwise operators?
if so, makes sense to me
i just have no real idea about how one would go about doing that
do you have any recommended reading in that regard?

>any "overhead" of unpacking your data is nothing compared to the returns of being able to process upwards of 32 or 64 elements in parallel.
so more calculations in parallel > faster individual calculations?

again, thank you for your response

SIMD - Single Instruction Multiple Data: Basically the idea of loading in multiple elements of data at once and doing arithmetic on them in parallel

SWAR - SIMD Within a Register: Basically, simulating the same thing, but rather than having a dedicated vector-register, you use a general purpose integer register and do some "bit trickery" to simulate arithmetic on parallel elements.

A little example of swar: you have a 64-bit computer, you can load a 64-bit integer from memory and do some arithmetic on it in any given register. Now lets say you load a 64-bit "chunk" of your packed 4-bit data into a 64-bit register. You essentially now have a register of 32 4-bit-elements in it.

For example, lets say you want to "set the lowest bit of each of the 4-bit elements" in this 64-bit "chunk"
All you have to do is binary OR this 64-bit register with the 64-bit integer value 0b0101010101010101010101010101010101010101010101010101010101010101 (0x5555555555555555) and then write it back. This would allow you to operate on 32 elements at once in parallel. I could also go over cases like "add one to each 4-bit element, with overflow/saturation/etc", but this is a general case.
This would be something that a lot of platforms support, as pretty much anything that has a 32 or 64-bit register and supports binary OR can do this stuff. You can have totally plain C or C++ code for this that compiles on any platform, and most compilers will auto-vectorize it for your specified processor, using CPU features like SSE (x86) or NEON (ARM) to process up to 64 4-bit elements in parallel.
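in C that whole idea is basically one line. A minimal sketch, assuming actual 4-bit elements (so the per-lane low-bit mask repeats every 4 bits, 0x1111..., rather than the 0x5555... that fits 2-bit lanes):

```c
#include <stdint.h>

/* SWAR: set the lowest bit of every 4-bit element packed into a
   64-bit word. The mask has bit 0 of each nibble set, so a single
   OR touches all 16 lanes at once. */
static uint64_t set_low_bits_4bit(uint64_t packed)
{
    return packed | 0x1111111111111111ULL;
}
```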

You cant always trust the compiler to know what you are doing well enough to vectorize, and sometimes you will have to write some hand-optimized code. But you dont have to write direct assembly files, as there exist intrinsics that you can put right in your C or C++ code to get the compiler to generate specific cpu instructions.
I have this page pretty much bookmarked:
software.intel.com/sites/landingpage/IntrinsicsGuide/

>so more calculations in parallel > faster individual calculations?
thats the idea yea
if you can process 32 elements in parallel and get a x32 speedup, then any "bit unpacking" overhead that you might have to do would just dip into that x32 speedup and make it something like x31.4 or x31.6. Which is still nothing compared to the overall speedup.

sorry ive been saying 4-bit. i mean 2 bit

fuck I've been mixing up between 4 bits and 2-bits so some of my numbers are wrong but the general concepts are all there. I'm running on cold brew with an empty stomach on xmas eve

ah, so its the sort of thing that can often be left to the compiler if a system supports it
so when i OR the "packed" set of elements, the compiler will (hopefully) optimize it with cpu-specific SIMD instructions, but the code will still compile and run - if not quite as well - on systems that dont support these instructions or compilers that arent designed with them in mind
that addresses my concerns there

your link goes quite a bit beyond my experience, but i will bookmark it for when i get there

thanks yet again fren

yeah, that makes sense now that you explained how SIMD code is implemented on the programmer's end

no worries, i caught it when reading
no problem there

>I'm a c++ developer
Translation: don't listen to me as half the time I don't have any idea what I'm talking about and the other half I'm jerking myself off to the idea of over utilizing OOP.

each compiler targets a specific architecture

an x86 compiler will use all x86 features it can throw at your code to speed it up -IF- it can detect that the result is equivalent to what you are doing.

Something like a simple binary OR is so easy for a compiler to detect that it will probably vectorize it pretty immediately.

By default, gcc will generate "standard" vanilla assembly code that pretty much all 64-bit x86 processors support. Through the years, though, x86 processors have implemented new instruction sets and features that you must ask the compiler to generate, such as SSE or BMI or AVX. Some of these features aren't something a compiler can reliably detect a use for, so it may or may not be "smart" enough to generate special cpu code around the feature sets you tell it to target. Or maybe you're just generally "smarter" than the compiler in some cases and can see that a certain instruction will speed up your code by a lot. In that case you would use the intrinsics pages for the specific processor architecture that you are targeting, and in your code you would probably do some preprocessor logic like "#if 'targeting an arm processor'".
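a rough sketch of that kind of preprocessor dispatch (the function name and the choice of operation are made up for illustration; the SSE2 branch uses real Intel intrinsics, and the plain loop is the portable fallback that compilers will usually auto-vectorize anyway):

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* OR a constant into every byte of a buffer. With SSE2 available
   this handles 16 bytes per instruction; otherwise (or for the
   leftover tail) it falls back to a scalar loop. */
static void or_bytes(uint8_t *buf, size_t n, uint8_t mask)
{
    size_t i = 0;
#if defined(__SSE2__)
    __m128i vmask = _mm_set1_epi8((char)mask);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        _mm_storeu_si128((__m128i *)(buf + i), _mm_or_si128(v, vmask));
    }
#endif
    for (; i < n; ++i)  /* tail, or whole buffer without SSE2 */
        buf[i] |= mask;
}
```

on a target without SSE2 the exact same code still compiles and just takes the scalar path.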

Here's a little novel example of what i mean
godbolt.org/z/_Dl2_o

This site will show you the compiler output of your code snippets in real-time so you can see just how "smart" certain compilers are.

If you have a gitlab or github or something for this project you plan to do then i can toss you a pull request or two if you're serious about this "learning project"

I absolutely love pet-projects like this and making write-ups for it on my pass-time.
git.io/v5Fcl

ah the anti-oop boomer is here.

How trigger happy do you have to be to get an upset seizure at any mention of C++

go back to programming C shit no one uses on your thinkpad old man

i will admit that im not particularly fond of oop, but thats probably mostly because the things that i am interested in arent particularly well-served by such a design
i can definitely be considered younger (22), so during my associates we learned oop-by-default,
but i dont agree to that approach as a rule
i can see the merits of oop in certain scenarios, but not as a "default approach" like i was taught

i definitely need to read up on compilers
i know the general structure of one and how it works in principle, but i havent really learned any of the nuances and specific commands a programmer can issue to one to implement optimizations

that example you made and the site are pretty neat
i dont know much asm (or asm at all, really) but i can see what you mean
the highlighting makes it pretty easy
one thing though is that it doesnt compile on a few of them, for example risc-v clang
is that because of your SIMD code specifically, or another part of the code that those compilers dont support?

like i said, i dont actually have a project in mind, quite frankly i cant think of anything in particular i would try to use this for myself
just trying to start a conversation and learn just a little bit more

my formal education really only consists of two years of .net, simple game design, and basic web dev principles
the lowest level thing we ever did is create our own (probably horrible) implementations of linked list in c++

i just want to spend some time accumulating knowledge and general principles before i really dive into low-level stuff

>is that because of your SIMD code specifically, or another part of the code that those compilers dont support?
the code is fine. it looks like it's just something wrong on the godbolt backend for some of the compilers. Also some compilers like MSVC (microsoft's compiler) dont support the -Ofast flag and instead use /Ox or something.

What i did in that code is 100% standard C++ that all compliant C++ compilers support but it looks like that site's backend doesn't have some of the standard headers for some of the cross-compilers.

This is the error it has for the clang risc-v compiler.
<source>:1:10: fatal error: 'cstdint' file not found
#include <cstdint>
         ^~~~~~~~~
1 error generated.
Compiler returned: 1


I highly recommend this book if you wanna get low-level to the bits and bytes of things. It has a chapter on SWAR stuff too. And a fuck load of content on integer division (which is still a pretty tricky problem).

amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685/ref=olp_product_details?_encoding=UTF8&me=

(for some reason there are github repos with illegal pdfs out there too)

struct thing
{
char first:4;
char second:4;
};
1 byte, 4 bits each
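a hedged variation on that struct: plain char bit-fields may be signed, so their 0..15 range isn't guaranteed. Using unsigned char fields (a common compiler extension) keeps the well-defined range and still fits the pair in one byte on mainstream compilers:

```c
/* Two 4-bit fields packed in one storage unit. Assignments to
   unsigned bit-fields wrap modulo 2^width, which is well defined. */
struct packed_pair {
    unsigned char first  : 4;
    unsigned char second : 4;
};

static unsigned pair_sum(struct packed_pair p)
{
    /* the compiler emits the shifts and masks for you */
    return p.first + p.second;
}
```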

>would the additional overhead of bitwise operators introduce a significant decrease in speed when actively manipulating the data set?
No, not in the slightest. Bitfields are exactly what you want. I have no idea why the c++ fuckheads bring up SIMD.

Also don't do this since struct bitfields are extremely poorly defined. Use | and &.

if you really recommend it i would be willing to shell out the $50 for it, or at least look around for somewhere cheaper
it is christmas afterall, and if its a good book and a good author i dont see any reason to take away potential income
illegal github repos (something i would never ever consider looking into myself, of course) are something i think a person - absolutely theoretically, of course - would rather use for books that they are somewhat suspect of the value they may provide

about what level do you think i should be at to really understand the things the author discusses?

what languages does that work in?
i cant say i have ever come across that particular mechanic before, but im hardly very experienced

definitely would not call him(?) a fuckhead, very nice and helpful
anyway, he seems to approach it more from the side of efficient mass data processing, where a single operation or set of operations needs to be applied to a large dataset at once
it seems like it would fall flat when you need to do different operations to different sections in the dataset, which is where bitfields would shine more

then again, thats simply my understanding limited as it is

someone uses a language you dont like and suddenly them saying "the sky is blue" is wrong

Flipping and extracting bits is actually fairly easy.

To turn on a bit:
>byte |= 1 << n
To turn off a bit:
>byte &= ~(1 << n)
To get the value of a bit:
>(byte >> n) & 1
...Assuming the bits are numbered 0-7.
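spelled out as real C (a small sketch; note that in C the ^ operator is XOR, not exponentiation, so a power of two is written 1 << n):

```c
#include <stdint.h>

/* The basic single-bit operations on a byte, with n in 0..7. */
static uint8_t set_bit(uint8_t byte, int n)   { return byte | (uint8_t)(1u << n); }
static uint8_t clear_bit(uint8_t byte, int n) { return byte & (uint8_t)~(1u << n); }
static int     get_bit(uint8_t byte, int n)   { return (byte >> n) & 1; }
/* bonus: flip a bit with XOR */
static uint8_t flip_bit(uint8_t byte, int n)  { return byte ^ (uint8_t)(1u << n); }
```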

>about what level do you think i should be at to really understand the things the author discusses?
It ranges from intermediate to advanced.
You can read some samples here:
hackersdelight.org/

or just check out this page

graphics.stanford.edu/~seander/bithacks.html

>what languages does that work in?
that's supported in C and C++
Though its in-memory layout is implementation-defined, so it's pretty unpredictable and unreliable across compilers
en.wikipedia.org/wiki/Bit_field

Personally, I use something like this. Which guarantees the underlying structure of your data.
preshing.com/20150324/safe-bitfields-in-cpp/

Also here is an example of how I'd index into a packed 4-bit element array to simulate how a serial "unpack" would probably go. This is the bit of overhead you might deal with, but it really depends on what you'd be doing with them.
godbolt.org/z/VL_O2a
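a rough sketch of what such serial indexing might look like (this is not the linked code, just an illustration of the idea):

```c
#include <stddef.h>
#include <stdint.h>

/* Get/set the i-th 4-bit element of a packed byte array. Even
   indices live in the low nibble, odd indices in the high nibble.
   This per-element masking and shifting is the serial "unpack"
   overhead being discussed. */
static uint8_t get4(const uint8_t *a, size_t i)
{
    uint8_t b = a[i / 2];
    return (i % 2) ? (uint8_t)(b >> 4) : (uint8_t)(b & 0x0F);
}

static void set4(uint8_t *a, size_t i, uint8_t v)
{
    v &= 0x0F;
    if (i % 2)
        a[i / 2] = (uint8_t)((a[i / 2] & 0x0F) | (v << 4));
    else
        a[i / 2] = (uint8_t)((a[i / 2] & 0xF0) | v);
}
```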

i knew it was easier to do from the programmer's perspective
i was more concerned about how "easy" it is from the cpu's perspective
but from what i have heard here and read elsewhere, they really arent very taxing (but they arent free either)
im guessing they are somewhere around the overhead of a function call - measurable, but not particularly significant in most cases

>would using bitwise operators in such a manner significantly impact speed of code execution?
no. even on old CPUs like the 6502 it is quite fast and doesn't take up too many clock cycles. besides, what other choice do you have?
> obviously writing code to operate like this significantly increases code complexity in comparison to an implementation that simply stores each data piece in a byte
absolutely not complex at all. it's pretty fucking easy to store data in nibbles for later use.

>that stack overflow answer thats the first result to questions like this

wish i was one of the first to answer trivial shit on stack overflow to get all that rep

it seems like the beginning at least is right around my level, will order when next pay comes in
thanks a bunch

complex in the sense that it is more "involved" than simply using the smallest available built-in data type
and it can have potential issues with code maintainability if you get somebody who isnt too familiar with that sort of thing (like me in particular) working on your code base

good thread
save for the zealous anti C++ idiot

why does the pseudo-code on the page look like Golang? I've only ever seen the := operator there.

Attached: 1526471987026.png (446x265, 22K)

its pseudo code dude it aint gotta look like shit

yeah but I assume this document was written before Golang was a thing.

It's pascal i believe
en.wikipedia.org/wiki/Pascal_(programming_language)

Sauce pl0x

no

god damn that's cute
she's so desperate for attention and will do anything to get it
my ex was a lot like that, adorable if you ignore her for a little bit

Are you an idiot?

Attached: Screenshot_20181224_210332.png (799x757, 261K)

jesus christ this thread is a lot of drivel for a simple question
packing values in bits does increase the execution time, do tests and figure out if it's worth the cost

prettyuglylittleliar.net/topic/10146-jenna-lynn-meowri-cosplayer/?page=1

shes like 80% plastic

what is this place

idk i learned a lot

you learned a lot of irrelevant information if you're getting into compiler-specific SIMD details when you just want to pack some data

that stuff seemed very related to someone thinking about the overhead of unpacking nibbles from bytes.
are you that boomer that was spouting some anti-oop shit when no one was even talking about it?

No, I'm just saying, you probably won't even be using SIMD, so it's kind of misleading advice

"whats the overhead like in unpacking packed nibbles"
"well, it depends on what you're doing with it, if you're doing element-independent stuff, you can offset most of this overhead by doing stuff in parallel, never having to unpack the nibbles"
"oh cool, (wants to know more)"
(discussion about data-parallelism, swar, simd, bit tricks, etc)

seems fine to me man

and you probably won't be doing any of that stuff, that's the point, most likely you'll just unpack it and do an operation manually
its worth it for big data structures where you need to save space

are there simd instructions that deal with 4 bit values? most deal with 64/128/512 packed integers

>and you probably won't be doing any of that stuff, that's the point,
huh? "The point" is to not offset overhead?

>most likely you'll just unpack it and do an operation manually
>its worth it for big data structures where you need to save space
I dont think you coherently followed the discussion

read the fucking thread

The point is you probably aren't going to be using SIMD for your particular use case so bringing it up is kind of irrelevant

>are there simd instructions that deal with 4 bit values? most deal with 64/128/512 packed integers
the swar method of adding a "vector" of 4-bit values within a 32-bit/64-bit integer register involves an AND and some XORs (i forgot the exact method but if you care i can write up a quick godbolt)

this method can be expanded into simd registers as well.
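for reference, the usual AND/XOR trick looks something like this (a sketch of the standard SWAR add, reconstructed from memory, not anyone's linked code): mask off the top bit of each nibble so lane-local carries can't escape into the neighbor, add, then restore the top bits with XOR.

```c
#include <stdint.h>

/* SWAR addition of sixteen 4-bit lanes packed into a 64-bit word.
   Each lane wraps mod 16 and no carry crosses a lane boundary. */
static uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8888888888888888ULL; /* top bit of each nibble */
    uint64_t low = (a & ~H) + (b & ~H);       /* low 3 bits add; any carry
                                                 stays inside its own lane */
    return low ^ ((a ^ b) & H);               /* top bits add mod 2 via XOR */
}
```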

the godbolt posted above even shows the compiler doing this for you
godbolt.org/z/_Dl2_o


you cant explain SWAR without explaining SIMD too. It's literally the first letter in SWAR.

packing two 4-bit elements into a 8-bit byte is SWAR territory

brainlets that didnt read the thread are looping back and causing the thread to repeat itself

what kind of mass-data set would only need 4 bits?

Newfags can't yandex

compressed ascii

A database of wages among neets

>implying fuck dolls aren't a thing

Some kind of user database of boolean options
Like if a website has some checkbox settings per-user

I have navigation data for game maps that uses all sorts of weird bit packing, like interlaced 2 and 4 bit values, to fit everything into a reasonable size

Do tell

(seriously though)

C++, define a class with two 4 bit bit-field entries and a pair of get/set methods.
Bit fields are overflow safe and you can pack two in a byte for data locality
Then just slap em in a std::vector.

Were a little past that point but yea sure

WEED

it's a grid of cells, and each cell needs to store its floor and ceiling height, flags, and links and traversal styles to adjacent cells, which gets really complex for long links (like say, for a character jumping from one cell to another) as it needs to mark every cell on the way over too
I managed to fit it all in a 12 byte structure using aggressive bit packing, lookup tables, relative pointers and staggering less important links over several cells. it takes some power to unpack them but it's worth it in this case, seeing as all the navigation data for the game is easily a few hundred MB in RAM

Why the fuck can't any of these plastic thots just be into it for the plastic.
Would be 10/10 if they just owned up to it and had fun with it.

>im not formally educated in programming matters
Alright then fuck off.

It's not literally Pascal, since ENDFOR is not a Pascal-ism. Like in C, for/while take a single statement, with optional BEGIN/END to form a compound statement (equivalent of { })

But, the use of := may have been inspired by ALGOL or Pascal, which use it as the assignment operator, with = reserved for equality test instead of ==.

In Golang, := has a more specific meaning, it is assignment with type inference.

Merry Xmas to everyone except for that one boomer that hates oop

this seems like an interesting thread so I'm saving it from page 10

Yea this is pretty interesting. Save for the usual spergs.
I wish there was a practical case study where something like this can be implemented rather than some shower thought the OP had

>I wish there was a practical case study where something like this can be implemented
colors are commonly stored as packed bits in a 32 bit int
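for example, a 32-bit RGBA color packs four 8-bit channels with shifts and ORs (the channel order here is just one common convention):

```c
#include <stdint.h>

/* Pack four 8-bit channels into one 32-bit int: R in the top byte,
   then G, B, and A in the lowest byte. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Extract a channel by shifting it down and truncating to a byte. */
static uint8_t rgba_red(uint32_t c) { return (uint8_t)(c >> 24); }
```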

Is there a practical color space that is just 4 bits per channel?

no, that would make it a 12 bit color which is kind of an awkward size
16 colors or 256 colors were common in the past though

so 4-bit elements is pointless then

Depends, bitwise operators are pretty fast and it may be a good idea in many cases. Code complexity is not a problem since you can combine them with a | b

Attached: gyubzyq87xmx.png (640x619, 169K)

Just came back from an xmas party so here's some more sample code.

Here's a sample of doing 4-bit addition upon packed 8/16/32 bit registers. Going higher for 64-bit integers and 128-bit vector registers and such is trivial.

godbolt.org/z/ntiIlj

And here's an example of a compiler vectorizing the code with 128-bit registers

godbolt.org/z/UcOunk

>Oh and by the way, in theory integer may be 4 bytes but there is more memory allocated for it(I think at least 16 depending on your architecture) because of alignment so your memory save attempts may go to waste anyways.

I don't think this is right. First I've heard of it at least. I think you're talking about cache line size and memory alignment issues, as this is something that vector-register loads and stores have to worry a bit about, but your x86 cpu will generally always pull 64-byte chunks into its cache, so integer alignment is kind of a non-issue depending on the context and instructions being used.

Attached: metal slug 3.webm (450x330, 2.02M)

this whole thread has been talking about SWAR arithmetic, in which you can keep all the bits packed as described and do fast arithmetic directly on them without ever having to bitshift-unpack them sequentially.

Adding onto this it's pretty easy to see why doing these calculations in parallel (although the individual calculations aren't running on different CPUs, you're just doing all the work in one compacted go) is faster than performing them individually.

OP, I don't know if you've looked into assembly, but to keep it simple all CPUs have set, move, add, and shift operations. If you had a 16-bit register with four 4-bit elements inside it and you added 1 to all of them with individual calculations, you'd end up with the following assembly...

SET R1 0b0001000100010001

SHIFT_RIGHT R2 R1 12
ADD R2 R2 0b0001
SHIFT_LEFT R2 R2 12

SHIFT_RIGHT R3 R1 8
ADD R3 R3 0b0001
SHIFT_LEFT R3 R3 8

SHIFT_RIGHT R4 R1 4
ADD R4 R4 0b0001
SHIFT_LEFT R4 R4 4

ADD R5 R1 R2
ADD R5 R5 R3
ADD R5 R5 R4

And this code doesn't even account for the data that was already in the three leftmost elements being added to their incremented copies. This assembly also isn't very efficient, which means it sure is good I don't work on compilers.

Here's a look at what the assembly would look like for what the other guy was talking about...

SET R1 0b0001000100010001
SET R2 0b0001000100010001
ADD R3 R1 R2

Boom, same outcome and it accounts for existing data and doesn't require nearly as many registers or instructions to complete.

>Here's a look at what the assembly would look like for what the other guy was talking about...
>SET R1 0b0001000100010001
>SET R2 0b0001000100010001
>ADD R3 R1 R2
>Boom, same outcome and it accounts for existing data and doesn't require nearly as many registers or instructions to complete.

Your code is wrong, as it will propagate its carry bit into neighboring elements should any of the elements be something like 0b1111
see which accounts for that

Indeed it is, and I think eventually running into that flaw is part of the fun.

I was just trying to give OP a basic understanding of what's going on.

yeah im definitely going to have to learn assembly at some point, half of that at least went right over my head
thanks though, i will be saving this thread when it gets archived so even if i dont understand something now it will be useful for later

Really? You're this mad and illiterate? Gtfo.

It's not so complex after a while. It's pretty easy, just takes practice. I started learning machine language on a c64. No shortage of bit manipulation and packing techniques on that machine.

yeah, its not that it seems too complex - i just need to learn what everything actually means
it helps to know what you are looking at when trying to understand something, lol

gross 3dpd

>she

Jesus fucking christ you literal faggots glance from the jiggle on the chest to the jiggle between the legs and then kys

She's an actual girl though dude read the pretty ugly little liars thread

As i recall this was common practice back in the 80s.

prettyuglylittleliar.net/topic/10146-jenna-lynn-meowri-cosplayer/?page=1
She's a girl

Now i can fap in peace

Attached: 1409785479427.gif (336x200, 758K)

Would fertilize

Attached: IMG_20180227_215748.thumb.jpg.4f308fd448188f76d9ebf320ce8e2da9.jpg (400x500, 109K)

just fuck some sandwich wrap its the same thing as her

If you managed to impregnate her the baby would come out wrapped in plastic

you're assuming that you're always going to be performing the same operations on them so you can parallelize, that's not always the case

this was a discussion fool
this whole thing has been based around a "depends what you do with them" discussion and how some things can be insanely faster than others while still keeping them packed together

>doesn't know about branchless multi-lane conditional code

keep em coming i'm havin a laugh
we got a plastic girl at work and could use more plastic jokes

Attached: heh.gif (500x375, 239K)

why yall posting 3d girl in a good thread

because the thread served its purpose unless you have another question to ask or something

>one post by someone lamenting how c++ developers tend to have bad habits
>WOW THIS THREAD ALMOST GOT RUINED
Get triggered easily, fag?

nah man i was just having a time lurking n watching yall argue so i bumped it when on page 10

what's the source of the op's pic?

thread over lets talk titties