Programming Challenge - Binary Parser: BONUS

Original thread here: Some anons wanted me to make another thread for this, so here it is.
In the first challenge you made a program to translate binary digit character
encoded text into raw binary data; now you will make a program to translate
raw binary data into binary digit character encoded text.

Some advice: for this program the bit order of your processor is important.
If you use the bit-wise shift instructions, the same shift instruction
will not produce the same output for both endians, so you must design your
program according to your processor's architecture. In addition, make your
program able to process the input as either big endian or little endian,
depending on the options specified by the user.

Here is the binary text from the last thread, use your program to turn it into
raw data and then use your new program to translate it back into text:

1000011
01101111
01101110
01100111
01110010
01100001
01110100
01110101
01101100
01100001
01110100
01101001
01101111
01101110
01110011
00101100
00100000
01111001
01101111
01110101
00100000
01101000
01100001
01110110
01100101
00100000
01110011
01101111
01101100
01110110
01100101
01100100
00100000
01110100
01101000
01100101
00100000
01100011
01101000
01100001
01101100
01101100
01100101
01101110
01100111
01100101
00101110
00001010

Attached: hacker.png (1527x1017, 2.87M)

Other urls found in this thread:

pastebin.com/Cf16Z4BS
pastebin.com/CQjFVa5p
pastebin.com/7hnxQvHn
godbolt.org/z/u9SB_t
pastebin.com/GDgvqEpG
pastebin.com/XNfMXL3c
commandcenter.blogspot.com/2012/04/byte-order-fallacy.html

Attached: challenge_bonus.png (745x860, 57K)

All desktop CPUs have the same endianness

oh I didn't know that

Attached: 852.jpg (213x237, 9K)

is that first line missing a digit?

>hacker.png
That's from Artillery by SHAPE.

nonetheless the endianness of your CPU does matter for the bit-wise shift operation:
a = 0b10000000; // decimal 128, little endian
a = a << 1;

he's hacking the kernel bro

not planning on running Jow Forums games on my solaris server

Shucks! I was going to use my 2004 PowerPC mac mini.

homos not allowed in here sorry

yes here is the complete line

01100001

Endianness has nothing to do with bit order within a byte; the most significant bit is always first.

This is so completely wrong I can't even divine where exactly you went wrong.

B T F O
T
F
O

I should clarify that if you left shift a uint8_t sized type set to 128, it will actually overflow and end up as 0, regardless of your system's endianness
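
quick demo if anyone doubts it (the only assumption is that you store the result back into the uint8_t, which forces the truncation):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint8_t a = 128;             // 0b10000000
    a = a << 1;                       // promoted to int (256), truncated to 0 on store
    std::printf("%u\n", (unsigned)a); // prints 0 on big- and little-endian alike
    return 0;
}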

are you sure that's the correct one?
>aongratulations

01000011

I was the pext guy from the last thread

Here's my fastest implementation at the moment. Probably some room for improvement when it comes to reading a byte from stdin
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <immintrin.h>
// compile with -O2 -mbmi2
int main()
{
    std::uint8_t CurByte;
    while( std::fread(&CurByte,1,1,stdin) == 1 )
    {
        const std::uint64_t Bin = __builtin_bswap64(
            _pdep_u64(
                CurByte,
                0x0101010101010101
            )
        ) | 0x3030303030303030;

        std::fwrite(
            &Bin,
            1,
            8,
            stdout
        );
    }

    return EXIT_SUCCESS;
}

./a.out

Attached: chrome_2018-10-01_23-24-43.png (182x230, 96K)

This is literally a single line of code in python.

HOW WILL 4GB MALLOC ASSEMBLY GUY EVER COMPETE

Plagiarism

but can you do it fast.

#part1
File.open("bin_out", 'w') do |file|
  file.write(
    File.open(ARGV[0]).each_line.map do |line|
      line.to_i(2)
    end.pack('C*'))
end

#part2
File.open("encode_out", "w") do |file|
  file.write(
    File.read(ARGV[0]).bytes.map do |c|
      c.to_s(2).rjust(8, '0') + "\n"
    end.join())
end

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    fileName := os.Args[1]
    file, err := os.Open(fileName)
    if err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(1)
    }
    defer file.Close()
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()
    // '0' is 0x30, '1' is 0x31; bit positions run 7 6 5 4 3 2 1 0
    lineScanner := bufio.NewScanner(file)
    for lineScanner.Scan() {
        uint8Slice := []byte(lineScanner.Text())
        var buffer int
        for i, j := range uint8Slice {
            if j == '\x31' {
                buffer += 1 << uint(len(uint8Slice)-1-i)
            }
        }
        out.WriteByte(byte(buffer))
    }
}

bump

turns out if I move the OR to happen before the bswap I can get a movbe in there for the final write and get a little more speed
updated a bit
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <immintrin.h>
// compile with -O2 -mbmi2 -mmovbe
int main() {
    std::uint8_t CurByte;
    while (std::fread(&CurByte, 1, 1, stdin) == 1) {
        const std::uint64_t Bin = __builtin_bswap64(
            _pdep_u64(CurByte, 0x0101010101010101) | 0x3030303030303030);
        std::fwrite(&Bin, 1, 8, stdout);
    }

    return EXIT_SUCCESS;
}


now let's see one of you do it faster

Attached: 2018-10-07_22-37-23.gif (82x78, 117K)

Post instructions for how to compile

>Not recognizing the Anal AIDS cancer that is Go

I'm not into effeminate meme languages

Neither am I, but you should be able to spot go at a glance.

Nope

>imports declaration
>function declaration
>sanity checks
It's glaring.

It's not pretty, but it works, and it's fast. Put OP's string in a file called bin.txt
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>

int bin2int(const char *);
size_t powll(int, int);

int main(void)
{
    /* Open bin.txt */
    int file = open("./bin.txt", O_RDONLY, S_IRUSR | S_IWUSR);
    if (file == -1)
    {
        perror("unable to open bin.txt");
        exit(EXIT_FAILURE);
    }

    /* Get size of file */
    struct stat sb;
    if (fstat(file, &sb) == -1)
    {
        perror("unable to get file size");
        exit(EXIT_FAILURE);
    }

    /* Map bin.txt into memory */
    char *ptr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, file, 0);

    /* Process mapped file */
    while (*ptr)
    {
        while (*++ptr != '\n');

        printf("%c", bin2int(ptr++ - 1));
    }
}

/* Converts binary string into an ascii char */
int bin2int(const char *str)
{
    int ret = 0;
    int pow = 0;
    for (int i = 6; i >= 0; i--)
        ret += (*str-- - 48) * powll(2, pow++);

    return ret;
}

/* Returns the given power of the given base */
size_t powll(int base, int pow)
{
    size_t ret = 1;
    for (int i = 0; i < pow; i++)
        ret *= base;

    return ret;
}

If you're going to go with intrinsics, you could fill one vector with '0' characters and one with '1' characters and then use the SSE blend operation with the data itself as the mask to produce the character string.
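
roughly what that would look like, I think: a sketch that does two input bytes per iteration, assuming SSSE3 for the byte broadcast and SSE4.1 for pblendvb. You need a pcmpeqb to widen each bit into a full lane mask first. Untested, compile with -msse4.1; a trailing odd byte is ignored.

#include <cstdint>
#include <cstdio>
#include <immintrin.h>

int main() {
    const __m128i Zeros = _mm_set1_epi8('0');
    const __m128i Ones  = _mm_set1_epi8('1');
    // broadcast input byte 0 into lanes 0-7 and byte 1 into lanes 8-15
    const __m128i Broadcast = _mm_set_epi8(1,1,1,1,1,1,1,1, 0,0,0,0,0,0,0,0);
    // one bit per lane, MSB first within each group of 8 (lane 0 gets bit 7)
    const __m128i Bits = _mm_set_epi8(1,2,4,8,16,32,64,-128,
                                      1,2,4,8,16,32,64,-128);
    unsigned char In[2];
    char Out[16];
    while (std::fread(In, 1, 2, stdin) == 2) {
        __m128i v = _mm_cvtsi32_si128(In[0] | (In[1] << 8));
        v = _mm_shuffle_epi8(v, Broadcast);  // replicate each input byte 8x
        // 0xFF in every lane whose bit is set, 0x00 elsewhere
        const __m128i Mask = _mm_cmpeq_epi8(_mm_and_si128(v, Bits), Bits);
        _mm_storeu_si128((__m128i *)Out, _mm_blendv_epi8(Zeros, Ones, Mask));
        std::fwrite(Out, 1, 16, stdout);
    }
    return 0;
}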

it still wouldn't be as fast as a single pdep though i believe in terms of instructions/cycles per byte.

use this to create one gigabyte of random binary data for benchmarks
$ dd if=/dev/urandom of=randomdata.bin bs=64M count=16


depending on how you read your input, be it stdin or loading a set filename, here's how to bench. If your program is verified to make the conversion correctly, then you can just pipe your stdout to /dev/null during benchmarks. And of course make sure you post what cpu you're running on.
$ time ./(your program) >/dev/null


getting a segfault on yours despite having a 1-gigabyte bin.txt
$ strace ./67962081.out
...(snip)...
mmap(NULL, 1848896, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fd7c5c8b000
mprotect(0x7fd7c5cad000, 1671168, PROT_NONE) = 0
mmap(0x7fd7c5cad000, 1355776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7fd7c5cad000
mmap(0x7fd7c5df8000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16d000) = 0x7fd7c5df8000
mmap(0x7fd7c5e45000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b9000) = 0x7fd7c5e45000
mmap(0x7fd7c5e4b000, 13888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fd7c5e4b000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7fd7c5e50500) = 0
mprotect(0x7fd7c5e45000, 16384, PROT_READ) = 0
mprotect(0x55cd88b64000, 4096, PROT_READ) = 0
mprotect(0x7fd7c5ea1000, 4096, PROT_READ) = 0
munmap(0x7fd7c5e51000, 157006) = 0
openat(AT_FDCWD, "./bin.txt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741792, ...}) = 0
mmap(NULL, 1073741792, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd785c8b000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fd785c8afff} ---
+++ killed by SIGSEGV (core dumped) +++
[1] 15524 segmentation fault (core dumped) strace ./67962081.out

Attached: 1381338443354.jpg (800x3624, 243K)

>getting a segfault on yours despite having a 1-gigabyte bin.txt
>getting a segfault on yours despite having a 1-gigabyte bin.txt
Yeah, that's because I didn't write it to handle raw binary. It's meant to handle binary strings, each of which is a single byte, each on its own line; it fits the parameters of OP's challenge. BTW raw binary would already be in ASCII if it was less than 128. Wtf is wrong with you?

>raw binary
I meant bytes.

if you read the op and other people's source files you'd realize we're doing the reverse.

as in converting some byte "A" to "01000001"

Oh gotcha. Easy fix.

Famous last words...

I didn't change the comments, but w/e.

Whoops I copied the wrong file.
pastebin.com/Cf16Z4BS

I don't have time to test it now, but here's an improvement.
pastebin.com/CQjFVa5p

pastebin.com/7hnxQvHn

35 seconds to run on:
ryzen 1400
8gb ram single channel
7200 rpm hard drive
borrowing the vector instructions from the cpu guy

to compile use gcc -O2 -mbmi2 -march=native main.c
to run use time ./a.out > /dev/null

Attached: 1672495217927.png (1371x1060, 646K)

It's ruby, so install ruby and put
#!/usr/bin/env ruby
at the top or run it with
ruby filename.rb
they're separate programs; each takes a single argument as a file path and outputs a new file

var{stdin,stdout:o}=process;require("readline").createInterface({input:stdin,output:o,terminal:!1}).on("line",l=>o.write(String.fromCharCode(parseInt(l,2))))

Bump

What's up Jow Forums? No late night hackers at this time of the day?

Okay, this took me too long to make. Sorry for shitty code:
#include <stdio.h>

int main(int argc, char **argv) {
    char fname[256];

    fprintf(stderr, "Enter a filename: ");
    fgets(fname, 256, stdin);
    unsigned int i;
    for (i = 0; fname[i] != '\n' && fname[i] != '\0'; i++) {}
    fname[i] = '\0';

    FILE* f = fopen((const char*) fname, "r");
    if (f == NULL) {
        fprintf(stderr, "Unable to open file '%s'!\n", fname);
        return 1;
    }

    char idx = 0x7;
    unsigned char c = 0x00;
    int buf = fgetc(f);
    while (buf != EOF) {
        buf -= 0x30;
        if (buf >= 0 && buf <= 1 && idx >= 0x0) {
            c |= ((unsigned char) buf) << idx; /* place this digit's bit */
            idx--;
        }
        if (idx < 0x0) { /* full byte collected, emit and reset */
            putchar(c);
            c = 0x00;
            idx = 0x7;
        }
        buf = fgetc(f);
    }
    fclose(f);

    return 0;
}

Ok, what is this supposed to do? Also, why are you reading from '/mnt/c/programming/randomdata.bin'
I don't have that file in my system, what is it supposed to be?

W8 b8 m8
I r8 8/8

Yeah, you don't have it because you need to install a random number generator to get randomdata duh

i'm not very good at haskell but writing this was a lot easier than writing C++
module Main where
import Data.Char

fromBinary :: String -> Int
fromBinary str = sum $ zipWith toDec (reverse str) [0 .. length str]
  where toDec a b = digitToInt a * (2 ^ b)

binStrToChar :: String -> String
binStrToChar str = (chr $ (fromBinary str)) : []

main = interact ((foldl (++) "") . (fmap binStrToChar) . lines)

go build progchallenge.go
./progchallenge samplebytes.txt

Shouldn't the reflection in his glasses be reversed?

you were literally right

all you did was copy the other guys thing and made it stupider by allocating a fuck ton of memory like an idiot.

Attached: tumblr_mcska6AEVo1qb9x1g.jpg (500x464, 39K)

#include <stdio.h>

int main() {
    while (!feof(stdin) && !ferror(stdin)) {
        char line[100], *c = line, byte = 0;
        if (!fgets(line, 100, stdin))
            break;

        do {
            byte = (byte << 1) | (*c - '0'); /* shift in one digit at a time */
        } while (*++c == '0' || *c == '1');

        putchar(byte);
    }
}

you run it like this:
./program < input_file

>buffer = (char*) malloc(1010000000);
>out_buffer = (uint64_t*)malloc(1010000000);

no way am i running this shit

Can anyone explain why this runs so fast?

he just copied the actual speedup from

and put some massive fucking mallocs around it for some fucking dumb reason.

it's the pdep instruction combined with movbe that makes it fast
godbolt.org/z/u9SB_t
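
if anyone wants the data flow spelled out, here's one input byte worked by hand, 'C' = 0x43 = 0b01000011:

// _pdep_u64(0x43, 0x0101010101010101)   deposits bit i of the input byte
//                                       into bit 0 of byte i:
//   -> 0x0001000000000101
// | 0x3030303030303030                  turns each 0/1 byte into ASCII '0'/'1':
//   -> 0x3031303030303131
// __builtin_bswap64(...)                reverses the bytes so the digit for
//                                       the most significant bit comes first:
//   -> 0x3131303030303130
// fwrite on a little-endian machine emits the low byte first:
//   0x30 0x31 0x30 0x30 0x30 0x30 0x31 0x31  ->  "01000011"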

i get 23.4 seconds with (i had to edit the path to the file) and 35.2 seconds with >67962659 on an i3-6100 with 8gb of ram on a SATA SSD following this benchmark

i had to edit the path to the test file for the other guy's thing i mean. since he had it pointing to /mnt/c/programming/randomdata.bin

If you ever had to make a legitimate tool for users to convert a stream of bytes to binary ascii, would you be going

>//>inb4 muh RAM
>//it's faster this way

because you require allocating over 2 gigs of ram for it, which is as fucking stupid as it gets. At the very least use memory mapped IO if you're reading from a hard coded file. Everyone else here is at least reading from stdin or using argv

He loads the entire file into memory

Probably the most coherent solution here

Bump

So did I with mmap. It's that pdep instruction that makes it so fast.

Last thread this guy used the pext instruction to convert from ascii-binary back to ascii and it was also the fastest. The only thing really holding it back was reading the data in.
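
for reference, the pext direction looks roughly like this (my reconstruction from memory, not his exact code; assumes 8 digits per line plus a newline, compile with -O2 -mbmi2):

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <immintrin.h>

int main() {
    std::uint64_t Chars;
    while (std::fread(&Chars, 1, 8, stdin) == 8) {
        // bswap puts the first (most significant) digit in the top byte,
        // then pext gathers bit 0 of every byte into one output byte
        const std::uint8_t Byte = (std::uint8_t)_pext_u64(
            __builtin_bswap64(Chars), 0x0101010101010101);
        std::fwrite(&Byte, 1, 1, stdout);
        std::fgetc(stdin); // swallow the newline after each 8-digit line
    }
    return EXIT_SUCCESS;
}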

I don't use sepples.

Everyone that cares about performance uses C and C++. Hence why they're probably the only high level languages that expose a way to use those intrinsics directly.

I just like using C. I don't like sepples, not to set off any autistic screeching.

That's a respectable preference. It's fine.

it's a window dipshit

What

was responding to this guy obviously

>obviously

You're pretty weird, even for a chinese image site.

obviously this won't be faster than intrinsics but I was curious about the performance of lookup tables for such a relatively simple calculation
pastebin.com/GDgvqEpG

this is compared to the "naive" approach:
uint64_t s = 0;
for (int i = 0; i < 8; i++) {
    s = s << 8;          /* make room for the next digit character */
    s |= 0x30 + (c & 1);
    c = c >> 1;
}
printf("%.*s", 8, (char *) &s);


surprised at the results - the LUT version is nearly twice as fast
user$ time ./normal_bin < randomdata.bin > /dev/null

real 0m10.130s
user 0m10.042s
sys 0m0.056s
user$ time ./lut_bin < randomdata.bin > /dev/null

real 0m5.632s
user 0m5.599s
sys 0m0.024s


more universal, although much of the performance is due to being 64-bit and able to fit the whole string in a register
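
the LUT trick boils down to this, by the way (a sketch of the idea, not the pastebin's exact code): build all 256 eight-character expansions once, then every input byte is a single table load.

#include <cstdint>
#include <cstdio>

static std::uint64_t Table[256];

int main() {
    // byte (7 - bit) of each entry holds '0' or '1' for that bit, so the
    // digit for the most significant bit sits in the low byte and gets
    // written first on a little-endian machine
    for (int b = 0; b < 256; b++)
        for (int bit = 0; bit < 8; bit++)
            Table[b] |= (std::uint64_t)('0' + ((b >> bit) & 1)) << (8 * (7 - bit));

    int c;
    while ((c = std::getchar()) != EOF)
        std::fwrite(&Table[c], 1, 8, stdout);
    return 0;
}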

NEW VERSION HERE: pastebin.com/XNfMXL3c

>you were literally right
He was right because he was me, baka. I realized the pext guy was right and that the bit manipulation instructions were the optimal solution for the conversion itself.

>all you did was copy the other guys thing and made it stupider by allocating a fuck ton of memory like an idiot.
I took it because the user was right: using the bit manipulation extensions really is the most efficient way to do it. There's really no other way to reach as much speed, all other things equal.
But mine is faster because it avoids excessively random IO and too many context switches (syscalls).
But you were right about one thing: using that much memory is not only unnecessary, it makes the program run slower.
Now with that in mind, and not outputting newlines to equalize things (his doesn't), mine runs about 270% faster than his original solution, consuming about 300 KB of RAM.
Also
>tumblr

Attached: Capture.png (1920x1050, 69K)

Depends on the design goals, but for bulk processing, actually, yeah, I would do it this way (just with smaller block size, like my new solution I posted above, and adding more error checking).
In the last thread, the pext guy tried memory mapped IO and it was slower than reading 8 bytes at a time for his solution. Apparently mmIO is optimized for random access rather than sequential.
That Linus' quote about someone retroactively aborting himself actually refers to this:
>Of course, I'd also suggest that whoever was the genius who thought it was a good idea to read things ONE F*CKING BYTE AT A TIME with system calls for each byte should be retroactively aborted. Who the f*ck does idiotic things like that? How did they not die as babies, considering that they were likely too stupid to find a tit to suck on?

That's not what mmap does. It doesn't load the file in memory, it just means that you can access it as if the file was in memory, and the OS automatically translates those RAM accesses to disk IO.
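
and if the complaint is that mmap readahead is tuned for random access, you can hint the kernel. Minimal sketch, Linux madvise, most error handling omitted:

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("randomdata.bin", O_RDONLY);
    if (fd == -1) { std::perror("open"); return 1; }
    struct stat sb;
    if (fstat(fd, &sb) == -1) { std::perror("fstat"); return 1; }
    char *p = (char *)mmap(nullptr, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
    madvise(p, sb.st_size, MADV_SEQUENTIAL); // hint: sequential read-ahead
    // ... process p[0 .. sb.st_size - 1] ...
    munmap(p, sb.st_size);
    close(fd);
    return 0;
}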

You can use the bit manipulation instructions from plain C too, that's what I did.

With an SSD you'll probably see less improvements than me, because you aren't benefiting as much from the more sequential reading.

Attached: 1483948617693.png (646x431, 447K)

just so you know, pdep is excruciatingly slow on ryzen. all you're doing is masking the fact that the instruction is slow on your system by programming like a fucking java memory hog.

Attached: chrome_2018-10-08_10-34-02.png (1725x762, 198K)

Just so you know, pdep and pext are intel instructions.

With what goal? Just making it work, or being optimized? If optimized, size or speed?

In the previous thread I still saw pretty big improvements over the 'naive' (ie bit shifting) approach.
>java memory hog
It consumes 300KB now, fuck off.

Nah, they're licensed and implemented by Ayymd too.

>Nah, they're licensed and implemented by Ayymd too.
But they're obviously not optimized for, or native to, the AMD ISA.

There's no 'native' AMD ISA. They just borrow everything from Intel these days.
Sure, maybe they're not as fast as Intel's, but they're still the fastest option.

>i get 23.4 seconds with (i had to edit the path to the file) and 35.2 seconds with >67962659 on an i3-6100 with 8gb of ram on a SATA SSD following this benchmark
Remember that my first solution did a final processing pass to add newlines, which the pext guy's program did not.

pext/pdep guy here, I'm at work right now but I'll improve my solution later on today to beat yours. i got a few ideas

Oh. Feels gud being NEET

I assume we'll just agree to not use multithreading, or is that allowed? What about GPU acceleration?

Maybe with those constraints it could still be improved using a non-blocking alternative for fread

If we do "must read from stdin" then that idiot malloc guy will probably have a hard time.

One big fat fread at the beginning would still be the fastest way though

So? Is MT allowed or not? Vote guys

can still be hugely accelerated with MT. plus I have free time to make hand-optimized assembly

no mt

trips decide no MT

Attached: 1475179818035.jpg (700x512, 24K)

Actually no, for some reason the optimal amount is around 20 KB at a time.

Maybe because it bypasses memory and goes straight into cache.
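
for the curious, the chunked loop is just this, with the same pdep kernel as above (a sketch with a 16 KiB input buffer; compile with -O2 -mbmi2):

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <immintrin.h>

int main() {
    static std::uint8_t In[16384];   // small enough to stay cache-resident
    static std::uint64_t Out[16384]; // expanded output, written once per chunk
    std::size_t Count;
    while ((Count = std::fread(In, 1, sizeof(In), stdin)) > 0) {
        for (std::size_t i = 0; i < Count; ++i)
            Out[i] = __builtin_bswap64(
                _pdep_u64(In[i], 0x0101010101010101) | 0x3030303030303030);
        std::fwrite(Out, 8, Count, stdout);
    }
    return EXIT_SUCCESS;
}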

bump

commandcenter.blogspot.com/2012/04/byte-order-fallacy.html