4GB RAM limit in 32-bit programs

I have a niche 32-bit program that does a specific task for me. However, it crashes when it reaches 4GB of RAM use. (I've patched it with LARGEADDRESSAWARE)

The author is long since gone and all I have is the .exe

Is it at all possible to use some kind of trickery and magic to trick it into using more than 4GB of RAM? I've read a little bit about Address Windowing Extensions, but does that require recompiling from source? I'd rather not try to decompile this piece of shit or have to recreate it, as it's an impossible task.

Picture unrelated.

Attached: kmqdoybxauj11.jpg (900x474, 89K)


I know from Fallout 3, a game designed for 32-bit with a 2GB RAM cap, that a program doesn't have to be recompiled to use more RAM. Setting the large-address-aware flag is all that's needed. That said, I imagine some programs would freak out.
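For the record, the LAA patch is nothing magical: IMAGE_FILE_LARGE_ADDRESS_AWARE is just bit 0x0020 in the Characteristics field of the PE's COFF header. A rough Python sketch of what the patcher tools do (real ones also fix up the PE checksum; back up the exe first):

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # COFF Characteristics flag

def set_large_address_aware(data: bytes) -> bytes:
    """Return a copy of a PE image with the LARGEADDRESSAWARE bit set."""
    # e_lfanew at offset 0x3C in the DOS header points to the "PE\0\0" signature
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # Characteristics sits 18 bytes into the COFF file header,
    # which immediately follows the 4-byte signature
    char_off = e_lfanew + 4 + 18
    chars = struct.unpack_from("<H", data, char_off)[0]
    patched = bytearray(data)
    struct.pack_into("<H", patched, char_off,
                     chars | IMAGE_FILE_LARGE_ADDRESS_AWARE)
    return bytes(patched)
```

On a 64-bit OS this lifts a 32-bit process from 2GB to the full 4GB of address space, which is exactly the "doubling" OP saw. It can't go further than 4GB, since that's all a 32-bit pointer can express.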

Kek, no. Why do you think we switched to 64-bit?

you can use fast drive storage and swap

I already tried that, and it doubled the addressable RAM. Now it can load twice as much data before crashing.

It gradually fills up the RAM until it crashes. If I only load data up to about 3.8 GB of RAM use, everything is fine and dandy.

I'm hoping someone has found some kind of magic, like swapping the data out of RAM and feeding it back to the program only when it requests it. I know for a fact that the software doesn't use all the data in RAM all the time; it only uses a little bit of it in a sequential manner when processing my data. But I assume for performance reasons it caches metadata or an index or something like that in RAM.

You're stuck.

This is what happens to most proprietary software: it just gets abandoned. It's a primary driver behind my use of free shit.

Which program is it? Someone might have an alternative somewhere.

I should specify: I let the program ingest data from a hard drive. As it ingests, it slowly fills the RAM (about 100MB per 1GB ingested). If I let it ingest more than 38-39 GB, it runs out of RAM and the process shuts down with an error.
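For what it's worth, those numbers are self-consistent: ~100MB of RAM per 1GB ingested against a ~4GB address-space ceiling gives almost exactly the 38-39GB limit described. Quick sanity check:

```python
# Back-of-the-envelope check using the figures from this thread.
ratio = 100e6 / 1e9           # ~100 MB of RAM per 1 GB ingested = 0.1
address_space = 4e9           # what LARGEADDRESSAWARE buys a 32-bit process

max_ingest = address_space / ratio      # ~40 GB before the crash
dataset = 17e12                         # the full 17 TB data set
ram_for_full_set = dataset * ratio      # ~1.7 TB of RAM needed for one pass
```

So even a 64-bit rebuild would want on the order of 1.7 TB of RAM to hold the whole set's index at once; chunking is unavoidable at some scale either way.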

Oh, then no. Time for new software.

>Someone might have an alternative somewhere.

There is no alternative. It's niche shit with no commercial market. My only options are to either a) try to reverse engineer it or b) recreate it from scratch (royal pain in the ass)

>like say swap out the data in ram and feed it back to the program only when it requests it.
>swap out the data
>swap the data
>swap
>swap
>swap
>swap

What does it do? I'm pretty sure you can come up with two or three utilities that do the same or would help you migrate from it. Also, the fact that there was a developer behind the product means there must have been other people out there using it. What did they do? Of course you CAN migrate, unless your program is just a stupid DOS game and you want to run it on your Windows 10, in which case just gtfo

>32 bit program
Is this the 90s again or something? Jesus, Windows is so prehistoric. Stop using that deprecated piece of shit.

chunk the data then, problem solved, delete your thread

a 32-bit VM with 4GB RAM?

This program is starting to sound really fascinating.
I need to know what this thing is.

It's a piece of software written by some visual computing PhDs that allows image sorting using a variety of filters and operations. I use it to process data sets for machine learning.

There are solutions out there that can do most of what it does, but if I were to try and string together a pipeline myself it would be a mess. The good thing about this particular piece of software is that it does all the operations I need in one go. I can combine them in a very powerful way, using my own intuition and judgement in real time, with the assistance of the various filters, on the entire data set at many different scales. There is nothing like it out there, sadly.

>I've read a little bit about Address Windowing Extensions but does that require recompiling from source
Yes, PAE and Windows' AWE need that.

>I'd rather not try to decompile this piece of shit or have to recreate it, as it's an impossible task.
Try to dump usable data and abandon it, then.

just recompile lmao

Attached: debian.jpg (352x352, 40K)

>Try to dump usable data and abandon it, then.

I can only process chunks of 40GB of data with it. My data set is 17 TB. It would literally take me forever to do it in chunks.

Write a script to chunk it out.
Computers are all about shelling out the busy work for you.

Fair enough. What is stopping you from using other available and modern tools and maybe a scripting language as glue? I'm pretty sure in the worst case scenario it'd still be better than having your program constantly crashing or having to pour resources into rewriting or reverse engineering it.

The problem is that the processing I do in the software takes a lot of time. I compare and contrast the data using various techniques, filter it at various scales, etc. until I get a good result. The process is the same whether I do it on 4GB or 40GB chunks. If I could do it in, say, 1000GB chunks, that would take way less time than doing it in 40GB chunks.

Also, the more data I can work with at once the better results I get because it makes the features I'm looking for stand out clearer.

Tell me what the filters and operations are and I might have something for you.

Are you already doing this on a cluster of machines with Apache Spark or a kubernetes controlled swarm of containers?

You can often just throw more processing power at problems these days.

If it's not possible... you're possibly SOL, redo the essential bits in Apache Spark / OpenCV / TensorFlow / CVTK or whatever else applies.

>What is stopping you from using other available and modern tools and maybe a scripting language as glue? I'm pretty sure in the worst case scenario it'd still be better than having your program constantly crashing or having to pour resources into rewriting or reverse engineering it.

The main issue is that there is no clear-cut way to create an automated pipeline for this stuff. I literally have to knead the data with real-time visual feedback and use human intuition to get the best results.

Unless you want to implement math from research papers into some custom code I don't think you have something for me.

Damn dude, you're pretty fucked.

You can't even get in contact with the original guys who compiled it?
I mean it's last ditch but..

Attached: 1523972393808.png (210x240, 56K)

What you want is essentially impossible since a 32-bit pointer (i.e. the pointers that the program uses) can't address more than 4GB. So you can end up with two pointers with the same value that point to different places. No one will bother with the hacks that you would need for this, when the easier solution is to just recompile with 64-bit.
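To make the aliasing point concrete: any two addresses exactly 2^32 apart look identical once truncated to a 32-bit pointer, so the program would have no way to tell them apart. Toy illustration:

```python
MASK32 = 0xFFFF_FFFF  # a 32-bit pointer can only hold values up to 2**32 - 1

# two distinct 64-bit addresses, exactly 2**32 apart
a = 0x8000_1000
b = a + 2**32

# truncated to 32 bits, they collide
assert a != b
assert (a & MASK32) == (b & MASK32)
```

This is why AWE works by *windowing*: the OS maps different physical pages into the same 32-bit address range on request, but the program itself has to ask for the remapping, which is why it needs source-level changes.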

You should really learn how to write a script to do what you want, it will save you time in the long run and whatever black-box program you're using is effectively doomed anyway.

I've sent the guy an email, he is apparently giving lectures this semester so at least he's alive. I have yet to hear back from him though.

>Unless you want to implement math from research papers into some custom code I don't think you have something for me.
I actually am looking for side projects, since my GitHub repo looks like shit. I've worked with image processing before. I won't promise you anything, but I'm interested in your problem. Hand me the papers.

Did you try contacting his advisor? They might have a copy of the source, too.

>So you can end up with two pointers with the same value that point to different places. No one will bother with the hacks that you would need for this, when the easier solution is to just recompile with 64-bit.

Surely this is an edge case that other people have encountered and perhaps been crazy enough to try and tackle?

In this case I think such a hack would work, as I believe under the hood the software runs the exact same math in a discrete and sequential manner on sequential data. So it should be agnostic to the values stored, if that makes sense at all.

I have not, but he's in his 50s and the program disappeared from the internet back in 2011 or thereabouts. I believe what happened is that someone bought the rights and incorporated it into a cloud solution before going bankrupt.

x86 is a steaming pile of crap.
Peer into the assembly and tremble

Attached: 1280x720-pwI.jpg (1280x720, 140K)

>I actually am looking for side projects, since my GitHub repo looks like shit. I've worked with image processing before. I won't promise you anything, but I'm interested in your problem. Hand me the papers.

Actually it would be much more useful if you could try some kind of hacky magic as mentioned here

If you're saying the program is a fixed pipeline, then why do its memory requirements grow over time? It should be able to reuse the same scratch space.

If it's just leaking memory in a simple enough way, then maybe you could try to plug the leak somehow; then you wouldn't need any address space hacks at all, since it would stay below 4GB. It would be quite a hackjob though.

I'm not sure, though I suspect it does some sort of feature extraction on the data and stores that in RAM alongside a unique identifier for the corresponding image on disk. Then it performs operations on that data in RAM and displays the results on screen. If I export the results of my processing, I believe it reads the unique identifiers and applies the actions to the actual data on disk. This is only speculation, but it makes sense to me: there is no way to store 17TB in RAM, RAM use corresponds linearly to how much of the data set on disk I let the program read, and the program is able to translate what I do in it to the actual data on disk if I choose to.

Hah, I am a CS major. I know assembly and microprocessor architecture well enough to know that I shouldn't stick my nose into obscure x86 assembly.

Oh, if it's holding the data in memory then you're fucked. Realistically though, you shouldn't need to manually tune each batch of data. Can't you just look at a sub-sample of your data, tune based on that, then run with those parameters over your entire set?
If you can't transform/clean your data automatically in a reproducible way, you're gonna have all kinds of issues. If this is research, how would you even document all that shit you're talking about?

I still recommend learning to script whatever you're doing, it sounds like it would help you a lot.

You're fucking retarded

>I know assembly and microprocessor architecture well enough to know that I shouldn't stick my nose into obscure x86 assembly.

You have to be a little bit crazy to change the world. I realize I'm chasing a pipe dream but I'm allowed to dream.

Think of it this way: you have a really messy data set. You can sort it in various ways, filter it in various ways, etc., but there is no one-size-fits-all silver bullet, at least not an obvious one. You need a bird's-eye view of all the data and tools to knead it until patterns emerge. More likely, you siphon off parts of it and rerun the remaining data with the same and new tools and techniques until you manage to isolate more good data, siphon that off too, and so on.

If I work on it in chunks, I'll just have to do the same process but on a smaller scale.

This is work-related, not something that will be published; the only thing that matters is that I get the data I need. I could do it manually on each image, but it's 10 million images, so I need some tools to help me out. Cleaning data sets is just as much an "art" (experience) as it is a science, and this case can't be automated unless I can train a convolutional neural network to do it. But that would need the cleaned data, so..

Run the program under valgrind and check if it has a memory leak. Then fixing that may be easier.

Okay, I think I get it.

Normally my approach would be to make some heuristic rules to filter the junk out, then iterate by tuning them on random subsets of the data and evaluating manually until they're bulletproof and don't bias the results too much.

But it sounds like your problem might be even worse than that, in which case you have my sympathies. Guess you'll have to either write a new version of that program, track down the source code somehow or just do a lot of manual work.
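For anyone curious, the heuristic-rules loop described above looks roughly like this in Python. The metadata keys and thresholds here are made-up placeholders, not anything from OP's actual program:

```python
import random

def passes_filters(meta, min_w=64, min_h=64, max_blur=0.8):
    """Hypothetical per-image rules; keys and thresholds are placeholders to tune."""
    return (meta["width"] >= min_w
            and meta["height"] >= min_h
            and meta["blur_score"] <= max_blur)

def keep_rate_on_subset(metas, frac=0.01, seed=0):
    """Apply the rules to a random subset and report the fraction kept."""
    rng = random.Random(seed)
    subset = rng.sample(metas, max(1, int(len(metas) * frac)))
    return sum(passes_filters(m) for m in subset) / len(subset)
```

The idea is to eyeball the kept/dropped images from each subset, adjust the thresholds, and repeat until the rules generalize, then run them once over the full set.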

It doesn't; memory use scales with the size of the data set and doesn't increase over time.

I see where you're coming from, and this is one of those data sets that you need a human brain to make sense of.

You're probably right. I've tried to get in touch with the author, and I've already started to mentally justify writing a new version of the program.. I suppose I just have to run a few batches and calculate how much time it will take to do it manually.. I'd rather gouge my eyes out with corn holders than spend countless hours doing it manually, but perhaps in the end it'll be faster than trying to write the program from scratch.

Thanks for your thoughts guys, it was worth a shot.

Throwing an idea out there, but can you flush memory while the app is running?

If yes, write a wrapper in C#: check every minute for the exe name and, if it's running, flush its memory.

But probably the only thing you can do is recreate it.

>proprietary software

what language is the program written in? does it have debug symbols by chance? is it ELF, or PE?

You can't. In order to work with 4+ GB of RAM, a program has to use it explicitly, essentially doing the trickery done in the eighties with non-uniform memory architectures.

t. code monkey for the last 12 years.

Disassemble into C. Recompile into 64 bits.

Is this even remotely feasible?

If he's alive and not maintaining the program, can you not pay him to update it? Or at least give you the source code, maybe.

en.wikipedia.org/wiki/Abandonware

I am a diehard freetard simply for this reason.
If I were to latch onto any nonfree software and it died, I'd be right fucked.

Been there, done that. Count me out if they don't give me the source.

Could bank switching work?

I think IDA Pro can disassemble into C, but it's barely better than looking at the raw assembly. I don't know if you could safely recompile it to x64; the disassembled result might have pointers being loaded into ints or other obviously wrong shit.

It could work, but there is a large possibility of things going wrong. If the program does any sort of pointer arithmetic, you are pretty much fucked. Some (non-portable) programs are written with assumptions about the size of certain data types; if you try to recompile for a 64-bit architecture, those assumptions are no longer valid and the program will not function. It's always worth a shot, and I know it's pretty easy to find an IDA Pro torrent.

IDA Pro's decompiled output can't be recompiled; it's sadly not so simple

Well, there you go. Good luck harassing the author for the original source OP.

Is this true? If so, why doesn't it work?

The issue isn't that he doesn't have enough RAM; he doesn't have enough address space.

Because the decompilation is not perfect. Obscure x86 instructions or intentionally obfuscated code sometimes generate wrong output. Sometimes the functions have the wrong arguments too. And it gets even more convoluted if there are shared libraries involved (basically any serious program).

Is it categorically impossible to work or could it theoretically work in some cases if the program was a simple one and with a little bit of luck?

Basically I'm wondering if it's ever worth it to try or if it's always doomed to fail.

I mean, if you hand-fix the C code and fix the dependencies, yes, it is of course possible. But the syntax IDA uses is not compatible with any compiler AFAIK, so at that point you may as well just rewrite the program.
Automatically? It might work with a program that doesn't use any libraries, and even then you'd still have to change some things for it to compile.

Pic related is part (~20 out of 600 lines) of an example I just decompiled (a very simple md5 program, single C file), and even there there were errors:
#error "There were 1 decompilation failure(s) on 14 function(s)"
Try fixing 600 lines of code like that; as you can see, it's easier to rewrite it.

Attached: md5.png (869x582, 33K)

>say swap out the data in ram and feed it back to the program
I think this is technically possible, but holy fuck, the sheer load of assfuckery you'd need to do to get it working