Have we reached the theoretical limits of data compression or is there still room for advancement?

Attached: file-compression-ch.jpg (200x200, 26K)

Other urls found in this thread:

github.com/philipl/pifs
en.wikipedia.org/wiki/Pigeonhole_principle
en.wikipedia.org/wiki/Kolmogorov_complexity

I've heard Pied Piper has made some advancements, but that's practically it.

nice meme

github.com/philipl/pifs

Once we have quantum computers it might become viable

That's silly; storing the index of the file in pi can take as much storage as storing the file itself.
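Rough sketch of why (a toy illustration, assuming the mpmath package is installed; this is not how the actual pifs project works): the offset at which an n-digit string first appears in pi is typically around 10^n, so writing the offset down costs roughly as many bits as the data itself.

# Toy illustration only, not the real pifs scheme; assumes mpmath is installed.
import math
from mpmath import mp

mp.dps = 100_005                       # compute ~100,000 decimal digits of pi
digits = mp.nstr(mp.pi, 100_000)[2:]   # drop the leading "3."

target = "1337"                        # 4 decimal digits of "data"
offset = digits.find(target)

if offset < 0:
    print("not found in the first 100,000 digits")
else:
    offset_bits = math.ceil(math.log2(offset + 1))
    data_bits = math.ceil(len(target) * math.log2(10))
    print(f"offset {offset}: ~{offset_bits} bits to store vs ~{data_bits} bits of data")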

How about storing the SHA-512 hash or something and then recreating the file by generating every possible file until the SHA-512 matches? You might get a few false positives, but eventually you'll also get the correct file, and then you can store which one it was (e.g. 125 collisions before it, so it's the 126th matching file).

The theoretical limit is only reachable asymptotically.
It so happens that the classic LZ algorithms approach the entropy bound asymptotically, so in that sense the problem was solved 40 years ago.
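To make "the bound" concrete: it is the Shannon entropy of the source, which no lossless compressor can beat on average. A quick standard-library sketch (the 16-symbol source is just a made-up example):

# A source emitting one of 16 byte values uniformly at random has an entropy of
# 4 bits per byte, so no lossless compressor can average below n/2 bytes on it.
# DEFLATE (an LZ77 variant plus Huffman coding, via zlib) gets close to that bound.
import random
import zlib

random.seed(0)
n = 1_000_000
data = bytes(random.randrange(16) for _ in range(n))   # entropy: 4 bits/byte

bound = n * 4 // 8                                     # 500,000 bytes
out = len(zlib.compress(data, level=9))

print(f"entropy bound : {bound:,} bytes")
print(f"zlib output   : {out:,} bytes ({out / n:.3f} of the original size)")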

A few collisions? Try infinitely many.
If you understood the pigeonhole principle, you'd know why this is impossible without storing the file itself:
en.wikipedia.org/wiki/Pigeonhole_principle
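The counting argument is tiny (plain arithmetic, nothing else assumed):

# There are 2**n files of exactly n bits, but only 2**n - 1 files that are
# strictly shorter, so any lossless scheme that shrinks some inputs must
# expand (or leave untouched) others.
n = 16
files_of_length_n = 2 ** n
strictly_shorter = sum(2 ** k for k in range(n))   # equals 2**n - 1
print(files_of_length_n, strictly_shorter)         # 65536 vs 65535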

Yes. Nothing will beat stealth.

Don't worry, I've improved it even more: store the file size of the original file, and then you only have to generate the possible combinations of ones and zeroes of that length until you get a matching hash.

You just added a few extra bits to the hash...

There are a lot of advancements being made, for example micro-compression, where the data is compressed so that each bit is only 2 atoms wide.

Yes, but if you have a known file size then you do at least have a finite number of possible combinations.

I've actually given this a lot of thought. If you store the file size, multiple hashes of different kinds (SHA-2, MD5, etc.) and perhaps even the file type, then maybe it would be possible to reduce the number of collisions to a manageable amount.
But this would require unlimited processing power to be feasible anyway.

All forms of compression are a tradeoff between CPU time and storage space. This is just a very extreme form.

It doesn't really matter if compression gets better, because storage space and bandwidth are exploding.

Wasn't there some guy that claimed to be able to compress files by 99%, but it turned out he was just creating shortcuts?

>and then recreate the file by generating every possible file
Do you realize just how rapidly this becomes impossible? Let's say you have a 64-byte file, which is smaller than this post. There are 256^64 = 2^512 possible 64-byte files; at billions of files per second you would need unimaginably more than billions of years to create them all. Now do it with a 400KB image file.
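The arithmetic behind that, for anyone who wants to check it in a REPL:

# 256**64 = 2**512 distinct 64-byte files. Even at a wildly optimistic
# 10**18 candidates per second, enumeration never finishes.
candidates = 256 ** 64                      # about 1.3e154
rate = 10 ** 18                             # candidates per second
seconds_per_year = 60 * 60 * 24 * 365
years = candidates // (rate * seconds_per_year)
print(f"{candidates:.3e} files, roughly {years:.3e} years to enumerate")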

Google had some cool bleeding-edge compression for pictures, which stores only a few details and recreates everything else using AI.
Of course it isn't lossless and won't work on generic files, only on pictures, video, and audio, which constitute 90% of storage space anyway.

>theoretical limits

Well, "in theory" some files should blow up, since the compression has to be an injective mapping, but in practice that has literally never happened to my files, so the current state of the art can't be all too bad. (Although I guess the reason for that is simply that compression algorithms check, if they actually compress what they are trying to compress, and if they do not, then they set a flag to indicate that and use the raw data.)
I suppose there is still stuff that can be done for certain special purpose compression tasks, but it doesn't feel like there is a breakthrough for general purpose compression in the making. If at all, then in terms of speed, but not due to new algorithms for current computers.
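That fallback is easy to see with zlib from the standard library: already-random input can only grow by a small, bounded overhead, because DEFLATE can emit "stored" (raw) blocks, while redundant input shrinks a lot.

# Standard-library sketch of the stored-block fallback.
import os
import zlib

random_data = os.urandom(100_000)
text_data = b"the quick brown fox jumps over the lazy dog\n" * 2_000

for name, data in [("random", random_data), ("text", text_data)]:
    out = zlib.compress(data, level=9)
    print(f"{name:>6}: {len(data):,} -> {len(out):,} bytes")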

>being this impatient
My anime images are worth the wait.

There is always pigzip.

Attached: cartoon310.png (759x280, 30K)

I don't think there will be any big improvement in lossless compression.
In lossy, there will be. Just look at the media formats that appeared semi-recently, like H.265, WebP or WebM. With motion pictures going to absurd sizes for 8K-17K-32K images, there will probably be new codecs.

No. Just no. There is no "quantum" anything; this isn't some poorly understood, near-magic effect of a mythical theoretical particle. This is simply electrons being so small that they move through any material along the path of least resistance, because nothing can exert 100% perfect electrical control over them. It is current leakage, nothing but current leakage, in short-channel devices, and it happens at literally every feature size; it is not exclusive to small FinFET devices like upcoming 5nm EUV FinFETs. Even planar devices have extremely high degrees of leakage through their channels: directly under the gates, electrons still leak out. Yet despite this the transistors still function.

Quantum tunneling is a meme regurgitated by people who know nothing about the field of FETs.

>I don't think there will be any big improvement in lossless compression.
Compared to what we had 10 years ago, WebP and FLIF both provide pretty good lossless compression. And with AVIF on the horizon we might get an image format that even beats FLIF.

amen to that

based

there's just no need

storage is so cheap, bandwidth is so cheap

Attached: 36974486_173620796835367_6330065337924452352_n.jpg (720x635, 45K)

magnet link is a form of file compression

>mobile bandwidth
>cheap

In what world do you live?
Sure, it's not ridiculously expensive anymore, but I still always disable images in browser when not connected to Wi-Fi.

You're insane.

What.

You are mixing up compression and distributed storage.

Creating a UID for a file doesn't make it compressed.

what the fuck?

I have unlimited 4G mobile internet for €25 per month. How cheap are you?

>How cheap are you?
In theory unlimited 4G at €4, but it gets slowed down past one or two GB (not sure, I hardly ever reach the "limit").

just use black holes

Attached: serveimage.jpg (400x400, 16K)

There is still room for advancement.
Theoretically it would be possible for an AI to determine the best way to compress a file, reducing the file size even further than what we currently achieve.

Here's a possibly stupid idea: it may be possible that an AI could analyze a file and determine an algorithm that re-creates that file instead of compressing it. The question is, will the algorithm determined by the AI have the same amount of entropy as the file itself?

Wow racist and sexist much?

The only advancement I really see is a potential speed-up with quantum computers.
Those could attempt many ways of compression in parallel, and then choose the best one.
Doesn't improve the result, but the speed.

I tried this once, storing only the MD5 hash and the file size, and managed to "unpack" a 2-byte text file. I posted about it here too, although obviously in that case the hash was bigger than the original file. The thing is that "collisions are impossibly rare" only holds in practical scenarios. When you're generating every possible file, you are also going to hit every possible collision, so you should expect the number of collisions (and therefore the collision index) to grow with the file size. So it has the same downfall as the pi storage.
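Something like that experiment can be reconstructed in a few lines (a sketch, not the original poster's code; it only works because a 2-byte file has just 65,536 candidates, and the stored 16-byte hash is already far bigger than the file):

# Brute-force "decompression" of a 2-byte file from its MD5 hash and length.
import hashlib
from itertools import product

original = b"hi"
stored_hash = hashlib.md5(original).digest()
stored_size = len(original)

matches = [bytes(c) for c in product(range(256), repeat=stored_size)
           if hashlib.md5(bytes(c)).digest() == stored_hash]
print(matches)   # [b'hi'] -- no collisions at this size, but see the pigeonhole post above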

What?

keked

Attached: deedf5b0.jpg (850x720, 49K)

Compression ratios are likely not going to move a whole lot unless some completely unexpected and novel compression method is discovered. More recent advancements in compression algorithms have been more about getting the smallest size at close to real time speeds.

We already have ways to compress data to very small sizes, but the more something is compressed, the more work is required to uncompress it. If decompression takes so long that it's more practical to keep the file uncompressed, you've only really wasted your own time.

There are hard limits on how much you can compress something, and it gets harder and harder to squeeze out more as the remaining redundancy shrinks.

Aren't they using autoencoders, basically?

AI assisted lossy compression and interpolating decompression

Then just store the index itself in pi too

Cloud compression. Upload the file to the cloud. You now reference the file by its SHA512 hash. To decompress the hash back to the original file, send the hash to the server.

sandy bridge ought to be enough for everybody

You technically can compress data down to the size of a bit-generating algorithm. The time it would take to discover such an algorithm for your data would be like brute-forcing a very long hash.

Store the start and ending position.

That's not compression, that's just a shitty torrent knock off.

The length of the shortest algorithm that produces a file is its Kolmogorov complexity (en.wikipedia.org/wiki/Kolmogorov_complexity). It is also uncomputable, so not even an AI is going to help.

>AI
can we ban these letters?

Attached: 1549606791362.jpg (800x450, 84K)

>Wasn't their some guy that claimed to be able to compress files by 99%

My friend once said that. Turned out he was lying.

>AI compression
I would literally never ever use this. Triggers me immensely knowing I wouldn't be getting the same image back.

(USER WAS BANNED FOR THIS POST)

Attached: 1549678430994.jpg (500x579, 83K)

All data is just a really really long number.
Theoretically, if an AI can find a really short representation of that number, you can have lossless compression with very high compression ratios. You'd just have to find an AI that can do that.
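Some very long numbers really do have tiny descriptions, which is the entire hope here; the catch (by the same pigeonhole counting as earlier in the thread) is that almost all numbers of a given length have no description shorter than just writing them out.

# A 1001-digit number with a description that fits in a handful of characters.
n = 10 ** 1000
print(len(str(n)))   # 1001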

Are we at peak human civilization that has perfected physics, math, and engineering?

Always room for advancement.
Algorithms too memory and processor intensive today could be cheap in the future.

Until someone breaks physics and signal theory, then yes. And I'm not sure I want to be around when that happens.

Possibly. If we fall to socialism.

Who else remembers the 1MB GTA SA KGB meme from a decade ago?

Attached: meme.jpg (480x360, 33K)

You must never use jpg images ever then.

yes, jpegs are turbogay.

It's math. There never was any room for advancement. You can't compress 1 bit of data to less than 1 bit of data.

The only advancement is in lossy compression, which is just figuring out how much data you can throw out without a human being able to tell. For example, for a blind person you can compress any video to 0 bytes. For a colorblind person, you can throw out a range of color data completely.
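Throwing out color detail is literally what mainstream lossy formats already do via chroma subsampling. A minimal sketch of the idea, assuming Pillow is installed and "input.png" is whatever image you have lying around (real JPEG/video encoders do this internally and far more carefully):

# Keep luma at full resolution, store chroma at quarter resolution
# (roughly the idea behind 4:2:0 subsampling).
from PIL import Image

img = Image.open("input.png").convert("YCbCr")
y, cb, cr = img.split()
w, h = img.size

cb_sub = cb.resize((w // 2, h // 2)).resize((w, h))
cr_sub = cr.resize((w // 2, h // 2)).resize((w, h))

Image.merge("YCbCr", (y, cb_sub, cr_sub)).convert("RGB").save("subsampled.png")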

Underrated and best post in thread. Wave functions can violate neither the unitarity principle of quantum mechanics nor the third law of thermodynamics, even in a gravitational singularity. So the information isn't lost in a black hole.

>Jow Forums
>or /sci/
>into science

This thread proves Jow Forums is full of uneducated retards

Your brain can't comprehend just how large the numbers are.
Think about it this way: SHA-512 is considered by basically everyone to be a very safe way to ensure a file is exactly what it should be, right? Yet there are infinitely many possible collisions for each hash, /infinite/. So how can we treat it as pinning down a single file? Because finding a collision, either accidentally or intentionally, is designed to be near impossible; the search space is literally infinite.
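Even restricted to files of one fixed size the counting is hopeless, never mind "infinite":

# There are 2**(8*n) possible n-byte files but only 2**512 SHA-512 digests.
n = 1_000_000                               # 1 MB files
log2_files = 8 * n                          # 8,000,000
log2_digests = 512
log2_files_per_digest = log2_files - log2_digests
print(f"on average 2**{log2_files_per_digest} one-megabyte files share each digest")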

they thought the same when Newton was around

then one asshole came to tell them they were all wrong

Attached: 1538918550368.png (611x611, 350K)

ITT people with room temperature IQs

Retard

For general, lossless compression, yes, probably.
For specific types of data (like images or web traffic) and lossy compression?
Probably not.