Question

Why don't we store bits instead of raster images? Then when an image needs to be displayed, it can be generated from the bits that are already stored on that substrate. For instance, if I wanted a 1x1 image of a red square, and that red square was used in another image on my hard drive (as part of a mosaic of red squares of varying shades), why can't that property be reused, saving memory? I understand this would probably take some computing power but I want to know exactly why we don't do it.

Attached: 1547313557809.jpg (1318x912, 1.49M)

the current state of Jow Forums
install gentoo

Attached: foto_no_exif-min.jpg (4096x3072, 2.84M)

Can you explain what's stupid about my question? I just want to know more about how and why information is stored the way it is.

What the actual fuck

duuuuude

Care to explain your idea more? I don't really understand what you mean.

Interesting concept, but it would be way too slow to store an image, because we would have to go through all the images to find matching pixel groups.

Because storing 1s and 0s is the most efficient way to store information

As if all images are part of a big mp4 and you only need to know the frame number?

How would you reference the location of the red image and its shade in less space than 3 0-255 values? I don't really understand what you're even proposing.

Capacity is cheap and your method would be computationally expensive.

It's absolutely retarded. Any sub-image addressing scheme would only expand the final size of images.

To save space we already developed compression algorithms, which do exactly what OP wants, but properly.

That's why I was thinking AI could do it.

I know but can't they be reused to save storage space?

Reusing bits for separate files.

Is that really how compression algorithms work?

Just order them through similarity and make a h256 video from it

That's basically JPEG compression.
Rather than storing the image directly, it stores how individual segments compare to a list of predefined patterns.
Look it up
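A rough sketch of the "predefined patterns" idea, assuming the patterns are the fixed cosine waves of a DCT (the transform JPEG applies to 8x8 blocks). This is a one-dimensional, 8-sample toy rather than a JPEG encoder, and `dct8` is a made-up name for illustration.

```python
import math

N = 8  # JPEG works on 8x8 blocks; this toy uses a single row of 8 samples

def dct8(samples):
    """Orthonormal DCT-II: how strongly each fixed cosine pattern is present."""
    coeffs = []
    for k in range(N):
        total = sum(x * math.cos(math.pi / N * (n + 0.5) * k)
                    for n, x in enumerate(samples))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        coeffs.append(scale * total)
    return coeffs

flat_red_row = [200] * N   # eight identical samples of one color channel
coeffs = dct8(flat_red_row)

# Keep only the patterns that actually contribute to this row.
nonzero = [round(c, 6) for c in coeffs if abs(c) > 1e-6]
print(nonzero)             # a flat row needs only the first (constant) pattern
```

The patterns themselves are never stored in the file, because every decoder already knows them. That is the difference from pointing at data that happens to sit somewhere else on the disk.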

Also, what happens to your photo if the song or whatever is deleted or changed?

No, the patterns I am speaking of are actual information stored on the same substrate as the image being uploaded. I'm talking about images sharing and reusing properties (in this case a single color)

i'll be more specific
in your example we'd have two copies of the same color but used in different images. yes, there are fewer colors used, but the same color is still duplicated, taking up twice as much space. my method would allow for images with richer colors.

>Is that really how compression algorithms work?
There are plenty of compression mechanisms that each work in their own way; even using a static dictionary is better than referencing dynamic data on the storage.

Why? 16 million colors could be stored in 16 megabytes.

/thread

You cannot have more efficient storage of 16 million colors than a sequence of bits with 16 million combinations. The only way to save space in an image is by compressing areas, not individual pixels, see . You wouldn't be able to effectively reference them without losing information.

I recommend you study some basics of data and image compression before you start thinking.

I wasn't trying to put you out of a job, I was just curious.

The question isn't about saving space in an image, it's about how to avoid having to store images entirely. I'm not sure you understand the question. Are you from Netherlands?

Oh you're asking why don't we just associate cached images to every picture file? Because data changes all the time.
Look I get what you're saying and that's vector graphics and it works much faster and more reliably.

github.com/philipl/pifs

Ohhh OK. How does it change? Thanks for trying to understand my misdirected inquiry. Some of the other anons here think I'm trying to show off.

What am I looking at here? I'm scared...

you will need an index of which color to use and where its position is, and the image will end up bigger than if you just saved the repeated colors
or something like that, it is a stupid idea.

Attached: 47501528_940352676161721_1614859561007579136_n.jpg (960x956, 146K)

>if I wanted a 1x1 image of a red square, and that red square was used in another image on my hard drive (as part of a mosaic of red squares of varying shades), why can't that property be reused, saving memory?
If you break this idea down you'll pretty soon see what's wrong with it. In order to be able to retrieve a red pixel of the correct shade, the file needs to contain information stating what shade of red the pixel is... Which could just be used to render the pixel in the first place.

You could get around this by looking for blocks of identical data on the computer when the image is first saved and storing their addresses instead of the colors, but that's very complicated. You need to check all files for these addresses when you delete one of the original files (or keep some kind of massive reverse index), some of the files available in storage may be on network storage and so only available temporarily, etc. It's also unlikely to save space unless you're storing very regular images: most images are very noisy, which makes finding large common blocks between them hard, and the addresses will be much larger than the colors (probably 24 bits for the colors and 64 bits for the addresses, plus some information about the dimensions of the block).
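To put numbers on that last point, here is a small sketch using the figures from the post (24 bits per color, 64 bits per address). The 16-bit width and 16-bit height fields for the block dimensions are an assumption made only for illustration.

```python
BITS_PER_PIXEL = 24       # figure from the post above
ADDRESS_BITS = 64         # figure from the post above
DIMENSION_BITS = 2 * 16   # assumption: 16-bit width + 16-bit height per reference

def raw_bits(pixels):
    """Cost of simply storing the pixels."""
    return pixels * BITS_PER_PIXEL

def reference_bits():
    """Cost of one reference to a block stored elsewhere."""
    return ADDRESS_BITS + DIMENSION_BITS

def break_even_pixels():
    """Smallest block (in pixels) for which a reference is strictly smaller."""
    pixels = 1
    while raw_bits(pixels) <= reference_bits():
        pixels += 1
    return pixels

print(raw_bits(1), reference_bits())   # 24 96
print(break_even_pixels())             # 5
```

Under these assumptions a reference only pays off for an exact match of at least 5 pixels, and that is before counting the reverse index or the time spent searching for the match.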

The one technology that is able to rival Stealth's compression efficiency.

Attached: Stealth.png (756x2038, 58K)

It's a version of the Library Of Babel, where every sequence of words and letters that could ever exist is contained somewhere within the library already.

libraryofbabel.info/search.html

I'll continue my layman's research from here. Thanks for all the help, fellas.

Attached: coop.gif (350x350, 2.92M)

>Why don't we store bits instead of raster images?
Raster images are stored in bits, everything is. What do you mean?

A 1x1 red jpeg looks like this:
d8ff dbff 4300 0600 0504 0506 0604 0506
0706 0607 0a08 0a10 090a 0a09 0e14 0c0f
1710 1814 1718 1614 1a16 251d 1a1f 231b
161c 2016 202c 2623 2927 292a 1f19 302d
282d 2530 2928 ff28 00db 0143 0707 0a07
0a08 0a13 130a 1a28 1a16 2828 2828 2828
2828 2828 2828 2828 2828 2828 2828 2828
0000 0000 0000 0000 0000 0000 0000 0000
2828 2828 2828 2828 2828 2828 c0ff 1100
0008 0001 0301 2201 0200 0111 1103 ff01
00c4 0015 0101 0000 0000 0000 0000 0000
0000 0000 0700 c4ff 1400 0110 0000 0000
0000 0000 0000 0000 0000 0000 c4ff 1500
0101 0001 0000 0000 0000 0000 0000 0000
0600 ff08 00c4 1114 0001 0000 0000 0000
0000 0000 0000 0000 ff00 00da 030c 0001
1102 1103 3f00 9c00 1c00 1fa4 d9ff

If you don't know how to translate hex to bits then google it.
You can run hexdump [filename] on linux to get that
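If you'd rather see the translation than google it, here is a minimal Python sketch. It assumes the dump above uses plain `hexdump`'s default grouping, where each pair of bytes is printed as one little-endian 16-bit word (which is why the file's first bytes `ff d8` show up as `d8ff`). `hexdump_words` and `to_bits` are made-up helper names.

```python
def hexdump_words(data):
    """Print bytes as 16-bit little-endian words, like the dump above."""
    words = []
    for i in range(0, len(data) - 1, 2):
        words.append(f"{data[i + 1]:02x}{data[i]:02x}")
    if len(data) % 2:
        # Odd length: pad the last word with a zero byte (assumption).
        words.append(f"00{data[-1]:02x}")
    return " ".join(words)

def to_bits(data):
    """Spell out every byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)

# First four bytes of a JPEG: start-of-image marker, then a quantization table marker.
jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xDB])
print(hexdump_words(jpeg_start))   # d8ff dbff
print(to_bits(jpeg_start))         # 11111111 11011000 11111111 11011011
```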
>Then when an image needs to be displayed, it can be generated from the bits that are already stored on that substrate. For instance, if I wanted a 1x1 image of a red square, and that red square was used in another image on my hard drive
You can just compress it to 1 color channel and then compress it with xz and it will be smaller than a pointer pointing to that place on the disk
A 1x1 red bitmap would actually be smaller by itself than the pointer.
*a pointer is just an address (aka a bunch of bits) that points to a place on disk or in memory
But pointing to a place in memory instead of loading the exact same thing twice is actually done sometimes; in Linux you can enable this in the kernel config. So if you start the same program twice, Linux just replaces the identical bits in the second one with a pointer. Well it's actually a bit more complicated but that's kinda how it works.
For more info:
kernel.org/doc/html/latest/vm/ksm.html

Attached: mm.jpg (690x862, 229K)

Thanks for dedicating this much effort to your answer. I suppose I underestimated how much space the "pointer" would take up (I swear I did consider it, though).

Well, in some image formats a pixel can be represented by as little as 1 bit, so a 40 x 40 black and white image takes 1600 bits, while a pointer to an image that, let's say, is located at 16GiB on a disk would take 41 bits. Assuming the image isn't just blank, it would take ages just to find 1 pattern that matches and can be pointed to, because in order to save space you would need to point to a pattern 45+ bits long, and even then you would be saving 1-4 bits, because you don't want to scan the entire drive. It would also take longer to access the file because you would have to read the pointer and the actual bits.
In RAM this is a lot different because RAM is fast and you could just start 2 of the same program which will most certainly have some shared read only data.

Attached: fancy image so you read my post.png (1920x938, 159K)

By bits you probably mean small parts of an image, not computer bits. And the answer to that is you will lose more space storing those parts if you want a set that can represent any significant number of images. For text compression this actually works and it's used in some archivers. Filesystem compression can do what you have in mind and more.

>Why don't we store bits instead of raster images? Then when an image needs to be displayed, it can be generated from the bits that are already stored on that substrate.
There are compression algorithms that do that. They use a fractal compression scheme that maps how different parts of an image are similar to each other. They can get very good compression, but are complex as hell to implement and have also been very patent-encumbered in the past.

The pointer is only 64 bits. What you're looking at is the header data and data about the format.

read

What you said made no sense, the access time is going to be the same whether it's 8 bits or 200.

i'm thinking there are multiple pointers, because if you have a png with random noise there's a chance you could go through the entire filesystem and not find a match

?? Let me explain pointers with an analogy. Pointers store an address to a place in memory. Let us say I give you the address to a store where I need you to pick something up for me. You're not going to scan the entire city until you arrive at the address by chance. You're going to drive directly there. It's the same with a pointer on your computer.

If the image is all stored contiguously in the secondary storage, it only needs one pointer to the head of the file. It can scan the remaining data sequentially.

graphically, you don't store the "color red", you just store a value that tells the computer that the color it's displaying is red. If you had a pointer instead, it would just be another identifier for red. If what you mean is reusing data that the developer already knows you store locally, or at least have easy access to, that's called dependencies.

sorry if everyone else on this thread was laughing at you, they're just too high on their ego trip to provide any useful information. for future reference, ask these types of questions on Jow Forumssqt to have a higher chance of finding people who are willing to take you seriously

You still have to somehow store what color to search for. If the image uses a reduced number of colors, you could get away with using a lower number of bits to store each color, otherwise you're pretty much fucked.
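The reduced-colors case is what indexed (palette) images do. A minimal sketch of the arithmetic, ignoring file headers and assuming 24-bit palette entries; the function names are made up for illustration.

```python
def bits_per_pixel(num_colors):
    """Bits needed to index a palette of num_colors entries (at least 1)."""
    return max(1, (num_colors - 1).bit_length())

def indexed_size_bits(width, height, num_colors, bits_per_palette_entry=24):
    """Palette table plus one small index per pixel."""
    palette = num_colors * bits_per_palette_entry
    pixels = width * height * bits_per_pixel(num_colors)
    return palette + pixels

def truecolor_size_bits(width, height):
    """Full 24-bit color stored for every pixel."""
    return width * height * 24

print(indexed_size_bits(40, 40, 2))    # 1648: 48-bit palette + 1600 bits of pixels
print(indexed_size_bits(40, 40, 256))  # 18944
print(truecolor_size_bits(40, 40))     # 38400
```

Note that the palette lives inside the image itself, so nothing breaks when another file on the disk changes.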

you scan the disk to find a matching pattern and you point the part of that file to the pattern
i know how pointers work trust me

OP is basically asking for the visual equivalent of MIDI for audio.

I often think about things like statically linked binaries and non-compressed formats like BMP and think it should be the responsibility of the filesystem to deduplicate data.
While not exactly the same, it's somewhat similar.
Optimally you'd have 1 copy of each pattern, this could be aided by type information to know bounds and sections of file formats that could still potentially cross other formats.
like raw images acting as raw keyframes in a video.

While technically possible, it is likely to be expensive and hard to implement.
Although I've seen people talking about filesystems being more aware of file formats than they are now. To better optimize instead of just relying on generic/arbitrary blocks.
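For reference, the generic block-based approach can be sketched in a few lines. This is a toy in-memory store, not how any particular filesystem implements it; the 4096-byte block size, the `DedupStore` class, and the file names are all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks

class DedupStore:
    def __init__(self):
        self.blocks = {}   # digest -> block bytes; each unique block is kept once
        self.files = {}    # file name -> list of digests, in order

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)   # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())

store = DedupStore()
red = bytes([255, 0, 0, 255]) * 3072     # 12288 bytes of one RGBA color
store.write("a.raw", red)
store.write("b.raw", red + b"tail")
print(len(red) * 2 + 4, "bytes written,", store.stored_bytes(), "bytes stored")
```

A perfectly uniform raw image like this one deduplicates well. Noisy or already compressed data would produce almost no repeated blocks, and every block still costs a 32-byte digest in the index.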

That's the most reddit comment of the day, congrats you fucking retard.

I doubt you're being ironic on purpose, but you probably will say you are.

You realize accessing a file by a pointer has a time complexity of O(1). What you're proposing would have a time complexity of O(n). Linear time on the slow secondary storage of your computer. Why the fuck would anyone do that?

You win the world's most retarded post award.

christianity was just as divided back then as it is now, if not more

what im thinking:
(file to compress split for visibility)
addr FFF000:
00000100100100
00001001000100
00010010000100

(pattern 1)
addr FFBBA0:
0000010010010010101...

(pattern 2)
addr AAAFFF:
00001001000100110101...

(pattern 3)
addr BBABBB:
000100100001001000111...

(final file split for visibility)
(pointer length, 2 hex digits | pointer | number of bits to read, 2 hex digits)
06FFBBA00E
06AAAFFF0E
06BBABBB0E

so you would have to read the pointers and the file which is slower than just the file
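A sketch of how records like `06FFBBA00E` could be read back, assuming the layout is: two hex digits for the pointer length (in hex digits), the pointer itself, then two hex digits for the number of bits to read. The `storage` dict stands in for the disk and only holds the visible part of each pattern from the post; the function names are made up.

```python
def parse_record(record):
    """Split one record into (address, bits_to_read)."""
    pointer_digits = int(record[:2], 16)
    address = int(record[2:2 + pointer_digits], 16)
    bits_to_read = int(record[2 + pointer_digits:], 16)
    return address, bits_to_read

def read_bits(storage, address, count):
    """Follow the pointer and take the first `count` bits found there."""
    return storage[address][:count]

storage = {
    0xFFBBA0: "0000010010010010101",
    0xAAAFFF: "00001001000100110101",
    0xBBABBB: "000100100001001000111",
}
records = ["06FFBBA00E", "06AAAFFF0E", "06BBABBB0E"]

rebuilt = [read_bits(storage, *parse_record(r)) for r in records]
print(rebuilt)
print(len(records[0]) * 4, "bits per record vs", len(rebuilt[0]), "bits of data")
```

With this layout each record takes 40 bits to describe 14 bits of data, so besides the extra reads, the "compressed" file is almost three times the size of the original.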

Because knowing where the 1x1 red pixel you want is stored takes up more space than just making a copy.

If you go look in the infinite digits of Pi, all the images that exist are probably somewhere in there (that's only guaranteed if Pi is a normal number, which is widely believed but unproven), so you could just write down at which digit of Pi your image is instead of storing the image.
Except the position in the digits, well that turns out to be bigger than your image almost always.
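You can try this on a small scale. The sketch below generates the first 1000 digits of Pi with Gibbons' spigot algorithm and looks up where a few digit strings first appear; `position_in_pi` and the sample patterns are made up for illustration. The counting argument behind it: there are 10^n different strings of n digits, so an index that can tell them all apart needs about n digits itself.

```python
from itertools import islice

def pi_digits():
    """Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, 9, ..."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

DIGITS = "".join(str(d) for d in islice(pi_digits(), 1000))

def position_in_pi(pattern):
    """Index of the first occurrence in the first 1000 digits, or -1."""
    return DIGITS.find(pattern)

for pattern in ["5", "59", "592", "0000", "12345"]:
    pos = position_in_pi(pattern)
    print(pattern, "->", pos if pos >= 0 else "not in the first 1000 digits")
```

Short strings turn up early, but as the pattern gets longer the position you have to write down grows about as fast as the pattern, or the pattern is not found at all within the digits you were willing to compute.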

It's the same with deduplicating your hard drive. If you try to deduplicate things that are too small, like individual bits and pixels, you waste WAY more space remembering where everything is.
If you try to deduplicate larger things (this exists, see filesystems like ZFS), then you'll notice that most of the things on your hard drives are not actually duplicate so you waste a lot of computing power and gain little.

>so you would have to read the pointers and the file which is slower than just the file

I don't follow. The OS is going to have a lookup table. Scanning a drive is really slow, plus you would have to constantly load memory and perform a comparison to see if you're at the right file. A pointer is quick and direct. It's O(1), no debate about it.

you have to scan the drive to get the pattern, then you put the pointer to the pattern in the file. when you want to read the file you have to read the pointer from the file and follow it, then read the next pointer, and so on until you run out of pointers to follow. reading and following a bunch of pointers is slower than just having the file

You can't store the pointer pointing to itself in the file. That would defeat the point of a pointer. You store the pointer to the file elsewhere such as a lookup table so the os can easily lookup where the file is.

Storing the pointer telling you where the file is in the file itself is ultimately pointless.

that's not what im saying
you have 2 patterns in a file
you found a pattern in a second file
and another in a third file
you store, in the first file, the pointer to the pattern in the second file and another pointer to the pattern in third file

What if the original file changes

idk this idea is impractical for disks anyway
you need paging for it to work properly
maybe store a checksum?
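A sketch of the checksum idea, assuming each reference stores a SHA-256 digest of the bytes it borrows. `BlockRef`, `make_ref`, and `resolve` are made-up names, and the bytearray stands in for the other file.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockRef:
    offset: int     # where the shared bytes start in the other file
    length: int     # how many bytes to borrow
    digest: bytes   # SHA-256 of those bytes when the reference was made

def make_ref(source, offset, length):
    chunk = source[offset:offset + length]
    return BlockRef(offset, length, hashlib.sha256(chunk).digest())

def resolve(source, ref):
    """Return the borrowed bytes, or fail if they no longer match."""
    chunk = source[ref.offset:ref.offset + ref.length]
    if hashlib.sha256(chunk).digest() != ref.digest:
        raise ValueError("referenced bytes changed; the borrowed data is gone")
    return bytes(chunk)

other_file = bytearray(b"....REDREDRED....")
ref = make_ref(other_file, 4, 9)
print(resolve(other_file, ref))   # b'REDREDRED'

other_file[4:7] = b"BLU"          # the original file gets edited
try:
    resolve(other_file, ref)
except ValueError as err:
    print("detected:", err)
```

The checksum only tells you the data is gone; it cannot bring it back. It also adds 32 bytes to every reference, which makes the space problem worse.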

You could do that but images are highly compressed already. And except for situations exactly like you describe (a monotone red square) sequences of pixels are unlikely to be duplicated in a way that would improve the storage size if you referenced them. For example any real picture would have almost no sections in common with another image.

You want a linked list. That's slow. It's not as bad on an SSD, but it's the same thing that happens when an older computer gets really slow because the data is heavily fragmented; you defragment the hard drive and the performance is significantly better. Sharing redundant data is already done for programs with common dependencies. That's what the linker does.

youtube.com/watch?v=M5c_RFKVkko

This is you OP
youtube.com/watch?v=RXJKdh1KZ0w

1. Storing a reference to a byte is at least as storage-heavy as saving the byte itself.

2. In practice, how often do you think the same large string of bytes (large enough that storing an address becomes efficient) is going to be identically replicated across a drive?

3. A single write operation will now be hundreds, maybe millions of times slower, and require vast amounts of RAM and CPU as it chews through the contents of your drive.

4. Adds a huge set of security weaknesses to the storage