Big CSV files

how big can a CSV get before it becomes wildly unwieldy?

Attached: CSV_File.jpg (824x643, 124K)

6 big

if it's over 200gb

Depends on how you plan to utilize and process it.
If you plan to use Excel or similar, then it depends on your computer's specs.
If you process it in a streaming fashion with shell utilities, there isn't really any limit to the size.

Stream eh?

Pardon my ignorance, but how would that work? Can you make a program only target specific chunks of the file at a time?

Spin up an Oracle XE or DB2-C instance and load that bigass CSV file into a table.

A well-formatted file with 10 million entries beats a badly formatted one with 1000.

If you use Excel it's pretty limited: the hard cap is 1,048,576 rows, and it gets unusably slow well before that. Computer specs won't help.

what is swap space

Slow as balls?

I've done 37 million before and it was painful. At that point just make a sqlite loader if you need something light to work with.
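
A "sqlite loader" really is just a few lines. A rough Python sketch, assuming a data.csv with a header row (the file and table names here are made up):

import csv, sqlite3
conn = sqlite3.connect("data.db")              # hypothetical output database
with open("data.csv", newline="") as f:        # hypothetical input file
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    # SQLite doesn't require column types, so the header alone is enough for a schema
    conn.execute(f"CREATE TABLE IF NOT EXISTS rows ({cols})")
    # executemany consumes the csv reader lazily, so the whole file never sits in memory
    conn.executemany(f"INSERT INTO rows VALUES ({marks})", reader)
conn.commit()

After that it's plain SQL: sqlite3 data.db 'SELECT COUNT(*) FROM rows;'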

Shell utilities typically work line by line, only holding a small chunk of the file in memory at any given time. For CSV, check out utilities like "xsv". For JSON, check out "jq". For XML, check out "xq". There are many more. I handle CSVs hundreds of gigabytes in size at work with these and it's fine. It also makes most people at work think you're a wizard, since they can't see more than 1% of the file with their crappy Excel.
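
The same line-by-line trick in plain Python, for anyone who can't install xsv at work. Just a sketch, assuming a big.csv with a "price" column (all names made up):

import csv
# only one row is ever held in memory, so the size of the file doesn't matter
with open("big.csv", newline="") as src, open("filtered.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if float(row["price"]) > 100:   # whatever condition you actually need
            writer.writerow(row)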

In other words, properly-formatted (well-formed) CSV is unwieldy only if it doesn't fit on your disk (like anything else would be). Otherwise it's perfectly manageable.

If you're using a well-written library to parse or save it, you probably won't run into problems with any data set size you're likely to encounter.

Don't try to parse or save CSV yourself though, you'll just fuck it up.

There are XML files in the medical insurance industry that are terabytes in size. It's not about size. It's about how you do it and how much computing power you have.

Good luck swapping 1TB
>inb4 he doesn't have terabytes of data

How big is big? I'm taking an ml course and the training data is a 100MB csv file. Pandas handles it fine.

10 lines.
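
100MB is small for pandas, and even when the file stops fitting in RAM you can keep going in chunks. A sketch (the file and column names are made up):

import pandas as pd
total = 0
# chunksize makes read_csv return an iterator of DataFrames instead of loading everything at once
for chunk in pd.read_csv("train.csv", chunksize=100_000):
    total += chunk["label"].sum()
print(total)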

but you can convert CSV to XLS (with LibreOffice or any other tool, or even a quick Python script), and then Excel can chew through it much more easily (up to its row limit, anyway).
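
The quick Python script version, if you'd rather not click through LibreOffice. A sketch using pandas (filenames are made up, and .xlsx caps out at 1,048,576 rows):

import pandas as pd
df = pd.read_csv("machine_output.csv")
df.to_excel("machine_output.xlsx", index=False)   # to_excel needs openpyxl installed for .xlsx output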

I have a machine that spits a big CSV at me. LibreOffice opens it properly (no auto cell-type-guessing BS), I save it as XLS, and view it in Excel, because LibreOffice Calc is super slow even with a few thousand lines: searching and browsing can literally take seconds, while in Excel everything is instant.

kinda clunky, kinda weird, but it works.

you can always use SQLite or MySQL if you have to store a gazillion lines

File system limits. There are programs made to handle large files, so the software side isn't the issue.

In stats class, one of the main CSV files we used was about 9.5GB; the professor called it a small population set. A lot of the little ones he called test data. Could just be the professor, idk.

yea, "big file handling" is actually a feature; you can find text editors that do it.
basically what you have to do (if you code) is not read the entire file into memory, just the portion you need.

for example if you are looking for something, you only need one line. or a few. not the entire 10GB.

or if you are fucking lazy, you pay $5-10 on Amazon, rent a machine with a gazillion GB of RAM, and process the data there.
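
For the non-lazy route, grabbing just a slice of a huge file without holding the rest in memory looks like this. A sketch; the path and line numbers are arbitrary:

from itertools import islice
with open("huge.csv") as f:
    # islice walks the file iterator lazily: earlier lines are read off disk and thrown
    # away, so only one line is ever held in memory
    for line in islice(f, 1_000_000, 1_000_050):
        print(line, end="")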

You must level your wizardry to 30 and learn the art called "sed".
Once fully mastered, it's a fucking cheat.
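
And if the sed spellbook is out of reach, the same trick (stream in, substitute, stream out) is a few lines of Python. A sketch; the semicolon-to-comma substitution is just an example:

import re, sys
# roughly `sed 's/;/,/g'`: substitute one line at a time, stdin to stdout
for line in sys.stdin:
    sys.stdout.write(re.sub(r";", ",", line))

Run it like: python fixup.py < in.csv > out.csv (fixup.py being whatever you name it).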

And what alternative do you think you have?

SQL? CSV with compression, a la TempleOS? Or a binary file that you compress? The possibilities are endless, really.

How would you display a 10gb text file on a machine with only 2gb of ram?

This. You can fully normalise a good CSV file into a full relational database with a few lines of code.
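
The "few lines of code" part, roughly. A sketch that splits a made-up orders.csv (columns: customer, amount) into two SQLite tables so repeated customer names become foreign keys:

import csv, sqlite3
conn = sqlite3.connect("shop.db")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")
with open("orders.csv", newline="") as f:
    for row in csv.DictReader(f):
        # dedupe customers on the fly, then point each order at the customer id
        conn.execute("INSERT OR IGNORE INTO customers (name) VALUES (?)", (row["customer"],))
        cust = conn.execute("SELECT id FROM customers WHERE name = ?", (row["customer"],)).fetchone()[0]
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (cust, float(row["amount"])))
conn.commit()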

Just some advice: if you happen to work with Indians, use the 1252 code page. I used Unicode since we have fields with Unicode characters, but their "in house system" doesn't have a library for processing Unicode. Also, Unicode takes more space.
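
In Python that's just the encoding argument when you write the file. A sketch; cp1252 will raise on any character it can't represent, which at least fails loudly:

import csv
with open("export.csv", "w", newline="", encoding="cp1252") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerow([1, "Müller"])   # fine in cp1252; something like "北京" would raise UnicodeEncodeError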

Then again, what's the point? If you can get a fully normalised database from a CSV file and a script, why not just use a database from the start? I'd say SQL SELECT and INSERT statements are much easier and shorter than stream readers and writers.

Over 4GB, because then you can't put it on FAT32.

you can compress CSV as well
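
And you can read it compressed without ever expanding the whole thing on disk. A sketch, the filename is made up:

import csv, gzip
# gzip.open in text mode decompresses on the fly, line by line
with gzip.open("big.csv.gz", "rt", newline="") as f:
    for row in csv.reader(f):
        pass  # process each row here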

I wonder if netstrings would be smaller than CSV with a bunch of escaping. A binary variant of netstrings would be even more compact.
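
For reference, a netstring is just the decimal length, a colon, the raw bytes, and a trailing comma, so the overhead is easy to eyeball against CSV quoting. A tiny sketch:

def netstring(value: bytes) -> bytes:
    # netstring format: <decimal length>:<raw bytes>,
    return str(len(value)).encode() + b":" + value + b","
fields = [b'hello', b'she said "hi, there"']
print(b"".join(netstring(f) for f in fields))
# prints b'5:hello,20:she said "hi, there",'
# the CSV equivalent needs quoting plus doubled inner quotes for the second field:
# hello,"she said ""hi, there"""

Whether that actually comes out smaller depends on how much of your data needs escaping in the first place.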