Database appreciation thread

Relational databases are awesome. Fuck the haters. SQL may read like COBOL but when you write a query and get back a whole fucking report about your data it's like magic --- even when you know how it works under the hood. Graph databases are cool, too.

Attached: database.jpg (300x318, 11K)

Okay, grandpa

SELECT * FROM thread WHERE op != 'homo';
>Rows returned 0

>sql
slow.

Anyone who uses nosql shit like MongoDB is basically a webdev fag brainlet hipster who literally likes sucking dicks. Postgresql makes MongoDB obsolete

True. MongoDB has similar storage caps to Postgres.
However NoSQL stuff like Hive and Spark could smoke both on a properly configured Hadoop cluster.

They've fixed up MongoDB enough that you can use it when you need horizontal scaling of non-relational data. It even has ACID transactions now, lol. What you shouldn't do is have MongoDB as your go-to database. _That_ is the retarded hipster thing. Thankfully the trend seems to have died down.

>Slow query language
>Good


It's good for some types of data, but we've long passed the age where you could just store EVERYTHING in a relational database.
Sad, but true.

The fuck do you do with databases? Also, how can a query _language_ be slow?

Not the query language itself. After all, what matters is how the DBMS understands your query and how it actually uses it to fetch the data.

But relational databases have obvious limitations. At some point, even sharding and partitioning won't help against those.
The big companies didn't invent graph databases and time series databases because they couldn't into SQL.

Any good alternatives to SQL?
I got a database with around 200 million entries; currently it's just in sqlite

Attached: 1444611285940.jpg (1024x768, 49K)

A friend of mine recommended Django, but looks like it's still just SQL
docs.djangoproject.com/en/2.1/ref/databases/

I'd love to build a simple relational db for keeping track of users and computers, but it has to be easy to use and require little to no coding to set up and update, since I'd need to share it with others. We're currently using a spreadsheet and it's grown super messy over the last 10 years.
Is there a product or program that can do this?

What is your environment? You could do it with access, but you probably don't want to. You could add these tables into something like DDB on AWS with very minor "coding" as well as long as you plan the tables. There are many options.

It's not a SQL problem, it's a sqlite problem. Install a real database. Either Chad Postgres or Virgin MySQL.

You could also look into something like sqlite, depending on how big your use case is, but be careful about underestimating it, as that will eventually come back to bite you when you have to re-engineer your decision.

The problem is more with existing relational databases than the model itself. There are limits to what you can do with Postgres and MariaDB but look at something like MemSQL and CockroachDB, which were built for distributed access from the start, and you will see a very different picture.

Graph databases predate relational databases. The relational paper Codd wrote at IBM was a reaction to the hierarchical and the network model (a limited tree and graph database model respectively). I agree that time series databases have a good reason to exist, but I think developers underestimate relational databases. They tend to think that relational databases today are as good as they can be and their development is pretty much finished when this is not the case.

Is it slow? SQLite can be fast enough even with database files measuring in dozens of gigabytes when you have the right indexes. If you can't get SQLite to do what you want, migrate to PostgreSQL.

Yeah, would prefer to avoid Access.
Environment is a utility agency with 300+ computers and ~250 users. I'd have two main users and would like to have the ability to give read access to others. Local hosting if web based.

If a simple front end exists that can use SQL for this type of thing, that's cool too.

alternativeto.net/software/microsoft-access/

If you aren't growing you could do this with sqlite and sqlitebrowser maybe.

You could use mysql workbench with a read-only user in the database, or add individual users with read-only grants.
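On the MySQL route that's just a user with a SELECT-only grant. If the sqlite option suggested above gets picked instead, read-only access can be enforced at the connection level. A rough sketch (file, table, and hostnames are made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "inventory.db")

# The two main users get the normal read-write connection.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE computers (id INTEGER PRIMARY KEY, hostname TEXT)")
rw.execute("INSERT INTO computers (hostname) VALUES ('ws-001')")
rw.commit()

# Everyone else gets a read-only handle: SELECTs work, writes are refused.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rows = ro.execute("SELECT hostname FROM computers").fetchall()
print(rows)  # [('ws-001',)]

try:
    ro.execute("INSERT INTO computers (hostname) VALUES ('ws-002')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
print("write rejected:", write_rejected)  # write rejected: True
```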

Thinking bigger, would active directory or samba be applicable here to maintain IT inventory and users/orgs? Then you can grow into groups, permissions, kerb/auth, etc.

Damn, I forgot about that site. I use it all the time to avoid proprietary shit for my personal use but never considered it for this use case.

AD integration would be awesome. I didn't bring it up because I assumed that wouldn't exist in the simpler end of things.

A single AD for a small environment is as simple as running dcpromo on Windows server, with GUI snap-ins to manage users, and permissions to handle who can view what. It was sort of made for this scenario. You can do the same with samba4+ today if you'd like as well, but it seems a little more complicated in general.

You could always re-implement in the DB methods I brought up earlier though.

Looking at the alternatives to Access, VFront, Symphytum and nuBuilder look pretty good. Kexi had some hype around it, but I think it's still immature.

We have an existing AD with all of those things in place already. I meant that I didn't think smaller database management software would leverage it.

Appreciated, thanks user.

NoSQL is great and all, but you *know* when you need it. If you think you need it, you don't.

Fucking "we're setting up a Hadoop cluster because we have millions of rows" tards are the programmer equivalent of kids building secret pillow fort lairs.

>because we have millions of rows
LOL, too true. There are definitely people who actually think that a few million rows == big data.

>If you think you need it, you don't.
/based/

>I don't know if I need it or not
>suddenly, startup explodes, now we're seeing outages because I chose the virgin RDB

If your startup explodes and yet somehow you don't know what the shit you are doing, what you do is hire somebody competent. In most cases, though, you are probably not dumping photos into your RDBMS or anything outrageous like that. It means that you can scale your database server by just getting a bigger box.

>just re-engineer everything later when you are in an outage my dude

Even without binary blobs in your DB like images, you can hit their limits. Getting a larger box isn't the solution to all problems.

With RDBs, you can't scale writes horizontally, and if your databases get so large that you need to keep upping your hosts, your data is going to be a mess too; you probably should have broken things up into multiple services, each with its own RDB, before that point.

>>suddenly, startup explodes, now we're seeing outages because I chose the virgin RDB.

If your startup explodes, you divert your ETL to something like BigQuery and off-load when you get your Hadoop cluster online.
There's no reason to spend money on a cluster for a "boom" that may never happen.

You can pay for Spanner, too.

Sure, but investing a small amount of time in nosql is not hard and pays dividends later, you just own a bit more of the query than you would in sql.

1. The vast majority of startups don't fail because they can't keep up with demand.
2. A lot of them scale just fine with SQL.
3. "Owning" queries is detrimental to your ability to iterate fast.
4. Startups that need to store a lot of data per user can usually tell in advance.
I would like to hear how you would design the database setup for a random B2B startup, though.

Startups who think they might need to scale quickly usually use a DaaS anyway.
It's easier to pay a monthly bill than to pay a shit-ton of cash for infrastructure up front.

Exactly.

1. Just because most startups fail, doesn't give you a free pass to do it incorrectly.
3. Not really, your data shouldn't change that much, no matter what datastore you choose, so you abstract it with a library that you and other teams can use.
4. Flickr started as an online game company, Facebook was just that small social page, Amazon started as a book store. Each one of these companies went through insane growing pains. I'm not saying every startup is going to reach that level of growth, but they have all been burned hard by RDB.

1. No, it doesn't, but it means you should optimize your resource use correctly and not overinvest in infrastructure.
3. "Your data shouldn't change that much [so build your own query layer over a NoSQL database]" is your advice to startups?
4. I don't know about Amazon, but Flickr and Facebook are success stories of scaling with freaking MySQL.
Tell us what you would use, already. Give some detail.

1. NoSQL is not a huge overinvestment, you are making it bigger than it is. I'm saying it's a relatively small amount of work for big future gains.
3. I'm saying realistically your services conform to your data. This isn't advice, it is truth.
4. Facebook pushed MySQL way past its boundaries, and has now moved on to Hive and Cassandra. Your average startup won't have Facebook-tier engineers to fix something as complicated as database internals.

3. It will suck when you have to pivot.
4. But you will have the engineering talent to maintain what will grow into an ad-hoc, informally-specified, bug-ridden, slow implementation of half of PostgreSQL?
I'll stop responding to your posts because you don't want to get to the particulars and describe your preferred NoSQL setup out of fear of backlash.

NoSQL is retarded for most use-cases.
>be me
>at previous job
>enterprise architect decides entire company should move to cassandra because MySQL just doesn't scale
>MySQL database is only like TB
>Peak load is only like 10k queries per second
>Enterprise architect orders huge cassandra ring be built
>Starts moving applications over to cassandra
>Busy season hits, new products using cassandra come online
>Crashes and immediately goes offline under the slightest load
>meanwhile, over on my team...
>The MySQL server is at 90% CPU because the connection pooling is turned off and nobody understands what they're doing
>Turn connection pooling back on, set a reasonable number of connections
>CPU usage on MySQL database immediately drops to 10%
>Query response times immediately drop by a factor of 10
>Application suddenly scales way past our current needs and then some
SQL is fine. It's the brainlets who don't know how to use it that are the problem.

>pajeet can't use sql correctly
>"sql slow"

>connection pooling is turned off
Weird. Did someone deliberately turn it off? I think nowadays it's generally the default.

From everything I could tell, someone turned it off to debug something and then accidentally committed it, and the brainlets on that team at the time (who eventually moved on to become the "SQL can't scale, we need Cassandra" crowd) never noticed, and never questioned it when the MySQL server kept hitting its connection limits. They just kept upping the limit. They had it set to something retarded like 20,000 connections. Once we turned the pooling back on I think we backed it down to something like 10 connections per application server, which ended up being like 200 connections total, and suddenly everything started flying...until we uncovered the next retarded thing our predecessors had done. Rinse and repeat a few cycles and we had the amount of hardware cut back to like 1/3 the original setup.

Basically, I think people are just retarded and don't know what they're talking about, so they blame the tech rather than their own stupidity. I honestly can't understand how people look at things like the NASDAQ or Wikipedia running on SQL and say "nah, that just won't do for our workload".
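For anyone wondering what the pooling actually buys you: a pool keeps a small fixed set of connections alive and makes callers share them instead of each opening their own. A toy sketch of the mechanism (illustrative only; a real app would use its driver's or framework's pool, and the size here is invented):

```python
import queue
import sqlite3

class ConnectionPool:
    """Toy bounded pool: at most `size` connections ever exist, and
    callers block until one is free instead of opening more."""

    def __init__(self, dsn, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(sqlite3.connect(dsn, check_same_thread=False))

    def acquire(self):
        # Blocks when all connections are checked out -- this is the
        # back-pressure an unpooled app simply doesn't have.
        return self._idle.get()

    def release(self, con):
        self._idle.put(con)

pool = ConnectionPool(":memory:", size=10)
con = pool.acquire()
result = con.execute("SELECT 1").fetchone()[0]
pool.release(con)
print(result)  # 1
```

The cap is the whole point: the database server sees a predictable handful of connections instead of 20,000.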

>someone turned it off to debug something and then accidentally committed it
>They just kept upping the limit. They had it set to something retarded like 20,000 connections.
That's a pretty funny story. If that is how people are, no wonder database consulting is so lucrative.

>!=
Spotted the nosql retard.

Attached: 16e.jpg (903x960, 52K)

Relational databases have no limitations other than the CAP theorem, which affects all distributed systems. All of them. The reason why engines like postgres scale "poorly" (if you're running into this problem you're either a fortune 500 or you're an idiot) is because they take C and P extremely goddamn seriously. There is a price to pay for that, and that's poor horizontal scalability for a write-heavy load. To get crazy write performance like Google needs, you have to be willing to let go of absolute consistency. That's what they do (they also shard like an absolute motherfucker). There is no magic or free lunch.

200 million for reasonably sized records is no problem for a good engine. Put your shit in postgres. I've put billions of rows in postgres (a few TB db size with 32GB memory on some shitty Xeon) with little fanfare.

CockroachDB also sits on the CP side of CAP. It distributes extremely well and is very durable as well as consistent, but under a write-heavy load transaction performance will be slow and under bad network conditions writes will fail. This is fundamental.

My data is too big for SQL.

Attached: 1514483951148.jpg (700x700, 110K)

MariaDB is better

At what level of writes/sec does CockroachDB choke, assuming arbitrary horizontal scaling availability and no network drops? I'm considering it for a new project.

i have a real hard time thinking of a use case where nosql is a good alternative.

if you are going to use the data, you may as well convert it into something you can actually use.

I refuse to use Postgres because I don't know how to pronounce it.

post
gress

Use that SQL that has the seriously Christian CoC - the one that has homosexuals all upset

Here's some SQL for you all to criticize.

pastebin.com/raw/4d3yxCH1

It's not actually suitable for anything but small single user tasks.

They lost their CoC when Mozilla allegedly threatened to leave the SQLite Foundation.
It's suitable for large single-user tasks, too, and for multiuser read-only tasks. Just look at datasette.readthedocs.io/en/stable/.

>It's suitable for large single-user tasks, too, and for multiuser read-only tasks.
Right, but still, most use cases for SQL databases, with servers and a ton of users, are not suitable for SQLite.

it's valid sql tho, faggot

>tfw found a query that can pull in 4k records in a category-tree configuration (single table) in under 0.3s on a typical mysql server
>sql is sloooooow ;_;
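Anon doesn't share the query, but a single-table category tree like that is typically an adjacency list walked with a recursive CTE. A sketch of what it might look like (the schema and names are made up); shown through SQLite here, but the same WITH RECURSIVE syntax works in MySQL 8+:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES categories(id),
    name TEXT)""")
con.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
    (1, None, "Electronics"),
    (2, 1,    "Computers"),
    (3, 2,    "Laptops"),
    (4, 1,    "Phones"),
    (5, None, "Books"),
])

# Pull the whole subtree under 'Electronics' in a single round trip.
rows = con.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM categories WHERE id = 1
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM categories c JOIN subtree s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
""").fetchall()
print(rows)  # [('Electronics', 0), ('Computers', 1), ('Phones', 1), ('Laptops', 2)]
```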

>It's the brainlets who don't know how to use it that are the problem.
basically this

sure m8

>it's valid sql tho, faggot
it's not ANSI compliant

>ANSI compliant
lol

My data is too big to fail

Is storing JSON inside a database a bad idea?

Attached: 1526601958944.png (1440x1557, 821K)

It depends on your situation. Ask yourself, can you normalize the data? Will it be updated? Is it for prototyping?

If it were, json wouldn't be in Postgres. Relational fields (i.e. identifiers linking to other places in the database) should definitely be outside the json, in normal columns, but for everything else it should generally be no problem to use json.
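A minimal sketch of that split, using SQLite and plain JSON text for brevity (Postgres would use a jsonb column; the schema here is invented): keys that link rows together stay in real columns, the free-form payload goes in the JSON.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,   -- relational: keep it a real column
    payload TEXT NOT NULL)      -- everything else: JSON text""")

payload = {"kind": "click", "x": 14, "y": 88, "tags": ["ui", "beta"]}
con.execute("INSERT INTO events (user_id, payload) VALUES (?, ?)",
            (42, json.dumps(payload)))

# You can still filter and join on the real columns as usual...
raw = con.execute(
    "SELECT payload FROM events WHERE user_id = ?", (42,)).fetchone()[0]
# ...and decode the flexible part in the application.
print(json.loads(raw)["kind"])  # click
```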

I've got a question. I'm using mariadb for a little project of mine. Let's say my server gets hacked and someone has access to the db files; is the data readable? If yes, what would be the best approach to protect the data?

What do you mean? If he has the database files, of course he has your data; he can set up his own mysql database using those files and run any queries on it. Don't get hacked to protect your files.
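You can see how little "hacking" is left once the files are out, using SQLite as a stand-in (the same idea applies to the files under /var/lib/mysql; the paths and data here are made up): stealing the database is just copying a file and opening the copy, no credentials involved.

```python
import os
import shutil
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
victim = os.path.join(tmp, "app.db")

con = sqlite3.connect(victim)
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
con.commit()
con.close()

# "Stealing" the database is just a file copy.
stolen = os.path.join(tmp, "stolen.db")
shutil.copy(victim, stolen)

# The attacker opens the copy with any client and reads everything.
loot = sqlite3.connect(stolen).execute("SELECT email FROM users").fetchall()
print(loot)  # [('alice@example.com',)]
```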

>Don't get hacked to protect your files.
Of course, but I'm just thinking worst case. Shouldn't I use the encryption feature of MariaDB?

If that makes you sleep better...
If the database is able to launch and run automatically then the attacker can steal the key from it.

>is the data readable?
Yes. You can verify it yourself. Just copy /var/lib/mysql to another instance.
>If yes, what would be the best approach to protect the data?
Basically, try to not get hacked. But if you are serious about security, you can do some things to reduce the potential damage. For one, you can run the database on a separate server from your applications and limit what tables each application is allowed to access and what queries it is allowed to run. It will at least prevent script kiddies from exfiltrating all of your data using automatic tools. PostgreSQL has a lot of functionality for access control; not sure about MariaDB.

I see, another question. I'm still relatively new to this and learning. I still don't understand what this whole docker thing is about. My project runs on spring boot and mariadb and I have set up my servers manually. Do people use docker to actually deploy their projects?

I've never used docker myself, but I think, yeah, they use docker in production.

>MySQL
Might as well MariaDB then.

>Do people use docker to actually deploy their projects?
I do. I am even running databases in Docker. Aside from the performance hit they run just fine.

A good way to get started with automated deployment if you are new and don't have a lot of deployment experience is through Dokku. It works like your own tiny Heroku.
github.com/dokku/dokku

It depends on your network latency. Nodes must reach a quorum before a transaction can be considered committed.

So it's pretty much limited to one datacenter if you don't want to wait forever?

Enjoy your vendor lock.

Attached: 15171782264724.png (600x360, 151K)

Depends on your requirements. Cockroachdb is designed for use cases where it absolutely positively needs to be consistent and durable, potentially at the expense of performance. If potentially waiting a few hundred milliseconds for a transaction is okay under your workload then go for it. Heavy distributed writing is not the expected use case. Read scaling should be close to linear though.

If you want close to linear distributed write performance you'll need an engine that's designed for delayed durability, which means you'll need to accept that sometimes you'll get a write conflict.

Sounds like it would be a good fit for a distributed imageboard.

Can postgres store petabytes and trillions of rows?

Yeah probably. Put the DB on ZFS or Gluster and I don't see why not.

What's so good about them? I know nothing, but it seems interesting

Anyone else uses commercial systems? Oracle, SQL Server?

I use both. They work fine and are plenty fast. SQL Server has SSMS which is seriously nice, and licensing isn't absurd. Oracle costs as much as Oracle can bleed from you though. Businesses get locked into it and eventually spend literally hundreds of thousands a year on licensing.

ms access > *

They both suck crusty butthole and their developers should be dropkicked into an erupting volcano. Postgres is better for every conceivable workload that isn't hard coded to use Oracle or MSSQL.

I am using Oracle DBs running on Exadata and SAP HANA.

AMA

Not if you have fewer than 1000 records and don't care about aggregate data and joins

I’m a relatively new financial analyst. Getting lots of data from different databases, using SQL in many applications. Pretty comfy so far.

What can I learn to really set myself apart from my colleagues?

VB / VBA.
Anyone can run it without additional software, it is quite user-friendly and can be included in any file. Also it is easy to learn and has many practical use cases.

Then go from VBA to Python, if needed. Programming & databases in combination with solid finance skills are a bit rare in most countries.

>Oracle -and- SAP
How often do you contemplate suicide?

Python or Ruby.

I work for a Healthcare company with an Oracle DB that is over 100TB in size and there are many other organizations just like ours in Healthcare. As long as you don't do dumb shit like querying an entire table that has 50m+ rows, you're not going to have bad performance when indexes are set up correctly.

Thanks. I tried doing some VB thing in excel copied off the internet. Didn’t work. Been busy and gave up after that. Any suggestions on where to start? I know a lot of what I do can be shortened and automated with macros and VB. Just need to start somewhere.

Oraclenigger here, the suffering is real. It makes me feel sick that this is what we're making our clients use.

I've got an SQL table with sequential ID numbers and a field with a string. Let's say I have a specific ID, and I need a query that gets the result with the next lowest ID number but where the string also matches something.

So in my table I've got:
ID: 1, String: "String1"
ID: 2, String: "String2"
ID: 3, String: "String3"
etc.

Let's say I know that 3 exists. Now I want to get the most recent one (highest number) lower than 3 (which would of course be 2), but I need the next lowest where String = "String1", so in this case the query should return the row with ID 1. Any way I can do this somehow with a query?

Attached: sql.webm (576x720, 834K)
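One standard way to do this: filter on both conditions and take the top row ordered by ID descending. A sketch using the example data from the post (run through SQLite here, but the query itself is plain SQL and the table name `t` is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, string TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, "String1"), (2, "String2"), (3, "String3")])

# Highest id strictly below the known one, restricted to matching strings.
row = con.execute(
    "SELECT id, string FROM t"
    " WHERE id < ? AND string = ?"
    " ORDER BY id DESC LIMIT 1",
    (3, "String1")).fetchone()
print(row)  # (1, 'String1')
```

With an index on (string, id) this stays fast even on large tables.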