Tagging system

So Jow Forums I was thinking about creating a tagging system for my files.

I was thinking about using a SQLite database and bash together (Windows and OSX are not targeted) I simply will have some folder where the script will hard link the required search results.

I'm only uncertain how to implement the tags in SQLite I'm not big on SQL and I hear that joins are the way to go.

Any suggestions Jow Forums ?

Attached: SQLte-600x400.png (382x333, 40K)

Other urls found in this thread:

unix.stackexchange.com/search?q=scripting sql
stackoverflow.com/questions/2885564/ways-to-implement-tags-pros-and-cons-of-each
tmsu.org/
github.com/hydrusnetwork/hydrus
tmsu.org
twitter.com/AnonBabble

use perl

This is not solving my SQL problem.

unix.stackexchange.com/search?q=scripting sql
go nuts noob

Dont forget to enable foreign keys. Fucked my shit up before. You gotta activate it manually otherwise you wont get the "constraint error".
Lower versions dont support that so get the highest one i guess.

Also the sqlite documentation is pretty good.

If you do something like

You'll run into problems. Learn about normalizing databases.

A good book on this is the manga guide to databases, not even memeing , easily pirated. This might be a good case for NoSQL btw.

*would be a hash if you aren't a brainlet.

that's not really sqlite specific

>
You need to explain this to me.

I was thinking on having the script search the database that has the required tags and only output the file names, and then the script grabs the filenames and works with this.

I was thinking in making it all in one table and having columns for tags however SQLite only supports 200 columns in a table max so this is problematic.

I was thinking there are 2 tables one for the tags where the tag name is the primary key and then I join this with another table that has the filenames only.

Are there smarter ways to do this, I don't want to fuck myself over after I have tagged a lot of shit and deal with problems later.

> Are there smarter ways to do this
Judging from your solutione yes: any other solution, even XML based.

seriuosly, you need to study some db principles

I only want to tag files.

>you need to study some db principles
I'm a DB n00b.

Tagging a file only works if the file doesn't change. If I rename "opisafag.jpg" to "opisstillafag.jpg" do the tags stay with it? This problem is the foundation of hydrus, a file tagging system identical to what you propose. Hydrus locks your files into its own little sandbox so you can have tags, but you can't go in and manually edit files. This is different from gwenview, which stores them in the file itself using filesystem magic that Windows can't into but I've never been able to integrate it into a file manager. I can see pictures, edit pictures, play with tags, etc; but I can't drag & drop from it into another program to choose the file.
Anyway, find a way to store the tags in the file itself is your best bet. Do that and you solve the biggest problem and don't need a database at all.

there's nothing wrong being a noob, what's wrong is stop to learn new things.
Try to study db principles, then implement your tagging system. In this way you'll learn:
.DB is probabily not entirely suited for your goal
.At least now you can create normalized dbs and query them. You can use it in your next project

Never will a tag system that aims to store tags in the file itself be universal.

>Tagging a file only works if the file doesn't change. If I rename "opisafag.jpg" to "opisstillafag.jpg"
Umm 1 There is a difference between changing a file name and changing the file the file name is not the file itself.

>If I rename "opisafag.jpg" to "opisstillafag.jpg"
The system is basically a magic folder with some background folders and scripts and a SQLite DB file in it.

You input files into the database and they are moved to the background folder (scripts) you are never expected to change the contents of the background folder, especially the file names.

Think of it like a box where you have pictures and you can pull them out by manipulating scripts.

Its not a global file system tagging solution (I wish someone implemented this) however I did never have the urge to tag everything on my computer its for a specific purpose and only for 1 collection, don't see a need to have a file system wide tagging.

>Hydrus
I tried to use it
1) the documentation is written by someone who is insane, starts rambling about unrelated things and never even attempts to answer the question the chapter is about.
2) The thing can not remember tags I want to insert
3) I have no idea how to make it work, see (1) documentation is insane.
4) I can not use my system tools on the output = shit.

>.DB is probabily not entirely suited for your goal
Why not?
Creating my own text file format for tags is stupid and using SQLite looks perfect.
I don't know if I give you the fandom example of the problem.
Basically its something like this:
>give me:
>Pictures with
>Character A only
>Funny

Or
>give me:
>Pictures with
>Character A , other characters allowed
>Funny

Or
>give me:
>Pictures with
>Character A and B together (must have 2 tags)
>Funny

You need 3 tables. One for tags, other for files and a third table for many to many connections between tags and files. The third table will be the most active and contain only primary keys of both tables.

>third table
Third table you say?
I was under the impression SQLite (I'm only using it so other SQL flavors are irrelevant to the discussion) will store the joins itself somewhere else. Or will it not?
This is why I asked about these join things, I don't know if this is the right way to go.

Do I really need a 3 table?
I was perfectly good with using columns for tags only then I learned of the 200 column limit.
Why can I not join the tags directly to the row in the table contain the filename?
I was thinking like my previous idea only working with the limitations of SQLite.

just use xattrs

Join is an operation, it doesn't store any data in the database. You can however store "Views", a View is a predefined statement that can be invoked similarly to a read-only table.

>I was perfectly good with using columns for tags only then I learned of the 200 column limit.
And that's why you need a third table.
>Why can I not join the tags directly to the row in the table contain the filename?
This method doesn't make sure the tags don't erroneously repeat. You'd either have to add a trigger to check that or have your script make sure that they don't repeat.

use python it is easier than bash

>And that's why you need a third table.
Why do databases always feel like the most primitive and backwards thing imaginable? And that's saying a lot since no programming language has progressed since the 1970s

>This method doesn't make sure the tags don't erroneously repeat.
What?
Explain this, I want to tag multiple things as TAG_Funny.

>sure that they don't repeat.
Why?

Bash is ass however I need to make move operations and uses command line programs like MD5sum.

How easy its it to implement in python? How easy is system scripting in python VS Bash? And access to installed system tools?

Here's how you could do it:
Table 1: (int, string) for tags where the first is the id of the tag, the second is the name of the tag.
Table 2: (int, string) for files where the first is the id of the file, the second is the name of the file (better make sure second column is unique too).
Table 3: (int, int, int) for connections where an entry (k,x,y) means file x has tag y, x being the id of the file, y being the id of the tag. k is just some automatically generated key.

To get all anime pictures for example: make a query that first gets the id of tag "picture" and "anime" in table 1, then look in table 3 for the id's of the files that have both those tags, then look in table 2 to get the names of those files.

It's been a while since I've written SQL so I don't know if this is a good way to do it, but it should work.

>Table 1: (int, string) for tags where the first is the id of the tag, the second is the name of the tag.
Um I was thinking the tag name will be the primary key.
Since I want them to be unique.

>Table 2
I see where you want to do a int however I was thinking of not using any primary keys in table 2 since duplication of a filename is not a problem in a row the worst thing they do is call the pulling operation for the same file 2 times or more. And this results in the same file overwriting itself with itself in the output. Not really a problem.

The real question is "will me ensuring DB cleanliness(your position) create issues if I want to insert a file with the same MD5 name so I chose a dirty solution allowing for file name duplication"?

Or will SQLite not let me do it this way? Because of database reasons?

>You need to explain this to me.
Let's say you have 1 file and you have a bunch of tags for that 1 file based on the contents of the picture. DONT store all those tags in 1 column of that row in 1 table. If you do it becomes denormalized and updating tags becomes harder.

Read manga guide to DB

I think it would be easier to do it in python than bash. it should be shorter and more readable too

you can choose to use libraries or call external programs from your script to accomplish everything you need

if you don't like idea of many-to-many relationship in sql you can also explore nosql databases , basically research python list,dict json and some nosql database it should be more intuitive for newbie

and download PyCharm IDE , even if you use sqlite it has absolutely invaluable tool for working with database

>Um I was thinking the tag name will be the primary key.
>Since I want them to be unique.
you can still make tags unique, that makes sense. i just thought it would be more storage efficient to have an additional int as a key, since table 3 refers to the tags. so instead of putting a string there, you could use the int.
Same for table 2. It's a common approach with databases to start with some tables that store your "primitive" objects, all of them with an additional key. Then you can have arbitrary tables linking these primitive objects together in any way you want. Using the keys for this makes this efficient and it composes nicely.

Whaaat?
Lets use some visual help:
stackoverflow.com/questions/2885564/ways-to-implement-tags-pros-and-cons-of-each
I was planning on 1 table like in the example of
>Way 1: Seriously denormalized (comma delimited)
Only the column name did have the tag name and the value was a bool (or int SQLite has no bools) denoting if the tag is present making searches fast.

Only I looked up the documentation and SQLite has a 200 column limit for 1 table.
(1 million tags or more limit is the ideal way)
So I search for another way.

>updating tags becomes harder.
A you see the system has no capability to update tags its insert and forget.
The only update way would be for me to issue a delete row command and reinsert the file with the new tags I wanted.

Or is this a stupid idea?

tmsu.org/
github.com/hydrusnetwork/hydrus

>i just thought it would be more storage efficient to have an additional int as a key
How important will this be? Its a simple SQLite database to tag my selected files in a folder.
I was more interested in making the code semi readable.

And what about this join and View thing?

>you can still make tags unique, that makes sense
So you want me to create 2 primary keys in 1 table or what?

>github.com/hydrusnetwork/hydrus

See:
>>Hydrus
>I tried to use it
>1) the documentation is written by someone who is insane, starts rambling about unrelated things and never even attempts to answer the question the chapter is about.
>2) The thing can not remember tags I want to insert
>3) I have no idea how to make it work, see (1) documentation is insane.
>4) I can not use my system tools on the output = shit.

Did anyone even ever use Hydrus?

It's current_year, yet hydrus still can't play audio.

I tried once and it looked like pure autism like GIMP

>tmsu.org/
>command-line

I don't know I was thinking to operate my scripts on files and not command line.
Basically I drag picture files to a command folder and the script checks for specific file names and then creates a SQL request to the database and the output is grabbed by another script to create a move command from the backstage folder to copy into the output.

The benefit of this is that I have auto completion and can not mistype a tag name.
Oh and pictures I see the pictures for icons.

>It's current_year, yet hydrus still can't play audio.
I mean its a joke who even wants to play audio etc in the shitty build in player? Or video? Or pictures? Why is this shit not integrating with the OS?

feasibly, you could just use three tables, one for files, one for tags, third for mapping tags to files

I did see this solution a lot my only question is:
1) Will this not rot to hell and back if I ever delete a row from the file name table? Since the intermediate table will have a bunch of pointers to a row that does not exist? How bad can this get over time?

2) What about the joins and views ? Is there a way to make the DB automatically handle it without me needing to create a middleman table? I did see people say things about views.

>How important will this be?
i haven't run any benchmarks, but it's probably only a noticeable difference if you have thousands of files.

>And what about this join and View thing
I've never liked views as a concept, and I don't think you need them here.
Join is an operation used in the query to 'join' together two tables. For example if you have table 1 for tags and table 3 for connections as I suggested, you could select a tag from table 1 and join that with table 3 to get all file id's that have this tag. then you can join that result with table 2 to get the file names.

Now that I think about it. To get all files that have exactly 2 or more given tags is probably more complicated with this approach. You'd probably have to select the file id grouped by the tag id having count = 2 or something after joining.

>So you want me to create 2 primary keys in 1 table or what?
you can make a column unique without making it a primary key.
create table A (x int primary key, y int unique);
// x is the primary key, but both fields have to be unique
// so inserting (1,1) and (2,1) won't work

if these are setup with proper foreign key constraints you can just cascade delete anything that uses that particular file ID in the third table, even without it it's straightforward

something like DELETE FROM file_tag_mappings WHERE file_id = ? etc

you're overthinking how difficult this is

>Did anyone even ever use Hydrus?
I do. It has a learning curve, the help is 97% useless and the data is internally stored in really weird ways. But, it's the closest thing to an actual working cross-platform local booru that is actively being developed. I initially didn't get it at all and wanted to write my own alternative, but abandoned that idea because writing from scratch for all scenarios would take more time than digging around trying to figure out how it works.

>who even wants to play audio etc in the shitty build in player?
Task switching every time you want to see the contents of an item in annoying.

If I ever write my system it will be simply superior to Hydrus in every way since the system has no idea what file types are (unless you decide to make it a tag; hell you can even make bash auto check and tag it for you *.PNG) and it simply pulls files from the backstage to the results folder based on the tags your requested.

I can view all results and use my system tools on it, you can tag anything in there as long its 1 single file.
If this is not KISS I have no idea what is.

I've been using Hydrus for 4 years without any issues.

>tmsu.org

I came here to post that... what OP wants is the kind of work you can brainstorm in a couple of days but can take literally years to perfect and tune... I would simply use an already developed solution or learn from it

I mean why to reinvent the wheel?

>picture unrelated

Attached: 1515886100118.jpg (900x1200, 234K)

>I've been using Hydrus for 4 years
For what exactly? How can you create your own tags and make them stick to files? The documentation is insane there was something about downloading tagged files from some server?

>I mean why to reinvent the wheel?
Multiple benefits.

Learning about databases for one thing. Reinventing the wheel is always beneficial.

Because as i suspected, with your query needs you'll end up with just one table one table
(name, taglist+)
or you could try to make it more complicated with at least 3 tables [tag], [file], [tag-file]. then
> queries will become more complex
> you have to manually insert new entry for new file
the last one will probably make your db inconsistent

What could be more useful is a userspace fs that support tags and query for them