I have a firefox bookmarks html file. The schema:
<A HREF="$url" ADD_DATE="...">$bookmark_name</A>
I want to parse it and get lines like : $url $bookmark_name
How do I do that using gnu/linux tools? Thanks in advance!
plz help guys
try asking in the mozilla forums
nah, I am asking your mom out tonight already.
regexr.com
html is not regular
xmllint
Could you be more specific? Do you just want the variables?
With awk.
This
>Firefox
Those are not variables in the usual sense, I just showed the pattern. I want the value in HREF="value" and the link text between the tags.
And how would one do it? I am in a hurry, so if you could just help me out with this that'd be so fucking great.
sed, awk, tr, cut, idk. Do some fucking research, you lazy fuck. Also, anyone using linux rn without knowing how to use its coreutils should uninstall linux immediately and install wangblows or osx.
use cut, looks like = can be used as your delimiter
But urls or data containing an equals sign could fuck you, so maybe =" would be the safer place to cut
Why limit yourself? Use a parsing package designed for the job of walking through xml, or write a script in a common Linux language like python3 that uses its native parsing library. Cut, regex and the like are awful hack jobs full of corner cases.
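A minimal sketch of that suggestion, assuming the export uses the usual `<A HREF="...">name</A>` anchors. It uses `html.parser` from the Python standard library rather than an XML parser, since bookmark exports are generally not well-formed XML. The names `BookmarkParser`, `extract_bookmarks` and the `SAMPLE` string are made up for illustration.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, name) pairs from <A HREF="...">name</A> anchors."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []   # finished (url, name) pairs
        self._href = None     # URL of the anchor currently open, if any
        self._text = []       # text pieces seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_bookmarks(html_text):
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


# Hypothetical sample in the assumed export layout.
SAMPLE = '''<DL><p>
    <DT><A HREF="https://example.org/?a=1" ADD_DATE="1">Example: a site</A>
    <DT><A HREF="https://example.com/">Other</A>
</DL><p>'''

for url, name in extract_bookmarks(SAMPLE):
    print(url, name)
```

For a real file, something like `extract_bookmarks(open("bookmarks.html", encoding="utf-8").read())` should do, with the output redirected to a text file.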
Well the thing is that I don't know how to program. Embarrassing, I know.
I think
$ sed "s/.*|(.*)/\1 \2/"
Or something close to it should work. Good luck
So I tried this:
cat bookmarks.html | sed -r -e 's/.*HREF="(.*)"\ ADD.*>(.*)/"\1":"\2"/' | grep : > file.txt
But I get picrelated
I tried to replace : between url and name with:
cat file.txt | tr '\042:\042' ' '
but it replaces : in the url too. What should I do?
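The reason is that tr maps individual characters, not strings: '\042:\042' is read as the set of characters " and :, not as the three-character sequence ":". A small Python sketch of the difference, assuming lines in the "url":"name" format produced above (the sample line is made up):

```python
line = '"https://example.org/a":"Example"'

# What tr does: translate every character of the set independently,
# so the colon after "https" is hit as well.
per_char = line.translate(str.maketrans({'"': ' ', ':': ' '}))

# What was intended: replace the three-character separator as one unit,
# then drop the remaining outer quotes.
as_string = line.replace('":"', ' ').strip('"')

print(per_char)
print(as_string)
```

The same idea applies in sed: substitute the whole separator string instead of translating single characters.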
you could prolly find a way to get jquery to work from the command line in node, it'd make navigating HTML like that pretty easy
good project to learn on—very specific victory condition for you
I added the .*| Because you can't print what you don't match.
On the other hand you could change the print from "\1":"\2" to "\1"dicksdicksdicks"\2". Then grep for the dicks instead
Do it locally on the server
you can't parse html with regex:
stackoverflow.com
Luckily enough he's not parsing HTML with regex, you imbecile. He's pulling values out of very specific places in a file.
And it's not "can't", it's "shouldn't".
I have a Jow Forums post full of insults directed at my (You)'s. I need to read the HTML document to find just the insults. Can I do this with regex?
>
>Luckily enough he's not parsing HTML
>I want to parse it and get lines like
you can still parse parts of it with regex
>While it is true that asking regexes to parse arbitrary HTML is like asking a beginner to write an operating system, it's sometimes appropriate to parse a limited, known set of HTML.
Read before you post.
OP doesn't even know how to use regex, he's obviously not a programmer, and if you had read the question you would know he's not parsing HTML because he's not validating it as proper HTML. Just because OP doesn't know how to phrase his dumb question doesn't give you the right to be obtuse on purpose.
The file he's getting does not need to be valid HTML for him to get what he needs
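A sketch of that "limited, known set of HTML" case, assuming every bookmark sits on a single line as `<A HREF="url" ...>name</A>`. The pattern, the function name `bookmark_lines` and the sample are illustrative only; anything outside that assumed layout will not match.

```python
import re

# [^"]* stops at the first closing quote; [^>]* skips any other attributes.
ANCHOR = re.compile(r'<a\s+href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE)


def bookmark_lines(text):
    """Return one 'url name' string per anchor found in text."""
    return [f"{url} {name}" for url, name in ANCHOR.findall(text)]


# Hypothetical sample line in the assumed layout.
SAMPLE = '<DT><A HREF="https://example.org/x?y=1" ADD_DATE="1">Some: name</A>\n'
print(bookmark_lines(SAMPLE))
```

Stopping the first group at the closing quote, instead of using a greedy `.*`, keeps colons and equals signs inside the URL or the name from confusing the match.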
Shut the fuck up and consider suicide.
dude you have your answer already. just substitute ": and replace with "\" " instead. It's not the best way but it works for your case.
something like this in BASH, same concept if you using sed
bookmarks=$( bookmarks.txt
It only works for the first line tho
ok, found sed solution myself:
cat file | sed 's/:/ /2'
Oh boy, I should really learn coreutils
For sure, piping cats is basically a sin.