I want to master regular expressions. What are the best books I should read, in your opinion, Jow Forums?

Attached: 80170c11996bd58e422dbb6631b73c4b.jpg (780x1024, 116K)

Other urls found in this thread:

regexr.com
ex-parrot.com/~pdw/Mail-RFC822-Address.html
diversity.google/
regexcrossword.com
shop.oreilly.com/product/9780596514273.do
github.com/sparklemotion/nokogiri
regexper.com
regex101.com/

Mastering Regular Expressions O'Reilly
Watch Lea Verou's video on Regex (ascii puke)
Read the "Regular Expressions" section in the grep man page (needs a Unix-like OS)
Read other people's solutions to grepping for patterns.

This.
And learn how the regex engine works. That's essential.

Cheers user, appreciated. I had a look at Mastering Regular Expressions and it looked quite chunky, as in a lot of writing and not so many examples; whereas I have read that the cookbook by O'Reilly is a bit more practical. What do you think is best for a beginner?

Cheers for the other resources.

Set yourself some validation challenges -- zipcodes, email addresses, something that blocks malicious sql injection attempts

You can use an online interactive regex calculator

Play regex crossword!

At least read the part in "Mastering Regular Expressions" about the Regex engine. It really is imperative that you know how that works if you want to create effective regular expressions.

Just look for "Engine" in the Index.

>something that blocks malicious sql injection attempts
Nope. You shouldn't do it like that. This is a terrible idea on every level.

Just don't waste your time on this. Regex is some extreme shit that, if you are unlucky enough, comes up about every couple of months. And most of the time it's extremely basic shit. If you are extremely unlucky, maybe once a year a mildly difficult regex will block your way. But you can easily get around it by googling and trial and error in a couple of hours max.
What's the point in really learning it? If you don't use it every day, you will forget everything in a few weeks. And desu, if the regex gets way too complicated it would be best to solve it another way. Shit is absolutely not maintainable.

>whats the point in learning anything

there are so many things you can / should learn and Regex and XSLT are certainly not it.

Regex are just beautiful

[], ., *, +, ^, {}, $ and | are enough for 99% of use cases. There's no need for more than this.

Why are you posting here. OP wants to learn Regex and you're just trying to dissuade anyone from learning Regex. What's your point? Are you tryin'a prove you're better than OP?

Agreed, it's so powerful

.*? and .+? are very useful, tho only available in Perl-style engines like PCRE, not in POSIX BRE/ERE
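
To make the greedy vs. lazy difference concrete, here's a minimal sketch in Python (its re module supports the lazy forms). The sample string is made up.

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as it can, then backs off only until the final ">" fits.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? grabs as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```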

>email addresses
Extremely hard to match with regex, maybe even impossible.

You mean IP addresses are hard to match.
Email addresses aren't that hard if you know something about the email addresses themselves. Trying to match every possible email address in existence is a challenge but who would do such a thing.

Great learning and testing tool:

regexr.com

>impossible

ex-parrot.com/~pdw/Mail-RFC822-Address.html

What's wrong with this?
[^@ ]+@[^\. ]+\.[^ ]*\b
Where's the catch?

Correction
\b[^@ ]+@[^\. ]+\.[^ ]*\b

This matches `@;. to give an extreme example.
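
A quick way to find the catch is to throw junk at the corrected pattern and see what gets through. Rough sketch in Python; the sample inputs are made up, and only the pattern itself comes from the post above.

```python
import re

# The corrected pattern from the post above, unchanged.
pattern = re.compile(r"\b[^@ ]+@[^\. ]+\.[^ ]*\b")

# Made-up inputs: one ordinary address and two that are clearly not addresses.
samples = ["user@example.com", "a@b@c.d", "x@!!!.y"]

for s in samples:
    print(s, bool(pattern.fullmatch(s)))
```

All three come back as matches, because the negated classes only rule out a handful of characters: the part after the @ happily accepts another @ or a run of punctuation.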

What's being negated in the last character class?

space

The space character, which is made redundant by the word boundary \b I added after that
Oh, one sec

Start with finite state automata. If you understand those, you only need to know the syntax to know regex. If you need to write a very complicated regex, draw the schematic of the FSA, then translate that to the regex syntax you need.
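
Here's a small sketch of the automaton-first approach in Python. The example language (binary strings with an even number of 1s), the state names and the function names are all made up for illustration; the point is only that the hand-written automaton and the regex derived from it agree.

```python
import re
from itertools import product

# Hypothetical two-state automaton: accepts binary strings with an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def dfa_accepts(s: str) -> bool:
    state = "even"  # start state, also the only accepting state
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state == "even"

# The same automaton written as a regex: zeros, then pairs of 1s with zeros around them.
EVEN_ONES = re.compile(r"0*(?:10*10*)*")

def regex_accepts(s: str) -> bool:
    return EVEN_ONES.fullmatch(s) is not None

# Compare both on every binary string up to length 8.
all_strings = ["".join(bits) for n in range(9) for bits in product("01", repeat=n)]
agree = all(dfa_accepts(s) == regex_accepts(s) for s in all_strings)
print(agree)
```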

Looks like email addresses have a whole lot of exceptions, implementing all of these would indeed be long as fuck

Attached: ss (2018-06-09 at 13.05.26).png (684x366, 30K)

The last character class has * as the quantifier but it would be cleaner if it had {2,3} or so.

The Wikipedia page tells you all there is to know.

Daily reminder if you don't know what catastrophic backtracking is you know fuck all about regex.
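
For anyone who hasn't seen it, a minimal sketch in Python using the classic nested-quantifier pattern. The pattern and input are the usual textbook example, not something from this thread, and the timings depend on your machine.

```python
import re
import time

def timed_match(pattern: str, text: str):
    """Return (match_or_None, seconds) for re.match(pattern, text)."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

# A run of a's followed by a character that makes the overall match fail.
text = "a" * 20 + "b"

# Nested quantifiers: a backtracking engine tries a huge number of ways
# to split the a's between the inner and outer + before giving up.
slow_result, slow_secs = timed_match(r"(a+)+$", text)

# Same language, no nesting: fails almost immediately.
fast_result, fast_secs = timed_match(r"a+$", text)

print(slow_result, fast_result)
print(f"nested: {slow_secs:.4f}s  flat: {fast_secs:.6f}s")
```

Both calls return None, but in a backtracking engine each extra "a" roughly doubles the work for the nested version, so keep the input short when trying this.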

You don't know about the .pharmaceuticalindustry TLD?

I changed it to +
Problem is, with all those new domain endings that recently got introduced, like ".google" in diversity.google/ for example, I'm not sure what the longest and shortest ones are.
So the + quantifier is what I'd go with.

Wow. I didn't know about that.

And that's only the part before the @. I assume the rules for after the @ are the rules for internet domain names?

Why would you need a book for something that can be learned in one evening? In a couple of minutes for simple stuff, even.

>learned in one evening

sure if you are just matching digits and characters from a simple string lol

Modern top level domains can be much longer than 3 characters.

That's the problem with matching e-mail addresses: a lot is technically allowed even if it's obviously not a real e-mail address.
And even if it looks like a legitimate e-mail address there is no way of being sure it exists until you send an e-mail and you get a reply.

But to serve as an input validator to stop people from accidentally entering their username or telephone number or whatever it's probably enough to just use /.+@.+/
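
As a sanity check, that loose pattern can be sketched like this in Python. The function name and the sample inputs are made up; it only catches "forgot the @" mistakes and says nothing about whether the address exists.

```python
import re

# Deliberately loose check: something, an @, something.
LOOKS_LIKE_EMAIL = re.compile(r".+@.+")

def looks_like_email(value: str) -> bool:
    # search() mirrors the unanchored /.+@.+/ from the post above
    return LOOKS_LIKE_EMAIL.search(value) is not None

for value in ["user@example.com", "username", "555-1234", "@example.com"]:
    print(value, looks_like_email(value))
```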

That's what regexps are used for. I only needed some serious reading when I was learning balancing groups (which I ended up never using in the real world).

>I want to master regular expressions
Is there a more boring pursuit in existence?

The whole specification can be written on a single page.

Sure it takes some practice and intelligence, but you don't need a book to learn it.
Just like you don't need books to learn chess - simply read the rules of the game and play.

"op".search(/faggot/g);

It is a very useful skill.

Just being able to do regex find-and-replace in a text editor lets you manipulate data in a matter of seconds that takes normies hours of manual labor.
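
The same kind of find-and-replace works in a script too. A minimal sketch in Python with made-up sample data, turning ISO dates into DD/MM/YYYY:

```python
import re

# Made-up sample data: ISO dates that should become DD/MM/YYYY.
text = "created 2018-06-09, updated 2018-06-10"

# Three capture groups, reordered in the replacement via backreferences.
result = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(result)  # created 09/06/2018, updated 10/06/2018
```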

>posts on Jow Forums

.t elo 950.

why?

What kind of brainlet needs a fucking book to learn regexes?

regexcrossword.com

Not him, but you shouldn't build SQL queries containing input values yourself to begin with.
Use a function that automatically escapes all input variables.

But if you absolutely must build the queries yourself, then always escape the
entire inputs so they can't possibly be executed.
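
A minimal sketch of the "let the library handle it" approach, using Python's built-in sqlite3 module. The table and the input string are made up, and strictly speaking the driver binds the value as a parameter rather than escaping it, but the effect is the one described above: the input is never executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# A classic injection attempt, passed in as plain data.
user_input = "alice' OR '1'='1"

# The ? placeholder makes the driver bind the value; it is never spliced into the SQL text.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", ("alice",)).fetchall()

print(rows)       # []
print(safe_rows)  # [('alice',)]
```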

All I'm saying is that it's a waste of time to try to master it. Sure, you should know a little about what you can do and try to understand the more advanced things like groups and backreferences. But reading a book about it, unless you are writing a regex matcher yourself or really need this day to day, is absolutely ridiculous. There is certainly some other programming concept that is way more important that you haven't mastered yet. However, I'm just giving advice here, everybody is free to choose their own suffering.

>Mastering Regular Expressions O'Reilly
shop.oreilly.com/product/9780596514273.do here ya go buddy

The only thing this thread is really missing is a book for theory. You need to know what a finite automaton is and what a push down automaton is. Further it is helpful to be able to use the pumping lemma to figure out whether the pattern you want is actually regular. Usually the setup for a pumping lemma proof is enough to get you unstuck for the trickier regexes.

Unfortunately I don't have any experience with books on finite automata and theory of computation that I thought were good.

While using regex as a safety mechanism against SQL injections or as an e-mail address validator would be a bad idea, I think these exercises are actually representative of the kind of bodges you would use regex for. Want to throw together a shitty shell script that scrapes a website for e-mail addresses? Regex. Want to see if your users are trying to SQLi your (otherwise properly secured) website? Regex.

I think you're correct that it shouldn't be used as a safety mechanism, but I don't think you're correct about it being a bad idea.
You touch on it in your final point that it can be used to keep garbage from hitting your sanitization stuff, and that's a good idea.
Basically it's a quick and sane first step, but some applications of it might be limited or impossible.

Use procmail.
All rules are defined by regex.
It's a practical way to learn them after getting the gist of the grammar.

Regexes suck, PCRE particularly.

>You need to know what a finite automaton is and what a push down automaton is

You don't need to know any of this. You sound like one of those fart huffing spergs who recommends books on discrete mathematics when someone asks how to learn javascript.

Thou shalt not attempt to parse html with regular expressions. There's 50 better ways.

ya nigger fuck outta here with ur learning hating ass

What's a better way to do it without using extra dependencies that might not already be pre-installed in a normie user (non-dev) environment?
Yes, it's bad. But humor me.

50 better ways

>You mean IP addresses are hard to match.

What?

Maybe he meant ipv6? But even then...

My recommendation is, don't try to master regex. Whenever you feel regex would be useful, try to use it.

So why can't emails be validated by regex again? Yes, I've read the RFC, and while it's true that the full spec is much weirder than what most websites even allow as an account, I don't see anything that would make it non-regular. Yes, you'd need a complex regex, but I don't understand what reason there is for that 200-line perl regex which matches "most" emails to exist. What am I missing?

You'd have an easier time whitelisting domains

Name three that do not require additional packages etc.

github.com/sparklemotion/nokogiri

Uh, you look up the special characters and then use them? It's not a difficult thing, you can learn all of it in under an hour.

>reading comprehension
Can Americans suicide themselves faster please?

So you need this AND ruby to parse html, nice.

>only 1 op
>global search
why?

Learn the basic special characters and what they mean.

Type "man regex" into google or the command line of a unix-like os. It's a basic primer that won't give you theory or more complicated examples, but it's how I learned.

Use a visualizer like regexper.com to see the state machine of what the regex will match.

Be aware that different implementations have quirks, but you should be safe learning what is called "extended" regular expressions. It's almost universally supported.

Be aware of their limits. They cannot describe context-free languages in general. That means anything with recursive structure like xml is off the table without lots of additional assumptions.
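
To illustrate the nesting limit, here's a rough sketch in Python. Balanced parentheses stand in for xml-style nesting; the function name and the fixed-depth pattern are made up for the example. Python's built-in re module has no recursion feature, so a pattern can only hard-code a maximum depth, while a counter handles any depth.

```python
import re

def balanced(s: str) -> bool:
    """Check balanced parentheses with a counter, which a plain regex cannot keep."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

# A regex written for a fixed nesting depth (here: at most two levels).
UP_TO_TWO_LEVELS = re.compile(r"(?:\((?:\(\))*\))*")

for s in ["()", "(())", "((()))", "(()"]:
    print(s, balanced(s), UP_TO_TWO_LEVELS.fullmatch(s) is not None)
```

The third input is balanced but three levels deep, so the counter accepts it and the fixed-depth regex does not.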

I see you've read that stackoverflow thread, too bad you were too lazy to scroll down.

Attached: file.png (1150x278, 79K)

regex101.com/

It's easy to match four numbers with three digits at most separated by dots with regex, but the numbers in IPs only go up to 255 and regex can't do shit like "match numbers less than x" afaik, so you'd have to deal with a few separate cases. Not really hard, but ugly.
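
The "few separate cases" look roughly like this. A sketch in Python, assuming you want to validate a whole string and reject leading zeros (that last part is a design choice, not something from the post); the names are made up.

```python
import re

# One octet, 0-255, split into the cases a regex can express:
#   250-255 | 200-249 | 100-199 | 0-99 (no leading zero)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(s: str) -> bool:
    return IPV4.fullmatch(s) is not None

for s in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.2.3.4"]:
    print(s, is_ipv4(s))
```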

To be fair, it works or it doesn't. If you're picking it out properly with your regex you've done your part, blame the source for providing a bogus IP.

Email is impossible to match due to stupid comments (which can nest) being included in the spec, but nobody uses those.

Depends on whether you're matching IP addresses from a text file or validating them. The original poster talked about the latter.

Am I close

Attached: firefox_2018-06-09_14-43-30.png (469x278, 9K)

What does that have to do with anything in my post

Automata are easy and make regexes a breeze to visualize. If you only need the simplest of simple regex, you might as well forget regex altogether and write your own little parser.

import htmlparser; and traversing the tree is just as fast.

>htmlparser dependency
What language is this?

Regex is disgusting symbol vomit

.? means it will match a set of IP-like numbers with no separator.

Also \d?\d is more concise than \d{1,2} but it's a matter of preference
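
The pattern from the screenshot isn't reproduced in the thread, so here is a made-up one that shows the same kind of problem: an unescaped, optional dot between the numbers. Sketch in Python, written with [0-9] instead of \d.

```python
import re

# Hypothetical pattern with an optional, unescaped dot between the numbers.
loose = re.compile(r"(?:[0-9]{1,3}.?){4}")

# Escaped dot, required between the numbers.
strict = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

for s in ["10.0.0.1", "1234", "10x0x0x1"]:
    print(s, loose.fullmatch(s) is not None, strict.fullmatch(s) is not None)

# Side note from the same post: [0-9]?[0-9] and [0-9]{1,2} accept the same strings.
same = all(
    (re.fullmatch(r"[0-9]?[0-9]", s) is None) == (re.fullmatch(r"[0-9]{1,2}", s) is None)
    for s in ["", "7", "42", "123"]
)
print(same)
```

The loose version accepts all three inputs, the strict one only the first.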

Whether you call it BeautifulSoup (python) or jsoup (Java) or mshtml (C#) doesn't matter.
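
If Python happens to be installed, its standard library already ships a parser, so no extra package is needed. A minimal sketch; the class name and the markup are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Made-up markup, including an attribute layout that trips up naive regexes.
html = '<p><a class="x" href="/one">1</a> <A HREF=\'/two\' >2</A></p>'

collector = LinkCollector()
collector.feed(html)
print(collector.links)  # ['/one', '/two']
```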

>.? means it will match a set of IP-like numbers with no separator.
I don't follow

Attached: firefox_2018-06-09_15-04-35.png (492x382, 10K)

Oh wait no you need a word boundary, right. My bad

>I want to master regular expressions.

why? there's a threshold between making a regexp (also what engine? lol) and just making your own tokenizer/parser.

I'd only use regexp for basic bitch shit or for tokenizing. maybe if the grammar is simple enough you could use named match groups to punt a basic bitch parser, but why?
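
For the tokenizing case, a sketch of the named-group approach in Python. The token set is a made-up one for a tiny arithmetic language, and the names are hypothetical.

```python
import re

# Hypothetical token set for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str):
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens

print(tokenize("x = 40 + 2"))
```

Anything with real nesting on top of these tokens would still need a parser; the regex only does the chopping.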

>You mean IP addresses are hard to match.

IPv4? yes
IPv6? I haven't seen a valid one yet and personally just wrote my own parser for it. the main thing that's a nightmare is shit like the zero compressor and omission of zero fields like in ::1

>IPv4? yes

meant to say no. it's pretty trivial to write even in weak ones like BRE/ERE

if your Regexp engine is basically ES2018+, C# or Perl tier you could probably write an unreadable mess for IPv6, but why? if you can't match with ERE, you're probably going too far anyhow.
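
For comparison, handing IPv6 to an existing parser instead of a regex looks like this in Python, using the standard ipaddress module. The wrapper function name is made up; zero compression like ::1 is handled by the library.

```python
import ipaddress

def parse_ipv6(text: str):
    """Return an IPv6Address, or None if the text is not a valid IPv6 address."""
    try:
        return ipaddress.IPv6Address(text)
    except ipaddress.AddressValueError:
        return None

addr = parse_ipv6("::1")
print(addr.exploded)          # 0000:0000:0000:0000:0000:0000:0000:0001
print(parse_ipv6("1::2::3"))  # None (only one "::" is allowed)
```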

Write syntax highlighting files for GNU Nano or some other editor with regex-based highlighting. Start by reading and modifying the existing highlight patterns, using man 7 regex as reference. That's how I learned regex.

This. Anything beyond basic groups is a waste of time.

Anything beyond that should use a proper handwritten recursive descent parser anyway.

That's actually pretty good advice.

Anyone got some sweet torrents for the books?

Use your fucking brain, and Google as well

honestly it's not that hard

my problem is I use it like twice a year and forget it every time