I want to master regular expressions. What are the best books I should read in your opinion Jow Forums ?

Question

I want to master regular expressions. What are the best books I should read in your opinion Jow Forums ?

Evan Campbell

Attached: 80170c11996bd58e422dbb6631b73c4b.jpg (780x1024, 116K)

June 9, 2018 - 09:48

Grayson Brooks

Mastering Regular Expressions O'Reilly
Watch Lea Verou's video on Regex (ascii puke)
Read the "Regular Expressions" section in the grep man page (needs a Unix-like OS)
Read other people's solutions to grepping for patterns.

June 9, 2018 - 09:50

Jason Wood

This.
And learn how the regex engine works. That's essential.

June 9, 2018 - 09:52

Jack Cooper

Cheers user, appreciated. I had a look at Mastering Regular Expressions and it looked quite chunky, as in a lot of writing and not so many examples, whereas I have read that the cookbook by O'Reilly is a bit more practical. What do you think is best for a beginner?

Cheers for the other resources.

June 9, 2018 - 09:54

Christopher Walker

Set yourself some validation challenges -- zipcodes, email addresses, something that blocks malicious sql injection attempts

You can use an online interactive regex calculator

June 9, 2018 - 09:56

Lincoln Morgan

Play regex crossword!

June 9, 2018 - 09:58

Jason Morris

At least read the part in "Mastering Regular Expressions" about the Regex engine. It really is imperative that you know how that works if you want to create effective regular expressions.

June 9, 2018 - 09:58

Mason Price

Just look for "Engine" in the Index.

June 9, 2018 - 10:06

Justin Carter

>something that blocks malicious sql injection attempts
Nope. You shouldn't do it like that. This is a terrible idea on every level.

June 9, 2018 - 10:10

Jason Barnes

Just don't waste your time on this. Regex is some extreme shit that, if you are unlucky enough, comes up about every couple of months. And most of the time it's extremely basic shit. If you are extremely unlucky, maybe once a year a mildly difficult Regex will block your way. But you can easily get around it by googling and trial and error in a couple of hours max.
What's the point in really learning it? If you don't use it every day, you will forget everything in a few weeks. And desu, if the Regex gets way too complicated it would be best to solve it another way. Shit is absolutely not maintainable.

June 9, 2018 - 10:10

Alexander Cooper

>whats the point in learning anything

June 9, 2018 - 10:14

Luis Bailey

there are so many things you can / should learn and Regex and XSLT are certainly not it.

June 9, 2018 - 10:23

Elijah Richardson

Regex are just beautiful

June 9, 2018 - 10:25

Alexander Cook

[], ., *, +, ^, {}, $ and | are enough for 99% of use cases. There's no need for more than this.

June 9, 2018 - 10:28

Thomas Jones

Why are you posting here. OP wants to learn Regex and you're just trying to dissuade anyone from learning Regex. What's your point? Are you tryin'a prove you're better than OP?

June 9, 2018 - 10:29

John Cooper

Agreed, it's so powerful

June 9, 2018 - 10:44

Anthony Roberts

.*? and .+? are very useful tho only available in Perl-style engines like PCRE, not in POSIX BRE/ERE
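
A minimal sketch of what the lazy forms change, assuming Python's built-in re module (the thread doesn't name an engine) and a made-up sample string:

```python
import re

# Made-up sample text for illustration.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, then backs off to the LAST '>'.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? grows one character at a time and stops at the FIRST '>'.
lazy = re.findall(r"<.+?>", text)

print(greedy)
print(lazy)
```

The greedy version returns the whole string as a single match, the lazy one returns each tag separately.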

June 9, 2018 - 10:45

Jacob Foster

>email addresses
Extremely hard to match with regex, maybe even impossible.

June 9, 2018 - 10:46

Adrian Brown

You mean IP addresses are hard to match.
Email addresses aren't that hard if you know something about the email addresses themselves. Trying to match every possible email address in existence is a challenge but who would do such a thing.

June 9, 2018 - 10:49

Anthony Peterson

Great learning and testing tool:

regexr.com

June 9, 2018 - 10:51

Lincoln Cooper

>impossible

ex-parrot.com/~pdw/Mail-RFC822-Address.html

June 9, 2018 - 10:53

Luis Barnes

What's wrong with this?
[^@ ]+@[^\. ]+\.[^ ]*\b
Where's the catch?

June 9, 2018 - 10:56

Colton Sanchez

Correction
\b[^@ ]+@[^\. ]+\.[^ ]*\b
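
One way to look for the catch is to run the pattern against a few strings. This is only a sketch, assuming Python's re module; the sample strings are made up and fullmatch is used so the whole string has to fit:

```python
import re

# The pattern from the post above, unchanged.
EMAIL_ISH = re.compile(r"\b[^@ ]+@[^\. ]+\.[^ ]*\b")

# Made-up test inputs: one ordinary address and three that should look wrong.
samples = ["user@example.com", "x@;.y", "a@b@c.d", "no-at-sign.example"]

results = {s: EMAIL_ISH.fullmatch(s) is not None for s in samples}

for sample, accepted in results.items():
    print(f"{sample!r}: {'accepted' if accepted else 'rejected'}")
```

The second character class only excludes dots and spaces, so punctuation and even a second @ get through.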

June 9, 2018 - 10:57

Levi Johnson

This matches `@;. to give an extreme example.

June 9, 2018 - 11:00

Easton Martinez

What's being negated in the last character class?

June 9, 2018 - 11:01

Nathaniel Cooper

space

June 9, 2018 - 11:02

Christopher Wood

The space character, which is made redundant by the word boundary \b I added after that
Oh, one sec

June 9, 2018 - 11:02

Alexander Lee

Start with finite state automata. If you understand those, you only need to know the syntax to know regex. If you need to write a very complicated regex, draw the schematic of the FSA, then translate that to the regex syntax you need.
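
A small sketch of that workflow in Python. The language (binary strings with an even number of 1s), the state names and the helper functions are all made up for illustration:

```python
import re
from itertools import product

# Hypothetical example language: binary strings with an even number of 1s.
# The automaton has two states; "even" is both the start and accepting state.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def dfa_accepts(s):
    """Run the automaton over s and report whether it ends in 'even'."""
    state = "even"
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state == "even"


# Read off the diagram: from "even" either loop on 0, or leave on a 1,
# loop on 0s in "odd", and come back on another 1.
EVEN_ONES = re.compile(r"(?:0|10*1)*")


def regex_accepts(s):
    return EVEN_ONES.fullmatch(s) is not None


# Compare both on every binary string up to length 8.
agree = all(
    dfa_accepts("".join(bits)) == regex_accepts("".join(bits))
    for n in range(9)
    for bits in product("01", repeat=n)
)
print(agree)
```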

June 9, 2018 - 11:06

Grayson Kelly

Looks like email addresses have a whole lot of exceptions, implementing all of these would indeed be long as fuck

Attached: ss (2018-06-09 at 13.05.26).png (684x366, 30K)

June 9, 2018 - 11:06

Jacob Myers

The last character class has * as the quantifier but it would be cleaner if it had {2,3} or so.

June 9, 2018 - 11:06

Luke Mitchell

The Wikipedia page tells you all there is to know.

June 9, 2018 - 11:07

Owen Rodriguez

Daily reminder if you don't know what catastrophic backtracking is you know fuck all about regex.
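
For anyone who hasn't seen it, a rough sketch, assuming a backtracking engine such as Python's re module. The two patterns and the input sizes are picked for illustration, and the timings will differ per machine:

```python
import re
import time

# Nested quantifiers: a run of 'a's can be split between the inner and the
# outer '+' in exponentially many ways, and a backtracking engine tries
# them all before it can report failure.
SLOW = re.compile(r"(a+)+$")

# Accepts the same strings, but each string can only be matched one way.
FAST = re.compile(r"a+$")


def time_match(pattern, text):
    """Return the match result and the seconds it took."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing 'b' forces the match to fail
    slow_result, slow_seconds = time_match(SLOW, text)
    fast_result, fast_seconds = time_match(FAST, text)
    print(f"n={n}: nested {slow_seconds:.4f}s, flat {fast_seconds:.6f}s")
```

The nested pattern's time should grow very quickly as n goes up while the flat one stays negligible, so keep n small when trying this.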

June 9, 2018 - 11:08

Aiden Peterson

You don't know about the .pharmaceuticalindustry TLD?

June 9, 2018 - 11:08

Asher Miller

I changed it to +
Problem is, with all those new domain endings that recently got introduced, like ".google" in diversity.google/ for example, I'm not sure what the longest and shortest one are.
So the + quantifier is what I'd go with.

June 9, 2018 - 11:09

Nicholas Lopez

Wow. I didn't know about that.

June 9, 2018 - 11:10

Jackson Wright

And that's only the part before the @. I assume the rules for after the @ are the rules for internet domain names?

June 9, 2018 - 11:13

Noah Davis

Why would you need a book for something that can be learned in one evening. In a couple of minutes for simple stuff, even.

June 9, 2018 - 11:15

Jose Diaz

>learned in one evening

sure if you are just matching digits and characters from a simple string lol

June 9, 2018 - 11:18

Asher Myers

Modern top level domains can be much longer than 3 characters.

That's the problem with matching e-mail addresses: a lot is technically allowed even if it's obviously not a real e-mail address.
And even if it looks like a legitimate e-mail address there is no way of being sure it exists until you send an e-mail and you get a reply.

But to serve as an input validator to stop people from accidentally entering their username or telephone number or whatever it's probably enough to just use /.+@.+/

June 9, 2018 - 11:18

Ayden Wright

That's what regexps are used for. I only needed a serious reading when I was learning balancing groups (which I ended up never using in the real world).

June 9, 2018 - 11:19

Anthony Rodriguez

>I want to master regular expressions
Is there a more boring pursuit in existence?

June 9, 2018 - 11:20

Gabriel Powell

The whole specification can be written on a single page.

Sure it takes some practice and intelligence, but you don't need a book to learn it.
Just like you don't need books to learn chess - simply read the rules of the game and play.

June 9, 2018 - 11:21

Bentley Cox

"op".search(/faggot/g);

June 9, 2018 - 11:22

Jackson Green

It is a very useful skill.

Just being able to do regex find-and-replace in a text editor lets you manipulate data in a matter of seconds that takes normies hours of manual labor.
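
The same kind of edit, sketched in Python for anyone without a regex-capable editor at hand. The names and the "Last, First" layout are made-up sample data:

```python
import re

# Made-up sample data in "Last, First" form.
lines = ["Lovelace, Ada", "Turing, Alan"]

# Two capture groups; the replacement swaps them via backreferences.
pattern = re.compile(r"^(\w+),\s*(\w+)$")
swapped = [pattern.sub(r"\2 \1", line) for line in lines]

print(swapped)
```

Most editors accept the same idea with \1 and \2 (or $1 and $2) in the replace field.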

June 9, 2018 - 11:24

Owen Campbell

>posts on Jow Forums

June 9, 2018 - 11:26

Nolan Collins

.t elo 950.

June 9, 2018 - 11:29

Logan Edwards

why?

June 9, 2018 - 11:29

Ian James

What kind of brainlet needs a fucking book to learn regexes?

June 9, 2018 - 11:46

Jaxson Clark

regexcrossword.com

June 9, 2018 - 11:49

Nathan Butler

Not him, but you shouldn't build SQL queries containing input values yourself to begin with.
Use a function that automatically escapes all input variables.

But if you absolutely must build the queries yourself, then always escape the
entire inputs so they can't possibly be executed.
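
A minimal sketch of the first option, assuming Python's standard sqlite3 module. The table, the column names and the hostile input are made up, and strictly speaking the placeholder binds the value separately from the SQL text rather than escaping it:

```python
import sqlite3

# In-memory database with a made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))

# A classic injection attempt as user input.
hostile = "alice' OR '1'='1"

# The ? placeholder passes the value as data, so the quote characters in
# it never become part of the SQL statement.
rows_hostile = conn.execute(
    "SELECT email FROM users WHERE name = ?", (hostile,)
).fetchall()
rows_normal = conn.execute(
    "SELECT email FROM users WHERE name = ?", ("alice",)
).fetchall()

print(rows_hostile)
print(rows_normal)
conn.close()
```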

June 9, 2018 - 11:54

Isaac Roberts

All I'm saying is that it's a waste of time to try to master it. Sure you should know a little what you can do and try to understand the more advanced things like groups and backreferences. But reading a book about it, unless you are writing a regex matcher yourself or really need this day to day, is absolutely ridiculous. There is certainly some other programming concept that is way more important that you haven't mastered yet. However, I'm just giving advice here, everybody is free to choose their own suffering.

June 9, 2018 - 12:17

Jacob Fisher

>Mastering Regular Expressions O'Reilly
shop.oreilly.com/product/9780596514273.do
here ya go buddy

June 9, 2018 - 12:25

David Clark

The only thing this thread is really missing is a book for theory. You need to know what a finite automaton is and what a push down automaton is. Further it is helpful to be able to use the pumping lemma to figure out whether the pattern you want is actually regular. Usually the setup for a pumping lemma proof is enough to get you unstuck for the trickier regexes.

Unfortunately I don't have any experience with books on finite automata and theory of computation that I thought were good.

June 9, 2018 - 14:52

Josiah Davis

While using regex as a safety mechanism against SQL injections or as an e-mail address validator would be a bad idea, I think these exercises are actually representative of the kind of bodges you would use regex for. Want to throw together a shitty shell script that scrapes a website for e-mail addresses? Regex. Want to see if your users are trying to SQLi your (otherwise properly secured) website? Regex.

June 9, 2018 - 15:12

Elijah Butler

I think you're correct that it shouldn't be done as a safety mechanism, but I don't think you're correct in it being a bad idea.
You touch on it in your final point that it can be used to keep garbage from hitting your sanitization stuff, and that's a good idea.
Basically it's a quick and sane first step, but some applications of it might be limited or impossible.

June 9, 2018 - 15:26

Josiah Baker

Use procmail.
All rules are defined by regex.
It's a practical way to learn them after getting the gist of the grammar.

June 9, 2018 - 15:45

Ryder Brooks

Regexes suck, PCRE particularly.

June 9, 2018 - 15:48

Joseph Hill

>You need to know what a finite automaton is and what a push down automaton is

You don't need to know any of this. You sound like one of those fart huffing spergs who recommends books on discrete mathematics when someone asks how to learn javascript.

June 9, 2018 - 15:51

Camden Torres

Thou shalt not attempt to parse html with regular expressions. There are 50 better ways.

June 9, 2018 - 15:52

Jaxson Johnson

ya nigger fuck outta here with ur learning hating ass

June 9, 2018 - 15:55

Julian Green

What's a better way to do it without using extra dependencies that might not already be pre-installed in a normie user (non-dev) environment?
Yes, it's bad. But humor me.
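
One possible answer, sketched under the assumption that a Python interpreter is already on the machine (which may not hold for every non-dev environment): the standard library ships html.parser, so nothing extra needs installing. The LinkCollector class, the extract_links helper and the sample markup are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using only the standard library."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample: mixed-case tags plus a link hidden inside a comment.
sample = (
    '<p>See <a href="https://regex101.com/">this</a> and '
    "<A HREF='/local'>that</A>, not <!-- <a href=\"hidden\"> --> this.</p>"
)
print(extract_links(sample))
```

A naive regex for href would also pick up the commented-out link, which the parser skips.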

June 9, 2018 - 16:01

Logan Campbell

50 better ways

June 9, 2018 - 16:24

John Nguyen

>You mean IP addresses are hard to match.

What?

June 9, 2018 - 16:33

Jeremiah Brooks

Maybe he meant ipv6? But even then...

June 9, 2018 - 16:35

Aiden Collins

My recommendation is, don't try to master regex. Whenever you feel regex would be useful, try to use it.

June 9, 2018 - 16:37

Matthew Parker

So why can't emails be validated by regex again? Yes, I've read the RFC, and while it's true that the full spec is much weirder than what most websites even allow as an account, I don't see anything that would make it non-regular. Yes, you'd need a complex regex, but I don't understand what reason there is for that 200-line perl regex which matches "most" emails to exist. What am I missing?

June 9, 2018 - 16:46

Chase Jenkins

You'd have an easier time whitelisting domains

June 9, 2018 - 16:59

Ethan Nguyen

Name three that do not require additional packages etc.

June 9, 2018 - 17:03

Tyler Carter

github.com/sparklemotion/nokogiri

June 9, 2018 - 17:05

Julian Russell

Uh, you look up the special characters and then use them? It's not a difficult thing, you can learn all of it in under an hour.

June 9, 2018 - 17:06

Aaron Reed

>reading comprehension
Can Americans suicide themselves faster please?

June 9, 2018 - 17:10

Jackson Torres

So you need this AND ruby to parse html, nice.

June 9, 2018 - 17:14

Evan Fisher

>only 1 op
>global search
why?

June 9, 2018 - 17:15

Aiden Martinez

Learn the basic special characters and what they mean.

Type "man regex" into google or the command line of a unix-like os. It's a basic primer that won't give you theory or more complicated examples, but it's how I learned.

Use a visualizer like regexper.com to see the state machine of what the regex will match.

Be aware that different implementations have quirks, but you should be safe learning what is called "extended" regular expressions. It's almost universally supported.

Be aware of their limits. They cannot describe context-free languages in general. That means anything with recursive structure like xml is off the table without lots of additional assumptions.
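
A sketch of that limit using nested parentheses as a stand-in for recursive structure, assuming Python's re module. The DEPTH_2 pattern and the is_balanced helper are made up for illustration:

```python
import re

# A plain regular expression can only hard-code a fixed nesting depth.
# This one accepts parentheses nested at most two levels deep.
DEPTH_2 = re.compile(r"(?:\((?:\(\))*\))*")


def is_balanced(s):
    """Check '(' / ')' nesting with a counter, i.e. unbounded memory
    that a finite automaton does not have."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


for s in ["(()())", "((()))", "(()"]:
    by_regex = DEPTH_2.fullmatch(s) is not None
    print(f"{s!r}: regex={by_regex}, counter={is_balanced(s)}")
```

The regex and the counter agree up to depth two and part ways on "((()))". Adding another level to the pattern only moves the problem one level deeper.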

June 9, 2018 - 17:31

Elijah Ward

I see you've read that stackoverflow thread, too bad you were too lazy to scroll down.

Attached: file.png (1150x278, 79K)

June 9, 2018 - 18:04

Jackson Hill

regex101.com/

June 9, 2018 - 18:05

Joshua Rivera

It's easy to match four numbers with three digits at most separated by dots with regex, but the numbers in IPs only go up to 255 and regex can't do shit like "match numbers less than x" afaik, so you'd have to deal with a few separate cases. Not really hard, but ugly.
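
The separate cases look roughly like this. A sketch assuming Python's re module; rejecting leading zeros such as "01" is a choice made here, not something the thread specifies:

```python
import re

# One octet, 0-255, split into the cases a regex can express:
#   250-255 | 200-249 | 100-199 | 0-99 (no leading zero)
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Four octets separated by literal dots.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(s):
    """Validate a whole string as a dotted-quad IPv4 address."""
    return IPV4.fullmatch(s) is not None


for candidate in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.2.3.4"]:
    print(candidate, is_ipv4(candidate))
```

Outside of regex practice, Python's ipaddress module does this check without any pattern at all.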

June 9, 2018 - 18:16

Caleb Scott

To be fair, it works or it doesn't. If you're picking it out properly with your regex you've done your part, blame the source for providing a bogus IP.

June 9, 2018 - 18:23

Nolan Wood

Email is impossible due to the stupid comments included in the spec, but nobody uses those.

June 9, 2018 - 18:26

James Martinez

Depends on whether you're matching IP addresses from a text file or validating them. The original poster talked about the latter.

June 9, 2018 - 18:28

Samuel Wood

Am I close

Attached: firefox_2018-06-09_14-43-30.png (469x278, 9K)

June 9, 2018 - 18:43

Julian Rogers

What does that have to do with anything in my post

June 9, 2018 - 18:54

Ryder Gomez

Automata are easy and make regexes a breeze to visualize. If you only need the simplest of simple regex, you might as well forget regex altogether and write your own little parser.

June 9, 2018 - 18:54

Jaxson Rodriguez

import htmlparser; and traversing the tree is just as fast.

June 9, 2018 - 18:56

Cooper Turner

>htmlparser dependency
What language is this?

June 9, 2018 - 18:57

Camden Lopez

Regex is disgusting symbol vomit

June 9, 2018 - 19:00

Liam Bennett

.? means it will match a set of IP-like numbers with no separator.

Also \d?\d is more concise than \d{1,2} but it's a matter of preference

June 9, 2018 - 19:00

Camden Roberts

Whether you call it BeautifulSoup (python) or jsoup (Java) or mshtml (C#) doesn't matter.

June 9, 2018 - 19:00

Jonathan Wood

>.? means it will match a set of IP-like numbers with no separator.
I don't follow

Attached: firefox_2018-06-09_15-04-35.png (492x382, 10K)

June 9, 2018 - 19:05

Brandon Young

Oh wait no you need a word boundary, right. My bad

June 9, 2018 - 19:21

Brody Cook

>I want to master regular expressions.

why? there's a threshold between making a regexp (also what engine? lol) and just making your own tokenizer/parser.

I'd only use regexp for basic bitch shit or for tokenizing. maybe if the grammar is simple enough you could use named match groups to punt a basic bitch parser, but why?
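
For reference, the tokenizing use looks roughly like this in Python. A sketch with a made-up token set; TOKEN_SPEC and tokenize are hypothetical names:

```python
import re

# Made-up token set for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),  # anything else is an error
]

# One alternation of named groups; the name of the group that matched
# tells us the token kind.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        tokens.append((kind, m.group()))
    return tokens


print(tokenize("x = 3 * (y + 42)"))
```

Anything with nesting or precedence would still go to a real parser that consumes these tokens.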

June 9, 2018 - 19:25

Dominic Powell

>You mean IP addresses are hard to match.

IPv4? yes
IPv6? I haven't seen a valid one yet and personally just wrote my own parser for it. the main thing that's a nightmare is shit like the zero compressor and omission of zero fields like in ::1

June 9, 2018 - 19:39

Levi Watson

>IPv4? yes

meant to say no. it's pretty trivial to write even in weak ones like BRE/ERE

if your Regexp engine is basically ES2018+, C# or Perl tier you could probably write an unreadable mess for IPv6, but why? if you can't match with ERE, you're probably going too far anyhow.

June 9, 2018 - 19:41

Jeremiah Price

Write syntax highlighting files for GNU Nano or some other editor with regex-based highlighting. Start by reading and modifying the existing highlight patterns, using man 7 regex as reference. That's how I learned regex.

June 9, 2018 - 20:08

Logan Hernandez

This. Anything beyond basic groups is a waste of time.

June 9, 2018 - 20:16

Alexander Roberts

Anything beyond that should use a proper handwritten recursive descent parser anyway.

June 9, 2018 - 20:21

Jayden Jones

That's actually pretty good advice.

June 9, 2018 - 20:23

Hunter Garcia

Anyone got some sweet torrents for the books?

June 9, 2018 - 20:24

Wyatt Morris

Use your fucking brain, and Google as well

June 9, 2018 - 20:26

Cooper Brown

honestly it's not that hard

my problem is I use it like twice a year and forget it every time

June 9, 2018 - 20:28
