4.3 Regular Expressions
One of the more useful built-in JavaScript objects is RegExp, R-E-G-E-X-P, which stands for regular expression. Regular expressions are a very powerful mini-language for matching patterns in text. So, RegExp is one common way to shorten the name. It's also called regex, pronounced with either a hard g or a soft g. I actually do tend to pronounce it with a soft g, even though the g in "regular" is hard. I did a poll on this on Twitter, and one pronunciation only slightly edged out the other. So, either way is correct.

There's a famous saying about regular expressions. It goes like this: some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems. This somewhat wry observation is a commentary on how complicated regular expressions can be. But luckily, nowadays, there are really great online resources for building up regular expressions incrementally. Let's take a quick look at one.

This is Regex101. I'm going to expand the window. I just wanted to start with it at this size, but if you expand it, you can see extra things pop into view. Here we go. So this is a regular expression builder. And you can even see, over here on the left, it says Flavor, because there is some variation from language to language. We're going to pick JavaScript just to make sure everything is synced up. Most regex builders also have a nice quick reference like this one. It's really useful for learning regular expressions or for reminding yourself of things you already knew.

Let's take a look at using the RegExp object directly. This is actually not the best way to do things, but it's an easier place to start. As usual, we'll work in Node. I'm gonna define a variable called zipCode that matches a standard five-digit American zip code. So let's say let zipCode. Remember camel case. And as with Date, we're gonna say new, and then the name of the object, RegExp. And then we're gonna give it a string that represents the regular expression.
So if we look down here in the Quick Reference, we're looking for something that says digit. Well, here it is, but we can even search here, which is really useful. So \d matches any digit. And we can also match a certain number of things, like this: a{3} is exactly three of a, where a is a pattern. So that means we can do something like this: \d, and then, if we want to match five digits in a row, we can say 5 in curly braces, which gives \d{5}.

So here's a famous zip code. You might recognize this from a famous TV show called "Beverly Hills, 90210." Here's another one. You can see it matches. And actually, usually, ZIP is capitalized; it's all caps. You can see here that the highlight indicates a match, and you can also see the match information over here.

So we can try this in Node. Let's try the obvious thing, which is to put the exact same regex in the string: \d and 5 in curly braces. Now, the way you can try to match it on a string is to call the exec method on this regular expression. So let's give it a string with a valid zip code. What's the result? Hmm, null. What happened there is that the obvious thing is actually wrong. Inside a string, this \d needs to be escaped with a second backslash. And remember, when we redefine the variable, we have to get rid of the let, or else we'll get an error. So let's redefine this with a second backslash. This is one of several reasons why I don't like this way of doing things. Aha. That looks good.

In fact, let's go back here and do it the wrong way, just to see what the result is. You can see, aha, the regex is just d{5}. The string dropped the backslash, so the pattern matches five letter d's in a row instead of five digits. With the backslash escaping the other backslash, we get this here, \d{5}. And then we can call exec again. Aha. Now we've got a match.

This is a little confusing, though. Like, what does this mean? There's a match here, but there's also index, input, it's a lot of stuff. And it's really weird, especially if you look at the length. What is this object? Well, it's got length 1.
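For reference, here's a minimal sketch of what that Node session might look like. The sample sentence is an assumption, since only the 90210 zip code comes from the video, and the variable names other than zipCode are made up for illustration.

```javascript
// Hypothetical sample text; only the zip code itself comes from the video.
const sample = "Beverly Hills 90210";

// The "obvious" version is wrong: inside a string literal, "\d" is just "d",
// so this pattern is really d{5} (five letter d's in a row).
const unescaped = new RegExp("\d{5}");
console.log(unescaped.source);        // d{5}
console.log(unescaped.exec(sample));  // null

// The doubled backslash puts a real backslash into the pattern.
const zipCode = new RegExp("\\d{5}");
console.log(zipCode.source);          // \d{5}

// exec returns an array-like result with a few extra properties.
const result = zipCode.exec(sample);
console.log(result[0]);      // 90210
console.log(result.index);   // 14
console.log(result.input);   // Beverly Hills 90210
console.log(result.length);  // 1
```

The length is 1 because the result holds the one matched substring; index and input ride along as extra properties on that array.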
The result looks like an array, or something, maybe, with a couple of keys and values. It's really strange. So we see that this result object from running RegExp's exec is a little confusing. My preferred way to do regex matching in JavaScript is to use an associated string method called match.

In order to get around this restriction that you have to backslash the backslash in the regular expression, I also want to introduce a literal constructor. Remember, we saw literal constructors for strings, with single quotes or double quotes, and for arrays, using square brackets. Well, you can define a regular expression using slashes, like this: a slash, then \d with just one backslash, then 5 in curly braces, and a closing slash. We actually saw this as the result of our call to new RegExp with the string. And indeed, this is the most convenient way to define regexes directly.

All right, now let's make a longer string with a couple of zip codes. It'll be more convenient for our matching. Let's call it s. Feel free to copy this from the text. We'll do this. And then I'm going to use the += operator. See if you can figure out what this does; it's a good exercise in technical sophistication. I'm gonna say += and then a string with a space and 91125, like that. So you can see that what this did is just add this second string to the end of the first. It's equivalent to s = s + and then this second string. You can also see we've got this escape here, because of the single quote inside of other single quotes. It's actually an apostrophe, but it needs to be backslashed.

So let's take a look at this here. I'm gonna copy this. We don't really need to do this, but I'm just gonna get rid of it. Aha. You can see now we've got two matches. So this is what we want to do in JavaScript: we want, somehow, to extract those matches from the string. We can do that like this: s.match and then the regex. So this is also complicated. This isn't actually an improvement yet, but we'll see in a minute why it's better. But let's take a look at this first.
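The literal constructor, the += step, and the first call to match can be sketched like this. The sentences in s are hypothetical stand-ins, since the exact text isn't shown in the recording; only the two zip codes and the escaped apostrophe come from the video.

```javascript
// Literal constructor: slashes instead of quotes, and just one backslash.
const zipCode = /\d{5}/;

// It builds the same pattern as the string version with the doubled backslash.
console.log(zipCode.source === new RegExp("\\d{5}").source);  // true

// Hypothetical sentences. The apostrophe is escaped because the string
// is delimited by single quotes.
let s = 'Beverly Hills\' zip code is 90210.';
s += ' Another zip code is 91125.';  // same as s = s + ' Another zip code is 91125.'
console.log(s);

// Without the g flag, match reports only the first match,
// in the same array-like shape that exec returns.
const firstMatch = s.match(zipCode);
console.log(firstMatch[0]);      // 90210
console.log(firstMatch.length);  // 1
```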
One of the cool things about the result of match is that we can use it in a Boolean context for things like an if statement. So if we do this, remember, bang bang gives us the Boolean value, and, as you might expect, this is true. The result isn't null, or zero, or anything like that. So we can do something like this. We could have a test like this, and then, oops, what happened there? Ah, I missed a closing parenthesis. Let's get out of that with Control+C. Here we go. And I put in two spaces there; I don't really need to do that. There we go. So if it matches, if s.match zipCode, it looks like there's at least one zip code.

But the best way to do this is to add a little extra piece to the zip code regex. After the closing slash, we can add in a g. In fact, let's see if it's over here. Yeah, you can see here, here are a couple of different flags. The g stands for global, which means, essentially, find them all. So let's take a look at this. And now, s.match. Ah, that's nice. It's an array of all of the matching zip codes. So now, this is the kind of thing that you could iterate through and say, "Here are all the zip codes in the string you gave me." Very convenient.

There's another really useful string method that can take a regex, which is one we've already seen before: split. So let's review that briefly. Remember, we can split a string like this on a space. But a really common situation when splitting a string is that you don't really care what's separating the pieces, as long as it's whitespace: a space, or a tab, or a newline, or something like that. So, for example, say we had a couple of spaces here and a tab, which is \t. Maybe a few more spaces. Maybe even a newline, \n. Now, splitting on a single space is no good. Look at that. So what we want is the same result, this ant bat cat duck, but where it's splitting on whitespace rather than just a space like this. Sounds like a job for regular expressions. So let's look down here in our Quick Reference.
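To recap the matching steps just covered, here's a minimal sketch of the Boolean test and the global flag. The string s and the logged message are hypothetical stand-ins rather than the exact text from the recording.

```javascript
// Hypothetical string containing the two zip codes from the video.
const s = "Beverly Hills' zip code is 90210. Another zip code is 91125.";
const zipCode = /\d{5}/;

// match returns null when nothing matches, so the result works in a Boolean context.
const hasZip = !!s.match(zipCode);
console.log(hasZip);                             // true
console.log(!!"no digits here".match(zipCode));  // false

if (s.match(zipCode)) {
  console.log("Looks like there's at least one zip code.");
}

// With the g (global) flag, match returns a plain array of every match.
const zipCodes = s.match(/\d{5}/g);
for (const zip of zipCodes) {
  console.log(zip);  // 90210, then 91125
}
```

One thing to keep in mind with this sketch: match still returns null, not an empty array, when a global regex finds nothing, so a loop like the one above should sit behind a check if the input might have no matches.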
You can even search for whitespace. I can just type white, whitespace. Ah, any whitespace character, \s. That's pretty cool. So what we want is to split on one or more whitespace characters. Is there a way to do that? As you might guess, there is. I'm just going to scroll down. Any single character is a dot. That's not what we want; we already have our thing, whitespace. Ah, zero or more of a, so if we wanted to match zero or more whitespace characters, we could do \s*. But what we want is at least one. So one or more, that's a+. So \s+ will match one or more whitespace characters. If we split on that, let's take a look. Our slashes give us the literal constructor: /\s+/. Ah-ha-ha. Look at that.

This is so useful that my normal way of splitting a string, even if I think it's probably just spaces, is to split on whitespace, because it's a much more robust way of splitting. If someone adds in a stray space by accident, or, through some weird conversion, a space becomes a tab, this will be robust in the face of those sorts of changes. And indeed, this is so useful that in some languages, notably Ruby, it's actually the default. If you just call split on a string with no arguments, like this, well, in JavaScript, it just gives you back the whole string, unsplit, as the only element of an array. But in Ruby, this actually splits on whitespace by default. JavaScript's behavior is the more common one. Most languages require some sort of explicit splitting on whitespace. But this is a really convenient and common way to split. And throughout the rest of this tutorial, unless there's a good reason not to, I'll generally split a string like this, on whitespace.

All right, well, there's lots more to learn about regular expressions. There are interactive tutorials online. You can play around with a regex builder like this one, Regex101. But this is a good start. And you know the two most important things.
You know, number one, that you can learn about regular expressions using something like Regex101, and, number two, the basics of how to use regexes in JavaScript.
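As a recap of the splitting discussion, here's a minimal sketch comparing a split on a single space with a split on whitespace. The exact mix of spaces, tabs, and newlines in messy is an assumption; only the words ant, bat, cat, and duck come from the video.

```javascript
// Cleanly spaced words split fine on a single space.
const clean = "ant bat cat duck".split(" ");
console.log(clean);  // [ 'ant', 'bat', 'cat', 'duck' ]

// Hypothetical messy spacing: two spaces, a tab, three spaces, and a newline.
const messy = "ant  bat\tcat   \nduck";

// Splitting on one space leaves empty strings and stray tab/newline characters.
const onSpace = messy.split(" ");
console.log(onSpace.length);  // 6

// Splitting on one or more whitespace characters recovers the four words.
const onWhitespace = messy.split(/\s+/);
console.log(onWhitespace);  // [ 'ant', 'bat', 'cat', 'duck' ]

// With no argument, JavaScript returns the whole string as a single element.
const noArgument = messy.split();
console.log(noArgument.length);  // 1
```

If the string might start or end with whitespace, splitting on /\s+/ leaves an empty string at that end, so trimming first is one way to avoid it.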