Working with Data

What you need to know for Data Management and Data Wrangling

Regular expressions are fancy wildcards. Typically abbreviated "regex", they let you find (match) and replace inexact patterns, and even special characters such as tabs and line breaks, in text. This is useful in many programming languages, and also for finding and replacing in documents. Most programming languages and many text editors support standard regex symbols and syntax, and Microsoft Word (2010+) offers a similar wildcard search. Regular expressions work somewhat differently in different software, but most of the basics are identical. The term "grep", which programmers often use as a synonym for "search", stands for "global regular expression print".
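
As a minimal sketch of the ideas above, the following Python snippet uses the standard `re` module to match an inexact pattern and to replace tabs and line breaks. The sample text and the variable names are invented for illustration only.

```python
import re

# Hypothetical sample text containing spelling variants, a tab, and a line break.
text = "The color grey and the colour gray\tare both\nspelled two ways."

# "colou?r" matches "color" or "colour"; "gr[ae]y" matches "gray" or "grey".
variants = re.findall(r"colou?r|gr[ae]y", text)

# "\s+" matches any run of whitespace, including tabs and line breaks,
# so this replaces each run with a single space.
one_line = re.sub(r"\s+", " ", text)

print(variants)
print(one_line)
```

The same two patterns should behave the same way in most regex-aware tools, although the way a replacement is written can differ from one program to another.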

Regular expressions and wildcards can be used to find and replace or remove text in a file, database, or filename. They are very powerful and can even assist in cleaning and reformatting data.
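
To illustrate the kind of data cleaning described above, here is a small sketch in Python. The phone numbers and the filename are made-up examples, and the choice of which characters to keep is an assumption, not a rule.

```python
import re

# Hypothetical messy values, as might appear in a spreadsheet column.
phones = ["(703) 555-0100", "703.555.0101", " 703 555 0102 "]

# "\D" matches any character that is not a digit; removing those
# characters leaves only the digits.
digits_only = [re.sub(r"\D", "", p) for p in phones]

# Hypothetical filename. Replace every run of characters that is not a
# letter, digit, dot, or hyphen with a single underscore.
filename = "Survey Results (final) v2.csv"
safe_name = re.sub(r"[^A-Za-z0-9.\-]+", "_", filename)

print(digits_only)
print(safe_name)
```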

xkcd cartoon about Regular Expressions "saving the day"

Background

Wildcards (the concept behind RegEx)

Showcase for what RegEx can do

Getting Started

Follow-Along Exercises

Build and Test

  • RegExr - Lots of help, quick reference and a pleasing interface; see capture groups by replacing $& with $1, $2, etc
  • Regular Expressions 101 - Good for programmers; can mimic PHP, JavaScript, and Python; has a library of scripts
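
The capture groups mentioned for RegExr can also be tried in code. RegExr uses replacement tokens such as $1, $2, and $&, while Python's `re` module writes the same ideas as \1, \2, and \g<0>. The following is only a sketch, and the names in it are sample data chosen for illustration.

```python
import re

# Hypothetical names in "Last, First" order, one per line.
names = "Lovelace, Ada\nHopper, Grace"

# Two capture groups: (\w+) before the comma and (\w+) after it.
pattern = r"(\w+), (\w+)"

# \2 and \1 play the role of $2 and $1: swap the two captured words.
swapped = re.sub(pattern, r"\2 \1", names)

# \g<0> stands for the whole match, like $&: wrap each match in brackets.
bracketed = re.sub(pattern, r"[\g<0>]", names)

print(swapped)
print(bracketed)
```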

Reference

Advanced

Test Yourself

Tips

  • How to Build a Good RegEx (Sitepoint) - Examples illustrate the process of designing a string match
  • Regex tips (David Birnbaum) - Shows and explains some commonly needed expressions, especially for XSLT

Books

All books are available ONLINE through Mason.

Comprehensive Websites

Implementation in Specific Programs

The vast majority of RegEx symbols are exactly the same across software, so these resources are broadly useful. 

Stata
  • What are regular expressions and how can I use them in Stata? (STATA)
  • More examples: How can I extract a portion of a string variable using regular expressions? (UCLA)

R
  • Finding: Regular Expression in R (Gloria Li and Jenny Bryan)
  • Extracting: Extract information from texts with regular expressions in R (Kun Ren)

Python
  • Regular Expression HOW TO and Operations (Python Documentation)
  • Text Processing in Python by David Mertz -- related: Matching Patterns in Text: The Basics

Notepad++
  • How to use regular expressions in Notepad++ (NP++ Wiki)
  • A guide to using regular expressions and extended search mode (Mark Antoniou)

Atom
  • Find and Replace (Atom Manual)
  • Regular expressions in Atom for code consistency (Modelit)

Office
  • Microsoft: Finding and replacing characters using wildcards (Graham Mayor, MVP)
  • OpenOffice: Regular Expressions in Writer (OpenOffice Documentation)

Perl
  • Perl Regular Expression (Perl Documentation)
  • Teach Yourself Perl in 21 Days by Laura Lemay - excerpt: Pattern Matching

SQL
  • SQL Wildcards (W3 Schools)