Wish to handle whitespaces in the most effective way? Use regular expressions to remove all spaces in a cell, replace multiple spaces with a single character, trim spaces between numbers only, and more.
Whichever input data you are using, you'll hardly encounter a dataset without spaces. In most cases, whitespace is good - you use it to visually separate different pieces of information to make it easier to perceive. In some situations, however, it may become evil - extra spaces can mess up your formulas and make your worksheets almost unmanageable.
Why use regular expression to trim whitespaces in Excel?
Before we dive into the nitty-gritty of using regular expressions to remove whitespaces in Excel worksheets, I'd like to address the question that comes to mind in the first place - why do we need regexes when Excel already has the TRIM function?
To understand the difference, let's see what is considered whitespace in each case:
- The built-in TRIM function can only strip the space character that has value 32 in the 7-bit ASCII system.
- Regular expressions can identify a few different forms of whitespace such as the space ( ), tab (\t), carriage return (\r), and new line (\n). Additionally, there is the whitespace character (\s) that matches all these types and comes extremely helpful for cleaning raw input data.
Knowing exactly what happens behind the scenes, it's a lot easier to work out a solution, right?
How to enable regular expressions in Excel
It is a well-known fact that out-of-the-box Excel does not support regular expressions. To enable them, you need to create a custom VBA function. Luckily, we already have one, named RegExpReplace. Wait, why "replace" while we are talking about removing? In the Excel language, "remove" is just another word for "replace with an empty string" :)
To add the function to your Excel, just copy its code from this page, paste it in the VBA editor, and save your file as a macro-enabled workbook (.xlsm).
Here's the function's syntax for your reference:
The first three arguments are required, the last two are optional.
Where:
- Text - the original string to search in.
- Pattern - the regex to search for.
- Replacement - the text to replace with. To remove whitespaces, you'd set this argument to either:
- empty string ("") to trim absolutely all spaces
- space character (" ") to replace multiple spaces with a single space character
- Instance_num (optional) - the instance number. In most cases, you'll omit it to replace all instances (default).
- Match_case (optional) - a Boolean value indicating whether to match (TRUE) or ignore (FALSE) text case. For whitespace, it is irrelevant and therefore omitted.
For more information, please see RegExpReplace function.
How to remove whitespace with regex - examples
With the RegExpReplace function added to your workbook, let's tackle different scenarios one at a time.
Remove all whitespaces using regex
To remove all spaces in a string, you simply search for any whitespace character, including a space, a tab, a carriage return, and a line feed, and replace them with an empty string ("").
Pattern: \s+
Replacement: ""
Assuming the source string is in A5, the formula in B5 is:
=RegExpReplace(A5, "\s+", "")
To make it easier to manage your patterns, you can input the regex in a predefined cell and supply it to the formula using an absolute reference like $A$2, so the cell address will remain unchanged when copying the formula down the column.
=RegExpReplace(A5, $A$2, "")
Remove more than one whitespace
To remove extra whitespace (i.e. more than one consecutive spaces), use the same regex \s+ but replace the found matches with a single space character.
Pattern: \s+
Replacement: " "
=RegExpReplace(A5, "\s+", " ")
Please pay attention that this formula keeps one space character not only between words but also at the beginning and end of a string, which is not good. To get rid of leading and trailing whitespace, nest the above formula into another RegExpReplace function that strips spaces from the beginning and end:
=RegExpReplace(RegExpReplace(A5, "\s+", " "), "^[\s]+|[\s]+$", "")
Regex to remove leading and trailing whitespace
To search for whitespace at the beginning or end of a line, use the start ^ and end $ anchors.
Leading whitespace:
Pattern: ^[\s]+
Trailing whitespace:
Pattern: [\s]+$
Leading and trailing whitespace:
Pattern: ^[\s]+|[\s]+$
Whichever regex you choose, replace the matches with nothing.
Replacement: ""
For example, to eliminate all spaces at the beginning and at the end of a string in A5, the formula is:
=RegExpReplace(A5, "^[\s]+|[\s]+$", "")
As shown in the screenshot below, this only removes leading and trailing whitespace. Spaces between words remain intact creating a visually pleasing view for the reader's eye.
Remove extra whitespace but keep line breaks
When working with multi-line strings, you may wish to get rid of extra spaces but preserve line breaks. To have this done, instead of the whitespace character \s, search for spaces [ ] or spaces and tabs [\t ]. The latter pattern comes in handy when your source data is imported from another source, e.g. a text editor.
In the below dataset, suppose you wish to trim all leading/trailing spaces and all but one in-between spaces, keeping multiple lines intact. To fulfil the task, you'll need two different RegExpReplace functions.
The first function replaces multiple spaces with a single space character.
=RegExpReplace(A5, " +", " ")
The other one strips spaces from the beginning and end of a line:
=RegExpReplace(A5, "^ +| +$", "")
Just nest the two functions one into another:
=RegExpReplace(RegExpReplace(A5, " +", " "), "^ +| +$", "")
And you'll get a perfect result:
Regex to replace multiple spaces with one character
In case you want to remove all spaces from a string and replace each group of consecutive spaces with a specific character, this is what you need to do:
First, use this regex to trim leading and trailing whitespaces:
=RegExpReplace(A8, "^[\s]+|[\s]+$", "")
Then, serve the above function to the text argument of another RegExpReplace that replaces one or more consecutive whitespaces with the character you specify, e.g. a hyphen:
Pattern: \s+
Replacement: -
Assuming the source string is in A8, the formula takes this shape:
=RegExpReplace(RegExpReplace(A8, "^[\s]+|[\s]+$", ""), "\s+", "-")
Or you can enter the patterns and replacements in separate cells like shown in the screenshot:
Regex to remove empty lines
Here's a question that users who have multiple lines in one cell frequently ask: "There are a lot of blank lines in my cells. Is there any way to get rid of them other than going through each cell and deleting each line manually?" The answer: That's easy!
To match empty lines that do not have a single character from the start ^ of the current line up to the next line \n, the regex is:
Pattern: ^\n
If your visually blank lines contain spaces or tabs, use this regular expression:
Pattern: ^[\t ]*\n
Just replace the regex with an empty string using this formula, and all blank lines will be gone at once!
=RegExpReplace(A5, $A$2, "")
Removing whitespaces with RegEx Tools
The above examples have demonstrated just a small portion of wonderful possibilities provided by regexes. Unfortunately, not all features of classic regular expressions are available in VBA.
Luckily, the RegEx Tools included with our Ultimate Suite are free of these limitations since they are processed by Microsoft's .NET RegEx engine. This lets you construct more sophisticated patterns that are not supported by VBA RegExp. Below you'll find an example of such regular expression.
Regex to remove space between numbers
In an alphanumeric string, suppose you wish to remove whitespaces between numbers only, so a string such as "A 1 2 B" becomes "A 12 B".
To match a whitespace between any two digits, you can use the following look-arounds:
Pattern: (?<=\d)\s+(?=\d)
To create a formula based on the above regexes, here are two easy steps to perform:
- On the Ablebits Data tab, in the Text group, click Regex Tools.
- On the Regex Tools pane, select the source data, enter your regex, choose the Remove option, and hit Remove.
To get the results as formulas, not values, remember to put a tick in the Insert as a formula check box.
In a moment, you'll see the AblebitsRegexRemove function inserted in a new column to the right of the original data.
Alternatively, you can input the regex in some cell, say A5, and insert the formula directly in a cell using the Insert Function dialog box, where AblebitsRegexRemove is categorized under AblebitsUDFs.
As this function is specially designed for removing strings, it requires only two arguments - the input string and regex:
=AblebitsRegexRemove(A5, $A$2)
That's how to remove spaces in Excel using regular expressions. I thank you for reading and look forward to seeing you on our blog next week!
Available downloads
Remove whitespace with regex - examples (.xlsm file)
Ultimate Suite - trial version (.exe file)
One comment
Good work