The tutorial delves into practical real-life scenarios of using Power Query in Excel.
In the previous article, we laid the groundwork by exploring the basics of Excel Power Query. Now, it’s time to put it to use in real-world scenarios. Below, you will find a number of examples that will guide you through effective applications of Power Query in everyday situations.
The examples assume that you have already imported your source data to the Power Query Editor. If not, you can easily catch up by revisiting our previous tutorial that details how to get data into Power Query.
To make it easy for you to follow along, we've prepared a sample workbook that you can download at the end of this post. Let's start our data journey and see how Power Query works in action!
Trim and clean
To remove leading and trailing spaces or non-printable characters, use the Trim and Clean commands: Trim strips spaces from both ends of the text, while Clean removes non-printable characters such as line breaks. Select the columns you want to clean up, go to the Transform tab > Text Column group and click Format > Trim or Clean.
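For reference, here is a minimal sketch of what these two commands can look like in M. The sample table and step names are made up for illustration, and both operations are combined into one step here, whereas the ribbon normally records them as two separate steps.

```powerquery
let
    // Hypothetical sample data; in your workbook, start from your own previous step instead
    Source = #table(
        type table [#"Full name" = text],
        {{"  Johnson, Bill "}, {"Smith,#(tab) Anna"}}
    ),
    // Text.Trim strips whitespace from both ends of each value;
    // Text.Clean removes control characters such as the tab inside the second value
    Cleaned = Table.TransformColumns(
        Source,
        {{"Full name", each Text.Clean(Text.Trim(_)), type text}}
    )
in
    Cleaned
```

You can paste this into the Advanced Editor of a blank query to try it out before adapting the column name to your own data.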
Remove duplicate rows
To eliminate duplicate rows in your data, Power Query offers the Remove Duplicates function. Select the column(s) you want to check for duplicates, then go to the Home tab, and click on Remove Rows > Remove Duplicates.
Remove duplicates case-insensitive
Please keep in mind that the standard "Remove Duplicates" operation is case-sensitive: it removes only the rows whose values in the selected column(s) match exactly, including the letter case. For a case-insensitive deduplication, you need to modify the M code of the query. Here’s how:
- Remove case-sensitive duplicates as explained above. Alternatively, you can select and right-click the column that you want to dedupe, and then choose Remove Duplicates from the context menu.
- In the Formula Bar, add Comparer.OrdinalIgnoreCase as a comparison criterion to the second argument of the Table.Distinct function.
In our case, after doing the standard remove duplicates operation, the Table.Distinct function looked like this:
=Table.Distinct(#"Cleaned Text", {"Full name"})
It successfully removed all the rows with the names in column A that were exactly the same, but it left some entries with variations in letter case, as shown in the screenshot below:
To fix this issue, you can add the Comparer.OrdinalIgnoreCase criterion to the function like this:
=Table.Distinct(#"Cleaned Text", {"Full name", Comparer.OrdinalIgnoreCase})
This will eliminate all the rows containing duplicates in column A, ignoring the letter case.
Users with advanced Excel skills can do this operation in the Advanced Editor by changing the Removed Duplicates line to this format:
Table.Distinct(PreviousStep, {"ColumnName", Comparer.OrdinalIgnoreCase})
Note. Sometimes, you may have to look at more than one column to identify duplicate records. For instance, if a person has different name variations like "Johnson, Bill" and "Johnson, William", you can also check the Address column for duplicates.
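As a rough sketch, checking two columns at once could look like this in M. The sample rows are invented for illustration; the column names follow the ones mentioned in this tutorial.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [#"Full name" = text, Address = text],
        {
            {"Johnson, Bill", "12 Oak St, Boston, USA"},
            {"Johnson, William", "12 Oak St, Boston, USA"},
            {"Johnson, Bill", "12 Oak St, Boston, USA"}
        }
    ),
    // A row counts as a duplicate only if BOTH listed columns match an earlier row
    // (this comparison is case-sensitive)
    Deduped = Table.Distinct(Source, {"Full name", "Address"})
in
    Deduped
```

This is the kind of formula you get after selecting both columns and clicking Remove Duplicates. To dedupe by address alone, list only the Address column in the second argument.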
Change data type
In case the imported data doesn't look quite right, you can easily convert it into the correct format.
In our sample dataset, the Registration Date column shows both date and time. To display only the date part of the values, you need to change the data type of the column from Date/Time to Date. This can be done in two ways:
- On the Home tab, select Data Type > Date from the ribbon.
- Right-click the column header and choose Change Type > Date.
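Behind the scenes, either option adds a Changed Type step to the query. A minimal sketch of such a step, with made-up sample values, might be:

```powerquery
let
    // Hypothetical sample data with a Date/Time column
    Source = #table(
        type table [#"Registration date" = datetime],
        {
            {#datetime(2023, 5, 14, 9, 30, 0)},
            {#datetime(2023, 11, 2, 16, 45, 0)}
        }
    ),
    // Converting Date/Time to Date drops the time part of each value
    ChangedType = Table.TransformColumnTypes(
        Source,
        {{"Registration date", type date}}
    )
in
    ChangedType
```

Several columns can be converted in one step by adding more {column, type} pairs to the list.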
Custom date format
Power Query applies the default date format from your locale (region settings). To display dates in a custom format, you can use the DateTime.ToText function. Here are the steps to follow:
- On the Add Column tab, in the General group, click Custom Column.
- In the Custom Column dialog window, type a name for the new column in the corresponding box, e.g. "Date in custom format".
- In the Custom Column Formula box, enter the DateTime.ToText function with two arguments: the original date column and the custom format code.
- To add the original date column to the first argument, select it under Available Columns on the right and click Insert, or double click the column name.
- For the second argument, enter the desired date format such as "dd-MMM-yy" or any other format code.
The complete formula takes this form:
=DateTime.ToText([Registration date], "dd-MMM-yy")
- Click OK, and a new column with the custom date format will be added to your table.
The formula bar will show the complete formula in the M language, which will look something like this:
=Table.AddColumn(#"Previous step", "Date in custom format", each DateTime.ToText([Registration date], "dd-MMM-yy"))
Note.
- The output of this function is a text value, not a datetime value.
- For custom date and time formats, you can use most of the same format codes as in Excel. An important difference is that the codes in Power Query are case-sensitive: lowercase "m" is for minutes, and uppercase "M" is for months.
By changing the data type, you can format the values in a more suitable way for your analysis. Similarly, you can change other columns to different data types, such as text, number, or currency, depending on your needs.
Split column
To split a column into two or more columns by a certain delimiter, you can use the Split Column function. For example, to split the "Full Name" column into "First Name" and "Last Name", the steps are:
- Select the "Full Name" column.
- Navigate to the Transform tab and, in the Text Column group, click Split Column > By Delimiter.
- Pick the delimiter from the dropdown list. If your delimiter is not among the predefined options, choose Custom and enter the desired character(s) in the box below (comma and a space ", " in this example).
- Decide at which occurrence of the delimiter to separate the column: left-most, right-most or each delimiter. If a cell has only one delimiter, any option will do. But if a cell has more than one delimiter, then you have to choose carefully.
- When done, click OK.
- Right-click the header of each new column and choose Rename from the context menu to give them appropriate names (such as "First Name" and "Last Name").
Tip. If you want to preserve the original column, then duplicate it before splitting. To do this, right-click the column and choose Duplicate Column from the context menu. This will create a copy of the column with a " - Copy" suffix in its name. You can then split this column as described above.
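The duplicate-then-split approach can be sketched in M roughly as follows. The sample names are invented, and the sketch assumes the values are stored as "Last, First"; swap the two new column names if your data is the other way round.

```powerquery
let
    // Hypothetical sample data, assuming a "Last, First" layout
    Source = #table(
        type table [#"Full Name" = text],
        {{"Johnson, Bill"}, {"Smith, Anna"}}
    ),
    // Keep the original column by splitting a copy of it
    Duplicated = Table.DuplicateColumn(Source, "Full Name", "Full Name - Copy"),
    // Split the copy at each ", " and name the resulting columns right away
    Split = Table.SplitColumn(
        Duplicated,
        "Full Name - Copy",
        Splitter.SplitTextByDelimiter(", ", QuoteStyle.Csv),
        {"Last Name", "First Name"}
    )
in
    Split
```

Naming the new columns inside Table.SplitColumn saves the separate rename step that the user interface would otherwise require.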
Extract values into a new column
If some column in your dataset contains lengthy multi-part strings, you may want to extract certain information into a new column.
For example, let’s see how to extract the country name from the Address column:
- Select the column from which you wish to extract values.
- Navigate to the Add Column tab, click Extract, and choose an appropriate option. In our case, the country names are separated by a comma, so we choose Text After Delimiter.
- In the dialog box that pops up, enter the delimiter (a comma and a space ", " in our dataset).
- Expand the Advanced option section and choose to scan for the delimiter from the end of the input, as the country name comes after the last comma in a cell. If you need to extract a value from the middle of the string, indicate how many delimiters to skip.
- When done, click OK.
A new column with the extracted values will be added to the end of the table, and you can move it to any position you want by dragging the column header.
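In M, this kind of extraction relies on the Text.AfterDelimiter function. Here is a minimal sketch with made-up addresses; the new column name "Country" is our own choice.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Address = text],
        {
            {"12 Oak St, Boston, USA"},
            {"7 Elm Rd, Toronto, Canada"}
        }
    ),
    // {0, RelativePosition.FromEnd} means: find the first delimiter counting
    // from the end of the text, i.e. the last ", " in the cell
    AddedCountry = Table.AddColumn(
        Source,
        "Country",
        each Text.AfterDelimiter([Address], ", ", {0, RelativePosition.FromEnd}),
        type text
    )
in
    AddedCountry
```

Changing 0 to 1 would skip one delimiter from the end and return everything after the second-to-last comma instead.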
Add column from example
When dealing with inconsistent or incomplete data, the standard Split and Extract functions might not work as expected.
Picture a scenario where country names within the Address column are separated by various delimiters like commas, spaces, or vertical bars. In such cases, you can rely on Power Query to extract country names based on an example you provide. This is similar to how Excel's Flash Fill feature works.
Here's how to add a new column using an example from existing columns:
- Select the column that contains the source data (the Address column in our example).
- On the Add Column tab, click Column from Examples > From Selection.
- In the first row of the new column, type the country name that corresponds to the first address. Power Query will try to infer the pattern and fill the rest of the values based on your example. If some cells are blank or filled with incorrect values, provide another example in the second row or any other row until Power Query gets it right.
- When all the cells are filled with the correct values, press Ctrl + Enter to apply the changes.
You will now have a new column that extracts the country names from the addresses.
Replace missing values
In Power Query, replacing missing values, often represented as null, is a straightforward process:
- Select the column(s) where you want to handle missing values.
- On the Home tab, in the Transform group, click Replace Values.
- In the Replace Values dialog box, fill in two boxes:
- Value To Find: null
- Replace With: enter the replacement value corresponding to your data type (e.g., "0" for numeric columns or "N/A" for text columns).
- Click OK, and Power Query will apply the replacement to all the selected columns.
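The resulting step is based on the Table.ReplaceValue function. A small sketch, assuming a numeric Donation column and invented sample rows:

```powerquery
let
    // Hypothetical sample data with one missing donation
    Source = #table(
        type table [#"Full name" = text, Donation = nullable number],
        {
            {"Johnson, Bill", 2500},
            {"Smith, Anna", null}
        }
    ),
    // Replace null with 0, but only in the columns listed in the last argument
    ReplacedNulls = Table.ReplaceValue(
        Source,
        null,
        0,
        Replacer.ReplaceValue,
        {"Donation"}
    )
in
    ReplacedNulls
```

To fill gaps in a text column, you would use a text replacement such as "N/A" and list that column instead.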
Add conditional column
To add a new column based on a set of conditions that use existing columns, make use of the Add Conditional Column feature. For example, to add a column that assigns a donor level based on the donation amount, this is what you need to do:
- Select any column in your dataset.
- On the Add Column tab, click on Conditional Column.
- In the dialog box that opens, type Donor Level as the name of the new column. Then, specify the following rules:
| If Donation | is greater than or equal to | 4000 | then | Platinum |
| If Donation | is greater than or equal to | 3000 | then | Gold |
| If Donation | is greater than or equal to | 2000 | then | Silver |
| Else | Bronze |
- Click OK to create the new column.
This feature is similar to writing a nested IF statement in Excel, but it’s a lot easier and more convenient to use.
By default, the new conditional column will appear at the end of your dataset, and you can drag it to any position you want.
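If you prefer to write the rules by hand, the same logic can be expressed as an if ... then ... else chain in a custom column. The following is a sketch with made-up donation amounts:

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Donation = number],
        {{4500}, {3200}, {2000}, {750}}
    ),
    // Conditions are checked from top to bottom, so the highest threshold comes first
    AddedLevel = Table.AddColumn(
        Source,
        "Donor Level",
        each
            if [Donation] >= 4000 then "Platinum"
            else if [Donation] >= 3000 then "Gold"
            else if [Donation] >= 2000 then "Silver"
            else "Bronze",
        type text
    )
in
    AddedLevel
```

With these sample values, the four rows are labeled Platinum, Gold, Silver, and Bronze, respectively. The order of the conditions matters: if the 2000 rule came first, every donation of 2000 or more would be labeled Silver.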
Replace or remove errors
Power Query makes it easy to fix errors in Excel without spending too much time on debugging formulas or VBA code. To eliminate errors in your dataset, follow these simple steps.
- Select the column where you want to handle errors.
- Right-click the column header.
- In the context menu, you'll find two key options for handling errors:
- Remove Errors will delete all the rows containing errors within the selected column(s), so be careful with this option.
- Replace Errors will ask you to specify the replacement value. For numeric columns, it has to be a number, e.g. 0. For text columns, you can specify any text value, including a blank cell.
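Both options have M counterparts. The sketch below first produces an error on purpose (by converting the text "n/a" to a number) and then shows the two ways of handling it; the sample data and step names are hypothetical.

```powerquery
let
    // Hypothetical sample data: donations imported as text
    Source = #table(
        type table [Donation = text],
        {{"2500"}, {"n/a"}, {"400"}}
    ),
    // "n/a" cannot be converted to a number, so that cell becomes an error
    Typed = Table.TransformColumnTypes(Source, {{"Donation", type number}}),
    // Option 1: delete every row that has an error in the Donation column
    Removed = Table.RemoveRowsWithErrors(Typed, {"Donation"}),
    // Option 2: keep all rows and put 0 in place of each error
    Replaced = Table.ReplaceErrorValues(Typed, {{"Donation", 0}})
in
    // Return Removed instead to see the result of option 1
    Replaced
```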
Group and aggregate
To summarize or aggregate data by specific groups, Power Query offers the powerful Group By function.
For example, to calculate the total donation amount by country and donor level, this is what you need to do:
- On the Home tab, in the Transform group, click on Group By.
- In the dialog box that opens, choose Advanced. Then, do the following:
- Under Group by, select "Country" and "Donor Level" as the columns to group by.
- Under New column name, type "Total Donation" as the name of the new column.
- Under Operation, choose Sum as the aggregation function.
- Under Column, choose Donation as the column to aggregate.
- Click OK to apply the changes.
As a result, the query will show the grouped and aggregated data in place of the original rows. If needed, you can sort the table by one or more columns: click the filter arrow next to the column name and choose either Sort Ascending or Sort Descending.
In this example, we get a summary of total donation amounts based on both country and donor level.
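For those who like to see the code, grouping is done with the Table.Group function. Here is a minimal sketch using invented sample rows and an optional sort added at the end:

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Country = text, #"Donor Level" = text, Donation = number],
        {
            {"USA", "Gold", 3200},
            {"USA", "Gold", 3500},
            {"USA", "Bronze", 750},
            {"Canada", "Silver", 2000}
        }
    ),
    // One output row per Country + Donor Level combination, with the donations summed
    Grouped = Table.Group(
        Source,
        {"Country", "Donor Level"},
        {{"Total Donation", each List.Sum([Donation]), type nullable number}}
    ),
    // Optional: show the largest totals first
    Sorted = Table.Sort(Grouped, {{"Total Donation", Order.Descending}})
in
    Sorted
```

More aggregations, such as a count of donations, can be added as extra {name, function, type} entries in the third argument.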
Tip. After making the necessary changes in the Power Query Editor, don't forget to load your results into a worksheet.
That’s how to use Power Query in Excel. Now that you know the basics, go ahead and unlock more data transforming secrets to impress your boss, colleagues, and clients with your data mastery :-)
Practice workbook for download
Using Excel Power Query - examples (.xlsx file)
6 comments
I have an Excel table with a list of payment types:
Ordinary Hours
Overtime 1.50
Overtime 2.00
Annual Leave
Leave Loading
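| list of payment | Ordinary Hours | Ordinary Paid | Overtime 1.50 Hours | Overtime 1.50 paid | Overtime 2.00 Hours | Overtime Paid amount | Leave Hours | Leave Paid | Leave Loading Hours | Leave Loading paid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ordinary Hours | 5 | 150 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Overtime 1.50 | 0 | 0 | 5 | 200 | 0 | 0 | 0 | 0 | 0 | 0 |
| Overtime 2.00 | 0 | 0 | 0 | 0 | 5 | 400 | 0 | 0 | 0 | 0 |
| Annual Leave | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 150 | 0 | 0 |
| Leave Loading | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 60 |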
For each payment type, I have 2 columns, e.g. Ordinary Hours and Ordinary Paid, then another 2 columns with Overtime 1.50 Hours and Overtime 1.50 Paid, and so on. In total, I have 10 columns plus the list above.
Question: How can I bring all the hours into one column and all the paid amounts into another, so that I have only 3 columns in total: the list of payments plus 2 columns with all the information? E.g.:
| list of payment | Hours | Paid |
| --- | --- | --- |
| Ordinary Hours | 5 | 150 |
| Overtime 1.50 | 5 | 200 |
| Overtime 2.00 | 5 | 400 |
| Annual Leave | 5 | 150 |
| Leave Loading | 20 | 60 |
Hi! If I understand your task correctly, I can recommend the following steps.
Combine the cell values of one row into a text string using the TEXTJOIN function.
Then delete the "0" using the SUBSTITUTE function.
Use the TEXTSPLIT function to split the values by cell.
=TEXTSPLIT(SUBSTITUTE(TEXTJOIN(",",TRUE,A1:K1),",0",""),",")
Copy this formula down the column as far as necessary. I hope it’ll be helpful.
Thanks for the article! Was great to learn about power query.
Great article. Please give more Power Query examples.
Thanks for your great work. I tried to download the practice workbook, but it gives me an error. Greetings
Thank you for letting us know and sorry for this. The download link is fixed.