The QUERY function is in a category all on its own. It’s an extremely powerful function that will let you filter, sort, group, pivot, basically extract data from a table and present it in numerous ways. At first it can look daunting, with its own language and syntax, but once you dip your toe into the QUERY pool you’ll realise that things are not so complicated and that with just one function, you can extract and analyse your data with ease.
As always, the best way to learn how to use it is through examples. In this post we’re going to use two main sources of data, some questionnaire feedback and some data from an HR department, building the complexity up step by step.
Analysing questionnaire feedback
Here we’ve used a Google Form to collect feedback on the teachers, the classrooms, and admin information at the end of every course in an academy. Each row is a student’s piece of feedback. They grade the various criteria from 1 to 5, 5 being ‘excellent’. Below is a snippet of that table of data.
In the next few examples, we’ll see how easy it is to analyse this data, each time with just one QUERY function.
Example 1a – Selecting the relevant data from the master data
The head of studies wants to look at the feedback for her teachers, and she doesn’t need to know the classroom feedback or the admin feedback. So, the info she needs is in columns A to H, as shown below:
In cell A1 on a different page, I’ve written the following QUERY function:
There are 2 main parts to a QUERY function, 1) the data range, 2) the query
So, in the first part we look at the page called “Questionnaire” and the range from A1 to column N, i.e. Questionnaire!A1:N (note this is an open-ended range, as we will receive more entries in the future).
In the second part we tell the function what to select. So, in this example, we want columns A to H. We add the column letters separated by commas. The query part always needs to be within speech marks, so we put one before select and one at the end, before the closing bracket.
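As a sketch of the two parts described above, assuming the feedback sits on a sheet named Questionnaire with its headers in row 1, the formula would look something like this:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H")
```

The first argument is the open-ended data range and the second is the query text inside speech marks.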
“select” is one of the keywords within the QUERY language which tells the function what to do. Some of the other ones, which we will see in the following examples, are where, group by, pivot, order by, limit, and label.
Example 2a – Filter by a teacher’s name
Now the head has decided that she wants to look at the feedback of a particular teacher. She wants the following information:
Here’s the QUERY formula I entered in cell A2:
It’s the same as before except that at the end I’ve stated a condition using the where keyword: where C = 'Fred'
This looks at column C and returns anything that matches “Fred”, i.e. it only returns the feedback for Fred.
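A minimal sketch of such a filter, assuming the same Questionnaire sheet and range as in Example 1a:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where C = 'Fred'")
```

Note that the text to match goes inside single quotes, because the whole query is already wrapped in double quotes.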
Example 2b – Filter by a teacher’s name using a cell reference
The head has decided that she doesn’t want to have to change the teacher’s name within the formula every time she wants to look at a different teacher’s feedback. Instead, she wants to enter the teacher’s name in cell B1 and have the formula update accordingly.
That’s no problem, although the syntax looks a little ugly. Here’s the formula:
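At the end, instead of the name “Fred” I’ve put the reference to the cell. In the QUERY function you have to use this syntax: '"&B1&"'. The double quotes close the query text, the ampersands join in the value of cell B1, and the single quotes tell the query that the value is text.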
As you can see it produces the same information as before and now if the head wants to see another teacher’s feedback she only needs to change the name in cell B1.
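Put together, and again assuming the Questionnaire sheet and range from the earlier examples, the cell-reference version might look like this:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where C = '"&B1&"'")
```

If B1 contains Fred, the query text that QUERY actually receives ends in where C = 'Fred', exactly as in Example 2a.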
Example 2c – Filter by a teacher’s name and sort the date in descending order
By default, the data is sorted from the oldest date to the most recent, but when there is a lot of data this means that to see the most recent and probably most relevant data, the person has to scroll down. We can remedy that easily by sorting the data by date in descending order, as we can see below:
To achieve this, I’ve used the same formula as before except at the end I’ve added an ORDER BY part (order by A desc):
This orders (or sorts) column A (the dates) and the ‘desc’ tells it to do it in descending order. If you want to tell it to do it in ascending order, write asc.
Carefully note the syntax, as one tiny error will stop this from working.
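A sketch of the sorted version, assuming the same sheet, range, and name cell (B1) as in Example 2b:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where C = '"&B1&"' order by A desc")
```

The order by part has to come after the where part. Swapping them round is one of those tiny errors that stops the query from working.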
Also note that the QUERY results aren’t formatted and the column widths aren’t automatically adjusted. This needs to be done manually, either beforehand or afterwards.
Example 3a – Filter the data between 2 dates
The head also wants to be able to filter the data for a particular period of time, e.g. September 2016. As you can see below, the data has been filtered between 1/9/2016 and 30/9/2016.
Here’s the formula I’ve added in cell A2:
The new part is at the end. First, we tell it to look in column A for a date greater than or equal to the date in inverted commas. Make sure you add the word date, to tell the function that you’re looking for a date and not some text.
Then in the next part, we add ‘and’ to tell it to look for 2 criteria. Then we tell it to look for a date less than or equal to the date in inverted commas.
Note that with dates you need to write the date in the format yyyy-mm-dd, e.g. 2016-09-01, even if the date format in your sheet is different, as it is in mine.
So to summarise, it gets columns A to H, and returns rows that meet the 2 criteria, i.e. 1/9/2016 to 30/9/2016.
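As an illustration of the two date criteria, assuming the same Questionnaire sheet and range as before, the formula could be written along these lines:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where A >= date '2016-09-01' and A <= date '2016-09-30'")
```

Each date is written as yyyy-mm-dd inside single quotes and is preceded by the word date.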
Example 3b – Filter between 2 dates using cell references
As we saw earlier we can replace the actual dates in the formula with cell references.
The only thing is that to do this we need quite a complex-looking formula. Here’s the formula I’ve added in cell A2:
Here’s the formula broken down into its component parts:
To replace the actual date we need to use the following formula after the word ‘date’:
This gets the date in cell D1, puts it into the correct format. The same goes for the second date:
If anyone reading this knows of simpler syntax to do the same thing, I’d love to hear it, as I couldn’t find a better way to do this online, but I’m all ears!
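One way to sketch the cell-reference version is with the TEXT function, which converts the dates in D1 and F1 into the yyyy-mm-dd format the query expects. This is an assumption about how to build the date text and is not necessarily the formula used in the original sheet:

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where A >= date '"&TEXT(D1,"yyyy-mm-dd")&"' and A <= date '"&TEXT(F1,"yyyy-mm-dd")&"'")
```

This assumes D1 and F1 hold real dates rather than text that only looks like a date.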
Example 3c – Filter between 2 dates and by teacher
Finally, the head wants to filter the feedback between the two dates, by a specific teacher, and order the feedback by date in descending order, as you can see below:
As you can see the formula is getting pretty long, but you can also see that it’s made up of parts and you can extract what you want by adding extra parts.
Here’s the query part broken down:
It gets columns A to H.
Finds rows where the date is greater than or equal to the one in cell D1.
And that also is less than or equal to the one in cell F1.
AND where the name in column C is the same as the name in cell B1.
Then order the results by the dates in column A in descending order.
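The five parts listed above can be combined into one formula. This sketch assumes the dates are in D1 and F1, the teacher’s name is in B1, and the dates are converted with TEXT (my assumption, the original may have done this differently):

```
=QUERY(Questionnaire!A1:N, "select A, B, C, D, E, F, G, H where A >= date '"&TEXT(D1,"yyyy-mm-dd")&"' and A <= date '"&TEXT(F1,"yyyy-mm-dd")&"' and C = '"&B1&"' order by A desc")
```

Each extra criterion is simply joined on with and, while order by stays at the end.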
Example 4 – Filter against various criteria
Here the admin manager wants to use the questionnaire feedback to see how good the information given to the students when they sign up is, and how good the service was in the office. He particularly wants to know if there was any low feedback in any of the areas under his control, so he wants to know if the course info, payment info, or office service was rated less than 3 by anyone.
To create the table above I’ve added the following formula in cell A1:
Here I’ve selected 5 columns and notice that I’ve put column B at the end. This shows that when selecting the columns, you don’t have to have them in the same order as the original data. This is extremely useful at times.
In the second part, I set the criteria, i.e. he’s looking for values which are less than 3 in each of the columns L, M, and N. To include 3 different criteria, I’ve used the ‘or’ keyword, so that it will return a row if any one of them has a value of less than 3 in it.
As we can see in the table above, it’s found 3 results where the course information was rated poorly, and we can see from the level that it was related to level B1, so clearly some work is needed there. Plus, there is one incident where the service in the office was rated poorly, which may need investigating.
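A sketch of this kind of ‘or’ filter is shown here. The criteria on columns L, M, and N and the placement of B at the end come from the description above, but the exact set of five selected columns is my assumption (I’ve assumed A is the first one):

```
=QUERY(Questionnaire!A1:N, "select A, L, M, N, B where L < 3 or M < 3 or N < 3")
```

The columns appear in the result in the order they are listed after select, not in the order of the original data.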
Example 5a – Returning the averages of data
The director of the school wants to know if there are any classrooms that are rated more poorly than others. He wants to see if the average rating is different for any of the classrooms. In the table below, we can see that clearly, there is a problem with class A1, as it is rated poorly and much lower than the others.
To do this, I’ve added the following formula in cell A1:
This time I’m interested in the classrooms in column I and the average of the scores given in column J. First, I select column I, then select the average of column J, then I group them by classroom, in other words by column I.
We can return the average, count the number of entries, return a maximum or minimum in that column, or sum up the entries, using the functions avg(), count(), max(), min(), and sum(), with the column letter in the brackets.
Quite often these work with the ‘group by’ keyword, to be able to return the results.
Note, there is an empty row in row 2, as the data range is looking below the original data and into empty rows, and it will return one. This can be eliminated by stating the exact range of your data as we will see next, but the downside is that if more data is added the range will have to be updated.
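A minimal sketch of the averaging query, assuming the same Questionnaire sheet and open-ended range as the earlier examples:

```
=QUERY(Questionnaire!A1:N, "select I, avg(J) group by I")
```

Any column that is selected without an aggregation function, here column I, also needs to appear after group by.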
Example 5b – Returning the averages of data and ordering them
Following on from the example above, we can adjust our returned information by sorting the feedback by the lowest to the highest, i.e. in ascending order. Here we can see class A1 is the lowest rated.
Here’s the formula I wrote in cell A1:
The first part is as before, then it’s followed by an order by part (order by avg(J) asc):
This orders the results by the average of column J (in column B), in ascending order.
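As a sketch, the ordered version could look like this. I’ve kept the open-ended range from the earlier examples here as an assumption. The text above suggests stating the exact range of your data instead, to avoid the empty row:

```
=QUERY(Questionnaire!A1:N, "select I, avg(J) group by I order by avg(J) asc")
```

The order by part refers to avg(J) rather than to a column letter, because the average only exists in the query’s results.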
Example 6 – Pivot information using QUERY not pivot tables
To finish off this first part, let’s look at how we can pivot the information to see the averages of 2 criteria for each teacher. We want to look at the “is clear” and “is organised” categories. If you’re familiar with pivot tables, this works in a similar way, but with the bonus of doing everything right within the QUERY function.
Here the query has extracted the information below. It looks like Fred’s class organisation may need improving a little.
To get this output, I’ve written the following formula in cell A1:
This selects the average of column D (“is clear”) and the average of column E (“is organised”) as the criteria, then pivots it by teacher (column C), so that we see an average of each criterion for each teacher.
We could look at every criterion per teacher, just by adding the average for each criterion column, e.g. avg(F), avg(G), etc.
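A sketch of the pivot described above, assuming the same Questionnaire sheet and range as before:

```
=QUERY(Questionnaire!A1:N, "select avg(D), avg(E) pivot C")
```

Each distinct value in column C becomes its own set of columns in the result, one for avg(D) and one for avg(E).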
Analysing an HR database
OK, let’s look at a different set of data now. Here we have an employee database with some information about each employee.
Example 7 – Returning average salaries per department
The HR director wants to know what the average salary is per department from the data above. Here’s the end result:
In cell A1, I’ve written the following formula:
This selects the departments (column B), and the average of the salaries (column C) grouped by department.
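As a rough sketch, assuming the employee data sits on a sheet named HR (a hypothetical name) with the employee in column A, the department in column B, and the salary in column C:

```
=QUERY(HR!A1:C, "select B, avg(C) group by B")
```

If your data has more columns, widen the range accordingly.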
Example 8a – Listing salaries per employee in descending order
Here he wants to see the salaries per person in descending order, without any of the other information.
In cell A1, I’ve written the following formula:
This selects columns A and C, and sorts column C in descending order.
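A sketch of this query, assuming the data is on a sheet called HR (my placeholder name) with employees in column A and salaries in column C:

```
=QUERY(HR!A1:C, "select A, C order by C desc")
```

Column B is inside the range but is left out of the result, because it isn’t listed after select.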
Example 8b – Limiting the number of results
The HR director actually only wants to see the 5 highest salaries. We can use the formula and add a limit to it, to show the following:
Here’s the formula:
In the last part I’ve added ‘limit 5’. This returns the first 5 rows.
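Adding the limit to a descending sort could look like this, assuming a sheet named HR (a hypothetical name) with employees in column A and salaries in column C:

```
=QUERY(HR!A1:C, "select A, C order by C desc limit 5")
```

Because the rows are sorted by salary in descending order first, the first 5 rows are the 5 highest salaries.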
Example 9 – Ordering by more than 1 criterion
Let’s now look at how we can order our results by 2 or more criteria. Here the HR guy wants to see the employee names, their departments and salaries. He wants the data organised by department then by salary, with the salaries going from highest to lowest.
To do this, I’ve written the following formula:
This selects columns A, B, and C (employee, department, and salary), orders first by department (B), then by salary (C) in descending order. Note the syntax: after ‘order by’ you just add the first column letter, then the second one after a comma.
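A sketch of ordering by two columns, assuming the data is on a sheet named HR (a hypothetical name):

```
=QUERY(HR!A1:C, "select A, B, C order by B, C desc")
```

The desc only applies to column C. Column B is sorted in ascending order, which is the default.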
Example 10 – Relabelling column headers
Finally, let’s look at how we can rename the column headers to something different from the original data. This can be useful, if the original data is from a computer output and the column headers aren’t in everyday English, or you may simply want to change them.
Here I’ve changed the column “Employee” to “Name” using the QUERY function.
To do this, yes you’ve guessed it, I’ve added the following formula in cell A1:
The new part is at the end (label A ‘Name’). This tells it to rename column A with the word ‘Name’. To add more labels, just add a comma and the column letter and new name.
A couple of final comments about QUERY. Be careful where you place your QUERY function: the cells below and to the right of it, where the results will appear, need to be empty, as otherwise it’ll throw an error.
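As an illustration of the label part, assuming the data is on a sheet named HR and that columns A, B, and C are selected (both the sheet name and the selected columns are my assumptions):

```
=QUERY(HR!A1:C, "select A, B, C label A 'Name'")
```

To rename a second column in the same query, extend the end to something like label A 'Name', C 'Pay', where 'Pay' is just an example name.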
The syntax is very exact, so make sure you notice in the examples, how the punctuation is used.
If you want to play around with the data in this post, here’s a link to the sheet, which will prompt you to make a copy of it:
Despite this being a long post, I’ve only scratched the surface as to what QUERY is capable of. To learn more go to Google’s page on the query language: