Google Sheets Functions – QUERY

The QUERY function is in a category all on its own. It’s an extremely powerful function that will let you filter, sort, group, pivot, basically extract data from a table and present it in numerous ways. At first it can look daunting, with its own language and syntax, but once you dip your toe into the QUERY pool you’ll realise that things are not so complicated and that with just one function, you can extract and analyse your data with ease.

As always the best way to learn how to use it is through examples, and in this post we’re going to use two main sources of data, some questionnaire feedback, and some data from a HR department, building the complexity up step by step.


Analysing questionnaire feedback

Here we’ve used a Google Form to collect feedback on the teachers, the classrooms, and admin information at the end of every course in an academy. Each row is a student’s piece of feedback. They grade the various criteria from 1 to 5, 5 being ‘excellent’. Below is a snippet of that table of data.

functions20-1

In the next few examples, we’ll see how easy it is to analyse this data, each time with just one QUERY function.


Example 1a – Selecting the relevant data from the master data

The head of studies wants to look at the feedback for her teachers, and she doesn’t need to know the classroom feedback or the admin feedback. So, the info she needs is from column A to H, as shown below:

functions20-2

In cell A1 on a different page, I’ve written the following QUERY function:

functions20-3

There are 2 main parts to a QUERY function, 1) the data range, 2) the query

So, in the first part (in orange) we look at the page called “Questionnaire” and range A1 to column N (note this is an open-ended range as we will receive more entries in the future).

In the second part we tell the function what to select. So, in this example, we want columns A to H. We add the column letters followed by commas. The query part always needs to be within speech marks, so we put it before select and at the end before the bracket.

“select” is one of the keywords within the QUERY language which tells the function what to do. Here are some of the other ones, most of which we will see in the following examples.

functions20-33


Example 2a – Filter by a teacher’s name

Now the head has decided that he wants to look at the feedback of a particular teacher. She wants the following information:

functions20-4

Here’s the QUERY formula I entered in cell A2:

functions20-5

It’s the same as before except that at the end I’ve stated a condition:

functions20-37

This looks at column C and returns anything that matches “Fred”. I.e. it only returns the feedback for Fred.


Example 2b – Filter by a teacher’s name using a cell reference

The head has decided that she doesn’t want to have to change the teacher’s name within the formula every time she wants to look at the feedback of a different teacher, she wants to enter the teacher’s name in cell B1 and wants the formula to update accordingly.

That’s no problem, although the syntax looks a little ugly. Here’s the formula:

functions20-6

At the end, instead of the name “Fred” I’ve put the reference to the cell. In QUERY function you have to use this syntax: ‘”&B1&”‘ basically so it knows it’s a cell reference.

functions20-38

As you can see it produces the same information as before and now if the head wants to see another teacher’s feedback she only needs to change the name in cell B1.

functions20-4


Example 2c – Filter by a teacher’s name and sort the date in descending order

By default, the data is sorted from the oldest date to the most recent, but when there is a lot of data this means that to see the most recent and probably most relevant data, the person has to scroll down. We can remedy that easily by sorted the data by date in descending order, as we can see below:

functions20-8

To achieve this, I’ve used the same formula as before except at the end I’ve added an ORDER BY part:

functions20-9

functions20-39

This orders (or sorts) column A (the dates) and the ‘desc’ tells it to do it in descending order. If you want to tell it to do it in ascending order, write asc.

Carefully note the syntax, as one tiny error will stop this from working.

Also note that, the QUERY results aren’t formatted and the column widths aren’t automatically adjusted. This needs to be done manually either beforehand, or afterwards.


Example 3a – Filter the data between 2 dates

The head also wants to be able to filter the data for a particular period of time, e.g. september 16. As you can see below, the data has been filtered between 1/9/2016 and 30/9/2016.

functions20-31

Here’s the formula I’ve added in cell A2:

functions20-32

functions20-40

The new part is at the end. First, we tell it to look in column A for a date bigger than or equal to the date in inverted commas. Make sure you add the word date, to tell the function that you’re looking for a date and not some text.

functions20-41

Then in the next part, we add ‘and’ to tell it to look for 2 criteria. Then tell it to look for a date less than or equal to the date in inverted commas.

Note that with dates you need to write the date in the following format:

YYYY-MM-DD

Even if the date format in your sheet is different as it is in my one.

So to summarise, it gets columns A to H, and returns rows that meet the 2 criteria, i.e. 1/9/2016 to 30/9/2016.


Example 3b – Filter between 2 dates using cell references

As we saw earlier we can replace the actual dates in the formula with cell references.

functions20-10

The only thing is that to do this we need quite a complex looking formula. Here’s the formula I’ve added in cell A2:

functions20-11

Here’s the formula broken down into it’s component parts:

functions20-44

functions20-45

functions20-46

To replace the actual date we need to use the following formula after the word ‘date’:

functions20-42

This gets the date in cell D1, puts it into the correct format. The same goes for the second date:

functions20-43

If anyone reading this knows of simplier syntax to do the same thing, I’d love to hear it, as I couldn’t find on-line a better way to do this, but I’m open all ears!


Example 3c – Filter between 2 dates and by teacher

Finally, the head wants to filter the feedback between the two dates, by a specific teacher, and order the feedback by date in descending order, as you can see below:

functions20-12

As you can see the formula is getting pretty long, but you can also see that it’s made up of parts and you can extract what you want by adding extra parts.

functions20-13

Here’s the query part broken down:

functions20-44

It gets columns A to H.

functions20-45

Finds rows where the date is greater than or equal to the one in cell D1.

functions20-46

And that also is less than or equal to the one in cell F1.

functions20-14

AND where the name in column C is the same as the name in cell B1.

functions20-47

Then order the results by the dates in column A in descending order.


Example 4 – Filter against various criteria

Here the admin manager wants to use the questionnaire feedback to see how good the information is that is given to the students when they sign up and how well the service was in the office. He particularly wants to know if there was any low feedback in any of the areas under his control, so wants to know if the course info, payment info, or office service was rated less than 3 by anyone.

functions20-15

To create the table above I’ve added the following formula in cell A1:

functions20-16

Here I’ve selected 5 columns and notice that I’ve put column B at the end. This shows that when selecting the columns, you don’t have to have them in the same order as the original data. This is extremely useful at times.

In the second part, I set the criteria, i.e. he’s looking for values which are less than 3 in each of the columns, L, M, and N. To include 3 different criteria, I’ve used the ‘or’ keyword, so that it will return rows if any one of them have a value of less than 3 in it.

As we can see in the table above, it’s found 3 results where the course information was rated poorly, and we can see from the level that it was related to level B1, so clearly some work is need there. Plus, there is one incident where the service in the office was rated poorly, which may need investigating.


Example 5a – Returning the averages of data

The director of the school wants to know if there are any classrooms that are rated more poorly than others. He wants to see if the average rating is different for any of the classrooms. In the table below, we can see that clearly, there is a problem with class A1, as it is rated poorly and much lower than the others.

functions20-17

To do this, I’ve added the following formula in cell A1:

functions20-18

This time I’m interested in the classrooms in column I and the average of the scores given in column J. First, I select column I, then select the average of column J, then I group them by classroom, in other words by column I.

We can return the average, count the number of entries, return a maximum or minimum in that column, or sum up the entries, using the following syntax:

functions20-34

Quite often these work with the ‘group by’ keyword, to be able to return the results.

Note, there is an empty row in row 2, as the data range is looking below the original data and into empty rows, and it will return one. This can be eliminated by stating the exact range of your data as we will see next, but the downside is that if more data is added the range will have to be updated.


Example 5b – Returning the averages of data and ordering them

Following on from the example above, we can adjust our returned information by sorting the feedback by the lowest to the highest, i.e. in ascending order. Here we can see class A1 is the lowest rated.

functions20-19

Here’s the formula I wrote in cell A1:

functions20-20

The first part is as before, then it’s followed by:

functions20-48

This orders the results by the average of column J (in column B), in ascending order.


Example 6 – Pivot information using QUERY not pivot tables

To finish off this first part, let’s look how we can pivot the information to see the averages of 2 criteria for each teacher. We want to look at the “is clear” and “is organised” categories. If you’re familiar with pivot tables, this works in a similar way, but with the bonus of doing everything right within the QUERY function.

Here the data has extracted the information below. It looks like Fred’s class organisation may need improving a little.

functions20-21

To get this output, I’ve written the following formula in cell A1:

functions20-22

This selects the average of column D (“is clear”) and the average of column E (“is organised”) as the criteria, then pivots it by teacher (column C), so that we see an average of each criteria for each teacher.

We could look at every criteria per teacher, just by adding the average for each criteria column, e.g. avg(F), avg(G), etc.


Analysing a HR database

OK let’s look at a different set of data now. Here we have employee database with some information about them.

functions20-23


Example 7 – Returning average salaries per department

The HR director wants to know what the average salary is per department from the data above. Here’s the end result:

functions20-24

In cell A1, I’ve written the following formula:

functions20-52

This selects the departments (column B), and the average of the salaries (column C) grouped by department.


Example 8a – Listing salaries per employee in descending order

Here he wants to see the salaries per person in descending order, without any of the other information.

functions20-25

In cell A1, I’ve written the following formula:

functions20-53

This selects columns A and C, and sorts column C in descending order.


Example 8b – Limiting the number of results

The HR director actually only wants to see the 5 highest salaries. We can use the formula and add a limit to it, to show the following:

functions20-27

Here’s the formula:

functions20-54

In the last part I’ve added ‘limit 5’. This returns the first 5 rows.


Example 9 – Ordering by more than 1 criteria

Let’s now look at how we can order our results by 2 or more criteria. Here the HR guy wants to see the employee names, their departments and salaries. He wants the data organised by department then by salary, with the salaries going from highest to lowest.

functions20-29

To do this, I’ve written the following formula:

functions20-30

This selects columns A, B, and C (employee, department, and salary), orders first by department (B), then by salary (C). Note the syntax, after ‘order by’ you just add the first column letter, then the second one after a comma.


Example 10 – Relabelling column headers

Finally, let’s look at how we can rename the column headers to something different from the original data. This can be useful, if the original data is from a computer output and the column headers aren’t in everyday English, or you may simply want to change them.

Here I’ve changed the column “Employee” to “Name” using the QUERY function.

functions20-35

To do this, yes you’ve guessed it, I’ve added the following formula in cell A1:

functions20-36

The new part is at the end, (label A ‘Name’). This tells it to rename column A with the word ‘Name’. To add more labels, just add a comma and the column letter and new name.


A couple of final comments about QUERY. Be careful where you place your QUERY function, as you need to make sure that there is nothing in the cells particularly below it, as otherwise it’ll throw an error.

The syntax is very exact, so make sure you notice in the examples, how the punctuation is used.

Despite this being a long post, I’ve only scratched the surface as to what QUERY is capable of. To learn more go to Google’s page on the query language:

https://developers.google.com/chart/interactive/docs/querylanguage


eBooks available on Drive, Forms, Sheets, Docs, Slides, and Sheet Functions:

Baz Roberts (Flipboard / Twitter / Google+)


Google Sheets Functions – INDEX and MATCH (VLOOKUP alternative)

In a previous post, we looked at how we can quickly look up tables for certain information, using the VLOOKUP function. This function is great but it does have some limitations. Firstly, when you look up information in the table you always have to look to the right for the matching information. Secondly, if you add a new column to the table, this messes up the references and it returns the wrong information.

This is where the functions INDEX and MATCH come in. With a combination of these two, you can look either left or right in a table, and it adapts to any added columns.


INDEX

First, let’s see how the INDEX function works. Here we have 2 columns of data (columns A and B), we can pick out a certain cell’s data by referring to its position in the table of data.

functions19-1

First we state the table range (A1:B3), then state the row number we want (3), then the column number (2). This returns the value in the cell (300).

functions19-2


INDEX and MATCH

Example 1 – Finding the classroom a specific teacher is in

Usually we don’t already know the exact row and column we want, and this is where MATCH comes in.

Here we have a table which continues a list of teachers and which classroom they are in. I want to be able to type in cell D1 and find the classroom the teacher is in, and return it in cell D2.

functions19-3

First we need to understand the following MATCH function.

functions19-16

This looks for the teacher’s name in D1 within the list of names (A2:A9). The 0 at the end looks for an exact match. This returns the number 3, as that is the number of places down the list in which Barney is. Now we have the row, we need to use that to find the classroom.

We do this by adding the INDEX function. This looks down the classroom list (range B2:B9), and picks the value at the place returned by the MATCH function. This is 3 as we saw above. This is classroom A3.

So, to reiterate that, the first range (B2:B9) is the column you want to return the value from. The second range (A2:A9) is the column you want to initially search in.

functions19-4

In this example, we could have used the following VLOOKUP function, which as you can see returns the same result in cell E2.

functions19-5


Example 2 – Finding who is in a particular classroom (looking up to the LEFT in a table)

In the above example, we looked up a teacher and then moved RIGHT, to look up the classroom. But what happens if we want to do the opposite and look up the classroom and move LEFT and find out which teacher is in that room?

functions19-6

In cell E2, we have the following VLOOKUP formula and as we can see above, it returns a N/A error. This is because VLOOKUP cannot look to the left, as the column number is always positive, i.e. moves to the right.

functions19-8

To be able to do it, in cell D2 we can use the following INDEX and MATCH formula:

functions19-7

Here all we do is swap the formula we saw earlier around. We return the teacher in range A2:A9 by matching the classroom in range B2:B9.


Example 3 – The effect of inserting a column in a table on an INDEX/MATCH and a VLOOKUP formula

Another problem with using the VLOOKUP formula is that if a column is added and deleted from the table being used, it returns the wrong result, as the column reference is then incorrect.

Below we have started with the same table and formulas that we had in example 1. Then I have added a new column with the class year in those classrooms. As we can see, the INDEX/MATCH formula in cell E2, correctly returns that the Bob is in classroom B2. The VLOOKUP formula in cell F2, returns the class year and not the classroom.

functions19-9

As we can see, the VLOOKUP formula, still refers to the second column even though the range now covers 3 columns and we in fact now want the third column.

functions19-11

Whereas, the INDEX/MATCH formula, adapts and looks up the classroom now in column C.

functions19-10


Example 4 – Returning more than one column of information

We’re not limited to just returning a cell in one specific column, we can return multiple columns or an entire row if we want. Here I want to return both the class year (in E2) and classroom (in F2), when the teacher is entered in cell D2.

functions19-12

We use the following formula:

functions19-13

Here the output range is B2:C9, which includes both the year and classroom. Note, that this returns the information, in separate columns to the right of the cell with the formula in it, i.e. cells E2 and F2.


Example 5 – Matching a range and not an exact figure

So far, we’ve been looking for exact matches, but like VLOOKUP we can also match within ranges.

Here we have a level test where the score corresponds with the student’s level. If they get 5 or less they are a beginner, if they get between 12 and 15 they are intermediate, and so on. In cell D1, we’ll add the score, then in D2 we’ll see the level.

functions19-14

We use the following formula:

functions19-15

This is the same as we’ve seen before, except that there is a “1” at the end this time. This time it’s looking for 14 in column B and looks for the largest value that is the same or less than it. As 14 doesn’t appear, it finds that 12 is the largest value. Then it returns the corresponding level in column A. So, basically, you can imagine a range from 12-14 for Intermediate, and that our value falls within that range.

There are 3 number options at the end of the formula:

1: causes MATCH to assume that the range is sorted in ascending order and return the largest value less than or equal to search_key.

0: indicates exact match, and is required in situations where range is not sorted.

-1: causes MATCH to assume that the range is sorted in descending order and return the smallest value greater than or equal to search_key.

So, if the list was sorted in  descending order we would use a “-1” instead. By default, “1” is used, so it is possible to leave the number out if you want to search a list in ascending order, like we did in this example.

In many cases VLOOKUP will do the job, but as you can see above there are times when INDEX and MATCH is better. It’s less used partly because it looks more complicated but as we’ve seen above it’s easy to set up.


eBooks available on Drive, Forms, Sheets, Docs, Slides, and Sheet Functions:

Baz Roberts (Flipboard / Twitter / Google+)