On Average

loopspace

29th March 2016

# 1 Introduction

As an introduction to averages, I decided to give the students a data set to play with. To make it more interesting for them, they were to gather the data themselves. Thus each student provided one set of data, and then the data were amalgamated and distributed amongst the class. I used Google Sheets to gather and share the data.

A brief outline of the activities is as follows:

1. Each student had a spreadsheet in which to record their data.

The actual data they recorded was various measurements of their hands. I managed to find about 40 distinct measurements for them to take.

2. The different sets of data were gathered into a single spreadsheet and processed slightly. They were then randomised and anonymised and shared back with the class.

3. The students then worked on the shared data, using spreadsheet functions to calculate the three different types of average.

Although the spreadsheet has functions that calculate these directly, the students were led to develop them in slightly more detail. For example, instead of using the AVERAGE function for the mean, they first calculated the sum, using SUM, and the number, using COUNT, and then the division.

4. There was an extension activity which was to find extreme values, and then to find those students with at least one extreme value.

The twist here is that there were enough distinct measurements that the probability is that every student would be extreme on at least one of them.

# 2 Using Sheets

The strategy with using Google Sheets was as follows:

1. Start with a template sheet containing a list of all the measurements that were to be taken.

2. Make a copy of the template for each student and share it with that student.

Note that the sheet would remain owned by me. This made the next step easier, and meant that I could easily correct their data (some recorded it in the wrong column, some included units).

3. Make a master copy of all of the data by pulling in the relevant column from each of the students' sheets.

4. Process, anonymise, and randomise that data.

This involved producing a copy of the data where each student's name was replaced by a generic Student N, the measurements were normalised by dividing by hand length, and the order of the measurements was randomised.

5. Share the data with the students.

In actual fact, the data set was split in two and half shared with one half of the class and the other half with the other.

Again, a copy of the data was made for each student and shared with them. This meant that each student could work on the data without any other student seeing what they did.

It was important that I retained ownership of all of the documents. Not only did this make the workflow easiest, it also meant that if necessary I could quickly correct someone's work.

The interaction possible in Google Sheets meant that all of the above could be accomplished in such a way that when a student entered more data, the whole data set would update. This meant that those who finished their measurements first were not impeded in starting on the analysis by having only a small data set to work with.

# 3 The Great Share

The first step is to create a template spreadsheet. The students should record their measurements in a specific column, but it is up to you which column they should use.

If you put lengthy explanations in one column as to the measurements then be aware of the fact that the default setting for word wrapping is “overwrite". This means that if the next cell is empty, the text will overflow into it. To people not familiar with spreadsheets, this can give the erroneous impression that those cells have content in them. Thus it is better to set the word wrapping to “Wrap". To do this, select the column with your explanations, then from the “Format" menu choose “Text wrapping" and select the “Wrap" option.

Once the template is how you want it, you need to share it with your students. For this, you need a list of the students' email address and names. It is best to put this in a spreadsheet. For this particular situation, it is good to have a fresh spreadsheet as it is useful to be able to add more details. The following script assumes that the details are in two columns with names in the first column and emails in the second.

The following script will do the sharing. Note that the ids are the long strings of letters and numbers (and other symbols) in the URL when you open a file or folder in the Google ecosystem.

function createMeasurementSheets() {
var studentId = ""; // id of spreadsheet with students' names and emails
var folderId = ""; // id of folder where final spreadsheets should be put
var templateId = ""; // id of template spreadsheet
var sheetName = " Measurements"; // title of spreadsheet (to be appended to the students' name)
var studentsCell = 'A2'; // upper-left cell of student details block

// Top left cell of student data
var studentUL = studentSheet.getRange(studentsCell);
// Array of student data
var students = studentUL.offset(0,0,studentSheet.getLastRow() - studentUL.getRow()+1,2).getValues();

// Folder where the sheets should end up
var folder = DriveApp.getFolderById(folderId);

var ss, sfile, root, cell;
for (var i=0; i<students.length; i++) {
// Copy the template spreadsheet, naming it "<Student Name> <sheetName>"
ss = templateSS.copy(students[i][0] + sheetName);
// Get the corresponding file
sfile = DriveApp.getFileById(ss.getId());
// Get the current folder containing the file (this will be the root of your Drive folder)
root = sfile.getParents().next();
// Add the file to the folder where it should be put
// Remove it from the original folder
root.removeFile(sfile);

// Now set the global permissions, make it so that the students can't reshare this file
sfile.setShareableByEditors(false);
// Add the given student as editors

// We'll store the id in the third column of the student's details
cell = studentUL.offset(i,2,1,1);
cell.setValue(ss.getId());
}
}

The configuration variables are at the start:

• studentId This is the id of the spreadsheet holding the students' details.

• folderId This is the id of the folder where you want to store the created spreadsheets.

• templateId This is the id of the template spreadsheet.

• sheetName The created spreadsheets will have a name consisting of the student's name with this string appended to it (no space is added automatically).

• studentsCell This is the top left cell of the block containing the students' names and emails. This allows you to have a header row, or other details in the spreadsheet.

The script makes a third column in the details spreadsheet with the ids of the created spreadsheets. This is useful for referring to them later.

# 4 Amalgamating the Data

Amalgamating the data is quite straightforward and depends on a neat feature of Google Sheets: the ability to pull selected data from one sheet into another. This is not a one-time pull, it links the data so that updates to the master sheet are reflected in the copy (the link is only one-way, editing the copy doesn't work - in fact, putting anything in a cell that contains copied data breaks the copying process).

As we already have the ids in the details spreadsheet, it is easiest to work in that. To maintain a clean spreadsheet, it is best to make a new sheet for the data.

1. Click on the + in the lower left of the browser window to create a new sheet.

2. Right-click (or two-finger tap if using a touchpad) on the new sheet's name to rename it something meaningful.

Actually, it's probably a good idea to name the original sheet something meaningful as well. In the following I'll assume that it is called Class List.

The key to importing data is the importrange function. The syntax of this is:

It'll be useful to have the template imported as well to see what the measurements mean.

• In cell A1 of the new sheet, type:

=importrange("template id","A1:B")

Replace template id by the id of the template spreadsheet. In the range, I've typed A1:B. This effectively means “import the first two columns" as in my template I had two columns of explanations. If only one, use A1:A, if more then increment the B as far as you want.

• When you import data from one sheet to another, you need to explicitly authorise the link. After hitting return on the above, it will first say Loading... and then #REF!. Hover the mouse, or click on, the cell. A dialogue will pop-up asking you to authorise access. Click on the blue box to do this.

As I imported two columns, my first free one is the C column. So to import the data from the first student, I do the following.

• In cell C1 of the new sheet, type:

=importrange(indirect("'Class List'!R" & column()-2 & "C3",false),"C1:C")

If you are in cell B1 then in place of column()-2 you would type column()-1, if in cell D1 it would be column(), and so on.

Also, in this I'm assuming that the ids are in the 3rd column of the Class List sheet and start at the first row. If a different column, change the 3 in the "C3" to a different number (the C here refers to “column" not to the C column specifically). If a different row, adjust the column()-2 accordingly. Basically, column() expands to the column number of the given cell (with A begin $1$ and so on).

I'm also assuming that the data in the students' sheets is in the third column. If a different column, change the Cs in the C1:C part accordingly.

• Now copy that from cell C1 across the top row as many times as you need.

• As before, you will need to authorise access for each of these imports (unfortunately, there's no way that I know of to automate this).

And there you have your data! Moreover, as students take more measurements it will automatically update.

# 5 Processing the Data

You may want to process the data before resharing it with the class. One reason for this is to allow the students to focus properly on the task at hand, which is analysing the data. If they can identify whose measurements are whose they might focus more on that.

What I did (remember that my measurements were of their hands) was to divide each measurement by the hand length. Partly this was to normalise it, and partly to anonymise it. I also randomly sorted the measurements (my student list was already in a random order).

To accomplish these ends, I created a third sheet in the spreadsheet where I'd imported all the data. I created a list of random numbers in one column and then the numbers $1$ to $42$ in the second (I had $42$ measurements other than the hand length).

1. In cell A1 type =rand(). This will enable us to randomise the data by sorting on this column.

2. In cell B1 type =row(). This will mean that the teacher can still match the data to its original meaning.

3. Copy these down as many rows as you need.

4. Select the region you've just created (including the first row) and copy it.

5. Right-click (or two-finger tap) on cell A1 and from the menu select “Paste Special" and then “Paste values only". This replaces all of your formulae by their actual values. This will be important when sorting the data.

In each students' measurements, the first one was the hand length and thus the one I wanted to divide all the others by. Unfortunately, not all students recorded this measurement, and so sometimes dividing by it would give the inevitable “divide by zero" error. Also, even if a student did record this measurement, not all students recorded every measurement so some data would be missing, but spreadsheets often treat empty cells as being $0$. In either of those cases, I wanted the corresponding cell to be empty rather than an error or zero. The following formula accomplishes that end:

=if(indirect("Measurements!R" & $B1 & "C" & Column(),false) = 0, "", iferror(indirect("Measurements!R" &$B1 & "C" & Column(),false)/Measurements!C$2,"")) Let's look at that in more detail. • indirect("Measurements!R" &$B1 & "C" & Column(),false)

The bit inside this, "Measurements!R" & $B1 & "C" & Column() is a reference to the corresponding original measurement. We pick it out using the RC notation wherein R2C4 is the cell in the $2$nd row and $4$th column. So we pick out the row corresponding to the value in B1 (this makes it robust against reordering) and the column according to the current column. The outer indirect then gets the cell contents. • =if(... = 0, "", ...) This is a conditional. It tests if the cell contents are $0$. If so, it displays the empty string, "". If not, it goes on to the next part. • =iferror(...,"") The … in this are dividing the cell by the hand length. If that produces an error (i.e., if the hand length is zero) then the surrounding iferror catches this and displays just "". Otherwise it displays the result of the calculation. Note the dollars at various parts in this. We'll copy this into a block the size of our original data (less one row). Normally, when a formula is copied then it adjusts the cell references within it. The dollars inhibit this for various bits. At some point, we reorder the data based on the random numbers in column A. To do this, click on a cell in column A, and then from the “Data" menu select “Sort sheet by column A, A $\to$ Z". # 6 Sharing the Data Sharing the data is pretty much like sharing the original template: we make a copy for each student and share that with the individual student. But we don't want to copy the current spreadsheet as that contains the original data as well. So we make a new spreadsheet to share that contains just the processed data. (I actually split my data into two groups at this point.) 1. Create a new spreadsheet. 2. In cell B2 type: =importrange("id of data spreadsheet","Data!C1:Z") In this, use the id of the spreadsheet with the data. The Data!C1:Z refers to the processed data, here Data is the name of the sheet containing the processed data and C1:Z says that it starts in column C and goes on to column Z. I put this in B2 so that the first row can be a header row and the first column also a label column. 3. Do the “allow access" thing as before. 4. If you want to split the data, do the above for each grouping and only import that part you want for each group. In the following, I'll assume that we're using multiple sheets to show how that would work. Other than that, the script is very similar to the initial one. function createDataSheets() { var studentId = ""; // id of spreadsheet with students' names and emails var folderId = ""; // id of folder where final spreadsheets should be put var dataIds = [ "", "" ]; // array of ids of data spreadsheets var sheetName = " Data"; // title of spreadsheet (to be appended to the students' name) var studentsCell = 'A2'; // upper-left cell of student details block // Spreadsheet with students' usernames var studentSheet = SpreadsheetApp.openById(studentId); // Top left cell of student data var studentUL = studentSheet.getRange(studentsCell); // Array of student data var students = studentUL.offset(0,0,studentSheet.getLastRow() - studentUL.getRow()+1,2).getValues(); // Folder where the sheets should end up var folder = DriveApp.getFolderById(folderId); // Data spreadsheets var dataSS = []; for (var i=0; i<dataIds.length; i++) { dataSS.push(SpreadsheetApp.openById(dataIds[i])); } var ss, sfile, root, cell, n; for (var i=0; i<students.length; i++) { n=i%dataSS.length; // Copy the nth data spreadsheet, naming it "<Student Name> <sheetName>" ss = dataSS[n].copy(students[i][0] + sheetName); // Get the corresponding file sfile = DriveApp.getFileById(ss.getId()); // Get the current folder containing the file (this will be the root of your Drive folder) root = sfile.getParents().next(); // Add the file to the folder where it should be put folder.addFile(sfile); // Remove it from the original folder root.removeFile(sfile); // Now set the global permissions, make it so that the students can't reshare this file sfile.setShareableByEditors(false); // Add the given student as editors ss.addEditor(students[i][1]); // We'll store the id in the forth column of the student's details cell = studentUL.offset(i,3,1,1); cell.setValue(ss.getId()); } } The new configuration detail here is that we have a list of ids in dataIds instead of the single id in the previous script. The one irritation about this is that the new sheets have to be individually authorised to access the data. This means opening each one in turn and allowing access. Irritating as this is, it is a small price to pay for the fact that the data is now linked from start to finish and as students input more measurements, so the shared data sheets will update automatically. # 7 Calculating the Averages Although this has been primarily about how to use Google Apps Scripts, I may as well finish off with the instructions given to the students as to how to analyse the data. The aim of this activity was to introduce the students to how to use spreadsheets to calculate things like averages, but also to remind them of how those averages are actually calculated. Thus they aren't instructed to use functions like AVERAGE directly, but are led through the building blocks of each type of average and only after calculating it do they compare it with the in-built spreadsheet function. The shared data filled the region B2:M43, with the first row as a header row. These instructions can also be downloaded as a PDF. ## 7.1 Introduction There are three types of “average" that are used in statistics: 1. Mean: to calculate the mean, you add up all the values and divide by how many there are. This is useful when the actual values of the data are important. 2. Median: to find the median, you sort the values and pick the middle one. If there is no “middle one", you find the mean of the two middle values. This is useful if you want to divide your data into two equal groups. 3. Mode: to find the mode, you find the value that appears the most. This is useful if you want the most common value. In the following, you will calculate each of these averages for each row of your data. You will do this by calculating them for one row and then copying the formulae to the other rows. Not all your rows may contain data initially, so you might want to do the initial calculations on a row that has some data. In which case, in all of the following formulae change the 2 for the row that you are actually working with. ## 7.2 Calculating the Mean To calculate the mean, we add up the entries in a row and divide by the number of entries in that row. 1. In cell N1 type Mean. 2. In cell N2 type =sum(B2:M2) to calculate the total of the data in the second row. 3. In cell O2 type =count(B2:M2) to count the number of entries in that row. 4. In cell P2 type =N2/O2 to divide the total by the count. 5. In cell Q2 type =average(B2:M2) to display what the spreadsheet thinks the average is. 6. Copy the formulae from N2:Q2 into rows 3 to 43 to compute the means for the other rows. ## 7.3 Calculating the Median To calculate the median, we make a copy of each row with the data in order and then pick out the middle value (if there is one, if not the mean of the two middle values). 1. In cell R1 type Median. 2. Click on the + at the bottom left of the spreadsheet window to create a new sheet. 3. Click on the Sheet2 tab at the bottom to select it. 4. In cell B2 type =transpose(sort(transpose(Sheet1!B2:M2),1,true)) to make a sorted copy of the second row of data. 5. Copy this into cells B3 to B43 to make a sorted copy of all the rows of data. 6. Click on the Sheet1 tab at the bottom to go back to the first sheet. 7. In cell R2 type (all on one line): index(Sheet2!B2:M2,O2/2+1))/2) This picks out the middle value in the list, if the list has an odd number of entries, or the mean of the middle two if even. 8. In cell S2 type =median(B2:M2) to display what the spreadsheet thinks the median is. 9. Copy the formulae from R2:S2 into rows 3 to 43 to compute the medians for the other rows. ## 7.4 Calculating the Mode To calculate the mode we first put the entries into “bins" of width $0.1$. That is, we put all the values that are between, say, $0$ and $0.1$ together. 1. In cell T1 type Mode. 2. Click on the + at the bottom left of the spreadsheet window to create a new sheet. 3. Click on the Sheet3 tab at the bottom to select it. 4. In cell A1 type 0. 5. In cell B1 type =A1+.1 and copy this into cells C1 to U1. These define the “bins" into which we'll put our data. 6. In cell A2 type =countif(Sheet1!$B2:$M2,"<"&B$1)-countif(Sheet1!$B2:$M2,"<"&A$1) (note the dollars) and copy this into cells B2:T43. This counts how many entries in a given row lie between the value at the top of the column and the value at the top of the next. 7. Click on the Sheet1 tab at the bottom to go back to the main sheet. 8. In cell T2 type =max(Sheet3!A2:T2) to find the largest bin size. 9. In cell U2 type =match(T2,Sheet3!A2:T2,0) to find which bin this occurs in. 10. In cell V2 type =index(Sheet3!A$1:T$1,1,U2) to find the label on that bin, which is the mode for this data. 11. In cell W2 type =mode(arrayformula(if(isnumber(B2:M2),rounddown(B2:M2,1),))) to find what the spreadsheet thinks the mode of the data is. ## 7.5 Finding the Special People Let's say someone is special if one of their measurements is either the largest or smallest of that measurement. 1. In cell X1 type Maximum and in cell AA1 type Minimum. 2. In cell X2 type =max(B2:M2) and in cell AA2 type =min(B2:M2) to find the maximum and minimum values in the row. 3. In cell Y2 type =match(X2,B2:M2,0) to find which column the maximum lies in and in cell AB2 type =match(Y2,B2:M2,0). 4. In cell Z2 type =index(B$1:M$1,1,Y2) to get the corresponding person's “name" and in cell AC2 type =index(B$1:M$1,1,AB2). 5. Right-click (or two-finger tap) on cell B1 and select Conditional Formatting. 6. In the new panel, in the menu Format cells if... select Custom formula is and in the box underneath type =match(B1,$Z2:$Z43,0) > 0. Click Done. 7. Click Add new rule and do the same but in the box type =match(B1,$AC2:\$AC43,0) > 0.

8. Click again on cell B1 and use Ctrl+C to copy it. Then select cells C1:M1, right-click (two-finger tap), and hover over Paste special to bring up the second menu from which select Paste conditional formatting only. This will mean that the special people are highlighted.

How many people are special?