In class last week, we built a spreadsheet that took all the voting records (+1 for Yes, -1 for No, 0 for missing, etc.) and compared them pairwise. The comparison used the idea that if you multiply two voting records and sum up the resulting products, you get the "agree minus disagree" number. (If you take absolute values before multiplying, you get "agree + disagree" instead). This notion of sums of pairwise products comes up over and over again in mathematics, economics, etc. It's almost never done with pairs of rows from a table, but instead with a row from one table and a column from another. When we do this, each table (which should be a table of NUMBERS) gets called a MATRIX (plural: matrices), and the process of computing all possible row-column sum-products is called "matrix multiplication." You're going to play with that a little bit in this problem.
A
, in pink, whose rows are labeled with the names of people, and whose columns are labeled with items for purchase.
You'll also see matrix B
, a table of prices (in dollars per item) for the three items, for each of two months.
Notice that the number of COLUMNS of A
is the same as the number of ROWS of B
.
A
* B
:
B12
through F16
: the first one to the left, the second one up and to the right. We then fill in the space (marked in green) in the lower right. Each cell in the green space lies in the same row as some row of matrix A
and in the same column as some column of matrix B
. We place in this cell the SUMPRODUCT
of that row and that column.
When you use SUMPRODUCT, you can pass either two rows or two columns. To have the function to work with a row and a column, use the TRANSPOSE function in one of the arguments, transforming a row into a column or a column into a row. Your function should look like:SUMPRODUCT("row", TRANSPOSE("column"))
SUMPRODUCT(TRANSPOSE("row"), "column")
SUMPRODUCT
formula you wrote wasn't too complex, but there's a far simpler way to produce the same answer.
E23
, the formula
= MMULT(something, something else )
where the "something" is the range that defines the pink matrix, and the "something else" is the range defining the blue matrix.
The results should be the same as those in the green area above.
MMULT
doesn't actually require that its two arguments (the things inside the parentheses) be placed in the way we placed them for the SUMPRODUCT
computation: they can be anywhere. You'll see that in E7
to F8
, we've made another green box. Enter a formula in E7
that will multiply the two original matrices (just to the left of the green box) and produce a result that should be the same as the one you got for the last two steps.
If we write the ROW labels for matrix A
and the COLUMN labels for matrix B
near our green matrix, the top left corner is "Robin, July"
. What does the number 20 that's in that spot mean? Well, it was computed as
(1 x 2) + (2 x 3) + (3 x 3)
so it's the total cost for Robin to buy her basket of goods in july: 1 soap at $2, 2 eggs at $3, and 3 pears at $4.
Open the "Random Coin Flips" tab of the spreadsheet. Using the RANDBETWEEN()
function, we've made 25 rows of 8 random numbers each, representing coin-flips: 0 is heads, 1 is tails. You're going to make a summary of these results.
You might expect that if you flip a coin 8 times, on average it'll come up heads four times.
I
, a formula to compute how many heads you got in each 8-flip trial. It'd be best if you entered just one formula in cell I2
, and then filled down, of course. I
? (Hint: zero). The largest? Fill in your answers in the spaces provided.A31
, enter the word "Value", and beneath it enter a list of the possible values, from smallest to largest, starting with the smallest in cell A32.B31
, enter the label "Value Count", and in cell B32
, enter formula to count how many times, in the 25 experiments, you got exactly zero heads. (Please try not to put the numbers directly into the formula. Make reference to the data in A32:A40
). Fill that formula down, and see whether it gives you the counts of how many times you got 1, 2, 3, etc. heads. A31
to B40
(more or less), you've got some data. Try to figure out a way to make a graph of that data, with the "values" on the x-axis and the value-counts on the y-axis. It could be a line graph, or bar graph, or anything else. Insert the graph onto this sheet.Now we're going to go over some more data importing. We have been importing XML and CSV files so far, but IMPORTXML
also allows us to import HTML files.
The first thing that we need to do is to disable Javascript on our browser. Javascript is a programming language that generates fancy effects (like pages that change content without reloading, etc). However, IMPORTXML
does not understand Javascript. Therefore, we need to see the webpage just as IMPORTXML
would "see" it.
Note: We could disable Javascript completely in our browser, but the problem is that Google Sheets requires Javascript to run! Therefore, we need to do this:
https://docs.google.com
. Make sure that the corresponding behavior is "Allow". Click on "Done" to close the window.
Go to the ImportExamples
tab of the spreadsheet. At the top is an example of performing a google search and using importXML
to extract the data that was returned (i.e., all the web pages you could click on). The xpath part of the formula is more complex than anything we saw in class; it's
//h3[@class='r']/a/@href
Let's interpret that. The first part, //h3
, says to search for anything with the tag h3
, which is the style used to display results in a google search.
To see this, go to Google (go to www.google.com, do not search from the address bar), search for "PolyU", and when the results show up, note two things:
https://www.google.com/search?q=PolyU
somewhere within it.h3
tag, with the attribute class="r"
.
B3
and examine the XPath of the IMPORTXML
command. The first part says "look for h3 tags with the class attribute equal to 'r'". The second part says "and within there, look for 'a' tags, and tell me the value of the 'href' attribute." It turns out that "a" tags are hyperlinks, and the "href" attribute is the URL for the linked page.
B8
and note that it contains the IMPORTXML
command. The cells from B8:B21
contain the search results from the Google search. These search results were imported by the IMPORTXML
command.
data-href
attribute, but we cannot access these because of a bug in Google Spreadsheets (it cannot correctly parse hypenated names). To get around this, in cells C8:C21
, we use substitute(cell name or string, target, replacement) to replace every instance of "/url?q=" with "", effectively getting rid of it. We also use substitute to get the right urls for image or news searches google returns, since those behave a bit differently than regular url results.
B8:B21
, you will see that almost all of them contain &sa=U&ved
. Everything after this should be stripped away in order to get the correct URL. That's what the cells in D8:D21
do.
B5
to instead return the number of "results" listed at the top of the page (about 641,000 when we tried it!). Once again, use "Inspect Element" to help you. A BIG hint: if you right-click on the needed data in the "Elements" window that comes up, you will be able to copy the XPath directly. Compare that with the XPath that you would construct yourself.
In cell A1
, use the IMPORTDATA
command, using the following URL as the data source: "http://resource.data.one.gov.hk/mtr/data/mtr_lines_fares.csv"
Some words are more common than others; a quick way to tell is to do a quick google search. "Synchronicity" has about 5 million results, while "Diachronicity" has only about 9000.
Open a new tab on your spreadsheet. In that tab, build a spreadsheet that lets a viewer enter two words and then shows which one has more results in a Google search.
To find the URL, you'll do a Google search for some word and see what the URL is, and then try to alter that to handle other words. But beware: if you type your query in the address bar (as I often do), the resulting URL will look like this:
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=synchronicity
or possibly something even more complex. On the other hand, if you go to google.com and type your query there, the URL of the results page will be more like
https://www.google.com/search?safe=off&q=synchronicity
The part about safe=off
can be safely removed, so all you need is .../search?q=synchronicity
. Use this simpler form. Google Spreadsheets doesn't seem to be able to read from the more complicated URL for the results page.
Share the following spreadsheet with PolyuComp1D04@gmail.com
Name_StudentID_HW1-4
In the above file name, "Name" should be replaced with your official name and "StudentID" with your studentID. If it is not, we will take off points. Make sure every task has been completed.