COMP1D04

From Scratch to Apps: Foundations of Computational Thinking and Literacy for Problem Solving

Grace Ngai

Due Oct 19, 2015, 23:59

For the following problems you may discuss the concepts that will help solve these problems with classmates and course staff. You may *not* simply copy down the answers of your classmates as that is a violation of the collaboration policy. The one exception to this rule is Task 5, marked as “**(Independent)**”. You may discuss independent problems *with course staff only*.

Copy this spreadsheet to your Google Drive and name it Name_StudentID_HW1-4.

In class last week, we built a spreadsheet that took all the voting records (+1 for Yes, -1 for No, 0 for missing, etc.) and compared them pairwise. The comparison used the idea that if you multiply two voting records and sum up the resulting products, you get the "agree minus disagree" number. (If you take absolute values before multiplying, you get "agree + disagree" instead). This notion of sums of pairwise products comes up over and over again in mathematics, economics, etc. It's almost never done with pairs of rows from a table, but instead with a row from one table and a column from another. When we do this, each table (which should be a table of NUMBERS) gets called a MATRIX (plural: matrices), and the process of computing all possible row-column sum-products is called "matrix multiplication." You're going to play with that a little bit in this problem.

- In the "Matrix Practice" sheet, you'll see matrix
`A`

, in pink, whose rows are labeled with the names of people, and whose columns are labeled with items for purchase. You'll also see matrix`B`

, a table of prices (in dollars per item) for the three items, for each of two months. Notice that the number of COLUMNS of`A`

is the same as the number of ROWS of`B`

.

Here is a description of how to compute the matrix product`A`

*`B`

:

We place the matrices as shown in`B12`

through`F16`

: the first one to the left, the second one up and to the right. We then fill in the space (marked in green) in the lower right. Each cell in the green space lies in the same row as some row of matrix`A`

and in the same column as some column of matrix`B`

. We place in this cell the`SUMPRODUCT`

of that row and that column. When you use SUMPRODUCT, you can pass either two rows or two columns. To have the function to work with a row and a column, use the TRANSPOSE function in one of the arguments, transforming a row into a column or a column into a row. Your function should look like:

`SUMPRODUCT("row", TRANSPOSE("column"))`

or

`SUMPRODUCT(TRANSPOSE("row"), "column")`

where "row" and "column" are specified in the matrices. - Do that now for the upper left corner of the green area, and use fill handles to extend the definition to the rest of the green area.
The
`SUMPRODUCT`

formula you wrote wasn't too complex, but there's a far simpler way to produce the same answer. - In the second area (rows 20-24), enter, in cell
`E23`

, the formula`= MMULT(something, something else )`

where the "something" is the range that defines the pink matrix, and the "something else" is the range defining the blue matrix. The results should be the same as those in the green area above. `MMULT`

doesn't actually require that its two arguments (the things inside the parentheses) be placed in the way we placed them for the`SUMPRODUCT`

computation: they can be anywhere. You'll see that in`E7`

to`F8`

, we've made another green box. Enter a formula in`E7`

that will multiply the two original matrices (just to the left of the green box) and produce a result that should be the same as the one you got for the last two steps.

If we write the ROW labels for matrix `A`

and the COLUMN labels for matrix `B`

near our green matrix, the top left corner is `"Robin, July"`

. What does the number 20 that's in that spot mean? Well, it was computed as
`(1 x 2) + (2 x 3) + (3 x 3)`

so it's the total cost for Robin to buy her basket of goods in july: 1 soap at $2, 2 eggs at $3, and 3 pears at $4.

- At the bottom of the page, fill in the two yellow cells by looking up the appropriate values in any of the green arrays.

Open the "Random Coin Flips" tab of the spreadsheet. Using the `RANDBETWEEN()`

function, we've made 25 rows of 8 random numbers each, representing coin-flips: 0 is heads, 1 is tails. You're going to make a summary of these results.
You might expect that if you flip a coin 8 times, on average it'll come up heads four times.

- Test that conjecture by entering, in column
`I`

, a formula to compute how many heads you got in each 8-flip trial. It'd be best if you entered just one formula in cell`I2`

, and then filled down, of course. - What's the smallest number you could possibly get in column
`I`

? (Hint: zero). The largest? Fill in your answers in the spaces provided. - In cell
`A31`

, enter the word "Value", and beneath it enter a list of the possible values, from smallest to largest, starting with the smallest in cell A32. - In cell
`B31`

, enter the label "Value Count", and in cell`B32`

, enter formula to count how many times, in the 25 experiments, you got exactly zero heads. (Please try not to put the numbers directly into the formula. Make reference to the data in`A32:A40`

). Fill that formula down, and see whether it gives you the counts of how many times you got 1, 2, 3, etc. heads. - In cells
`A31`

to`B40`

(more or less), you've got some data. Try to figure out a way to make a graph of that data, with the "values" on the x-axis and the value-counts on the y-axis. It could be a line graph, or bar graph, or anything else. Insert the graph onto this sheet.

Now we're going to go over some more data importing. We have been importing XML and CSV files so far, but `IMPORTXML`

also allows us to import HTML files.

The first thing that we need to do is to disable Javascript on our browser. Javascript is a programming language that generates fancy effects (like pages that change content without reloading, etc). However, `IMPORTXML`

does not understand Javascript. Therefore, we need to see the webpage just as `IMPORTXML`

would "see" it.

*Note: We could disable Javascript completely in our browser, but the problem is that Google Sheets requires Javascript to run! Therefore, we need to do this:
*

- Follow the instructions on this page. Click on "Do not allow any site to run Javascript"
- Do not click "Done" yet. Click on "Manage Exceptions..." in the Javascript section.
- We are going to tell Google Chrome that we will allow Google Docs to run Javascript, even though nobody else is allowed to do so. In the window that comes up, in "Hostname Pattern", put in
`https://docs.google.com`

. Make sure that the corresponding behavior is "Allow". Click on "Done" to close the window. *Click on "Done" to close the Content Settings window. Javascript should now be disallowed on your browser. Note that this will "break" many sites. To allow it again when you're done with this hoemwork, you can go back into Settings and reenable it.*

Go to the ` ImportExamples`

tab of the spreadsheet. At the top is an example of performing a google search and using `importXML`

to extract the data that was returned (i.e., all the web pages you could click on). The xpath part of the formula is more complex than anything we saw in class; it's

`//h3[@class='r']/a/@href`

Let's interpret that. The first part, `//h3`

, says to search for anything with the tag `h3`

, which is the style used to display results in a google search.

To see this, go to Google (go to www.google.com, do not search from the address bar), search for "PolyU", and when the results show up, note two things:

- The URL has (probably) changed to a long, complicated string of text that contains
`https://www.google.com/search?q=PolyU`

somewhere within it. - Place your cursor above the first site returned (the PolyU official page), right click and select "Inspect Element". A new panel will open down below showing you all about this element: it has an
`h3`

tag, with the attribute`class="r"`

. - Look in cell
`B3`

and examine the XPath of the`IMPORTXML`

command. The first part says "look for h3 tags with the class attribute equal to 'r'". The second part says "and within there, look for 'a' tags, and tell me the value of the 'href' attribute." It turns out that "a" tags are hyperlinks, and the "href" attribute is the URL for the linked page. - Now click on cell
`B8`

and note that it contains the`IMPORTXML`

command. The cells from`B8:B21`

contain the search results from the Google search. These search results were imported by the`IMPORTXML`

command. - We need to do a little processing on the search results returned by Google:
- The URLs found by this XPath begin with a "/url..." because they are
*relative*URLs. In other words, all these links require "https://www.google.com" prepended to them in order to work. The absolute URL for each result is kept in the`data-href`

attribute, but we cannot access these because of a bug in Google Spreadsheets (it cannot correctly parse hypenated names). To get around this, in cells`C8:C21`

, we use substitute(cell name or string, target, replacement) to replace every instance of "/url?q=" with "", effectively getting rid of it. We also use substitute to get the right urls for image or news searches google returns, since those behave a bit differently than regular url results. - Another problem is that the URLs contain a command which instructs the browser to load up the
*cached*version (a pre-saved version of the page) if the original site is inaccessible for some reason. If you look closely at the strings in`B8:B21`

, you will see that almost all of them contain`&sa=U&ved`

. Everything after this should be stripped away in order to get the correct URL. That's what the cells in`D8:D21`

do.

- The URLs found by this XPath begin with a "/url..." because they are
- We want you to fill in the formula in cell
`B5`

to instead return the number of "results" listed at the top of the page (about 641,000 when we tried it!). Once again, use "Inspect Element" to help you. A BIG hint: if you right-click on the needed data in the "Elements" window that comes up, you will be able to copy the XPath directly. Compare that with the XPath that you would construct yourself. -
Let's try importing another kind of data. Create a new sheet called "MTR" and import the MTR routes and fares.
In cell

What we have just imported is online CSV data.`A1`

, use the`IMPORTDATA`

command, using the following URL as the data source:`"http://resource.data.one.gov.hk/mtr/data/mtr_lines_fares.csv"`

Some words are more common than others; a quick way to tell is to do a quick google search. "Synchronicity" has about 5 million results, while "Diachronicity" has only about 9000.

Open a new tab on your spreadsheet. In that tab, build a spreadsheet that lets a viewer enter two words and then shows which one has more results in a Google search.

To find the URL, you'll do a Google search for some word and see what the URL is, and then try to alter that to handle other words. But beware: if you type your query in the address bar (as I often do), the resulting URL will look like this:

`https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=synchronicity`

or possibly something even more complex. On the other hand, if you go to google.com and type your query there, the URL of the results page will be more like

`https://www.google.com/search?safe=off&q=synchronicity`

The part about `safe=off`

can be safely removed, so all you need is `.../search?q=synchronicity`

. Use this simpler form. Google Spreadsheets doesn't seem to be able to read from the more complicated URL for the results page.

- Clear instructions for use at the top of the sheet
- A title at the top of the sheet
- A clear location for the user to place his/her words
- A clear indication of which is more common; perhaps you can use conditional formatting to show this
- A "note" attached to any cell where a complex computation is happening, giving a brief explanation of what's going on.
- A nice name for the tab.

Share the following spreadsheet with `PolyuComp1D04@gmail.com`

`Name_StudentID_HW1-4`

In the above file name, "Name" should be replaced with your official name and "StudentID" with your studentID. If it is not, we will take off points. Make sure every task has been completed.