Activity 1-4

Adapted from Brown University CSCI0931. Used with Permission.

Task 1: Building a small similarity matrix

  1. Copy and save ACT1-4_starter to your Google Drive. Open it in your Google Drive.
  2. This sheet contains most of the work that you've done so far (except for the "compare with any legislator" thing that we did in the last activity). We're going to use it to build a table that shows, for any pair of legislators, how similar their records are. That means that we'll build a table where each row is labelled by a legislator, and each column is labelled by a legislator, and the cell in the 12th row, 3rd column will tell us how much the 12th legislator and the 3rd legislator are alike.
    The spreadsheet has many tabs; using the first few, you'll solve the "how similar" problem for a very small group of legislators: just six legislators and thirteen votes --- small enough that you can easily work out the similarity between two of them by hand. When you get the computations for this small group all worked out, you can extend to working with the full set of data.
    We've done a few things to make your life simpler: The sheet called 6LegislatorsPivotTable will be the starting point for your work today. Open that tab and look at the data. Which two legislators look most similar to you?
  3. Suppose we took the first two rows of the table -- Starry and Lee Cheuk-yan's votes -- and did a SUMPRODUCT with them. As discussed earlier, this would count the number of agreements minus the number of disagreements. What value do you get, doing this by hand? Check with your neighbor to be sure you agree.
  4. If we did the same thing with the absolute values of those two rows, we'd get the number of agreements PLUS the number of disagreements. What value do you get, doing this by hand?
  5. Because we're later going to want the sum you computed in the last problem, we're going to copy the pivot table, but replace the numbers with their absolute values. Do this in the tab called 6LAbsPivotTable.
  6. Now, in the 6LDifference tab, make a table whose row-labels and column-labels are both those from the pivot table (i.e., the rows are labelled by legislator names, and so are the columns). Use FORMULAS to create these labels rather than copying and pasting. Hint: "Transpose" may help you.
  7. Now use an MMULT formula to fill in the agree-minus-disagree numbers for all pairs of senators in that table.
  8. On the 6LSum tab, make a similar table, with the same row and column labels, but with the entries being the number of agreements-plus-disagreements for each pair of senators.
  9. Pick an entry of this table at random, and compare the value in the SUM table to the value in the DIFFERENCE table. Which should be a larger number? Is it?
  10. Finally, in the 6LSimilarityTable tab, once again use legislator-names for row and column labels, and for each legislator-legislator pair, put the ratio of agree-minus-disagree to agree-plus-disagree in the corresponding row and column. (Do this for the first legislator-legislator pair, i.e., Starry-Lee Cheuk-yan, and then use "fill" to do the others. Do you need to worry about absolute references versus relative references as you fill?)
  11. With your neighbor, calculate by hand what the three entries above the diagonal should be, and make sure that these are the answers you got.

Task 2: Building the big similarity matrix

  1. Repeat each of the tasks you just did for four legislators, but do it for the much larger group of legislators represented in the LegcoData tab. The pivot table for the legislators is in the FlippedPivotTable tab, and the space you need for the subsequent computations is provided by other tabs with parallel names. Be sure, in each tab, to put the legislators' names in the first row and column.
  2. When you're done, look at the SimilarityVisulization tab, where a copy of your table is shown, but with the cells colored by value. Dark green is strong agreement, dark red is strong disagreement. Explain why the diagonal is all dark green.
  3. The color coding was done using a kind of crude conditional formatting: values greater than 0.8 were made dark green; those greater than 0.6 we a less dark green, and so on. Click on cell B2 and go to Format > Conditional Formatting to see how this was done. The order of the rules is important: if a cell satisfies some rule, then a color gets used and the rest of the rules are ignored. What would happen if we swapped the first two rules and the value in the cell was 0.9? What color would be used?
  4. This kind of conditional formatting is not very sophisticated. Excel has cleverer formatting rules, which we'll use to better understand the data. For now, just look at the data and colors and ask "What does a dark green square that's NOT on the diagonal tell me? What about a dark red square?"
  5. (To get a better sense of the data, you can try to reorder the legislators so that the "similar" blocks are next to each other. The easiest way to do this is to make a new FlippedPivotTable (call it FlippedPivotTableOrig) and then to make FlippedPivotTable a values-only copy of FlippedPivotTableOrig. You can then swap rows and columns, and the parallel sheets should update to reflect the new ordering. And, if you ever get into trouble and mess up the sheet, all you need to do is to re-copy the values over, and you get the original sheet back.)