.csv
ending, although we added a .txt
so you could see what it actually looks like. (Files with a .csv
ending are automatically downloaded or opened up in Google Spreadsheets by Chrome, so you can't see the file contents directly.) You should be able to see how it correlates to the XML data you saw on the website. We have removed some of the excess data fields before creating the CSV file, so not every field from the XML is present in the CSV file.We now have a CSV file that encodes senate data. We want to get this data into a spreadsheet.
Go to Google Sheets. Then go to File > Import > Upload
.
Pick the Legco data file that you downloaded. You will be prompted to choose how the file will be imported. Select Insert new sheet(s)
as your import action, tab
as the separator character, and click import
. A new sheet should open up containing data from your CSV file.
=importXML(B3, "//vote[@number=1]/*/member/@name-ch")
. In a moment or two, the rest of the column should be filled with legislator names. importXML
to import each legislator's votes for vote number 1. Ask the TA to check it for you.
importXML
that will use the data in $B$3
and $E3
to import in the vote for Vote 1 for Jasper Tsang Yok Sing. Check that the vote is really correct by looking at the XML document. $
signs around importXML
expression? Would it be possible to make this even easier?
vote-date
number
motion-en
motion-ch
(optional, if you want to keep it)name
vote
motion-en
column. Some of the cells in this column contain the words "FIRST PROPOSED RESOLUTION", or "THE FIRST PROPOSED RESOLUTION..." or "MOTION MOVED BY...". We want to keep only the rows with similar text in the motion-en
column. Hint: this should only take a few minutes; it does not involve deleting the unwanted rows one by one. Also, it's possible to sort rows in alphabetical order. Hint 2: If you've learned elsewhere about "Filter"s for data, please don't use them; the results may not work well with later steps! Hint 3: You can select a range of rows by selecting top row with the Shift
key held down, and then select the bottom row with the Shift
key still held down.vote_id
column in cell F1
. In row 2 of the vote_id
column, write a formula to combine the “vote-date” for this row with the “number” for this row, placing a colon between them. Use Fill down
to apply this formula to all the other rows.
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | vote-date | number | motion-en | name | vote | vote-id | 2 | 20130320 | 1 | FIRST PROPOSED RESOLUTION... | 何俊仁 Albert HO | Present | 20130320:1 | 3 | 20130320 | 1 | FIRST PROPOSED RESOLUTION... | 何俊賢 Steven HO | Yes | 20130320:1 | 4 | ... | ... | ... | ... | ... | ... |
vote_id
, and title it numerical-vote
. In the second row of that column, enter a formula that produces 1
if a legislator voted “No,” and a 2
if s/he voted “Yes.”, and zero for anything else. Verify that nothing went wrong and that no cell contains the word “ERROR.” Hint: Try writing an expression that uses multiple IF
functions "nested" within each other. The table should now look like this:
A | B | C | D | E | F | G | |
---|---|---|---|---|---|---|---|
1 | vote-date | number | motion-en | name | vote | vote-id | numerical-vote | 2 | 20130320 | 1 | FIRST PROPOSED RESOLUTION... | 何俊仁 Albert HO | Yes | 20130320:1 | 2 | 3 | 20130320 | 1 | FIRST PROPOSED RESOLUTION... | 何俊賢 Steven HO | Yes | 20130320:1 | 2 | 4 | ... | ... | ... | ... | ... | ... | ... |
Data > Pivot Table
from the toolbar. This will open up a new sheet for a pivot table based on the information you selected. If you don't know how to use pivot tables, review the tutorial.Name
, the column labels be vote_id
, and the value in the cells be the numerical_vote
that you just created.Summarize by:
menu. Try to play with sorting your values and see just how many different ways you can sort your values - you'll probably be using more of these options later. Grand Total
row and column by unchecking Show totals
in the rows and columns boxes Format > Conditional Formatting
from the toolbar. This will open up a the toolbar that lets you add rules to format cells depending on some formula.