1 00:00:05,300 --> 00:00:11,500 [Music] 2 00:00:12,480 --> 00:00:16,320 right now a special pdns presenter 3 00:00:14,960 --> 00:00:18,640 antony green 4 00:00:16,320 --> 00:00:20,320 antony is a veritable national treasure 5 00:00:18,640 --> 00:00:22,640 he's best known as chief election 6 00:00:20,320 --> 00:00:24,480 analyst with the australian broadcasting 7 00:00:22,640 --> 00:00:26,160 corporation and is the face of 8 00:00:24,480 --> 00:00:27,359 television election coverage in 9 00:00:26,160 --> 00:00:29,119 australia 10 00:00:27,359 --> 00:00:31,359 during the pdns antony will be 11 00:00:29,119 --> 00:00:32,559 presenting election night analysis art 12 00:00:31,359 --> 00:00:34,239 or science 13 00:00:32,559 --> 00:00:36,960 and if we have antony there i would 14 00:00:34,239 --> 00:00:38,800 like to throw over to you 15 00:00:36,960 --> 00:00:40,640 thank you miles now just make sure 16 00:00:38,800 --> 00:00:41,520 everyone's hearing me good 17 00:00:40,640 --> 00:00:43,600 um 18 00:00:41,520 --> 00:00:45,840 election night analysis art or science 19 00:00:43,600 --> 00:00:47,920 well the answer i'll answer 20 00:00:45,840 --> 00:00:50,800 right at the start the answer is science 21 00:00:47,920 --> 00:00:52,480 um it's not guesswork it's not you know 22 00:00:50,800 --> 00:00:55,039 hunches and feelings 23 00:00:52,480 --> 00:00:55,920 it's hardcore maths and science 24 00:00:55,039 --> 00:00:57,920 and 25 00:00:55,920 --> 00:00:59,280 you know at that terrible moment when 26 00:00:57,920 --> 00:01:00,239 it's looking really close and you're not 27 00:00:59,280 --> 00:01:02,160 sure what's going to happen there's a 28 00:01:00,239 --> 00:01:05,199 bit of experience and i've been here 29 00:01:02,160 --> 00:01:07,040 before type thing the art the feeling 30 00:01:05,199 --> 00:01:08,799 um but you know when it comes down to it 31 00:01:07,040 --> 00:01:10,560 it's maths now 32 00:01:08,799 --> 00:01:13,200 what the whole i might just start 
with 33 00:01:10,560 --> 00:01:14,560 some credits first um 34 00:01:13,200 --> 00:01:16,080 pictures in this presentation are all 35 00:01:14,560 --> 00:01:18,320 from the australian electoral commission 36 00:01:16,080 --> 00:01:20,479 who have very useful flickr site pictures 37 00:01:18,320 --> 00:01:22,159 the graphs have been prepared by me 38 00:01:20,479 --> 00:01:24,159 a statistical credit for dr ross 39 00:01:22,159 --> 00:01:26,720 cunningham who's an adjunct professor at 40 00:01:24,159 --> 00:01:29,520 anu background in statistical analysis 41 00:01:26,720 --> 00:01:31,200 and he developed many of the statistical 42 00:01:29,520 --> 00:01:33,200 methods we used to analyze elections in 43 00:01:31,200 --> 00:01:35,200 australia back in the 1980s when there 44 00:01:33,200 --> 00:01:37,520 was a lot less computer power and a lot 45 00:01:35,200 --> 00:01:39,119 less data my thanks to the abc who 46 00:01:37,520 --> 00:01:41,200 originally hired me for six months as a 47 00:01:39,119 --> 00:01:43,119 researcher my background was working as a 48 00:01:41,200 --> 00:01:44,640 computer programmer back in the 80s and 49 00:01:43,119 --> 00:01:46,720 then a background in political science 50 00:01:44,640 --> 00:01:48,320 from study and so i was the right 51 00:01:46,720 --> 00:01:51,119 combination of different skills when the 52 00:01:48,320 --> 00:01:52,960 abc was looking for someone in 1989 and 53 00:01:51,119 --> 00:01:54,880 30 years later they think i'm still 54 00:01:52,960 --> 00:01:55,840 doing useful work 55 00:01:54,880 --> 00:01:57,200 so 56 00:01:55,840 --> 00:02:00,000 election night 57 00:01:57,200 --> 00:02:01,520 it's about trying to work out 58 00:02:00,000 --> 00:02:03,119 take what these people are writing on 59 00:02:01,520 --> 00:02:04,719 their bits of paper 60 00:02:03,119 --> 00:02:06,159 and turning it into who's going to run 61 00:02:04,719 --> 00:02:07,360 government for the next three years 62 00:02:06,159 --> 00:02:09,759 that's what 
we're doing on election 63 00:02:07,360 --> 00:02:13,440 night we're reporting all those bits of 64 00:02:09,759 --> 00:02:15,120 paper added summed up sent through to us 65 00:02:13,440 --> 00:02:16,720 when can we work out 66 00:02:15,120 --> 00:02:18,959 who's won the election 67 00:02:16,720 --> 00:02:21,440 now the long history of reporting 68 00:02:18,959 --> 00:02:23,520 election nights in australia here's one 69 00:02:21,440 --> 00:02:25,599 from wa um something that western 70 00:02:23,520 --> 00:02:28,000 australia thinks was a terrible mistake 71 00:02:25,599 --> 00:02:29,599 which was joining federation in those 72 00:02:28,000 --> 00:02:31,840 days the media used to run these big 73 00:02:29,599 --> 00:02:33,280 tally boards and they would report the 74 00:02:31,840 --> 00:02:35,040 results like this people would come in 75 00:02:33,280 --> 00:02:37,519 to watch results going up 76 00:02:35,040 --> 00:02:38,959 that process eventually i mean in those 77 00:02:37,519 --> 00:02:40,959 days 78 00:02:38,959 --> 00:02:42,800 newspapers tend to have had big 79 00:02:40,959 --> 00:02:43,920 telephony services more than the 80 00:02:42,800 --> 00:02:46,000 government did 81 00:02:43,920 --> 00:02:47,120 so that's why the newspapers used to use 82 00:02:46,000 --> 00:02:49,360 do this sort of stuff they could get the 83 00:02:47,120 --> 00:02:51,200 telegrams in quicker 84 00:02:49,360 --> 00:02:54,000 than the government 85 00:02:51,200 --> 00:02:55,440 by the 1990s this was the 1990 tally 86 00:02:54,000 --> 00:02:57,599 room in canberra 87 00:02:55,440 --> 00:02:59,440 huge great board with numbers on it 88 00:02:57,599 --> 00:03:01,120 which people used to read the old 89 00:02:59,440 --> 00:03:02,800 hardcore had been doing it for decades 90 00:03:01,120 --> 00:03:04,640 would be down front looking at the 91 00:03:02,800 --> 00:03:06,239 numbers and they could tell from four 92 00:03:04,640 --> 00:03:07,920 digit numbers who was winning or not 93 
00:03:06,239 --> 00:03:09,200 something i've never been able to do i'm 94 00:03:07,920 --> 00:03:11,680 afraid i have to use statistical 95 00:03:09,200 --> 00:03:11,680 analysis 96 00:03:11,840 --> 00:03:15,760 this is the back of the tally board in 97 00:03:13,280 --> 00:03:17,360 2001 there's a bunch of names of the 98 00:03:15,760 --> 00:03:18,800 candidates the same as on the front 99 00:03:17,360 --> 00:03:20,319 these little boards have all got little 100 00:03:18,800 --> 00:03:22,159 pins and they used to put little numbers 101 00:03:20,319 --> 00:03:23,360 on them and then flip the board around 102 00:03:22,159 --> 00:03:25,519 and that's what people used to read now 103 00:03:23,360 --> 00:03:27,519 when i came into television we'd stopped 104 00:03:25,519 --> 00:03:29,680 shooting the board it had just become a 105 00:03:27,519 --> 00:03:31,840 big backdrop but 106 00:03:29,680 --> 00:03:33,760 for another two decades after 1990 they 107 00:03:31,840 --> 00:03:36,239 were still doing this board no one was 108 00:03:33,760 --> 00:03:38,000 shooting it it was of much less use it was 109 00:03:36,239 --> 00:03:40,080 entirely done as a backdrop for 110 00:03:38,000 --> 00:03:42,799 television 111 00:03:40,080 --> 00:03:44,560 now australian federal elections 112 00:03:42,799 --> 00:03:46,720 for those who don't know the election 113 00:03:44,560 --> 00:03:47,920 process in australia a really quick 114 00:03:46,720 --> 00:03:49,599 run through that 115 00:03:47,920 --> 00:03:51,360 um you have two chambers elected on the 116 00:03:49,599 --> 00:03:54,239 same day the house of representatives and 117 00:03:51,360 --> 00:03:56,400 the senate the house is elected from 151 118 00:03:54,239 --> 00:03:58,159 single member districts the senate is 119 00:03:56,400 --> 00:04:00,640 elected by proportional representation 120 00:03:58,159 --> 00:04:02,640 from six states and two territories i 121 00:04:00,640 --> 00:04:04,239 won't do anything on the senate tonight 122 
00:04:02,640 --> 00:04:07,040 each house division has about a hundred 123 00:04:04,239 --> 00:04:09,439 thousand voters voting is compulsory 124 00:04:07,040 --> 00:04:10,720 turnout is usually above ninety percent 125 00:04:09,439 --> 00:04:13,360 around the country there's seven 126 00:04:10,720 --> 00:04:16,479 thousand polling places plus five plus 127 00:04:13,360 --> 00:04:17,759 pre-poll centers 400 plus mobile 128 00:04:16,479 --> 00:04:19,680 voting 129 00:04:17,759 --> 00:04:22,000 teams not trends and i've got my first 130 00:04:19,680 --> 00:04:23,919 spelling error and all voting centers 131 00:04:22,000 --> 00:04:26,240 are counted on election night postals 132 00:04:23,919 --> 00:04:28,720 and absent votes are 133 00:04:26,240 --> 00:04:29,840 added after election day 134 00:04:28,720 --> 00:04:31,680 australia uses what's called 135 00:04:29,840 --> 00:04:33,520 preferential voting or rank ordering 136 00:04:31,680 --> 00:04:35,680 voting there's a ballot paper on the 137 00:04:33,520 --> 00:04:37,680 right for the electorate of higgins i think 138 00:04:35,680 --> 00:04:39,680 that's 2016. 
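[editor's sketch] The preferential count the speaker introduces here can be sketched in a few lines of Python. This is a minimal illustration, not the AEC's or the ABC's software: it assumes every ballot ranks every candidate, it ignores ties for last place, and the function names and sample ballots are made up for the example. It tallies first preferences, then repeatedly excludes the lowest candidate and moves each of their ballots to its next available preference until two candidates remain.

```python
from collections import Counter


def tally_next_preferences(ballots, remaining):
    """Count each ballot for its highest-ranked candidate still in the count."""
    tally = Counter({candidate: 0 for candidate in remaining})
    for ballot in ballots:
        for candidate in ballot:
            if candidate in remaining:
                tally[candidate] += 1
                break
    return tally


def two_candidate_preferred(ballots):
    """Exclude the lowest candidate until only two remain.

    Assumes fully numbered ballots; a tie for last place is broken
    arbitrarily, which a real count would not do.
    """
    remaining = {candidate for ballot in ballots for candidate in ballot}
    tally = tally_next_preferences(ballots, remaining)
    while len(remaining) > 2:
        lowest = min(remaining, key=lambda candidate: tally[candidate])
        remaining.remove(lowest)
        tally = tally_next_preferences(ballots, remaining)
    return dict(tally)


# Hypothetical ballots: 40 voters rank A, B, C; 35 rank B, C, A; 25 rank C, B, A.
ballots = [["A", "B", "C"]] * 40 + [["B", "C", "A"]] * 35 + [["C", "B", "A"]] * 25
result = two_candidate_preferred(ballots)
print(result)
```

In this made-up contest A leads on first preferences with 40 votes, but C is excluded first and all 25 of C's ballots flow to B, so B finishes ahead after preferences.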
139 00:04:37,680 --> 00:04:41,919 voters must number all squares on that 140 00:04:39,680 --> 00:04:42,880 ballot paper in a consecutive sequence of 141 00:04:41,919 --> 00:04:44,639 numbers 142 00:04:42,880 --> 00:04:46,080 to win a candidate must receive fifty 143 00:04:44,639 --> 00:04:47,759 percent of the vote 144 00:04:46,080 --> 00:04:49,840 if no candidate receives more than fifty 145 00:04:47,759 --> 00:04:51,440 percent of the vote then the lowest 146 00:04:49,840 --> 00:04:53,040 candidate is excluded their ballot 147 00:04:51,440 --> 00:04:54,960 papers re-examined for the next 148 00:04:53,040 --> 00:04:56,880 available preference and the tally of 149 00:04:54,960 --> 00:04:58,560 those preferences is transferred to 150 00:04:56,880 --> 00:04:59,440 another candidate in 151 00:04:58,560 --> 00:05:01,759 the count 152 00:04:59,440 --> 00:05:04,000 all the preferences that are counted are 153 00:05:01,759 --> 00:05:05,759 what is written on the ballot papers 154 00:05:04,000 --> 00:05:07,680 there's no party control no candidate 155 00:05:05,759 --> 00:05:09,520 control over that it's what voters write 156 00:05:07,680 --> 00:05:10,880 on the ballot paper 157 00:05:09,520 --> 00:05:13,360 the process of excluding and 158 00:05:10,880 --> 00:05:14,639 transferring continues until only two 159 00:05:13,360 --> 00:05:17,199 candidates remain and one of the 160 00:05:14,639 --> 00:05:19,600 advantages in australia is we do come 161 00:05:17,199 --> 00:05:21,759 down to only two candidates which makes 162 00:05:19,600 --> 00:05:24,000 all the mathematics of the modeling much 163 00:05:21,759 --> 00:05:26,000 easier than in many other countries and 164 00:05:24,000 --> 00:05:28,000 the fact the full distribution of 165 00:05:26,000 --> 00:05:30,800 preferences doesn't take place till 166 00:05:28,000 --> 00:05:32,160 two weeks after the election so up until 167 00:05:30,800 --> 00:05:34,400 two weeks after the election when they 168 
00:05:32,160 --> 00:05:36,320 formally declare winners we're working off 169 00:05:34,400 --> 00:05:38,720 preliminary figures especially on 170 00:05:36,320 --> 00:05:41,120 election night now the process of 171 00:05:38,720 --> 00:05:43,360 counting begins with this they tip the 172 00:05:41,120 --> 00:05:45,039 ballot papers out of the boxes now 173 00:05:43,360 --> 00:05:46,400 scrutineers are able to observe the 174 00:05:45,039 --> 00:05:47,440 closing and the opening of the ballot 175 00:05:46,400 --> 00:05:49,440 boxes 176 00:05:47,440 --> 00:05:51,680 uh they're able to observe all this 177 00:05:49,440 --> 00:05:54,400 process then they just tip these ballot 178 00:05:51,680 --> 00:05:55,919 boxes onto the table and that's when 179 00:05:54,400 --> 00:05:57,360 the counting starts 180 00:05:55,919 --> 00:05:59,360 the first thing they do on election 181 00:05:57,360 --> 00:06:00,720 night is they count 182 00:05:59,360 --> 00:06:02,000 each of them is counted on the night 183 00:06:00,720 --> 00:06:03,840 after 6 pm 184 00:06:02,000 --> 00:06:06,319 the tallying of first preferences by 185 00:06:03,840 --> 00:06:07,840 candidate is done by hand it is not 186 00:06:06,319 --> 00:06:10,000 scanned it is not the american 187 00:06:07,840 --> 00:06:12,400 electronic voting it is not scanning of 188 00:06:10,000 --> 00:06:13,919 ballot papers it is done by hand 189 00:06:12,400 --> 00:06:15,680 candidates appoint scrutineers to 190 00:06:13,919 --> 00:06:17,120 observe the count they are able to 191 00:06:15,680 --> 00:06:18,800 observe the sealing and the opening of the 192 00:06:17,120 --> 00:06:21,120 ballot boxes and they are able to 193 00:06:18,800 --> 00:06:22,560 observe the count at all times and see 194 00:06:21,120 --> 00:06:24,560 the ballot papers they're not allowed to 195 00:06:22,560 --> 00:06:26,560 touch the ballot papers but they are 196 00:06:24,560 --> 00:06:28,479 able to challenge votes though on 197 00:06:26,560 --> 00:06:30,479 election 
night they're more trying to 198 00:06:28,479 --> 00:06:32,000 look at preference flows 199 00:06:30,479 --> 00:06:35,520 and get the tallies for their own 200 00:06:32,000 --> 00:06:37,039 internal purposes as a political party 201 00:06:35,520 --> 00:06:39,919 the number of ballot papers is also 202 00:06:37,039 --> 00:06:42,160 verified against the number of 203 00:06:39,919 --> 00:06:43,840 ballot papers issued and then once this 204 00:06:42,160 --> 00:06:44,639 is all done the numbers are phoned 205 00:06:43,840 --> 00:06:46,800 through 206 00:06:44,639 --> 00:06:48,479 to wherever the data entry operates for 207 00:06:46,800 --> 00:06:50,720 the electoral commission and the 208 00:06:48,479 --> 00:06:52,960 tallies for that polling place are 209 00:06:50,720 --> 00:06:56,160 entered so this is how the process 210 00:06:52,960 --> 00:06:58,639 occurs it's done the old-fashioned way 211 00:06:56,160 --> 00:07:00,639 bits of ballot paper put onto piles 212 00:06:58,639 --> 00:07:02,560 and at the end of that process 213 00:07:00,639 --> 00:07:04,479 they have a whole bunch of bundles of 214 00:07:02,560 --> 00:07:05,599 votes usually all bundled up into lumps 215 00:07:04,479 --> 00:07:07,120 of 100 216 00:07:05,599 --> 00:07:08,880 so they've got a tally of first 217 00:07:07,120 --> 00:07:11,680 preferences in every polling place and 218 00:07:08,880 --> 00:07:13,199 that's phoned through 219 00:07:11,680 --> 00:07:15,919 what they've done now in australia since 220 00:07:13,199 --> 00:07:16,960 1993 and thankfully i didn't work before 221 00:07:15,919 --> 00:07:18,560 then because this would have been much 222 00:07:16,960 --> 00:07:20,319 harder 223 00:07:18,560 --> 00:07:22,000 is they do what's called an indicative 224 00:07:20,319 --> 00:07:23,520 preference count 225 00:07:22,000 --> 00:07:24,720 beforehand the electoral commission 226 00:07:23,520 --> 00:07:25,759 nominates 227 00:07:24,720 --> 00:07:27,919 um 228 00:07:25,759 --> 
00:07:29,919 two candidates in every contest who will 229 00:07:27,919 --> 00:07:31,520 be the final pairing so they don't do 230 00:07:29,919 --> 00:07:33,199 the full distribution of preferences 231 00:07:31,520 --> 00:07:35,280 what they do is they nominate two 232 00:07:33,199 --> 00:07:36,880 candidates at the start 233 00:07:35,280 --> 00:07:38,960 they're in an envelope they open them so 234 00:07:36,880 --> 00:07:41,440 they know this after 6pm this is you 235 00:07:38,960 --> 00:07:43,280 can't know this information before 6pm 236 00:07:41,440 --> 00:07:46,560 it's only an indicative count so it's 237 00:07:43,280 --> 00:07:48,240 not important for anybody else to know 238 00:07:46,560 --> 00:07:50,560 then they examine all the bundles of 239 00:07:48,240 --> 00:07:52,240 ballot papers for the other candidates 240 00:07:50,560 --> 00:07:54,319 they go through all those and they work 241 00:07:52,240 --> 00:07:57,120 out from that list of candidates which 242 00:07:54,319 --> 00:07:58,879 of the final two gets the 243 00:07:57,120 --> 00:08:01,280 lower numbered preference on that ballot 244 00:07:58,879 --> 00:08:03,199 paper which candidate receives the 245 00:08:01,280 --> 00:08:04,960 preference from that ballot paper they 246 00:08:03,199 --> 00:08:06,879 tally all those preference flows for 247 00:08:04,960 --> 00:08:08,800 each candidate they add them to the 248 00:08:06,879 --> 00:08:10,960 first preference votes for the two final 249 00:08:08,800 --> 00:08:13,440 two candidates and then they phone those 250 00:08:10,960 --> 00:08:14,879 numbers through so from each polling 251 00:08:13,440 --> 00:08:17,599 place and each pre-poll center on 252 00:08:14,879 --> 00:08:21,039 election night we get two totals the 253 00:08:17,599 --> 00:08:22,479 first one is that first preference tally 254 00:08:21,039 --> 00:08:24,879 and the second one 255 00:08:22,479 --> 00:08:26,960 is the is the two-party preferred or two 256 00:08:24,879 --> 00:08:28,639 
candidate preferred two-party preferred 257 00:08:26,960 --> 00:08:30,639 tends to refer to labor versus the 258 00:08:28,639 --> 00:08:32,560 coalition the two major parties 259 00:08:30,639 --> 00:08:34,080 sometimes a major party is excluded and 260 00:08:32,560 --> 00:08:36,159 we have what's called two candidate 261 00:08:34,080 --> 00:08:38,240 preferred in the end it all comes down 262 00:08:36,159 --> 00:08:39,760 to two candidates in every seat and it's 263 00:08:38,240 --> 00:08:41,599 worked out beforehand who are the most 264 00:08:39,760 --> 00:08:42,959 likely final two for this count if they 265 00:08:41,599 --> 00:08:44,880 get the count wrong they do it again after 266 00:08:42,959 --> 00:08:47,040 the election but this is just done for 267 00:08:44,880 --> 00:08:48,959 extra information to help us understand 268 00:08:47,040 --> 00:08:50,720 this came about after the 1990 federal 269 00:08:48,959 --> 00:08:52,240 election when there was a huge vote for 270 00:08:50,720 --> 00:08:54,560 the democrats nobody had preference 271 00:08:52,240 --> 00:08:56,160 counts and it was unclear who'd won so 272 00:08:54,560 --> 00:08:58,399 they introduced this process for the 273 00:08:56,160 --> 00:08:59,760 future and as the proportion of vote 274 00:08:58,399 --> 00:09:01,519 that's gone to minor parties has 275 00:08:59,760 --> 00:09:03,279 increased this has become ever more 276 00:09:01,519 --> 00:09:04,880 important 277 00:09:03,279 --> 00:09:06,000 what happens then is these numbers are 278 00:09:04,880 --> 00:09:08,480 phoned through 279 00:09:06,000 --> 00:09:10,080 data entered into the aec's computer 280 00:09:08,480 --> 00:09:13,120 system transmitted to their central 281 00:09:10,080 --> 00:09:15,120 server put in a database and that runs 282 00:09:13,120 --> 00:09:16,720 the aec's virtual tally room if you've 283 00:09:15,120 --> 00:09:18,880 used it online 284 00:09:16,720 --> 00:09:21,279 there's also an xml strip published 285 00:09:18,880 --> 
00:09:24,080 every two minutes of all the data 286 00:09:21,279 --> 00:09:25,279 so it's a total for every candidate in 287 00:09:24,080 --> 00:09:26,959 every electorate 288 00:09:25,279 --> 00:09:29,040 both first preference and two candidate 289 00:09:26,959 --> 00:09:31,040 preferred and those first preferences 290 00:09:29,040 --> 00:09:33,120 and two candidate are also reported by 291 00:09:31,040 --> 00:09:34,880 polling place so we have this whacking 292 00:09:33,120 --> 00:09:36,640 great file with all that information in 293 00:09:34,880 --> 00:09:38,480 there there's various versions of it 294 00:09:36,640 --> 00:09:40,080 there's what's called the verbose versions 295 00:09:38,480 --> 00:09:42,240 which have all the strings like names 296 00:09:40,080 --> 00:09:43,920 and there's other ones that just have the 297 00:09:42,240 --> 00:09:46,640 votes 298 00:09:43,920 --> 00:09:48,240 the files also contain historical data 299 00:09:46,640 --> 00:09:50,720 for each polling place and for each 300 00:09:48,240 --> 00:09:52,959 candidate and that is made available 301 00:09:50,720 --> 00:09:54,959 to the media as well beforehand but 302 00:09:52,959 --> 00:09:57,040 you can use it directly from the xml 303 00:09:54,959 --> 00:09:59,200 file if you don't have a database but 304 00:09:57,040 --> 00:10:00,880 for various reasons we in the media 305 00:09:59,200 --> 00:10:05,519 prefer to have the data in our own 306 00:10:00,880 --> 00:10:08,000 system and do our own calculations 307 00:10:05,519 --> 00:10:10,160 the abc computer is pre-loaded with all 308 00:10:08,000 --> 00:10:12,560 the electorates and candidate details 309 00:10:10,160 --> 00:10:14,320 polling place details including history 310 00:10:12,560 --> 00:10:16,640 estimated preference formulas that can 311 00:10:14,320 --> 00:10:17,680 be used until actual preference counts 312 00:10:16,640 --> 00:10:19,279 are received 313 00:10:17,680 --> 00:10:21,680 and where we judge the aec's picked the 314 
00:10:19,279 --> 00:10:23,200 wrong pairing of candidates 315 00:10:21,680 --> 00:10:24,959 as well as calculation information the 316 00:10:23,200 --> 00:10:27,600 database includes attributes 317 00:10:24,959 --> 00:10:30,079 that determine party colors ordering 318 00:10:27,600 --> 00:10:32,160 details for graphics for online and for 319 00:10:30,079 --> 00:10:34,240 picture names the abc's database 320 00:10:32,160 --> 00:10:36,000 provides the structure through which 321 00:10:34,240 --> 00:10:38,079 we're able to analyze the data we're 322 00:10:36,000 --> 00:10:39,920 just getting raw data we have to have 323 00:10:38,079 --> 00:10:42,880 our own structure which we use to 324 00:10:39,920 --> 00:10:46,000 analyze the data and aggregate the data 325 00:10:42,880 --> 00:10:48,000 now the abc parses the aec data file 326 00:10:46,000 --> 00:10:50,000 strips out the results by polling place 327 00:10:48,000 --> 00:10:52,480 and electorate and stores it in our system 328 00:10:50,000 --> 00:10:54,240 we check it for obvious calculation 329 00:10:52,480 --> 00:10:56,160 errors and then our abc computer 330 00:10:54,240 --> 00:10:58,399 performs its predictive calculations on 331 00:10:56,160 --> 00:11:01,200 all seats after every 332 00:10:58,399 --> 00:11:03,600 update and after any internal parameters 333 00:11:01,200 --> 00:11:05,120 change the computer also generates json 334 00:11:03,600 --> 00:11:07,200 output which is used for television 335 00:11:05,120 --> 00:11:08,880 graphics and for publishing our abc 336 00:11:07,200 --> 00:11:10,320 online system now i'm going to turn our 337 00:11:08,880 --> 00:11:12,240 camera off because we've been um we've 338 00:11:10,320 --> 00:11:16,320 having some camera problems but i'll 339 00:11:12,240 --> 00:11:18,240 continue with the the slide presentation 340 00:11:16,320 --> 00:11:19,279 um 341 00:11:18,240 --> 00:11:21,440 now 342 00:11:19,279 --> 00:11:22,320 let's call this camera 343 00:11:21,440 
--> 00:11:24,560 now 344 00:11:22,320 --> 00:11:26,320 post election counting um 345 00:11:24,560 --> 00:11:28,880 this is 346 00:11:26,320 --> 00:11:30,399 after election night so this is i'm 347 00:11:28,880 --> 00:11:31,680 raising these issues just because people 348 00:11:30,399 --> 00:11:33,600 always have doubts of how the election 349 00:11:31,680 --> 00:11:35,519 counting works and stuff this is just 350 00:11:33,600 --> 00:11:37,600 to tell you that the system has lots of 351 00:11:35,519 --> 00:11:39,200 other checks in all votes counted on 352 00:11:37,600 --> 00:11:40,640 election night are transferred overnight 353 00:11:39,200 --> 00:11:41,760 to the returning officer for the 354 00:11:40,640 --> 00:11:43,600 district 355 00:11:41,760 --> 00:11:45,279 all first preference and two candidate 356 00:11:43,600 --> 00:11:47,440 tallies are check counted over several 357 00:11:45,279 --> 00:11:49,920 days the count is conducted by different 358 00:11:47,440 --> 00:11:51,360 staff and again watched by scrutineers 359 00:11:49,920 --> 00:11:53,200 not necessarily the same ones that 360 00:11:51,360 --> 00:11:55,519 watched on the night the indicative 361 00:11:53,200 --> 00:11:57,519 preference counts are redone 362 00:11:55,519 --> 00:12:00,000 and they're redone entirely if the wrong 363 00:11:57,519 --> 00:12:02,000 candidates were chosen postals absents 364 00:12:00,000 --> 00:12:04,240 and provisionals are added 365 00:12:02,000 --> 00:12:06,079 over the fortnight after the election 366 00:12:04,240 --> 00:12:08,480 after a fortnight a full distribution of 367 00:12:06,079 --> 00:12:11,760 preferences is done which is effectively 368 00:12:08,480 --> 00:12:14,160 a third count of all 369 00:12:11,760 --> 00:12:16,800 votes for minor parties and if not 370 00:12:14,160 --> 00:12:18,320 already done a formal winner is declared 371 00:12:16,800 --> 00:12:19,120 at this point 372 00:12:18,320 --> 00:12:20,959 now 373 
00:12:19,120 --> 00:12:23,120 just to explain how we've got to this 374 00:12:20,959 --> 00:12:26,000 point over the years 375 00:12:23,120 --> 00:12:28,160 the aec's first computers uh started to 376 00:12:26,000 --> 00:12:29,839 use them in the 1980s we had to be in the 377 00:12:28,160 --> 00:12:31,120 tally room to get the data by xon/xoff 378 00:12:29,839 --> 00:12:33,600 feed 379 00:12:31,120 --> 00:12:36,160 um they began to add historical data to 380 00:12:33,600 --> 00:12:39,519 the feed in 1990 um preference counts 381 00:12:36,160 --> 00:12:41,200 were added in 1993 they switched to xml 382 00:12:39,519 --> 00:12:42,160 as the export format rather than the old 383 00:12:41,200 --> 00:12:44,880 feed 384 00:12:42,160 --> 00:12:47,440 published to an ftp site 385 00:12:44,880 --> 00:12:50,160 in the mid 2000s the last physical 386 00:12:47,440 --> 00:12:53,040 tally room was in 2010. 387 00:12:50,160 --> 00:12:54,959 the abc had originally a pdp-11 system 388 00:12:53,040 --> 00:12:57,680 when i started it was turned into a pc 389 00:12:54,959 --> 00:12:59,440 network running unix later linux 390 00:12:57,680 --> 00:13:01,519 um 391 00:12:59,440 --> 00:13:04,480 it was a memory map it you know there 392 00:13:01,519 --> 00:13:07,839 was um written in c first gui interface 393 00:13:04,480 --> 00:13:10,000 was 2008 uh it was rewritten in dot net 394 00:13:07,839 --> 00:13:11,440 c plus plus in about 2010 395 00:13:10,000 --> 00:13:14,160 and most recently last year was 396 00:13:11,440 --> 00:13:16,399 rewritten to run on an amazon server um 397 00:13:14,160 --> 00:13:18,639 with various other bits and pieces 398 00:13:16,399 --> 00:13:20,480 so it's been a long history of changes 399 00:13:18,639 --> 00:13:22,880 which has been sort of mirroring the way 400 00:13:20,480 --> 00:13:23,760 the internet has changed largely 401 00:13:22,880 --> 00:13:25,600 now 402 00:13:23,760 --> 00:13:27,279 election night the statistical problem 403 00:13:25,600 --> 
00:13:29,120 let me start with a basic statistics 404 00:13:27,279 --> 00:13:30,560 question if you have a large bag full of 405 00:13:29,120 --> 00:13:32,320 a thousand black and white ping pong 406 00:13:30,560 --> 00:13:34,480 balls the balls have been thoroughly 407 00:13:32,320 --> 00:13:35,920 mixed what size sample do you need to 408 00:13:34,480 --> 00:13:38,480 draw to be certain of the ratio of 409 00:13:35,920 --> 00:13:40,720 blacks and white balls now the election 410 00:13:38,480 --> 00:13:42,399 night question is similar at 6 pm on 411 00:13:40,720 --> 00:13:44,800 election night each electorate has 100 412 00:13:42,399 --> 00:13:46,720 000 ballot papers sealed in ballot boxes 413 00:13:44,800 --> 00:13:48,480 and envelopes how many have to be 414 00:13:46,720 --> 00:13:51,360 counted and reported before we can be 415 00:13:48,480 --> 00:13:54,079 certain of the ratio of labor to liberal 416 00:13:51,360 --> 00:13:56,079 votes now the two problems are the same 417 00:13:54,079 --> 00:13:58,000 except with election night we have to 418 00:13:56,079 --> 00:13:59,839 take steps to account for the samples 419 00:13:58,000 --> 00:14:01,279 not being random 420 00:13:59,839 --> 00:14:03,519 we also have the advantage that the 421 00:14:01,279 --> 00:14:05,920 figures are progressive so each update 422 00:14:03,519 --> 00:14:07,839 is a larger sample of effectively but 423 00:14:05,920 --> 00:14:08,720 it's a cluster sample not a random 424 00:14:07,839 --> 00:14:10,320 sample 425 00:14:08,720 --> 00:14:12,959 but we do have the advantage of these 426 00:14:10,320 --> 00:14:15,040 clusters we know what the voting was in 427 00:14:12,959 --> 00:14:17,920 them was last time so it's the same 428 00:14:15,040 --> 00:14:19,839 basic statistics with a whole bunch of 429 00:14:17,920 --> 00:14:20,800 statistical methods to try and overcome 430 00:14:19,839 --> 00:14:22,560 the fact 431 00:14:20,800 --> 00:14:25,440 that um you have to account for the data 432 
00:14:22,560 --> 00:14:27,680 not being random 433 00:14:25,440 --> 00:14:29,519 now predicting every electorate 434 00:14:27,680 --> 00:14:31,440 um 435 00:14:29,519 --> 00:14:33,360 every electorate will finish with 436 00:14:31,440 --> 00:14:35,920 a final value p which is the winning 437 00:14:33,360 --> 00:14:37,519 candidate's proportion of all votes we 438 00:14:35,920 --> 00:14:39,600 often call this two-party preferred or 439 00:14:37,519 --> 00:14:41,519 two candidate preferred and we express 440 00:14:39,600 --> 00:14:43,680 it as a percentage 55 percent 441 00:14:41,519 --> 00:14:45,279 two candidate preferred at all times in 442 00:14:43,680 --> 00:14:47,519 the count we have a point estimate a 443 00:14:45,279 --> 00:14:49,279 little p and that's our current estimate 444 00:14:47,519 --> 00:14:51,680 of the final result 445 00:14:49,279 --> 00:14:53,600 now our techniques are to minimize 446 00:14:51,680 --> 00:14:55,680 statistical bias 447 00:14:53,600 --> 00:14:57,440 that is things that make p an unreliable 448 00:14:55,680 --> 00:14:59,120 estimator 449 00:14:57,440 --> 00:15:00,800 and we also want to use methods that 450 00:14:59,120 --> 00:15:02,880 minimize the amount of variance the 451 00:15:00,800 --> 00:15:04,320 amount of variability the amount of 452 00:15:02,880 --> 00:15:05,279 moving up and down that the graphs will 453 00:15:04,320 --> 00:15:07,360 do 454 00:15:05,279 --> 00:15:08,639 and at all points in the count 455 00:15:07,360 --> 00:15:10,639 we construct a 456 00:15:08,639 --> 00:15:12,560 confidence interval for p having 457 00:15:10,639 --> 00:15:14,480 minimized bias and variance 458 00:15:12,560 --> 00:15:16,320 and calculate the probability that p is 459 00:15:14,480 --> 00:15:18,560 greater than 50 percent and if it's 460 00:15:16,320 --> 00:15:20,800 greater than 50 percent with a 99 percent 461 00:15:18,560 --> 00:15:22,720 probability we make the binary solution 462 00:15:20,800 --> 00:15:24,480 to give it 
one side or the other 463 00:15:22,720 --> 00:15:25,680 and the overall result of the election 464 00:15:24,480 --> 00:15:27,760 is just simply the sum of the 465 00:15:25,680 --> 00:15:30,320 probabilities in individual seats with 466 00:15:27,760 --> 00:15:32,160 an error margin so this sounds 467 00:15:30,320 --> 00:15:33,680 all the effort goes into the individual 468 00:15:32,160 --> 00:15:35,519 seat results not into the overall 469 00:15:33,680 --> 00:15:37,600 prediction the overall prediction just 470 00:15:35,519 --> 00:15:39,839 falls out of the mathematics in each 471 00:15:37,600 --> 00:15:41,360 seat 472 00:15:39,839 --> 00:15:42,880 now we have a number of underlying 473 00:15:41,360 --> 00:15:44,480 assumptions one is that it tends to be a 474 00:15:42,880 --> 00:15:46,720 uniform swing 475 00:15:44,480 --> 00:15:48,720 from polling place to polling place and 476 00:15:46,720 --> 00:15:50,560 as i'll show in a while there is there 477 00:15:48,720 --> 00:15:51,920 you can make that assumption 478 00:15:50,560 --> 00:15:53,600 and um 479 00:15:51,920 --> 00:15:55,279 we also assume most people vote the same 480 00:15:53,600 --> 00:15:57,360 election as last time though that's 481 00:15:55,279 --> 00:15:59,440 begun to change in recent years with the 482 00:15:57,360 --> 00:16:00,639 increase in pre-poll voting 483 00:15:59,440 --> 00:16:02,320 just to comment 484 00:16:00,639 --> 00:16:04,800 comparing with the us we have much 485 00:16:02,320 --> 00:16:06,160 higher quality data here we have a much 486 00:16:04,800 --> 00:16:08,160 simpler issue because we have a two 487 00:16:06,160 --> 00:16:10,560 candidate contest in the u.s you have to 488 00:16:08,160 --> 00:16:12,959 try and predict when the gap between the 489 00:16:10,560 --> 00:16:15,519 two candidates stabilizes we can work on 490 00:16:12,959 --> 00:16:16,560 this figure of being over 50 percent 491 00:16:15,519 --> 00:16:19,360 um 492 00:16:16,560 --> 00:16:21,759 now what are the sources of 
error 493 00:16:19,360 --> 00:16:24,639 that we've got here we've got bias 494 00:16:21,759 --> 00:16:27,279 in elections that mainly many rural and 495 00:16:24,639 --> 00:16:29,199 especially mixed rural urban seats 496 00:16:27,279 --> 00:16:31,120 display a strong positive correlation 497 00:16:29,199 --> 00:16:33,279 between booth size and labor two-party 498 00:16:31,120 --> 00:16:35,519 preferred vote small rural polling 499 00:16:33,279 --> 00:16:37,839 places record a lower labor vote than 500 00:16:35,519 --> 00:16:39,600 large urban ones now this matters 501 00:16:37,839 --> 00:16:42,160 because small polling places are quicker 502 00:16:39,600 --> 00:16:44,959 to count and first to report and 503 00:16:42,160 --> 00:16:47,199 therefore there's a resultant labor vote 504 00:16:44,959 --> 00:16:50,160 low labor vote early in the evening 505 00:16:47,199 --> 00:16:51,759 which is statistical bias in in terms of 506 00:16:50,160 --> 00:16:53,440 our point estimator one thing i'll say 507 00:16:51,759 --> 00:16:55,519 about this was um if you're familiar 508 00:16:53,440 --> 00:16:57,680 with the play don's party that all 509 00:16:55,519 --> 00:17:00,000 began with labor vote high and falling 510 00:16:57,680 --> 00:17:03,839 through the night that used to happen 511 00:17:00,000 --> 00:17:05,919 before 1987 in 1987 they introduced the 512 00:17:03,839 --> 00:17:08,000 change so that all polling places 513 00:17:05,919 --> 00:17:09,439 counted at the start of the night before 514 00:17:08,000 --> 00:17:11,439 then because of lack of phones and the 515 00:17:09,439 --> 00:17:12,799 things a lot of country polling places 516 00:17:11,439 --> 00:17:14,559 were brought to town before they were 517 00:17:12,799 --> 00:17:17,120 counted and that meant the labor vote 518 00:17:14,559 --> 00:17:18,559 was high and fell through the night 1987 519 00:17:17,120 --> 00:17:20,480 was the first election where that trend 520 00:17:18,559 --> 00:17:21,760 reversed and that
caused a lot of 521 00:17:20,480 --> 00:17:23,839 embarrassment for a lot of people who 522 00:17:21,760 --> 00:17:26,000 went on past trends and said that john 523 00:17:23,839 --> 00:17:27,520 howard had won the 87 election 524 00:17:26,000 --> 00:17:29,280 which he didn't 525 00:17:27,520 --> 00:17:31,440 now the other thing is variance past 526 00:17:29,280 --> 00:17:33,280 results can be used to calculate a range 527 00:17:31,440 --> 00:17:35,840 of variance of polling place results in 528 00:17:33,280 --> 00:17:38,080 each seat seats with lower variance can be 529 00:17:35,840 --> 00:17:40,720 given away more quickly 530 00:17:38,080 --> 00:17:42,559 and and we also aim to adopt a 531 00:17:40,720 --> 00:17:43,760 statistical method 532 00:17:42,559 --> 00:17:47,280 where 533 00:17:43,760 --> 00:17:48,480 the variance of the vote the variance is 534 00:17:47,280 --> 00:17:50,640 smaller 535 00:17:48,480 --> 00:17:52,960 and that happens to be with swing 536 00:17:50,640 --> 00:17:55,200 rather than vote so rather than using 537 00:17:52,960 --> 00:17:56,240 the vote the current vote to predict the 538 00:17:55,200 --> 00:17:58,160 result 539 00:17:56,240 --> 00:17:59,840 we use the change in vote 540 00:17:58,160 --> 00:18:01,520 from the same polling place as last time 541 00:17:59,840 --> 00:18:03,360 as a way i'll explain in a moment the 542 00:18:01,520 --> 00:18:05,360 americans always do this as statics they 543 00:18:03,360 --> 00:18:07,919 start they report the numbers report the 544 00:18:05,360 --> 00:18:09,840 numbers we in australia report the swing 545 00:18:07,919 --> 00:18:11,760 and show the projected value and it's 546 00:18:09,840 --> 00:18:13,679 much more reliable and it's why 547 00:18:11,760 --> 00:18:15,280 um you'll have that bloke running the 548 00:18:13,679 --> 00:18:17,280 screens in the united states talking 549 00:18:15,280 --> 00:18:19,360 about well the democrats are ahead but 550 00:18:17,280 --> 00:18:21,440 when this area
comes in the panhandle 551 00:18:19,360 --> 00:18:23,200 of florida comes in the republicans will 552 00:18:21,440 --> 00:18:25,280 lead we don't have to talk about numbers 553 00:18:23,200 --> 00:18:27,679 here because we do that projection in 554 00:18:25,280 --> 00:18:30,080 how we present our data now to explain 555 00:18:27,679 --> 00:18:32,160 how all this works i'll move on to the 556 00:18:30,080 --> 00:18:33,200 example of an electorate 557 00:18:32,160 --> 00:18:35,120 and 558 00:18:33,200 --> 00:18:36,640 this is the electorate of braddon 559 00:18:35,120 --> 00:18:38,080 in tasmania 560 00:18:36,640 --> 00:18:39,520 just let me find my notes because i had 561 00:18:38,080 --> 00:18:41,360 a bit of scurrying around there for a 562 00:18:39,520 --> 00:18:42,559 few minutes 563 00:18:41,360 --> 00:18:44,320 braddon's up in the northwest of 564 00:18:42,559 --> 00:18:47,039 tasmania you can see it's a it's got a 565 00:18:44,320 --> 00:18:49,679 lot of blue dots a few red dots 566 00:18:47,039 --> 00:18:51,520 it's uh a traditional swing seat it's swung 567 00:18:49,679 --> 00:18:53,440 changed parties six of the last 568 00:18:51,520 --> 00:18:55,039 eight elections a lot of big urban 569 00:18:53,440 --> 00:18:57,520 booths a lot of small 570 00:18:55,039 --> 00:19:00,240 country booths which have strong liberal 571 00:18:57,520 --> 00:19:02,320 votes and a smaller number of small 572 00:19:00,240 --> 00:19:05,120 labor booths on the west coast form a 573 00:19:02,320 --> 00:19:06,400 mining town so this is a uh this is a an 574 00:19:05,120 --> 00:19:09,120 electorate where you want to know what 575 00:19:06,400 --> 00:19:11,840 the data is like and where it's from 576 00:19:09,120 --> 00:19:14,799 this is a scatter plot from the 2019 577 00:19:11,840 --> 00:19:16,720 election the the grey dots are the 578 00:19:14,799 --> 00:19:19,600 two-party preferreds 579 00:19:16,720 --> 00:19:22,640 um for each uh 580 00:19:19,600 --> 00:19:25,360 each
polling place now the yellow area 581 00:19:22,640 --> 00:19:26,720 is the two standard 582 00:19:25,360 --> 00:19:28,559 standard error confidence interval 583 00:19:26,720 --> 00:19:31,039 ninety-five percent confidence interval 584 00:19:28,559 --> 00:19:33,280 the variance on these polling places is 585 00:19:31,039 --> 00:19:35,280 eight point nine percent so two standard 586 00:19:33,280 --> 00:19:37,200 errors there's a seventeen percent range 587 00:19:35,280 --> 00:19:39,120 of results to have ninety-five percent 588 00:19:37,200 --> 00:19:40,880 of the polling places now i'll show you 589 00:19:39,120 --> 00:19:42,320 the two attributes with my mouse the 590 00:19:40,880 --> 00:19:44,559 first is you'll notice there's a bigger 591 00:19:42,320 --> 00:19:46,880 variance at the start and that's because 592 00:19:44,559 --> 00:19:48,480 a lot more small polling places small 593 00:19:46,880 --> 00:19:50,720 polling places are usually from small 594 00:19:48,480 --> 00:19:52,480 towns which are more homogenous than 595 00:19:50,720 --> 00:19:54,000 larger urban centers so there's a bit 596 00:19:52,480 --> 00:19:55,679 more variance particularly in a seat 597 00:19:54,000 --> 00:19:57,600 like this you'll also notice there's 598 00:19:55,679 --> 00:20:01,520 some bias there's a lot more of these 599 00:19:57,600 --> 00:20:03,840 dots above the 53.1 which was the final 600 00:20:01,520 --> 00:20:06,320 result of the election so on those early 601 00:20:03,840 --> 00:20:08,000 figures if the small booths come in first 602 00:20:06,320 --> 00:20:09,360 you're going to have a lot more 603 00:20:08,000 --> 00:20:10,640 variability 604 00:20:09,360 --> 00:20:12,240 you're also getting a lot of bias 605 00:20:10,640 --> 00:20:14,720 because there's a lot more liberal votes 606 00:20:12,240 --> 00:20:16,960 there in those early figures so if i go 607 00:20:14,720 --> 00:20:18,880 on to the next graph this graph is 608 00:20:16,960 --> 00:20:21,159 the same dots it's
a bigger scale but 609 00:20:18,880 --> 00:20:22,799 it's the same dots and over it i've 610 00:20:21,159 --> 00:20:25,600 superimposed 611 00:20:22,799 --> 00:20:27,600 the progressive two candidate preferred 612 00:20:25,600 --> 00:20:29,440 two-party preferred from those polling 613 00:20:27,600 --> 00:20:30,320 places as they come in and as you can 614 00:20:29,440 --> 00:20:32,000 see 615 00:20:30,320 --> 00:20:35,039 in this area here there's a lot of 616 00:20:32,000 --> 00:20:36,720 variability it bounces around it's also 617 00:20:35,039 --> 00:20:38,559 biased towards the liberal party which 618 00:20:36,720 --> 00:20:41,120 is the area above 50 percent 619 00:20:38,559 --> 00:20:43,600 so for quite a long while and these 620 00:20:41,120 --> 00:20:45,520 these graphs are in 621 00:20:43,600 --> 00:20:47,679 in polling place size order i've 622 00:20:45,520 --> 00:20:50,080 arranged them from smallest to largest 623 00:20:47,679 --> 00:20:51,919 which is not how it comes in i'll show 624 00:20:50,080 --> 00:20:54,400 you another example later but this is 625 00:20:51,919 --> 00:20:57,679 the worst case scenario and it takes a 626 00:20:54,400 --> 00:20:59,440 long time for this figure to settle down 627 00:20:57,679 --> 00:21:00,640 uh just let me consult my notes if 628 00:20:59,440 --> 00:21:01,679 there's anything else i've got to say 629 00:21:00,640 --> 00:21:03,919 here 630 00:21:01,679 --> 00:21:06,320 no that's why that's the key thing that 631 00:21:03,919 --> 00:21:08,640 you've always got to watch for is this 632 00:21:06,320 --> 00:21:10,320 early figure now 633 00:21:08,640 --> 00:21:12,559 this next graph actually i'll stay on 634 00:21:10,320 --> 00:21:14,240 this graph um 635 00:21:12,559 --> 00:21:16,080 dr ross cunningham back in the 80s 636 00:21:14,240 --> 00:21:18,400 worked out a method to correct for this 637 00:21:16,080 --> 00:21:20,080 bias if you looked at the every one of 638 00:21:18,400 --> 00:21:22,080 these electorates has a
characteristic 639 00:21:20,080 --> 00:21:24,000 curve like that 640 00:21:22,080 --> 00:21:26,240 he went away and he worked out what's 641 00:21:24,000 --> 00:21:28,480 called a bias corrective method he he 642 00:21:26,240 --> 00:21:30,960 plotted what that characteristic curve 643 00:21:28,480 --> 00:21:33,840 normally looked back like and corrected 644 00:21:30,960 --> 00:21:35,520 for it he then built in for many seats 645 00:21:33,840 --> 00:21:37,200 and built in an overall regression model 646 00:21:35,520 --> 00:21:39,280 to pick the winner he was less concerned 647 00:21:37,200 --> 00:21:40,960 about picking individual seats than 648 00:21:39,280 --> 00:21:42,320 correcting for bias for an overall 649 00:21:40,960 --> 00:21:44,400 regression model to pick the winner of 650 00:21:42,320 --> 00:21:46,000 the election we've we've adapted his 651 00:21:44,400 --> 00:21:48,480 model to be more about picking 652 00:21:46,000 --> 00:21:50,159 individual seats and for the newer 653 00:21:48,480 --> 00:21:51,600 methods which are now available we don't 654 00:21:50,159 --> 00:21:53,280 need to do we're not as reliant on 655 00:21:51,600 --> 00:21:55,600 regression now 656 00:21:53,280 --> 00:21:58,159 this is the same data from 2019 and 657 00:21:55,600 --> 00:21:59,120 against it i've plotted the progressive 658 00:21:58,159 --> 00:22:01,520 numbers 659 00:21:59,120 --> 00:22:02,559 for 2016 which is the green line at the 660 00:22:01,520 --> 00:22:04,480 bottom 661 00:22:02,559 --> 00:22:05,679 it's the same data 662 00:22:04,480 --> 00:22:07,679 now 663 00:22:05,679 --> 00:22:09,039 what you can see here is if you can know 664 00:22:07,679 --> 00:22:11,200 what that graph is going to look like 665 00:22:09,039 --> 00:22:13,280 from last time you can plot where this 666 00:22:11,200 --> 00:22:15,360 regression's gone where this figure is 667 00:22:13,280 --> 00:22:18,320 going to end up there's a very good 668 00:22:15,360 --> 00:22:19,679 match between those 
two numbers uh that 669 00:22:18,320 --> 00:22:22,400 we're seeing here 670 00:22:19,679 --> 00:22:24,159 let me get my graphs yes um if you look 671 00:22:22,400 --> 00:22:26,480 at the gap between the two lines you can 672 00:22:24,159 --> 00:22:28,080 see that this this gap is stable 673 00:22:26,480 --> 00:22:29,760 if you know where this green line is 674 00:22:28,080 --> 00:22:32,240 going you know where the black line is 675 00:22:29,760 --> 00:22:34,400 going and that's that's 676 00:22:32,240 --> 00:22:36,000 um that's the method we use to call the 677 00:22:34,400 --> 00:22:37,440 election now the other thing that's to 678 00:22:36,000 --> 00:22:39,600 say 679 00:22:37,440 --> 00:22:40,720 is the gap between the two lines is the 680 00:22:39,600 --> 00:22:43,679 swing 681 00:22:40,720 --> 00:22:46,240 it's the change of vote at every point 682 00:22:43,679 --> 00:22:47,840 on that graph you've got a current total 683 00:22:46,240 --> 00:22:50,320 and you've got a historical total of the 684 00:22:47,840 --> 00:22:52,559 same figures that difference in between 685 00:22:50,320 --> 00:22:55,039 the two numbers is the swing and this 686 00:22:52,559 --> 00:22:57,679 graph shows the gap between those two 687 00:22:55,039 --> 00:23:00,000 lines doesn't have a lot of variability 688 00:22:57,679 --> 00:23:01,200 it has a lot less variability 689 00:23:00,000 --> 00:23:03,600 than the 690 00:23:01,200 --> 00:23:04,880 the the first preference line and that 691 00:23:03,600 --> 00:23:06,320 can be seen 692 00:23:04,880 --> 00:23:09,679 let me look now 693 00:23:06,320 --> 00:23:11,919 so so let me i'll come back to how we 694 00:23:09,679 --> 00:23:13,200 use this but the key point to make is 695 00:23:11,919 --> 00:23:15,919 that gap 696 00:23:13,200 --> 00:23:17,440 is the swing and if you can you rely on 697 00:23:15,919 --> 00:23:20,880 the swing you've got something really 698 00:23:17,440 --> 00:23:24,320 useful to use now this next graph 699 00:23:20,880 
--> 00:23:27,760 this is the um graph of the swings by 700 00:23:24,320 --> 00:23:30,320 polling place again arranged by um 701 00:23:27,760 --> 00:23:32,000 by polling place size but just let me go 702 00:23:30,320 --> 00:23:33,919 back this has got the same range on the 703 00:23:32,000 --> 00:23:36,080 y-axis of 60 704 00:23:33,919 --> 00:23:37,840 if i go back to that previous one 705 00:23:36,080 --> 00:23:41,120 there's a huge 706 00:23:37,840 --> 00:23:43,919 um 34 point range in the variance and 707 00:23:41,120 --> 00:23:45,760 the the dots are all over the place 708 00:23:43,919 --> 00:23:48,159 the swing from polling place to polling 709 00:23:45,760 --> 00:23:50,240 place it doesn't have this it has a 710 00:23:48,159 --> 00:23:52,960 little bit of a cluster at the start but 711 00:23:50,240 --> 00:23:55,279 it's not biased it's not above or below 712 00:23:52,960 --> 00:23:57,039 in any particular order so we've got 713 00:23:55,279 --> 00:23:59,440 here something which doesn't have an 714 00:23:57,039 --> 00:24:01,840 early bias and has only got half of the 715 00:23:59,440 --> 00:24:04,400 standard deviation so you've got a more 716 00:24:01,840 --> 00:24:06,559 reliable estimator to use if you can 717 00:24:04,400 --> 00:24:08,720 operate on the swing and that's what we 718 00:24:06,559 --> 00:24:11,360 do the simple two candidate preferred 719 00:24:08,720 --> 00:24:14,000 which was that first black line 720 00:24:11,360 --> 00:24:15,919 is just the current total and the method 721 00:24:14,000 --> 00:24:18,799 has the problem that the bias and the 722 00:24:15,919 --> 00:24:21,440 large variance is built into that number 723 00:24:18,799 --> 00:24:24,880 the simple swing is the current 2cp 724 00:24:21,440 --> 00:24:28,480 minus the final 2cp from last time 725 00:24:24,880 --> 00:24:30,720 but because the 2cp you're using is bias 726 00:24:28,480 --> 00:24:32,640 both the swing and the 2cp are going to 727 00:24:30,720 --> 00:24:34,720 be biased 
and have variance problems in 728 00:24:32,640 --> 00:24:36,720 the same manner what we do is what's 729 00:24:34,720 --> 00:24:39,760 called a matched two candidate preferred 730 00:24:36,720 --> 00:24:41,840 analysis which uses the unbiased and low 731 00:24:39,760 --> 00:24:44,640 variance polling place swings 732 00:24:41,840 --> 00:24:46,799 at every point on the count 733 00:24:44,640 --> 00:24:50,720 our current count compares the current 734 00:24:46,799 --> 00:24:53,360 2cp to the historical 2cp subtracts one 735 00:24:50,720 --> 00:24:55,440 from the other and gets a matched swing 736 00:24:53,360 --> 00:24:57,120 which that matched swing is the gap 737 00:24:55,440 --> 00:24:59,120 between those two graphs i showed you a 738 00:24:57,120 --> 00:25:00,480 moment ago and then what you do with 739 00:24:59,120 --> 00:25:02,799 that matched swing 740 00:25:00,480 --> 00:25:04,080 is you add that swing to the two 741 00:25:02,799 --> 00:25:05,200 candidate preferred for the last 742 00:25:04,080 --> 00:25:07,360 election 743 00:25:05,200 --> 00:25:09,360 and what you get on this next graph the 744 00:25:07,360 --> 00:25:12,559 black graph is the same as on the 745 00:25:09,360 --> 00:25:15,200 previous chart but the red graph is is 746 00:25:12,559 --> 00:25:16,320 prediction based on the red based on the 747 00:25:15,200 --> 00:25:18,320 matched swing 748 00:25:16,320 --> 00:25:20,559 and this is what you get there's that 749 00:25:18,320 --> 00:25:22,400 black line which takes until about 30 percent to 750 00:25:20,559 --> 00:25:24,880 settle down this is the red line this is 751 00:25:22,400 --> 00:25:27,360 the matched figure this is stable this 752 00:25:24,880 --> 00:25:29,360 isn't bouncing around this is like 10 percent of 753 00:25:27,360 --> 00:25:30,799 the vote counted and within one percent of 754 00:25:29,360 --> 00:25:34,080 the final result 755 00:25:30,799 --> 00:25:36,640 the low variance and the lack of bias 756 00:25:34,080 --> 00:25:37,919 means that this is an
accurate predictor 757 00:25:36,640 --> 00:25:40,559 and it's why 758 00:25:37,919 --> 00:25:42,799 we that's the method we use we use this 759 00:25:40,559 --> 00:25:44,640 comparative swing and that removes 760 00:25:42,799 --> 00:25:46,960 nearly all the bias and a lot of 761 00:25:44,640 --> 00:25:48,559 variance from the early figures 762 00:25:46,960 --> 00:25:51,440 now 763 00:25:48,559 --> 00:25:52,880 this is actually the same graph 764 00:25:51,440 --> 00:25:55,120 but what i've done here is i've actually 765 00:25:52,880 --> 00:25:57,600 used the time stamped data from the last 766 00:25:55,120 --> 00:26:00,559 election so this isn't on the order of 767 00:25:57,600 --> 00:26:03,360 polling places it's on timestamp now 768 00:26:00,559 --> 00:26:05,520 this performs slightly better so the the 769 00:26:03,360 --> 00:26:07,600 black line took until about 30 percent to 770 00:26:05,520 --> 00:26:09,520 settle down on the other graph using 771 00:26:07,600 --> 00:26:12,480 real-life data it's settling down at about 772 00:26:09,520 --> 00:26:14,720 20 percent but the key point is the the 773 00:26:12,480 --> 00:26:16,799 red line is still it's just way more 774 00:26:14,720 --> 00:26:20,400 stable and this is the same in every 775 00:26:16,799 --> 00:26:23,039 election the swing is always more stable 776 00:26:20,400 --> 00:26:24,799 than the the two candidates essentially 777 00:26:23,039 --> 00:26:26,320 there is a wide range of results from 778 00:26:24,799 --> 00:26:28,240 polling place to polling place and if 779 00:26:26,320 --> 00:26:30,480 you want to use the raw numbers 780 00:26:28,240 --> 00:26:32,960 you've got all that wide range 781 00:26:30,480 --> 00:26:35,520 if you use the swing you're measuring 782 00:26:32,960 --> 00:26:37,679 change from the last election and that 783 00:26:35,520 --> 00:26:39,120 change will always be less than the two 784 00:26:37,679 --> 00:26:41,520 than the two candidate preferred 785 00:26:39,120 --> 00:26:43,120
variability in an individual electorate 786 00:26:41,520 --> 00:26:45,840 two candidate preferred results 787 00:26:43,120 --> 00:26:47,919 can range from 20 to 80 percent you know 788 00:26:45,840 --> 00:26:50,080 they have a huge range 789 00:26:47,919 --> 00:26:51,440 swings will be clustered around the 790 00:26:50,080 --> 00:26:53,520 swing and they're going to be a lot 791 00:26:51,440 --> 00:26:55,440 smaller and the variance on average is 792 00:26:53,520 --> 00:26:58,080 between one-third just through the 793 00:26:55,440 --> 00:27:00,000 standard deviation between one-third and 794 00:26:58,080 --> 00:27:02,720 half and that's why we operate on the 795 00:27:00,000 --> 00:27:04,640 swing now 796 00:27:02,720 --> 00:27:07,200 on election night this is the way the 797 00:27:04,640 --> 00:27:08,480 data comes to us the blue line is first 798 00:27:07,200 --> 00:27:10,480 preferences 799 00:27:08,480 --> 00:27:12,080 and the second line is the two-party 800 00:27:10,480 --> 00:27:14,000 preferred which comes in later or two 801 00:27:12,080 --> 00:27:15,760 candidate preferred you can see that 802 00:27:14,000 --> 00:27:17,279 one lags the other and then they catch 803 00:27:15,760 --> 00:27:20,799 up later in the evening the key thing to 804 00:27:17,279 --> 00:27:23,279 watch for is at 7 p.m i've only got 10 percent 805 00:27:20,799 --> 00:27:25,120 less than 10 percent of the first preference 806 00:27:23,279 --> 00:27:27,440 vote only about three percent of the 807 00:27:25,120 --> 00:27:30,480 two-party preferred by 7 30 i've got to 808 00:27:27,440 --> 00:27:32,960 10 percent two-party preferred and 809 00:27:30,480 --> 00:27:35,440 remember i mean this this is 10 percent this but 810 00:27:32,960 --> 00:27:37,440 we haven't got the data in that order 811 00:27:35,440 --> 00:27:38,960 we've got it in time order so this is 812 00:27:37,440 --> 00:27:41,360 what the next graph is what the graph 813 00:27:38,960 --> 00:27:43,360 looks like if i plot this by time 814 00:27:41,360 -->
00:27:45,039 and you've got the graph is much further 815 00:27:43,360 --> 00:27:46,960 over to the right we're spending a lot 816 00:27:45,039 --> 00:27:48,720 more time earlier in the evening at 7 30 817 00:27:46,960 --> 00:27:50,799 in the evening talking about early 818 00:27:48,720 --> 00:27:52,320 figures because we haven't got a lot 10 819 00:27:50,799 --> 00:27:53,840 percent of the two-party preferred 820 00:27:52,320 --> 00:27:56,080 counted but 821 00:27:53,840 --> 00:27:57,520 the key point to make there is this is 822 00:27:56,080 --> 00:27:59,840 more stable 823 00:27:57,520 --> 00:28:01,279 than the the number up here and that's 824 00:27:59,840 --> 00:28:03,919 what i want at 7 30 i want to know what 825 00:28:01,279 --> 00:28:05,600 the numbers are and i always say at 7 30 826 00:28:03,919 --> 00:28:07,360 on the night i usually know the result 827 00:28:05,600 --> 00:28:09,039 of the election if it's clear 828 00:28:07,360 --> 00:28:11,279 if i don't know the result i know we 829 00:28:09,039 --> 00:28:12,799 have to wait for more data you know if 830 00:28:11,279 --> 00:28:14,559 you don't know the result by 7 30 you 831 00:28:12,799 --> 00:28:16,240 might know it later if it's a really 832 00:28:14,559 --> 00:28:17,760 close election you won't know the 833 00:28:16,240 --> 00:28:19,279 results on the night 834 00:28:17,760 --> 00:28:20,799 but that's 835 00:28:19,279 --> 00:28:22,320 that's essentially what i'm doing on 836 00:28:20,799 --> 00:28:25,120 election night so 837 00:28:22,320 --> 00:28:27,840 what do we do next we've got that curve 838 00:28:25,120 --> 00:28:29,919 what do i do it's this red curve that's 839 00:28:27,840 --> 00:28:32,559 the line i showed earlier the dotted 840 00:28:29,919 --> 00:28:34,799 lines on either side that's the 99 percent 841 00:28:32,559 --> 00:28:36,559 confidence interval once that confidence 842 00:28:34,799 --> 00:28:39,039 interval is above 50 percent 843 00:28:36,559 --> 00:28:40,480 i'm confident that my
prediction 844 00:28:39,039 --> 00:28:41,840 is over 50 845 00:28:40,480 --> 00:28:44,480 it is not going to fall back onto the 846 00:28:41,840 --> 00:28:46,640 other side it might bob back and 847 00:28:44,480 --> 00:28:49,440 forwards but it's not going to disappear 848 00:28:46,640 --> 00:28:50,960 this is the figure that i want to use i 849 00:28:49,440 --> 00:28:52,960 want something which gives me a stable 850 00:28:50,960 --> 00:28:54,720 prediction very early and so that's the 851 00:28:52,960 --> 00:28:56,080 method to use now i've got two vertical 852 00:28:54,720 --> 00:28:58,640 lines there 853 00:28:56,080 --> 00:29:01,440 because you do get variability in early 854 00:28:58,640 --> 00:29:03,279 figures we have two cutoffs 855 00:29:01,440 --> 00:29:05,919 three percent is we have a bottom of 856 00:29:03,279 --> 00:29:07,440 frame total on television no seat is 857 00:29:05,919 --> 00:29:09,279 included in that figure so i've got 858 00:29:07,440 --> 00:29:11,279 three percent counted that just 859 00:29:09,279 --> 00:29:13,760 minimizes the number of times the seats 860 00:29:11,279 --> 00:29:15,760 go down and that always makes people 861 00:29:13,760 --> 00:29:17,120 worried when the tally goes down now 862 00:29:15,760 --> 00:29:18,799 they say how can you have given the seat 863 00:29:17,120 --> 00:29:20,240 away and then take it back 864 00:29:18,799 --> 00:29:22,240 well if you're doing it by hand in 865 00:29:20,240 --> 00:29:25,279 absolute confidence you would be more 866 00:29:22,240 --> 00:29:27,760 cautious we have automated this so it 867 00:29:25,279 --> 00:29:29,679 will sometimes go down 868 00:29:27,760 --> 00:29:31,919 but the alternative is you do it 869 00:29:29,679 --> 00:29:33,600 manually and you get updates every two 870 00:29:31,919 --> 00:29:35,600 minutes and you're constantly behind if 871 00:29:33,600 --> 00:29:37,360 you adopt this statistical method you 872 00:29:35,600 --> 00:29:39,279 will always be up with the data 
and so 873 00:29:37,360 --> 00:29:41,279 that's what we choose to do now just a 874 00:29:39,279 --> 00:29:44,399 hint on this next graph because i'm not 875 00:29:41,279 --> 00:29:45,919 showing giving away too many secrets um 876 00:29:44,399 --> 00:29:47,919 this explains what this two-party 877 00:29:45,919 --> 00:29:50,480 preferred looks like if you plot the 878 00:29:47,919 --> 00:29:52,080 confidence interval um basically and 879 00:29:50,480 --> 00:29:54,080 this is not braddon this is an entirely 880 00:29:52,080 --> 00:29:55,679 different electorate and i've removed 881 00:29:54,080 --> 00:29:57,919 all the numbers so you can't read it but 882 00:29:55,679 --> 00:30:01,279 basically that confidence interval 883 00:29:57,919 --> 00:30:03,200 interval turns into a downward sloping 884 00:30:01,279 --> 00:30:05,279 downward curving line 885 00:30:03,200 --> 00:30:07,440 which approaches 886 00:30:05,279 --> 00:30:09,600 the the it's asymptotic the line 887 00:30:07,440 --> 00:30:13,840 approaches the 50 percent line 888 00:30:09,600 --> 00:30:15,840 when you get to um all the votes counted 889 00:30:13,840 --> 00:30:18,799 once that red line crosses that we give 890 00:30:15,840 --> 00:30:20,240 the seat away now the seat might drop 891 00:30:18,799 --> 00:30:22,000 back 892 00:30:20,240 --> 00:30:23,279 it generally drops back into leaning 893 00:30:22,000 --> 00:30:25,200 that way as you can see that that never 894 00:30:23,279 --> 00:30:27,440 dropped below 50 percent 895 00:30:25,200 --> 00:30:29,679 there's no way that that's once it gets 896 00:30:27,440 --> 00:30:32,480 close to that line it is not suddenly 897 00:30:29,679 --> 00:30:35,120 going to revert to the other side of 50 percent 898 00:30:32,480 --> 00:30:37,120 and you've got a a reflected curve for 899 00:30:35,120 --> 00:30:39,120 the other candidate in the final race so 900 00:30:37,120 --> 00:30:40,960 that's that's what that confidence 901 00:30:39,120 --> 00:30:42,640 interval looks like i've not
shown you 902 00:30:40,960 --> 00:30:45,279 because otherwise people sit and figure 903 00:30:42,640 --> 00:30:46,880 out you know a pseudo version of what we 904 00:30:45,279 --> 00:30:49,039 do and i'm just not going to do that but 905 00:30:46,880 --> 00:30:50,720 that's what you end up with and out of 906 00:30:49,039 --> 00:30:52,799 all this comes this box which is from 907 00:30:50,720 --> 00:30:53,760 the south australian election this gives 908 00:30:52,799 --> 00:30:55,279 me 909 00:30:53,760 --> 00:30:57,200 the um 910 00:30:55,279 --> 00:30:59,760 the number of seats won by each party 911 00:30:57,200 --> 00:31:01,840 and that's the whole game that we're 912 00:30:59,760 --> 00:31:03,600 doing here and on these numbers from 913 00:31:01,840 --> 00:31:05,840 quite early on we've got the liberal 914 00:31:03,600 --> 00:31:07,760 party on 25 seats 915 00:31:05,840 --> 00:31:09,919 and i have an error margin plus or minus 916 00:31:07,760 --> 00:31:12,000 three seats you need 23 seats for 917 00:31:09,919 --> 00:31:13,600 majority if i'm looking at these numbers 918 00:31:12,000 --> 00:31:16,000 on the night that's not close enough to 919 00:31:13,600 --> 00:31:18,000 call but this looks like magic this 920 00:31:16,000 --> 00:31:20,080 looks like this is all under so you know 921 00:31:18,000 --> 00:31:22,080 i'm sort of making a guess or something 922 00:31:20,080 --> 00:31:24,000 all the mathematics in each of those 923 00:31:22,080 --> 00:31:25,919 seats is being done constantly and it's 924 00:31:24,000 --> 00:31:27,840 being checked and we have alternatives 925 00:31:25,919 --> 00:31:29,760 for the preference formulas are wrong we 926 00:31:27,840 --> 00:31:31,360 have different ways we can if we think 927 00:31:29,760 --> 00:31:33,919 the formula is over predicting we can 928 00:31:31,360 --> 00:31:36,080 pull the seat back into doubt um i don't 929 00:31:33,919 --> 00:31:38,480 manually give seats away i will manually 930 00:31:36,080 --> 
00:31:40,320 push them into in doubt if i want to but 931 00:31:38,480 --> 00:31:42,000 that all then produces this total at the 932 00:31:40,320 --> 00:31:44,320 end so everyone thinks i'm you know 933 00:31:42,000 --> 00:31:46,640 making some guess or not i'm not it's a 934 00:31:44,320 --> 00:31:49,200 science this is all science all 935 00:31:46,640 --> 00:31:51,760 mathematics and that's the way it's done 936 00:31:49,200 --> 00:31:53,760 so um that's my little presentation but 937 00:31:51,760 --> 00:31:56,399 uh as i said 938 00:31:53,760 --> 00:31:59,600 if i go back to this graph 939 00:31:56,399 --> 00:32:00,640 um this is the magic if you use the 940 00:31:59,600 --> 00:32:02,720 swing 941 00:32:00,640 --> 00:32:05,200 you get a red line prediction like that 942 00:32:02,720 --> 00:32:06,960 which is stable from very early on if 943 00:32:05,200 --> 00:32:08,159 you use the black line 944 00:32:06,960 --> 00:32:09,600 um 945 00:32:08,159 --> 00:32:11,360 you're all over the place waiting for 946 00:32:09,600 --> 00:32:12,799 the figures to stabilize and i'll say 947 00:32:11,360 --> 00:32:14,240 one further thing 948 00:32:12,799 --> 00:32:16,240 it is getting slightly harder at the 949 00:32:14,240 --> 00:32:18,480 moment with the rise in pre-poll voting and 950 00:32:16,240 --> 00:32:20,480 postals the assumption that people vote 951 00:32:18,480 --> 00:32:22,559 in the same place as last time is 952 00:32:20,480 --> 00:32:23,760 starting to be undermined and we've i 953 00:32:22,559 --> 00:32:25,440 know when we did the queensland election 954 00:32:23,760 --> 00:32:27,279 there was a huge increase in pre-poll 955 00:32:25,440 --> 00:32:29,519 and postal voting it was a closer 956 00:32:27,279 --> 00:32:31,760 election we just had to wait longer and 957 00:32:29,519 --> 00:32:33,760 in fact in recent years we've begun to 958 00:32:31,760 --> 00:32:35,600 sort of wind out the variance formula so 959 00:32:33,760 --> 00:32:37,919 that the system gives a little
bit more 960 00:32:35,600 --> 00:32:42,000 wiggle room in the predictions 961 00:32:37,919 --> 00:32:43,519 so anyway that's that's my presentation 962 00:32:42,000 --> 00:32:44,799 thank you for that anthony that was really 963 00:32:43,519 --> 00:32:46,320 really interesting i do like the 964 00:32:44,799 --> 00:32:48,320 statistical magic you've pulled off 965 00:32:46,320 --> 00:32:50,480 there it is quite incredible 966 00:32:48,320 --> 00:32:52,000 thank you very much 967 00:32:50,480 --> 00:32:54,080 so hopefully you'll be you'll be quite 968 00:32:52,000 --> 00:32:56,559 busy for the um the first half of this 969 00:32:54,080 --> 00:32:58,880 year i imagine anthony you'll be yes i 970 00:32:56,559 --> 00:33:00,640 have a um south australian election on 971 00:32:58,880 --> 00:33:02,320 the 19th of march 972 00:33:00,640 --> 00:33:03,919 and it's looking pretty clear that the 973 00:33:02,320 --> 00:33:06,080 federal election will be in may not 974 00:33:03,919 --> 00:33:07,360 march at the moment so that's my working 975 00:33:06,080 --> 00:33:10,399 assumption of course it could be wrong 976 00:33:07,360 --> 00:33:12,000 but that's my working assumption 977 00:33:10,399 --> 00:33:12,720 yes yes so you know who can predict 978 00:33:12,000 --> 00:33:16,919 these 979 00:33:12,720 --> 00:33:16,919 these politicians what would they do
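
Editor's sketch of the matched swing idea described in the talk. This is not the ABC's system: the booth figures are invented for illustration (they are not real Braddon results), the names `Booth`, `raw_2cp`, `matched_projection` and `win_probability` are hypothetical, and the normal approximation with a hand-supplied standard error is an assumption, since the talk deliberately does not disclose how the confidence interval is built.

```python
from dataclasses import dataclass
from math import erf, sqrt


@dataclass
class Booth:
    """Two candidate preferred (2CP) counts for one polling place."""
    name: str
    votes_now: int   # formal 2CP votes counted at this election
    party_now: int   # of those, votes for the party being tracked
    votes_prev: int  # same polling place at the previous election
    party_prev: int


def pct(part, whole):
    return 100.0 * part / whole


def raw_2cp(reported):
    """The 'black line': current 2CP from the booths reported so far."""
    return pct(sum(b.party_now for b in reported),
               sum(b.votes_now for b in reported))


def matched_projection(reported, prev_final_pct):
    """The 'red line': previous final 2CP plus the matched swing.

    The swing compares the current total with the historical total
    from the same polling places only.
    """
    prev_same_booths = pct(sum(b.party_prev for b in reported),
                           sum(b.votes_prev for b in reported))
    swing = raw_2cp(reported) - prev_same_booths
    return prev_final_pct + swing, swing


def win_probability(projected_pct, std_error):
    """Assumed normal approximation of P(final 2CP > 50 percent)."""
    z = (projected_pct - 50.0) / std_error
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Invented electorate: two small rural booths and two large urban ones.
booths = [
    Booth("rural a", 400, 292, 400, 280),
    Booth("rural b", 600, 408, 600, 390),
    Booth("urban c", 4000, 2020, 4000, 1900),
    Booth("urban d", 5000, 2580, 5000, 2430),
]
prev_final = pct(sum(b.party_prev for b in booths),
                 sum(b.votes_prev for b in booths))

early = booths[:2]  # the small booths report first
projection, swing = matched_projection(early, prev_final)
print(f"raw 2cp from early booths: {raw_2cp(early):.1f}")
print(f"matched swing: {swing:+.1f}, projection: {projection:.1f}")
print(f"final 2cp once everything is counted: {raw_2cp(booths):.1f}")
```

In this made-up case the raw figure from the two small booths is far above the eventual result, while the matched projection already sits on it, which is the early bias the talk describes. A seat would then be given away only when `win_probability` passes a threshold such as the 99 percent mentioned in the talk, with the standard error coming from past polling place swings.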