MODERATOR: Hello, everyone, and welcome back from the break. I hope you had good hot chocolate, tea and/or a nice walk outside. Next up we have a talk from Simon Prickett called "No, Maybe and Close Enough! Probabilistic Data Structures with Python and Redis". Simon is a senior software developer who enjoys projects that fuse hardware and software, especially with Arduino and Raspberry Pi.

>> Hello. My name is Simon Prickett and this is "No, Maybe and Close Enough! Probabilistic Data Structures with Python and Redis." The problem we are going to look at is related to counting things. Counting things seems quite easy on the face of it. We just maintain a count of the things that we want to count, and every time we see a new and different thing we add one to that count. How hard can this be?

Let's assume you want to count sheep. I want to count sheep, and I am doing that in Python. I have here a simple Python program that does that. Python has a set data structure built into it. Sets are great for this problem, because you can add things to a set, and if we add them multiple times they are deduplicated, and we can ask how many things are in the set. We can answer "how many sheep have I seen?" with a set. I am declaring a set and adding sheep tag IDs to it. You will see I have 1934 twice.
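A minimal sketch of the kind of program being described; the tag values here are illustrative, not taken from the talk's repo:

```python
# Count distinct sheep with Python's built-in set.
sheep_seen = set()

for tag in ["1934", "1201", "1983", "1934"]:  # 1934 appears twice
    sheep_seen.add(tag)

# len() gives the number of distinct sheep: the duplicate is ignored.
print(len(sheep_seen))  # → 3
```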
That is deduplicated, so when we ask how many sheep are in the set of sheep we have seen, by using the len function, it will not count that one twice. That's perfect. We have got an exact number of how many sheep we have seen.

Another question that I might want to ask when I am counting things is not just how many sheep have I seen, but have I seen this particular sheep? I need to be able to retrieve data from my set, or whatever structure I am using, to determine if we have seen this sheep before. Is sheep 1934 in the set of sheep we have seen, for example? Here again, I am using a Python set, and this seems to be a great fit for this problem. I declare the set with the sheep tags we have seen: 1934, 1201. I have a function asking: is the sheep ID passed in in the set of sheep we have seen? It either is or it isn't. That's going to work perfectly and be reliable 100% of the time. When I ask have I seen 1934, it is going to say yes. Have we seen 1283? It is going to say no, we haven't seen that one before, or at least not yet.

When we count things we want to answer two questions: how many distinct things have I seen, and have I seen this particular distinct thing? The set has us covered and does everything we need. That's it. We are done.
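The membership check can be sketched like this; again, the tag values are illustrative:

```python
# The set of sheep tags we have seen so far.
sheep_seen = {"1934", "1201", "1983"}

def have_i_seen(sheep_id: str) -> bool:
    # Exact membership test: a set answers yes or no, 100% reliably.
    return sheep_id in sheep_seen

print(have_i_seen("1934"))  # → True
print(have_i_seen("1283"))  # → False
```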
Remembering all of those in a set and storing all that data is not a huge problem. But if we need to count huge numbers of distinct things, so operating at internet scale, or at Australian or New Zealand sheep farm scale, then we might need to think about this again, because we might have issues here.

At scale, when things get big with counting, we start to hit problems with, for example, memory usage. Remembering all those things in the set starts to get expensive in terms of the amount of memory that set requires. It also gives us a problem of horizontal scaling. If we count lots and lots of things, the chances are it is not just going to be one person or one process out there counting things and using a local in-memory process variable to do it. We will have several processes counting things, and they will want to count together and maintain a common count. We need a way of sharing counters and making sure they update correctly, so we do not get false counts, so we don't lose data if one counter goes down, and so we aren't stuck trying to fit all of the data in a single process's memory. Once we get to scale, counting things exactly starts to get very expensive in terms of memory usage, and potentially time performance and concurrency.

One way we might want to resolve that would be to move the counting problem out of memory and into, say, a database. Here I am using a database.
I am using the Redis database because it has a set data structure, so we can take the set we were using in Python and move it out of Python and into Redis. This is a fairly simple code change. We now just create a Redis connection using the redis module, and we basically tell it the key name of the Redis set that we want to store sheep tags in. We just SADD things: in Python we were doing .add, in Redis it is SADD. We say which set we want to put things in, because Redis is a database and can store multiple of these, and we give it the tags. The same behavior happens, so when I add 1934 a second time, 1934 will be deduplicated.

Because we are using Redis for this, the data is out of the Python process and accessible across the network, so we can connect multiple counters to it. We solve a couple of problems here. We solve the problem of having a lot of people out there counting the sheep while wanting to maintain a centralized overall count, and we have solved the problem of the memory limitations in a given process. The process is no longer becoming a memory hog with all these sheep IDs in a set; we moved them out to a database, in this case Redis. We still have the problem of overall size, though. As we add more and more sheep, the dataset is still going to take up a reasonable amount of space, and that's going to grow as we add sheep.
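A sketch of that change using the redis-py client. The key name and tag values are assumptions, and the helpers take any client object exposing sadd and scard, so the logic can also be exercised without a live server:

```python
SHEEP_KEY = "sheep:seen"  # assumed key name for the Redis set

def add_sheep(client, tag):
    # SADD deduplicates, just like Python's set.add().
    client.sadd(SHEEP_KEY, tag)

def count_sheep(client):
    # SCARD returns the set's cardinality (number of distinct members).
    return client.scard(SHEEP_KEY)

def main():
    # Requires a running Redis server and `pip install redis`.
    import redis
    r = redis.Redis(decode_responses=True)
    for tag in ["1934", "1201", "1983", "1934"]:
        add_sheep(r, tag)
    print(count_sheep(r))  # 1934 is only counted once
```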
If we use longer tags it will grow even faster, because we are storing the items themselves.

Let's have a look at how we can determine if we have seen a sheep before when using a database. Here, again, we are using Redis. Imagine we have put all of the data into that set, we have shared counters, and lots of people can go out and count the sheep. To know if we have seen a sheep before, we basically have a new have-I-seen function, with some preamble before it that clears out any old set in Redis and sets up sample data. Instead of asking if the sheep tag is in the set like we did with the Python set, we are going to use a Redis command called SISMEMBER, "set is member". We say: if this sheep ID is in the set then we have seen it, otherwise we haven't. As we would expect, that works the same as it does in Python with a set, but we have solved the concurrency problem and the individual process memory limit problem.

We have really just moved that memory problem into the database itself, though. So to solve that, and to enable counting at large scale without chewing through a lot of memory, we will need to make tradeoffs, which involve giving up one thing in exchange for another. The sheep on the left has its fleece, and the sheep on the right gave up its fleece to be cooler.
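The SISMEMBER check just described can be sketched the same way; the key name and sample data are assumptions, and the function takes any client exposing sismember:

```python
SHEEP_KEY = "sheep:seen"  # assumed key name for the Redis set

def have_i_seen(client, sheep_id):
    # SISMEMBER gives an exact yes/no, like `sheep_id in my_set`.
    return bool(client.sismember(SHEEP_KEY, sheep_id))

def main():
    # Requires a running Redis server and `pip install redis`.
    import redis
    r = redis.Redis(decode_responses=True)
    r.delete(SHEEP_KEY)                # clear out any old set
    r.sadd(SHEEP_KEY, "1934", "1201")  # sample data
    print(have_i_seen(r, "1934"))  # True
    print(have_i_seen(r, "1283"))  # False
```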
This is the key thing: we have been storing the whole dataset, all of the data, to determine which sheep we have seen. But can we get away with storing something about the data, or bits of the data, and still know that it is that sheep? With the sheep on the left and the right, we can still tell they are sheep even though one lost its fleece.

This is where something called probabilistic data structures comes into play. We can save a lot of memory by giving up a little accuracy. We might also trade off some functionality. As we will see, we can save a lot of memory by not actually storing the data, meaning we can no longer get a list back of which sheep we have seen, but we can still determine whether we have seen a sheep with reasonable accuracy. The other tradeoff involved with probabilistic data structures is performance. Those are the three tradeoffs we will mostly be looking at.

We had two questions we wanted to ask. How many sheep have I seen is the first one, and a data structure, or algorithm, from these probabilistic data structure families that we can use for that is called the HyperLogLog. It approximates distinct items: it estimates the cardinality of a set, not by storing the data, but by hashing the data and storing information about it. There are some pros and cons to this.
The way the HyperLogLog works is that it runs all of the data through hash functions: each item hashes to a binary sequence of zeros and ones, and we look at the number of leading zeros, keeping track of the longest run seen. There is a formula we need to look at, but don't need to understand, that lets us estimate the cardinality of the set from that.

The HyperLogLog has a similar interface to a set: we can add things, and we can ask how many things are in there. It is going to save a lot of space, because we are using a hashing function, so it will be a fixed-size data structure no matter how much data we put in there. We can't retrieve the items back again, unlike with the set. That's both a benefit and a tradeoff. We can't retrieve them, but that's great in some places where we want a count and don't want the overhead of storing the information, for example if it is personally identifiable information. We can use a HyperLogLog where we want to count but don't necessarily need the actual information back again. The other tradeoff involved is that it is not built into the Python language, so we will need a library implementation, or something built into a data store.

Here is the algorithm for HyperLogLog. This is on Wikipedia.
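The leading-zeros idea can be sketched as a single-register estimator in plain Python. This is an illustration only, not the full HyperLogLog (which averages many registers and applies bias corrections); the hash choice and width are assumptions:

```python
import hashlib

def estimate_cardinality(items, bits=32):
    # Hash each item to a fixed-width integer and remember the longest
    # run of leading zeros seen. A run of R leading zeros is roughly a
    # 1-in-2**R event, so we guess about 2**R distinct items.
    max_zeros = 0
    for item in items:
        digest = hashlib.md5(str(item).encode()).digest()
        h = int.from_bytes(digest[:4], "big")  # 32-bit hash value
        max_zeros = max(max_zeros, bits - h.bit_length())
    return 2 ** max_zeros
```

Note that duplicates cannot change the answer: the same item always hashes the same way, which is exactly why this style of estimator counts distinct items.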
It is a lot of math to do with hashing things down to 0s and 1s, looking at how many leading 0s there are, keeping a count of the greatest number of leading 0s we have seen, and then approximating the size of the dataset based on that. The takeaway here is that we don't need to do that ourselves. We are going to use a library, or an implementation built into a data store, and we will look at both of those.

The HyperLogLog doesn't answer the question "how many sheep have I seen?" for us. It answers the question "approximately how many sheep have I seen?", which may well be good enough for our dataset and save us memory. I am using the hyperloglog module in this Python program, and I am declaring a set as well, so we will see how a set compares. I declare my HyperLogLog and give it an accuracy factor, which is something you can tune to trade off the amount of memory it will take. When we come to look at this with a data store, we will see how the sizes actually compare. We then have a loop where we add 100,000 sheep, and ask both structures how many they have. We see, as we expect, that the set is 100% correct: we have 100,000 sheep in the set. The HyperLogLog slightly overcounted, but it is within a good margin of error, and the tradeoff is that the set has taken up way more memory than the HyperLogLog has. We will put numbers on that when we look at it in the database.
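The database-side comparison can be sketched with Redis's PFADD and PFCOUNT commands via redis-py. The key name and connection details are assumptions, and the helpers take any client exposing pfadd and pfcount, so the calling logic can be exercised without a live server:

```python
HLL_KEY = "sheep:hll"  # assumed key name for the HyperLogLog

def add_sheep(client, *tags):
    # PFADD inserts items into the HyperLogLog stored at this key.
    client.pfadd(HLL_KEY, *tags)

def approx_sheep_count(client):
    # PFCOUNT returns Redis's approximation of the distinct count.
    return client.pfcount(HLL_KEY)

def main():
    # Requires a running Redis server and `pip install redis`.
    import redis
    r = redis.Redis(decode_responses=True)
    r.delete(HLL_KEY)
    add_sheep(r, *[f"sheep-{i}" for i in range(100_000)])
    print(approx_sheep_count(r))  # close to, though rarely exactly, 100000
```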
One of the reasons I picked Redis as the data store is because it has both sets and HyperLogLogs as data types. I have a small Python program that does the same thing: store sheep in a Redis set and in a Redis HyperLogLog. We delete those, loop over 100,000 sheep, and add the IDs to Redis: we put them into the set, and we use the PFADD command down there to add them to the HyperLogLog. PF stands for Philippe Flajolet, the French mathematician who partly came up with the HyperLogLog algorithm, so the Redis commands are named after him. We then ask Redis for the cardinality of the set and the approximation from the HyperLogLog, so we can compare.

Here we can see that in the Redis implementation we got 100,000 sheep in the set, and you can see how much memory that took. With the HyperLogLog we got about 99,500 sheep, but it only took 12K of memory. We can keep adding sheep to that and it will only ever take 12K of memory, whereas the set would have to keep growing. We are getting an approximate count while saving a lot of memory.

The second probabilistic data structure I want to look at is the Bloom filter. The Bloom filter is used for the other question we wanted to ask, which is: have I seen this sheep? That's a set membership type of question. Is sheep 1234 in the set of sheep we have seen? When we are using a set, we get an absolute answer of yes or no. When we are using a Bloom filter, we get an approximated answer.
We will get either: absolutely no, it is not in the set; or: maybe it is, there is a high likelihood that it is in the set. Again, that uncertainty comes from not storing the data in the Bloom filter. We hash the data, and trade that memory saving for a little accuracy.

Here is how the Bloom filter works, and I have one laid out. You have a bit array, and that's however many bits you want to make it wide. We can configure the width of the bit array, and with it how much memory it will take. I have 15 bits as a simple example. Every time we put a new sheep ID, or any data item, into the Bloom filter, we run it through a number of hash functions, and each has to return a position within the bit array. Essentially, they identify the positions in the bit array that that sheep ID hashes to. We will use three hash functions in our example. Each sheep ID is hashed to three different bits, and we will see how that enables us to answer whether we have seen that sheep before in a no-or-maybe style.

If we start out by adding the ID 1009, our three hash functions might work like this: the first hashes it to position one, the second hashes it to position six, and the third to position eight. Each of those bits in the filter is then set to one, so we know that a hash function has landed on that position.
Similarly, when we add more sheep, say sheep 9107 here, the three hash functions result in these positions, and we can see in this case that 9107 generated two new positions that were previously unset in our bit array, and one existing one. As with a lot of hashing, there is potential here for clashes. The more hash functions we use, and the wider the bit array, the more we can dial that down, but in this simple example we will get clashes. Adding more, we get 1458. That hashes to three positions that were already taken, so we don't set any new bits to one.

Now, when we want to look something up, we do the same thing, but we read the values in the bit array. When I look up sheep 2045 and ask have we seen that sheep, the first hash function maps to a position holding a one, so it is possible. The second one hashes to a position holding a zero, meaning we have not seen this sheep. We could stop there and not run the third hash function, but for completeness I have shown it. As soon as one hash function lands on a zero, we know we have absolutely, definitely not seen it. 9107, on the other hand, is a sheep we have seen before: all of the hash functions land on positions that already hold a one. We can say there is a strong likelihood that we have seen this sheep before.
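The mechanics just described can be sketched as a toy Bloom filter in plain Python. The sizes and the salted-hash construction are assumptions chosen for illustration; real libraries size the bit array and hash count for you:

```python
import hashlib

class ToyBloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # the bit array

    def _positions(self, item):
        # Simulate k independent hash functions by salting one hash;
        # each yields a position between 0 and num_bits - 1.
        for salt in range(self.num_hashes):
            digest = hashlib.md5(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any zero bit means "definitely never seen" (no false
        # negatives); all ones means "maybe seen" (false positives
        # happen when other items set the same bits).
        return all(self.bits[pos] == 1 for pos in self._positions(item))
```

With only 15 bits and three hashes, as on the slide, clashes come quickly; widening the array and tuning the hash count dials the false positive rate down.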
The reason we can't say that with absolute certainty is that if we look at sheep 2989, which is not in the set we added at the top and not one we have seen before, its number happens to hash to positions that are all set to one. The Bloom filter in this case is going to lie to us. It will say of 2989 that there is a strong likelihood that sheep exists, but it doesn't. We are trading away a little accuracy, and some computational time because we are hashing across a number of functions, but we are getting down to just this bit array and saving a lot of memory. If we want to know whether we have seen a sheep, and a "no" or a "strong possibility" is an OK answer, then we can use this and save ourselves a lot of memory.

So here is some Python code that uses this. We will use, again, library code for the Bloom filter, using pyprobables. This works out how many hashes and what bit array size to use: we say we want to store this many items, 200,000 items, and we can dial in a false positive rate acceptable to us. It figures out the memory size needed from that, and that's part of our tradeoffs: the more accurate we get, the more memory; the less accurate, the less memory. We add 100,000 sheep to the Bloom filter, and the have-I-seen function is the same. We have a function asking have I seen this sheep, and it will say "I might have" or "no, I definitely haven't". This is a good drop-in for a set. The interface is similar, but we are saving a lot of memory.
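In a data store, the same interface can be sketched with the RedisBloom module's BF.RESERVE, BF.ADD and BF.EXISTS commands. The key name, capacity and error rate are assumptions, and the helpers take any client exposing redis-py's execute_command, so the calling logic can be exercised without a server:

```python
BLOOM_KEY = "sheep:bloom"  # assumed key name

def create_filter(client, capacity=200_000, error_rate=0.01):
    # BF.RESERVE sizes the filter for a capacity and an acceptable
    # false positive rate (requires the RedisBloom module).
    client.execute_command("BF.RESERVE", BLOOM_KEY, error_rate, capacity)

def add_sheep(client, tag):
    client.execute_command("BF.ADD", BLOOM_KEY, tag)

def have_i_seen(client, tag):
    # 1 means "maybe seen" (false positives possible);
    # 0 means "definitely not seen".
    return bool(client.execute_command("BF.EXISTS", BLOOM_KEY, tag))
```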
And when we run this, we get the answers that we have come to expect: we might have seen 9108, and we definitely haven't seen the other one.

We can do this in data stores also. I picked Redis as the data store for this talk because it has an installable module with an implementation of a Bloom filter. Similarly, I can create a Redis Bloom filter; the BF.RESERVE command does the same sizing. Then I can add sheep into the Bloom filter and ask does this sheep exist in the Bloom filter, and we get the same sort of result as before: I might have seen it, or I have definitely not seen it, but we are saving a lot of memory. Now, instead of using a set that will grow, we have a bit array that is not going to grow, but it is going to fill up as we add more sheep to it. There are strategies for stacking Bloom filters, so that as one fills up you put another one on top of it. That's beyond the scope of this talk, but it is a problem that can be solved.

So when should you use probabilistic data structures? Well, tradeoffs. If an approximate count is good enough, a HyperLogLog is great. It doesn't really matter whether we know exactly how many people read an article on Medium, as long as we are in the right ballpark. You could use a Bloom filter when it is OK to have false positives. For example: have I recommended this article on Medium to this user before?
It doesn't matter if we occasionally get this wrong, and we are saving a lot of memory, especially in cases where we need to maintain a data structure per user. It might also be advantageous to use these where you don't need to store or retrieve the original data, if it is personal or never-ending. Which leads me to the last point: when you are working with huge datasets where exact strategies aren't going to work out for you, you are going to have to make tradeoffs. This family of data structures offers a good set of tradeoffs between memory and accuracy.

So, that was everything I had. The code that you have seen in the talk I have put in a small GitHub repo. You can play with it in Python in memory, and you can play with it in Redis and have it inside a data store. I hope you enjoyed this, and have a great time at the conference.

MODERATOR: Thank you so much for your talk, Simon. Unfortunately, Simon isn't live to answer questions, but you are very welcome to contact him through his website and Twitter to ask questions. Next up we have a talk at 5:30 AEST called "Setting up a machine translation service for Timor-Leste" with Raphael Merx and Mel Mistica. See you back then. Thank you very much. Goodbye.