That is, all of the letters in our string shift to the right, and the front-most character gets pushed onto the back .We do this for each symbol in our stream, rotating one character to the right, so we end up with a table whose number of rows equals the length of the stream. The next step is to sort the rows of this table lexographically, giving us a sorted table form jackpotcity New Zealand.
Now, this is where some of the dark magic happens. You can see that each row is effectively a permutation of the word "banana," but also, each column is a permutation as well. The first column, of course, is the sorted permutation of our word. But really, what we want is the last column, in general. This N-N-B-A-A-A has some interesting characteristics. Notice it's got better symbol clustering than any of the other columns, especially our input stream and, thus, is what we output as our encode. Now, before you run off, there's one other piece of data that we need to grab as well. Notice that in our sorted table, the input string actually sits at index 4. We'll need this row index during the decode phase of the Burrows-Wheeler Transform, because it will allow us to move from our more compressible permutation back to the source string. Now, you may ask yourself, why do we choose this last column? I mean, the one right next to it seems just as good, seems to have very close clustering. Why not that one? Well, this is more BWT magic. See, this last column has an interesting feature. If we have only this, N-N-B-A-A-A, we can recover the rest of our sorted table entirely. The other columns don't possess the same characteristic, which is highly important when we're trying to invert our transform. Banana. MAGNUS HYTTSTEN: That's great! COLT MCANLIS: All right. The remarkable thing about Burrows-Wheeler Transform is not that it generates a more compressible output-- any ordinary sort would do that-- but that this particular transform is reversible with minimal data overhead. And to help us demonstrate that is my good friend Magnus, who, turns out, is training for the World Sorting Championship. MAGNUS HYTTSTEN: Yeah! Sorting is my thing. Yeah. [LAUGHS] COLT MCANLIS: Yeah. This guy right here, he's going to take it all. All right. So at the start of our decode step, we're given the string N-N-B-A-A-A and the row index of 4. If you recall, N-N-B-A-A-A represents the last column of our table. So Magnus, can you put an N-N-B-A-A-A up on the board for us? MAGNUS HYTTSTEN: Like that? COLT MCANLIS: Just like that. MAGNUS HYTTSTEN: Great! COLT MCANLIS: DO it. Now, when he's doing this, I want you to watch his technique. Watch the way his hand flows through the process. MAGNUS HYTTSTEN: Yeah! Finished. COLT MCANLIS: Fantastic. Yeah. Young kids, you're afraid of this guy. Now, once we have this data in our table, the next step is to sort it, which is where Magnus comes in for us. All right, Magus, you ready? MAGNUS HYTTSTEN: Absolutely. COLT MCANLIS: We're going to do a sort. Nice warm up. Don't pull anything. MAGNUS HYTTSTEN: OK. COLT MCANLIS: All right. When you're ready, go. Excellent, good job.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. Archives
November 2017
Categories |