Transcript of RSM Episode 4

RSM project page

00:01 this is the fourth part — parts one, two and three you can
00:05 find on the youtube channel. there's also a webpage on my homepage that you can go
00:08 to, that has all the videos; you can
00:12 download the videos if you wanna watch them offline. okay, so where we left off last
00:16 time was kind of wrapping up the instruction
00:25 encoding for this virtual machine's instructions,
00:31 and what i'm going to work on now in this fourth part is —
00:35 sorry, no — first i'm going to recap a little bit the changes that i made since the third part. this was, i don't know,
00:44 15 hours ago or something like that. it's a couple of small changes, and then we're going to have a look at
00:50 three different ways by which we can
00:55 evaluate, or execute, this code — sort of like the actual virtual machine that runs the code.
01:02 so i hope that sounds good. i'm going to keep a little eye on the chat here.
01:08 there are thoughts on deno — yeah, you know, it's super cool. it has this feature where you can just kind of bundle things together into a portable-
01:17 ish executable; it's kind of cool. let's see — stream looks good, thank you. will this be a regular thing? yeah,
01:32 maybe it will, no matter the situation at home, you know. it's just nice to be
01:35 able to, you know, speak out loud. someone else is
01:38 usually in the office, so it's a little awkward to talk in the middle of the day if there's a person here having meetings and such.
01:45 um — some more love for sublime, that's cool.
01:51 someone regretting updating to big sur — i hope i said it right. yeah, it is mono, i
02:07 love it, i use it everywhere.
02:12 okay, look — i wish that we could talk with each other; i think it's awkward with this text chat, but you know,
02:21 it is what it is. part four: what have i done since part
02:26 three? so, this project is about coming up with this little computer,
02:33 right — this virtual computer — a little language for it,
02:37 a little assembly language for it, a little virtual machine you can run it on; something that's kind of portable. and just to quickly reiterate the
02:45 project goals here: primarily, it's just something i want
02:49 to do because i think it's going to be fun and i want to learn
02:53 stuff. secondly, i hope it can be a sort of
02:57 substrate — sort of like a foundation — that i can
03:01 build other things on. like, i enjoy making programming languages and compilers and stuff like that, and there's usually a point where you want to generate machine code, or you want to
03:10 generate some sort of target code, like webassembly or whatever, and that can get kind of meaty. it's
03:17 really cool if this could just be one of those targets, so you can just make a little thing. anyhow. and,
03:24 for longevity: it would be really neat to be able to write rich programs — like multimedia programs, not just text programs — that can
03:34 survive a decade or two of bit rot. like, today i go and look at
03:40 particular web things that i wrote just a few years ago, and they don't work anymore in a modern browser, and i'm like, oh well, i'm just gonna stop maintaining them, right.
03:49 um, so i think that, at least for enjoyable software — for
03:53 personal-scale software, for fun things — i think that there's real value
03:58 in making that have a long life. so, those are the project goals.
04:03 what changed since last time? there was a file with the
04:09 instruction encoding that we created last time. i decided that was just bad — and
04:14 i was sort of going against my own advice, actually, in a way, to keep it simple —
04:19 so i undid that: i just lifted all that back into the one header file. so there's just this one header file; there's this prelude file that defines some basic stuff — because we're dealing with c here — but that's not project specific.
04:31 so, anyhow, i lifted those back in here, into the one header file.
04:35 um — oh hey edward,
04:43 i hope you're doing well. so i did that — i lifted this back in here,
04:49 and that was a good decision. now there's just this one file to care about. these little helper macros and
04:56 constants have just made their way down here, to the constants section of this
05:02 header file, along with some of these functions. otherwise, there's the
05:12 to-do list — so that was this to-do point, right, so that's done.
05:15 then the second point — and this is what we'll be working on here in a moment — is building the evaluator.
05:25 sorry, i'm listening to music, and there's this amazing bug in macos —
05:29 it's long-standing, i don't know if you guys know it — this happens sometimes when you
05:34 reboot your system, it just does this. but i like that little music in the
05:42 background. um — so, we gotta build this evaluator, executor, for the virtual machine instructions, and last night, after
05:53 dinner, i felt inspired, so i started putting together just a stub for that,
05:56 and i think that's a good idea, because that saves us some time — you don't have to watch me
06:01 just knocking out boilerplate. so this is where i'm currently at, and this is where we'll
06:08 keep going okay
06:12 so here we have a similar little program, and on the right side here, if i
06:16 hit save, we can see it's sort of recompiling and then re-running, and
06:21 we see the output from our formatting function that we put together in a previous part. this shows us the instruction offset — so 0, 1, 2, 3; that's
06:31 the instruction offset, or program counter, if you were running this program —
06:35 and here we have the instructions along with their arguments. all right, so we have a move instruction, and this implements a factorial function —
06:43 it's a simple example here.
06:48 okay, now i'm going to show you what it looks like if you run this thing.
06:53 so here's a new function that i wrote last night called eval.
06:58 it needs some registers, so we just allocate those. there are
07:07 32 in this isa, which is documented in the readme file here — there are
07:12 32 integer registers and 32 floating-point registers. now, this code isn't using floating point — i haven't
07:18 bothered with any of that yet — so we don't need that, and yeah, we just allocate 32 of those
07:24 for virtual registers on the stack here, and we just pass this into the eval function along with the program — so this
07:29 is the set of virtual machine instructions — and pc here has a different meaning down
07:35 here: it's essentially just how many instructions there are in this array. and here we are setting up: register zero in this isa
07:48 is the first argument passed along to a function. so if we look at this up here, this
07:55 factorial function, it takes one argument, so this will be in register zero, and it
08:00 produces one output, which will also eventually end up in register zero.
08:04 so before we call this factorial function we need to set up its input arguments, and we do that here: we
08:10 give it the number three, and i think that should produce the result six, i think.
08:17 now, this virtual machine doesn't work yet — we're going to make it work — so this is not going to be
08:23 the correct result here, but we can print it out just for now.
08:26 um — oh god — um — oh —
08:34 what's that? there we go. crap. so the output here is 11 — that's not
08:40 correct, i believe. okay, now let's have a look at the
08:46 ways by which we could evaluate these virtual machine instructions. let's just comment this out — i think this is confusing, for me at least.
08:55 okay, so here's some new code that you haven't seen before if you've been following along. there are generally — at least in my experience, and it's not like i'm a,
09:05 you know, professional or expert on virtual machines —
09:09 i've read about them, i've made some toy attempts at them and stuff like that, so by no means am i an expert
09:17 here — but anyhow, from what i've gathered, there are three general ways to implement this in a programming language like c. the first one, which is the
09:25 current one that we have enabled, is with one big switch statement. now, i've got some macros here so we
09:32 can switch between the three without having to have three copies of the code. but essentially what we're doing here is a switch, and then these are cases — so we switch on the
09:45 current operation, and then, you know, if the operation
09:50 is move, then do this thing; if the operation is load, then do that thing; and so on, right — it's just a switch case. the other way of doing things is with a jump table
10:04 that has the address of a chunk of code. so this would traditionally be what most
10:12 vm implementations written in assembly would do: you just build a
10:18 table — and we've got one here — that maps the operation to
10:25 some offset in code. so, clang and gcc allow you to do this super janky syntax,
10:29 but this basically takes the address of a certain offset in your code, right,
10:36 which is expressed as a label in c. and third — this is probably a
10:41 less common thing to see in c, but
10:46 it's just as effective in many cases. similar to
10:53 a jump table of addresses, you use a table of functions that you tail-
10:58 call. so, you know, for people into functional programming, you have continuation-passing style, right, where,
11:03 instead of returning a value and pushing and popping on a stack, you
11:08 call a function with arguments, and then that function, instead of returning, just calls the next function, which calls the next function, and so on.
11:15 obviously, if you just do that with a traditional stack, and you push something and push something — if you do that enough times you can run out of stack memory; you get a stack overflow.
11:23 and so there is this optimization called a tail call, where at the
11:31 end of a function, when you're returning from it — i should qualify that better: at the end of a function, when you're calling another function — rather than pushing on
11:43 the stack, calling that function, and then returning and popping immediately, you can say: you know what, we
11:48 don't need the stack frame anymore, because there's nothing else after this function call,
11:54 so you can just pop the stack right there and
11:58 just jump to that function — you don't even need to set up any sort of stack
12:03 space for it. anyhow, if you're curious about what tail calls are, go check
12:07 it out and learn more about it. anyhow, it essentially produces similar code. i've got a little setup here, and we're going to see what different code these
12:16 will generate for x86 — we can look at arm64 too.
12:21 okay, so we've got one big switch statement — that's the current strategy. we can just make sure that all of these work. so, strategy two is going to produce
12:29 a jump table, and now we get the same effect here — kind of — it works. i've only implemented the logic for two
12:39 instructions so far, which are the first two that are issued in this program, and we can sort of hammer out the rest of them.
12:48 so in the second strategy, what happens down here is that instead of these becoming case statements, these now become labels.
12:56 so these are now transformed — if we do
13:00 this — these are now transformed into something that looks like this: there are labels, and essentially what will be added
13:09 into the table is the address offset for this instruction right here, whatever this
13:16 corresponds to in machine code.
13:20 and then our third strategy — that code is a little different, so it's in the latter part of the file. so — okay, look at that, i broke it, that
13:32 is lovely. um —
13:36 okay, let's see if we can quickly fix this so that we can have a look at it.
13:42 i have some suspicion — let's see here —
13:55 so what is happening here is — uh, oh,
14:08 yeah, this obviously is dumb, so i need to add some code here. and we'll get to see this in a minute, when we start looking at what code the
14:16 compiler generates for these different strategies: you kind of need to put something in here, otherwise the compiler is too clever and realizes that
14:23 all of these do the same thing, and it can just skip generating code for them, and that makes it harder to compare. so i'm kind of doing this to make sure that it does something. now, the index here gets
14:35 too large and it overflows the space we allocated for registers, and so
14:40 what we can do is just do a modulus here. okay. so, we're looking at
14:47 the way to evaluate the virtual machine program, and we're looking at three different strategies: a big switch statement, a sort of address jump table, and a
15:01 typical function table. and now we're sort of trying to
15:05 figure out which one will give the most ergonomic experience for
15:10 writing this program — something short and concise, that allows us to play around with
15:16 instructions and stuff like that and prototype concepts —
15:20 and we can also look at the generated code to get a better sense of what the
15:28 technical trade-offs are for these different approaches.
15:34 so maybe we should start there. a switch case — this is the current strategy,
15:40 we have that running on the side. if we have a look — i put together a
15:46 little script here, let's see — there it is, and we can just open it up so you can see what it does. so we run the script:
15:52 it goes and compiles this one
15:57 file, the eval file here, to assembly for x86,
16:01 and then it removes some decorations in the assembly that clang generates — but no actual
16:10 data or instructions — and then we just look at it, and i count the
16:14 lines at the bottom here. okay, i also have this file open here so we can switch to it and have a look.
16:22 so for this first strategy we have this one function — it's just one function for the strategy — and none of this is being used, right, so
16:31 this is not there, essentially. then we have this loop:
16:36 exec here is like a label that next here will jump back to — it grabs
16:43 the next instruction from the array of instructions, increments the program counter, and grabs the first argument, because all of these are going to need it, and then it just jumps to one of these, right, and then it repeats. so there's a loop going on until it hits
16:57 the ret instruction, in which case this eval function returns. okay.
17:04 so, what's going on: we're going to look at this rsm eval
17:09 function, so we're going to scroll up here — and this is actually relatively little
17:14 code. now, all of these dlog statements are just excluded from this build, otherwise there would be a lot of stuff here. so, this is x86. this just sets up the
17:26 stack for this function, and then — let's see, where is
17:33 it — yep, so:
17:39 this bit shift right is what's happening in here, this macro —
17:46 if we jump over to these guys here you're going to see a bit shift right, so that's what we've got — and we're going to have an and, so that's the and operation here,
17:56 and the mask — the compiler has figured out that that's 31.
18:00 so there's nothing else there: we have two operations happening, and that's where they come from.
18:08 and the next thing it does — so, this is
18:13 interesting. when you use a switch statement, you have something like,
18:17 you know, switch on a thing, right,
18:22 and now we have a bunch of cases: case one, do this; case two, three, four, do this and that. you
18:29 know, it's sort of like a fork in the road, right — you walk
18:35 out in the woods and suddenly the road splits in three ways and you have
18:40 to choose which way to go. that's kind of a switch statement in a nutshell, right. so it can be really useful
18:47 for a lot of things — it's a really ergonomic construct, and many programming
18:51 languages offer them, some with discriminated unions and stuff like that, which is kind of neat. ours in this case is really basic, with just
19:00 constants in c. what's neat about it, though, is that it is a high-level construct, and the compiler will do really clever stuff —
19:07 like: if you have maybe two or three cases,
19:14 and the code is kind of small-ish for each of these cases, it's
19:18 probably just gonna do comparisons — equivalent to if you used if, else if,
19:22 else if. but if you have a lot of case statements, or a lot of code in these case statements, it might — and this
19:30 is what it's done here for us using clang; it's produced this, because what it's
19:35 done now is actually implement a jump table for us, which is the second strategy we looked at. so what this means — the load
19:35 done now is actually implementing a jump table for us which is the second strategy we looked at so what this this means the load
19:42 effective address um this instructions this just goes and
19:47 grabs the it loads the offset for the jump
19:53 table that it made for us and it gave it gave it this like kind of if you look at
19:56 just this it's kind of fun it's like a face but it it produced this young
20:00 temple and gave it this kind of yankee name and we gotta find it at the end oh
20:05 i think i kind of accidentally stripped that out um
20:11 luckily i did save oh i did save
20:15 the unstripped version here okay so oh long okay not a quad
20:21 um so yeah here it had generated a table so this is a table it starts upside zero
20:26 uh um and there's gonna be a long it's gonna eight bytes on this so by eight
20:32 um and we can have 16 here right and so on and then these ones are uh are labels
20:39 and these here's constants that this is assembly again this is not machine code right so the assembler will
20:47 chunk these into the you know the the the read only it depends elf or mac or
20:52 whatever it'll chunk these things into just data uh read all the data into an
20:57 executable if you were to build an executable here
21:01 and these are all gonna do the same thing, essentially. so what it does — for this one, for example, which is
21:10 down here — this is an address of an instruction, and it does this fancy thing. lbb0_6 is an
21:17 auto-generated name for a label that is up here, and this is going to be —
21:25 see, we can just look at this — so lbb0_3 corresponds to
21:30 this section here, and zero-four to this, and zero-five, and so on, right. so, jumping back to the code, i'm
21:38 gonna stop here and we can look at the right side. my little
21:43 grep thing here happened to erase those constants, unfortunately, but everything else is there.
21:50 so it did generate a jump table for us — it's kind of interesting —
21:56 and gcc will do the same thing. you can use things like godbolt.org and stuff to explore these things if you don't want to do it like this in the terminal. okay, so at the end, what it
22:06 does here is just loading the address of
22:13 the jump table into register r8. and the next thing that it does —
22:17 so we can follow along up here, right — it increments the program counter,
22:24 i'm pretty sure that's what it does. and i'm pretty rusty at x86,
22:33 haven't written it in a long time, but let's just get past that — i might be lying, and i don't want to say something that i'm not confident about. the next thing it does is to —
22:45 oh shoot — oh yeah, it would be this thing, with this thing:
22:50 it would essentially load the operation —
22:55 that's what it's doing here, okay — and that's the operation.
23:02 and then it now has an offset, so that's why it's adding this. okay, "obviously" — well,
23:07 it wasn't obvious to me for a second, so i shouldn't say that. so onto the jump table offset it added the value in register c here,
23:19 or cx, which is where we have loaded the current opcode.
23:24 all right, so up here we've got the op, and it
23:29 gets loaded into register cx, and then it adds that onto the offset of the table, and now you've got the address into the table — i think that's what's going on.
23:39 and then it just does a jump. now, x86, you know, it's a legacy-laden instruction set — there's 10,000 pages of manuals and
23:49 stuff you can read about it — so some of these have little quirky things, like the q suffix: you have a different letter here, and it's just the size of the
24:00 argument. so that's the q that's jumping out at you. okay, so this jump now
24:06 will move there — if you imagine that the program runs down like this, right, it comes to this jump instruction. now, what it's going to do is not just go here — it might, but
24:16 it's probably not going to do that — it'll jump to whatever
24:21 executable address is loaded from register cx; the asterisk means
24:26 load the value from it. so that's the jump that we're
24:31 doing, right. so if we now found an operation called brz,
24:37 then what happens here is that we'll do a jump past all those things, down to brz —
24:42 so it would go down here, and then it might jump down here, right, if this is
24:49 brz — and now it will do a thing. so this is the actual code,
24:54 or an example of the actual code, that one of our virtual instructions would equate to, right — this is what the virtual
25:02 machine would translate to, on x86 in this case, or some other architecture like arm. and we can see it's very efficient: the only overhead here for having
25:15 an extra instruction operation is really this jump instruction, which
25:20 is essentially zero overhead in this case, which is very neat, right.
25:23 um, so it does the thing — in this
25:29 case we just fill these with some assignments to this
25:34 register set; we just do this to make sure that
25:37 the compiler doesn't eliminate this code, so we can look at it. so the
25:45 movl we see here, which is a move — like a copy,
25:49 copy this to that — that's essentially what's happening, right, we copied this value to this location.
25:56 and then it does what it does next, right — it's kind of interesting, so now —
26:03 if we look at this, yeah, so it's interesting: the
26:09 compiler has now generated roughly the same code twice,
26:14 but it's skipping this load effective address instruction; otherwise it does something very similar — we can see the addq here
26:23 being reflected here, right, the same thing, and it is
26:31 setting something to two — yeah, let's skip that — and then it's jumping again, and now
26:37 it's done another thing, it does another instruction, and then jumps back. and
26:42 essentially what all of this code does is the same stuff that we just walked through up here — it's just, again, the compiler has decided that it's got to be more efficient to duplicate this code, which i'm not sure is true, but that's what it decided to do. so it does the same thing: it increments
26:58 our program counter after it's loaded the next instruction, and then it does
27:04 a shift and an and to extract the value of the argument a —
27:10 again, you can look this up here if you're curious what that means —
27:14 and then it looks up the
27:18 operation of the instruction — so this macro corresponds to
27:23 this thing, right — and then it does a load from the table, and now we're at
27:28 this point here: it loads from the table, it adds the operation,
27:35 and then it jumps to the next label. so, real efficient. okay, so it's kind of cool — we've seen here that if you just write a switch statement, in many cases the
27:49 compiler will just be really smart — modern compilers are just, like,
27:55 you know, mind-blowingly good — or smart, maybe that's a stretch, but mind-blowingly good — at generating really efficient code. so it's
28:03 almost like: the higher-level the constructs you use, the better code you're gonna get. you know, it
28:11 could feel exciting or enticing to try to do what we're gonna
28:16 try to do next, right — to implement this ourselves — and we're going to see that
28:20 we're probably going to get worse code, or at best the same code. okay, so let's
28:26 switch to strategy 2. with strategy 2, i'm going to x this out —
28:36 just to show you that it works, right. okay. so, strategy two:
28:41 now the corresponding markers here are gonna change. instead of having this high-level switch, we're now going to explicitly
28:49 load the address of an instruction, and build a
28:55 table — we generate that with a preprocessor macro down here,
29:00 this thing. i mentioned this earlier: this is sort
29:03 of like a quick little way to just grab the effective address of a label.
29:09 this is something that the compiler will do for you. it's a bit tricky, since you have a chicken-and-egg problem, you know —
29:15 where's the address of it? you don't know until you actually assemble everything into machine code.
29:21 um, so that's how we can build a table here.
29:25 i'm not sure — i don't think this is c11, but i think it is some sort of extension that's been supported by clang and most big c compilers
29:34 for a long time — in gcc for sure; most of these weird things are
29:37 gcc inventions, so i'm guessing that's what it is. so we're building a little
29:44 table, and instead of doing a switch we use goto, to a specific address,
29:49 and then instead of doing a case,
29:55 we just enter a label. that's how, in c, you
30:00 tell the compiler that whatever instruction follows this label in the code that it generates gets this name, right, so that you can reference it in other places — gotos and stuff. and the rest of it is unchanged; there's nothing else that differs — it's only the way by which we dispatch the next —
30:24 you know — dispatch, or route rather, the work we're gonna do, right, since
30:28 we're doing different work for different instructions. so, we see that it works the same way when we run this. and if we run this
30:36 script now again — the one that compiles to assembly — and we inspect the assembly, we're
30:41 gonna see that we're actually getting worse code. and i think what is happening, actually — by pattern recognition — i think it is
30:48 doing some inlining, perhaps.
30:51 but it doesn't really matter what it's doing — we're just getting a lot more code.
30:59 and who knows, maybe this actually runs faster — this is why earlier i was saying we're not
31:03 concerned about performance — at least i'm not concerned about performance, maybe i never will be with rsm,
31:11 and definitely, absolutely not at this stage. like, this is not how you get
31:15 something to run really fast and efficiently — it's not by trying to
31:20 figure out which instructions it generates before you even put it to good
31:23 use. so that's not at all what we're doing here — this might very well be much faster
31:28 than the other thing; it doesn't really matter. what i wanted to show you here is
31:36 that there is an intimate connection between the way we structure programs, right,
31:43 and what code is actually generated from this high-level c, right.
31:49 so in this case we're like: well, we had to do extra work, right — we had to generate this table, we had to use this possibly non-standard extension to make this work — and what is the
32:03 upside? is the code better? no, it's just more code, right. it worked just as well, it's not like we can do anything new — so that doesn't seem to give us anything over a switch statement. now, with some compilers —
32:17 it's not true for the big modern ones anymore,
32:21 but for some compilers, one big switch statement is a challenge for
32:25 things like register allocation. like, you have a limited number of registers — on arm64 you have a lot
32:36 of them; on some systems, like 32-bit x86, you have just a couple, you know,
32:40 is it four or five or something like that. anyhow, most compilers,
32:44 what they will do is register allocation on a function-by-function basis. and so if you have a bunch of little small functions here and there,
32:54 with just a few locals, a few variables, right — and let's say that those are
33:01 the same or fewer in number than the number of registers — register allocation is pretty straightforward: you just hand them out, right.
33:10 a good register allocator will
33:15 be able to make really good use of that, even if you have more variables than registers. anyhow, that is a huge topic,
33:21 very interesting if you want to go read about register allocation. the gist of it is: with one big switch
33:27 statement you're going to have all of this code, and some compilers will struggle to do
33:33 efficient register allocation for that, and your compile times are higher, and your code quality is lower, and stuff like that. now — at least from my
33:41 explorations and my experience with clang — that's not true in this case, so
33:47 for us a switch statement so far seems like a really good option. okay. so, the third one is to use tail calls.
33:55 i already talked a little bit earlier about what tail calls are. i keep looking
34:00 at my other camera — that's if i'm looking out over here; yeah, i forget you guys over there. um — yeah, macros for the win, edward.
34:07 in an earlier part i briefly mentioned that i am not shy about
34:21 macros. i think when macros can improve the readability and understandability of code, it's fantastic, but it's a really thin line to balance there, between obfuscating something and
34:34 making something clear, right. so, like, next here as a macro — what does this
34:39 do, right? if you didn't write this code and you see next here, you have to go look that up to understand what this line of code does. so,
34:46 you know, there's definitely a cost to it, but sometimes it's pretty neat.
34:50 um — i think macros, in particular for stuff that we're doing right now, experimenting and figuring things out, can be a really helpful tool to just
34:59 get stuff done when you have to explore things, right. so,
35:04 if i weren't using macros here, i would have had to sprinkle these
35:10 preprocessor if statements throughout the code down here, and it would be much harder for us to read this and see
35:18 that these things are actually the same. anyhow — so, the third strategy here,
35:24 with tail calls now i think i already saved this so we already got a sneak
35:27 peek of the code it generates and surprise surprise it generates
35:34 surprisingly good code again now we're looking at assembly this is not actually the final machine code
35:41 there might be some final optimizations down here like pushq for example
35:45 um
35:50 let's see here if i remember correctly pushq will decrement the
35:55 stack pointer
36:02 by the operand size so for q that's eight bytes and then it takes the
36:08 value you give it and stores it into memory at the new
36:15 address held in the stack pointer register so this is essentially an x86
36:19 thing some other architectures have a push instruction too but really it's
36:24 like a bundle of operations it does a couple of things
36:29 but it does them more efficiently like this it's an actual instruction not an
36:37 assembly construct so it can be a little more efficient than the two or three instructions that would accomplish the same goal so that's something to notice here because each of these
36:48 functions so now instead of having switch cases we're dealing with functions one function per opcode
36:55 what's nice about this from an ergonomic perspective is that we now have these little
37:00 units these little functions that are easy to work with we can just kind of move them around right
37:06 just trying to get into the move state we can move them around the
37:11 order here is not important we can
37:19 sort of deal with these in the same ways that you would deal with any functions we can take the address of a function in a standard way
37:30 unlike the double ampersand for taking the address of a label which i keep assuming is non-standard if someone knows whether double ampersand to take the address of a label is standard or not drop that into the chat it would be interesting to know anyhow so we can do that in a standard way take the address of a function um and the tail call part here is important
37:55 so there's a couple of things that need to be true for a tail call to
38:00 be guaranteed in a compiler like clang or gcc the number of arguments and the types of the arguments between the thing doing a tail call
38:11 and the thing it is calling must be exactly the same um so for that reason
38:19 well this is not entirely true this is what's required for the guarantee for
38:24 things to be a tail call in practice the
38:29 optimizing part of these compilers will even if you give it different arguments often do a tail
38:35 call for you or optimize things into a tail call as far as it can
38:39 unless there's any undefined behavior or anything like that but to get a guarantee for a tail call
38:44 this has to be true the arguments have to be the same for the caller and the callee so for that reason i'm setting this
38:52 up so that they are and this is inspired by another project that
38:56 does this maybe or maybe it's just historical i can't remember but again the power of macros
39:03 here right so define the parameters once and that just saves me from typing
39:08 them out in every place where it says PARAMS so every handler function takes these
39:15 parameters and then when it calls the next handler function these are the arguments and i've
39:20 kind of typed those out next to each other so that if we add an
39:26 argument here or change a name it's much less likely that
39:30 i'll screw up and forget one of them the next
39:40 call here so up here in the other version NEXT did a goto up to this label right
39:44 which extracted the next instruction and increased the program
39:48 counter in this version sorry for the
39:51 scrolling in this version next is a function
39:56 call just like all the other things so it's all function calls and this function does the same thing it extracts the
40:05 next instruction from our list of instructions it increases the program counter
40:11 and then it just jumps into this table right
40:18 based on the opcode and this table has been generated down here this code looks almost identical to the other version
40:29 and this is the entry function right so up there we had a single
40:33 function with a loop in it down here instead we have a function that is the starting point and then it passes the torch on so to speak right
40:41 it's like okay now you go and now you go um
40:49 and so at this point we do eval next this code runs right we do what we just
40:55 talked about and it jumps to the next function so it jumps to one of these functions if we look at the
41:02 assembly that clang spit out we have
41:11 the entry function here so this just sets things up it does the same stuff we talked about before it loads the offset
41:19 of this table which we talked about and then it just does a jump
41:22 to eval next right so this is where we
41:26 begin our little journey and in this case i think it added it to the end it just
41:32 moves code around and tries to be clever about that and so eval next
41:38 will so here's where we start seeing some of the potential downsides in code quality
41:45 between tail calls and using a switch it for some reason pushes
41:54 the stack base pointer onto the stack
41:59 and then it copies the stack pointer to the base
42:04 pointer so essentially it does stack work but it doesn't actually use
42:07 the stack in this function right none of these
42:12 functions use the stack so this is just busy
42:19 work it's not accomplishing anything so the compiler kind of failed there unless i'm totally missing something but i don't think the stack is used here at all so anyhow it
42:28 takes the hit of doing that work it doesn't need to do and then it does the
42:33 work of loading the instruction increasing the program counter
42:37 loading the handler address into a register and
42:45 jumping through that register to whichever function is next right
42:50 so in our little example program let's see if we can scroll up to one of the printouts we first have a move and then after that we have a load and
43:01 then we have a brz and so on all right so the first thing it's going to do
43:06 is jump to here right so we'll start executing code here and here again we see it do work to
43:15 save the stack state essentially um and it does the balancing work
43:21 down here to restore the stack state even though it does not use the stack
43:25 so maybe there's a way to
43:31 tell the compiler um
43:38 not to bother with those stack things or rather to say hey this function is never used anywhere else than here but i
43:45 think static should accomplish that you know that it's local to this file this compilation unit and the rest of the code here is
43:55 pretty straightforward it's just what we're seeing here so it
43:59 gets the argument a and it loads another register and
44:04 the move thing is essentially just copying the value in one register
44:11 to another register and then it jumps and it keeps going
44:17 so this is a totally valid and viable solution for us this tail call
44:23 approach it will have roughly the same runtime characteristics and the same limitations
44:30 which are none really since it does not use the stack right it can just keep going
44:34 it won't grow the stack or anything like that even though it might appear to right
44:40 since this is doing a call and a call and a call um but unfortunately the ergonomics
44:48 here and the code repetition i think just kind of erase some of the usefulness of this at least with
44:55 such a good compiler as clang that is able to produce really good code with efficient register usage for the switch case um so i think the switch case it is
45:09 uh it is portable it seems to generate good code in this case and if it ever becomes a performance issue we can always do one of the other two approaches to
45:20 improve performance but again performance is not a
45:24 question right now okay so what i'm going to do is i'm going
45:31 to make a let's see i'm just going to make a copy of this i just want to keep this around
45:37 i just pasted it into a temporary buffer i
45:41 tend to keep little copies around of things i've written so i can go and reference them later um but now we can just remove all of this
45:52 so bye bye let me check in on the stream and see if it still works yeah it does amazing
46:01 and uh almost no one is watching that's amazing yeah i can see how this might be boring to most um okay so we can just drop this and just inline
46:21 the switch here uh next i'm shortening these
46:27 case labels so they're a bit simpler
46:33 um
46:51 and all of this goes away okay now it's 30 40-ish lines
46:57 that's pretty good i mean it doesn't really do much
47:03 um and another thing we can do here we can just drop this label and do a for
47:09 loop and before doing this let's just verify that
47:12 this doesn't have a weird impact on code quality though i don't
47:16 think it will let's do this
47:24 we should never be able to get outside of it that's guaranteed at this point uh
47:30 and we can return from our function and let's get these braces lined up
47:41 okay let's see if that works
47:45 yeah that's roughly the same code
47:49 uh let's see
47:58 ah interesting
48:22 yeah we definitely get more code in this case um and a different goto you
48:27 know so again i keep talking about perf but i think at this point
48:32 it's just important to keep mentioning that this is not about performance it's more about the quality of things it just feels good you know this hobby project is not for any commercial purpose right so the incentives here are not efficiency in terms of money or getting
48:49 shipping things or anything like that my incentives here are about making
48:54 something that is nice you know that is kind of elegant that feels good
48:59 um and so the driving motivator for me doing all this work right now is that i just want it to feel good you know
49:07 it's like if you make a nice wooden table or something like that
49:11 for yourself you might take the time you need to make sure that the wood
49:16 is right and that it's the right piece of wood and the way you finish the
49:21 surfaces all the little things that commercially might not make any sense to do you would still do for your own home you know the little
49:31 things that make things nice um
49:35 so i think it really doesn't matter which we pick here the label version is also a bit shorter than the for loop
50:04 but let's stick with this it's more familiar to someone else who might stumble upon this code yeah break is a familiar
50:20 thing a capital-letters NEXT macro is not okay i think we're gonna stop it
50:32 there um this has been pretty long so let me recap what we've done let me exit out of
50:41 this and run the program again so what we've done now is put together
50:46 the beginning the sapling of the
50:50 evaluator of the virtual machine it seems to be working we know that the code the compiler generates is
50:57 nice and small and we've done it in a standard
51:01 portable way so if you toss this into
51:07 another compiler we can be fairly confident that it'll work and behave the same way the code is very readable
51:15 very straightforward we might add to these locals that
51:21 we're using here but so far we're getting by with
51:25 fairly small and clear code and we just deal with a
51:29 single function um it's all good so in the next part
51:37 um we will be fleshing out the other instructions that we need to complete our example
51:43 function here and then we'll see it running so
51:47 here we're calling it so we'll see that function running and we'll hopefully see the correct result coming out at the end and then we will have
51:54 implemented and completed the first big milestone of this project which is
51:59 to have a full pipeline where we construct a program and evaluate the
52:04 program into the actual results so it'll
52:09 be a very small very limited but still complete and valid little
52:13 virtual machine so check out the next part and
52:18 i hope you enjoyed this one