"Transcript of RSM Episode 9"

RSM project page

00:04 okay so um i actually got a working bully
00:11 it's uh i'm kind of surprised myself uh exactly a week later so
00:16 a week ago i decided i would do this little project to build this little
00:20 uh sort of you know virtual computer virtual
00:24 machine thing right to come up with an instruction set implement a virtual machine and um and a little assembler for it and stuff like that and uh yeah i just finished
00:35 with the assembler today at least for for a simple sort of construction set um
00:41 let's see right now so this is uh let's have a look at it so i didn't get around to
00:47 making this program read a file from this but that's just a couple of lines i
00:52 just decided that um you know it's it's been sort of a long day i've i've done some i've had some
00:58 other stuff i had to do today and then spend a couple of hours and listen it got kind of late anyhow
01:02 um you can just imagine this is right from a file you know so from this point on um
01:10 the uh the assembler has a parser right and
01:17 then it has now a code generator uh i ended up with quite a lot of code
01:22 it's not i guess it's not too crazy for being sort of like a a parser
01:27 and um a compiler or like a full compiler but anyhow
01:31 you could use some nice little little tricks like here this is the this is the code generation for uh for
01:39 generating and up code so the uh let's just review the stages
01:45 first so there are three stages to uh the assembler the first one is so this is the entry
01:50 point the first stage is that we parse the
01:56 assembly kind of program like we parse this text right
02:01 and if any errors occur we kind of report that there's also a new reporting
02:06 structure it reports like a little a piece of information here to
02:09 to the caller that you can print and and has you know
02:13 source line information stuff that anyhow so the first stage is to is to
02:18 parse this and we looked at that in the previous parts um and the output from that is i just call
02:26 it a module right now let's see where's the parse function here um so it's called a module right now it's it's really is a list of all the functions that were present in the source text so
02:39 it just keeps like parsing functions until the end of the source text and that returns that list of functions
02:45 the next stage is analysis i stubbed this out i actually didn't end up doing it needing this instead i integrated that into the code generator since that needs to do some checks anyways and the code generator needs to be aware of things like for this operation what
03:04 exactly are the acceptable arguments right needs to it also needs to understand if uh here we have two move operations move
03:13 instructions right and the first one takes the form of two register operands
03:17 and the second one takes the form of a race dropper and an immediate operand
03:22 so you know it has to it has to have that sort of knowledge um
03:27 so why not just do the analysis in one place it's kept us around uh since i
03:31 might play around with this next week but we'll see
03:35 the final stage is cogeneration this is where i spend most of the time kind of
03:40 yesterday on today it's pretty straight forward i'm not going to go into much detail how it works today maybe i'll maybe i'll do the
03:47 next video um but for code generation we're
03:51 essentially first setting up some state that we just need for code generation
03:56 uh there's a all of these things isn't a struct at the top kind of has some
03:59 better documentation of these short names but essentially we allocate some space for the instructions that we'll be
04:06 producing and then some uh some state for the the function like a table of all
04:10 the functions that we've that we created and then we have two
04:15 arrays for tables for labels we have one for defined labels and we want four
04:19 labels yet to be defined or on the phone and
04:25 then it's pretty surprised from this point we use loop throughs we get the module here so we just loop through the module which contains functions one function
04:34 after another and then so we then we call the john fun
04:40 in function coding thingy let's can have a better name
04:45 um and this one doesn't really it doesn't do anything with parameters and results right now since there are no locals support but in the future you know you
04:53 would produce kind of locals and allocate resistors based on these things uh if the function doesn't have a body it just does nothing since you know a function without a body
05:04 in our virtual machine code would really
05:07 not mean anything it we would have to if we wanted support i would have to insert a return instruction otherwise you know you could just fall into the next function
05:18 um and the next thing it does is just to
05:24 save the the offset to the function and then it goes and then does the same thing we did here it goes through each
05:30 block of the function for each block where um first recording the uh
05:37 the the blocks position sort of logically speaking like the label will record at
05:44 um it's it's offset and that's how we can then later do things like referring
05:49 to stuff by name right so that if we look at this code again like we we see
05:54 here that we'll we're going to branch to the label end right so we can name
05:59 this label we can just call it anything um and then here we'll we have a another label right and another label and we name that here and so on so we can use names for labels so
06:10 labels has a certain offset into our like program
06:13 um and then what we do here i'm not going
06:17 to go into detail about this but essentially you'll have um you'll you'll have this sort of like a chicken and egg problem right with with
06:26 labels with this kind of assembly so at this point right um we define
06:32 label b1 and down here we're referencing it so straightforward right
06:37 we define it and then later we reference it we just know that that's at instruction four i guess right however it gets tricky when you
06:47 reference the label before it's been defined right so we reference the end
06:51 label at this point we haven't gotten the heat yet right so we don't really know what the offset is so what we have to do is to stuff that
07:00 into a table and later when we define this we'll go back and we patch this
07:07 this up with an actual distance or or value for the label so that's what this thing does or part
07:13 of it this does the the latter part and then you know again we just go
07:17 through each of the instructions of the uh of the block and remember from our
07:21 little uh syntax which is compounded readme
07:26 file here a block contains a a list of statements and a
07:34 statement it's either a operation assignment or a binary op and this i'm
07:39 not going to talk about this now but the syntax supports this the uh there's then
07:45 a an ast transformer that just folds this into an operation um and so that the code yet never
07:51 actually sees this so the coding just has to deal with these two things
07:56 and so it does that it switches on the type and if it if it gets something else from the asd it'll be you know it'll be complaining pointing out on the source code for you
08:07 if it gets an assign uh an assignment send assignment there's really just you know something that looks like
08:15 x equals y right or it might actually we have two examples here so it really looks like this a equals an operation
08:20 and some arguments possibly and the other form is just like
08:26 a equals b and the requirements here are a little different um and so this what this function does really it just transforms the asd
08:36 from a equals up bc to up abc and in this
08:42 case it just um transforms it to move operations that's
08:46 essentially what this is and then just calls our gen up function
08:50 so really there's just there there really is one
08:54 function in a way there's really just one substantial function to the code
08:58 generation which is kind of elegant um so there's not a code path to here
09:02 right so if you get an operation straight out um and this is the meat of the
09:10 generator and what it does here is that you know it gets the this node represents the uh the operation that it's gonna generate an
09:17 instruction for it looks the first thing we need to do
09:22 is to look at the instruction encoding for that operation and so we've defined
09:27 that in our in our isa little instruction um definition here right so we have a
09:34 name right a name that we used internally in the in the you know um in our c code here
09:42 then we have an arguments that is sort of like the uh the instruction encoding
09:46 or the argument encoding for that particular operation
09:51 um and down we have just an the name that's actually used in the uh right now they i think that they are
10:01 just exactly matching the lowercase but you know that allows us to have like
10:05 weird things like you know if we wanted to have parentheses around it or i don't know like um something that's not a valid you
10:12 know um c identifier
10:16 and then there's just comments about the semantics for it right so anyhow this is the key here so what we need to do is to map
10:27 the operation type to its its arguments right in the in the code generator
10:33 and make a change here oh i did so let's just undo that
10:38 so do that in uh uh with just a switch
10:42 so you generate a switch that has a case for each operation right
10:47 and then you just issue a go to so this is
10:51 essentially just a table this is what the compiler is going to enter it for it's like an interaction table here that maps one or more sort of
11:07 argument encodings there is labels and so i guess it would be nice to have like
11:14 a little drawing but you could you could imagine sort of like a freeway maybe
11:18 like a road where you drive with a car like with a lot of lanes and then suddenly it sort of you know gets
11:24 narrower um and you have the shoes if you're gonna go south and north or whatever right that's essentially what's
11:29 happened happens here we go from a lot of things just use the things that matter in this case this is the argument
11:35 encoding so we jump from here down to one of
11:39 these this is just a preprocessor macro to help the code not
11:45 be too messy here uh and we have two formats
11:50 we have a format where we know that the um so again these are these are perfectly correlated right so we have a format here that is for
12:00 example a b we know that the b is names a register and nothing
12:04 else if it if it tries to have a number like 1 2 3 in the assembly it will be an error and then we have a
12:13 either then we have sort of like a and
12:20 a variable sort of type of argument so a here either names a register or it is
12:26 an unsigned immediate value in the in the ace argument and this is roughly the same it's either
12:34 names the register or it is a signed immediate value in the a argument and
12:39 then we repeat the same things for argument b c and d
12:48 now we start seeing some of these kind of error messages and where the you know
12:53 where the enough inline analysis goes here like we're checking and making sure
12:56 that we're getting the correct number of arguments and we're pointing things out
13:02 and stopping things if that's not correct
13:08 excuse me um for example we even do things like
13:13 if you try to give something that only accepts a register if you try to give
13:18 them immediate we'll point that out that will just jump down here from this
13:23 thing up here so jump down here to print out this error message says you know the last argument argument must be a register uh and we named the the operation to make that easy for you to to spot the error in the code
13:35 um and then the other piece so this function kind of broken up into two functions uh since since this is called in a bunch
13:44 of places here it's separate shankar code so get get i arcs is like get into your
13:50 arguments from the ast note and sort of you know
13:55 put those into um uh in 2ds oh what did it make it mistake
14:01 oh yeah it should be i64 let's make sure it's still no it doesn't
14:07 work at all dude okay
14:23 yeah so our register size is a is an i-64 should
14:28 probably make that a typed up somewhere so that can be changed easily but that's
14:32 how it is now so this thing called so this this sets up there up to four arguments right a b
14:38 c and d uh and so what this function does is
14:42 that it uh it takes the information from this right about um
14:49 you know whether we want a uh let's see here uh the the number of arguments the uh
14:55 the storage for the arguments to be parsed so to speak or checked and then the uh the ast node
15:02 and this is just some some um it's in generator state we're passing
15:05 around and this function up here it's it's it looks really messy but it's actually pretty simple so what it does is that it um
15:16 it first it first loads the the value so that we
15:23 have up here where is it this mostly as a check for you know producing error
15:27 messages so this just loads the register number
15:33 so you know r5 or r0 or whatever right so like this one so r1
15:38 so we know that for for any instruction encoding
15:44 that uh for the for the first arguments of all of these instruction encodings they're all named or this search they cannot name immediate immediates can only be the last um argument right so what we're doing is that we are just
15:57 checking making sure that these are populating their events for
16:03 all of the first um the head of the arguments what is the correct name all but the last okay all
16:08 but the last um and you know if we had any any issues
16:14 then we jumped down to see if we got the the wrong sort of um
16:18 so number of arguments we jump down here
16:23 and we print out some some error message or print out we report it now
16:27 and then we print it out in our reporter function and and you know if we got less than we wanted we'll say not enough if we got
16:36 more than we wanted we'll say that there were too many and we point out where it is and we point out what we got and what
16:43 we expected and which output is so we can just try it out real quick um
16:48 so x whatever uh so now we're gonna get here
16:54 uh in our factorial function line six column eight too many arguments
16:59 for the multiplication one three but got four right
17:03 so to get it too few it'll say not enough arguments you know it wants three
17:11 and and then this thing finally what this
17:19 function does is the last argument which could be an immediate um so now again we're going from we're not
17:27 going to the virtual machine sort of uh instructions we're going from ast to
17:31 those destruction straight so remember that so now we're looking at is we're looking at the lost arguments we've winded this
17:39 up so arg here is the last argument and we're looking at the the type of that and actually let me
17:51 we do a bookmark here let's clear my bookmarks go up enable
17:57 ast logging okay so now we can look at the the asd
18:02 that we're parsing here so so it's going to look here if it gets a
18:08 integer register like one of these and then it it loads this and it reports
18:13 back to the caller that the last argument is not an immediate so
18:18 the last argument is the register if he gets a name
18:23 it it does the labels that were talked about before it goes and finds the label
18:27 that is it is named and uh if that label
18:32 let's not go into the details about the label the referencing because you know it involves that sort of chicken in the egg thing but um
18:40 but essentially what we get back here is this just looks up the label essentially and gets back the the relative offset so
18:47 it's a signed number relative offset to um
18:54 and if we get and then reports back that it's an immediate sorry um if we get like a literal like here for
19:03 example we get the inter integer a decimal literal one
19:07 then um we make sure that it fits that it
19:12 wouldn't overflow the value right so if we go in here and if we you know enter a really large
19:18 number here we should get an error message this value is too large right so that doesn't
19:23 fit in as an immediate we would have to use
19:28 a constant flap and again we do the same thing we just store the arguments and now um report
19:34 back to the color that was an immediate and for all other things that are unexpected you know then we'll just report that um
19:42 we we got something unexpected there we point that out and so on
19:47 and something that's kind of neat about like this and the partial looked at like
19:51 with the with the if you remember the the nil node that we used instead of
19:56 like just uh you know a null pointer uh we use a similar technique here for
20:01 like for error handling um generally i think that it's it's better
20:06 to write parsers and sort of compilers or whatever you call this coding or anything so that when an error occurs it can keep
20:13 going at some sort of sensible state that won't break the program it might not make any sense to use the results from
20:21 it but the program can keep going without sort of being in an undefined or weird state
20:27 so that's you know as we're parsing things and let's see what is it called
20:31 nail something big nail right so
20:35 when we um when an error occurs right instead of
20:38 just like not doing anything or returning null here and sort of having
20:42 unknown what's gonna happen with the caller now we create this nil node and
20:46 we just return that and then we report the error and similarly
20:56 oh my label stuck good uh and similarly what we do here is uh we us we assign a
21:01 value to the argument that we're parsing along with reporting the error
21:05 so that we can keep going and we can keep getting messed like error messages
21:11 and since last time we looked at this we used we as using the log function to just print out errors as they were happening now today i changed that so
21:20 as an error occurs is that invoke this callback that you pass that you pass on
21:26 to the assembler so you pass on this little callback you can also pass on a
21:30 some arbitrary like data to yourself that has looped through which is pretty common so user data whatever
21:35 and so this is being evoked then if we return false here the whole thing is
21:39 going to stop um parsing at that place instead of keep
21:44 going so that way you can control how many error messages you want if you want to like limit error messages and stuff like that so if we have
21:51 see if we have two errors here and we run this we gonna get two reports straight too many arguments
21:58 too many arguments and then let's say that we only we want to stop as soon as
22:03 we get any error we can just return false and then the parser will stop
22:13 and uh yeah i think that is that that's a pretty quick sort of summary on on how the code generation works um and
22:23 changes since since yesterday um maybe i'll do a more deep dive into
22:29 uh what um sort of uh
22:34 to to build it this way i don't know and add the ability to use like give a file
22:44 to this program that we're running here right so let's look at that so we have this
22:56 program that we'll be building right which is 403k with all these with all
23:01 the instrumentation let's just build a non-debug bill and we can look at that
23:13 so 73k 67k the entire compiler and virtual
23:24 machine including our sort of diagnostic
23:28 messages i think that's that's uh that's pretty well done and we can run it of course and you know we'll we'll uh get our little a little
23:36 output anyhow so the next step here will be to you know say something like this factorial right you pass it a function
23:43 or even or even like echo
23:51 program you know or or whatever that program i just have that run so that's
23:54 going to be the next step that won't take long that they'll probably only take a few minutes to implement um but still i'm i'm pretty happy we're just turn out um i'm pretty impressed
24:06 again as i was saying earlier in this video that i actually finished this in in the week
24:12 um and reflecting a little bit on the the whole like video stuff i spend i spend more time
24:21 i think maybe at this point it's it's even out but definitely the first few days i'd spent way more time on the whole like video stuff than getting that working and working around all the quirks and stuff around that than actually working on the project uh
24:35 so to probably have taken sort of a backseat at least time spent
24:38 and i think at this point it's probably about a 50 50 the time has been done uh
24:44 on all video production stuff which is like surprisingly complicated in the year 2022 but that's how it is uh
24:52 and and this product so yeah it's very satisfying uh very satisfying little
24:55 thing and we'll see um where i'll take it from
24:58 here thanks for watching and i hope to see you soon