Comparing C to machine language


So I have a program here that prints out Fibonacci numbers, and I want to walk through how we get this program compiled to machine code and running on the computer. But first, just to see what the output looks like: when we run it, it prints out Fibonacci numbers up to 255. Actually, 233 is the largest Fibonacci number that’s less than 255. And then it just starts over, and it does this forever and ever.

Just as a reminder, Fibonacci numbers start with 0 and 1, and then the next number is just the sum of the last two numbers. So 0+1 is 1, 1+1 is 2, 1+2 is 3, 2+3 is 5, 3+5 is 8, and so on and so forth all the way down.

So just to walk through this program to make sure we understand how it works: we’ve got three variables, x, y and z, and we have this loop that just continues forever and ever. Inside that we set x to 0 and y to 1. So x starts out as 0, y starts out as 1. And then we have this loop here where we print out the value of x. x is 0, so we print that out. Then we calculate z as being x+y, so 0+1 is 1. And then what we do is basically shift all these numbers over. x gets set to y, so x becomes 1. y gets set to z, so y becomes 1 as well, even though it was already 1.

And then we loop through again, because x is less than 255 (it’s only 1). We print out x, which is 1, and then we compute z again. z is x+y, so 1+1 is 2. And then we shift things over: x is set to y, so x is 1, and then y is set to z. z is 2, so now y becomes 2. And then we loop through again, because x is less than 255. Here we print out 1. We compute z, which is now 1+2, so 3. And then we’ve got to shift things over again, so the 2 shifts over here, the 3 shifts over here: x equals y, y equals z… and then we loop through again. And here we print out the 2, we compute z as x plus y, so 2+3 is 5… and so on and so forth.
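The source file itself is not reproduced in this transcript, so the following is only a reconstruction from the description above: three variables, an inner loop that prints x and then shifts the values over, repeated while x is less than 255. The helper name `fib_pass` and its return value are made up for illustration.

```c
#include <stdio.h>

/* One pass of the inner loop as described above: prints Fibonacci numbers
 * starting from 0 until x reaches 255 or more.
 * Returns how many values were printed (added here for illustration only). */
int fib_pass(void)
{
    int x = 0, y = 1, z;
    int printed = 0;

    do {
        printf("%d\n", x);
        z = x + y;   /* next Fibonacci number */
        x = y;       /* shift the values over */
        y = z;
        printed++;
    } while (x < 255);

    return printed;
}
```

The program in the video wraps this in an outer loop that never ends, along the lines of `while (1) fib_pass();` inside `main`, which is why the sequence restarts after 233.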
And you can see as we do this, we’re printing out 0, 1, 1, 2 and so forth, and it keeps going like that. So that’s how this program works to print out Fibonacci numbers.

But what happens when we compile this? We compile it by running… this is the GNU C Compiler, and I’m saying the output file is just going to be a file called fib. That’ll be the executable that we run, and then the input is fib.c (which is this code here that we just looked at). So if we compile it then we can run it, but what I did here is actually disassemble it. This command is just on my macbook. I don’t know if this will work on other computers, but it worked on my macbook… and it prints out the machine language instructions that were compiled. So we’re looking at the compiled version of this program that we would run.

I want to walk through these instructions, just to see if we can figure out which of these instructions correlate to what’s going on in the original C program. If we start here, the first couple of things are just setting things up. Everything up here isn’t really part of what I wrote. It’s just setting up the stack frame (I think this is a return value or something like that), which we aren’t really doing anything with. Let’s just ignore those for now.

But here’s where we actually get into the code that we wrote: this first line here, this “Move long”, is moving this value 0 into this thing, which is actually an address offset. %rbp is the stack base pointer, and this -8 is just an offset. So this is referring to a location in memory, and we’re putting a 0 into this specific location in memory. Which is exactly what we’re doing here, we’re saying x=0. And so what we can see is that x is actually this location in memory, this -0x8 offset from the stack base pointer.
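To make the idea of "an offset from the stack base pointer" a little more concrete, here is a rough C sketch that treats a byte array as a pretend stack frame. It is only a model, not how the compiler is implemented. The names `store_long`, `load_long` and `demo_frame_init` are hypothetical, and the 0xc offset for y is taken from later in the walkthrough.

```c
#include <stdint.h>
#include <string.h>

/* Offsets below the base pointer, as read off the disassembly in the video. */
enum { X_OFFSET = 0x8, Y_OFFSET = 0xc };

/* Model of "move long to -offset(%rbp)": store a 4-byte value
 * `offset` bytes below the base pointer. */
static void store_long(uint8_t *rbp, int offset, int32_t value)
{
    memcpy(rbp - offset, &value, sizeof value);
}

/* Read a 4-byte value back from `offset` bytes below the base pointer. */
static int32_t load_long(const uint8_t *rbp, int offset)
{
    int32_t value;
    memcpy(&value, rbp - offset, sizeof value);
    return value;
}

/* Performs x = 0 and y = 1 on a pretend stack frame and returns x + 10 * y,
 * so a caller can check that both stores landed in separate slots. */
int demo_frame_init(void)
{
    uint8_t frame[32];
    uint8_t *rbp = frame + sizeof frame;  /* locals live below this address */

    memset(frame, 0xAA, sizeof frame);    /* junk, so the stores are visible */

    store_long(rbp, X_OFFSET, 0);         /* x = 0: write 0 at rbp - 0x8 */
    store_long(rbp, Y_OFFSET, 1);         /* y = 1: write 1 at rbp - 0xc */

    return load_long(rbp, X_OFFSET) + 10 * load_long(rbp, Y_OFFSET);
}
```

The two offsets are 4 bytes apart, which matches the 4-byte "long" moves: each variable gets its own 4-byte slot in the frame.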
So over here, I’ll just write down that 0x8 is the variable x, so when we see it elsewhere in the program, we’ll know that’s x. This line here is basically saying: x=0. We’re putting 0 into 0x8, which is the memory location where x is. The next line is basically the same thing, except now we’re putting 1 into this 0xc location. So here we’re actually saying y=1, and 0xc refers to the variable y in the program over here.

Now we enter this loop here. The next couple of lines have to do with the “printf”, so I’ll just call out these four lines here that have to do with the printf, and we’re printing the value of x. Basically what these four lines are doing is setting up all the things, and then calling this “printf”. I guess this is a memory address that’s somewhere else in the program, it’s not listed here. But this is presumably the “printf” function that’s provided by the C standard library, and in order to call it, we have to set some things up. I think this is probably the address of the “%d\n” string, and then, of course, 0x8. We recognize that, that’s x. So we’re printing x. And then… I’m not sure what this other thing is, and then the “call” actually makes the call to printf. So this sort of corresponds to that “printf”.

After the printf we have this z=x+y, and these three lines here are the z=x+y. The way this works is, we’re moving 0x8 (which is x) into this %esi register, so we’re saying x goes into %esi. And then we’re saying “add the value at 0xc”, so “add y to the value of %esi”. (And you probably can’t read my handwriting here, it gets a little messy…) But basically what we’re saying is, x goes into the %esi register, and then we add the value of y (because 0xc is the memory location of y) to the %esi register. And then we take the value of the %esi register and put it into this 0x10.
And 0x10 is a new memory location we haven’t seen yet. So 0x10, that’s actually the memory location of z, and we just put the result into z. So these three lines are basically doing the z=x+y: we’re getting x and putting it into %esi, we’re adding y to it, and then we’re putting the sum into z.

Moving on, the next two lines here are doing the x=y. And again, we’re using this %esi register as kind of a temporary location. We’re taking 0xc (which is y), loading it into %esi, and then we’re taking what’s in %esi and putting it into 0x8. So we’re taking 0xc, which is y, putting it into %esi, and then we’re taking %esi and putting it into x, so from y to x, or in other words: x=y. We’re setting x equal to y. And then the next two lines are basically the same thing, except now we’re going from 0x10 into 0xc, so we’re going from z into y. Or the way we write it here is y=z.

This next line, I’m actually not sure what this line does. This is the %eax register being moved into this other memory location that we are not using for anything else, and we don’t have any other variables defined here… I’m honestly not sure what this line is doing. So if you guys know, point it out in the comments.

But after this line we do this comparison, so this is the “compare long value”, and we’re comparing 0xff, which is the hexadecimal representation of 255, to what’s in the memory location 0x8, which is x. So we’re comparing 255 and x. And we’re saying “jump if less than”.
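The z=x+y, x=y and y=z steps described above can be sketched in C with an ordinary variable standing in for the %esi register. This is only an illustration of the load/modify/store pattern, not compiler output. The function names are hypothetical, and using the same temporary for the y=z copy is an assumption based on "basically the same thing" in the walkthrough. One reason a register shows up at all is that the x86 `mov` instruction has no form that copies directly from one memory location to another.

```c
/* One trip through the loop body, written the way the unoptimized
 * instructions do it: every value passes through a temporary "register". */
void loop_body_step(int *x, int *y, int *z)
{
    int esi;          /* stands in for the %esi register */

    esi = *x;         /* load x from memory               */
    esi += *y;        /* add y to the register            */
    *z = esi;         /* store the sum:        z = x + y  */

    esi = *y;         /* load y                           */
    *x = esi;         /* store into x:         x = y      */

    esi = *z;         /* load z                           */
    *y = esi;         /* store into y:         y = z      */
}

/* Runs the body n times starting from x = 0, y = 1 and returns x. */
int x_after_steps(int n)
{
    int x = 0, y = 1, z = 0;

    for (int i = 0; i < n; i++)
        loop_body_step(&x, &y, &z);

    return x;
}
```

Each C assignment in the original program turns into two or three of these small steps, which is roughly the ratio visible in the disassembly.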
So “jl” is the jump if less than (based on this comparison), and the “jump if less than” is taking us to this address f3d. And f3d is up here, the first statement here that was part of our printf. So if x is less than 255, we are going back up to the “printf”, which is what we’re saying here: “while (x is less than 255)” we stay in this loop… If it’s not less than 255, then the program flow just comes down to the next line here, which is just a jump, and this is an unconditional jump. We would always jump to f2f. And f2f is back here, where we start with the x=0. So once this inner loop is finished we fall down here, and we are back into this main loop that just goes on forever and ever.

So you can see how, when we compile the C program, we end up with this machine code. And so next I want to take the machine code, and see how we can take machine code similar to this and load it into our home-built breadboard computer.
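The two jumps can be mirrored in C with `goto`, using labels in place of the addresses f3d and f2f. This is a sketch under one stated assumption: the `passes` parameter is a hypothetical limit added so the example terminates, whereas the real program jumps back unconditionally forever. The function name is also made up.

```c
#include <stdio.h>

/* Same control flow as the disassembly, with labels in place of addresses.
 * `passes` is a hypothetical limit so the sketch terminates; the real
 * program has no such limit. Returns the number of values printed. */
int fib_with_jumps(int passes)
{
    int x, y, z;
    int printed = 0;

    if (passes <= 0)
        return 0;

outer:                      /* like address f2f: target of the plain jump */
    x = 0;
    y = 1;

inner:                      /* like address f3d: target of "jl" */
    printf("%d\n", x);
    printed++;
    z = x + y;
    x = y;
    y = z;

    if (x < 255)            /* compare x with 0xff, jump if less than */
        goto inner;

    if (--passes > 0)       /* stand-in for the unconditional jump */
        goto outer;

    return printed;
}
```

Calling `fib_with_jumps(2)` prints the sequence 0 through 233 twice, which is the "starts over" behaviour seen when the program runs.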

100 Replies to “Comparing C to machine language”

  1. I believe at 5:53 you are moving a Hex 0 ( 0x0 ) into register "A" low ( which is 8 bits ). There's also a Register "A" high ( Also 8 bits ), and Register "AX" ( which is 16 Bits ) and which contains both of the "AL" ( "A" low ) and "AH" ( "A" high ) Registers.

    http://www.eecg.toronto.edu/~amza/www.mindsec.com/files/x86regs.html

  2. As a web dev, watching this makes me feel like I just swallowed the red pill and saw the real world for the first time.

  3. That was some mac user level of understanding and explanation of assembler code. I am not sure, I don't know etc etc.

  4. Why does it use the esi register as an intermediate instead of directly modifying the memory address? (I'm kinda new to assembly)

  5. rax is a 64-bit register
    eax is a 32-bit register which refers to the lower 32-bits of rax
    ax is a 16-bit register which refers to the lower 16-bits of eax
    ah is an 8-bit register which refers to the upper 8-bits of ax
    al is an 8-bit register which refers to the lower 8-bits of ax

    gcc -S -masm=intel program.c

    ATT syntax is ok, but I prefer Intel personally… you’re welcome and thanks for the good video!
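The register naming in the comment above can be illustrated with plain masks and shifts on a 64-bit value. This is only a sketch of which bits each name covers, and the function names are hypothetical. It does not touch real registers.

```c
#include <stdint.h>

/* Which bits of a 64-bit rax value each narrower register name refers to. */
uint32_t reg_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }  /* low 32 bits     */
uint16_t reg_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }      /* low 16 bits     */
uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }  /* bits 8..15 (ax) */
uint8_t  reg_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }         /* bits 0..7  (ax) */
```

For example, with rax holding 0x1122334455667788, eax is 0x55667788, ax is 0x7788, ah is 0x77 and al is 0x88.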

  6. I love your explanation, especially because it is being made on paper.
    At 5:58 you show the printf call. The values in this section are usually parameters for system calls. Some system calls require you to push values to eax, ebx, esi, edi and so on.

    About the zero, I couldn't find why it is needed (but it is, missing it results in a crash)

    https://www.cs.uaf.edu/courses/cs301/2014-fall/notes/printf/

  7. I also check the assembly output of my c++ program to check the efficiency of the compiled code. Also sometimes I'm curious to see how will the compiler optimize polymorphed codes (vtables, etc)

  8. That's not machine code it's assembly language. Machine code is the hexadecimal or octal output from compiled assembler or manually written.. Just saying

  9. Why did this show up? I remember like 9 years ago I was learning C just for fun. Just for fun. I never got past pointers, I did learn it though. Now I forgot everything almost. This brings back memories, thanks😇

  10. Lol when the title said "machine language" I was waiting for actual binary/hexadecimal operations. I already know assembler.

  11. Thanks for pulling back the curtain on the black box that is machine code.

    So basically, magic.

    The computer runs on magic.

  12. Wow your way of explaining this is really straightforward and amazing! Thank you for this, you've just earned yourself a subscriber sir.

  13. not sure exactly which compiler and cpu, but last eax is preparing the z-flag for more efficient conditional operator

  14. Meh… "probably"… "looks like"…. "not sure what this other thing is"…. Sorry dude – if you don't know… research it before doing a tutorial…

  15. I'm not familiar with C. What does the condition while (1) mean? The 1 stands for True, therefore it will never leave the loop?

  16. Note this would compile a lot better without the printf, a function call that can't be inlined costs a ton in cases like this with having to store everything to memory. If you want to see things like this https://godbolt.org/ is an online compiler that will show you disassembly and can even run it. Will also show you what lines correspond to what assembly instructions.

  17. =I HATE ALL HIGH LEVEL PROGRAMMING LANGUAGES BECAUSE OF I AM ABOUT BASICS

    =SO IF I'D LINK MY LIFE WITH PROGRAMMING,I WOULD LEARN ONLY MACHINE CODE LANGUAGE

  18. Rather than use Z, why not just do "x = x + y". That's how I always did it, from applesoft basic to c/c++/ c#.

  19. Have you worked out what's wrong with your anecdotal guesswork, yet? Please tell me the machine code instruction equivalent for `#ifdef`…

  20. When you use standard C library functions your program refers to libc installed somewhere on your computer. At 5:31 the address 0x100000F78 should be the printf address, but notice that it starts with 1 – that is where your libc should be located: from 0x100000000 to some other point in memory where it ends (it probably depends on headers you included, import tables exist for a reason). So when you see these addresses this is most likely your libc or any other library your program uses.

  21. About the printf call, the address 0x100000f78 is an embedded function. This address is a few bytes after the last instruction on this page. Regarding "leaq 0x56(%rip), %rdi" presumably it loads the effective address (LEA) of the string "%d\n". When this instruction is executed, the instruction pointer (IP) is 0x100000f3d, so the address of the string is that plus 0x56 which is 0x100000f93. That's after the assembly code of the printf function. Finally the register al set to zero is probably an option, maybe the addressing mode of the format variables.

  22. Hello! A bit off topic, but I'm wondering why you use two while loops?

    Wouldn't something similar to the code below work as well?

    (JavaScript)

    var x = 0;
    var y = 1;
    var z;

    while (x < 255) {

    console.log(x);

    z = x + y;
    x = y;
    y = z;

    }

  23. I know I am late to the party, but you should go the opposite way. I am taking very old dos programs, disassembling them and turning them in to c code by hand and then putting a modern GUI face on the program while keeping the original code to run on modern operating systems.

  24. As far as I know, the "move" instructions are not "mov1" but "movl" – Move Long – where long means 4 bytes.

  25. Coming back to this after learning a bit about assembly is really interesting. I now know what this initialization stuff is doing for example.
    Still a fantastic video.

  26. "High Level" languages are more human, highly abstracted. A single statement might invoke 10,000 or more machine instructions; but always the same way. It's like a pre-fabricated house; easy to make a house but you only get your choice of three kitchens and two bedrooms. Most of the programming is in the library which you did not make, have no control over and probably has bugs. "C" is a "Low level" language where each line of "C" often directly maps to a single CPU instruction. It comes with some modest libraries such as LIBC; the standard C library and handles the grunt work of things like "printf". Programming in "C" is like building a house from scratch; sticks and bricks and plumbing; you can do some tricky things not that it is a good idea to do so. Make anything you want but it is a lot of work. Assembly takes that one step lower; it is directly mapping the CPU instructions and uses no libraries at all unless you wish to. For extreme programming accuracy this is the way to go but it is a LOT of work and typically used only for things like interrupt service routines that must be fast, compact, sometimes using "floatable code" with no absolute address jumps (all jumps relative; can be loaded anywhere in memory and work the same). Assembly is also good for self-modifying code but allowing that at all is a significant security risk (viruses work by modifying code in memory and then executing it).

  27. By the way, Assembly is not machine language. In many cases, a single assembly command is multiple machine language's commands.
    And each CPU's and GPU's language, depending on the model and the manufacturer, is different from each other.

    That's not just a note for the title of the video, but for those in comments who say "Assembly is machine language". No, it's not. It's just the closest thing we have to it.

  28. Just a little remark for people wondering why the code generated by the compiler contains strange and useless constructs. It is simply because the code was generated with the -O0 parameter which means no optimization whatsoever. This means that the compiler basically does a nearly 1 to 1 translation of the C code to the assembly, without considering if the operations are redundant, unused or stupid.
    It is only when optimization is enabled that the compiler will generate better code.
    In this example, it is stupid to read & write x, y, z continuously from memory. An optimizing compiler will assign registers in the inner loop and will never write their values to memory. The spilling of the printf return value 'movq eax, 0x14(bp)' will of course not be emitted.

  29. I am still confused. I thought z = x + y represents 2 operations. How should I count the operations in this line?

  30. Most of general-purpose programming is split between database operation, and User Interface. So yes, in the limited scope of the minuscule piece of code you run through, C is THE programming language of all programming languages, hands down. No question about it. However, in the day-to-day grind & "dumb" programming, high-level languages make more sense. That's both the grandeur, and the misery, of coding.

  31. I loved learning Assembler code. Felt like there was nothing i couldn't do. I created a mouse pointer shaped like a hand and finger when windows 3.1 still had an arrow pointer. Wonder if i could have copyrighted it??? 🙂

  32. Just 2 minor comments, Assembly language is NOT Machine Code; and it would have been nice if you explained the long storage class and why the index in memory were those in particular.

  33. Occasionally this C code is translated directly line by line to machine code. In the real world gcc optimizes the code, so it can look completely different, especially with the -O2 and -O3 compile options.

  34. lot of wrong information
    are you serious that you have good background in c and asm?
    then, what is the purpose of the video?

  35. C SUCKS #LUA GANGG

    fib = {0, 1}

    for i,v in ipairs(fib) do

    table.insert(fib,(fib[i] + fib[i + 1]))

    print(fib[i])

    if fib[i] >= math.huge then

    break

    end

    end

    --this message was paid for by the Lua users organization

  36. Why do assignment operations take two statements and use a temporary register instead of directly copying from one address to another?

  37. Watching this I realize that C is not too far off from BASIC, which I know very well having learned it back in the eighties.

  38. Excellent ❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️

  39. In only 10 minutes, you made me want to learn assembly language. It looks so simple when it's explained so well. You did a great job, Ben Eater.

  40. Memory stack buffer of data structures of floating algorithm inside the ram cache. 10 base, octal base and hexadecimal base 10 base is binary codes Function machine. In pre Calculus there 4 kind of function machine and one kind in here is the application. Volatile floating number calculator.

  41. That is not C vs machine code. That's just C vs compiled C. Using real machine code (or assembly) language, program could become MUCH simpler and faster.

  42. Ben, the "movb $0x0, %al" is assuming zero for the printf return value. As characters are printed, this value will increase. I believe that the "movl %eax, -0x14(%rbp)" is assigning the return value for the "fib" main program in preparation for exiting back to the OS. I don't know why this is occurring inside the loop though.

  43. I don’t get, why somebody like you, of all people, would ever even touch anything Apple with a thousand foot pitchfork…
    You don’t strike me as the internally insecure ”bling“ poser type that usually buys such jewelry for non-computer users…

  44. I remember spending hours upon hours typing almost endless lines of hexadecimal code into the computer's RAM and then compiling it overnight and recording it onto DAT cassettes so I could play computer games. Intel 4004 processor, 4k of RAM, with a 12" amber CRT… Good times… Good times…

  45. Some people use to say that C is a low level language but no, assembly is ! I didn’t understand anything but I enjoyed the video anyway
