codebite _

Let's Make a CPU: Part 5 - Flip-Flops and Registers

Mon 22 of May, 2017

Previous part. If you missed the introductory post, head here: Part 0.


Last time we managed to finish the work horse of our processor, the ALU. Now we can crunch numbers and do all sorts of cool bitwise magic on them. But what good does that do for us if we can't store it?

So this time, we will create our working memory, the registers. You may have noticed the weird banner on top, well it should make more sense after we're done with the first part of this post. But really it's just for giggles.

"Okay then, how do we store, say, a single bit?" glad you asked! We'll use a latch also known as a flip-flop. (But they are a bit different, we'll get to it) Latch is a device that has 2 states, one of which represents 1/ON/TRUE bit and the other is 0/OFF/FALSE, and we can switch the state of the latch, hence, store information.

So let's build one. For that we need 2 NOR gates:

SR NOR Latch

It's called SR NOR Latch. It's a Latch, hence the Latch, it uses NOR gates, and the SR because it has Set and Reset pins.

As you can see, we actually feed the outputs into the inputs, which as you might have guessed can cause oscillation. This is why Latches start in unknown state, and need to reset (or preset) latches at least once. Logisim will give Error before we set any value into it. Also you might've noticed that ~Q output, it's complementary (opposite) of Q which is our actual output. It's just how these Latches are made in real world, and we're sticking to that.

Let's improve this design a bit. At the moment to store 1 we need to set the Set input. If we want to store 0 after that we need to Reset the Latch first. But Set and Reset are complementary, as in they are opposite. To store a 1 we need to send 2 bits 01 and to store 0 we need to send 10. No fun. We can use SR's complementary property to our advantage. But first, as always, the symbol:

SR NOR Latch Symbol

We can have single input D (stands for Data) and NOT it before it goes to Reset. Also, let's make this new latch a gated one. What's a gated latch? Well, at the moment any change on D pin would change the stored value. (this is why D latches are also called Delay latches, since they simply pass the same input, just delayed a bit) Okay, so what do we do? Well, how about an AND gate before Set and Reset and 1 pin on both of them is dedicated to let the input go through. Let's see it in action:

Gated D Latch

Nice, as you can see, I also added "asynchronous" Async Reset, what I mean by that, is that we can reset the latch even when Enable is 0. We do this so we can have a stable state latch whenever we want.

And the symbol for this Gated D Latch is:

Gated D Latch Symbol

So, remember I told you that latches and flip-flops are not quite the same. Well, it's time to tell you the difference. Latches are transparent, while flip-flops are synchronous or edge-triggered. What does any of that mean, I hear you ask. Well, before this point all our logic was so called combinational logic. That means that the output only depended on input. Kinda like pure functions. In sequential logic on the other hand, output not only depends on current input, it also depends on sequence of previous inputs. In other words it has memory, aka state. How do we tell the difference between previous signal, and next one? We'll we need to somehow synchronize all the data that is going around. And for that we use clocks. Like any oscillators, crystals and alike. This is not your clock that counts time. It's more like device that produces 1 output every X time units. If you know what square wave is, you know what I mean. In any case, this wave has edges, where it starts, and where it ends.

Clock edges

So what does all this have to to with latches and flip-flops? Well, latches are transparent, aka, they change output immediately on input change. We want it all be in sync, so we can easily move data from one part of the CPU to another. Hence why we want flip-flops not latches. We want them to change their state only on clock change.

How do we do that? Well we kinda did that already (sneaky I know), the Enable input only allows the data to change when it's high/1. But to have better control over this we use 2 D latches to make 1 flip-flop:

D Flip-Flop

As you can see, the 2nd latch only triggers when clock is going from high to low, so this would be Falling Edge Triggered D Flip-Flop. I also added back the Enable pin by ANDing it with the Clock. The 1st latch there is acting like a buffer before the clock edge actually fall.

And let's give it a symbol:

D Flip-Flop Symbol

See that weird illuminati triangle/arrow symbol? That's how people usually mark clock pins. So I just follow that for consistency.

Okay now that we have 1-Bit storage, let's scale it to our WORD size, 8 bits, a byte. We call it a WORD, because in ye olde days, we called a natural unit of data that CPU can operate on a MACHINE WORD, and it's size was however long the word was in bits. So for us it's a Byte, or Octet (Since bytes weren't always 8 bits long too).

Anyways, we can just have 8 D Flip-Flops with control lines connected together, pretty easy:

8-Bit Register

And as always, the symbol:

8-Bit Register Symbol

Now that's all the memory we need internally for a CPU. We can have a bunch of register to store some values. But what if want to store much more values? We'd need something we could access randomly, too. So we can have any value. Hmm... So how do we build this... RAM?

Well, you'll have to tune in next time, because this is getting a bit long.

For now, feel free to play around with current CPU:

Playing around with current state of the CPU

Okay, so that's the basics of memory. I decided to split it in 2, because it's getting long. Next time we'll make RAM, address the problem of addressing and have 2/3 of the CPU done. As always Logisim schematics: cpu-scratch-p5.circ

P.S. If you find any errors, mistakes, and/or want to give me some feedback, feel free to send me an email

Hibernaculum got #1 in Theme

Sun 21 of May, 2017

hibernaculum logo

So, I just wanted to share that this happened.

My and @Quaternius Ludum Dare game Hibernaculum got scored #1 in theme, #27 in innovation, #22 in graphics, and #98 in audio! This is very exciting for us both, so we are going to speak about the future of this game, we'll analyze your feedback, and after that, we'll see where it'll lead us.


working on a game banner

If you are curious about how we developed the game. You might want to read the post-mortems me and Quaternius both wrote:

I also like to share my "LD stats" after the results are in for each LD. This time I decided to host all the data on my website, so if you're curios head here: /blog/ld-stats

So that's that, this is the first first place in LDJAM for me, so I am very much excited!

Have a nice time of day!

Let's Make a CPU: Part 4 - Finishing ALU

Sun 07 of May, 2017

Previous part. If you missed the introductory post, head here: Part 0.

Welcome back to our little endeavour to build a CPU. Last time we finished our adder/subtractor, and it's all fun and cool, but we'd like our CPU to do more than add and subtract. Today we will finish the ALU so we can finally move on to the overall architecture of the CPU.

Let's wrap the arithmetic part of the ALU first. How about multiplication and division? Well, as it turns out it's a bit more convoluted to create multipliers and dividers. And I was thinking for a long time if I even want to cover them. Here's the deal, we can totally emulate multiplication and division with addition and subtraction. Since we're trying to create a simple CPU, not necessarily fast and efficient I think we can leave both multiplication and division to software, not hardware. In fact that's what early CPUs like Intel 4004 did. They didn't have dedicated circuitry for multiplication or division, instead you would have a programmer write code to add in a loop to multiply. This approach is slower, but takes less space on a dye for actual IC (integrated circuit) and, frankly, they are complicated enough, and don't add too much to overall theory. I think it's optional to talk about them.

I will, however, point you in the right direction if you want to make them on your own. Wikipedia has a good article on binary multipliers and division algorithms. Here's for example an unoptimized binary multiplier I made:

8-bit multiplier schematic

It involves lots of logical shifting (which we didn't cover, yet) and addition. As you can see it's a bit complex, and it's technically not full, since 8-bit by 8-bit multiplication should yield 16-bit integer, or at least set an overflow flag or something.

Alright, but how would the software version look? Good question, let's look at pseudo-code assembly example:

; multiply 5 by 3

    load R0, 5      ; load decimal 5 into register 0
    load R1, 3      ; load decimal 3 into register 1

mult:               ; label for our loop
    add R0, R0      ; add register 0 to itself
    sub R1, 1       ; subtract 1 from register 1
    cmp R1, 1       ; compare register 1 with 1 (sets zero flag if same)
    jnz mult        ; if zero flag is not set by cmp jump to mult

    print R0        ; 15

Something similar can be written for division:

; divide 15 by 3

    load R0, 15     ; load decimal 15 into register 0, this is our dividend
    load R1, 3      ; load decimal 3 into register 1, this is our divisor
    load R2, 0      ; load 0 into register 2, this is our result

div:                ; label for our loop
    sub R0, R1      ; subtract divisor from dividend
    add R2, 1       ; increment the result
    cmp R0, 0       ; compare dividend with 0 (sets negative flag if less, zero if same)
    jg div          ; jump if greater; if both zero and negative flags aren't set jump to div

    print R2        ; 5

As you can see we don't cover the reminder, but you get the gist. In theory, we can provide some sort of standard library that contains routines, or even implement this in microcode (which we haven't covered, yet).

Alright, so that covers the Arithmetic part of the ALU, now let's move on to Logic. This is going to be pretty easy for the most part. All we need to have are AND, OR, XOR, and NOT, as well as, Logical Left and Right shifts. The simple logical operators are super easy, it's just 8 gates in parallel. For example an 8-nit AND gate would look something like this:


All the other ones are the same, but instead of AND gate we use the appropriate gate. But we don't need to do this manually. In Logisim we can change the bit width of the gate so we can use built-in Logism logic gates.

Okay so now we can move to the shifting. Logical shifting is a simple operation in which we take the operand's bits and shift all of them left or right. We can achieve this by connecting the wires from input to output, but we offset them to the next one. So input bit 0 becomes bit 1 in the output.

Here's the right shift: 8-bit-logic-right-shift

And its symbol: 8-bit-logic-right-shift-symbol

And of course the left shift: 8-bit-logic-right-shift

As well as its symbol: 8-bit-logic-right-shift-symbol

Both of them shift the input by 1 place. You can create circuits that can do more than that. You might want to just offset the inputs more, but there is a better way. We won't use it for simplicity and consistency (with the multiplier and divider) sake. But if you want, take a look at Barrel Shifters. They use something called Multiplexer which we didn't cover, yet. It's a very useful gate which allows selecting different inputs depending on select bits.

Here's an example of logical left barrel shifter I made: 8-bit-logic-left-shift-barrel

As I said we won't use it, instead we will do same as with multiplication, we'll do it in software:

; shift 0xFF (0b11111111) left by 3

    load R0, 0xFF   ; load hex 0xFF (dec 255) into register 0
    load R1, 3      ; load decimal 3 into register 1

shift:              ; label for our loop
    lsh R0          ; left shift register 0 by 1
    sub R1, 1       ; decrement the register 0 (sub sets zero and neg flags)
    jz shift        ; jump if zero flag is set to shift label

    print R0        ; 0xF8 (0b11111000)

Now that we have all the parts of the ALU we can wire it all together, right? Well, we need to cover one more little guy and his brother. For that we will need to jump back to transistors for a second. I am talking about Buffer and Tri-State Buffer (or Controlled Buffer). Buffer is a gate that simply pass input to output.

So, the truth table will look like this:

in out
0 0
1 1

Boring, right? So why do we need it? Well usually it's used to buffer signal source between different circuits. We, however, are more interested in his brother, the Tri-State Buffer. Let's take a look at the truth table:

in enable out
0 0 Z
1 0 Z
0 1 0
1 1 1

Woah, woah, woah. What is this 'Z' thing? some of you might've thought to themselves. The Z thing is known as high-impedance or floating state. Logisim shows it as X instead of Z. What it basically means for us, is that nothing is driving that wire. Nothing is "connected" to it. Nothing is sending the signal through it. It's neither 0, nor is it 1.

So how do we build it? Okay let's start with buffer first, and we will keep using CMOS design here, for real world sake: buffer-schematic

It's basically the inverse of NOT gate. That arrangement of P-Type and N-Type transistors is basically an inverter, so by having them both stacked like that, we simply let the same signal go through (!!a == a).

The symbol for the Buffer is a simple triangle: buffer-symbol

Now the good part, the Tri-State Buffer: tri-state-buffer-schematic

It's a bit more involved, but it's pretty simple. We simply close the power and ground from the output transistors, so they have neither. As you can see, I also labeled the invertors in the circuit.

The symbol for it is the same as the Buffer one, except we have the enable pin out the bottom: tri-state-buffer-symbol

Okay so the last thing we need to do, is to use 8 of them in parallel for our 8-bit buses: 8-bit-tri-state-buffer-schematic

And let's give it a symbol to differentiate it from 1-bit one: 8-bit-tri-state-buffer-symbol

Okay. That's cool and all, but why do we need this again? Well, out ALU has different operators now, but it can only output one thing. So, we need a way to select which one it does. And that's exactly why we needed Tri-State Buffers. When they are off, nothing is driving the output wires, when one of them is on, only that data is on the bus. You can think of this like this: when Tri-State Buffer is disabled, the wires aren't "connected", so no signal, not even 0 is going through them. We need this because bad things happen when multiple signals try to drive same wire. In Logisim it simply gives us an Error a red wire and a big E for the output. In real life I honestly never even tried that, but it can probably damage the device, basically it's bad, and you won't have any readable data there for sure.

Now that we have everything we need, let's wire our ALU: 8-bit-alu

As you can see, we just combine all our operands and we use Tri-State Buffers to control what we are outputting. You may have also noticed, that I moved the Zero flag calculation from our Adder/Subtractor into the ALU proper. That's because I think it can be useful if other operations can set it, but all the other ones are arithmetic only. One can also notice that Add and Sub signals are ORed. That's because both of them are used in Adder so we output from it when any of them is set.

Now for the symbol of ALU. It's usually represented as this V shaped thing, hence this is what I've done: 8-bit-alu-symbol

The inputs come from top, control bits from the left, flags from the right, and of course the output from the bottom.

And that's that! We just build the muscle of the CPU, the thing that does the work! With all this operations we can do all sorts of things. But all of that will come in the future. For now play around with your newly baked ALU. Wanna go deeper? Try to implement multiplications and divisions in hardware, as well as, the barrel shifters.

Till next time!

Yey! I am back! Sorry for such a long delay between the posts, was bit busy, then Ludum Dare happened, then the dilemma "just tell the basics vs go deep". I decided to only cover the basics for multiple reasons. It's going to be easier and faster to produce, less technical people don't get scared off, the deep stuff is not that necessary for understanding of the overall architecture of a CPU. Maybe after I finish this series, I can make another one that covers some optimizations.

Anyways, I hope you enjoyed this one, next time we'll cover either overall architecture or start working on memory. Tune in next time to find out!

Oh, and the download link for Logisim circuit for this part: cpu-scratch-p4.circ I included a play area for the ALU in the main circuit.

P.S. If you find any errors, mistakes, and/or want to give me some feedback, feel free to send me an email

More posts...