My projects: July 2018

This is the second follow-up of my initial text about the FPGA Computer.

I use a fork of the customasm project as the assembler for my FPGA-based CPU. It is on GitHub here:

https://github.com/milanvidakovic/FPGAcustomasm

This 16-bit CPU has 8 general-purpose registers (r0-r7), pc (program counter), sp (stack pointer), ir (instruction register), and h (the higher word when multiplying, or the remainder when dividing). Each register is 16 bits wide.

The address bus is 16 bits wide, covering 65536 addresses. The data bus is also 16 bits wide, but memory is byte-addressed: each address refers to one 8-bit byte.

There are eleven groups of instructions:

Group number	Group name	Group members	Group description
0	NOP/MOV/IN/OUT/PUSH/POP/RET/IRET/HALT/SWAP	nop; mov reg, xx; mov reg, reg; in reg, [xx]; out [xx], reg; push reg; push xx; pop reg; ret; iret; swap; halt	The most general group. Deals with putting values into registers, copying values between registers, I/O operations, stack operations, returning from subroutines and interrupts, and swapping register contents. NOP and HALT are also in this group.
1	JUMP	j xx; jc xx; jnc xx; jz xx; jnz xx; jo xx; jno xx; jp xx; jnp xx; jg xx; jge xx; js xx; jse xx	Jumps to the given location, unconditionally (j) or conditionally.
2	CALL	call xx; callc xx; callnc xx; callz xx; callnz xx; callo xx; callno xx; callp xx; callnp xx; callg xx; callge xx; calls xx; callse xx	Calls a subroutine: puts the return address on the stack before jumping to the subroutine. The subroutine must execute RET to return.
3	LOAD/STORE	ld reg, [xx]; ld reg, [reg]; ld reg, [reg + xx]; ld.b reg, [xx]; ld.b reg, [reg]; ld.b reg, [reg + xx]; st [xx], reg; st [reg], reg; st [reg + xx], reg; st.b [xx], reg; st.b [reg], reg; st.b [reg + xx], reg	Load copies a value from memory into the destination register; the source memory address is given by a number, a register, or a register plus a number. Store copies the given register into memory; the destination address is given the same three ways.
4	ADD/SUB	add reg, reg; add reg, xx; add reg, [reg]; add reg, [xx]; add reg, [reg + xx]; add.b reg, [reg]; add.b reg, [xx]; add.b reg, [reg + xx]; sub reg, reg; sub reg, xx; sub reg, [reg]; sub reg, [xx]; sub reg, [reg + xx]; sub.b reg, [reg]; sub.b reg, [xx]; sub.b reg, [reg + xx]	Add / subtract group.
5	AND/OR	and reg, reg; and reg, xx; and reg, [reg]; and reg, [xx]; and reg, [reg + xx]; and.b reg, [reg]; and.b reg, [xx]; and.b reg, [reg + xx]; or reg, reg; or reg, xx; or reg, [reg]; or reg, [xx]; or reg, [reg + xx]; or.b reg, [reg]; or.b reg, [xx]; or.b reg, [reg + xx]	And / or group.
6	XOR	xor reg, reg; xor reg, xx; xor reg, [reg]; xor reg, [xx]; xor reg, [reg + xx]; xor.b reg, [reg]; xor.b reg, [xx]; xor.b reg, [reg + xx]	Xor group.
7	SHL/SHR	shl reg, reg; shl reg, xx; shl reg, [reg]; shl reg, [xx]; shl reg, [reg + xx]; shl.b reg, [reg]; shl.b reg, [xx]; shl.b reg, [reg + xx]; shr reg, reg; shr reg, xx; shr reg, [reg]; shr reg, [xx]; shr reg, [reg + xx]; shr.b reg, [reg]; shr.b reg, [xx]; shr.b reg, [reg + xx]	Shift left / shift right group.
8	MUL/DIV	mul reg, reg; mul reg, xx; mul reg, [reg]; mul reg, [xx]; mul reg, [reg + xx]; mul.b reg, [reg]; mul.b reg, [xx]; mul.b reg, [reg + xx]; div reg, reg; div reg, xx; div reg, [reg]; div reg, [xx]; div reg, [reg + xx]; div.b reg, [reg]; div.b reg, [xx]; div.b reg, [reg + xx]	Multiply / divide group (h receives the higher word of a product or the remainder of a division).
9	INC/DEC	inc reg; inc [reg]; inc [xx]; inc [reg + xx]; inc.b [reg]; inc.b [xx]; inc.b [reg + xx]; dec reg; dec [reg]; dec [xx]; dec [reg + xx]; dec.b [reg]; dec.b [xx]; dec.b [reg + xx]	Increment / decrement group.
10	CMP/NEG	cmp reg, reg; cmp reg, xx; cmp reg, [reg]; cmp reg, [xx]; cmp reg, [reg + xx]; cmp.b reg, [reg]; cmp.b reg, [xx]; cmp.b reg, [reg + xx]; neg reg; neg [reg]; neg [xx]; neg [reg + xx]; neg.b [reg]; neg.b [xx]; neg.b [reg + xx]	Compare / negate group.

All instructions are two or four bytes long. Since the data bus is 16 bits wide, a complete instruction is fetched in either one or two memory reads. Because the memory is SRAM, this means that a complete instruction is fetched, decoded, and executed in three or more clock cycles.

All instructions share a similar format:

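from (upper 4 bits of byte 1)	to (lower 4 bits of byte 1)	what (upper 4 bits of byte 2)	group (lower 4 bits of byte 2)
bbbb: 0-7 = r0-r7, 8 = sp, 9 = h	bbbb: 0-7 = r0-r7, 8 = sp, 9 = h	bbbb, e.g. 0000 = mov regx, regy	bbbb, e.g. 0000 = group 0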

In the first byte, the lower four bits designate the destination register (to) and the upper four bits designate the source register (from). In the second byte, the lower four bits identify the instruction group (group) and the upper four bits identify the type of the instruction within that group (what).

For example, the mov r2, r1 instruction is encoded as:

binary: 0001 0010 0000 0000

hex: 12 00

The source is r1 (0001), the destination is r2 (0010), the group is 0 (0000), and the type is mov regx, regy (0000).

The second example is the mov r1, 0x0f instruction, which takes four bytes:
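The field layout can be sketched in Python to check the encoding by hand. This is a minimal illustration, not part of the FPGAcustomasm toolchain: the function name encode_word and the REG table are hypothetical, and only the register codes from the format table are assumed.

```python
# Register codes for the "from" and "to" fields (0-7: r0-r7, 8: sp, 9: h).
REG = {f"r{i}": i for i in range(8)}
REG.update({"sp": 8, "h": 9})


def encode_word(src: int, dst: int, what: int, group: int) -> bytes:
    """Pack the four 4-bit fields into the two-byte instruction word.

    Byte 1: upper nibble = source (from), lower nibble = destination (to).
    Byte 2: upper nibble = type (what), lower nibble = group.
    """
    for field in (src, dst, what, group):
        if not 0 <= field <= 0xF:
            raise ValueError("every field must fit in 4 bits")
    return bytes([(src << 4) | dst, (what << 4) | group])


# mov r2, r1: source r1, destination r2, type 0 (mov regx, regy), group 0
print(encode_word(REG["r1"], REG["r2"], 0, 0).hex(" "))  # 12 00
```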

binary: 0000 0001 0010 0000, 0000 0000 0000 1111

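hex: 01 20, 00 0f

Here the destination is r1 (0001), the unused source field is 0000, the group is 0 (0000), and the type is 2 (0010), which selects mov reg, xx. The second 16-bit word holds the value 0x000f.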
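The four-byte form can be sketched the same way, assuming the immediate follows the instruction word with its upper byte first, which matches the hex above. The helper name is hypothetical, and type 2 for mov reg, xx is read off this single example rather than from a full opcode table.

```python
def encode_with_immediate(dst: int, what: int, group: int, value: int) -> bytes:
    """Instruction word followed by a 16-bit immediate stored big-endian.

    The source (from) nibble is not used by this form, so it is left at 0.
    """
    if not all(0 <= f <= 0xF for f in (dst, what, group)):
        raise ValueError("register, type and group fields must fit in 4 bits")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("the immediate must fit in 16 bits")
    instruction = bytes([dst, (what << 4) | group])
    return instruction + value.to_bytes(2, "big")


# mov r1, 0x0f: destination r1, type 2 (mov reg, xx), group 0
print(encode_with_immediate(1, 2, 0, 0x0F).hex(" "))  # 01 20 00 0f
```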

The load instructions copy a value from memory into a register. The store instructions copy the value of a register into the given memory location. The memory location is given as a number (ld r1, [0x0a] loads the content of location 0x0a into the r1 register), as the value of a register (ld r1, [r2] loads the content of the memory location to which r2 points), or as the sum of a register and a number (ld r1, [r2 + 0x0f]).

ld r1, [0x0a] loads two bytes starting at location 0x0a. When working with 16-bit values, the address (here 0x0a) must be even.

If we want to load a byte from a location, we need to use the ".b" suffix:
ld.b r1, [0x0a]

The code above will load a byte from the 0x0a location into the r1 register.
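A toy Python model of the two loads, for illustration only. It assumes the big-endian byte order described later in this post, assumes that ld.b places the byte in the low 8 bits of the register, and assumes that a word access at an odd address is rejected; how the real hardware reacts to an odd word address is not covered here.

```python
class Memory:
    """Toy model of the byte-addressed 64 KB memory (hypothetical, for illustration)."""

    def __init__(self, size: int = 65536) -> None:
        self.data = bytearray(size)

    def ld(self, addr: int) -> int:
        """Word load: two bytes, upper byte at the even address (big-endian)."""
        if addr % 2 != 0:
            # Assumption: the model rejects odd word addresses; real hardware may differ.
            raise ValueError(f"word access at odd address {addr:#06x}")
        return (self.data[addr] << 8) | self.data[addr + 1]

    def ld_b(self, addr: int) -> int:
        """Byte load (ld.b): any address; the byte lands in the low 8 bits."""
        return self.data[addr]


mem = Memory()
mem.data[0x0A] = 0x12
mem.data[0x0B] = 0x34
print(hex(mem.ld(0x0A)))    # 0x1234
print(hex(mem.ld_b(0x0A)))  # 0x12
print(hex(mem.ld_b(0x0B)))  # 0x34
```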

Hello World example

Let's look at the Hello World example:

; this program will print HELLO WORLD
#addr 0x400
VIDEO_0 = 2400 ; beginning of the text frame buffer

mov r2, 0 ; r2 is the index (offset into the video memory)
mov r1, hello ; r1 holds the address of the "HELLO WORLD" string

again:
ld.b r0, [r1] ; load r0 with the content of the memory location to which r1 points (current character)
cmp r0, 0 ; if the current character is 0 (string terminator),
jz end ; go out of this loop
st [r2 + VIDEO_0], r0 ; store the character at the VIDEO_0 + r2
inc r1 ; move to the next character
add r2, 2 ; move to the next location in the video memory
j again ; continue with the loop

end:
halt
hello:

#str "HELLO WORLD!\0"

First we define the constant VIDEO_0 with the value 2400. This is the address of the text-based frame buffer; it points to the first character in the video memory.

Then we set r2 to 0 and r1 to the address of the hello string. Note that the mov instruction either moves a number into a register (for example, mov r2, 0) or copies the value of the source register into the destination register (for example, mov r1, r2).

Next, we enter the loop, which starts at the again label. In each iteration we load the byte at the current address (starting with the first character of the hello string), compare that byte with zero (checking for the end of the string), store it at the current location in the video memory, and then advance r1 to the next character and r2 by two bytes to the next location in the video memory.

When all the characters are printed on the screen, the CPU halts (halt instruction).
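To trace the loop by hand, here is a Python mirror of it. It is only a model: it records the address each character is written to and does not emulate the word-versus-byte store or the real video hardware. The function name print_string is hypothetical.

```python
VIDEO_0 = 2400  # beginning of the text frame buffer, as in the listing


def print_string(string: bytes) -> list[tuple[int, int]]:
    """Mirror the assembly loop and return the (address, character) pairs it stores."""
    writes = []
    r1 = 0  # index into the string (the assembly keeps the address in r1)
    r2 = 0  # offset into the video memory
    while True:
        r0 = string[r1]                    # ld.b r0, [r1]
        if r0 == 0:                        # cmp r0, 0 / jz end
            break
        writes.append((VIDEO_0 + r2, r0))  # store at VIDEO_0 + r2
        r1 += 1                            # inc r1: next character
        r2 += 2                            # add r2, 2: next character cell
    return writes


writes = print_string(b"HELLO WORLD!\0")  # #str "HELLO WORLD!\0"
print(writes[0], writes[-1])  # (2400, 72) (2422, 33)
```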

Interrupts

Let's look at the UART echo demo. This demo waits for a character to arrive over the serial UART (115200 baud, one start bit, one stop bit, no parity), then prints that character on the screen, and finally echoes that character back to the UART:

#addr 0x400
; ########################################################
; REAL START OF THE PROGRAM
; ########################################################
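mov sp, 1000 ; initialize the stack pointer

mov r0, 14
st [cursor], r0 ; initial cursor offset into the video memory

; set the IRQ handler for UART to our own IRQ handler
mov r0, 1 ; 0x0001 is the opcode of the j xx instruction
mov r1, 16
st [r1], r0 ; write the jump opcode at address 16 (IRQ1)
mov r0, irq_triggered
mov r1, 18
st [r1], r0 ; write the jump target at address 18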

halt

The code above installs the interrupt handling routine (irq_triggered) for the UART. The UART uses IRQ1, whose handler entry is at address 16 (0x0010): whenever the serial UART subsystem receives a byte, the CPU jumps to address 0x0010. At that address we place a jump instruction (j irq_triggered): the value 0x0001 (the opcode of the j instruction) at address 0x0010, and the address of the irq_triggered routine at address 0x0012 (written by st [r1], r0 with r1 = 18 and r0 = irq_triggered).
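The effect of the two word stores on memory can be sketched in Python. The handler address 0x0420 is a hypothetical value chosen for illustration (the real one is whatever the assembler assigns to irq_triggered), and the byte order follows the big-endian rule described later in this post.

```python
JUMP_OPCODE = 0x0001  # j xx: from=0, to=0, what=0, group=1 (JUMP)


def install_jump_vector(mem: bytearray, vector: int, handler: int) -> None:
    """Place 'j handler' at the vector address using two big-endian word stores."""
    mem[vector:vector + 2] = JUMP_OPCODE.to_bytes(2, "big")  # st [16], r0 with r0 = 1
    mem[vector + 2:vector + 4] = handler.to_bytes(2, "big")  # st [18], r0 with r0 = irq_triggered


mem = bytearray(65536)
IRQ_TRIGGERED = 0x0420  # hypothetical handler address, for illustration only
install_jump_vector(mem, 16, IRQ_TRIGGERED)
print(mem[16:20].hex(" "))  # 00 01 04 20
```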

That way, we have prepared the UART interrupt routine and the main program halts. The rest of the program is in the interrupt routine. Let's look at the interrupt routine:

; ##################################################################
; Subroutine which is called whenever some byte arrives at the UART
; ##################################################################
irq_triggered:
push r0
push r1
push r2
push r5
push r6
in r1, [64] ; r1 now holds the byte received from the UART (address 64 decimal)
ld r6, [cursor] ; r6 holds the current cursor offset
st [r6 + VIDEO_0], r1 ; store the UART character at VIDEO_0 + r6
add r6, 2 ; move to the next location in the video memory
st [cursor], r6 ; save the new cursor offset

loop2:
in r5, [65] ; tx busy in r5
cmp r5, 0
jz not_busy ; if not busy, send back the received character
j loop2 ; otherwise keep polling

not_busy:
out [66], r1 ; send the received character to the UART

skip:
pop r6
pop r5
pop r2
pop r1
pop r0
iret

When the interrupt happens, the irq_triggered routine first pushes some registers onto the stack, reads the received byte from the UART (in r1, [64]), prints it on the screen at the current cursor position, and then sends the character back through the UART (out [66], r1). Before sending, it polls port 65 (in r5, [65]): r5 is 1 while the UART is still sending a character and 0 when it is free. Finally, the routine pops the registers from the stack and returns (iret instruction).
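The handler's control flow can be modelled in Python with a stand-in for the three UART ports. The FakeUart class, its busy_reads counter, and keying the video memory by offset are all hypothetical simplifications; only the port numbers and the order of operations come from the listing.

```python
class FakeUart:
    """Stand-in for the UART ports used by irq_triggered (hypothetical model).

    Port 64: received byte; port 65: tx busy flag (1 busy, 0 free); port 66: byte to send.
    """

    def __init__(self, rx_byte: int, busy_reads: int) -> None:
        self.rx_byte = rx_byte
        self.busy_reads = busy_reads  # reads of port 65 that return 1 before it returns 0
        self.sent: list[int] = []

    def read(self, port: int) -> int:
        if port == 64:
            return self.rx_byte
        if port == 65:
            if self.busy_reads > 0:
                self.busy_reads -= 1
                return 1
            return 0
        raise ValueError(f"unexpected port {port}")

    def write(self, port: int, value: int) -> None:
        if port != 66:
            raise ValueError(f"unexpected port {port}")
        self.sent.append(value)


def irq_triggered(uart: FakeUart, video: dict[int, int], cursor: int) -> int:
    """Echo one received byte; returns the new cursor offset."""
    r1 = uart.read(64)         # in r1, [64]
    video[cursor] = r1         # st [r6 + VIDEO_0], r1 (video keyed by offset)
    cursor += 2                # add r6, 2
    while uart.read(65) != 0:  # loop2: poll the tx busy flag
        pass
    uart.write(66, r1)         # out [66], r1
    return cursor


uart = FakeUart(rx_byte=ord("A"), busy_reads=3)
video: dict[int, int] = {}
cursor = irq_triggered(uart, video, 14)
print(cursor, video, uart.sent)  # 16 {14: 65} [65]
```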

The difference between iret and ret is that ret pops the return address from the stack and jumps to it (returning from a subroutine entered with call), while iret pops the return address, then pops the flags, and then jumps to the obtained address. The interrupt routine might change the flags, so they are saved before the interrupt routine is invoked and restored during the iret execution.
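A minimal Python model of that difference, under one assumption: since iret pops the return address first and the flags second, the model has the interrupt push the flags first and the return address last. The stack is a Python list rather than memory addressed by sp, and the flag values are arbitrary.

```python
class StackModel:
    """Toy model of call/ret versus interrupt/iret (hypothetical; the real sp lives in memory)."""

    def __init__(self) -> None:
        self.pc = 0
        self.flags = 0
        self.stack: list[int] = []

    def call(self, target: int, return_addr: int) -> None:
        self.stack.append(return_addr)  # only the return address is saved
        self.pc = target

    def ret(self) -> None:
        self.pc = self.stack.pop()

    def interrupt(self, vector: int, return_addr: int) -> None:
        # Assumed order: flags first, return address on top, so iret pops the address first.
        self.stack.append(self.flags)
        self.stack.append(return_addr)
        self.pc = vector

    def iret(self) -> None:
        return_addr = self.stack.pop()  # pops the return address
        self.flags = self.stack.pop()   # then the flags
        self.pc = return_addr


cpu = StackModel()
cpu.flags = 0b0101
cpu.interrupt(vector=16, return_addr=0x0500)
cpu.flags = 0  # the handler clobbers the flags (e.g. with cmp)
cpu.iret()
print(hex(cpu.pc), bin(cpu.flags), cpu.stack)  # 0x500 0b101 []
```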

All the examples are stored in the FPGAcustomasm project on GitHub:
https://github.com/milanvidakovic/FPGAcustomasm/tree/master/examples/FPGA/raspbootin

Adding byte-oriented instructions

This is a follow-up of my previous post about the FPGA Computer.

When I initially committed the FPGA Computer, the CPU had 16-bit address and data buses. All the instructions were word-oriented, working with 16 bits, and even the memory was word-oriented, with 64K words rather than 64 KB. At first, that looked promising: double the amount of RAM compared to the usual 8-bit platforms (64K words compared to 64 KB).

However, because all the instructions were word-oriented, byte-oriented programs were complicated. For example, the UART is byte-oriented, so the UART loader receives bytes, not words. That causes a problem when the loader has to receive the code from the UART:

in r1, [64] ; get the byte from the uart into r1
ld r2, [flip]
cmp r2, 0
jz do_flip ; we have received the even byte

; at this moment, r1 holds the received byte
neg [flip] ; we have received the odd byte - time to complete the word out of those two bytes (even and odd)
ld r0, [current_byte] ; get the even byte from the memory (stored earlier)
shl r0, 8 ; shift it 8 bits to the left
or r0, r1 ; complete the word
ld r2, [current_addr] ; r2 holds the current pointer in memory to store the received byte
st [r2], r0 ; store the completed word into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2 ; save the incremented value of the current address
ld r2, [current_size] ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

do_flip:
neg [flip]
st [current_byte], r1 ; we need to receive two bytes to form the word, so we are saving this byte before receiving the other
ld r2, [current_size]
inc r2 ; increment the byte counter
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived_even
j skip ; return and wait for the next byte

all_arrived_even:
; at this moment, r1 holds the received byte
shl r1, 8 ; the upper byte is for the odd bytes
ld r2, [current_addr] ; r2 holds the current pointer in memory to store the received byte
st [r2], r1 ; store the incomplete word into the memory

all_arrived:

As you can see, the problem lies in the word-oriented instructions and memory locations. Whenever a byte arrives, it must be saved, then combined with the next byte that arrives, and that combination is then stored in memory as a 16-bit value.
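The bookkeeping the old loader had to do can be expressed in a few lines of Python. This models the data transformation only: the flip flag and current_byte become a loop over byte pairs, and the function name is hypothetical.

```python
def pack_words(data: bytes) -> list[int]:
    """Combine a received byte stream into 16-bit words, earlier byte in the upper half.

    A trailing odd byte becomes the upper half of a final word with a zero lower
    byte, like the all_arrived_even branch (shl r1, 8).
    """
    words = []
    for i in range(0, len(data) - 1, 2):
        words.append((data[i] << 8) | data[i + 1])  # shl r0, 8 / or r0, r1
    if len(data) % 2:
        words.append(data[-1] << 8)
    return words


print([hex(w) for w in pack_words(bytes([0x12, 0x34, 0x56]))])  # ['0x1234', '0x5600']
```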

That was the reason for the redesign. I introduced the ".b" suffix: an instruction with the ".b" suffix is byte-oriented. This also changed the addressing. The data bus is still 16 bits wide, and all memory operations are still 16-bit, but the address range now covers 64 KB instead of 64K words. That way, all the addresses in the assembler are byte addresses, not word addresses.

This means that an instruction without the ".b" suffix works with the 16-bit word at the given address. In that case, the address must be aligned to 16 bits (even).

For example, this instruction is word-oriented:

ld r0, [1000]

It loads the 16-bit content of address 1000 (two bytes: the upper byte from address 1000 and the lower byte from address 1001) into the r0 register. The address must be even.

If the instruction has the ".b" suffix, it is byte-oriented, and the address can be either even or odd. This instruction is byte-oriented:

ld.b r0, [1001]

It loads the 8-bit value (one byte) from the address 1001 into the r0 register.

When a 16-bit word is stored in memory, it is stored as big-endian: the upper byte goes to the even address and the lower byte to the following odd address. For example, the number 0x1234 stored at address 1000 looks like this:

address	content
1000	0x12
1001	0x34
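Python's struct module can reproduce this layout, since the ">H" format means a big-endian unsigned 16-bit value. The bytearray standing in for the 64 KB address space is only an illustration.

```python
import struct

mem = bytearray(65536)
struct.pack_into(">H", mem, 1000, 0x1234)           # like st [1000], r0 with r0 = 0x1234
print(hex(mem[1000]), hex(mem[1001]))               # 0x12 0x34
print(hex(struct.unpack_from(">H", mem, 1000)[0]))  # 0x1234, like ld r0, [1000]
print(hex(mem[1001]))                               # 0x34, like ld.b r0, [1001]
```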

Now let's look at the same UART loader code, rewritten with byte-oriented instructions:

in r1, [64] ; get the byte from the uart into r1
; at this moment, r1 holds the received byte
; r2 holds the current pointer in memory to store the received byte
ld r2, [current_addr]
st.b [r2], r1 ; store the received byte into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2 ; save the incremented value of the current address
ld r2, [current_size] ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

all_arrived:

As you can see, the code is shorter and easier to understand.
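Modelling the same bookkeeping in Python shows why neither the flip flag nor the saved byte is needed any more: each byte goes straight to its own address. The class is hypothetical, and the start address 0x400 is an assumption borrowed from the #addr used by the examples; in the real loader these values live in memory variables.

```python
class ByteLoader:
    """Hypothetical model of the byte-oriented loader's variables."""

    def __init__(self, mem: bytearray, start: int, size: int) -> None:
        self.mem = mem
        self.current_addr = start  # where the next byte goes
        self.current_size = 0      # bytes received so far
        self.size = size           # bytes expected

    def on_byte(self, r1: int) -> bool:
        """Handle one received byte; return True once all bytes have arrived."""
        self.mem[self.current_addr] = r1        # st.b [r2], r1
        self.current_addr += 1                  # inc r2: next byte address
        self.current_size += 1                  # increment the byte counter
        return self.current_size == self.size   # cmp r2, [size] / jz all_arrived


mem = bytearray(65536)
loader = ByteLoader(mem, start=0x400, size=3)
print([loader.on_byte(b) for b in b"\x12\x34\x56"])  # [False, False, True]
print(mem[0x400:0x403].hex(" "))                     # 12 34 56
```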

The same idea can be applied to strings. Now that we have the byte-oriented instructions, dealing with byte-oriented strings is easy. This code prints the hello string on the screen:

VIDEO_0 = 2400 ; beginning of the text frame buffer

mov r2, 0 ; r2 is the index (offset into the video memory)
mov r1, hello ; r1 holds the address of the "HELLO WORLD" string

again:
; load r0 with the content of the memory location to which r1 points
ld.b r0, [r1]
cmp r0, 0 ; if the current character is 0 (string terminator),
jz end ; go out of this loop
st.b [r2 + VIDEO_0], r0 ; store the character at the VIDEO_0 + r2
inc r1 ; move to the next character
add r2, 2 ; move to the next location in the video memory
j again ; continue with the loop

end:
halt

hello:
#str "HELLO WORLD\0"

Conclusion

This change in the CPU design resulted in much better assembly code. I kept all the word-oriented instructions and gained a whole set of byte-oriented ones. I did lose half of the addressable memory (64 KB instead of 64K words, i.e. 128 KB), but my FPGA didn't have 128 KB of SRAM anyway.

Even if we try to make all the code word-oriented, we cannot avoid 8-bit strings and protocols. That is why I did this refactoring.

Here are the GitHub links:
- FPGAComputer
- FPGA Custom Assembler
- FPGA UART Loader (Raspbootin-like)
- FPGA Emulator


Friday, 13 July 2018

FPGA Computer Assembler

Hello World example

Interrupts

Thursday, 12 July 2018

Adding byte-related instructions

Adding byte-oriented instructions

Conclusion