Friday, 13 July 2018

FPGA Computer Assembler

This is the second follow-up to my initial text about the FPGA Computer.

I use a fork of the customasm project for my FPGA-based CPU. It is on GitHub here:

https://github.com/milanvidakovic/FPGAcustomasm

This 16-bit CPU has 8 general-purpose registers (r0 – r7), pc (program counter), sp (stack pointer), ir (instruction register), and h (higher word when multiplying, or remainder when dividing). Each register is 16 bits wide.

The address bus is 16 bits wide, giving 65536 addresses. The data bus is also 16 bits wide, but memory is byte-addressed: each address refers to a single byte.

There are eleven groups of instructions:


Group 0: NOP/MOV/IN/OUT/PUSH/POP/RET/IRET/HALT/SWAP
Members: nop; mov reg, xx; mov reg, reg; in reg, [xx]; out [xx], reg; push reg; push xx; pop reg; ret; iret; swap; halt
Description: The most general group. Deals with putting values into registers, exchanging values between registers, I/O operations, stack operations, returning from subroutines, and register content swapping. NOP and HALT are also in this group.

Group 1: JUMP
Members: j xx; jc xx; jnc xx; jz xx; jnz xx; jo xx; jno xx; jp xx; jnp xx; jg xx; jge xx; js xx; jse xx
Description: Jump to the given location.

Group 2: CALL
Members: call xx; callc xx; callnc xx; callz xx; callnz xx; callo xx; callno xx; callp xx; callnp xx; callg xx; callge xx; calls xx; callse xx
Description: Calling subroutine. Puts the return address on the stack before jumping to the subroutine. Needs to call RET when returning from the subroutine.

Group 3: LOAD/STORE
Members: ld reg, [xx]; ld reg, [reg]; ld reg, [reg + xx]; ld.b reg, [xx]; ld.b reg, [reg]; ld.b reg, [reg + xx]; st [xx], reg; st [reg], reg; st [reg + xx], reg; st.b [xx], reg; st.b [reg], reg; st.b [reg + xx], reg
Description: Load from memory into the register. Destination: register. Source: memory address given by the number, or by the register, or by the register+number. Store the given register into the memory location. Destination: memory location given by the number, or by the register, or by the register+number.

Group 4: ADD/SUB
Members: add reg, reg; add reg, xx; add reg, [reg]; add reg, [xx]; add reg, [reg + xx]; add.b reg, [reg]; add.b reg, [xx]; add.b reg, [reg + xx]; sub reg, reg; sub reg, xx; sub reg, [reg]; sub reg, [xx]; sub reg, [reg + xx]; sub.b reg, [reg]; sub.b reg, [xx]; sub.b reg, [reg + xx]
Description: Add and sub group.

Group 5: AND/OR
Members: and reg, reg; and reg, xx; and reg, [reg]; and reg, [xx]; and reg, [reg + xx]; and.b reg, [reg]; and.b reg, [xx]; and.b reg, [reg + xx]; or reg, reg; or reg, xx; or reg, [reg]; or reg, [xx]; or reg, [reg + xx]; or.b reg, [reg]; or.b reg, [xx]; or.b reg, [reg + xx]
Description: And / or group.

Group 6: XOR
Members: xor reg, reg; xor reg, xx; xor reg, [reg]; xor reg, [xx]; xor reg, [reg + xx]; xor.b reg, [reg]; xor.b reg, [xx]; xor.b reg, [reg + xx]
Description: Xor group.

Group 7: SHL/SHR
Members: shl reg, reg; shl reg, xx; shl reg, [reg]; shl reg, [xx]; shl reg, [reg + xx]; shl.b reg, [reg]; shl.b reg, [xx]; shl.b reg, [reg + xx]; shr reg, reg; shr reg, xx; shr reg, [reg]; shr reg, [xx]; shr reg, [reg + xx]; shr.b reg, [reg]; shr.b reg, [xx]; shr.b reg, [reg + xx]
Description: Shift group.

Group 8: MUL/DIV
Members: mul reg, reg; mul reg, xx; mul reg, [reg]; mul reg, [xx]; mul reg, [reg + xx]; mul.b reg, [reg]; mul.b reg, [xx]; mul.b reg, [reg + xx]; div reg, reg; div reg, xx; div reg, [reg]; div reg, [xx]; div reg, [reg + xx]; div.b reg, [reg]; div.b reg, [xx]; div.b reg, [reg + xx]
Description: Multiply / divide group.

Group 9: INC/DEC
Members: inc reg; inc [reg]; inc [xx]; inc [reg + xx]; inc.b [reg]; inc.b [xx]; inc.b [reg + xx]; dec reg; dec [reg]; dec [xx]; dec [reg + xx]; dec.b [reg]; dec.b [xx]; dec.b [reg + xx]
Description: Increment and decrement group.

Group 10: CMP/NEG
Members: cmp reg, reg; cmp reg, xx; cmp reg, [reg]; cmp reg, [xx]; cmp reg, [reg + xx]; cmp.b reg, [reg]; cmp.b reg, [xx]; cmp.b reg, [reg + xx]; neg reg; neg [reg]; neg [xx]; neg [reg + xx]; neg.b [reg]; neg.b [xx]; neg.b [reg + xx]
Description: Compare / negate group.

All the instructions are two or four bytes long. Since the data bus is 16 bits wide, the complete instruction is fetched in either one or two memory reads. This means that, since the SRAM is used, the complete instruction is fetched, decoded, and executed in three or more clock cycles.

All the instructions have a similar format:


field   bits   values
from    bbbb   0-7: r0-r7, 8: sp, 9: h
to      bbbb   0-7: r0-r7, 8: sp, 9: h
what    0000   0 => mov regx, regy
group   0000

In the first byte, the lower four bits designate the destination register (to), while the upper four bits identify the source register (from). In the second byte, the lower four bits identify the instruction group (group), and the upper four bits give the type of the instruction within that group (what).

For example, the mov r2, r1 instruction is encoded as:
binary: 0001 0010 0000 0000
hex: 12 00

The source is r1 (0001), the destination is r2 (0010), the group is 0 (0000) and the type is mov regx, regy (0000).

The second example is the mov r1, 0x0f instruction:
binary: 0000 0001 0010 0000, 0000 0000 0000 1111
hex: 01 20, 00 0f

Here the source field is unused (0000), the destination is r1 (0001), the type is 2 (0010), the group is 0 (0000), and the second word holds the number 0x000f.
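The two encodings above can be reproduced with a short Python sketch. This is only an illustration of the field layout, not the assembler's actual code: the function name encode and its parameters are made up for this example, the type value 2 for mov reg, xx is read off the second example, and the 16-bit number is assumed to be stored upper byte first, as in that example.

```python
# Register numbers as listed in the instruction format description.
REGISTERS = {f"r{i}": i for i in range(8)}
REGISTERS.update({"sp": 8, "h": 9})


def encode(to, frm="r0", what=0, group=0, immediate=None):
    """Pack one instruction into 2 bytes, or 4 bytes when it carries a number."""
    # First byte: upper nibble = source register, lower nibble = destination.
    first = (REGISTERS[frm] << 4) | REGISTERS[to]
    # Second byte: upper nibble = type within the group, lower nibble = group.
    second = ((what & 0xF) << 4) | (group & 0xF)
    encoded = bytes([first, second])
    if immediate is not None:
        # Assumption: the number follows as a 16-bit word, upper byte first.
        encoded += (immediate & 0xFFFF).to_bytes(2, "big")
    return encoded


print(encode("r2", "r1").hex(" "))                    # 12 00
print(encode("r1", what=2, immediate=0x0F).hex(" "))  # 01 20 00 0f
```
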


The Load instructions load a value from memory into a register. The Store instructions store the value of a register into the given memory location. The memory location is given as a number (ld r1, [0x0a] loads the content of the 0x0a location into the r1 register), as the value of a register (ld r1, [r2] loads the content of the memory location to which r2 points), or as the sum of a register and a number (ld r1, [r2 + 0x0f]).

ld r1, [0x0a] loads two bytes from the 0x0a location. The address (0x0a) must be even if we work with 16-bit values.

If we want to load a byte from a location, we need to use the ".b" suffix:
ld.b r1, [0x0a]

The code above will load a byte from the 0x0a location into the r1 register.
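The difference between ld and ld.b can be modelled in a few lines of Python. This is a rough sketch, not a description of the hardware: the load function is hypothetical, the upper byte of a word is assumed to sit at the even address, a byte load is assumed to leave the upper half of the register at zero, and raising an error on an odd word address is just one way to express the alignment rule.

```python
def load(memory, address, byte=False):
    """Model of ld (word) and ld.b (byte) on a byte-addressed memory."""
    if byte:
        # ld.b: a single byte; the upper half of the result is assumed zero.
        return memory[address]
    if address % 2:
        # Assumption: this model rejects unaligned word reads outright.
        raise ValueError("word access needs an even address")
    # ld: two bytes, upper byte at the even address (assumed byte order).
    return (memory[address] << 8) | memory[address + 1]


memory = bytearray(65536)
memory[0x0A] = 0x12
memory[0x0B] = 0x34

print(hex(load(memory, 0x0A)))             # 0x1234
print(hex(load(memory, 0x0A, byte=True)))  # 0x12
```
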

Hello World example


Let's look at the Hello World example:

; this program will print HELLO WORLD
#addr 0x400
VIDEO_0 = 2400 ; beginning of the text frame buffer

mov r2, 0      ; r2 is the index
mov r1, hello  ; r1 holds the address of the "HELLO WORLD" string

again:
ld.b r0, [r1]          ; load r0 with the content of the memory location to which r1 points (current character)
cmp r0, 0              ; if the current character is 0 (string terminator),
jz end                 ; go out of this loop 
st [r2 + VIDEO_0], r0  ; store the character at the VIDEO_0 + r2 
inc r1                 ; move to the next character
add r2, 2              ; move to the next location in the video memory
j again                ; continue with the loop

end:
halt
hello:

#str "HELLO WORLD!\0"

First we define the constant VIDEO_0 with the value of 2400. This is the address of the text-based frame buffer. It points to the first character in the video memory.

Then we set the r2 to 0 and r1 to the address of the hello string. Note that the mov instruction is used to move the number into the register (for example, mov r2, 0), or to move a value of the source register to the destination register (for example, mov r1, r2).

Next, we enter the loop. The loop starts with the again label. In the loop we load the byte value from the current address (starting with the first character of the hello string), compare that byte with zero (checking for the end of the string), and store that byte at the current address in the video memory. We then move r1 to the next character and r2 to the next location in the video memory, and jump back to again.

When all the characters are printed on the screen, the CPU halts (halt instruction).
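The same loop can be traced in Python to see which memory locations it touches. This is a sketch under assumptions: memory is modelled as a plain bytearray, the variables are named after the registers in the listing, the string address 0x500 is arbitrary, and the word store is assumed to place the upper byte at the even address.

```python
VIDEO_0 = 2400  # beginning of the text frame buffer, as in the listing


def print_string(memory, string_address):
    """Copy a zero-terminated string into video memory, one word per cell."""
    r1 = string_address  # address of the current character
    r2 = 0               # offset into the video memory
    while True:
        r0 = memory[r1]                       # ld.b r0, [r1]
        if r0 == 0:                           # cmp r0, 0 / jz end
            break
        cell = VIDEO_0 + r2
        # st [r2 + VIDEO_0], r0: a word store, upper byte first (assumed).
        memory[cell:cell + 2] = r0.to_bytes(2, "big")
        r1 += 1                               # inc r1
        r2 += 2                               # add r2, 2
    return r2 // 2                            # number of characters printed


memory = bytearray(65536)
text = b"HELLO WORLD!\0"
memory[0x500:0x500 + len(text)] = text  # 0x500 is a made-up string address
print(print_string(memory, 0x500))      # 12
```
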


Interrupts


Let's look at the UART echo demo. This demo waits for a character to arrive via the serial UART (115200 baud, one start bit, one stop bit, no parity), then prints that character on the screen, and finally echoes that character back to the UART:

#addr 0x400
; ########################################################
; REAL START OF THE PROGRAM
; ########################################################
mov sp, 1000

mov r0, 14
st [cursor], r0

; set the IRQ handler for UART to our own IRQ handler
mov r0, 1
mov r1, 16
st [r1], r0
mov r0, irq_triggered
mov r1, 18
st [r1], r0

halt

The code above installs the interrupt handling routine (irq_triggered) for the UART. The UART uses IRQ1, whose handler entry is at address 16 (0x0010). This means that whenever the serial UART subsystem receives a byte, the CPU jumps to address 0x0010. At that address we place a JUMP instruction (j irq_triggered): the word at 0x0010 is 0x0001 (the JUMP instruction opcode), and the word at 0x0012 is the address of the irq_triggered routine (stored by mov r0, irq_triggered followed by st [r1], r0, with r1 set to 18).
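To make the resulting memory layout concrete, here is a small Python sketch of what the two stores leave behind at addresses 16 to 19. The install_handler helper and the handler address 0x0420 are made up for the example, and words are assumed to be stored upper byte first.

```python
IRQ1_VECTOR = 16      # 0x0010, where the CPU jumps on a UART interrupt
JUMP_OPCODE = 0x0001  # encoding of "j xx", as given in the text


def install_handler(memory, handler_address):
    """Write 'j handler_address' at the IRQ1 entry point."""
    # First word: the JUMP opcode (st [16], 1 in the listing).
    memory[IRQ1_VECTOR:IRQ1_VECTOR + 2] = JUMP_OPCODE.to_bytes(2, "big")
    # Second word: the jump target (st [18], irq_triggered in the listing).
    memory[IRQ1_VECTOR + 2:IRQ1_VECTOR + 4] = handler_address.to_bytes(2, "big")


memory = bytearray(65536)
install_handler(memory, 0x0420)  # 0x0420 is a hypothetical handler address
print(memory[16:20].hex(" "))    # 00 01 04 20
```
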

That way, we have prepared the UART interrupt routine and the main program halts. The rest of the program is in the interrupt routine. Let's look at the interrupt routine:

; ##################################################################
; Subroutine which is called whenever some byte arrives at the UART
; ##################################################################
irq_triggered:
push r0
push r1
push r2   
push r5
push r6

in r1, [64] ; r1 holds now received byte from the UART (address 64 decimal)
ld r6, [cursor]
st [r6 + VIDEO_0], r1    ; store the UART character at the VIDEO_0 + r6
add r6, 2       ; move to the next location in the video memory
st [cursor], r6

loop2:
in r5, [65]   ; tx busy in r5
cmp r5, 0     
jz not_busy   ; if not busy, send back the received character 
j loop2
not_busy:
out [66], r1  ; send the received character to the UART
skip:
pop r6
pop r5
pop r2
pop r1                 
pop r0
iret

When the interrupt happens, the irq_triggered routine first pushes some registers on the stack, obtains the received byte from the UART (in r1, [64]), prints it on the screen, and then sends that character back through the UART (out [66], r1). If the UART is still busy sending a character, in r5, [65] sets r5 to 1; otherwise it sets r5 to 0, so the loop2 loop waits until the transmitter is free. Finally, the routine pops the registers from the stack and returns (iret instruction).
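The control flow of the handler (read, print, wait for the transmitter, echo) can be tried out against a fake UART in Python. Everything here is a stand-in for illustration: the FakeUart class, its busy_polls counter, and the dictionary used as a screen are invented for this sketch; only the port numbers 64, 65 and 66 come from the listing.

```python
from collections import deque

RX_PORT, TX_BUSY_PORT, TX_PORT = 64, 65, 66  # port numbers from the listing


class FakeUart:
    """Stand-in for the UART: reports 'busy' for a few polls, then idle."""

    def __init__(self, received, busy_polls=2):
        self.received = deque(received)
        self.busy_polls = busy_polls
        self.sent = []

    def port_in(self, port):
        if port == RX_PORT:
            return self.received.popleft()
        if port == TX_BUSY_PORT:
            if self.busy_polls > 0:
                self.busy_polls -= 1
                return 1
            return 0
        raise ValueError(f"unknown input port {port}")

    def port_out(self, port, value):
        if port != TX_PORT:
            raise ValueError(f"unknown output port {port}")
        self.sent.append(value)


def irq_triggered(uart, screen, cursor):
    """Mirror of the handler: show the byte, wait until idle, echo it back."""
    char = uart.port_in(RX_PORT)             # in r1, [64]
    screen[cursor] = char                    # st [r6 + VIDEO_0], r1
    cursor += 2                              # add r6, 2
    while uart.port_in(TX_BUSY_PORT) != 0:   # loop2: poll the busy flag
        pass
    uart.port_out(TX_PORT, char)             # out [66], r1
    return cursor


uart = FakeUart(b"A")
screen = {}
cursor = irq_triggered(uart, screen, 14)
print(cursor, screen, uart.sent)  # 16 {14: 65} [65]
```
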

The difference between iret and ret is that ret pops the return address from the stack and jumps to that address (return from a called subroutine), while iret pops the return address, pops the flags, and then jumps to the obtained address. The interrupt routine might change the flags, so they are saved before the interrupt routine is invoked and restored during the iret execution.

All the examples are stored in the FPGACustomasm project on GitHub:
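A Python list used as a stack shows the difference. This is a sketch with an assumption: the text only gives the order in which iret pops, so the order in which the CPU pushes on interrupt entry (flags first, then the return address) is inferred from that. The function names and the sample values are made up.

```python
def call(stack, return_address):
    stack.append(return_address)       # call pushes only the return address


def ret(stack):
    return stack.pop()                 # ret pops only the return address


def enter_interrupt(stack, return_address, flags):
    # Assumed push order, inferred from the pop order of iret.
    stack.append(flags)
    stack.append(return_address)


def iret(stack):
    return_address = stack.pop()       # first the return address...
    flags = stack.pop()                # ...then the saved flags
    return return_address, flags


stack = []
call(stack, 0x0410)
enter_interrupt(stack, 0x0502, flags=0b0101)
print(iret(stack))  # (1282, 5)
print(ret(stack))   # 1040
```
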
https://github.com/milanvidakovic/FPGAcustomasm/tree/master/examples/FPGA/raspbootin


Thursday, 12 July 2018

Adding byte-related instructions

This is a follow-up to my previous post about the FPGA Computer.

When I initially committed the FPGA Computer, the CPU had a 16-bit address bus and a 16-bit data bus. Also, all the instructions were word-oriented, working with 16 bits. Even the memory was word-oriented, having 64KWords, not 64KB. At first, that looked promising, offering double the amount of RAM compared to the usual 8-bit platforms (64KW compared to 64KB).

However, all the instructions were word-oriented, making byte-oriented programs complicated. For example, the UART loader receives bytes, not words, since the UART is byte-oriented. That causes a problem when the loader has to receive the code from the UART:

in r1, [64] ; get the byte from the uart into r1

ld r2, [flip]
cmp r2, 0
jz do_flip       ; we have received the even byte
; at this moment, r1 holds the received byte
neg [flip] ; we have received the odd byte - time to complete the word out of those two bytes (even and odd)
ld r0, [current_byte] ; get the even byte from the memory (stored earlier)
shl r0, 8 ; shift it 8 bits to the left
or r0, r1 ; complete the word
ld r2, [current_addr] ; r2 holds the current pointer in memory to store the received byte
st [r2], r0 ; store the completed word into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2  ; save the incremented value of the current address
ld r2, [current_size]  ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

do_flip:
neg [flip]
st [current_byte], r1 ; we need to receive two bytes to form the word, so we are saving this byte before receiving the other
ld r2, [current_size] 
inc r2 ; increment the byte counter
st [current_size], r2

cmp r2, [size] ; did we receive all?
jz all_arrived_even

j skip ; return and wait for the next byte

all_arrived_even:
; at this moment, r1 holds the received byte
shl r1, 8 ; the upper byte is for the odd bytes
ld r2, [current_addr] ; r2 holds the current pointer in memory to store the received byte
st [r2], r1 ; store the incomplete word into the memory
all_arrived:

As you can see, the problem is with the word-oriented instructions and memory locations. Whenever a byte arrives, it must be saved, then combined with the next byte to arrive, and that combination is then stored in memory as a 16-bit value.
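Stripped of the bookkeeping, the word-oriented loader does the following with the incoming byte stream. This Python sketch is only a summary of the listing above: pack_words is a made-up name, and it handles a trailing unpaired byte the same way the all_arrived_even branch does, by shifting it into the upper half of the last word.

```python
def pack_words(data):
    """Combine a byte stream into 16-bit words, first byte of a pair on top."""
    words = []
    for i in range(0, len(data), 2):
        upper = data[i] << 8                              # shl r0, 8
        # A trailing unpaired byte leaves the lower half of the word at zero.
        lower = data[i + 1] if i + 1 < len(data) else 0
        words.append(upper | lower)                       # or r0, r1
    return words


print([hex(w) for w in pack_words(b"\x12\x34\x56")])  # ['0x1234', '0x5600']
```
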

That was the reason for the redesign. I have introduced the ".b" suffix. If the instruction has the ".b" suffix, it is byte-oriented. This also changed the addressing. The data bus is still 16 bits wide, and all the memory operations are 16-bit, but the address range now covers 64KB instead of 64KW. As a result, all the addresses in the assembler are byte addresses, not word addresses.

This means that if the instruction does not have the ".b" suffix, it works with a word-oriented memory location, aiming at the word at the given address. In that case, the address must be aligned to 16 bits (that is, it must be even).

For example, this instruction is word-oriented:

ld r0, [1000]

It loads the 16-bit content of the address 1000 (two bytes: the upper byte from address 1000 and the lower byte from 1001) and stores that 16-bit value in the r0 register. The address must be even.

If the instruction has the ".b" suffix, then it is byte-oriented. The address in byte-oriented instructions can be both even and odd. This instruction is byte-oriented:

ld.b r0, [1001]

It loads the 8-bit value (one byte) from the address 1001 into the r0 register.

If a 16-bit word is stored in the memory, it is stored as big endian, with the lower byte at the odd address and the upper byte at the even address. For example, the number 0x1234 stored at the address 1000 looks like this:

address   content
1000      0x12
1001      0x34
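The layout in the table can be checked with a short Python sketch of a word store and a byte store. The store function is hypothetical, and rejecting an odd address for a word store is an assumption of this model, not a statement about what the hardware does.

```python
def store(memory, address, value, byte=False):
    """Model of st (word) and st.b (byte) on a byte-addressed memory."""
    if byte:
        memory[address] = value & 0xFF  # st.b: even or odd address is fine
        return
    if address % 2:
        # Assumption: this model rejects unaligned word stores outright.
        raise ValueError("word access needs an even address")
    # Big endian: upper byte at the even address, lower byte at the odd one.
    memory[address:address + 2] = (value & 0xFFFF).to_bytes(2, "big")


memory = bytearray(65536)
store(memory, 1000, 0x1234)
print(hex(memory[1000]), hex(memory[1001]))  # 0x12 0x34

store(memory, 1001, 0xAB, byte=True)         # overwrite only the lower byte
print(hex(memory[1000]), hex(memory[1001]))  # 0x12 0xab
```
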

Now let's look at the same UART loader code, having byte-oriented instructions:

in r1, [64] ; get the byte from the uart into r1

; at this moment, r1 holds the received byte
; r2 holds the current pointer in memory to store the received byte
ld r2, [current_addr]
st.b [r2], r1 ; store the received byte into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2 ; save the incremented value of the current address
ld r2, [current_size] ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

all_arrived:

As you can see, the code is shorter and easier to understand.

The same idea can be applied to strings. Now that we have the byte-oriented instructions, dealing with byte-oriented strings is easy. This code prints the hello string on the screen:

VIDEO_0 = 2400 ; beginning of the text frame buffer
mov r2, 0 ; r2 is the index
mov r1, hello ; r1 holds the address of the "HELLO WORLD" string
again:
; load r0 with the content of the memory location to which r1 points
ld.b r0, [r1]          
cmp r0, 0 ; if the current character is 0 (string terminator),
jz end ; go out of this loop 
st.b [r2 + VIDEO_0], r0 ; store the character at the VIDEO_0 + r2 
inc r1 ; move to the next character
add r2, 2 ; move to the next location in the video memory
j again ; continue with the loop
end:
halt
hello:
#str "HELLO WORLD\0"

Conclusion

This change in the design of the CPU led to much better assembly code. I haven't lost any of the word-oriented instructions, but I have gained a whole bunch of byte-oriented instructions. I did lose 64KB of memory, but my FPGA didn't have 128KB of SRAM anyway.

Even if we try to make all the code word-oriented, we cannot avoid 8-bit strings and protocols. That is why I have done this refactoring.

Here are the GitHub links:
- FPGAComputer
- FPGA Custom Assembler
- FPGA UART Loader (Raspbootin-like)
- FPGA Emulator