Thursday, 30 August 2018

PS/2 keyboard and FPGA Computer

Added PS/2 keyboard to the FPGA Computer

This is a follow-up to the FPGA computer post.

I have added a keyboard port to the FPGA Computer. The port is PS/2 because the PS/2 protocol is easier to work with than the USB HID protocol. The final look is here (you will recognize the purple PS/2 keyboard connector):

The hardware part of this project is simple - add four resistors and a PS/2 connector:
Now the board has three connectors: PS/2, VGA and UART.

The PS/2 connector is wired to the GPIO ports of the DE0-NANO board:
- Data is connected to the GPIO31 (PIN_D11) port
- Clock is connected to the GPIO33 (PIN_B12) port.

The keyboard talks to the computer over a clocked serial link. Clock pulses appear on the Clock pin, while data is on the Data pin and is sampled on the falling edge of the Clock. Each byte is sent as an 11-bit frame: one start bit (0), eight data bits (least significant bit first), one odd parity bit and one stop bit (1). Here are oscilloscope snapshots of the "A" key being pressed (and then released):

The waveform below is the make code of the "A" key (1C hex)


The waveform below is the first byte of the "A" break code (F0 hex)

The waveform below is the second byte of the "A" break code (1C hex)
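The frames in these captures can be decoded by hand, and a small host-side model makes the bit order explicit. The following is a minimal Python sketch, not the Verilog ps2_read module: it assumes the 11 bits have already been sampled on the falling clock edges, and the function names are made up for illustration.

```python
def decode_ps2_frame(bits):
    """Decode one 11-bit keyboard-to-host PS/2 frame.

    bits: 11 ints (0/1) in arrival order: start, d0..d7 (LSB first),
    parity, stop. Returns the data byte or raises ValueError.
    """
    if len(bits) != 11:
        raise ValueError("a PS/2 frame has exactly 11 bits")
    start, data_bits, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0:
        raise ValueError("start bit must be 0")
    if stop != 1:
        raise ValueError("stop bit must be 1")
    # Odd parity: data bits plus the parity bit hold an odd number of ones.
    if (sum(data_bits) + parity) % 2 != 1:
        raise ValueError("parity error")
    # Data arrives least significant bit first.
    return sum(bit << i for i, bit in enumerate(data_bits))


def encode_ps2_frame(byte):
    """Build the frame a keyboard would send for `byte` (testing helper)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    parity = 1 - (sum(data_bits) % 2)
    return [0] + data_bits + [parity, 1]
```

For the make code of "A", encode_ps2_frame(0x1C) gives the bit sequence 0 00111000 0 1 (start, data LSB first, parity, stop), which is the order in which the bits appear on the Data line.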

Keyboards work by sending the make and the break codes for each key. Make code is sent when the key is pressed, while the break code is sent when the key is released. For example, when we press and then release the "A" key, we get the following sequence:
1C F0 1C
This could be interpreted as: A pressed (1C), A released (F0 1C)

Unfortunately, it is not all that simple. First of all, if you quickly press A and S, one after another, you will get the following sequence:
1C 1B F0 1B F0 1C
This could be interpreted as: make code of "A", make code of "S", break code of "S" and break code of "A".

When you press Shift + A, you will get the following sequence:
12 1C F0 1C F0 12
Shift pressed, A pressed, A released, Shift released

When you press A for a long time (autorepeat will occur):
1C 1C 1C 1C 1C F0 1C
A pressed, A pressed, A pressed, A pressed, A pressed, A released (F0 1C)

To make things more complicated, some keys use extended key codes (both make and break), which start with the E0 prefix. For example, the Cursor Down (Arrow Down) key produces the following sequence:
E0 72 E0 F0 72
Cursor down pressed (E0 72), Cursor down released (E0 F0 72).

And so on... All this makes parsing a bit complicated, but eventually you will be able to figure it out.
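The sequences above can be parsed with a two-flag state machine. This is a minimal Python sketch of the idea, not the assembler handler from the example program; the function name and the tuple layout are assumptions, and special cases such as the Pause key are not handled.

```python
def parse_scan_codes(stream):
    """Turn raw scan code bytes into (event, extended, code) tuples.

    event is "press" or "release"; extended is True for E0-prefixed keys.
    Autorepeat shows up as repeated "press" events for the same key.
    """
    events = []
    extended = False   # an E0 prefix was seen for the key being decoded
    releasing = False  # an F0 prefix was seen for the key being decoded
    for byte in stream:
        if byte == 0xE0:
            extended = True
        elif byte == 0xF0:
            releasing = True
        else:
            event = "release" if releasing else "press"
            events.append((event, extended, byte))
            extended = False
            releasing = False
    return events
```

Feeding it the Shift + A sequence 12 1C F0 1C F0 12 yields four events: Shift pressed, A pressed, A released, Shift released. Keeping track of which modifier keys are currently held is left to the caller.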

The next step was to add the support for the keyboard within the FPGA Computer.

Introducing the keyboard interrupt

I have introduced a new interrupt for the keyboard - the IRQ#2. This IRQ is triggered when a byte from PS/2 keyboard arrives. The CPU then jumps to the address of 24 decimal, where the raw PS/2 keyboard handling routine should be. Actually, at that address should be one JUMP instruction which will jump to the handling routine.

In the main computer module, I have instantiated the PS/2 module:
// ####################################
// PS/2 keyboard instance
// ####################################
wire [7:0] ps2_data;
wire ps2_received;
reg [7:0] ps2_data_r;

ps2_read ps2(
  CLOCK_50,
  reset,
  gpio0[31], // Input pin - PS/2 data line
  gpio0[33], // Input pin - PS/2 clock line
  ps2_data,  // here we will receive a character
  ps2_received  // if something came from PS/2, this goes high
); 

Then I have detected the byte being received from the PS/2 module and triggered the IRQ:



always @ (posedge CLOCK_50) begin
// ######### IRQ2 - keyboard ######
if (ps2_received) begin
ps2_data_r <= ps2_data;
// if we have received a byte from 
// the keyboard, we will trigger the IRQ#2
irq[2] <= 1'b1;
end 
...



In the cpu.v module, I have added support for the new interrupt:
if (irq_r[2]) begin
`ifdef DEBUG
LED[7] <= 1;
$display("3.1 JUMP TO IRQ #2 SERVICE");
`endif
pc <= 16'd24;
addr <= 16'd12;
end

So, to receive bytes from the PS/2 keyboard, a programmer must register the IRQ#2 handler:
; set the IRQ handler for keyboard to our own IRQ handler
mov r0, 1 ; JUMP instruction opcode
mov r1, IRQ2_ADDR ; IRQ#2 vector address
st [r1], r0
mov r0, irq_triggered
mov r1, IRQ2_ADDR + 2   
st [r1], r0

Since this is raw PS/2 handling, the programmer must write the complete make/break code handling. I have done that in this example.

Unfortunately, the code is quite long since it has to deal with the raw PS/2 protocol. The example program demonstrates parsing the raw PS/2 protocol, and it looks like one of those vintage screen editors:

How to use the keyboard? First of all, two callbacks should be registered - one for the key pressed, and the other one for the key released:
mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_PRESSED_HANDLER_ADDR
st [r1], r0
mov r0, pressed ; key pressed routine address
mov r1, KEY_PRESSED_HANDLER_ADDR + 2
st [r1], r0

mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_RELEASED_HANDLER_ADDR
st [r1], r0
mov r0, released ; key released routine address
mov r1, KEY_RELEASED_HANDLER_ADDR + 2
st [r1], r0

Both callbacks will then need to obtain the virtual key code of the key pressed (or released) by reading from the location 48 (VIRTUAL_KEY_ADDR):

pressed:
ld r0, [VIRTUAL_KEY_ADDR]
cmp r0, VK_F1
...

released:
ld r1, [VIRTUAL_KEY_ADDR]
...

What is the Virtual Key Code? It is a number assigned to each key, so all the programs would get the same number when a key is pressed, or released. In the code above, VK_F1 is the constant assigned to the F1 key, so the programmer can determine if the F1 was pressed by writing cmp r0, VK_F1.

Then, if needed, the programmer can call the vk_to_char function, which translates a virtual key to the actual character, if possible (not all keys produce characters; the F1 key does not produce a character, for example):

; ###############################
; r1 = function vk_to_char(r1)
; translates virtual key to character
; if shift is pressed, does the uppercase
; ###############################
vk_to_char:
push r0
push r2
...

Conclusion

Most examples of keyboard support on the net use PS/2 keyboards, since the USB HID protocol is quite complex and PS/2 isn't. I went down the same path. I have a couple of spare keyboards, some of them PS/2, so I soldered a female PS/2 connector and those four resistors from the schematic above. From that point on, everything was programming: a little bit of Verilog, and much more assembler.

Saturday, 25 August 2018

UART Loader

FPGA Computer UART Loader

This is a follow-up to the FPGA computer post.

I have developed the UART Loader for the FPGA Computer to be able to send programs to it. It is based on the UART module I wrote in Verilog for the FPGA Computer. This module both sends and receives bytes at 115200 baud, with 8 data bits, 1 start bit, 1 stop bit and no parity. The serial port of the FPGA computer is connected to a TTL serial-to-USB dongle, which is then connected to the USB port of the PC:

When I initially created the FPGA Computer, I was able to store just one program in it, by hardcoding it in the RAM memory. Here is the part of the RAM.v Verilog module that includes the program in the RAM:

// Declare the RAM variable
reg [N-1:0] ram[32767:0];

initial
begin
  $readmemh("program.hex", ram);
end
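The file read by $readmemh is plain text containing hexadecimal words separated by whitespace. As a rough illustration, here is a minimal Python sketch that turns assembled program bytes into such a file. The 16-bit word width and the big-endian byte order inside a word are assumptions, and the function name is hypothetical; the actual assembler for this computer may produce the file differently.

```python
def bytes_to_readmemh(program, word_bits=16):
    """Format program bytes as $readmemh text, one hex word per line."""
    bytes_per_word = word_bits // 8
    digits = word_bits // 4
    # Pad with zeros so the program fills a whole number of words.
    padding = (-len(program)) % bytes_per_word
    padded = bytes(program) + bytes(padding)
    lines = []
    for i in range(0, len(padded), bytes_per_word):
        # Byte order inside a word is an assumption (big-endian here).
        word = int.from_bytes(padded[i:i + bytes_per_word], "big")
        lines.append(f"{word:0{digits}x}")
    return "\n".join(lines) + "\n"
```

For example, the four bytes 00 01 04 00 become the two lines 0001 and 0400.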

The problem with this approach is that it is very slow: the program has to be embedded into the computer when the FPGA design is built, and a build can take several minutes. That is why I have devised the Loader. It is hardcoded in the RAM module, and when the computer powers on, it jumps to the address 0x0000, where I have placed a JUMP instruction to go to the Loader:

; ########################################################
; RESET CODE (4 bytes max)
; ########################################################
#addr 0x0000
j start

When started, Loader sends an initialisation sequence of bytes to the PC, via UART:

; send raspbootin boot char sequence
mov r0, 77 ; "M" character
call uart_send
mov r0, 13 ; \r character (carriage return)
call uart_send
mov r0, 10 ; \n character (line feed)
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send

This sequence is inherited from the original Raspbootin protocol for which I have made a Java implementation. This version is similar, but I have added a checksum at the end (more about this below).

The Loader then fetches the number of bytes to be received:

first_byte:
in r1, [64] ; get the char from the uart
st [size], r1 ; store the lowest byte to the size variable
inc [state] ; next state -> 1 (second byte)
j skip ; return from interrupt
second_byte:
in r1, [64] ; get the char from the uart (8 upper bits)
ld r2, [size] ; get the lower 8 bits (received earlier)
shl r1, 8 ; shift the received byte 8 bits
or r1, r2 ; put together lower and upper 8 bits
st [size], r1 ; store the calculated size
inc [state] ; next state 
j skip ; return from interrupt

After that, the Loader sends back the received size (so the PC can verify that the Loader expects the correct number of bytes):

; this is 16-bit cpu, so we don't load code bigger than 65535 bytes
; send confirmation that the code has been loaded
ld r0, [size]
and r0, 255
call uart_send
ld r0, [size]
shr r0, 8
call uart_send
inc [state] ; next state ->  (code arrives)

After that, all incoming bytes are loaded into the memory, starting from the 0x400 address:

in r1, [64] ; get the byte from the uart into r1

mov r2, r1
ld r0, [sum_all]
add r0, r2
st [sum_all], r0 ; primitive checksum - sum of all bytes
; at this moment, r1 holds the received byte
ld r2, [current_addr]
st.b [r2], r1 ; store the received byte into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2   ; save the incremented value of the address

ld r2, [current_size]   ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

When all bytes are received, the Loader sends back the primitive checksum, so the PC can check if everything is OK:

all_arrived:
; send the sum of all bytes
ld r0, [sum_all]
and r0, 255
call uart_send
ld r0, [sum_all]
shr r0, 8
call uart_send

mov r0, 1; signal to the main program ->loader has received all
st [loaded], r0
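Seen from the PC, the whole exchange is: wait for the boot sequence, send the size (low byte first), check the echoed size, send the program, check the checksum. Below is a minimal Python sketch of that host side. It is not the modified Raspbootin loader mentioned in this post: the names are hypothetical, `port` is assumed to be any object with read(n) and write(data) methods (for example a serial port object opened at 115200 baud), and the checksum is assumed to wrap at 16 bits because the CPU registers are 16 bits wide.

```python
BOOT_SEQUENCE = bytes([77, 13, 10, 3, 3, 3])  # "M", CR, LF, three 0x03 bytes


def checksum16(payload):
    """Sum of all payload bytes, wrapped to 16 bits (assumed register width)."""
    return sum(payload) & 0xFFFF


def upload(port, payload):
    """Send `payload` to the Loader and return the checksum it reported."""
    size = len(payload)
    if not 0 < size <= 0xFFFF:
        raise ValueError("payload must be 1..65535 bytes")

    # 1. Wait for the boot sequence announced by the Loader.
    window = b""
    while not window.endswith(BOOT_SEQUENCE):
        chunk = port.read(1)
        if not chunk:
            raise RuntimeError("timed out waiting for the boot sequence")
        window = (window + chunk)[-len(BOOT_SEQUENCE):]

    # 2. Send the size, low byte first, and check the echoed size.
    port.write(size.to_bytes(2, "little"))
    echoed = int.from_bytes(port.read(2), "little")
    if echoed != size:
        raise RuntimeError(f"size mismatch: sent {size}, got {echoed}")

    # 3. Send the program, then compare the sum of all bytes.
    port.write(bytes(payload))
    reported = int.from_bytes(port.read(2), "little")
    if reported != checksum16(payload):
        raise RuntimeError("checksum mismatch")
    return reported
```

A sum of bytes is a weak check (it does not notice two swapped bytes, for example), but it is cheap to compute in a few instructions inside an interrupt handler.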

After that, the Loader jumps to the 0x400 address:

not_loaded:
ld r0, [loaded]
cmp r0, 1
jz 0x400
nop
j not_loaded

For the PC, I have modified the Raspbootin Loader, originally used in Raspberry Pi bare metal programming, and it is also stored on GitHub.

Conclusion

When I tried Raspberry Pi bare metal programming, I immediately had the problem of transferring programs from the PC to the RPI. Usually, there is no network (it is bare metal platform with almost none of the I/O libraries) and the only other way is by transferring programs via micro SD cards (card dance). You would cross-compile the program on the PC, save it to the SD card, eject it, put it in the RPI, and reset the RPI. And then again, and again...

That was the motivation for programmers to develop some kind of loader for the RPI. One of those loaders is Raspbootin. It is fairly simple. I re-used it for exactly the same purpose: to load programs onto my FPGA Computer from the PC. The only problematic part of this development was debugging the Loader. It could only be done on the FPGA, with those couple-of-minutes builds. Once I survived that, I was able to cross-assemble programs on my PC and send them to the board via the Loader.


Tuesday, 21 August 2018

Text mode in the FPGA computer

How text mode works


This is a follow-up to the FPGA computer post.

In this post I will give more details about the text mode of the FPGA computer. Text mode is the default mode: it is the mode the computer is in when it powers up.

Text mode is 80x60 characters, occupying 4800 words, or 9600 bytes, starting from the address of 2400.

Lower byte is the ASCII code of a character, while the upper byte contains the attributes:

Bit     |  7  6  5  4                  |  3  2  1  0
Field   |  Foreground color, inverted  |  Background color
Meaning |  x  r  g  b                  |  x  r  g  b


The foreground color is inverted so zero values (default) would mean white color. That way, you don't need to set the foreground color to white, and by default (0, 0, 0), it is white. The default background color is black (0, 0, 0). This means that if the upper (Attribute) byte is zero (0x00), the background color is black, and the foreground color is white.
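The bit layout above can be turned into a small helper that builds a text cell. This is a minimal Python sketch, assuming bits 7 and 3 (marked x) are left at zero; the function names are made up for illustration.

```python
def make_attribute(fg=(1, 1, 1), bg=(0, 0, 0)):
    """Pack (r, g, b) bit triples into the attribute (upper) byte.

    Foreground bits are stored inverted in bits 6..4, background bits
    are stored as-is in bits 2..0. Bits 7 and 3 are left at 0.
    """
    fg_r, fg_g, fg_b = (1 - c for c in fg)  # invert: white becomes 0, 0, 0
    bg_r, bg_g, bg_b = bg
    return (fg_r << 6) | (fg_g << 5) | (fg_b << 4) | (bg_r << 2) | (bg_g << 1) | bg_b


def make_cell(char, fg=(1, 1, 1), bg=(0, 0, 0)):
    """Build the 16-bit word for one cell: attribute high, ASCII low."""
    return (make_attribute(fg, bg) << 8) | (ord(char) & 0xFF)
```

With the defaults (white on black), make_cell("A") is simply 0x0041, so plain text can be written without touching the attribute byte. Red text on a blue background gives the attribute 0x31.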

I have used Ken Shirriff's blog post FizzBuzz a lot for this implementation. I highly recommend his posts!

Verilog implementation relies on the character ROM. Character ROM is implemented as a separate Verilog module, and is used like this:
// Character generator
chars chars_1(
  .char(curr_char[7:0]),
  .rownum(y[2:0]),
  .pixels(pixels)
); 

Current character (which is read from the address of 2400, up to the 2400+9600) is received in the curr_char register. This register is wired to the chars module, together with two additional parameters: rownum (wired to the y register - the y coordinate) and the pixels output register (this register will hold the pixels of the current character, for the current y coordinate).

The chars module itself is a giant case statement:
always @(*)
  case ({char, rownum})

    11'b00110000000: pixels = 8'b01111100; //  XXXXX  
    11'b00110000001: pixels = 8'b11000110; // XX   XX 
    11'b00110000010: pixels = 8'b11001110; // XX  XXX 
    11'b00110000011: pixels = 8'b11011110; // XX XXXX 
    11'b00110000100: pixels = 8'b11110110; // XXXX XX 
    11'b00110000101: pixels = 8'b11100110; // XXX  XX 
    11'b00110000110: pixels = 8'b01111100; //  XXXXX  
    11'b00110000111: pixels = 8'b00000000; //         

    11'b00110001000: pixels = 8'b00110000; //   XX    
    11'b00110001001: pixels = 8'b01110000; //  XXX    
    11'b00110001010: pixels = 8'b00110000; //   XX    
    11'b00110001011: pixels = 8'b00110000; //   XX    
    11'b00110001100: pixels = 8'b00110000; //   XX    
    11'b00110001101: pixels = 8'b00110000; //   XX    
    11'b00110001110: pixels = 8'b11111100; // XXXXXX  
    11'b00110001111: pixels = 8'b00000000; //       

As you can see, the input character and the y coordinate are concatenated to determine which row of pixels will be returned to the vga text module.
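The concatenation {char, rownum} is just an 11-bit index: the character code shifted left by three bits, plus the row inside the glyph. Here is a minimal Python model of that lookup, using the "0" glyph rows copied from the listing above; the function names are hypothetical.

```python
def rom_index(char_code, y):
    """Model of the Verilog concatenation {char, rownum}: 8 + 3 = 11 bits."""
    rownum = y & 7  # y[2:0], the row inside the 8-pixel-high glyph
    return ((char_code & 0xFF) << 3) | rownum


# Rows of the "0" glyph, copied from the case statement above.
GLYPH_ZERO = [0b01111100, 0b11000110, 0b11001110, 0b11011110,
              0b11110110, 0b11100110, 0b01111100, 0b00000000]


def render_row(pixels):
    """Draw one row of 8 pixels, most significant bit on the left."""
    return "".join("X" if pixels & (0x80 >> col) else " " for col in range(8))
```

Printing render_row(row) for each row in GLYPH_ZERO reproduces the ASCII art in the comments of the Verilog listing.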

How is curr_char obtained? There are three distinct situations in which it is fetched:
1. During visible scanline processing. In this case, we wait for the last column (pixel) of the current character to be displayed, and then we fetch the next character:
else if (x < 640 && !mem_read) begin
 if ((x & 7) == 7) begin
  // when we are finishing current character, 
  // we need to fetch in advance 
  // the next character (x+1, y)
  // (at the last pixel of the current character, let's fetch next)
  rd <= 1'b1;
  wr <= 1'b0;
  addr <= VIDEO_MEM_ADDR + ((x >> 3) + (y >> 3)*80 + 1);
  mem_read <= 1'b1;
 end

end 
2. During horizontal blanking. In this case, we need to fetch either the first character of the current text row again (we haven't finished all the scanlines of this row yet), or the first character of the next row:


else if ((x >= 640) && ((y & 7) < 7)) begin
// when we start the horizontal blanking, 
// and still displaying character in the current row,
// we need to fetch in advance 
// the first character in the current row (0, row)
rd <= 1'b1;
wr <= 1'b0;
addr <= VIDEO_MEM_ADDR + ((y >> 3)*80);
mem_read <= 1'b1;
end
else if ((x >= 640) && ((y & 7) == 7)) begin
// when we start the horizontal blanking, 
// and we need to go to the next line, 
// we need to fetch in advance the first character in next row (0, row+1)
rd <= 1'b1;
wr <= 1'b0;
addr <= VIDEO_MEM_ADDR + (((y >> 3) + 1)*80);
mem_read <= 1'b1;
end



3. During vertical blanking. In this case, we need to fetch the first character, at the beginning of the frame buffer:

if ((x >= 640) && (y >= 480)) begin
// when we start the vertical blanking, 
// we need to fetch in advance the first character (0, 0)
rd <= 1'b1;
wr <= 1'b0;
addr <= VIDEO_MEM_ADDR + 0;
mem_read <= 1'b1;
end

The code above sets the address bus and control lines. The character is then fetched from the data bus:
if (mem_read) begin
curr_char <= data;
rd <= 1'bz;
wr <= 1'bz;
mem_read <= 1'b0;
end

The character is wired to the character ROM, and the output is placed in the pixels register. From that point, the pixels are shifted out bit by bit to the r, g, and b wires of the VGA connector:
if (valid) begin
r <= pixels[7 - (x & 7)] ? !curr_char[6+8] : curr_char[2+8];
g <= pixels[7 - (x & 7)] ? !curr_char[5+8] : curr_char[1+8];
b <= pixels[7 - (x & 7)] ? !curr_char[4+8] : curr_char[0+8];
end 
else begin
// blanking -> no pixels
r <= 1'b0;
g <= 1'b0;
b <= 1'b0;
end

It is interesting how horizontal and vertical sync pulses are generated:
assign hs = x < (640 + 16) || x >= (640 + 16 + 96);
assign vs = y < (480 + 10) || y >= (480 + 10 + 2);
assign valid = (x < 640) && (y < 480);

Just by wiring the one-bit hs and vs signals to the VGA connector and by assigning the expressions above to them, horizontal and vertical sync pulses are generated according to the current state of the x and y counters. When the x counter reaches 640 + 16 (the visible pixels plus the front porch), the hs pulse goes low (inverted logic) and stays low for 96 pixel clocks. Similarly, the vs pulse goes low when the y counter (the line counter) reaches 480 + 10, and stays low for 2 lines.

If you look at the value range of the x and y registers, you will see that the x goes from 0 to 799, while y goes from 0 to 524. This means that the actual resolution of the VGA 640x480 mode is 800x525. However, during the horizontal and vertical blanking some of those pixels (and also lines) are not visible, so the actual visible pixels are from the 640x480 range. That is detected in the "assign valid =..." line of the code above. 
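The three assign expressions can be checked quickly with a software model. This is a minimal Python sketch that mirrors them one to one; the constant names are assumptions, and the totals of 800 and 525 come from the counter ranges described above.

```python
H_VISIBLE, H_FRONT, H_SYNC, H_TOTAL = 640, 16, 96, 800
V_VISIBLE, V_FRONT, V_SYNC, V_TOTAL = 480, 10, 2, 525


def hs(x):
    """Horizontal sync, active low: mirrors the `assign hs` expression."""
    return x < H_VISIBLE + H_FRONT or x >= H_VISIBLE + H_FRONT + H_SYNC


def vs(y):
    """Vertical sync, active low: mirrors the `assign vs` expression."""
    return y < V_VISIBLE + V_FRONT or y >= V_VISIBLE + V_FRONT + V_SYNC


def valid(x, y):
    """True only inside the visible 640x480 area."""
    return x < H_VISIBLE and y < V_VISIBLE
```

Scanning x over 0..799 shows hs low for x = 656..751, which leaves 800 - 752 = 48 pixel clocks after the pulse. Scanning y over 0..524 shows vs low on lines 490 and 491, which leaves 33 lines after the pulse.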

Programming in assembler

Assembler examples can be found here.

The following assembler code writes a string with color attributes:
mov r1, hello  ; r1 holds the address of the "Hello World!" string
mov r2, 0      ; r2 is the index
mov r3, 0      ; r3 has the attribute
again:
ld.b r0, [r1]  ; load r0 with the content of the memory location to which r1 points (current character)
cmp r0, 0      ; if the current character is 0 (string terminator),
jz end         ; go out of this loop 
st.b [r2 + VIDEO_1], r3; store the attribute
inc r2        ; move to the character location
st.b [r2 + VIDEO_1], r0; store the character at the VIDEO_1 + r2
inc r1        ; move to the next character in the string
inc r2         ; move to the next location (attribute) in video memory
inc r3         ; change the attribute of the current character
j again        ; continue with the loop
end:
halt


hello:

#str "Hello World!\0"
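To see what ends up in video memory, the loop can be modelled on the PC. This is a minimal Python sketch of the byte layout only, assuming that st.b stores the low byte of a register; the function name is hypothetical and the sketch does not emulate the CPU.

```python
def write_string(text, first_attribute=0):
    """Return the bytes the loop stores: attribute byte, then character byte."""
    memory = bytearray()
    attribute = first_attribute
    for ch in text:
        memory.append(attribute & 0xFF)  # st.b [r2 + VIDEO_1], r3
        memory.append(ord(ch) & 0xFF)    # st.b [r2 + VIDEO_1], r0 (after inc r2)
        attribute += 1                   # inc r3
    return bytes(memory)
```

Each character occupies two bytes, so "Hello World!" takes 24 bytes, and because the attribute is incremented for every character, every letter gets a different color combination.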



The result is on the image below.


In the emulator, it looks like this:


Conclusion

The text mode is implemented by reading the character from the framebuffer, and then by obtaining its pixels from the character ROM. When those pixels are obtained, they are shifted one by one to the VGA connector, to the corresponding r, g and b wires. That way, the character is shown on the screen. I implemented the text mode first, and then I implemented the graphics mode. Both modes are surprisingly simple to implement in Verilog.

The text mode module is on GitHub.