My projects: August 2018

Thursday, 30 August 2018

PS/2 keyboard and FPGA Computer

Added PS/2 keyboard to the FPGA Computer

This is a follow-up to the FPGA computer post.

I have added a keyboard port to the FPGA Computer. The port is PS/2 because it is easier to work with the PS/2 than with the USB HID protocol. The final look is here (you will recognize the purple PS/2 keyboard connector):

The hardware part of this project is simple - add four resistors and a PS/2 connector:

Now the board has three connectors: PS/2, VGA and UART.

The PS/2 connector is wired to the GPIO pins of the DE0-NANO board:
- Data is connected to GPIO31 (PIN_D11)
- Clock is connected to GPIO33 (PIN_B12)

The keyboard talks to the computer over a clocked serial link. The keyboard drives the clock pulses on the Clock pin and places each bit on the Data pin, and the receiver samples Data on the falling edge of Clock. Each byte travels in an 11-bit frame: one start bit (0), eight data bits (least significant bit first), one odd-parity bit and one stop bit (1). Here are oscilloscope snapshots of the "A" key being pressed (and then released):

The waveform below is the make code of the "A" key (1C hex)

The waveform below is the first byte of the "A" break code (F0 hex)

The waveform below is the second byte of the "A" break code (1C hex)
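The frame format can be sketched in Python as a model of what the receiver has to do with the bits it samples on each falling clock edge. This is not the author's Verilog; the function names are made up for illustration, and the sketch relies on the standard PS/2 framing (start bit 0, eight data bits with the least significant bit first, odd parity, stop bit 1).

```python
def decode_ps2_frame(bits):
    """Decode one PS/2 frame given as 11 ints (0 or 1), one per falling clock edge."""
    if len(bits) != 11:
        raise ValueError("a PS/2 frame has exactly 11 bits")
    start, data_bits, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0:
        raise ValueError("start bit must be 0")
    if stop != 1:
        raise ValueError("stop bit must be 1")
    # Odd parity: the data bits plus the parity bit hold an odd number of ones.
    if (sum(data_bits) + parity) % 2 != 1:
        raise ValueError("parity error")
    # Data arrives least significant bit first.
    return sum(bit << position for position, bit in enumerate(data_bits))


def encode_ps2_frame(value):
    """Build the 11-bit frame a keyboard would send for one byte."""
    data_bits = [(value >> position) & 1 for position in range(8)]
    parity = 1 - (sum(data_bits) % 2)  # make the total number of ones odd
    return [0] + data_bits + [parity, 1]
```

For the make code of "A" (1C hex), `encode_ps2_frame(0x1C)` gives the bit order that should be visible on the oscilloscope: start bit, then 0 0 1 1 1 0 0 0, then parity 0 and the stop bit.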

Keyboards work by sending make and break codes for each key. The make code is sent when the key is pressed, and the break code is sent when the key is released. For example, when we press and then release the "A" key, we get the following sequence:

1C F0 1C
This could be interpreted as: A pressed (1C), A released (F0 1C)

Unfortunately, it is not all that simple. First of all, if you quickly press A and S, one after another (so that S goes down before A is released), you will get the following sequence:
1C 1B F0 1B F0 1C
This could be interpreted as: make code of "A", make code of "S", break code of "S" and break code of "A".

When you press Shift + A, you will get the following sequence:
12 1C F0 1C F0 12
Shift pressed, A pressed, A released, Shift released

When you press A for a long time (autorepeat will occur):
1C 1C 1C 1C 1C F0 1C
A pressed, A pressed, A pressed, A pressed, A pressed, A released (F0 1C)

To make things more complicated, extended key codes (both make and break) have been introduced, for some keys. For example, the Cursor Down (Arrow Down) key produces the following sequence:
E0 72 E0 F0 72
Cursor down pressed (E0 72), Cursor down released (E0 F0 72).

And so on... All this makes parsing a bit complicated, but eventually you will be able to figure it out.
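The parsing rules above can be sketched as a small state machine. This is a Python illustration, not the author's assembler code: the function name and the event tuples are assumptions of the sketch, and it only understands the F0 (break) and E0 (extended) prefixes shown in the examples, so multi-byte keys such as Pause are not handled.

```python
PREFIX_EXTENDED = 0xE0  # next code belongs to an extended key
PREFIX_BREAK = 0xF0     # next code is a break (key released) code


def parse_scancodes(stream):
    """Turn raw PS/2 bytes into (kind, extended, code) tuples.

    kind is "pressed", "repeat" (autorepeat of a held key) or "released".
    """
    events = []
    held = set()       # keys currently down, as (extended, code) pairs
    extended = False
    breaking = False
    for byte in stream:
        if byte == PREFIX_EXTENDED:
            extended = True
        elif byte == PREFIX_BREAK:
            breaking = True
        else:
            key = (extended, byte)
            if breaking:
                held.discard(key)
                events.append(("released", extended, byte))
            elif key in held:
                events.append(("repeat", extended, byte))
            else:
                held.add(key)
                events.append(("pressed", extended, byte))
            # Prefixes apply to one key code only.
            extended = False
            breaking = False
    return events
```

Feeding it the Cursor Down sequence `[0xE0, 0x72, 0xE0, 0xF0, 0x72]` yields one extended "pressed" event followed by one extended "released" event for code 72 hex.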

The next step was to add the support for the keyboard within the FPGA Computer.

Introducing the keyboard interrupt

I have introduced a new interrupt for the keyboard - the IRQ#2. This IRQ is triggered when a byte arrives from the PS/2 keyboard. The CPU then jumps to address 24 (decimal). That address should hold a single JUMP instruction which transfers control to the raw PS/2 keyboard handling routine.

In the main computer module, I have instantiated the PS/2 module:

// ####################################
// PS/2 keyboard instance
// ####################################
wire [7:0] ps2_data;
wire ps2_received;
reg [7:0] ps2_data_r;

ps2_read ps2(
    CLOCK_50,
    reset,
    gpio0[31],    // Input pin - PS/2 data line
    gpio0[33],    // Input pin - PS/2 clock line
    ps2_data,     // here we will receive a character
    ps2_received  // if something came from PS/2, this goes high
);

Then I have detected the byte being received from the PS/2 module and triggered the IRQ:

always @ (posedge CLOCK_50) begin
    // ######### IRQ2 - keyboard ######
    if (ps2_received) begin
        ps2_data_r <= ps2_data;
        // if we have received a byte from
        // the keyboard, we will trigger the IRQ#2
        irq[2] <= 1'b1;
    end
    ...

In the cpu.v module, I have added support for the new interrupt:

if (irq_r[2]) begin
`ifdef DEBUG
    LED[7] <= 1;
    $display("3.1 JUMP TO IRQ #2 SERVICE");
`endif
    pc <= 16'd24;
    addr <= 16'd12;
end

So, to receive bytes from the PS/2 keyboard, a programmer must register the IRQ#2 handler:
; set the IRQ handler for keyboard to our own IRQ handler
mov r0, 1 ; JUMP instruction opcode
mov r1, IRQ2_ADDR ; IRQ#2 vector address
st [r1], r0
mov r0, irq_triggered
mov r1, IRQ2_ADDR + 2
st [r1], r0

Since this is raw PS/2 handling, the programmer must write the complete make/break code handling. I have done that in this example.

Unfortunately, the code is quite long since it has to deal with the raw PS/2 protocol. The code demonstrates parsing the raw PS/2 protocol and it looks like those vintage screen editors:

How to use the keyboard? First of all, two callbacks should be registered - one for the key pressed, and the other one for the key released:

mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_PRESSED_HANDLER_ADDR
st [r1], r0
mov r0, pressed ; key pressed routine address
mov r1, KEY_PRESSED_HANDLER_ADDR + 2
st [r1], r0

mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_RELEASED_HANDLER_ADDR
st [r1], r0
mov r0, released ; key released routine address
mov r1, KEY_RELEASED_HANDLER_ADDR + 2
st [r1], r0

Both callbacks will then need to obtain the virtual key code of the key pressed (or released) by reading from the location 48 (VIRTUAL_KEY_ADDR):

pressed:
    ld r0, [VIRTUAL_KEY_ADDR]
    cmp r0, VK_F1
    ...

released:
    ld r1, [VIRTUAL_KEY_ADDR]
    ...

What is the Virtual Key Code? It is a number assigned to each key, so all the programs would get the same number when a key is pressed, or released. In the code above, VK_F1 is the constant assigned to the F1 key, so the programmer can determine if the F1 was pressed by writing cmp r0, VK_F1.

Then, if needed, the programmer can call the vk_to_char function, which translates a virtual key to the actual character where possible (not all keys produce characters; the F1 key, for example, does not):

; ###############################
; r1 = function vk_to_char(r1)
; translates virtual key to character
; if shift is pressed, does the uppercase
; ###############################
vk_to_char:
    push r0
    push r2
    ...
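The body of the routine is elided above, so the following is only a Python sketch of the idea: a table lookup that depends on whether Shift is held, returning nothing for keys that have no character. The virtual key values and the table contents are hypothetical and chosen just for this sketch (the shifted "1" assumes a US layout); they are not the constants used by the FPGA Computer.

```python
# Hypothetical virtual key codes, invented for this sketch only.
VK_A, VK_S, VK_1, VK_F1 = 1, 2, 3, 4

# Characters produced without and with Shift (US layout assumed).
_UNSHIFTED = {VK_A: "a", VK_S: "s", VK_1: "1"}
_SHIFTED = {VK_A: "A", VK_S: "S", VK_1: "!"}


def vk_to_char(vk, shift_pressed=False):
    """Return the character for a virtual key, or None if it has none (e.g. F1)."""
    table = _SHIFTED if shift_pressed else _UNSHIFTED
    return table.get(vk)
```

A key-pressed callback would typically check for special keys such as VK_F1 first and only then ask for a character.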

Conclusion

Most examples of keyboard support on the net use PS/2 keyboards, since the USB HID protocol is quite complex and PS/2 isn't. I went down the same path. I have a couple of spare keyboards, some of them PS/2, so I soldered the female PS/2 connector and the four resistors from the schematic above. From that point on, everything was programming - a little bit of Verilog programming, and much more assembler programming.

Saturday, 25 August 2018

UART Loader

FPGA Computer UART Loader

This is a follow-up to the FPGA computer post.

I have developed the UART Loader for the FPGA Computer to be able to send programs to it. It is based on the UART module developed in Verilog for the FPGA Computer. This module both sends and receives bytes at 115200 baud, with 8 data bits, 1 start bit, 1 stop bit and no parity. The serial port of the FPGA computer is connected to a TTL serial-to-USB dongle, which is then plugged into a USB port of the PC:

When I initially created the FPGA Computer, I was able to store just one program in it, by hardcoding it in the RAM memory. Here is the part of the RAM.v Verilog module that includes the program in the RAM:

// Declare the RAM variable
reg [N-1:0] ram[32767:0];

initial
begin
    $readmemh("program.hex", ram);
end

The problem with this approach is that it is very slow: the program has to be embedded into the computer while the design is being built, and a build can take several minutes. That is why I have devised the Loader. It is hardcoded in the RAM module, and when the computer powers on, it starts executing at address 0x0000, where I have placed a JUMP instruction to the Loader:

; ########################################################
; RESET CODE (4 bytes max)
; ########################################################
#addr 0x0000
    j start

When started, Loader sends an initialisation sequence of bytes to the PC, via UART:

; send raspbootin boot char sequence
mov r0, 77 ; "M" character
call uart_send
mov r0, 13 ; \r character (carriage return)
call uart_send
mov r0, 10 ; \n character (line feed)
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send

This sequence is inherited from the original Raspbootin protocol for which I have made a Java implementation. This version is similar, but I have added a checksum at the end (more about this below).

The Loader then fetches the number of bytes to be received:

first_byte:
    in r1, [64] ; get the char from the uart
    st [size], r1 ; store the lowest byte to the size variable
    inc [state] ; next state -> 1 (second byte)
    j skip ; return from interrupt

second_byte:
    in r1, [64] ; get the char from the uart (8 upper bits)
    ld r2, [size] ; get the lower 8 bits (received earlier)
    shl r1, 8 ; shift the received byte 8 bits
    or r1, r2 ; put together lower and upper 8 bits
    st [size], r1 ; store the calculated size
    inc [state] ; next state
    j skip ; return from interrupt

After that, the Loader echoes the received size back (low byte first), so the PC can verify that the Loader expects the correct number of bytes:

; this is 16-bit cpu, so we don't load code bigger than 65535 bytes
; send confirmation that the code has been loaded
ld r0, [size]
and r0, 255
call uart_send
ld r0, [size]
shr r0, 8
call uart_send
inc [state] ; next state -> (code arrives)

After that, all incoming bytes are loaded into the memory, starting from the 0x400 address:

in r1, [64] ; get the byte from the uart into r1
mov r2, r1
ld r0, [sum_all]
add r0, r2
st [sum_all], r0 ; primitive checksum - sum of all bytes

; at this moment, r1 holds the received byte
ld r2, [current_addr]
st.b [r2], r1 ; store the received byte into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2 ; save the incremented value of the address

ld r2, [current_size] ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

When all bytes are received, the Loader sends back the primitive checksum, so the PC can check if everything is OK:

all_arrived:
    ; send the sum of all bytes
    ld r0, [sum_all]
    and r0, 255
    call uart_send
    ld r0, [sum_all]
    shr r0, 8
    call uart_send
    mov r0, 1 ; signal to the main program -> loader has received all
    st [loaded], r0
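The checksum the PC has to compare against can be sketched in Python as below. The function names are made up for illustration, and the wrap-around at 16 bits is an assumption based on the Loader keeping sum_all in a register of the 16-bit CPU and sending it as two bytes, low byte first.

```python
def checksum16(payload):
    """Sum of all payload bytes, truncated to 16 bits like the sum_all variable."""
    total = 0
    for byte in payload:
        total = (total + byte) & 0xFFFF  # assumed 16-bit wrap-around
    return total


def checksum_bytes(payload):
    """The two bytes the Loader is expected to send: low byte, then high byte."""
    total = checksum16(payload)
    return bytes([total & 0xFF, total >> 8])
```

A plain sum catches most single corrupted bytes, but it cannot notice bytes that arrive in a different order: `checksum16(b"ab")` and `checksum16(b"ba")` are equal. That is why the post calls it primitive.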

After that, the Loader jumps to the 0x400 address:

not_loaded:
    ld r0, [loaded]
    cmp r0, 1
    jz 0x400
    nop
    j not_loaded

For the PC side, I have modified the Raspbootin loader, originally used for Raspberry Pi bare-metal programming; it is also stored on GitHub.
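To make the whole exchange concrete, here is a Python sketch of the PC side of the protocol as described in this post. It is not the author's modified Raspbootin program (which is written in Java): the function name and the `link` object, anything with `read(n)` and `write(data)` methods standing in for a serial port, are assumptions of the sketch, and a real serial read may return fewer bytes than requested.

```python
BOOT_SEQUENCE = bytes([77, 13, 10, 3, 3, 3])  # "M", CR, LF and three 0x03 bytes


def send_program(link, payload):
    """Send payload to the Loader over link (an object with read(n) and write(data)).

    Raises OSError when a reply from the Loader does not match expectations.
    """
    if not 0 < len(payload) <= 0xFFFF:
        raise ValueError("payload must be 1..65535 bytes long")

    # 1. Wait until the tail of the incoming stream equals the boot sequence.
    window = b""
    while window != BOOT_SEQUENCE:
        chunk = link.read(1)
        if not chunk:
            raise OSError("link closed before the boot sequence arrived")
        window = (window + chunk)[-len(BOOT_SEQUENCE):]

    # 2. Send the size, low byte first, and expect the Loader to echo it.
    size = len(payload).to_bytes(2, "little")
    link.write(size)
    if link.read(2) != size:
        raise OSError("size echo mismatch")

    # 3. Send the program, then compare the 16-bit sum of all bytes.
    link.write(payload)
    expected = (sum(payload) & 0xFFFF).to_bytes(2, "little")
    if link.read(2) != expected:
        raise OSError("checksum mismatch")
    return True
```

An empty payload is rejected because the Loader compares its byte counter with the size only after a byte has arrived, so a size of zero would never complete.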

Conclusion

When I tried Raspberry Pi bare-metal programming, I immediately had the problem of transferring programs from the PC to the RPI. Usually, there is no network (it is a bare-metal platform with almost no I/O libraries) and the only other way is to transfer programs via micro SD cards (the card dance). You would cross-compile the program on the PC, save it to the SD card, eject it, put it in the RPI, and reset the RPI. And then again, and again...

That was the motivation for programmers to develop some kind of loader for the RPI. One of those loaders is Raspbootin. It is fairly simple. I re-used it for exactly the same purpose - to load programs onto my FPGA Computer from the PC. The only problematic part of this development was debugging the Loader. It could only be done on the FPGA, with those couple-of-minutes builds. Once I survived that, I was able to cross-assemble programs on my PC and send them to the board via the Loader.

Tuesday, 21 August 2018

Text mode in the FPGA computer

How text mode works

This is a follow-up to the FPGA computer post.

In this post I will give more details about the text mode of the FPGA computer. Text mode is the default mode: it is active as soon as the computer powers up.

Text mode is 80x60 characters, occupying 4800 words, or 9600 bytes, starting from the address of 2400.

Lower byte is the ASCII code of a character, while the upper byte contains the attributes:

Bit	7	6	5	4	3	2	1	0
Field	Foreground color, inverted (bits 7-4)				Background color (bits 3-0)
Value	x	r	g	b	x	r	g	b

The foreground color is inverted so zero values (default) would mean white color. That way, you don't need to set the foreground color to white, and by default (0, 0, 0), it is white. The default background color is black (0, 0, 0). This means that if the upper (Attribute) byte is zero (0x00), the background color is black, and the foreground color is white.
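As an illustration of the layout above, here is a Python sketch that packs a character and its colors into one 16-bit text-mode word. The function name and its parameters are made up for this example; colors are given as (r, g, b) tuples of 0 or 1, and the two bits marked x are left at zero.

```python
def make_cell(char, fg=(1, 1, 1), bg=(0, 0, 0)):
    """Return the 16-bit word for one character cell.

    Lower byte: ASCII code. Upper byte: attributes, with the foreground
    color stored inverted so that attribute 0x00 means white on black.
    """
    fg_r, fg_g, fg_b = fg
    bg_r, bg_g, bg_b = bg
    fg_bits = ((1 - fg_r) << 2) | ((1 - fg_g) << 1) | (1 - fg_b)  # inverted
    bg_bits = (bg_r << 2) | (bg_g << 1) | bg_b
    attribute = (fg_bits << 4) | bg_bits
    return (attribute << 8) | (ord(char) & 0xFF)
```

With the defaults, `make_cell("A")` is 0x0041 (white "A" on black), while a red "A" on black, `make_cell("A", fg=(1, 0, 0))`, is 0x3041 because the green and blue foreground bits are set to switch those colors off.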

I have used Ken Shirriff's blog post FizzBuzz a lot for this implementation. I highly recommend his posts!

Verilog implementation relies on the character ROM. Character ROM is implemented as a separate Verilog module, and is used like this:

// Character generator
chars chars_1(
    .char(curr_char[7:0]),
    .rownum(y[2:0]),
    .pixels(pixels)
);

Current character (which is read from the address of 2400, up to the 2400+9600) is received in the curr_char register. This register is wired to the chars module, together with two additional parameters: rownum (wired to the y register - the y coordinate) and the pixels output register (this register will hold the pixels of the current character, for the current y coordinate).

The chars module itself is a giant case statement:

always @(*)
case ({char, rownum})
    11'b00110000000: pixels = 8'b01111100; //  XXXXX
    11'b00110000001: pixels = 8'b11000110; // XX   XX
    11'b00110000010: pixels = 8'b11001110; // XX  XXX
    11'b00110000011: pixels = 8'b11011110; // XX XXXX
    11'b00110000100: pixels = 8'b11110110; // XXXX XX
    11'b00110000101: pixels = 8'b11100110; // XXX  XX
    11'b00110000110: pixels = 8'b01111100; //  XXXXX
    11'b00110000111: pixels = 8'b00000000; //

    11'b00110001000: pixels = 8'b00110000; //   XX
    11'b00110001001: pixels = 8'b01110000; //  XXX
    11'b00110001010: pixels = 8'b00110000; //   XX
    11'b00110001011: pixels = 8'b00110000; //   XX
    11'b00110001100: pixels = 8'b00110000; //   XX
    11'b00110001101: pixels = 8'b00110000; //   XX
    11'b00110001110: pixels = 8'b11111100; // XXXXXX
    11'b00110001111: pixels = 8'b00000000; //

As you can see, the input character and the y coordinate are concatenated to determine which row of pixels will be returned to the vga text module.
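The lookup can be modelled in Python to see how the 11-bit key is formed and what a glyph looks like. This is a sketch, not the author's module: the names are made up, the table holds only the two glyphs listed above ("0" and "1"), and returning an empty row for unknown characters is an assumption, since the default branch of the case statement is not shown.

```python
# Pixel rows for the two glyphs listed in the case statement excerpt.
CHAR_ROM = {
    0x30: [0b01111100, 0b11000110, 0b11001110, 0b11011110,
           0b11110110, 0b11100110, 0b01111100, 0b00000000],  # "0"
    0x31: [0b00110000, 0b01110000, 0b00110000, 0b00110000,
           0b00110000, 0b00110000, 0b11111100, 0b00000000],  # "1"
}


def rom_index(char_code, rownum):
    """Model of {char, rownum}: 8-bit character code followed by 3-bit row number."""
    return ((char_code & 0xFF) << 3) | (rownum & 0x7)


def glyph_row(char_code, rownum):
    """Pixels of one glyph row; unknown characters give an empty row (assumption)."""
    rows = CHAR_ROM.get(char_code & 0xFF)
    return rows[rownum & 0x7] if rows else 0


def render_glyph(char_code):
    """Return the glyph as 8 strings of 8 characters, 'X' for a lit pixel."""
    return ["".join("X" if (glyph_row(char_code, row) >> (7 - col)) & 1 else " "
                    for col in range(8))
            for row in range(8)]
```

Printing `"\n".join(render_glyph(0x30))` draws the same shape as the comments next to the first eight case entries.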

How is curr_char obtained? There are three distinct situations in which it is fetched:

1. During visible scanline processing. In this case, we wait for the last column (pixel) of the current character to be displayed, and then we fetch the next character:

else if (x < 640 && !mem_read) begin
    if ((x & 7) == 7) begin
        // when we are finishing current character,
        // we need to fetch in advance
        // the next character (x+1, y)
        // (at the last pixel of the current character, let's fetch next)
        rd <= 1'b1;
        wr <= 1'b0;
        addr <= VIDEO_MEM_ADDR + ((x >> 3) + (y >> 3)*80 + 1);
        mem_read <= 1'b1;
    end
end

2. During the horizontal blanking. In this case, we need to obtain either the first character of the current row (we haven't finished the current row yet), or the first character of the next row:

else if ((x >= 640) && ((y & 7) < 7)) begin
    // when we start the horizontal blanking,
    // and still displaying character in the current row,
    // we need to fetch in advance
    // the first character in the current row (0, row)
    rd <= 1'b1;
    wr <= 1'b0;
    addr <= VIDEO_MEM_ADDR + ((y >> 3)*80);
    mem_read <= 1'b1;
end
else if ((x >= 640) && ((y & 7) == 7)) begin
    // when we start the horizontal blanking,
    // and we need to go to the next line,
    // we need to fetch in advance the first character in next row (0, row+1)
    rd <= 1'b1;
    wr <= 1'b0;
    addr <= VIDEO_MEM_ADDR + (((y >> 3) + 1)*80);
    mem_read <= 1'b1;
end

3. During the vertical blanking. In this case, we need to fetch the first character, at the beginning of the frame buffer:

if ((x >= 640) && (y >= 480)) begin
    // when we start the vertical blanking,
    // we need to fetch in advance the first character (0, 0)
    rd <= 1'b1;
    wr <= 1'b0;
    addr <= VIDEO_MEM_ADDR + 0;
    mem_read <= 1'b1;
end

The code above sets the address bus and control lines. The character is then fetched from the data bus:

if (mem_read) begin
    curr_char <= data;
    rd <= 1'bz;
    wr <= 1'bz;
    mem_read <= 1'b0;
end

The character is wired to the character ROM, and the output is placed in the pixels register. From that point, the pixels are shifted bit by bit to the r, g, and b wires of VGA connector:

if (valid) begin
    r <= pixels[7 - (x & 7)] ? !curr_char[6+8] : curr_char[2+8];
    g <= pixels[7 - (x & 7)] ? !curr_char[5+8] : curr_char[1+8];
    b <= pixels[7 - (x & 7)] ? !curr_char[4+8] : curr_char[0+8];
end
else begin
    // blanking -> no pixels
    r <= 1'b0;
    g <= 1'b0;
    b <= 1'b0;
end
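The color selection in the visible branch can be checked with a small Python model. It mirrors the three ternary expressions above; the function name is made up, and `cell` stands for the 16-bit curr_char value while `pixels` is the row returned by the character ROM.

```python
def pixel_rgb(cell, pixels, x):
    """Return (r, g, b) for screen column x inside the visible area.

    A lit glyph pixel takes the inverted foreground bits (6, 5, 4 of the
    attribute byte); an unlit one takes the background bits (2, 1, 0).
    """
    attribute = (cell >> 8) & 0xFF
    lit = (pixels >> (7 - (x & 7))) & 1  # leftmost pixel is bit 7
    if lit:
        return (1 - ((attribute >> 6) & 1),
                1 - ((attribute >> 5) & 1),
                1 - ((attribute >> 4) & 1))
    return ((attribute >> 2) & 1, (attribute >> 1) & 1, attribute & 1)
```

For a cell with attribute 0x00, a lit pixel comes out as (1, 1, 1), white, and an unlit one as (0, 0, 0), black, which matches the default colors described earlier.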

It is interesting how horizontal and vertical sync pulses are generated:

assign hs = x < (640 + 16) || x >= (640 + 16 + 96);
assign vs = y < (480 + 10) || y >= (480 + 10 + 2);
assign valid = (x < 640) && (y < 480);

Just by wiring the hs and vs one-bit signals to the VGA connector and assigning them the expressions above, horizontal and vertical sync pulses are generated according to the current state of the x and y counters. When the x counter reaches 640 + 16, the visible part of the scanline and its 16-pixel front porch are over, and hs goes low (inverted logic) for the next 96 pixels. Similarly, vs goes low when the y counter (the line counter) reaches 480 + 10, and stays low for 2 lines.

If you look at the value range of the x and y registers, you will see that the x goes from 0 to 799, while y goes from 0 to 524. This means that the actual resolution of the VGA 640x480 mode is 800x525. However, during the horizontal and vertical blanking some of those pixels (and also lines) are not visible, so the actual visible pixels are from the 640x480 range. That is detected in the "assign valid =..." line of the code above.
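The timing arithmetic can be verified with a short Python model of the three assign expressions. The constants come from the Verilog and from the 800x525 totals mentioned above; the back porch values are derived from them, the function names are made up, and the 25 MHz pixel clock in the usage note is only an assumption (half of the 50 MHz board clock), not something stated in this post.

```python
H_VISIBLE, H_FRONT, H_SYNC, H_TOTAL = 640, 16, 96, 800
V_VISIBLE, V_FRONT, V_SYNC, V_TOTAL = 480, 10, 2, 525

# Whatever remains of each total after the sync pulse is the back porch.
H_BACK = H_TOTAL - H_VISIBLE - H_FRONT - H_SYNC  # 48 pixels
V_BACK = V_TOTAL - V_VISIBLE - V_FRONT - V_SYNC  # 33 lines


def hs(x):
    """Horizontal sync, active low during the 96-pixel pulse."""
    return x < H_VISIBLE + H_FRONT or x >= H_VISIBLE + H_FRONT + H_SYNC


def vs(y):
    """Vertical sync, active low during the 2-line pulse."""
    return y < V_VISIBLE + V_FRONT or y >= V_VISIBLE + V_FRONT + V_SYNC


def valid(x, y):
    """True only inside the visible 640x480 area."""
    return x < H_VISIBLE and y < V_VISIBLE


def refresh_rate(pixel_clock_hz):
    """Frames per second for a given pixel clock."""
    return pixel_clock_hz / (H_TOTAL * V_TOTAL)
```

With an assumed 25 MHz pixel clock, `refresh_rate(25_000_000)` is about 59.5 frames per second, close to the nominal 60 Hz of the 640x480 mode.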

Programming in assembler

Assembler examples can be found here.

The following assembler code writes a string with color attributes.

    mov r1, hello ; r1 holds the address of the "Hello World!" string
    mov r2, 0 ; r2 is the index
    mov r3, 0 ; r3 has the attribute
again:
    ld.b r0, [r1] ; load r0 with the content of the memory location to which r1 points (current character)
    cmp r0, 0 ; if the current character is 0 (string terminator),
    jz end ; go out of this loop
    st.b [r2 + VIDEO_1], r3 ; store the attribute
    inc r2 ; move to the character location
    st.b [r2 + VIDEO_1], r0 ; store the character at VIDEO_1 + r2
    inc r1 ; move to the next character in the string
    inc r2 ; move to the next location (attribute) in video memory
    inc r3 ; change the attribute of the current character
    j again ; continue with the loop
end:
    halt

hello:
    #str "Hello World!\0"

The result is on the image below.

In the emulator, it looks like this:

Conclusion

The text mode is implemented by reading the character from the framebuffer, and then by obtaining its pixels from the character ROM. When those pixels are obtained, they are shifted one by one to the VGA connector, to the corresponding r, g and b wires. That way, the character is shown on the screen. I implemented the text mode first, and then I implemented the graphics mode. Both modes are surprisingly simple to implement in Verilog.

The text mode module is on GitHub.