Saturday, 24 November 2018

Hardware sprites on the FPGA computer

Adding hardware sprites

This is a follow-up of the FPGA computer post. 

I have added hardware sprites to the graphics mode of my FPGA computer. It now supports up to 16 sprites, each 16x16 pixels in size. Here is how it looks on the monitor:
In the emulator, it looks the same:



Each sprite is defined by the 8-byte structure:
  • sprite definition data address (2 bytes)
  • x coordinate (2 bytes)
  • y coordinate (2 bytes)
  • transparent color (2 bytes).
The sprite structure for the first sprite starts at address 56 decimal. Each subsequent sprite structure starts 8 bytes later.
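The address arithmetic for the sprite table can be sketched in a few lines of Python. This is only an illustration: the names SPRITE_TABLE_BASE, FIELD_OFFSETS and sprite_field_addresses are made up for this sketch and are not part of the FPGA computer sources. The word address is the byte address shifted right by one, because the video hardware addresses 16-bit words.

```python
SPRITE_TABLE_BASE = 56   # byte address of the first sprite structure (from the text)
SPRITE_STRUCT_SIZE = 8   # bytes per sprite structure
SPRITE_COUNT = 16        # number of hardware sprites

# Byte offsets of the four 2-byte fields inside one structure.
FIELD_OFFSETS = {
    "definition_address": 0,
    "x": 2,
    "y": 4,
    "transparent_color": 6,
}


def sprite_field_addresses(index):
    """Return {field: (byte_address, word_address)} for sprite `index`."""
    if not 0 <= index < SPRITE_COUNT:
        raise ValueError("sprite index must be in 0..15")
    base = SPRITE_TABLE_BASE + index * SPRITE_STRUCT_SIZE
    return {
        name: (base + offset, (base + offset) >> 1)
        for name, offset in FIELD_OFFSETS.items()
    }


if __name__ == "__main__":
    for name, (byte_addr, word_addr) in sprite_field_addresses(0).items():
        print(f"{name:20s} byte {byte_addr:3d}  word {word_addr:3d}")
```

For sprite 0 this gives byte addresses 56, 58, 60 and 62, which are the addresses used in the assembler example.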

Sprite definition data consists of 16 lines, each line described by 16 pixels, each pixel defined by 4 bits: xrgb. This means that one sprite line consists of 8 bytes (two pixels per byte), so the total number of bytes needed for the sprite definition is 8 x 16 = 128 bytes.
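As a rough illustration of this packing, here is a small Python helper that turns a 16x16 grid of 4-bit pixel values into the 64 16-bit words of a sprite definition. The function name pack_sprite is hypothetical. It assumes that the leftmost pixel of each group of four sits in the most significant nibble of its word, which is how the display code later in this post indexes the sprite lines.

```python
def pack_sprite(rows):
    """Pack 16 rows of 16 four-bit pixels into 64 sixteen-bit words."""
    if len(rows) != 16 or any(len(row) != 16 for row in rows):
        raise ValueError("a sprite is 16 rows of 16 pixels")
    words = []
    for row in rows:
        for i in range(0, 16, 4):           # four pixels per 16-bit word
            word = 0
            for pixel in row[i:i + 4]:
                if not 0 <= pixel <= 0xF:
                    raise ValueError("pixel values are 4 bits (xrgb)")
                word = (word << 4) | pixel  # leftmost pixel ends in the top nibble
            words.append(word)
    return words


if __name__ == "__main__":
    blank = [0] * 16
    stripe = [0] * 7 + [0xF, 0xF] + [0] * 7   # two white pixels in the middle
    sprite = [blank, stripe] + [blank] * 14
    words = pack_sprite(sprite)
    print(len(words) * 2, "bytes")
    print(", ".join(f"0x{w:04x}" for w in words[4:8]))
```

The second row of this example packs to 0x0000, 0x000f, 0xf000, 0x0000, the same values as line 1 of the sprite_def data in the assembler example.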

Here is an example of showing one sprite at (25, 25) in assembly language:

  mov r0, sprite_def
  mov r1, 56
  st [r1], r0  ; sprite definition is at sprite_def address
  mov r0, 25
  st [r1 + 2], r0  ; x = 25  at addr 58
  mov r0, 25
  st [r1 + 4], r0  ; y = 25  at addr 60
  mov r0, 0
  st [r1 + 6], r0  ; transparent color is black (0) at addr 62
  ; sprite definition
sprite_def:
  #d16 0x0000, 0x0000, 0x0000, 0x0000  ; 0
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 1
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 2
  #d16 0x0000, 0x000f, 0xf000, 0x0000  ; 3
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 4
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 5
  #d16 0x0000, 0x044f, 0xf440, 0x0000  ; 6
  #d16 0x0000, 0x444f, 0xf444, 0x0000  ; 7
  #d16 0x0004, 0x444f, 0xf444, 0x4000  ; 8
  #d16 0x0044, 0x444f, 0xf444, 0x4400  ; 9
  #d16 0x0400, 0x004f, 0xf400, 0x0040  ; 10
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 11
  #d16 0x0000, 0x004f, 0xf400, 0x0000  ; 12
  #d16 0x0000, 0x041f, 0xf140, 0x0000  ; 13
  #d16 0x0000, 0x4111, 0x1114, 0x0000  ; 14
  #d16 0x0004, 0x4444, 0x4444, 0x4000  ; 15

How does this work? First of all, I had to decide how to implement sprites. I decided to fetch all sprite data during the vertical blanking interval (VBI). During the VBI, the video subsystem starts fetching sprite data by reading the 8-byte sprite structure starting at address 56 decimal. The address and data buses are 16 bits wide, so the computer is word-oriented (it reads two bytes at a time), and the actual address put on the bus is 56 >> 1 == 28:

if ((x >= 640) && (y == 479) && (state == IN_LINE)) begin
  // when we start the vertical blanking, 
  // we need to fetch in advance the first sprite data
  state <= READ_SPRITES;
  sprite_counter <= 4'b0;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  addr <= 16'd28;    // prepare to read sprite definition address
end

In the next clock cycle, the system is in the READ_SPRITES state. The first thing that we do in the READ_SPRITES state is fetching the sprite definition address which is present at the data bus, since we have initiated a memory read from within the previous state.

Then we need to prepare the address bus for the next state in which we will fetch the x coordinate of the sprite. We do that by setting the address bus to (58 + (sprite_counter << 3)) for all sprites, having the sprite_counter iterating from 0 to 15:

READ_SPRITES: begin
  sprite_addr[sprite_counter] <= data;
  state <= READ_SPRITE_X;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  // prepare to read x coordinate of the sprite
  addr <= (16'd58 + (sprite_counter << 3)) >> 1;    
end

In the READ_SPRITE_X state, we fetch the x coordinate of the sprite, which is ready on the data bus, and then we prepare to read the y coordinate in the next state. The READ_SPRITE_Y state, shown here, has the same shape:

READ_SPRITE_Y: begin
  sprite_y[sprite_counter] <= data;
  state <= READ_SPRITE_TRANSPARENT_COLOR;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  // prepare to read transparent color of the sprite  
  addr <= (16'd62 + (sprite_counter << 3)) >> 1;    
end

In the READ_SPRITE_Y state, we fetch the y coordinate of the sprite which was ready at the data bus, and then we prepare to read the sprite transparent color in the next state:

READ_SPRITE_TRANSPARENT_COLOR: begin
  sprite_transparent_color[sprite_counter] <= data[3:0];
  state <= READ_SPRITE_DATA;
  rd <= 1'b1;
  wr <= 1'b0;
  mem_read <= 1'b1;
  line_counter <= 16'b0;
  word_counter <= 4'b0;
  // read sprite definition bytes
  addr <= sprite_addr[sprite_counter] >> 1;    
end

In the READ_SPRITE_TRANSPARENT_COLOR state, we fetch the transparent color of the sprite, and then put the address of the sprite definition to the address bus so we can fetch it in the next state:

READ_SPRITE_DATA: begin
  if (line_counter < 16) begin
    case (word_counter) 
    0:  sprite_pixels[sprite_counter][line_counter][63:48] <= data;
    1:  sprite_pixels[sprite_counter][line_counter][47:32] <= data;
    2:  sprite_pixels[sprite_counter][line_counter][31:16] <= data;
    3:  sprite_pixels[sprite_counter][line_counter][15:0]  <= data;
    endcase
    state <= READ_SPRITE_DATA;
    rd <= 1'b1;
    wr <= 1'b0;
    mem_read <= 1'b1;
    if (word_counter < 3) begin
      word_counter = word_counter + 1'b1;
    end
    else begin
      word_counter = 1'b0;
      line_counter = line_counter + 16'b1;
    end
    // read sprite definition bytes
    addr = (sprite_addr[sprite_counter] + ((word_counter +
           (line_counter << 2)) << 1) ) >> 1;    
  end
  else 
  begin
    if (sprite_counter < SPRITE_NUM) begin
      sprite_counter = sprite_counter + 1'b1;
      state <= READ_SPRITES;
      rd <= 1'b1;
      wr <= 1'b0;
      mem_read <= 1'b1;
      // read next sprite definition address
      addr <= (16'd56 + (sprite_counter << 3)) >> 1;   
    end
    else begin
      sprite_counter <= 4'b0;
      rd <= 1'b1;
      wr <= 1'b0;
      mem_read <= 1'b1;
      addr <= VIDEO_MEM_ADDR + 0;
      state <= V_BLANK;
    end
  end
end

In the READ_SPRITE_DATA state we start reading sprite definition from the memory. We do it for each line of the sprite (16 lines per sprite), and within the line, for each word containing four pixels of the sprite line definition.
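To see which memory words this state visits, the address expression from the Verilog code can be replayed in Python. This is just a sketch of the arithmetic, with a made-up function name; it assumes the sprite definition starts at an even byte address.

```python
def sprite_data_word_addresses(definition_byte_addr):
    """Yield the 64 word addresses that hold one sprite definition."""
    for line in range(16):          # 16 lines per sprite
        for word in range(4):       # 4 words (16 pixels) per line
            # Same arithmetic as the Verilog:
            # (sprite_addr + ((word_counter + (line_counter << 2)) << 1)) >> 1
            yield (definition_byte_addr + ((word + (line << 2)) << 1)) >> 1


if __name__ == "__main__":
    addresses = list(sprite_data_word_addresses(0x400))
    print(len(addresses), hex(addresses[0]), hex(addresses[-1]))
```

For an even start address the result is simply 64 consecutive word addresses, so the sprite definition is read as one contiguous block.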

When we finish loading all the sprite definition data for the current sprite, we do the same for the next sprite, until the definition data of all sprites has been read. We then set the address bus to load the pixel data at the (0, 0) position on the screen, and move to the V_BLANK state:

V_BLANK: begin
  pixels <= data;
  state <= SCAN_IDLE;
  rd <= 1'bz;
  wr <= 1'bz;
  mem_read <= 1'b0;
end

In the V_BLANK state we read the pixels of the frame buffer at the (0, 0) coordinate, set the rd and wr control signals to high impedance, and set the state to SCAN_IDLE. We leave the SCAN_IDLE state when the time comes to start displaying pixels, starting from the (0, 0) coordinate.

Displaying sprite data

During scanline processing, we need to display both the original pixels from the frame buffer and the sprite data, and we need to make sure that the original pixels show through wherever a sprite pixel has the transparent color.

This is done in the following code:

if (valid) begin
  for (i = 0; i < SPRITE_NUM; i = i+1) begin
    if ((sprite_addr[i] != 16'b0) &&
       (xx >= sprite_x[i]) &&
       (xx < (sprite_x[i] + 16)) &&
       (yy >= sprite_y[i]) &&
       (yy < (sprite_y[i] + 16))) begin

      sprite_found = 1'b1;
      if (
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] != sprite_transparent_color[i][0] ||
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] != sprite_transparent_color[i][1] ||
        sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] != sprite_transparent_color[i][2]
      ) begin
        r <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 0] == 1'b1;
        g <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 1] == 1'b1;
        b <= sprite_pixels[i][yy - sprite_y[i]][60-(((xx - sprite_x[i]) << 2) ) + 2] == 1'b1;
      end 
      else begin
        r <= pixels[12 - ((xx & 3) << 2) + 0] == 1'b1;
        g <= pixels[12 - ((xx & 3) << 2) + 1] == 1'b1;
        b <= pixels[12 - ((xx & 3) << 2) + 2] == 1'b1;
      end
    end 
  end
  if (!sprite_found) begin
    r <= pixels[12 - ((xx & 3) << 2) + 0] == 1'b1;
    g <= pixels[12 - ((xx & 3) << 2) + 1] == 1'b1;
    b <= pixels[12 - ((xx & 3) << 2) + 2] == 1'b1;
  end
  else begin
    sprite_found = 1'b0;
  end
end
else begin
  // blanking -> no pixels
  r <= 1'b0;
  g <= 1'b0;
  b <= 1'b0;
end
end

The most interesting thing is the "for loop". It is not a loop in the software sense: the synthesis tool replicates the Verilog code inside it SPRITE_NUM times. That is the most important thing to understand about "loops" in Verilog. There is no linear code that gets executed multiple times. Instead, everything is a giant state machine that advances with the clock signal, and the "for loop" just unrolls the code into SPRITE_NUM copies, all of which "work" at the same time.

So, when we have this Verilog code:
 for (i = 0; i < SPRITE_NUM; i = i+1) begin
    if ((sprite_addr[i] != 16'b0) &&
       (xx >= sprite_x[i]) &&
       (xx < (sprite_x[i] + 16)) &&
       (yy >= sprite_y[i]) &&
       (yy < (sprite_y[i] + 16))) begin

It actually does this:
    if ((sprite_addr[0] != 16'b0) &&
       (xx >= sprite_x[0]) &&
       (xx < (sprite_x[0] + 16)) &&
       (yy >= sprite_y[0]) &&
       (yy < (sprite_y[0] + 16))) begin

...
    end
    if ((sprite_addr[1] != 16'b0) &&
       (xx >= sprite_x[1]) && 
       (xx < (sprite_x[1] + 16)) && 
       (yy >= sprite_y[1]) && 
       (yy < (sprite_y[1] + 16))) begin
...
    end
...


The code with the "for loop" does the same thing for all sprites:
  1. if the sprite definition address is not zero, and the current x and y coordinates of the scanline are within the sprite coordinates, then we put the current sprite pixel color on the output r, g and b signals, or the original frame buffer pixel color if the current sprite pixel is transparent (its color equals the sprite's transparent color).
  2. else, if the current x and y coordinates of the scanline are outside of the sprite coordinates, we put the frame buffer pixel data on the r, g and b output signals.
  3. else, it must be the blanking interval, so we put zeros on the r, g and b output signals.
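The rules above can be modelled in software for one visible pixel. The following Python sketch is not the Verilog code, only a behavioural model of it; the Sprite class and the function names are made up. It works with the low three bits of each 4-bit pixel, as the Verilog comparison does, and leaves the mapping of those bits to r, g and b aside. Like the unrolled Verilog, the last sprite in the list that covers the pixel decides the result.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sprite:
    addr: int          # sprite definition address; 0 disables the sprite
    x: int
    y: int
    transparent: int   # transparent color (low 3 bits are compared)
    lines: List[int]   # 16 integers, each one 64-bit packed sprite line


def sprite_pixel(line, dx):
    """Extract the 4-bit pixel at column dx (0 = leftmost) of a 64-bit line."""
    return (line >> (60 - (dx << 2))) & 0xF


def output_pixel(xx, yy, framebuffer_pixel, sprites):
    """Return the 3-bit color shown at (xx, yy) outside of blanking."""
    out = framebuffer_pixel & 0x7          # default: frame buffer pixel
    for s in sprites:
        inside = (s.addr != 0
                  and s.x <= xx < s.x + 16
                  and s.y <= yy < s.y + 16)
        if not inside:
            continue
        p = sprite_pixel(s.lines[yy - s.y], xx - s.x) & 0x7
        if p != (s.transparent & 0x7):
            out = p                        # opaque sprite pixel wins
        else:
            out = framebuffer_pixel & 0x7  # frame buffer shows through
    return out


if __name__ == "__main__":
    line = 0x0000000FF0000000              # two opaque pixels in the middle
    rocket = Sprite(addr=0x400, x=25, y=25, transparent=0, lines=[line] * 16)
    print(output_pixel(32, 25, 0b010, [rocket]))
    print(output_pixel(25, 25, 0b010, [rocket]))
```

One consequence of the "last covering sprite wins" rule is that a transparent pixel of a higher-numbered sprite shows the frame buffer, not a lower-numbered sprite underneath it.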

Conclusion

This implementation of sprites requires that the vga module has its own internal memory which is filled with the sprite data from the main memory. Then, during the scanline processing, sprite pixels are combined with frame buffer pixels in a way that sprite pixels are placed "over" the frame buffer pixels, unless the current sprite pixel is the transparent one. If that is the case, then the frame buffer pixel is "shown" through the sprite.

The great thing about hardware sprites is that they do not consume processor time at all. Everything is done in hardware, and showing a sprite only requires setting its sprite definition address to a non-zero value.

Tuesday, 6 November 2018

Mozilla Thunderbird still running after you close the program

How to fix the problem when Mozilla Thunderbird remains in memory after closing

I have been using Thunderbird for more than ten years. I simply cannot migrate to anything else. Unfortunately, Thunderbird has one annoying feature (or a bug): it stays in memory when you close the program using the X button in the top right corner. Why is that bad? Well, if you close Thunderbird that way and then start it again, it will start behaving oddly. It will either stop receiving mail, or get stuck when sending an email. It simply works badly.

To remedy this, it is recommended that you close Thunderbird by using the Exit option from the File menu. However, I got so used to closing the application by clicking on the X button that it is annoying for me to change my habits and do some extra clicks (instead of a single one).

Even worse, some people claim that Thunderbird remains active even if you go to the File menu and choose the Exit option. I am almost sure that it happened with my Thunderbird too, a couple of times.

So, I have decided to do something with that problem, and today I have created a batch file with the following content:

taskkill /IM thunderbird.exe /F
start "Thunderbird" "c:\Program Files (x86)\Mozilla Thunderbird\thunderbird.exe"

Instead of clicking on the Thunderbird icon, I click on the shortcut on my desktop which points to this batch file. 

What does this batch file do? It kills the remaining Thunderbird instance before starting a new one. The "/F" switch forces the kill (just like "kill -9" in Linux), while "/IM thunderbird.exe" gives the image name of the process to be killed.

Make desktop shortcut to a batch file work on Windows 10

This brings us to the second problem: on Windows 10, if you create a shortcut on the Desktop which points to a batch file, it doesn't work (if you double click on that shortcut, nothing happens).

You need to edit the newly created shortcut and add the following text in the Target field, just before the original command:

C:\Windows\System32\cmd.exe /c "original command"

For example, my batch file is in the C:\Tools folder. This means that the original shortcut had the content of the Target field like this:

C:\Tools\thunderbird.bat

Now, the modified Target looks like this:

C:\Windows\System32\cmd.exe /c "C:\Tools\thunderbird.bat"


Pay attention that you need to put the original command in double quotes.


Sunday, 14 October 2018

Electronic art or Raspberry PI as a wall clock

Raspberry pi as a wall clock

In one of my posts, I described how I used old Android phones to show the time and temperature. Both phones were connected to a power supply, without a battery. (One of the highlights of that post was the guide on how to remove the battery and still have the phone working.) In that post I was more worried about the display being turned on constantly, 24x365, than anything else. I thought that, since I had removed the battery, the only thing that could break would be the screen, since it worked constantly, day and night.

Well, I was wrong. Both phones died (both managed to work that way for more than a year), but it was not the display, nor the motherboard or the CPU. The WiFi module in both phones died. One interesting fact is that one of those phones (Samsung Galaxy S plus) had the Super AMOLED capacitive touchscreen, and that screen did have a kind of burned pixels from my program. Not too noticeable, but existent.

When the first phone died, I had one spare RPI and one small touch screen for that RPI. I wrote a small Java Swing program that connects to my weather server, pulls the data and displays it, just like the Android application on those dead phones. The program shifts the displayed information one pixel to the left every second, to spare the pixels (not all pixels - those displaying the taskbar will burn in more).

The result was - from this:
to this:


If you look carefully, you will notice that the display (and the RPI) is turned upside-down. That is because of the USB power cable, which tends to crack if it stands upright without any support (this RPI stands upright on its own). So, I have made one additional transformation of the picture to flip it. 

The small red rectangle at the lower right corner turns on and off each second to give me a visual indicator that the computer did not freeze (RPIs tend to freeze for too many reasons).

Then the second phone died. Again, I had another unused RPI and a 7-inch display purchased to be a portable HDMI monitor when I need to service my other RPIs permanently placed around the apartment. Well, that screen (and the RPI) has found its new purpose - to be a wall clock.

The only problem was that there were too many cables. I had a 5V adapter which powers both the RPI and the screen, plus the RPI and the screen themselves: two USB power cables and one HDMI cable.

So, instead of this phone:

I have created a kind of electronic art:


In the picture above, you can spot the power adapter at the lower left corner, the RPI at the lower right corner, and the screen placed inside a portable LCD TV which died a long time ago, but had the same-sized LCD display, so I was able to recycle the case.

Now, if you look closely at the picture, you will notice that the picture is not flipped. On this RPI, I don't have the problem with the power cable (everything was secured inside the frame), so I did not have to flip the picture. So I have introduced one additional configuration parameter: to flip the picture or not. 

The RPI connects to the weather server using WiFi. I plan to bring the Ethernet cable there as a next step.

Friday, 7 September 2018

Snakes!

The first game on my FPGA platform

This is a follow-up of the FPGA computer post. 

I have decided to make a game for the FPGA Computer. My friend gave me his Pascal implementation of the Snakes game and I have ported it into the FPGA Assembler. Not an easy task, but it works now:


The game was made to work in the video text mode. The frame is constructed using '-', '+', and '|' characters. The head of the snake is '@', the body is 'O', the star is '*', and when the snake hits the wall or its tail, we get the 'X' character at the crash scene.

In the emulator it works the same way:

The game is available on GitHub.

Milliseconds counter register

During the game development, it occurred to me that I need a milliseconds counter register in order to implement the delay function. In the cpu.v file, I have created two additional registers:

reg [N-1:0] millis_counter;

reg [15:0] clock_counter;


The millis_counter register will hold the number of milliseconds counted so far. Incrementing this register is done whenever clock_counter reaches 50000 (clock is 50MHz, which means that when the clock_counter reaches 50000, one millisecond has elapsed):

always @ (posedge CLOCK_50) begin
  if (clock_counter < 50000) begin
    clock_counter <= clock_counter + 1'b1;
  end
  else begin
    clock_counter <= 0;
    millis_counter <= millis_counter + 1'b1;
  end
  ...
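A small Python model of this divider makes the timing easy to check. It only simulates the counting logic of the always block, and the function name is made up. With the comparison written as clock_counter < 50000, the counter passes through 50001 values (0 to 50000) between two increments of millis_counter, so one tick is a single clock cycle (20 ns at 50 MHz) longer than a millisecond. Comparing against 49999 instead would give exactly 50000 cycles.

```python
def cycles_per_millis_increment(threshold):
    """Count clock edges from clock_counter == 0 until millis_counter increments."""
    clock_counter = 0
    cycles = 0
    while True:
        cycles += 1                      # one rising edge of CLOCK_50
        if clock_counter < threshold:
            clock_counter += 1
        else:
            return cycles                # this edge resets the counter and
                                         # increments millis_counter


if __name__ == "__main__":
    print(cycles_per_millis_increment(50000))
    print(cycles_per_millis_increment(49999))
    # 50000 fits into the 16-bit clock_counter register
    print(50000 < 2 ** 16)
```

The difference is about 20 parts per million, which does not matter for a game delay, but it is worth knowing if the counter is ever used for longer measurements.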

To read the millis_counter register, I have introduced another port address for the IN instruction:

4'b0011: begin
  // IN reg, [xx]
  `ifdef DEBUG
  $display("%2x: IN r%-d, [%4d]",ir[3:0], (ir[11:8]), data);
  `endif
  case (mc_count)
    0: begin
      mbr <= data;  // remember the address of IO port
      mc_count <= 1;
      pc <= pc + 2'd2;  // move to the next instruction
    end
    1: begin
      case (mbr)
        64: begin    // UART RX DATA
          regs[ir[11:8]] <= {8'b0, rx_data_r};
        end
        65: begin   // UART TX BUSY
          regs[ir[11:8]] <= tx_busy;
        end
        68: begin    // keyboard data
          regs[ir[11:8]] <= {8'b0, ps2_data_r};
        end
        69: begin // milliseconds counted so far
          regs[ir[11:8]] <= millis_counter;
        end
      endcase // end of case(mbr)
      ir <= 0;      // initiate fetch
      addr <= pc >> 1;
    end
    default: begin
    end
  endcase  // end of case (mc_count)
end // end of IN reg, [xx]

The example of the usage can be found in the snakes.asm file:

; ################################################################
; function delay(r0)
; waits for r0 milliseconds
; ################################################################
delay:
  push r1
  push r2
delay_loop2:
  in r1, [PORT_MILLIS] ; port 69, remember the current counter value
delay_loop1:
  in r2, [PORT_MILLIS] ; port 69
  sub r2, r1
  jz delay_loop1 ; loop until the counter changes (one millisecond elapsed)
  dec r0
  jnz delay_loop2
  pop r2
  pop r1
  ret
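The behaviour of the delay routine can be tried out in Python with a fake milliseconds port. This is a sketch under assumptions: registers are taken to be 16 bits wide, FakeMillisPort is an invented stand-in for port 69 whose value advances after a fixed number of reads, and the Python delay function mirrors the assembler instruction by instruction.

```python
class FakeMillisPort:
    """Stand-in for port 69: the value advances once every `reads_per_ms` reads."""

    def __init__(self, reads_per_ms=3):
        self.reads = 0
        self.reads_per_ms = reads_per_ms

    def read(self):
        self.reads += 1
        return (self.reads // self.reads_per_ms) & 0xFFFF


def delay(ms, read_millis):
    """Software model of the delay routine (r0 = ms)."""
    r0 = ms & 0xFFFF
    while True:
        r1 = read_millis()                           # in r1, [PORT_MILLIS]
        while ((read_millis() - r1) & 0xFFFF) == 0:  # in r2 / sub r2, r1 / jz
            pass
        r0 = (r0 - 1) & 0xFFFF                       # dec r0
        if r0 == 0:                                  # jnz delay_loop2
            break


if __name__ == "__main__":
    port = FakeMillisPort(reads_per_ms=3)
    delay(5, port.read)
    print(port.reads, "reads,", port.reads // port.reads_per_ms, "ms elapsed")
```

Two details follow from the structure of the loop: the first "millisecond" only lasts until the next counter change, so it can be shorter than a full millisecond, and calling the routine with r0 = 0 wraps around and waits for 65536 counter changes instead of returning immediately.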

Conclusion

This was the first game made for the FPGA Computer. The game is quite simple and uses video text mode. I had to implement a milliseconds counter register and the corresponding IN port to read it. It was used to implement the delay function. 




Thursday, 30 August 2018

PS/2 keyboard and FPGA Computer

Added PS/2 keyboard to the FPGA Computer

This is a follow-up of the FPGA computer post. 

I have added a keyboard port to the FPGA Computer. The port is PS/2 because it is easier to work with the PS/2 than with the USB HID protocol. The final look is here (you will recognize the purple PS/2 keyboard connector):

The hardware part of this project is simple - add four resistors and a PS/2 connector:
Now the board has three connectors: PS/2, VGA and UART.

PS/2 connector is connected to the GPIO ports of the DE0-NANO board:
- Data is connected to the GPIO31 (PIN_D11) port
- Clock is connected to the GPIO33 (PIN_B12) port.

The communication between the keyboard and the computer is clocked serial. Clock pulses appear on the Clock pin, while data is on the Data pin, sampled on the falling edge of the Clock. Each byte is framed by one start bit, one parity bit and one stop bit. Here are oscilloscope snapshots of the "A" key being pressed (and then released):

The waveform below is the make code of the "A" key (1C hex)


The waveform below is the first byte of the "A" break code (F0 hex)

The waveform below is the second byte of the "A" break code (1C hex)
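The framing seen in these waveforms can be reproduced in Python. The sketch below uses the standard PS/2 device-to-host frame of 11 bits: a start bit (0), eight data bits sent least significant bit first, an odd parity bit and a stop bit (1). The function names are made up, and this is not the ps2_read Verilog module used on the board.

```python
def encode_frame(byte):
    """Return the 11 bits a PS/2 keyboard sends for one byte."""
    data = [(byte >> i) & 1 for i in range(8)]   # LSB first
    parity = 1 - (sum(data) % 2)                 # odd parity over data + parity
    return [0] + data + [parity, 1]              # start, data, parity, stop


def decode_frame(bits):
    """Check framing and parity of an 11-bit frame and return the data byte."""
    if len(bits) != 11:
        raise ValueError("a PS/2 frame has 11 bits")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))


if __name__ == "__main__":
    for code in (0x1C, 0xF0, 0x1C):              # "A" pressed, then released
        frame = encode_frame(code)
        print(f"{code:02X}", frame, f"{decode_frame(frame):02X}")
```

The three frames printed by the example correspond to the three waveforms above: the make code 1C, followed by the two bytes of the break code, F0 and 1C.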

Keyboards work by sending the make and the break codes for each key. Make code is sent when the key is pressed, while the break code is sent when the key is released. For example, when we press and then release the "A" key, we get the following sequence:
1C F0 1C
This could be interpreted as: A pressed (1C), A released (F0 1C)

Unfortunately, it is not all that simple. First of all, if you quickly press A and S, one after another, you can get the following sequence:
1C 1B F0 1B F0 1C
This can be interpreted as: make code of "A", make code of "S", break code of "S" and break code of "A".

When you press Shift + A, you will get the following sequence:
12 1C F0 1C F0 12
Shift pressed, A pressed, A released, Shift released

When you press A for a long time (autorepeat will occur):
1C 1C 1C 1C 1C F0 1C
A pressed, A pressed, A pressed, A pressed, A pressed, A released (F0 1C)

To make things more complicated, extended key codes (both make and break) have been introduced, for some keys. For example, the Cursor Down (Arrow Down) key produces the following sequence:
E0 72 E0 F0 72
Cursor down pressed (E0 72), Cursor down released (E0 F0 72).

And so on... All this makes parsing a bit complicated, but eventually you will be able to figure it out.
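The sequences above can be parsed with a very small state machine. The Python sketch below only illustrates the idea and is not the assembler handler from the FPGA Computer: it remembers whether an E0 (extended) or F0 (break) prefix has been seen and emits an event when the key code itself arrives. The function name and the event format are invented, and special cases such as the Pause key are ignored.

```python
def parse_scancodes(stream):
    """Turn raw PS/2 bytes into (action, extended, code) events."""
    events = []
    extended = False   # an E0 prefix was seen
    released = False   # an F0 prefix was seen
    for byte in stream:
        if byte == 0xE0:
            extended = True
        elif byte == 0xF0:
            released = True
        else:
            action = "released" if released else "pressed"
            events.append((action, extended, byte))
            extended = False
            released = False
    return events


if __name__ == "__main__":
    examples = {
        "A": [0x1C, 0xF0, 0x1C],
        "Shift + A": [0x12, 0x1C, 0xF0, 0x1C, 0xF0, 0x12],
        "Cursor down": [0xE0, 0x72, 0xE0, 0xF0, 0x72],
    }
    for name, raw in examples.items():
        print(name, parse_scancodes(raw))
```

Autorepeat needs no special handling here: a held key simply produces several "pressed" events in a row before the final "released" event.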

The next step was to add the support for the keyboard within the FPGA Computer.

Introducing the keyboard interrupt

I have introduced a new interrupt for the keyboard - the IRQ#2. This IRQ is triggered when a byte from PS/2 keyboard arrives. The CPU then jumps to the address of 24 decimal, where the raw PS/2 keyboard handling routine should be. Actually, at that address should be one JUMP instruction which will jump to the handling routine.

In the main computer module, I have instantiated the PS/2 module:
// ####################################
// PS/2 keyboard instance
// ####################################
wire [7:0] ps2_data;
wire ps2_received;
reg [7:0] ps2_data_r;

ps2_read ps2(
  CLOCK_50,
  reset,
  gpio0[31], // Input pin - PS/2 data line
  gpio0[33], // Input pin - PS/2 clock line
  ps2_data,  // here we will receive a character
  ps2_received  // if something came from PS/2, this goes high
); 

Then I have detected the byte being received from the PS/2 module and triggered the IRQ:



always @ (posedge CLOCK_50) begin
  // ######### IRQ2 - keyboard ######
  if (ps2_received) begin
    ps2_data_r <= ps2_data;
    // if we have received a byte from
    // the keyboard, we will trigger the IRQ#2
    irq[2] <= 1'b1;
  end
  ...



In the cpu.v module, I have added support for the new interrupt:
if (irq_r[2]) begin
  `ifdef DEBUG
  LED[7] <= 1;
  $display("3.1 JUMP TO IRQ #2 SERVICE");
  `endif
  pc <= 16'd24;
  addr <= 16'd12;
end

So, to receive bytes from the PS/2 keyboard, a programmer must register the IRQ#2 handler:
; set the IRQ handler for keyboard to our own IRQ handler
mov r0, 1 ; JUMP instruction opcode
mov r1, IRQ2_ADDR ; IRQ#2 vector address
st [r1], r0
mov r0, irq_triggered
mov r1, IRQ2_ADDR + 2   
st [r1], r0
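What these six instructions leave in memory can be shown with a tiny Python model. It is a sketch only: memory is modelled as a dictionary from byte address to 16-bit word, install_jump is a made-up helper, and IRQ2_ADDR is taken to be 24 because that is the address the CPU jumps to on IRQ#2.

```python
IRQ2_ADDR = 24     # the CPU jumps here on IRQ#2 (from the text)
JUMP_OPCODE = 1    # the value the assembler code stores as the JUMP instruction


def install_jump(memory, vector_addr, handler_addr):
    """Write a two-word 'JUMP handler' instruction at vector_addr."""
    memory[vector_addr] = JUMP_OPCODE               # first word: the opcode
    memory[vector_addr + 2] = handler_addr & 0xFFFF  # second word: the target


if __name__ == "__main__":
    memory = {}
    install_jump(memory, IRQ2_ADDR, 0x0456)   # 0x0456 is an arbitrary example
    for addr in sorted(memory):
        print(addr, hex(memory[addr]))
```

The same two-word pattern, an opcode followed by a target address, is what the key pressed and key released callbacks use further down.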

Since this is raw PS/2 handling, the programmer must write the complete make/break code handling. I have done that in this example.

Unfortunately, the code is quite long since it has to deal with the raw PS/2 protocol. The code demonstrates parsing the raw PS/2 protocol and it looks like those vintage screen editors:

How to use the keyboard? First of all, two callbacks should be registered - one for the key pressed, and the other one for the key released:
mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_PRESSED_HANDLER_ADDR
st [r1], r0
mov r0, pressed ; key pressed routine address
mov r1, KEY_PRESSED_HANDLER_ADDR + 2
st [r1], r0

mov r0, 1 ; JUMP instruction opcode
mov r1, KEY_RELEASED_HANDLER_ADDR
st [r1], r0
mov r0, released ; key released routine address
mov r1, KEY_RELEASED_HANDLER_ADDR + 2
st [r1], r0

Both callbacks will then need to obtain the virtual key code of the key pressed (or released) by reading from the location 48 (VIRTUAL_KEY_ADDR):

pressed:
ld r0, [VIRTUAL_KEY_ADDR]
cmp r0, VK_F1
...

released:
ld r1, [VIRTUAL_KEY_ADDR]
...

What is the Virtual Key Code? It is a number assigned to each key, so all the programs would get the same number when a key is pressed, or released. In the code above, VK_F1 is the constant assigned to the F1 key, so the programmer can determine if the F1 was pressed by writing cmp r0, VK_F1.

Then, if needed, the programmer can call the vk_to_char function, which translates a virtual key to the actual character, if possible (not all keys produce characters; the F1 key does not produce a character, for example):

; ###############################
; r1 = function vk_to_char(r1)
; translates virtual key to character
; if shift is pressed, does the uppercase
; ###############################
vk_to_char:
push r0
push r2
...

Conclusion

Most examples of keyboard support on the net use PS/2 keyboards, since the USB HID protocol is quite complex and PS/2 isn't. I took the same path. I have a couple of spare keyboards, some of them PS/2, so I soldered the PS/2 female connector and those four resistors from the schematics above. From that point on, everything was programming - a little bit of Verilog programming, and much more assembler programming.

Saturday, 25 August 2018

UART Loader

FPGA Computer UART Loader

This is a follow-up of the FPGA computer post. 

I have developed the UART Loader for the FPGA Computer to be able to send programs to it. It is based on the UART module developed in Verilog for the FPGA Computer. This module provides both sending and receiving of bytes at 115200 baud, 8 data bits, 1 start bit, 1 stop bit, no parity. The serial port of the FPGA computer is connected to a TTL serial-to-USB dongle, which is then connected to a USB port of the PC:

When I initially created the FPGA Computer, I was able to store just one program in it, by hardcoding it in the RAM. Here is the part of the RAM.v Verilog module that includes the program in the RAM:

// Declare the RAM variable
reg [N-1:0] ram[32767:0];

initial
begin
  $readmemh("program.hex", ram);
end

The problem with this approach is that it is very slow: the program has to be embedded into the computer when the design is built, which can take several minutes. That is why I have devised the Loader. It is hardcoded in the RAM module, and when the computer powers on, it starts executing at address 0x0000, where I have placed a JUMP instruction to go to the Loader:

; ########################################################
; RESET CODE (4 bytes max)
; ########################################################
#addr 0x0000
j start

When started, Loader sends an initialisation sequence of bytes to the PC, via UART:

; send raspbootin boot char sequence
mov r0, 77 ; "M" character
call uart_send
mov r0, 13 ; \r character (carriage return)
call uart_send
mov r0, 10 ; \n character (line feed)
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send
mov r0, 3
call uart_send

This sequence is inherited from the original Raspbootin protocol for which I have made a Java implementation. This version is similar, but I have added a checksum at the end (more about this below).

The Loader then fetches the number of bytes to be received:

first_byte:
in r1, [64] ; get the char from the uart
st [size], r1 ; store the lowest byte to the size variable
inc [state] ; next state -> 1 (second byte)
j skip ; return from interrupt
second_byte:
in r1, [64] ; get the char from the uart (8 upper bits)
ld r2, [size] ; get the lower 8 bits (received earlier)
shl r1, 8 ; shift the received byte 8 bits
or r1, r2 ; put together lower and upper 8 bits
st [size], r1 ; store the calculated size
inc [state] ; next state 
j skip ; return from interrupt

After that, the Loader sends back the received size (so the PC can verify that the Loader got the correct number of bytes):

; this is 16-bit cpu, so we don't load code bigger than 65535 bytes
; send confirmation that the code has been loaded
ld r0, [size]
and r0, 255
call uart_send
ld r0, [size]
shr r0, 8
call uart_send
inc [state] ; next state ->  (code arrives)

After that, all incoming bytes are loaded into the memory, starting from the 0x400 address:

in r1, [64] ; get the byte from the uart into r1

mov r2, r1
ld r0, [sum_all]
add r0, r2
st [sum_all], r0 ; primitive checksum - sum of all bytes
; at this moment, r1 holds the received byte
ld r2, [current_addr]
st.b [r2], r1 ; store the received byte into the memory
inc r2 ; move to the next location in memory
st [current_addr], r2   ; save the incremented value of the address

ld r2, [current_size]   ; increment the byte counter
inc r2
st [current_size], r2
cmp r2, [size] ; did we receive all?
jz all_arrived
j skip

When all bytes are received, the Loader sends back the primitive checksum, so the PC can check if everything is OK:

all_arrived:
; send the sum of all bytes
ld r0, [sum_all]
and r0, 255
call uart_send
ld r0, [sum_all]
shr r0, 8
call uart_send

mov r0, 1; signal to the main program ->loader has received all
st [loaded], r0

After that, the Loader jumps to the 0x400 address:

not_loaded:
ld r0, [loaded]
cmp r0, 1
jz 0x400
nop
j not_loaded
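The whole receive sequence described above can be summarised as a small state machine. The Python class below is a behavioural model written for illustration, not a translation of the Loader source: the class and method names are invented, the size echo is assumed to happen right after the second size byte, and the checksum is assumed to be a 16-bit sum that wraps around.

```python
class LoaderModel:
    """Software model of the Loader's byte-by-byte receive logic."""

    LOAD_ADDR = 0x400

    def __init__(self):
        self.state = 0        # 0: size low, 1: size high, 2: payload, 3: done
        self.size = 0
        self.received = 0
        self.sum_all = 0
        self.memory = {}
        self.loaded = False

    def boot_sequence(self):
        # "M", \r, \n and three 0x03 bytes, as sent by the Loader at start
        return bytes([77, 13, 10, 3, 3, 3])

    def feed(self, byte):
        """Process one received byte and return the bytes sent back."""
        if self.state == 0:                      # low byte of the size
            self.size = byte
            self.state = 1
            return b""
        if self.state == 1:                      # high byte of the size
            self.size |= byte << 8
            self.state = 2
            return bytes([self.size & 0xFF, self.size >> 8])   # echo the size
        if self.state == 2:                      # payload byte
            self.memory[self.LOAD_ADDR + self.received] = byte
            self.sum_all = (self.sum_all + byte) & 0xFFFF
            self.received += 1
            if self.received == self.size:
                self.state = 3
                self.loaded = True               # main loop jumps to 0x400
                return bytes([self.sum_all & 0xFF, self.sum_all >> 8])
        return b""


if __name__ == "__main__":
    loader = LoaderModel()
    program = bytes([0x01, 0x02, 0xFF])
    reply = b"".join(loader.feed(b) for b in bytes([3, 0]) + program)
    print(reply.hex(), loader.loaded)
```

Both the size and the checksum travel low byte first, which matches the order of the uart_send calls in the assembler listings.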

For the PC side, I have modified the Raspbootin loader, originally used for Raspberry Pi bare metal programming; it is also stored on GitHub.
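To give an idea of what the PC side has to do, here is a minimal Python sketch of the exchange. It is not the modified Raspbootin loader mentioned above: the upload function and its write/read parameters are made up for this example, and the serial port is abstracted as two callables so that the logic can be tried without hardware.

```python
import io


def upload(payload, write, read):
    """Send `payload` to the Loader; return True if size and checksum match.

    write(data) sends bytes, read(n) returns up to n received bytes.
    """
    if not 0 < len(payload) <= 0xFFFF:
        raise ValueError("payload must be 1..65535 bytes")

    # Wait for the end of the boot sequence: three consecutive 0x03 bytes.
    run = 0
    while run < 3:
        received = read(1)
        if not received:
            raise TimeoutError("no boot sequence from the Loader")
        run = run + 1 if received[0] == 3 else 0

    # Send the size, low byte first, and check the echoed size.
    size = len(payload)
    write(bytes([size & 0xFF, size >> 8]))
    echo = read(2)
    if len(echo) != 2 or (echo[0] | (echo[1] << 8)) != size:
        return False

    # Send the program and compare the 16-bit sum of all bytes.
    write(bytes(payload))
    expected = sum(payload) & 0xFFFF
    reply = read(2)
    return len(reply) == 2 and (reply[0] | (reply[1] << 8)) == expected


if __name__ == "__main__":
    # Scripted Loader replies for a 3-byte program: boot sequence,
    # echoed size (3), checksum (0x0102).
    replies = io.BytesIO(bytes([77, 13, 10, 3, 3, 3, 3, 0, 2, 1]))
    sent = []
    print(upload(b"\x01\x02\xff", sent.append, replies.read))
```

With a real serial port, write and read would be the corresponding methods of whatever serial library is used, opened at 115200 baud, 8 data bits, no parity, 1 stop bit.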

Conclusion

When I tried Raspberry Pi bare metal programming, I immediately had the problem of transferring programs from the PC to the RPI. Usually, there is no network (it is a bare metal platform with almost no I/O libraries) and the only other way is to transfer programs via micro SD cards (the card dance). You would cross-compile the program on the PC, save it to the SD card, eject it, put it in the RPI, and reset the RPI. And then again, and again...

That was the motivation for programmers to develop some kind of a loader for the RPI. One of those loaders is Raspbootin. It is fairly simple. I re-used it for exactly the same purpose - to load programs on my FPGA Computer from the PC. The only problematic part of this development was debugging the Loader. It could only be done on the FPGA, with those couple-of-minutes builds. Once I survived that, I was able to cross-assemble programs on my PC and send them to the board via the Loader.