Saturday, 26 December 2020

Networking with the FPGA computer

This is a followup of my original post.

As I mentioned in the SPI-related post, I have added the SPI interface to my FPGA computer. Not one, but two: one for the SD card, and the other one for the Ethernet card. Today I am going to talk about the Ethernet.

First of all, I used the ENC28J60 module, which I already use for Ethernet connectivity on my Raspberry Pi Zero and Arduino/ESP32 boards. This is a rather simple module that uses SPI as the interface to the host computer. Since I had already used this module with the Arduino and ESP32, I decided to reuse the corresponding Arduino library and adjust it to work with my FPGA computer.

The library I used for the Arduino is: https://github.com/njh/EtherCard

This library is written in C++. Since I haven't finished porting GCC to my FPGA, I don't have C++ support yet. This means that I had to convert the code from C++ to plain C. Once that was done, the only thing left was to replace the Arduino-based SPI code with my FPGA SPI code. For example, one of the original functions was:

static void writeOp (byte op, byte address, byte data) {
    enableChip();
    SpiPtr->beginTransaction(SPISettings(spiClk, MSBFIRST, SPI_MODE0));
    SpiPtr->transfer(op | (address & ADDR_MASK));
    SpiPtr->transfer(data);
    SpiPtr->endTransaction();
    disableChip();
}

My code is:

void enc28j60WriteOp(uint8_t op, uint8_t address, uint8_t data)
{
        chipSelectLowE();
        // issue write command
        spiSendE(op | (address & ADDR_MASK));
        // write data
        spiSendE(data);
        chipSelectHighE();
}
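To check a port like this without the hardware, the SPI layer can be swapped for a recorder. The following is only a sketch: the recording versions of chipSelectLowE, chipSelectHighE and spiSendE are hypothetical stand-ins for the real FPGA routines, and the values of ADDR_MASK (0x1F) and the write-control-register opcode (0x40) are taken from the usual ENC28J60 driver headers rather than from this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_MASK               0x1F /* low 5 bits carry the register address */
#define ENC28J60_WRITE_CTRL_REG 0x40 /* "write control register" opcode */

/* Hypothetical stand-in for the FPGA SPI port: it only records the bytes. */
static uint8_t spi_log[16];
static size_t  spi_log_len = 0;
static int     chip_selected = 0;

static void chipSelectLowE(void)  { chip_selected = 1; }
static void chipSelectHighE(void) { chip_selected = 0; }

static void spiSendE(uint8_t b)
{
    assert(chip_selected);                /* bytes only count while CS is low */
    assert(spi_log_len < sizeof spi_log); /* do not overrun the recorder */
    spi_log[spi_log_len++] = b;
}

/* Same body as the ported function above. */
void enc28j60WriteOp(uint8_t op, uint8_t address, uint8_t data)
{
    chipSelectLowE();
    spiSendE(op | (address & ADDR_MASK)); /* command byte: opcode + address */
    spiSendE(data);                       /* payload byte */
    chipSelectHighE();
}
```

Writing 0xA5 to register 0x1B with the 0x40 opcode should put exactly two bytes on the bus, 0x5B and 0xA5, with chip select released afterwards.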

TCP/IP support is built into the EtherCard library. It also had to be converted from C++ to C. After that, I was able to use the library.

Beyond running simple TCP/IP examples, I decided to put this network support to practical use: I added network drive support to my BASIC interpreter through a new DRIVE command. DRIVE 0 selects the SD card. DRIVE 1 selects the UART-based drive, which communicates with the Raspbootin application on the PC via UART, while DRIVE 2 selects the network drive, which also communicates with the Raspbootin application on the PC, but this time over Ethernet.
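A minimal sketch of how such a drive selector could look in C is shown here. The names drive_id, select_drive and drive_description are hypothetical, and so is the choice of the SD card as the startup default; only the numbering 0, 1, 2 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Drive numbers as accepted by the DRIVE command. */
typedef enum {
    DRIVE_SD       = 0, /* SD card over SPI */
    DRIVE_UART     = 1, /* Raspbootin on the PC, over UART */
    DRIVE_ETHERNET = 2, /* Raspbootin on the PC, over Ethernet */
    DRIVE_COUNT
} drive_id;

/* Assumed default: start on the SD card. */
static drive_id current_drive = DRIVE_SD;

/* Returns 0 on success, -1 if the drive number is unknown
   (the current drive is left unchanged in that case). */
int select_drive(int n)
{
    if (n < 0 || n >= DRIVE_COUNT)
        return -1;
    current_drive = (drive_id)n;
    return 0;
}

/* Human-readable name, e.g. for an error or status message. */
const char *drive_description(drive_id d)
{
    assert((int)d >= 0 && (int)d < DRIVE_COUNT);
    switch (d) {
    case DRIVE_SD:       return "SD card";
    case DRIVE_UART:     return "Raspbootin over UART";
    case DRIVE_ETHERNET: return "Raspbootin over Ethernet";
    default:             return NULL;
    }
}
```

Commands such as DIR or LOAD would then branch on current_drive to pick the SD, UART or Ethernet code path.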

Here is the snapshot of the FPGA screen:


In the example above, the network drive was selected and the directory on the PC was listed. The C1.BAS program was then loaded from the PC over the network drive.

The C code for the DIR command is here:

// called when the client request is complete
void my_callback (uint8_t status, uint16_t off, uint16_t len) {
    memcpy(to_print_buff, eth_buffer + off, len);
    to_print_len = len;
}

...
// DRIVE 2 - ETHERNET NETWORK DRIVE
to_print_len = 0;
browseUrl("/dir", "", server_ip, 0, my_callback);
for (i = 0; i < 1000; i++) { // approx. 1MB max file size
    packetLoop(enc28j60PacketReceive(4500, eth_buffer));
    if (to_print_len > 0) {
        to_print_buff[to_print_len] = 0;
        printf("%s\n", to_print_buff);
        to_print_len = 0;
        return;
    }
}
printf("NETWORK TIMEOUT\n");
...
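The callback above trusts the offset and length it is given, and the caller later writes a terminating zero at to_print_buff[to_print_len]. A defensive variant could clamp both values first. This is only a sketch: the buffer sizes and the name my_callback_bounded are assumptions made for illustration, not values from the actual project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_BUFFER_SIZE   128 /* assumed size, for illustration only */
#define TO_PRINT_CAPACITY 64  /* assumed size, includes the trailing '\0' */

static uint8_t  eth_buffer[ETH_BUFFER_SIZE];
static char     to_print_buff[TO_PRINT_CAPACITY];
static uint16_t to_print_len;

/* Like my_callback, but never reads past eth_buffer and never
   writes past to_print_buff; the result is always zero-terminated. */
void my_callback_bounded(uint8_t status, uint16_t off, uint16_t len)
{
    (void)status; /* unused in this sketch */

    if (off >= ETH_BUFFER_SIZE) { /* payload would start outside the buffer */
        to_print_buff[0] = '\0';
        to_print_len = 0;
        return;
    }
    if (len > ETH_BUFFER_SIZE - off)  /* clamp to what the packet buffer holds */
        len = (uint16_t)(ETH_BUFFER_SIZE - off);
    if (len > TO_PRINT_CAPACITY - 1)  /* leave room for the terminator */
        len = TO_PRINT_CAPACITY - 1;

    memcpy(to_print_buff, eth_buffer + off, len);
    to_print_buff[len] = '\0';
    to_print_len = len;
}
```

With this variant a long directory listing is truncated instead of overflowing the print buffer.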

The code on the PC side is here:

if (req.startsWith("/dir")) {
    File currFile = new File(Rest.path);
    File dir = currFile.getParentFile().getParentFile();
    System.out.println(dir.getCanonicalPath());
    File[] files = dir.listFiles();
    StringBuilder sb = new StringBuilder();
    for (File f : files) {
        if (f.isDirectory()) {
            sb.append("<" + f.getName() + ">");
            sb.append("\n");
        }
    }
    for (File f : files) {
        if (f.isFile()) {
            sb.append(f.getName());
            sb.append("\n");
        }
    }
    String str = sb.toString();
    int size = str.length();
    System.out.println("size: " + size);
    out.print(str);
}
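The server therefore answers with one entry per line, directories first and wrapped in angle brackets, then plain file names. On the receiving side, such a listing could be split into directories and files as sketched below. The function count_listing and the listing_counts type are hypothetical helpers, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t dirs;  /* lines of the form <name> */
    size_t files; /* all other non-empty lines */
} listing_counts;

/* Walks a newline-separated listing and counts both kinds of entries. */
listing_counts count_listing(const char *listing)
{
    listing_counts c = {0, 0};
    const char *line = listing;

    assert(listing != NULL);
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= 2 && line[0] == '<' && line[len - 1] == '>')
            c.dirs++;
        else if (len > 0)
            c.files++;

        line += len;      /* move to the end of this line */
        if (*line == '\n')
            line++;       /* and skip the separator, if any */
    }
    return c;
}
```

For the listing "<src>\n<bin>\nC1.BAS\nHELLO.BAS\n" this yields two directories and two files.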

Conclusion

Since I have implemented the SPI interface on the FPGA computer, it was possible to connect the ENC28J60 Ethernet module to it. I cannot stress enough how important it was for me to be able to port GCC to my platform. That allows me to use all sorts of C code instead of programming in assembly language.

The network drive also makes development easy: I do all the programming on my PC, compile with GCC, and then load the program onto the FPGA computer over Ethernet. There is no need to transfer the program to the SD card (the card dance) or to use the slower UART. This time I use Ethernet!


Thursday, 24 December 2020

Adding PS/2 mouse to my FPGA computer

This is a followup of my original post.

So far I had only a PS/2 keyboard on my FPGA computer. The time has come to add a mouse, too. Without investigating how a PS/2 mouse works, I first tried plugging the mouse into my PS/2 keyboard connector to see what would come out of it. It didn't work. The keyboard worked, but the mouse didn't. I expected the mouse to send bytes as I moved or clicked it, but it didn't. After a brief investigation, I found out that a PS/2 mouse needs to be initialized before it starts sending bytes to the computer.

PS/2 is actually a bidirectional interface: both the computer and the mouse or keyboard can send bytes to the other side. The initialization sequence for the mouse essentially means that the computer needs to send one byte to the mouse. Unfortunately, that is not so simple. To send a command, the host (computer) first holds the clock line low for a given period of time (at least 100 microseconds), then pulls the data line low and releases the clock line. The mouse then starts generating the clock, and the host places the bits of the command on the data line in synchronization with that clock.
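The bits the host shifts out form an 11-bit frame: a start bit (0), eight data bits with the least significant bit first, an odd parity bit and a stop bit (1). The sketch below builds that frame in C for illustration only; in the actual design this is done in Verilog by the PS/2 controller. The function names are hypothetical, and 0xF4 is the standard "enable data reporting" command, which I assume is the byte the controller sends.

```c
#include <assert.h>
#include <stdint.h>

#define PS2_CMD_ENABLE_DATA_REPORTING 0xF4
#define PS2_FRAME_BITS                11

/* Odd parity: the parity bit is chosen so that the total number
   of 1 bits in data + parity is odd. */
uint8_t ps2_odd_parity(uint8_t data)
{
    uint8_t ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1u;
    return (uint8_t)((ones % 2 == 0) ? 1 : 0);
}

/* Packs the frame so that bit 0 of the result is sent first. */
uint16_t ps2_host_frame(uint8_t data)
{
    uint16_t frame = 0;                             /* bit 0: start bit = 0 */
    frame |= (uint16_t)(data << 1);                 /* bits 1..8: data, LSB first */
    frame |= (uint16_t)(ps2_odd_parity(data) << 9); /* bit 9: odd parity */
    frame |= (uint16_t)(1u << 10);                  /* bit 10: stop bit = 1 */
    return frame;
}

/* Value to drive on the data line for the given clock pulse (0..10). */
int ps2_frame_bit(uint16_t frame, int index)
{
    assert(index >= 0 && index < PS2_FRAME_BITS);
    return (frame >> index) & 1;
}
```

For 0xF4 the data byte already contains five 1 bits, so the parity bit is 0 and the packed frame is 0x5E8.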

Fortunately, there is a module that already does all these steps, and can be found here.

I have replaced my original PS2 module with this one and now I have two ports in my computer:

// ####################################
// PS/2 keyboard instance
// ####################################
wire [7:0] ps2_data;
wire ps2_received;
reg [7:0] ps2_data_r;

PS2_Controller #(.INITIALIZE_MOUSE(0)) PS2 (
    // Inputs
    .CLOCK_50           (CLOCK_50),
    .reset              (~KEY[0]),

    // Bidirectionals
    .PS2_CLK            (gpio0[33]),
    .PS2_DAT            (gpio0[31]),

    // Outputs
    .received_data      (ps2_data),
    .received_data_en   (ps2_received)
); 

// ####################################
// PS/2 mouse instance
// ####################################
wire [7:0] ps2_data_mouse;
wire ps2_received_mouse;
reg [7:0] ps2_data_r_mouse;

PS2_Controller PS2_mouse (
    // Inputs
    .CLOCK_50           (CLOCK_50),
    .reset              (~KEY[0]),

    // Bidirectionals
    .PS2_CLK            (gpio0[2]),
    .PS2_DAT            (gpio0[4]),

    // Outputs
    .received_data      (ps2_data_mouse),
    .received_data_en   (ps2_received_mouse)
); 

The default value for the INITIALIZE_MOUSE parameter is 1, so the mouse controller initializes the mouse at reset.

I have allocated another IRQ for the mouse: 

localparam IRQ_PS2_MOUSE   = 5;

In the main irq loop, the mouse actually triggers the CPU interrupt #5:

always @ (posedge clk100) begin
    ...
    // ############################### IRQ2 - PS/2 keyboard #############################
    if (ps2_received) begin
        ps2_data_r <= ps2_data;
        // if we have received a byte from the keyboard, we will trigger the IRQ#2
        irq[IRQ_PS2] <= 1'b1;
    end
    else 
    begin
        irq[IRQ_PS2] <= 1'b0;
    end
    // ############################### IRQ5 - PS/2 mouse #############################
    if (ps2_received_mouse) begin
        ps2_data_r_mouse <= ps2_data_mouse;
        // if we have received a byte from the mouse, we will trigger the IRQ#5
        irq[IRQ_PS2_MOUSE] <= 1'b1;
    end
    else 
    begin
        irq[IRQ_PS2_MOUSE] <= 1'b0;
    end
    ...
end

When the CPU detects that some bit in the irq register is set to 1, it triggers the interrupt handler routine. It does that by first checking if the interrupt vector is not zero. After that, the CPU pushes the current PC register and flags to the stack and then jumps to the interrupt handling routine. 
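The sequence just described can be modelled in a few lines of C. This is a simplified software model for illustration, not the CPU's Verilog: the cpu_model structure, the stack layout, the number of interrupt lines and the function name take_interrupt are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_COUNT   8  /* assumed number of interrupt lines */
#define STACK_WORDS 16 /* small stack, enough for the model */

typedef struct {
    uint32_t pc;
    uint32_t flags;
    uint32_t stack[STACK_WORDS];
    int      sp;                /* number of words currently on the stack */
    uint32_t vector[IRQ_COUNT]; /* handler address per IRQ, 0 = not installed */
} cpu_model;

/* Returns 1 if the interrupt was taken, 0 if it was ignored. */
int take_interrupt(cpu_model *cpu, uint8_t irq_bits, int irq_number)
{
    assert(irq_number >= 0 && irq_number < IRQ_COUNT);

    if (!(irq_bits & (1u << irq_number)))
        return 0;                        /* this line is not raised */
    if (cpu->vector[irq_number] == 0)
        return 0;                        /* no handler installed */

    assert(cpu->sp + 2 <= STACK_WORDS);
    cpu->stack[cpu->sp++] = cpu->pc;     /* save the return address */
    cpu->stack[cpu->sp++] = cpu->flags;  /* then the flags */
    cpu->pc = cpu->vector[irq_number];   /* jump to the handler */
    return 1;
}
```

In this model, raising IRQ 5 with a handler installed at 0x2000 saves the old PC and flags and continues at 0x2000, while an IRQ without a handler is simply ignored.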

The mouse interrupt routine receives three bytes from the PS/2 mouse:

unsigned short int *PORT_PS2_MOUSE  = (unsigned short int *)(0x80000000 + 800)  ; // port for PS2 mouse

void ps2_mouse_irq_triggered()
{
asm 
    (
        "push r0\npush r1\npush r2\npush r3\npush r4\npush r5\npush r6\npush r7\npush r8\npush r9\npush r10\npush r11\npush r12\npush r13\n"
    );

    mouse_byte[mouse_counter++] = *PORT_PS2_MOUSE;
    if (mouse_counter == 3)
        mouse_counter = 0;

    asm 
    (
        "pop r13\npop r12\npop r11\npop r10\npop r9\npop r8\npop r7\npop r6\npop r5\npop r4\npop r3\npop r2\npop r1\npop r0\nmov.w sp,r13\npop r13\niret"
    );
}

void init_mouse() {
    mouse_counter = 0;
    *PS2_MOUSE_HANDLER_INSTR    = 1;
    *PS2_MOUSE_HANDLER_ADDR     = (int)&ps2_mouse_irq_triggered;
}

Three bytes for each mouse event come one by one. When all three bytes arrive, we are ready to process them in the main program:

        if ((mouse_counter == 0) && (
            mouse_byte[0] != old_mouse_byte[0] ||
            mouse_byte[1] != old_mouse_byte[1] || 
            mouse_byte[2] != old_mouse_byte[2])) {
                sprintf(str, "mouse: %d, %d, %d", mouse_byte[0], mouse_byte[1], mouse_byte[2]);
                draw(10, 20, RED, str);
                
                old_mouse_byte[0] = mouse_byte[0];
                old_mouse_byte[1] = mouse_byte[1];
                old_mouse_byte[2] = mouse_byte[2];
                ...

The first byte gives the button status, that is, which buttons are currently pressed. The second byte gives the movement along the x-axis (the speed of movement, or how far the mouse has moved since the previous report), while the third byte does the same for the y-axis. Both the second and the third byte are 8-bit signed values: if you read a value greater than 127, the movement is negative, and the actual value is obtained by subtracting 256 from it (for example, 254 means -2).
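As a sketch, decoding one three-byte packet could look like this. The mouse_event type and the function names are hypothetical. The button bit positions (bit 0 left, bit 1 right, bit 2 middle) follow the standard PS/2 mouse packet layout, and the movement bytes are treated as plain 8-bit signed values, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int left, right, middle; /* 1 = pressed */
    int dx, dy;              /* signed movement */
} mouse_event;

/* Values above 127 are negative: subtract 256 to get the real value. */
static int to_signed8(uint8_t raw)
{
    return raw > 127 ? (int)raw - 256 : (int)raw;
}

mouse_event decode_mouse_packet(const uint8_t packet[3])
{
    mouse_event ev;

    assert(packet != NULL);
    ev.left   = (packet[0] >> 0) & 1;
    ev.right  = (packet[0] >> 1) & 1;
    ev.middle = (packet[0] >> 2) & 1;
    ev.dx     = to_signed8(packet[1]);
    ev.dy     = to_signed8(packet[2]);
    return ev;
}
```

For the packet 0x09, 0xFE, 0x05 this gives the left button pressed, dx = -2 and dy = 5. The standard packet also carries sign and overflow bits for both axes in the first byte, which this sketch ignores.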

The whole demo can be seen here:



Conclusion

Adding a PS/2 mouse was not that complicated once I managed to find a suitable Verilog controller. After that, it was just a matter of allocating one more interrupt and writing the handlers for it.


Saturday, 19 December 2020

To BLIT or not to BLIT

This is a followup of my original post.

I have recently implemented the BLIT instruction for my FPGA computer. It is the simplest version of BLIT: copy the given number of bytes from the source memory location to the destination memory location. The syntax is like this:

mov.w r1, 1024  # destination address is in r1
mov.w r2, 9024  # source address is in r2
mov.w r3, 8000  # number of bytes is in r3
blit            # copy bytes

Registers r1, r2 and r3 are hardcoded. Later I might make it more flexible.

Results are quite impressive. When I copy 32KB using memcpy (not using BLIT), it takes approximately 100 milliseconds. When I use the BLIT instruction, it takes one millisecond!
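To put these numbers in perspective, they can be turned into throughput and into clock cycles per 16-bit word. The helpers below are a back-of-the-envelope sketch: they take 32KB as 32768 bytes, and the 100 MHz clock used in the example is an assumption based on the clk100 signal seen elsewhere in the design, not a measured figure.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in bytes per second for a copy of `bytes` bytes
   that took `microseconds` microseconds. */
uint64_t bytes_per_second(uint64_t bytes, uint64_t microseconds)
{
    assert(microseconds > 0);
    return bytes * 1000000u / microseconds;
}

/* Clock cycles spent per 16-bit word, multiplied by 100 to keep
   two decimal places in integer arithmetic. Assumes clock_hz is a
   whole number of MHz. */
uint64_t cycles_per_word_x100(uint64_t bytes, uint64_t microseconds,
                              uint64_t clock_hz)
{
    assert(bytes >= 2);
    uint64_t cycles = clock_hz / 1000000u * microseconds;
    uint64_t words  = bytes / 2;
    return cycles * 100u / words;
}
```

With these inputs, memcpy moves about 327,680 bytes per second and BLIT about 32,768,000 bytes per second. At an assumed 100 MHz that is roughly 6.1 clock cycles per word for BLIT against roughly 610 for memcpy.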

How is BLIT implemented? Here is the Verilog code:

4'b1000: begin
    // BLIT (r1, r2, r3) - r1 - dst; r2 - src; r3 - count
    case (mc_count)
        0: begin
            addr <= regs[2] >> 1;
            regs[2] <= regs[2] + 2;
            regs[3] <= regs[3] - 2;
            mc_count <= 1;
            next_state <= EXECUTE;
            state <= READ_DATA;
        end
        1: begin
            addr <= regs[1] >> 1;
            data_to_write <= data_r;
            regs[1] <= regs[1] + 2;
            next_state <= EXECUTE;
            state <= WRITE_DATA;
            if (regs[3] <= 0) begin
                mc_count <= 2;
            end
            else 
                mc_count <= 0;
        end
        2: begin
            state <= CHECK_IRQ;
            pc <= pc + 2;
        end
    endcase
end

In the code above, the CPU starts a memory read at the address held in the r2 register during the first microcode step (mc_count = 0). In the next step it takes the word it obtained (two bytes) and writes it to the address held in the r1 register. On each iteration both r1 and r2 are incremented by two and the r3 register is decremented by two; when r3 reaches zero, the instruction finishes.
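The same loop can be expressed as a small software model in C, which is handy for reasoning about edge cases. This is a sketch under stated assumptions: the memory is a tiny array of 16-bit words, the function name blit_model is hypothetical, and the byte count is required to be positive and even because the instruction always moves whole words.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64 /* tiny word-addressed memory for the model */

static uint16_t mem[MEM_WORDS];

/* r1 = destination byte address, r2 = source byte address,
   r3 = number of bytes to copy. */
void blit_model(uint32_t r1_dst, uint32_t r2_src, int32_t r3_count)
{
    assert(r3_count > 0 && r3_count % 2 == 0);
    do {
        assert((r2_src >> 1) < MEM_WORDS && (r1_dst >> 1) < MEM_WORDS);

        /* mc_count 0: read one word, advance the source, count down */
        uint16_t data_r = mem[r2_src >> 1];
        r2_src   += 2;
        r3_count -= 2;

        /* mc_count 1: write the word, advance the destination */
        mem[r1_dst >> 1] = data_r;
        r1_dst += 2;
    } while (r3_count > 0); /* mc_count 2 is reached when r3 hits zero */
}
```

Note that the byte addresses are shifted right by one to form word addresses, just as the Verilog does with regs[1] >> 1 and regs[2] >> 1.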

Conclusion

The BLIT instruction does not execute in parallel with the CPU. It blocks the CPU while executing. Even with this constraint, it is approximately a hundred times faster than copying bytes across memory using the memcpy function. Therefore, it is worth using.