Tuesday, June 30, 2020

SPI interface on my FPGA computer

This is a follow-up to my original FPGA computer post.

The SPI interface is something of a standard for connecting peripherals to a computer (or at least to a microcontroller). There is also the I2C interface, but I will focus on SPI in this post.

SPI stands for Serial Peripheral Interface. It is organized as master-slave communication. If our FPGA computer is the master, then the peripheral is the slave.

It usually has four important pins:
1. MISO (Master In Slave Out) - a wire which is used to transport data from slave to the master device,
2. MOSI (Master Out Slave In) - a wire which is used to transport data from master to the slave device,
3. SCK - clock (all data transfer is synchronized to this clock line), and
4. SS (Slave Select) - when active, the slave is selected (sometimes it is called CS - chip select). With this wire, it is possible to connect several peripherals to the same three mentioned wires (MISO, MOSI and SCK) and to have separate SS wires to each peripheral.
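
To make the roles of these wires concrete, here is a small host-side C model of one byte exchange. It is only a sketch under stated assumptions: the slave is modeled as a plain 8-bit shift register, bits travel most significant first, and the SS line and clock polarity are not modeled at all. The names spi_slave_model_t and spi_exchange_byte are made up for this illustration and are not part of the FPGA computer's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slave model: an 8-bit shift register that drives MISO
   with its top bit and takes MOSI in at the bottom on every clock. */
typedef struct {
    uint8_t shift_reg;
} spi_slave_model_t;

/* One full-duplex byte transfer, most significant bit first.
   Returns the byte the master collected on MISO. */
uint8_t spi_exchange_byte(spi_slave_model_t *slave, uint8_t tx)
{
    uint8_t rx = 0;
    for (int bit = 7; bit >= 0; bit--) {
        uint8_t mosi = (uint8_t)((tx >> bit) & 1u);               /* master drives MOSI */
        uint8_t miso = (uint8_t)((slave->shift_reg >> 7) & 1u);   /* slave drives MISO  */
        /* one SCK period: each side samples the wire driven by the other */
        rx = (uint8_t)((rx << 1) | miso);
        slave->shift_reg = (uint8_t)((slave->shift_reg << 1) | mosi);
    }
    return rx;
}

/* Example: the slave starts with 0xA5 loaded and the master sends 0x3C. */
void spi_exchange_example(void)
{
    spi_slave_model_t slave = { 0xA5 };
    uint8_t got = spi_exchange_byte(&slave, 0x3C);
    assert(got == 0xA5);              /* the master received the slave's byte */
    assert(slave.shift_reg == 0x3C);  /* the slave now holds the master's byte */
}
```

The point of the model is that SPI always moves data in both directions at once: to read a byte, the master still has to send one, which is why dummy 0xFF bytes show up so often in SD card code.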

Why did I choose to use SPI on my computer? First of all, SD cards have SPI built in, which means that every SD card is actually an SPI slave device. Next, I use the ENC28J60 Ethernet module for Ethernet connectivity on my Arduino/ESP32/Raspberry Pi Zero devices. That module has an SPI interface, too.


How did I integrate SPI into my FPGA computer? I found a very nice implementation in Verilog here:
https://github.com/nandland/spi-master

BTW, that guy has an excellent YouTube channel here: https://www.youtube.com/channel/UCsdA-aNqtMA1_2T15aXePWw

Next, I had to integrate that module into my FPGA computer. I decided to allocate an interrupt for incoming SPI data and to ignore the module-controlled SS pin (I activate the SS signal manually from code instead of leaving that job to the SPI module):
// ####################################
// SPI Master instance
// ####################################
wire spi_start;
wire [7:0] spi_in;
reg [7:0] spi_out;
wire spi_ready;
wire spi_received;
reg [7:0] spi_in_r;
reg fake_CS;

SPI_Master_With_Single_CS spi0 (
    .i_Clk(clk100),
    .i_Rst_L(KEY[0]),
    .i_TX_Count(1),
    .i_TX_DV(spi_start),
    .o_RX_Byte(spi_in),
    .i_TX_Byte(spi_out),
    .o_RX_DV(spi_received),
    .o_TX_Ready(spi_ready),

    .o_SPI_MOSI(gpio0[32]),
    .i_SPI_MISO(gpio0[30]),
    .o_SPI_Clk(gpio0[28]),
    .o_SPI_CS_n(fake_CS)
);


The code above creates an SPI module named spi0 and connects it to a set of wires and registers. Next, in the main interrupt part, when the spi_received wire goes high (a byte has arrived on SPI), the IRQ_SPI interrupt is triggered:
// ##################### IRQ3 - SPI Master #####################
if (spi_received) begin
    spi_in_r <= spi_in;
    // if we have received a byte from the MISO,
    // we will trigger the IRQ#3
    irq[IRQ_SPI] <= 1'b1;
end
else
begin
    irq[IRQ_SPI] <= 1'b0;
end


In the CPU module, the IRQ_SPI interrupt causes the processor to jump to the predefined interrupt handler routine at address 56:
else if (irq_r[IRQ_SPI]) begin
    // SPI byte received
    pc <= 16'd56;
    addr <= 16'd28;
    irq_r[IRQ_SPI] <= 0;
end


All you have to do is put some code at address 56 and return from the interrupt handler routine using the IRET assembly instruction:

spi_irq_triggered:
    push r0
    ld.w    r0, [PORT_SPI_IN]   # PORT_SPI_IN.5_1, PORT_SPI_IN
    ld.s    r0, [r0]    # _2, *PORT_SPI_IN.5_1
    zex.s   r0, r0  # _3, _2
    st.w    [received_byte], r0 # received_byte, _3
    mov.w   r0, 1   # tmp29,
    st.w    [received_from_slave], r0   # received_from_slave, tmp29
    pop r0
    iret

Now that I have the C compiler, the SPI interrupt handler routine can be written in C:
void init_spi()
{
    *SPI_HANDLER_INSTR  = 1;
    *SPI_HANDLER_ADDR   = (int)&spi_irq_triggered;
}

void spi_irq_triggered()
{
    received_byte = *PORT_SPI_IN;
    received_from_slave = 1;
    asm 
    (
        "mov.w sp,r13\npop r13\niret"
    );
}

In order to read the received byte and to send a byte to the SPI, we need some I/O operations. As usual, I have done that in both the direct and the memory-mapped way. Here is the direct way, using the IN and OUT assembly instructions:
// OUT [xx], reg
4'b0100: begin
    `ifdef DEBUG
    $display("%2x: OUT [%4d], r%-d",ir[3:0], data_r, (ir[15:12]));
    `endif
    case (mc_count)
        0: begin
            // get the xx
            addr <= (pc + 2) >> 1;
            pc <= pc + 2;
            mc_count <= 1;
            next_state <= EXECUTE;
            state <= READ_DATA;
        end
        1: begin
            mbr <= data_r;
            mc_count <= 2;
        end
        2: begin
            case (mbr)
                ...
                PORT_SPI_OUT: begin
                    spi_out <= regs[ir[15:12]];
                    spi_start <= 1'b1;
                end
                ...
                default: begin
                end
            endcase  // end of case (mbr)
            mc_count <= 3;
        end
        3: begin
            tx_send <= 1'b0;
            spi_start <= 1'b0;
            spi_start1 <= 1'b0;
            state <= CHECK_IRQ;
            pc <= pc + 2;
        end
        default: begin
        end
    endcase
end // end of OUT [xx], reg

What happens above? The OUT instruction occupies four bytes in memory. The first two bytes are the opcode of the instruction, and the second two bytes hold the port number (which limits the total number of available ports to 65536, but I think that is enough).

In the first cycle (step 0) of the OUT instruction, the CPU points the address at the two bytes that follow the two opcode bytes. Then the CPU waits for those two bytes to arrive (step 1).

Then the CPU checks which IO port number has been read from memory. If the port number is PORT_SPI_OUT, it means that we are trying to send a byte to the SPI, and the CPU sends the data to that port (step 2). In step 3 the CPU finishes sending and sets the next CPU state to be the IRQ check.
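
The same four steps can be sketched as a small software model in C. This is not the Verilog from the CPU, only an illustration under assumptions: memory is treated as an array of 16-bit words, the value given to PORT_SPI_OUT is invented (the post does not state the real port number), and cpu_model_t, exec_out and start_pulses are hypothetical names that exist only in this model.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_SPI_OUT 0x0044u /* hypothetical value, chosen only for this model */

typedef struct {
    uint16_t pc;           /* byte address of the instruction being executed */
    uint16_t regs[16];
    uint8_t  spi_out;      /* byte latched for the SPI master */
    int      spi_start;    /* start strobe seen by the SPI master */
    int      start_pulses; /* model-only counter: how often the strobe was raised */
} cpu_model_t;

/* Models OUT [port], reg on a memory of 16-bit words.
   src_reg stands for the register field taken from the opcode word. */
void exec_out(cpu_model_t *cpu, const uint16_t *mem, unsigned src_reg)
{
    /* steps 0 and 1: fetch the word that follows the opcode word */
    uint16_t port = mem[(cpu->pc + 2) >> 1];
    cpu->pc += 2;

    /* step 2: dispatch on the port number */
    if (port == PORT_SPI_OUT) {
        cpu->spi_out = (uint8_t)(cpu->regs[src_reg & 15u] & 0xFFu);
        cpu->spi_start = 1;
        cpu->start_pulses++;
    }

    /* step 3: drop the strobe and move on to the next instruction */
    cpu->spi_start = 0;
    cpu->pc += 2;
}
```

The model shows two details that are easy to miss in the Verilog: the program counter advances by four bytes in total (two in step 0, two in step 3), and the start strobe is high only between step 2 and step 3, so the SPI master sees a short pulse rather than a level.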

And, here is the memory-mapped IO way:
// Memory mapped IO
case (addr & 32'h3FFFFFFF)
    ...
    PORT_SPI_OUT/2: begin
        spi_out <= data_to_write;
        spi_start <= 1'b1;
    end
    ...
endcase

Memory-mapped is a bit simpler, but does the same job of sending a byte to the SPI.

OK, now that we have a working SPI interface, how can we use it to work with the SD card? I have put together some Frankenstein-like code, merging the original Arduino SD card code (written in C++) with other pieces of code from GitHub, so that I now have some elementary support for SD cards. For example:

uint8_t sdcard_init(){
  writeCRC_ = errorCode_ = inBlock_ = partialBlockRead_ = type_ = 0;
  // 16-bit init start time allows over a minute
  uint32_t t0 = (uint32_t)get_millis();
  uint32_t arg;
   // must supply min of 74 clock cycles with CS high.
  for (uint8_t i = 0; i < 10; i++) spiSend(0XFF);

  chipSelectLow();

  // command to go idle in SPI mode
  while ((status_ = cardCommand(CMD0, 0)) != R1_IDLE_STATE) {
    if (((uint32_t)get_millis() - t0) > SD_INIT_TIMEOUT) {
      error(SD_CARD_ERROR_CMD0);
      goto fail;
    }
  }
 
  // check SD version
  if ((cardCommand(CMD8, 0x1AA) & R1_ILLEGAL_COMMAND)) {
    type(SD_CARD_TYPE_SD1);
  } else {
    // only need last byte of r7 response
    for (uint8_t i = 0; i < 4; i++) status_ = spiRec();
    if (status_ != 0XAA) {
      error(SD_CARD_ERROR_CMD8);
      goto fail;
    }
    type(SD_CARD_TYPE_SD2);
  }
  ... }

In the code above, we see that there are some spi-related functions, like spiSend() or spiRec(). Here are those:

void spiSend(int b)
{
    received_from_slave = 0;
    unsigned short int busy;
    do 
    { 
        busy = *PORT_SPI_OUT_BUSY;
    } while (busy);
    *PORT_SPI_OUT = b; //send the byte to the SPI
    
    do 
    { 
        busy = *PORT_SPI_OUT_BUSY;
    } while (busy);
}

uint8_t spiRec(void) {
    spiSend(0xFF);
    return read_spi();
}
int read_spi()
{
    while (!received_from_slave || *PORT_SPI_OUT_BUSY) 
    {
    }
    return received_byte;
}

Now, when we look at the spi_irq_triggered() function, we see that whenever that interrupt routine is triggered by the incoming byte from the SPI, that byte is stored in the received_byte variable. That byte is returned from the read_spi() function to the spiRec() function, and from that to the caller function.
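
The handshake between the sender, the interrupt handler and the reader can be tried out on a PC with a small mock. This is a sketch, not the code that runs on the FPGA computer: the interrupt is replaced by a direct function call, and the slave is a made-up device that answers with the inverted byte. All the mock_* names are hypothetical. Only received_from_slave and received_byte are taken from the code above.

```c
#include <assert.h>
#include <stdint.h>

volatile int     received_from_slave; /* set by the interrupt handler */
volatile uint8_t received_byte;       /* last byte delivered by the handler */

/* Stand-in for spi_irq_triggered(): on the FPGA this runs when IRQ_SPI fires. */
void mock_spi_irq(uint8_t byte_from_slave)
{
    received_byte = byte_from_slave;
    received_from_slave = 1;
}

/* Hypothetical slave used only for this demo: answers with the inverted byte. */
uint8_t mock_slave_reply(uint8_t sent)
{
    return (uint8_t)~sent;
}

void mock_spi_send(uint8_t b)
{
    received_from_slave = 0;            /* forget the previous transfer */
    mock_spi_irq(mock_slave_reply(b));  /* real hardware does this later, asynchronously */
}

uint8_t mock_spi_rec(void)
{
    mock_spi_send(0xFF);                /* clock out a dummy byte to get one back */
    while (!received_from_slave) {
        /* wait for the handler to publish the byte */
    }
    return received_byte;
}
```

Clearing received_from_slave before the transfer starts is the important part: without it, the reader could pick up the flag left over from the previous byte and return stale data.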

OK, what next? How is this used? All interaction with the SD card is done by sending card commands and by reading and writing data in 512-byte blocks:
uint8_t cardCommand(uint8_t cmd, uint32_t arg) {
  // end read if in partialBlockRead mode
  readEnd();

  // select card
  chipSelectLow();

  // wait up to 300 ms if busy
  waitNotBusy(300);

  // send command
  spiSend(cmd | 0x40);

  // send argument
  for (int8_t s = 24; s >= 0; s -= 8) spiSend(arg >> s);

  // send CRC
  uint8_t crc = 0XFF;
  if (cmd == CMD0) crc = 0X95;  // correct crc for CMD0 with arg 0
  if (cmd == CMD8) crc = 0X87;  // correct crc for CMD8 with arg 0X1AA
  spiSend(crc);

  // wait for response
  for (uint8_t i = 0; ((status_ = spiRec()) & 0X80) && i != 0XFF; i++);
  return status_;
}

uint8_t readData(uint32_t block,
        uint16_t offset, uint16_t count, uint8_t* dst) {
  uint16_t n;
  if (count == 0) return true;
  if ((count + offset) > 512) {
    goto fail;
  }

  #ifdef FAT_DEBUG
  printf("block: %d, offset: %d, count: %d\n", block, offset, count);
  #endif

  if (!inBlock_ || block != block_ || offset < offset_) {
    block_ = block;
    // use address if not SDHC card
    if (get_type()!= SD_CARD_TYPE_SDHC) block <<= 9;
    if (cardCommand(CMD17, block)) {
      error(SD_CARD_ERROR_CMD17);
      goto fail;
    }
    if (!waitStartBlock()) {
      goto fail;
    }
    offset_ = 0;
    inBlock_ = 1;
  }

  // skip data before offset
  for (;offset_ < offset; offset_++) {
    spiRec();
  }
  // transfer data
  for (uint16_t i = 0; i < count; i++) {
    dst[i] = spiRec();
//    printf("%x ", dst[i]);
  }

  offset_ += count;
  if (!partialBlockRead_ || offset_ >= 512) {
    // read rest of data, checksum and set chip select high
    readEnd();
  }
  return true;

 fail:
  chipSelectHigh();
  #if FAT_DEBUG
  printf("read data error code: %d\n", errorCode_);
  #endif
  return false;
}

uint8_t writeData(uint8_t token, const uint8_t* src) {
  spiSend(token);
  for (uint16_t i = 0; i < 512; i++) {
    spiSend(src[i]);
  }
  spiSend(0xff);  // dummy crc
  spiSend(0xff);  // dummy crc

  status_ = spiRec();
  if ((status_ & DATA_RES_MASK) != DATA_RES_ACCEPTED) {
    error(SD_CARD_ERROR_WRITE);
    chipSelectHigh();
    return false;
  }
  return true;
}

uint8_t writeBlock(uint32_t blockNumber, const uint8_t* src, uint8_t blocking) {
  #if FAT_DEBUG
  printf("Write block number: %d\n", blockNumber);
  #endif
//  return true;
  // don't allow write to first block
  if (blockNumber == 0) {
    error(SD_CARD_ERROR_WRITE_BLOCK_ZERO);
    goto fail;
  }

  // use address if not SDHC card
  if (get_type() != SD_CARD_TYPE_SDHC) {
    blockNumber <<= 9;
  }
  if (cardCommand(CMD24, blockNumber)) {
    error(SD_CARD_ERROR_CMD24);
    goto fail;
  }
  if (!writeData(DATA_START_BLOCK, src)) {
    goto fail;
  }
  if (blocking) {
    // wait for flash programming to complete
    if (!waitNotBusy(SD_WRITE_TIMEOUT)) {
      error(SD_CARD_ERROR_WRITE_TIMEOUT);
      goto fail;
    }
    // response is r2 so get and check two bytes for nonzero
    if (cardCommand(CMD13, 0) || spiRec()) {
      error(SD_CARD_ERROR_WRITE_PROGRAMMING);
      goto fail;
    }
  }
  chipSelectHigh();
  return true;

fail:
  chipSelectHigh();
  return false;
}
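
The six bytes that cardCommand() sends can also be built in one place, which makes the frame layout easier to see. The sketch below is an illustration, not part of the SD code above: sd_crc7 and sd_build_command are hypothetical helper names, and the CRC is computed here instead of being hardcoded. It uses the CRC-7 polynomial x^7 + x^3 + 1, and the computed values for CMD0 and CMD8 match the constants 0X95 and 0X87 used in cardCommand().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-7 with polynomial x^7 + x^3 + 1 over len bytes, MSB first. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (uint8_t)(crc << 1);
            if ((d ^ crc) & 0x80u) {
                crc ^= 0x09u;
            }
            d = (uint8_t)(d << 1);
        }
    }
    return (uint8_t)(crc & 0x7Fu);
}

/* Fills frame[6] with: 0x40 | command index, the 32-bit argument
   (most significant byte first), and the CRC-7 followed by a stop bit. */
void sd_build_command(uint8_t cmd_index, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40u | (cmd_index & 0x3Fu));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1u);
}
```

For example, sd_build_command(0, 0, frame) produces 40 00 00 00 00 95, and sd_build_command(8, 0x1AA, frame) produces 48 00 00 01 AA 87.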

Now that we are able to read and write 512-byte blocks, we need to figure out how the data is organized on SD cards. Well, the format is FAT32. That is an ancient format from Microsoft, but it is quite simple and is used everywhere.

The format is described on Wikipedia and in this excellent blog post: https://codeandlife.com/2012/04/02/simple-fat-and-sd-tutorial-part-1/

So, if we want, for example, to list all files in the root folder, here is the code:
file_descriptor_t fd;
int next = 0;
while ((next = getDirEntry(&fd, next)) != 0)
{
    printf("%s %d bytes, cluster: %d (%d)\n", fd.dir_entry.filename, fd.dir_entry.filesize, fd.curr_cluster, fd.dir_entry.first_cluster);
}

The key code is in the getDirEntry() function:
uint32_t getDirEntry(file_descriptor_t* fd, uint32_t index)
{
  int i,j;
  uint16_t cluster;
  uint32_t file_size;
  uint8_t b;
  uint8_t *buf = g_block_buf;
  char filename_upper[12];
  uint32_t counter = 0;

  for (i = 0; i < (dataStartBlock_ - rootDirStart_); i++)
  {
    b = readBlock(rootDirStart_ + i, g_block_buf);
    for(j = 0; j < 16; j++)
    {
      if (*(buf + j*32)==0 || *(buf + j*32)==0x2e || *(buf + j*32)==0xe5 || *(buf + j*32 + 0x0b) == 0xf)
      { 
        continue; // free, or deleted file/folder, or phantom entry for long names?
        if (counter > index)
          return 0;
      }
      
      if(counter == index)
      {
        file_size = *(buf + j*32 + 0x1c);
        file_size += *(buf + j*32 + 0x1c + 1)<<8;
        file_size += *(buf + j*32 + 0x1c + 2)<<16;
        file_size += *(buf + j*32 + 0x1c + 3)<<24;
        cluster = *(buf + j*32 + 0x1a);
        cluster += *(buf + j*32 + 0x1a + 1) << 8;
        cluster += *(buf + j*32 + 0x14 + 0) << 16;
        cluster += *(buf + j*32 + 0x14 + 1) << 24;

        strncpy(filename_upper, (char*)(buf+j*32), 11);
        filename_upper[11] = '\0';

        // fill in dir_entry
        memmove(fd->dir_entry.filename, filename_upper, 12);
        fd->dir_entry.attributes = *(buf + j*32 + 0x0b);
        memmove(fd->dir_entry.unused_attr, buf + j*32 + 0x0c, 14);
        fd->dir_entry.filesize = file_size;
        fd->dir_entry.block = rootDirStart_ + i;
        fd->dir_entry.slot = j;
        fd->dir_entry.first_cluster = cluster;
        fd->curr_cluster = cluster;
        return counter + 1;
      } else if (counter > index) {
        return 0;
      }
      counter++;
    }
  }
  return 0;
}

The code above loads chunks of 512 bytes from the root directory start block, and then tries to iterate through the directory structure until it finds the right entry, given by its index. The directory structure is this:
typedef struct
{
  char filename[12];  /** The file's name and extension, total 11 chars padded with spaces. */
  uint8_t attributes;  /** The file's attributes. Mask of the FAT_ATTRIB_* constants. */
  uint8_t unused_attr[14]; /** Attributes in directory which are unused or unsupported */
  uint16_t first_cluster;     /** The cluster in which the file's first byte resides. */
  uint32_t filesize;   /** The file's size. */
  uint32_t block; /** The number of a block from the rootDirStart_ where this entry resides. */
  uint32_t slot; /** The number of the slot in the block where this entry resides. Each slot is 32 bytes large. */
} dir_entry_t;


Since my FPGA computer is big-endian and FAT stores multi-byte fields in little-endian order, I couldn't simply read the file size and cluster number as whole words. Instead, I had to assemble those numbers byte by byte.
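
The byte-by-byte assembly can be wrapped in small helpers, so that the same code gives the same result on a big-endian and on a little-endian CPU. The sketch below is only an illustration: read_le16, read_le32 and the entry_* functions are hypothetical names, while the offsets (0x14, 0x1A, 0x1C, 0x0B) and the skip conditions are the ones used in getDirEntry() above. The cluster is returned as a 32-bit value here, on the assumption that both 16-bit halves should be kept.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble little-endian fields from individual bytes, so the result
   does not depend on the byte order of the CPU running the code. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* entry points at one raw 32-byte directory slot. */
uint32_t entry_file_size(const uint8_t *entry)
{
    return read_le32(entry + 0x1C);
}

uint32_t entry_first_cluster(const uint8_t *entry)
{
    uint32_t high = read_le16(entry + 0x14); /* upper 16 bits */
    uint32_t low  = read_le16(entry + 0x1A); /* lower 16 bits */
    return (high << 16) | low;
}

/* Mirrors the skip test in getDirEntry(): empty slot, dot entry,
   deleted entry, or long-name entry. */
int entry_is_skipped(const uint8_t *entry)
{
    return entry[0] == 0x00 || entry[0] == 0x2E
        || entry[0] == 0xE5 || entry[0x0B] == 0x0F;
}
```

Casting each byte to uint32_t before shifting also avoids shifting a promoted int into its sign bit, which can happen when a byte of 0x80 or more is shifted left by 24.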

Conclusion

The initial implementation of the SPI was simple enough. What matters is what you can do with it. I was able to use the SPI to integrate an SD card into my FPGA computer. That way, I no longer need the Arduino/ESP32 to play the role of SD card reader, as I used to.