This is a follow-up to my original FPGA computer post.
SPI is a de facto standard for connecting various peripherals to a computer (or, at least, to a microcontroller). There is also the I2C interface, but I will focus on SPI in this post.
SPI stands for Serial Peripheral Interface. It is organized as a master-slave communication. If we presume that our FPGA computer is master, then the peripheral will be slave.
It usually has four important pins:
1. MISO (Master In Slave Out) - a wire which is used to transport data from slave to the master device,
2. MOSI (Master Out Slave In) - a wire which is used to transport data from master to the slave device,
3. SCK - clock (all the data transport is synchronized using this clock line), and
4. SS (Slave Select) - when active, the slave is selected (sometimes it is called CS - chip select). With this wire, it is possible to connect several peripherals to the same three mentioned wires (MISO, MOSI and SCK) and to have separate SS wires to each peripheral.
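To make the roles of these wires concrete, here is a minimal host-side sketch of one byte exchange, assuming SPI mode 0 with the most significant bit first (the setting SD cards use). The slave is simulated as a plain shift register, the SS line is not modeled, and the names spi_slave_t, slave_clock_bit and spi_exchange_byte are hypothetical, not taken from the project:

```c
#include <stdint.h>

/* Hypothetical simulated slave: an 8-bit shift register. */
typedef struct {
    uint8_t shift_reg;   /* byte the slave will shift out on MISO */
} spi_slave_t;

/* One SCK cycle: the slave presents its MSB on MISO, then shifts in MOSI. */
static int slave_clock_bit(spi_slave_t *s, int mosi_bit)
{
    int miso_bit = (s->shift_reg >> 7) & 1;
    s->shift_reg = (uint8_t)((s->shift_reg << 1) | (mosi_bit & 1));
    return miso_bit;
}

/* Master side: send tx MSB first and collect what the slave returns. */
static uint8_t spi_exchange_byte(spi_slave_t *s, uint8_t tx)
{
    uint8_t rx = 0;
    for (int bit = 7; bit >= 0; bit--) {
        int mosi = (tx >> bit) & 1;          /* master drives MOSI */
        int miso = slave_clock_bit(s, mosi); /* one clock pulse on SCK */
        rx = (uint8_t)((rx << 1) | miso);    /* master samples MISO */
    }
    return rx;
}
```

The point of the sketch is that every transfer is an exchange: for each bit the master clocks out, it clocks one bit in. That is why reading from a slave always means sending a dummy byte at the same time.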
Why did I choose to use the SPI on my computer? First of all, SD cards have SPI built-in. This means that every SD card is actually an SPI slave device. Next, I use the ENC28J60 Ethernet module for my Arduino/ESP32/RaspberryPi Zero devices for the Ethernet connectivity. That module has an SPI interface, too.
How did I integrate SPI into my FPGA computer? I found a very nice implementation in Verilog here:
https://github.com/nandland/spi-master
BTW, that guy has an excellent YouTube channel here:
https://www.youtube.com/channel/UCsdA-aNqtMA1_2T15aXePWw
Next I had to integrate that module into my FPGA computer. I decided to allocate an interrupt for the incoming data from the SPI and to ignore the module-controlled SS pin (I will manually activate the SS signal from code, instead of leaving that job to the SPI module):
// ####################################
// SPI Master instance
// ####################################
wire spi_start;
wire [7:0] spi_in;
reg [7:0] spi_out;
wire spi_ready;
wire spi_received;
reg [7:0] spi_in_r;
reg fake_CS;
SPI_Master_With_Single_CS spi0 (
.i_Clk(clk100),
.i_Rst_L(KEY[0]),
.i_TX_Count(1),
.i_TX_DV(spi_start),
.o_RX_Byte(spi_in),
.i_TX_Byte(spi_out),
.o_RX_DV(spi_received),
.o_TX_Ready(spi_ready),
.o_SPI_MOSI(gpio0[32]),
.i_SPI_MISO(gpio0[30]),
.o_SPI_Clk(gpio0[28]),
.o_SPI_CS_n(fake_CS)
);
The code above creates an SPI module named spi0 and connects it to a set of wires and registers. Next, in the main interrupt part, when the spi_received wire goes high (a byte has arrived on SPI), the IRQ_SPI interrupt is triggered:
// ##################### IRQ3 - SPI Master #####################
if (spi_received) begin
spi_in_r <= spi_in;
// if we have received a byte from the MISO,
// we will trigger the IRQ#3
irq[IRQ_SPI] <= 1'b1;
end
else
begin
irq[IRQ_SPI] <= 1'b0;
end
In the CPU module, the IRQ_SPI interrupt causes the processor to jump to the predefined interrupt handler routine at address 56:
else if (irq_r[IRQ_SPI]) begin
// SPI byte received
pc <= 16'd56;
addr <= 16'd28;
irq_r[IRQ_SPI] <= 0;
end
All you have to do is to put some code at the address of 56 and to return from the interrupt handler routine using the IRET assembly instruction:
spi_irq_triggered:
push r0
ld.w r0, [PORT_SPI_IN] # PORT_SPI_IN.5_1, PORT_SPI_IN
ld.s r0, [r0] # _2, *PORT_SPI_IN.5_1
zex.s r0, r0 # _3, _2
st.w [received_byte], r0 # received_byte, _3
mov.w r0, 1 # tmp29,
st.w [received_from_slave], r0 # received_from_slave, tmp29
pop r0
iret
Now that I have the C compiler, the SPI interrupt handler routine can be written in C:
void init_spi()
{
*SPI_HANDLER_INSTR = 1;
*SPI_HANDLER_ADDR = (int)&spi_irq_triggered;
}
void spi_irq_triggered()
{
received_byte = *PORT_SPI_IN;
received_from_slave = 1;
asm
(
"mov.w sp,r13\npop r13\niret"
);
}
In order to read the received byte, and to send some byte to the SPI, we need to implement some IO operations. As usual, I have done that in both direct and memory-mapped way. Here is the direct way using the IN and OUT assembly instructions:
// OUT [xx], reg
4'b0100: begin
`ifdef DEBUG
$display("%2x: OUT [%4d], r%-d",ir[3:0], data_r, (ir[15:12]));
`endif
case (mc_count)
0: begin
// get the xx
addr <= (pc + 2) >> 1;
pc <= pc + 2;
mc_count <= 1;
next_state <= EXECUTE;
state <= READ_DATA;
end
1: begin
mbr <= data_r;
mc_count <= 2;
end
2: begin
case (mbr)
...
PORT_SPI_OUT: begin
spi_out <= regs[ir[15:12]];
spi_start <= 1'b1;
end
...
default: begin
end
endcase // end of case (data)
mc_count <= 3;
end
3: begin
tx_send <= 1'b0;
spi_start <= 1'b0;
spi_start1 <= 1'b0;
state <= CHECK_IRQ;
pc <= pc + 2;
end
default: begin
end
endcase
end // end of OUT [xx], reg
What happens above? The OUT instruction occupies four bytes in memory. The first two bytes are the OPCODE of the instruction, and the next two bytes hold the port number (limiting the total number of available ports to 65536, but I think that is enough).
In the first cycle (step 0) of the OUT instruction, the CPU sets the address to be read to the two bytes that follow the two OPCODE bytes. Then the CPU waits for those two bytes to arrive (step 1).
Then the CPU checks which IO port number has been read from memory. If the port number is PORT_SPI_OUT, it means that we are trying to send a byte to the SPI, so the CPU copies the register value to spi_out and raises spi_start (step 2). In step 3 the CPU clears spi_start, which finishes the send, and sets the next CPU state to be the IRQ check.
And, here is the memory-mapped IO way:
// Memory mapped IO
case (addr & 32'h3FFFFFFF)
...
PORT_SPI_OUT/2: begin
spi_out <= data_to_write;
spi_start <= 1'b1;
end
...
endcase
The memory-mapped way is a bit simpler, but it does the same job of sending a byte to the SPI.
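From the C side, a memory-mapped port is just a store through a volatile pointer. The sketch below is a host-side illustration, not the project's code: the register is faked with an ordinary variable so it runs on a PC, its 16-bit width is an assumption, and the names fake_spi_out_register, SPI_OUT_SKETCH, byte_to_word_address and spi_write_mapped are hypothetical. The divide-by-two helper mirrors the PORT_SPI_OUT/2 in the Verilog above, which suggests that the hardware compares word addresses rather than byte addresses:

```c
#include <stdint.h>

/* Stand-in for the hardware register so the sketch runs on a PC.
   On the real machine this would be a fixed address instead. */
static uint16_t fake_spi_out_register;

/* volatile: the compiler must not drop or reorder stores to the port. */
static volatile uint16_t *const SPI_OUT_SKETCH = &fake_spi_out_register;

/* Convert a byte address to a 16-bit word address (assumed addressing scheme). */
static uint32_t byte_to_word_address(uint32_t byte_address)
{
    return byte_address >> 1;
}

/* A plain store is all that is needed to hand a byte to the port. */
static void spi_write_mapped(uint8_t value)
{
    *SPI_OUT_SKETCH = value;
}
```

With this approach no special instruction is needed, so a C compiler that knows nothing about IN and OUT can still drive the port.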
OK, now that we have a working SPI interface, how can we use it to work with the SD card? I have made a Frankenstein-like piece of code, merging the original Arduino SD card code (written in C++) with some other pieces of code from GitHub, so that I now have some elementary support for SD cards. For example:
uint8_t sdcard_init(){
writeCRC_ = errorCode_ = inBlock_ = partialBlockRead_ = type_ = 0;
// remember the start time for the init timeout check
uint32_t t0 = (uint32_t)get_millis();
uint32_t arg;
// must supply min of 74 clock cycles with CS high (10 bytes x 8 bits = 80 clocks).
for (uint8_t i = 0; i < 10; i++) spiSend(0XFF);
chipSelectLow();
// command to go idle in SPI mode
while ((status_ = cardCommand(CMD0, 0)) != R1_IDLE_STATE) {
if (((uint32_t)get_millis() - t0) > SD_INIT_TIMEOUT) {
error(SD_CARD_ERROR_CMD0);
goto fail;
}
}
// check SD version
if ((cardCommand(CMD8, 0x1AA) & R1_ILLEGAL_COMMAND)) {
type(SD_CARD_TYPE_SD1);
} else {
// only need last byte of r7 response
for (uint8_t i = 0; i < 4; i++) status_ = spiRec();
if (status_ != 0XAA) {
error(SD_CARD_ERROR_CMD8);
goto fail;
}
type(SD_CARD_TYPE_SD2);
}
...
}
In the code above, we see that there are some SPI-related functions, like spiSend() or spiRec(). Here they are:
void spiSend(int b)
{
received_from_slave = 0;
unsigned short int busy;
do
{
busy = *PORT_SPI_OUT_BUSY;
} while (busy);
*PORT_SPI_OUT = b; //send the byte to the SPI
do
{
busy = *PORT_SPI_OUT_BUSY;
} while (busy);
}
uint8_t spiRec(void) {
spiSend(0xFF); // clock out a dummy byte so that the card can answer
return read_spi();
}
int read_spi()
{
while (!received_from_slave || *PORT_SPI_OUT_BUSY)
{
}
return received_byte;
}
Now, when we look at the spi_irq_triggered() function, we see that whenever that interrupt routine is triggered by the incoming byte from the SPI, that byte is stored in the received_byte variable. That byte is returned from the read_spi() function to the spiRec() function, and from that to the caller function.
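The handshake between the interrupt routine and the polling loop can be sketched on a PC as follows. This is a simplified illustration under stated assumptions: the hardware is replaced by a hypothetical fake_send() that delivers the slave's answer immediately, and all names ending in _sketch are made up for the example. The post does not show how received_byte and received_from_slave are declared; variables shared with an interrupt handler should be volatile, otherwise an optimizing compiler may keep the flag in a register and the wait loop would never finish:

```c
#include <stdint.h>

/* Shared between the interrupt handler and the main code, hence volatile. */
static volatile uint8_t received_byte_sketch;
static volatile int     received_flag_sketch;

/* What the interrupt handler does when a byte arrives. */
static void on_spi_byte(uint8_t from_slave)
{
    received_byte_sketch = from_slave;
    received_flag_sketch = 1;
}

/* Hypothetical stand-in for the hardware: "sending" a byte
   delivers the slave's answer through the handler right away. */
static void fake_send(uint8_t out, uint8_t slave_answer)
{
    (void)out;                   /* the outgoing byte is not modeled */
    received_flag_sketch = 0;    /* clear the flag before the transfer */
    on_spi_byte(slave_answer);   /* on real hardware the IRQ fires later */
}

/* Receive one byte: send a dummy 0xFF, then wait for the handler. */
static uint8_t receive_sketch(uint8_t slave_answer)
{
    fake_send(0xFF, slave_answer);
    while (!received_flag_sketch) {
        /* busy-wait until the handler has stored the byte */
    }
    return received_byte_sketch;
}
```

Clearing the flag before the transfer starts, as spiSend() does, is what guarantees that the loop waits for the new byte and not for a stale one.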
OK, what next? How is this used? All of the interaction with the SD card is done by sending card commands and reading and writing 512 bytes of data, in so-called blocks:
uint8_t cardCommand(uint8_t cmd, uint32_t arg) {
// end read if in partialBlockRead mode
readEnd();
// select card
chipSelectLow();
// wait up to 300 ms if busy
waitNotBusy(300);
// send command
spiSend(cmd | 0x40);
// send argument
for (int8_t s = 24; s >= 0; s -= 8) spiSend(arg >> s);
// send CRC
uint8_t crc = 0XFF;
if (cmd == CMD0) crc = 0X95; // correct crc for CMD0 with arg 0
if (cmd == CMD8) crc = 0X87; // correct crc for CMD8 with arg 0X1AA
spiSend(crc);
// wait for response
for (uint8_t i = 0; ((status_ = spiRec()) & 0X80) && i != 0XFF; i++);
return status_;
}
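The two hard-coded CRC values in cardCommand() are not magic numbers. As a sketch, assuming the standard SD command frame (one command byte, four argument bytes sent most significant byte first, then a 7-bit CRC followed by a stop bit), they can be computed like this. The function names sd_crc7 and sd_build_command are hypothetical helpers for illustration only:

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-7 with polynomial x^7 + x^3 + 1, computed bit by bit, MSB first. */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (uint8_t)(crc << 1);
            if ((d ^ crc) & 0x80) {   /* data bit XOR shifted-out CRC bit */
                crc ^= 0x09;
            }
            d = (uint8_t)(d << 1);
        }
    }
    return crc & 0x7F;
}

/* Fill frame[0..5]: command byte, argument (MSB first), CRC7 plus stop bit. */
static void sd_build_command(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40 | (cmd & 0x3F));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 0x01);
}
```

For CMD0 with argument 0 the last byte comes out as 0x95, and for CMD8 with argument 0x1AA it comes out as 0x87, matching the constants above. In SPI mode the card ignores the CRC of later commands by default, which is why 0xFF is good enough for everything else.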
uint8_t readData(uint32_t block,
uint16_t offset, uint16_t count, uint8_t* dst) {
uint16_t n;
if (count == 0) return true;
if ((count + offset) > 512) {
goto fail;
}
#ifdef FAT_DEBUG
printf("block: %d, offset: %d, count: %d\n", block, offset, count);
#endif
if (!inBlock_ || block != block_ || offset < offset_) {
block_ = block;
// use address if not SDHC card
if (get_type()!= SD_CARD_TYPE_SDHC) block <<= 9;
if (cardCommand(CMD17, block)) {
error(SD_CARD_ERROR_CMD17);
goto fail;
}
if (!waitStartBlock()) {
goto fail;
}
offset_ = 0;
inBlock_ = 1;
}
// skip data before offset
for (;offset_ < offset; offset_++) {
spiRec();
}
// transfer data
for (uint16_t i = 0; i < count; i++) {
dst[i] = spiRec();
// printf("%x ", dst[i]);
}
offset_ += count;
if (!partialBlockRead_ || offset_ >= 512) {
// read rest of data, checksum and set chip select high
readEnd();
}
return true;
fail:
chipSelectHigh();
#if FAT_DEBUG
printf("read data error code: %d\n", errorCode_);
#endif
return false;
}
uint8_t writeData(uint8_t token, const uint8_t* src) {
spiSend(token);
for (uint16_t i = 0; i < 512; i++) {
spiSend(src[i]);
}
spiSend(0xff); // dummy crc
spiSend(0xff); // dummy crc
status_ = spiRec();
if ((status_ & DATA_RES_MASK) != DATA_RES_ACCEPTED) {
error(SD_CARD_ERROR_WRITE);
chipSelectHigh();
return false;
}
return true;
}
uint8_t writeBlock(uint32_t blockNumber, const uint8_t* src, uint8_t blocking) {
#if FAT_DEBUG
printf("Write block number: %d\n", blockNumber);
#endif
// return true;
// don't allow write to first block
if (blockNumber == 0) {
error(SD_CARD_ERROR_WRITE_BLOCK_ZERO);
goto fail;
}
// use address if not SDHC card
if (get_type() != SD_CARD_TYPE_SDHC) {
blockNumber <<= 9;
}
if (cardCommand(CMD24, blockNumber)) {
error(SD_CARD_ERROR_CMD24);
goto fail;
}
if (!writeData(DATA_START_BLOCK, src)) {
goto fail;
}
if (blocking) {
// wait for flash programming to complete
if (!waitNotBusy(SD_WRITE_TIMEOUT)) {
error(SD_CARD_ERROR_WRITE_TIMEOUT);
goto fail;
}
// response is r2 so get and check two bytes for nonzero
if (cardCommand(CMD13, 0) || spiRec()) {
error(SD_CARD_ERROR_WRITE_PROGRAMMING);
goto fail;
}
}
chipSelectHigh();
return true;
fail:
chipSelectHigh();
return false;
}
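Both readData() and writeBlock() shift the block number left by 9 for cards that are not SDHC. The reason is that standard-capacity cards take a byte address as the command argument, while SDHC cards take a block number. A tiny sketch of that rule, with a hypothetical helper name sd_block_to_arg:

```c
#include <stdint.h>

/* Turn a 512-byte block number into the argument for CMD17/CMD24.
   is_sdhc is nonzero for block-addressed (SDHC) cards. */
static uint32_t sd_block_to_arg(uint32_t block, int is_sdhc)
{
    return is_sdhc ? block : (block << 9);   /* << 9 multiplies by 512 */
}
```

For example, block 3 becomes byte address 1536 on a standard-capacity card and stays 3 on an SDHC card.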
Now that we are able to read and write 512-byte blocks, we need to figure out how the data is organized on SD cards. Well, the format is FAT32. That is an ancient format from Microsoft, but it is quite simple and is used everywhere.
So, if we want, for example, to list all files in the root folder, here is the code:
file_descriptor_t fd;
int next = 0;
while ((next = getDirEntry(&fd, next)) != 0)
{
printf("%s %d bytes, cluster: %d (%d)\n", fd.dir_entry.filename, fd.dir_entry.filesize, fd.curr_cluster, fd.dir_entry.first_cluster);
}
The key code is in the getDirEntry() function:
uint32_t getDirEntry(file_descriptor_t* fd, uint32_t index)
{
int i,j;
uint16_t cluster;
uint32_t file_size;
uint8_t b;
uint8_t *buf = g_block_buf;
char filename_upper[12];
uint32_t counter = 0;
for (i = 0; i < (dataStartBlock_ - rootDirStart_); i++)
{
b = readBlock(rootDirStart_ + i, g_block_buf);
for(j = 0; j < 16; j++)
{
if (*(buf + j*32)==0 || *(buf + j*32)==0x2e || *(buf + j*32)==0xe5 || *(buf + j*32 + 0x0b) == 0xf)
{
// skip free entries, dot entries, deleted files/folders, and long-name entries
continue;
}
if(counter == index)
{
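// cast each byte before shifting so that the shift is done on an unsigned 32-bit value
file_size = (uint32_t)*(buf + j*32 + 0x1c);
file_size += (uint32_t)*(buf + j*32 + 0x1c + 1)<<8;
file_size += (uint32_t)*(buf + j*32 + 0x1c + 2)<<16;
file_size += (uint32_t)*(buf + j*32 + 0x1c + 3)<<24;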
cluster = *(buf + j*32 + 0x1a);
cluster += *(buf + j*32 + 0x1a + 1) << 8;
cluster += *(buf + j*32 + 0x14 + 0) << 16;
cluster += *(buf + j*32 + 0x14 + 1) << 24;
strncpy(filename_upper, (char*)(buf+j*32), 11);
filename_upper[11] = '\0';
// fill in dir_entry
memmove(fd->dir_entry.filename, filename_upper, 12);
fd->dir_entry.attributes = *(buf + j*32 + 0x0b);
memmove(fd->dir_entry.unused_attr, buf + j*32 + 0x0c, 14);
fd->dir_entry.filesize = file_size;
fd->dir_entry.block = rootDirStart_ + i;
fd->dir_entry.slot = j;
fd->dir_entry.first_cluster = cluster;
fd->curr_cluster = cluster;
return counter + 1;
} else if (counter > index) {
return 0;
}
counter++;
}
}
return 0;
}
The code above loads chunks of 512 bytes from the root directory start block, and then tries to iterate through the directory structure until it finds the right entry, given by its index. The directory structure is this:
typedef struct
{
char filename[12]; /** The file's name and extension, total 11 chars padded with spaces. */
uint8_t attributes; /** The file's attributes. Mask of the FAT_ATTRIB_* constants. */
uint8_t unused_attr[14]; /** Attributes in directory which are unused or unsupported */
uint16_t first_cluster; /** The cluster in which the file's first byte resides. */
uint32_t filesize; /** The file's size. */
uint32_t block; /** The number of a block from the rootDirStart_ where this entry resides. */
uint32_t slot; /** The number of the slot in the block where this entry resides. Each slot is 32 bytes large. */
} dir_entry_t;
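The filename field holds the classic 8.3 name: 8 characters of name and 3 of extension, each padded with spaces, with no dot stored. A minimal sketch of turning it into a printable name follows. The helper fat_name_to_string is hypothetical and not part of the code in this post:

```c
#include <stddef.h>

/* Convert an 11-character, space-padded FAT name ("README  TXT")
   into "README.TXT". out must hold at least 13 bytes. */
static void fat_name_to_string(const char raw[11], char out[13])
{
    size_t n = 0;

    /* name part: up to 8 characters, stop at the padding */
    for (int i = 0; i < 8 && raw[i] != ' '; i++) {
        out[n++] = raw[i];
    }

    /* extension part: add the dot only if there is an extension */
    if (raw[8] != ' ') {
        out[n++] = '.';
        for (int i = 8; i < 11 && raw[i] != ' '; i++) {
            out[n++] = raw[i];
        }
    }
    out[n] = '\0';
}
```

It could be used on fd.dir_entry.filename before printing, so that the listing shows README.TXT instead of the raw padded form.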
FAT stores multi-byte numbers in little-endian order, and my FPGA computer is big endian, so I couldn't just read the file size and the cluster number directly as integers. Instead, I had to compute those numbers byte by byte.
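The byte-by-byte assembly can be wrapped in two small helpers that give the same result on any CPU, regardless of its endianness. This is a sketch with hypothetical names read_le16 and read_le32, not code from the project:

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With these, the file size of the entry in slot j would be read_le32(buf + j*32 + 0x1c), and the low word of the first cluster would be read_le16(buf + j*32 + 0x1a).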
Conclusion
The initial implementation of the SPI was simple enough. What matters is what you can do with it. I was able to use the SPI to integrate an SD card into my FPGA computer. That way, I no longer need the Arduino/ESP32 to play the role of SD card reader, as I used to.