Electrical Engineering Stack Exchange is a question and answer site for electronics and electrical engineering professionals, students, and enthusiasts.

Sorry if the following looks like a very specialized (or programming) question, but I'm hoping there are people on this forum who have done VHDL/Verilog modeling and might be able to answer:

I'm writing a simulation model of a multi-processor cache system. My processor model is a 32-bit SPARC V8 processor. I was trying to understand what the processor–L1 data cache interface looks like. I have the following doubts:

  1. How wide is the processor-L1 interface? If it is 32 bits wide, how are doubleword accesses handled atomically? For example, if a load/store-doubleword instruction is split into two word accesses, can the cache block be invalidated between the first and the second word access? Doesn't that make the instruction non-atomic? Is the load/store doubleword instruction required to be atomic?

  2. How are atomic load/store or swap instructions implemented on this interface? Is there a signal from the processor to the cache that says "stall all other operations until I say so", after which the processor executes a load followed by a store?

I'd be thankful for any links pointing in this direction.

The L1 data cache is a Write-Back cache, if that matters. –  Neha Karanjkar Mar 19 '13 at 7:23

1 Answer


(I do not know any HDL, but I hope the following will be helpful anyway.)

One can use a 32-bit wide interface and still implement atomic 64-bit loads/stores. For loads, one can "cheat" by reading from the (possibly just-invalidated) cache entry, checking the tags only on the first 32-bit access: the two 32-bit accesses are known to be back-to-back and within the same cache block, which is known to be a hit.
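A minimal Python sketch of this "cheat", with all class and method names invented for illustration (this is not drawn from any real cache model): the 64-bit load performs two back-to-back 32-bit reads, but the tag/valid check happens only before the first one, so an invalidation arriving between the two reads cannot split the access.

```python
# Illustrative sketch: a 64-bit load split into two 32-bit reads,
# with the tag/state check done only on the first read. Names
# (CacheLine, Cache, load64) are hypothetical.

class CacheLine:
    def __init__(self, tag, data_words):
        self.tag = tag
        self.valid = True
        self.data = data_words  # list of 32-bit words

class Cache:
    def __init__(self):
        self.lines = {}  # index -> CacheLine

    def load64(self, index, tag, offset):
        line = self.lines.get(index)
        # Tag/valid check happens ONLY here, before the first word.
        if line is None or not line.valid or line.tag != tag:
            raise LookupError("miss: refill required")
        lo = line.data[offset]
        # Even if an invalidation lands now, we keep reading the old
        # copy: the two reads are back-to-back within one block.
        hi = line.data[offset + 1]
        return (hi << 32) | lo

cache = Cache()
cache.lines[0] = CacheLine(tag=0x1A, data_words=[0x11111111, 0x22222222])
value = cache.load64(index=0, tag=0x1A, offset=0)
# value == 0x2222222211111111
```

The key design choice is that only the first access arbitrates for the block; the second access is unconditional, which is safe precisely because the pair is indivisible on the interface.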

For stores, the cache block must be in Modified (or Exclusive, if silent upgrades are allowed) state to accept a store, so any incoming invalidate request (really a read-for-ownership) must generate a data response from this cache anyway. Since that response is required in any case, and the complete 64-bit write would typically take only two processor cycles, the response can simply be delayed until the store has completed.
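The store side of the scheme can be sketched as follows, again with invented names: the cache holds back its data response to an incoming read-for-ownership (RFO) until a pending two-cycle 64-bit store has drained, so the store appears atomic to the requester.

```python
# Hypothetical sketch: delay the data response to an incoming RFO
# until an in-flight two-cycle 64-bit store finishes. StoreBuffer
# and respond_to_rfo are illustrative names only.

class StoreBuffer:
    def __init__(self):
        self.pending_cycles = 0  # cycles left in an in-flight store

    def begin_store64(self):
        self.pending_cycles = 2  # two 32-bit writes, back to back

    def tick(self):
        if self.pending_cycles:
            self.pending_cycles -= 1

def respond_to_rfo(store_buffer, clock_tick):
    # Stall the response while the store is draining.
    delay = 0
    while store_buffer.pending_cycles:
        clock_tick()          # advance the simulated clock
        store_buffer.tick()
        delay += 1
    return delay  # cycles the data response was held back

sb = StoreBuffer()
sb.begin_store64()
held = respond_to_rfo(sb, clock_tick=lambda: None)
# held == 2: the response waited for both word-writes to complete
```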

LDSTUB (load-and-store-unsigned-byte) and SWAP could be handled somewhat similarly to a 64-bit store by delaying the load until the cache block is in exclusive/modified state; the store part of the operation is known to be immediately after the read portion and a data response is required anyway, so the data response can be delayed slightly.
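The LDSTUB approach above can be sketched like this (MESI-style states and all helper names are illustrative, not taken from any real design): first bring the block to Modified state, and only then perform the load and the store back to back, so nothing can intervene between them.

```python
# Hedged sketch of LDSTUB: delay the load until the block is
# writable, then do load + store as an indivisible pair.

MODIFIED, SHARED, INVALID = "M", "S", "I"

class Line:
    def __init__(self, state, byte):
        self.state = state
        self.byte = byte  # single byte, for simplicity

def ldstub(line, acquire_ownership):
    # Delay the load until the block is in Modified state.
    if line.state != MODIFIED:
        acquire_ownership(line)   # read-for-ownership; may take time
        line.state = MODIFIED
    old = line.byte               # load part
    line.byte = 0xFF              # store part: LDSTUB writes all ones
    return old

ln = Line(state=SHARED, byte=0x00)
old = ldstub(ln, acquire_ownership=lambda l: None)
# old == 0x00, ln.byte == 0xFF, ln.state == "M"
```

SWAP would look the same except that the store part writes a register value instead of 0xFF.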

An alternative implementation of LDSTUB and SWAP could treat an invalidation between the load and the store as a miss for the load, effectively reissuing the load. However, this presents a danger of livelock. While livelock can be managed (e.g., with various back-off techniques), the implementation described earlier is probably much simpler.
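The reissue-with-back-off idea can be sketched as follows; the retry policy and all names are invented for illustration, and a real design would tune (or replace) the back-off scheme.

```python
# Hypothetical sketch of the alternative: if the block was invalidated
# between the load and the store, treat the load as a miss and reissue
# it with a randomized, doubling back-off to reduce the livelock risk.

import random

def ldstub_with_retry(try_once, max_attempts=8):
    """try_once() returns the old byte, or None if invalidated mid-op."""
    backoff = 1
    for _ in range(max_attempts):
        result = try_once()
        if result is not None:
            return result
        # Randomized back-off: wait 0..backoff-1 "cycles" before retrying.
        _ = random.randrange(backoff)
        backoff = min(backoff * 2, 64)
    raise RuntimeError("livelock suspected: too many retries")

# Toy example: the operation is interrupted twice, then succeeds.
attempts = iter([None, None, 0x00])
old = ldstub_with_retry(lambda: next(attempts))
# old == 0x00
```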

That helps. Thanks. –  Neha Karanjkar Mar 19 '13 at 15:08
