Electrical Engineering Stack Exchange is a question and answer site for electronics and electrical engineering professionals, students, and enthusiasts.

Sorry if the following looks like a very specialized (or programming) question, but I'm hoping there are people on this forum who have done VHDL/Verilog modeling and might be able to answer:

I'm writing a simulation model of a multi-processor cache system. My processor model is a 32-bit SPARC V8 processor. I was trying to understand what the processor–L1 data cache interface looks like. I have the following doubts:

  1. How wide is the processor-L1 interface? If it is 32 bits wide, how are doubleword accesses handled atomically? For example, if a load/store-doubleword instruction is split into two word accesses, can the cache block be invalidated between the first and the second word access? Doesn't that make the instruction non-atomic? Is the load/store doubleword instruction required to be atomic?

  2. How are atomic load/store or swap instructions implemented on this interface? Is there a signal from the processor to the cache that says "stall all other operations until I say so", after which the processor executes a load followed by a store?

I'd be thankful for any links pointing in this direction.

The L1 data cache is a Write-Back cache, if that matters. –  Neha Karanjkar Mar 19 '13 at 7:23

1 Answer


(I do not know any HDL, but I hope the following will be helpful anyway.)

One can use a 32-bit wide interface and still implement atomic 64-bit loads/stores. For loads, one can "cheat" by reading from the (possibly just-invalidated) cache entry, checking the tags only on the first 32-bit access: the two 32-bit accesses are known to be back-to-back and within the same cache block, which is known to be a hit.
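A minimal Python sketch of this "cheat", with all class and method names invented for illustration (this is not drawn from any real cache model): the 64-bit load performs two back-to-back 32-bit reads, but the tag/valid check happens only before the first one, so an invalidation arriving between the two reads cannot split the access.

```python
# Illustrative sketch: a 64-bit load split into two 32-bit reads,
# with the tag/state check done only on the first read. Names
# (CacheLine, Cache, load64) are hypothetical.

class CacheLine:
    def __init__(self, tag, data_words):
        self.tag = tag
        self.valid = True
        self.data = data_words  # list of 32-bit words

class Cache:
    def __init__(self):
        self.lines = {}  # index -> CacheLine

    def load64(self, index, tag, offset):
        line = self.lines.get(index)
        # Tag/valid check happens ONLY here, before the first word.
        if line is None or not line.valid or line.tag != tag:
            raise LookupError("miss: refill required")
        lo = line.data[offset]
        # Even if an invalidation lands now, we keep reading the old
        # copy: the two reads are back-to-back within one block.
        hi = line.data[offset + 1]
        return (hi << 32) | lo

cache = Cache()
cache.lines[0] = CacheLine(tag=0x1A, data_words=[0x11111111, 0x22222222])
value = cache.load64(index=0, tag=0x1A, offset=0)
# value == 0x2222222211111111
```

The key design choice is that only the first access arbitrates for the block; the second access is unconditional, which is safe precisely because the pair is indivisible on the interface.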

For stores, the cache block must be in Modified (or Exclusive, if silent upgrades are allowed) state to accept a store, so any incoming invalidate request (really a read-for-ownership) must generate a data response from this cache anyway. Since that response is required in any case, and the complete 64-bit write would typically take only two processor cycles, the response can simply be delayed until the store has completed.
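The store side of the scheme can be sketched as follows, again with invented names: the cache holds back its data response to an incoming read-for-ownership (RFO) until a pending two-cycle 64-bit store has drained, so the store appears atomic to the requester.

```python
# Hypothetical sketch: delay the data response to an incoming RFO
# until an in-flight two-cycle 64-bit store finishes. StoreBuffer
# and respond_to_rfo are illustrative names only.

class StoreBuffer:
    def __init__(self):
        self.pending_cycles = 0  # cycles left in an in-flight store

    def begin_store64(self):
        self.pending_cycles = 2  # two 32-bit writes, back to back

    def tick(self):
        if self.pending_cycles:
            self.pending_cycles -= 1

def respond_to_rfo(store_buffer, clock_tick):
    # Stall the response while the store is draining.
    delay = 0
    while store_buffer.pending_cycles:
        clock_tick()          # advance the simulated clock
        store_buffer.tick()
        delay += 1
    return delay  # cycles the data response was held back

sb = StoreBuffer()
sb.begin_store64()
held = respond_to_rfo(sb, clock_tick=lambda: None)
# held == 2: the response waited for both word-writes to complete
```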

LDSTUB (load-and-store-unsigned-byte) and SWAP could be handled somewhat similarly to a 64-bit store by delaying the load until the cache block is in exclusive/modified state; the store part of the operation is known to be immediately after the read portion and a data response is required anyway, so the data response can be delayed slightly.
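The LDSTUB approach above can be sketched like this (MESI-style states and all helper names are illustrative, not taken from any real design): first bring the block to Modified state, and only then perform the load and the store back to back, so nothing can intervene between them.

```python
# Hedged sketch of LDSTUB: delay the load until the block is
# writable, then do load + store as an indivisible pair.

MODIFIED, SHARED, INVALID = "M", "S", "I"

class Line:
    def __init__(self, state, byte):
        self.state = state
        self.byte = byte  # single byte, for simplicity

def ldstub(line, acquire_ownership):
    # Delay the load until the block is in Modified state.
    if line.state != MODIFIED:
        acquire_ownership(line)   # read-for-ownership; may take time
        line.state = MODIFIED
    old = line.byte               # load part
    line.byte = 0xFF              # store part: LDSTUB writes all ones
    return old

ln = Line(state=SHARED, byte=0x00)
old = ldstub(ln, acquire_ownership=lambda l: None)
# old == 0x00, ln.byte == 0xFF, ln.state == "M"
```

SWAP would look the same except that the store part writes a register value instead of 0xFF.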

An alternative implementation of LDSTUB and SWAP could treat an invalidation between the load and the store as a miss for the load, effectively reissuing the load. However, this presents a danger of livelock. While livelock can be managed (e.g., with various back-off techniques), the implementation described earlier is probably much simpler.
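The reissue-with-back-off idea can be sketched as follows; the retry policy and all names are invented for illustration, and a real design would tune (or replace) the back-off scheme.

```python
# Hypothetical sketch of the alternative: if the block was invalidated
# between the load and the store, treat the load as a miss and reissue
# it with a randomized, doubling back-off to reduce the livelock risk.

import random

def ldstub_with_retry(try_once, max_attempts=8):
    """try_once() returns the old byte, or None if invalidated mid-op."""
    backoff = 1
    for _ in range(max_attempts):
        result = try_once()
        if result is not None:
            return result
        # Randomized back-off: wait 0..backoff-1 "cycles" before retrying.
        _ = random.randrange(backoff)
        backoff = min(backoff * 2, 64)
    raise RuntimeError("livelock suspected: too many retries")

# Toy example: the operation is interrupted twice, then succeeds.
attempts = iter([None, None, 0x00])
old = ldstub_with_retry(lambda: next(attempts))
# old == 0x00
```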

That helps. Thanks. –  Neha Karanjkar Mar 19 '13 at 15:08
