I have an application that creates datarequests, which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be:

<datarequest>
  <datatask view="vw_ContractData" db="reporting" index="1">
    <datefilter modifier="w0">
      <filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
    </datefilter>
    <filters>
      <alternation index="1">
        <filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
      </alternation>
      <alternation index="2">
        <filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
      </alternation>
    </filters>
    <series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
      <filters />
    </series>
    <series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
      <filters>
        <alternation index="1">
          <filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
        </alternation>          
      </filters>
    </series>
    <series column="Requested 2" aggregate="avg" split="0"  splitfield="" index="3">
      <filters />
    </series>
    <series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
      <filters />
    </series>
  </datatask>
</datarequest>

This encodes a datarequest comprising a daterange, main filters, series and series filters. Basically, any element that has the index attribute can occur multiple times within its parent element; the one exception is the filter within datefilter.

But the structure is somewhat academic; the problem is more fundamental:

When a request comes through, XML like this is sent to SQL Server as a parameter to a stored procedure. The XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter and tblSeries. This is fine.

The problem occurs when I want to match a given XML definition with one already held in the DB. I currently do this by:

  • Shredding the XML into a de-normalised table
  • Using a CTE to pull all the existing data in the database into that same de-normalised form
  • Matching using a huge WHERE condition (34 lines long)

This returns any DataRequestID which exactly matches the given XML. I fear that this method will end up being painfully slow, partly because I don't believe the CTE will do any clever filtering: it will pull all the data every single time before applying the huge WHERE.

I think there must be better solutions to this, e.g.

  • When storing a datarequest, also store a hash of the datarequest somehow and simply match on that. In the case of a collision, fall back to the current method. However, I wanted to do this using set logic, and I'm also concerned about irrelevant small differences in the XML (spurious spaces etc.) changing the hash.
  • Somehow perform the matching iteratively from the bottom up. E.g. produce a list of filters which match at the lowest level. Use this as part of an IN to match Series. Use that as part of an IN to match DataTasks, and so on. The trouble is, I start to black out when I think about this for too long.
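
To make the two ideas above concrete, here is a rough sketch (in Python, outside the database) that combines them: each element gets a digest built from its tag, its attributes sorted by name, and the digests of its children in document order. The function name `element_digest` is made up for illustration, and the sketch assumes that text content can be ignored because the sample XML carries all its data in attributes. It is not a T-SQL solution, only an illustration of what a bottom-up, whitespace-insensitive fingerprint could look like.

```python
import hashlib
import xml.etree.ElementTree as ET


def element_digest(elem: ET.Element) -> str:
    """Hypothetical helper: SHA-256 digest of an element and its subtree.

    Assumption: element text is ignored, since the sample datarequest
    stores everything in attributes.
    """
    h = hashlib.sha256()
    h.update(elem.tag.encode("utf-8"))
    # Attribute order is irrelevant, so fold attributes in sorted order.
    for name, value in sorted(elem.attrib.items()):
        h.update(b"\x00" + name.encode("utf-8") + b"\x01" + value.encode("utf-8"))
    # Child order matters, so fold child digests in document order.
    for child in elem:
        h.update(b"\x02" + element_digest(child).encode("ascii"))
    h.update(b"\x03")  # marks the end of this element
    return h.hexdigest()


# Two series that differ only in attribute order and whitespace.
stored = ET.fromstring(
    '<series column="Requested 3" aggregate="avg" index="2">'
    '<filters><filter index="1" column="Worker" param1="Malcom" /></filters>'
    '</series>'
)
incoming = ET.fromstring(
    '<series index="2"  aggregate="avg" column="Requested 3">\n'
    '  <filters>\n'
    '    <filter param1="Malcom" column="Worker" index="1"/>\n'
    '  </filters>\n'
    '</series>'
)
print(element_digest(stored) == element_digest(incoming))
```

If a digest like this were stored alongside each filter, series and datatask row, a lookup could compare one indexed column per level and keep the long WHERE comparison only as a check against collisions.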

Basically: has anyone encountered this kind of problem before (they must have)? And what would be the recommended route for tackling it? Example (pseudo)code would be great :)

I handled a similar issue using XML normalization (trimming and sorting nodes) and hashes. It worked well and was fast. In general, it parsed the XML first, applied the normalization and then calculated a hash. – Viktor Stolbin Oct 1 '12 at 13:39
@ViktorStolbin Thanks for your comment. So there is hope for me! I won't need to sort nodes as their order is important, although attribute order is not. Did you sort attributes? – El Ronnoco Oct 1 '12 at 13:46
I sorted everything, because the XML was a straightforward representation of a DB query and I relied on the DB query optimization there. – Viktor Stolbin Oct 1 '12 at 13:50
One thing: the hash implementation should possibly be sensitive to element order. – Viktor Stolbin Oct 1 '12 at 13:51
Apparently the SQL Server XML type cannot be trusted to preserve attribute order. I may need to sort my XML application-side before passing it as VARCHAR to the stored procedure. – El Ronnoco Oct 1 '12 at 13:59
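
As a possible illustration of the application-side normalization discussed in these comments: Python's standard library (3.8+) ships a C14N 2.0 canonicalizer that writes attributes in sorted order and keeps elements in document order. The sketch below hashes that canonical form. The function name `datarequest_hash` and the choice of SHA-256 are assumptions made for the example, not something taken from the thread.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def datarequest_hash(xml_text: str) -> str:
    """Hypothetical helper: hash of the canonical form of a datarequest."""
    # C14N 2.0 sorts attributes and expands empty elements;
    # strip_text=True drops the whitespace used only for indentation.
    canonical = canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = (
    '<datarequest><datatask view="vw_ContractData" db="reporting" index="1">'
    '<filters /></datatask></datarequest>'
)
reordered = (
    '<datarequest>\n'
    '  <datatask index="1"  db="reporting" view="vw_ContractData">\n'
    '    <filters/>\n'
    '  </datatask>\n'
    '</datarequest>'
)
print(datarequest_hash(original) == datarequest_hash(reordered))
```

The resulting hex string could be passed to the stored procedure alongside the XML and stored in an indexed column, so that the expensive full comparison only runs for rows whose hash already matches.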
