
With regard to overhead, memory consumption, resource usage, and ease of processing, which is the preferred method for parsing a large XML file?

  1. simplexml_load_file
  2. simplexml_load_string
  3. Converting the SimpleXML object to an array
  4. Caching the SimpleXML result
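
(For reference, the first two differ only in where the XML comes from; assuming products.xml, the following pair is roughly equivalent:)

$a = simplexml_load_file("products.xml");                      // reads and parses the file
$b = simplexml_load_string(file_get_contents("products.xml")); // parses an in-memory string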

I am using SimpleXML to parse a very large XML document in order to return relevant search results requested by users.

$XMLproducts = simplexml_load_file("products.xml");

Along with ultimately producing the requested search results, the SimpleXML request will also produce links to further refine the obtained search results …

foreach ($XMLproducts->product as $Product) {
    if ($user_input_values == $applicable_xml_values) {
        // all refined search-filter links are produced here, then displayed later;
        // $url_base stands in for the original's "URL code" prefix
        $refined_search_filter_Array1[] = $url_base . (string) $Product->applicable_variable;
        $refined_search_filter_Array2[] = $url_base . (string) $Product->applicable_variable2;
    }
}

… as well as help produce the search-results pages (there will be 20 search results per page).

foreach ($XMLproducts->product as $Product) {
    // code to produce page-number links for the search-results pages
}
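
As an aside, a minimal sketch of that step need not loop over every product; assuming $matchCount holds the number of matching products (a placeholder name), the page links could be built like this:

$pages = (int) ceil($matchCount / 20); // 20 results per page
for ($i = 1; $i <= $pages; $i++) {
    // the ?page= URL shape is illustrative only
    echo '<a href="?page=' . $i . '">' . $i . '</a> ';
}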

Then we ultimately get to the actual search results requested by the user:

foreach ($XMLproducts->product as $Product) {
    if ($user_input_values == $applicable_xml_values) {
        echo $Product->name …
    }
}

Since the user can click a number of refined-search filter links as well as page-number links to move through the results, would it be more efficient to turn the initial SimpleXML result into an array, or cache it, until the user is finished with the search results? That way, each click on a filter link or page-number link would read from the array or cache instead of loading the entire XML file again with another SimpleXML request.
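
To make the idea concrete, here is a minimal sketch of that parse-once-then-cache flow, assuming a flat cache file; the file name, the cached fields, and the $_GET['page'] handling are illustrative placeholders. Note that SimpleXMLElement objects cannot be serialized directly, so the needed values are copied into a plain array first:

$cacheFile = 'results_cache.dat'; // hypothetical cache location

if (is_file($cacheFile)) {
    // filter and page-number clicks reuse the cached array
    $matches = unserialize(file_get_contents($cacheFile));
} else {
    $matches = [];
    $XMLproducts = simplexml_load_file("products.xml");
    foreach ($XMLproducts->product as $Product) {
        // cast each field to string so the array is serializable
        $matches[] = ['name' => (string) $Product->name];
    }
    file_put_contents($cacheFile, serialize($matches));
}

// 20 results per page, as described above
$page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
foreach (array_slice($matches, ($page - 1) * 20, 20) as $row) {
    echo $row['name'], "\n";
}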

Thanks for any advice.

What exactly do you want to cache? –  Toly Apr 13 at 16:23

Also, how large is the XML file? How many users request it simultaneously? –  Toly Apr 13 at 16:25

I want to cache the values retrieved by the SimpleXML request. The XML files will be about 6 MB, with the potential to grow. I believe only one user can make the API call every 2 seconds. –  Dean Olsen Apr 13 at 16:32

2 Answers


None of the four.

They're all variations of the same approach: a smart parser that builds a complex data structure in memory holding the complete data set. And you don't even try different parsers, only SimpleXML. The only XML API in PHP that scales is XMLReader, provided you use it to write code that reads the data sequentially, grabs what it needs, and discards the rest. But it all comes at a price: better performance in exchange for more coding work.
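
As a rough illustration of that sequential style, here is a minimal sketch combining XMLReader with SimpleXML for the per-node work; $userSearchTerm and the matching rule are placeholders, not taken from the question:

// stream through products.xml, expanding one <product> at a time
$reader = new XMLReader();
$reader->open("products.xml");

$doc = new DOMDocument();
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        // importNode() attaches the expanded node to a document so that
        // simplexml_import_dom() can wrap it in a SimpleXMLElement
        $Product = simplexml_import_dom($doc->importNode($reader->expand(), true));
        if ((string) $Product->name === $userSearchTerm) { // hypothetical filter
            echo $Product->name, "\n";
        }
    }
}
$reader->close();

Only the current <product> node is held in memory at any moment, which is what lets this approach scale.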

Yeah! I'm finding out that I'm going to have to learn how to use XMLReader or SAX. I'll probably use XMLReader to retrieve the XML, then SimpleXML to display it. Thanks! –  Dean Olsen Apr 15 at 0:02

Let's assume that the big XML file doesn't change often.

I would then suggest that you split the big XML file into chunks and store them separately. Every time the big XML file is updated, repeat the splitting procedure.

In the big file, keep only the structure, so that you can still browse it. When the user leaves the main file and goes into some branch, load the corresponding smaller part:

<Products>
    <Clothes> - into separate XML file
    <Cars> - into separate XML file
    <Computers> - into separate XML file

That way, no single request has to load the big file, which saves memory.
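
A rough sketch of the one-time splitting step, assuming the category elements sit directly under <Products> as shown above; the output file names are my own convention. Since this runs only when the big file changes, loading it fully with SimpleXML here is acceptable:

// one-time split: write each top-level category branch to its own file
$big = simplexml_load_file("products.xml");

foreach ($big->children() as $category) {
    // asXML() with a filename writes just this branch, e.g. clothes.xml
    $category->asXML(strtolower($category->getName()) . '.xml');
}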

However, if every user action can change the file, you will have to use a database, because otherwise you can't guarantee the validity of the data: a new request comes in every 2 seconds, and you can't make sure it works with the newest record.
