
What would be the most efficient way of parsing a CSS selector input string that features any combination of:

  • [key=value] : attributes, 0 to * instances
  • #id : ids, 0 to 1 instances
  • .class : classes, 0 to * instances
  • tagName : tag names, 0 to 1 instances (found at start of string only)

(note: '*', or other applicable combinator could be used in lieu of tag?)

Such as:

div.someClass#id[key=value][key2=value2].anotherClass

Into the following output:

['div','.someClass','#id','[key=value]','[key2=value2]','.anotherClass']

Or for bonus points, into this form efficiently (read: a way not just based on using str[0] === '#' for example):

{
 tags : ['div'],
 classes : ['someClass','anotherClass'],
 ids : ['id'],
 attrs : 
   {
     key : value,
     key2 : value2
   }
}

(note removal of # . [ = ])

I imagine some combination of regex and .match(..) is the way to go, but my regex knowledge is nowhere near advanced enough for this situation.

Many thanks for your help.

Regex is rarely the right solution for parsing complex languages. You should have a look at the many libraries that do this (like Sizzle) –  dystroy Jul 26 '13 at 18:04
    
I know sizzle does it, but I'm looking to implement my own simple solution. The domain is not as complex as a language, there is no whitespace etc, and a limited format for delimiters (as listed in the question) –  ComethTheNerd Jul 26 '13 at 18:05
    
I was suggesting that you look at the source, not that you use it. If you want to parse CSS selectors, you should take whitespace into account. –  dystroy Jul 26 '13 at 18:06
    
OK I will consult the source, but I'm talking about tokens already split by whitespace. This question is about the next step after splitting the tokens delimited by whitespace –  ComethTheNerd Jul 26 '13 at 18:07
    
@dystroy I think this is about parsing the selector "sub-syntax" for a single element match; I'm not sure what that's called. Also SCRIPTONITE note that it's not just splitting on whitespace - whitespace is an operator in the CSS selector syntax, comparable to the + and ~ connectors. –  Pointy Jul 26 '13 at 18:07

1 Answer

You might do the splitting using

var tokens = subselector.split(/(?=\.)|(?=#)|(?=\[)/)

which changes

div.someClass#id[key=value][key2=value2].anotherClass

to

["div", ".someClass", "#id", "[key=value]", "[key2=value2]", ".anotherClass"]

and after that you simply have to look at how each token starts (and, for tokens starting with [, check whether they contain an =).

Here's the whole working code, building the object you describe:

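function parse(subselector) {
  // attrs is an object (key -> value), as requested in the question
  var obj = {tags:[], classes:[], ids:[], attrs:{}};
  // split before every '.', '#' or '[' so each token keeps its prefix
  subselector.split(/(?=\.)|(?=#)|(?=\[)/).forEach(function(token){
    switch (token[0]) {
      case '#':
        obj.ids.push(token.slice(1));
        break;
      case '.':
        obj.classes.push(token.slice(1));
        break;
      case '[':
        // drop the surrounding brackets, then split "key=value"
        var pair = token.slice(1,-1).split('=');
        obj.attrs[pair[0]] = pair[1];
        break;
      default :
        obj.tags.push(token);
        break;
    }
  });
  return obj;
}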

demonstration

[key="val#ue"] –  Gumbo Jul 26 '13 at 18:13
    
This is a great start, though I do agree with @Gumbo's point. Is there a way to make the attribute search 'greedier' than the other searches to avoid this problem? –  ComethTheNerd Jul 26 '13 at 18:15
@Gumbo I answered the written question, not another question about any kind of CSS selector because trying to do it in a few lines of javascript would be doomed. –  dystroy Jul 26 '13 at 18:27
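
One possible way to handle the [key="val#ue"] case raised in the comments is to tokenize with .match(..) instead of .split(..), so that a bracketed attribute is consumed as a whole before '.' or '#' are considered. The following is only a sketch, not part of the original answer: the name parseQuoted is hypothetical, and it assumes attribute values are either unquoted or wrapped in matching quotes with no escaped quotes inside. Malformed input (for example an unclosed '[') is not validated.

```javascript
// Hypothetical helper: tokenizes one compound selector with match() rather than split().
// Assumption: quoted attribute values contain no escaped quotes.
function parseQuoted(subselector) {
  var obj = {tags: [], classes: [], ids: [], attrs: {}};
  // Alternative 1: a whole [...] token, where quoted runs may contain '#', '.' or '['.
  // Alternative 2: '#' or '.' followed by a name. Alternative 3: a bare tag name.
  var tokenRe = /\[(?:"[^"]*"|'[^']*'|[^\]"'])*\]|[#.][^#.\[]+|[^#.\[]+/g;
  (subselector.match(tokenRe) || []).forEach(function (token) {
    switch (token[0]) {
      case '#':
        obj.ids.push(token.slice(1));
        break;
      case '.':
        obj.classes.push(token.slice(1));
        break;
      case '[':
        var body = token.slice(1, -1);   // strip the brackets
        var eq = body.indexOf('=');      // split on the first '=' only
        var key = eq === -1 ? body : body.slice(0, eq);
        var value = eq === -1 ? '' : body.slice(eq + 1);
        // remove one pair of matching surrounding quotes, if present
        obj.attrs[key] = value.replace(/^(["'])(.*)\1$/, '$2');
        break;
      default:
        obj.tags.push(token);
        break;
    }
  });
  return obj;
}

var result = parseQuoted('div.someClass#id[key="val#ue"][key2=value2].anotherClass');
console.log(JSON.stringify(result));
```

Because the bracket alternative comes first in the regex, the attribute match is effectively "greedier" than the class and id matches, which is what the follow-up comment asks for.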
