type="checkbox" name="prdCdList" value="102001174" class="bnone" newfl="Y" cpnfl="N" catcpnfl="N" eventfl="N" catcd1="102000" catcd2="102001" prdimgl="/upload/product/320_1405497216907.jpg" prdnm="Dear my volume" prdvol="3.4g" prdlndesc="Limited Pink" selprc="10000" spsalprc="0" cpnprc="0" cashptrat="0" cashpt="0" discpt="0" salstatcdnm="Available" salstatcd="PS01" prdwidth="0" prdheight="0" prddepth="0" pricestr="" price="10000" prepromote="" endpromote=""


I am currently using a bunch of regexes to parse the above data into a structured array or hash. The actual tag includes many more values. I thought there must be a better way in Ruby, like using split or something? There are spaces between attributes but also within certain values, so a plain split on whitespace breaks values such as "Dear my volume" apart.

Can anyone suggest a good way to handle this type of string?

I would like the result to be:

hash = {
  "type" => "checkbox",
  "name" => "prdCdList",
  # ... and so on
}

or

arr = [
  "checkbox",
  "prdCdList",
  # ... and so on
]

Would appreciate any advice =]

Thanks,

  • have you tried nokogiri.org?
    – Uri Agassi
    Commented Jul 21, 2014 at 14:08
  • @UriAgassi Yep, I am using it. If you can let me know how to parse just the values with Nokogiri, that would be great! They are all inside one tag, so if I select that tag with XPath or CSS, it returns the whole tag. I am currently doing that plus regexes to get the values inside.
    – Rok
    Commented Jul 21, 2014 at 14:10

2 Answers

1
node.attributes.each_with_object({}) {|(k,v), acc| acc[k] = v.value }

where node is your tag.

0

Using Nokogiri, the attributes are already parsed for you - simply access them using []:

require 'nokogiri'

doc = Nokogiri::HTML.parse('<html><body><div type="checkbox" name="prdCdList" value="102001174" class="bnone" newfl="Y" cpnfl="N" catcpnfl="N" eventfl="N" catcd1="102000" catcd2="102001" prdimgl="/upload/product/320_1405497216907.jpg" prdnm="Dear my volume" prdvol="3.4g" prdlndesc="Limited Pink" selprc="10000" spsalprc="0" cpnprc="0" cashptrat="0" cashpt="0" discpt="0" salstatcdnm="Available" salstatcd="PS01" prdwidth="0" prdheight="0" prddepth="0" pricestr="" price="10000" prepromote="" endpromote=""></div></body></html>')
div = doc.css('div').first
div['prdnm']
# => "Dear my volume"

From the documentation:

Nokogiri::XML::Node is your window to the fun filled world of dealing with XML and HTML tags. A Nokogiri::XML::Node may be treated similarly to a hash with regard to attributes. For example (from irb):

irb(main):004:0> node
=> <a href="#foo" id="link">link</a>
irb(main):005:0> node['href']
=> "#foo"
irb(main):006:0> node.keys
=> ["href", "id"]
irb(main):007:0> node.values
=> ["#foo", "link"]
irb(main):008:0> node['class'] = 'green'
=> "green"
irb(main):009:0> node
=> <a href="#foo" id="link" class="green">link</a>

See Nokogiri::XML::Node#[] and Nokogiri::XML::Node#[]= for more information.

  • wow...that was so simple and neat.. thank you so much. That seems to be such a basic feature of Nokogiri.. thanks!
    – Rok
    Commented Jul 21, 2014 at 14:21
