I'm building a CLI for Google searches for my own use. I've been using Node.js for a while on a chatbot, so I'd like to build this in Node.js too. Fetching the page works fine, and I end up with a string containing all of the page's HTML. It's even easy to spot the results I want in that HTML:
<div class="jd"><a class="p" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://leagueoflegends.com/&ved=0CBAQFjAA&usg=AFQjCNEEnWGHwxNnuwKenqm4ajKfTM6Xxw" ><b>League of Legends</b> - Free Online Game | <b>LoL</b> - <b>League of Legends</b></a> </div> <div class="kd">3 days ago … Official website for <b>League of Legends</b>. Join millions of players in an award winning Multiplayer Online Battle Arena. </div> <div class="qdlmxn"><a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://leagueoflegends.com/board&ved=0CBEQ0gIoADAA&usg=AFQjCNHpmmAdFFbTgm8C_gJvsjVhMzVKUQ" >Community</a> - <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://signup.leagueoflegends.com/en/signup/redownload&ved=0CBIQ0gIoATAA&usg=AFQjCNFHGUtn4ItgQIzODgIZRv_237Mq0A" >PVP.NET</a> - <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://na.leagueoflegends.com/board/forumdisplay.php?f%3D2&ved=0CBMQ0gIoAjAA&usg=AFQjCNHpycJ8WGh7xvWw1qNu8NjjU1EA0Q" >General Discussion</a> - <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://na.leagueoflegends.com/champions&ved=0CBQQ0gIoAzAA&usg=AFQjCNEpeBzNefwag5xmkFcFhCW27FoAew" >Champions</a> </div><span class="c">leagueoflegends.com/</span> - <div class="txnles" onclick="_popup('web_result_popup_10836585','inline');"> <div class="wx4xyp" id="web_result_popup_10836585"> <div class="vfc7iu"><a class="s" href="/search?q=cache:GCRD1wy5e3QJ:leagueoflegends.com/" >Cached</a> <br/><a class="s" href="/m/?q=related:leagueoflegends.com/&ei=ZtZnT8CTOMy48AbDzgE&ved=0CBYQHzAA" >Similar</a> <br/><a class="s" href="/gwt/x?q=lol&ei=ZtZnT8CTOMy48AbDzgE&hl=en&source=m&u=http://leagueoflegends.com/" >Mobile formatted</a> </div> </div><a class="s" href="javascript:void(0)" >Options</a> <div class="m6u8fq"> </div> </div> </div> </div> <div> <div class="r ld"> <div class="jd"><a class="p" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://en.wikipedia.org/wiki/LOL&ved=0CBgQFjAB&usg=AFQjCNFOhgg5Y2E5SFuS5I-8830OJ9VR9Q" ><b>LOL</b> - Wikipedia, the free encyclopedia</a> </div> 
<div class="kd"><b>LOL</b>, an abbreviation for <b>laughing out loud</b>, or <b>laugh out loud</b>, is a common element of Internet slang. It was used historically on Usenet … </div><span class="c">en.wikipedia.org/wiki/LOL</span> - <div class="txnles" onclick="_popup('web_result_popup_30597472','inline');"> <div class="wx4xyp" id="web_result_popup_30597472"> <div class="vfc7iu"><a class="s" href="/search?q=cache:mhIpOeXQp38J:en.wikipedia.org/wiki/LOL" >Cached</a> <br/><a class="s" href="/m/?q=related:en.wikipedia.org/wiki/LOL&ei=ZtZnT8CTOMy48AbDzgE&ved=0CBkQHzAB" >Similar</a> <br/><a class="s" href="/gwt/x?q=lol&ei=ZtZnT8CTOMy48AbDzgE&hl=en&source=m&u=http://en.wikipedia.org/wiki/LOL" >Mobile formatted</a> </div> </div><a class="s" href="javascript:void(0)" >Options</a> <div class="m6u8fq"> </div> </div> </div> </div> <div> <div class="r ld">
Each result starts with a .jd div (which holds the title link), followed by a .kd div with the description, so I first need to separate the results, then pull out the URL and the description of each one. I've never done string manipulation to this extreme, so I have no idea where to start.
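A minimal sketch of that first step, assuming every result really does begin with the literal text `<div class="jd">` (true for the sample above, but Google can change its markup at any time): split the string on that marker and discard whatever comes before the first one. The function name `splitResults` is only an illustration.

```javascript
// Hypothetical helper: split the raw results page into one chunk per result.
// Assumes each result starts with the exact text '<div class="jd">', as in the
// sample markup; a different class name or attribute quoting will break it.
function splitResults(html) {
  const marker = '<div class="jd">';
  // split() drops the marker itself; element 0 is the page markup that comes
  // before the first result, so it is discarded.
  return html.split(marker).slice(1);
}

// Usage with a shortened copy of the sample markup (two results).
const page =
  '<div><div class="r ld"><div class="jd"><a class="p" href="/m/url?q=http://leagueoflegends.com/" >' +
  '<b>League of Legends</b></a></div><div class="kd">Official website</div></div></div>' +
  '<div><div class="r ld"><div class="jd"><a class="p" href="/m/url?q=http://en.wikipedia.org/wiki/LOL" >' +
  '<b>LOL</b> - Wikipedia</a></div><div class="kd">LOL, an abbreviation</div></div></div>';

const chunks = splitResults(page);
console.log(chunks.length); // 2
```

Each chunk still contains that result's .kd description, because the .kd div comes after the .jd div and before the next marker, so the URL and description can be extracted per chunk.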
Here's the HTML from above in a more readable format; keep in mind, though, that what I'm actually working with is one long string.
<div>
  <div class="r ld">
    <div class="jd">
      <a class="p" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://leagueoflegends.com/&ved=0CBAQFjAA&usg=AFQjCNEEnWGHwxNnuwKenqm4ajKfTM6Xxw" >
        <b>League of Legends</b> - Free Online Game | <b>LoL</b> - <b>League of Legends</b>
      </a>
    </div>
    <div class="kd">
      3 days ago … Official website for <b>League of Legends</b>. Join millions of players in an award winning Multiplayer Online Battle Arena.
    </div>
    <div class="qdlmxn">
      <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://leagueoflegends.com/board&ved=0CBEQ0gIoADAA&usg=AFQjCNHpmmAdFFbTgm8C_gJvsjVhMzVKUQ" >Community</a> -
      <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://signup.leagueoflegends.com/en/signup/redownload&ved=0CBIQ0gIoATAA&usg=AFQjCNFHGUtn4ItgQIzODgIZRv_237Mq0A" >PVP.NET</a> -
      <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://na.leagueoflegends.com/board/forumdisplay.php?f%3D2&ved=0CBMQ0gIoAjAA&usg=AFQjCNHpycJ8WGh7xvWw1qNu8NjjU1EA0Q" >General Discussion</a> -
      <a class="gg" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://na.leagueoflegends.com/champions&ved=0CBQQ0gIoAzAA&usg=AFQjCNEpeBzNefwag5xmkFcFhCW27FoAew" >Champions</a>
    </div>
    <span class="c">leagueoflegends.com/</span> -
    <div class="txnles" onclick="_popup('web_result_popup_10836585','inline');">
      <div class="wx4xyp" id="web_result_popup_10836585">
        <div class="vfc7iu">
          <a class="s" href="/search?q=cache:GCRD1wy5e3QJ:leagueoflegends.com/" >Cached</a>
          <br/>
          <a class="s" href="/m/?q=related:leagueoflegends.com/&ei=ZtZnT8CTOMy48AbDzgE&ved=0CBYQHzAA" >Similar</a>
          <br/>
          <a class="s" href="/gwt/x?q=lol&ei=ZtZnT8CTOMy48AbDzgE&hl=en&source=m&u=http://leagueoflegends.com/" >Mobile formatted</a>
        </div>
      </div>
      <a class="s" href="javascript:void(0)" >Options</a>
      <div class="m6u8fq"> </div>
    </div>
  </div>
</div>
<div>
  <div class="r ld">
    <div class="jd">
      <a class="p" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://en.wikipedia.org/wiki/LOL&ved=0CBgQFjAB&usg=AFQjCNFOhgg5Y2E5SFuS5I-8830OJ9VR9Q" >
        <b>LOL</b> - Wikipedia, the free encyclopedia
      </a>
    </div>
    <div class="kd">
      <b>LOL</b>, an abbreviation for <b>laughing out loud</b>, or <b>laugh out loud</b>, is a common element of Internet slang. It was used historically on Usenet …
    </div>
    <span class="c">en.wikipedia.org/wiki/LOL</span> -
    <div class="txnles" onclick="_popup('web_result_popup_30597472','inline');">
      <div class="wx4xyp" id="web_result_popup_30597472">
        <div class="vfc7iu">
          <a class="s" href="/search?q=cache:mhIpOeXQp38J:en.wikipedia.org/wiki/LOL" >Cached</a>
          <br/>
          <a class="s" href="/m/?q=related:en.wikipedia.org/wiki/LOL&ei=ZtZnT8CTOMy48AbDzgE&ved=0CBkQHzAB" >Similar</a>
          <br/>
          <a class="s" href="/gwt/x?q=lol&ei=ZtZnT8CTOMy48AbDzgE&hl=en&source=m&u=http://en.wikipedia.org/wiki/LOL" >Mobile formatted</a>
        </div>
      </div>
      <a class="s" href="javascript:void(0)" >Options</a>
      <div class="m6u8fq"> </div>
    </div>
  </div>
</div>
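Given one chunk of this markup per result, the title, URL and description can be pulled out with a few regular expressions. This is a rough sketch under assumptions taken from the sample: the title link is the `<a class="p" ...>` element, the description is the text of the `<div class="kd">` that follows it, and each `href` is a Google redirect of the form `/m/url?...&q=<target>&...` whose `q` parameter holds the real address. The helper names (`stripTags`, `realUrl`, `parseResult`) are hypothetical. Regexes over HTML are brittle; if the markup varies, an HTML parser such as cheerio is the sturdier choice.

```javascript
// Remove tags such as <b>...</b> and collapse runs of whitespace.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Turn a Google redirect link (/m/url?...&q=<target>&...) into the target URL.
// URL needs an absolute base to resolve the relative href; the host is arbitrary.
// The &amp; replacement is a precaution in case the page escapes ampersands.
function realUrl(href) {
  const url = new URL(href.replace(/&amp;/g, "&"), "https://www.google.com");
  // searchParams percent-decodes the value, e.g. f%3D2 becomes f=2.
  return url.searchParams.get("q") || href;
}

// Hypothetical helper: extract title, URL and description from one result chunk.
// Returns null if the chunk contains no title link.
function parseResult(chunk) {
  const link = /<a class="p" href="([^"]*)"\s*>([\s\S]*?)<\/a>/.exec(chunk);
  if (!link) return null;
  const desc = /<div class="kd">([\s\S]*?)<\/div>/.exec(chunk);
  return {
    title: stripTags(link[2]),
    url: realUrl(link[1]),
    description: desc ? stripTags(desc[1]) : "",
  };
}

// Usage on a shortened copy of the first result from the sample.
const chunk =
  '<a class="p" href="/m/url?ei=ZtZnT8CTOMy48AbDzgE&q=http://leagueoflegends.com/&ved=0CBAQFjAA" >' +
  "<b>League of Legends</b> - Free Online Game | <b>LoL</b> - <b>League of Legends</b></a> </div> " +
  '<div class="kd">3 days ago … Official website for <b>League of Legends</b>. </div>';
console.log(parseResult(chunk));
```

Reading the destination from the `q` parameter rather than slicing the string by hand also handles targets that carry their own query strings, such as the forumdisplay.php link above.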