I'm working on a script that separates the registrar information from a domain's whois output. So far it works well enough, but there are a few things I want to remove to make the output cleaner. It works on the majority of domains. Here's my code:

#!/bin/bash
reg=$(whois "stackoverflow.com" | egrep -i 'Registrar|Sponsoring Registrar|Registrant|!internic')
printf "Below is my best attempt at finding the Registrar info:\n"
printf '%s\n' "$reg"

And here's what it outputs:

Below is my best attempt at finding the Registrar info:
with many different competing registrars. Go to http://www.internic.net
   Registrar: NAME.COM, INC.
   Sponsoring Registrar IANA ID: 625
registrar's sponsorship of the domain name registration in the registry is
date of the domain name registrant's agreement with the sponsoring
registrar.  Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.
Registrars.

I added some pseudo-code to my grep to try to exclude the string "internic", in order to snip off that first line. I'd also like to find a way to remove the secondary "registrar's sponsorship..." lines and the like.

Is it possible to detect a string and not include that line? Thanks

True, but if I don't have it it misses a lot that I do want. Good idea though I'll look into it. – Egrodo May 20 at 15:01

Another option is to be more specific about what you are grepping for. For example:

whois stackoverflow.com | grep -E '^[[:space:]]*(Registr(ar|ant|y)|Sponsoring).*: '

This extracts only lines that begin with optional white space before either 'Registrar', 'Registrant', 'Registry', or 'Sponsoring', followed by any number (zero or more) of any character, followed by a colon and a space.

(BTW, this uses grep -E rather than the obsolete and deprecated egrep. They do the same thing.)

Output:

   Registrar: NAME.COM, INC.
   Sponsoring Registrar IANA ID: 625
Registry Domain ID: 108907621_DOMAIN_COM-VRSN 
Registrar WHOIS Server: whois.name.com 
Registrar URL: http://www.name.com 
Registrar Registration Expiration Date: 2016-12-26T19:18:07Z 
Registrar: Name.com, Inc. 
Registrar IANA ID: 625 
Registry Registrant ID:  
Registrant Name: Sysadmin Team 
Registrant Organization: Stack Exchange, Inc. 
Registrant Street: 110 William St , Floor 28 
Registrant City: New York 
Registrant State/Province: NY 
Registrant Postal Code: 10038 
Registrant Country: US 
Registrant Phone: +1.2122328280 
Registrant Email: [email protected] 
Registry Admin ID:  
Registry Tech ID:  
Registrar Abuse Contact Email: [email protected] 
Registrar Abuse Contact Phone: +1.1 7203101849 
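
To see which kinds of lines that pattern keeps and which it drops, here is a small sketch that runs it against a hand-made sample rather than live whois output. The sample lines are copied from the outputs quoted in this thread (except the last one, which is an assumed unindented 'Domain Name' line); the names `sample` and `kept` are just placeholders for illustration.

```shell
#!/bin/sh
# Print a handful of sample whois lines (hand-picked, not a full record).
sample() {
cat <<'EOF'
with many different competing registrars. Go to http://www.internic.net
   Registrar: NAME.COM, INC.
   Sponsoring Registrar IANA ID: 625
registrar's sponsorship of the domain name registration in the registry is
Registry Domain ID: 108907621_DOMAIN_COM-VRSN
Registrant Name: Sysadmin Team
Domain Name: stackoverflow.com
EOF
}

# Keep only lines that START (after optional whitespace) with one of the
# wanted field prefixes and contain a "colon + space" somewhere after it.
kept=$(sample | grep -E '^[[:space:]]*(Registr(ar|ant|y)|Sponsoring).*: ')

printf '%s\n' "$kept"
```

The two prose lines are dropped because they either don't start with one of the field names or (the match being case-sensitive) start with a lowercase "registrar". The 'Domain Name' line is dropped because it isn't one of the listed prefixes.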

BTW, while testing any form of text processing (incl. regular expressions) on text from slow sources (like a database query, or a remote source like whois or an HTTP server), it's useful to run the slow command once and redirect its output to a file, then test against the file. When you have what you want, make sure it works the same with directly-piped (fresh) data.

e.g.

whois stackoverflow.com > so.txt
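
If you do this a lot, the "run once, then test against the file" habit can be wrapped in a small helper. This is only a sketch: `cached_run` is a made-up name, not a standard tool, and it assumes a non-empty cache file is always good enough to reuse.

```shell
#!/bin/sh
# cached_run CACHE_FILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND only when CACHE_FILE is missing or empty,
# then print the cached copy. A failed command leaves no cache behind.
cached_run() {
    cache=$1
    shift
    if [ ! -s "$cache" ]; then
        if "$@" > "$cache.tmp"; then
            mv "$cache.tmp" "$cache"
        else
            rm -f "$cache.tmp"
            return 1
        fi
    fi
    cat "$cache"
}

# Intended use (left commented out so the sketch needs no network access):
# cached_run so.txt whois stackoverflow.com | grep -E '^[[:space:]]*Registrar.*: '
```

Delete the cache file when you want fresh data, which is also the moment to re-check that your filter still behaves the same.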

Other useful things to do with whois output:

  1. extract Domain block at beginning of whois output (field lines begin with leading blanks and contain a colon followed by a blank):

    grep -Ei '^[[:blank:]]+.*:[[:blank:]]' so.txt

Output:

   Domain Name: STACKOVERFLOW.COM
   Registrar: NAME.COM, INC.
   Sponsoring Registrar IANA ID: 625
   Whois Server: whois.name.com
   Referral URL: http://www.name.com
   Name Server: CF-DNS01.STACKOVERFLOW.COM
   Name Server: CF-DNS02.STACKOVERFLOW.COM
   Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Updated Date: 26-nov-2015
   Creation Date: 26-dec-2003
   Expiration Date: 26-dec-2016

  2. extract Registrant block, beginning with 'Domain Name' field and ending with 'Registrar Abuse Contact Phone' field:

    sed -n -e '/^Domain Name:/,/^Registrar Abuse Contact Phone:/p' so.txt

  3. both of the above together (-E is needed so that sed treats + as a repetition operator):

    sed -n -E -e '/^Domain Name:/,/^Registrar Abuse Contact Phone:/p' -e '/^[[:blank:]]+.*:[[:blank:]]/p' so.txt

  4. Output from all of the above can easily be further processed with awk or any other text-processing tool that can be made to use a colon (:) character as field-separator.
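
As a rough illustration of that last point, here is a sketch that narrows a sample down to the record block with the sed range and then pulls single fields out with awk. The sample is hand-made from values shown above (the unindented 'Domain Name' line and the final filler line are assumptions), and `whois_field` is a hypothetical helper name. It splits on the first "colon + space" rather than on every colon, so values such as URLs and timestamps stay intact.

```shell
#!/bin/sh
# Hand-made stand-in for a saved whois file.
sample() {
cat <<'EOF'
   Domain Name: STACKOVERFLOW.COM
   Registrar: NAME.COM, INC.
Domain Name: stackoverflow.com
Registrar URL: http://www.name.com
Registrar Registration Expiration Date: 2016-12-26T19:18:07Z
Registrant Country: US
Registrar Abuse Contact Phone: +1.1 7203101849
(placeholder for the trailing legal text)
EOF
}

# whois_field NAME: print the value of every "NAME: value" line on stdin.
whois_field() {
    awk -v key="$1" '
    {
        line = $0
        sub(/^[ \t]+/, "", line)          # drop leading indentation
        pos = index(line, ": ")           # first colon + space ends the name
        if (pos > 0 && substr(line, 1, pos - 1) == key) {
            value = substr(line, pos + 2)
            sub(/[ \t]+$/, "", value)     # drop trailing blanks
            print value
        }
    }'
}

# Narrow to the unindented record block first, then pick fields from it.
record=$(sample | sed -n '/^Domain Name:/,/^Registrar Abuse Contact Phone:/p')
url=$(printf '%s\n' "$record" | whois_field 'Registrar URL')
country=$(printf '%s\n' "$record" | whois_field 'Registrant Country')

printf 'url=%s country=%s\n' "$url" "$country"
```

Matching the whole field name exactly means asking for 'Registrar' does not also return 'Registrar URL' or 'Registrar Abuse Contact Phone'.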

Wow, this is extremely informative and helpful to what I'm trying to do. I am going to look into this further when I get a chance to work on my script. I really appreciate your explanations for everything, just reading the post I've learned a ton. – Egrodo May 22 at 4:02

Use the -v flag:

reg=$(whois stackoverflow.com | grep -Ei 'Registrar|Sponsoring Registrar|Registrant' | grep -v internic)
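
Here is a sketch of that include-then-exclude filter on a few sample lines taken from the question's output, so it can be tried without calling whois. The awk line is an alternative I'm adding for comparison, not part of the answer above: it does the same selection in a single process. Names like `two_stage` and `one_pass` are placeholders.

```shell
#!/bin/sh
# A few lines from the question's output, plus one line that should not match.
sample() {
cat <<'EOF'
with many different competing registrars. Go to http://www.internic.net
   Domain Name: STACKOVERFLOW.COM
   Registrar: NAME.COM, INC.
   Sponsoring Registrar IANA ID: 625
registrar's sponsorship of the domain name registration in the registry is
EOF
}

# Two-stage filter: keep matching lines, then drop any that mention internic.
two_stage=$(sample | grep -Ei 'Registrar|Registrant' | grep -v 'internic')

# Same selection in one awk program: include test AND NOT exclude test.
one_pass=$(sample | awk 'tolower($0) ~ /registrar|registrant/ && !/internic/')

printf '%s\n' "$two_stage"
```

Note that grep -v is case-sensitive here. Add -i to it if the string you want to exclude may appear in mixed case.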
This is what I was looking for, thanks! – Egrodo May 20 at 15:02
