I am trying to extract numbers out of some text. Currently I am using the following:

echo "2.5 test. test -50.8" | tr '\n' ' ' | sed -e 's/[^0-9.]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' '

This would give me 2.5, "." and 50.8. How should I modify the first sed so it would detect float numbers, both positive and negative?

Should the solution also be able to handle integers (positive and negative) as well as numbers in 1e10 format? – Kusalananda yesterday
    
No, just negative and positive floats would be fine. – nimafl yesterday
up vote 4 down vote accepted

grep works well for this:

$ echo "2.5 test. test -50.8" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'
2.5
-50.8

How it works

  • -E

    Use extended regex.

  • -o

    Return only the matched text, not the surrounding line.

  • [+-]?[0-9]+([.][0-9]+)?

    Match numbers which are identified as:

    • [+-]?

      An optional leading sign

    • [0-9]+

      One or more digits

    • ([.][0-9]+)?

      An optional period followed by one or more digits.
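
To see how the three parts of the pattern interact, here is a small sketch. The function name extract_numbers and the sample input are made up for illustration, and -o is assumed to be available (GNU and BSD grep both provide it, though POSIX does not require it).

```shell
# Hypothetical helper wrapping the grep call from the answer.
extract_numbers() {
    grep -Eo '[+-]?[0-9]+([.][0-9]+)?'
}

# "7" has no sign and no fraction; "+3.25" and "-0.5" use every part;
# ".75" lacks the required leading digits, so only "75" is matched.
printf '%s\n' 'a 7 b +3.25 c -0.5 d .75' | extract_numbers
```

This should print 7, +3.25, -0.5 and 75, each on its own line. The last result shows that a number written without a leading digit loses its decimal point with this pattern.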

Getting the output on one line

$ echo "2.5 test. test -50.8" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' | tr '\n' ' '; echo ""
2.5 -50.8
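
Since the goal mentioned in the comments is to append the result to an existing file, one possible variant joins the matches with paste instead of tr, so the line ends with a newline and no trailing space. The helper name append_numbers and the temporary file are assumptions made for this sketch.

```shell
# Hypothetical helper: append all numbers found on stdin to the file "$1" as one line.
append_numbers() {
    grep -Eo '[+-]?[0-9]+([.][0-9]+)?' | paste -s -d ' ' - >> "$1"
}

log=$(mktemp)                      # stand-in for the existing file
echo "existing content" > "$log"
echo "2.5 test. test -50.8" | append_numbers "$log"
cat "$log"                         # existing content, then the line "2.5 -50.8"
rm -f "$log"
```

Using >> rather than > is what keeps the earlier content of the file intact.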
Great, thanks for the detailed explanation. Is it possible to get the returned values all on one line rather than each on a new line? I want to append the result to an existing file. – nimafl yesterday
1  
You're welcome. And, yes, one line is possible. See update. – John1024 yesterday
    
Try it with 1.2.3 (results in 1.2 and 3) or 9-9 (results in 9 and -9)... – Kusalananda yesterday

A grep solution:

$ echo "2.5 test. test -50.8" | tr ' ' '\n' | grep -E '^[+-]?[0-9]*\.?([0-9]+)$'
2.5
-50.8
  • The tr just converts the line into multiple lines by replacing the spaces with newlines.

  • The grep command looks for strings that start with an optional + or -, possibly followed by some digits and an optional decimal point. Then we require some digits at the end.

This will let through things like 00000123.91288000, which just looks strange. Is this a number we want to filter out or not? It's technically a floating point number, just oddly formatted.
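
A quick way to check what this pattern accepts and rejects is to feed it a few awkward tokens. The helper name filter_numbers and the sample tokens are chosen only for illustration.

```shell
# Hypothetical helper wrapping the tr + grep pipeline from this answer.
filter_numbers() {
    tr ' ' '\n' | grep -E '^[+-]?[0-9]*\.?([0-9]+)$'
}

# Because the pattern is anchored with ^ and $, a token is kept only if
# the whole token is a number; "1.2.3", "9-9", "5." and "1.2." are dropped.
echo "1.2.3 9-9 .5 5. 00000123.91288000 -7 1.2." | filter_numbers
```

This should print .5, 00000123.91288000 and -7. Note that a number followed directly by sentence punctuation, such as "1.2.", is rejected as a whole rather than trimmed.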

EDIT: To properly check for numbers, do not write your own regular expression! Use a library routine from somewhere reliable.

In my case, I would use Perl's Scalar::Util package, which has a convenient looks_like_number() subroutine:

$ echo "2.5 test. test -50.8" | tr ' ' '\n' | perl -MScalar::Util -ne 'Scalar::Util::looks_like_number($_) && print'
2.5
-50.8

This has the added benefit of finding numbers in other forms, such as 1e3.

Try your solutions with "First number is 1.2. Second is 2.4." (Yes, there are many special cases: Without further guidance on the input, it is hard to decide how each special case should be treated.) – John1024 yesterday
    
@John1024 Touché, you're correct there. – Kusalananda yesterday
1  
It is good to have alternatives, so +1. – John1024 yesterday
