
jemo07

macrumors regular
Original poster
May 6, 2006
168
0
Madrid, Spain
Hi,

I am running into a small issue while trying to load a file into a DB.
Basically, I am trying to extract the numbers from a string that contains a lot of garbage.
I tried a simple awk hack, but it fails. I can't figure out what I am doing wrong. Here is a sample using echo to keep it short:

echo "$%$&&$··....aaaffff><SPP0022555445DFDDSDFvdbdbd" |awk ' { /[0-9]+/ ; print }'

I expected "0022555445" to print but instead I get:
$%$&&?·....aaaffff><SPP0022555445DFDDSDFvdbdbd :(

Then I thought, wait, maybe I am not selecting correctly. So I did a simple test, replacing my selection with something else, like this:

echo "$%$&&$··....aaaffff><SPP0022555445DFDDSDFvdbdbd" |awk ' { sub(/[0-9]+/, "<AAAAA>"); print }'

Here is what I get:
$%$&&?·....aaaffff><SPP<AAAAA>DFDDSDFvdbdbd :mad:

As you can see, I am selecting all the digits and replacing them with "<AAAAA>" correctly.

I am a little rusty and can't figure this one out. Again, all I want is to print all the digits in the range [0-9] from a string.

Thank you all in advance for helping out.

Regards,

Jose
 
I recommend that you read the awk man page.

http://developer.apple.com/mac/library/DOCUMENTATION/Darwin/Reference/ManPages/man1/awk.1.html

awk ' { /[0-9]+/ ; print }'
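This does not mean what you seem to think it means. Inside the {}'s, /[0-9]+/ is just an expression that is evaluated against the line and then discarded, so the print that follows outputs the whole line.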

A line-matching pattern goes outside the {}'s. But then this:
Code:
awk ' /[0-9]+/ { print }'
won't do what you say you want, either. Which is why you should Read The Fine Man page.
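
To make the point concrete, here is a small sketch of what the pattern-outside-the-braces form actually does. The function name print_if_digits and the shortened ASCII-only sample line are my own, used only for illustration.

```shell
# Hypothetical helper: the pattern sits outside the braces, so it only
# decides WHETHER the action runs for a given line.
print_if_digits() {
    awk '/[0-9]+/ { print }'
}

# The line contains digits, so the pattern matches and the WHOLE line is printed.
printf '%s\n' 'aaaffff><SPP0022555445DFDDSDFvdbdbd' | print_if_digits
# -> aaaffff><SPP0022555445DFDDSDFvdbdbd

# No digits: the pattern does not match and nothing is printed.
printf '%s\n' 'no digits here' | print_if_digits
```

In other words, the pattern filters lines; it does not select the matched text within a line.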

I think you will need the match() builtin function, or possibly the split() builtin function, or both.
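
A minimal sketch of the match() route, assuming only the first run of digits on each line is wanted. match() sets RSTART and RLENGTH, which substr() can then use. The name first_digits and the shortened sample line are mine, not from the original post.

```shell
# Hypothetical helper: print the first run of digits on each input line.
first_digits() {
    awk '{
        # match() returns 0 when there is no match; otherwise it sets
        # RSTART (start position) and RLENGTH (length of the match).
        if (match($0, /[0-9]+/))
            print substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' 'aaaffff><SPP0022555445DFDDSDFvdbdbd' | first_digits
# -> 0022555445
```

Lines without any digits produce no output at all with this version.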

Yet another strategy is to gsub all non-[0-9] chars with a blank, then strip the whitespace from the resulting string.

It's also unclear to me what should happen if the input contains multiple digit-sequences separated by non-digits, e.g. "xyzzy987foo42at694". Again, that would be something match() or split() might best be applied to.
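
For the multiple-sequence case, one possible sketch is to call match() in a loop and chop off everything up to the end of each match. This assumes each digit run should be printed on its own line, which is only one of several reasonable interpretations. The name all_digit_runs is hypothetical.

```shell
# Hypothetical helper: print every run of digits, one per output line.
all_digit_runs() {
    awk '{
        rest = $0
        while (match(rest, /[0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)
            # Continue searching after the end of the current match.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' 'xyzzy987foo42at694' | all_digit_runs
# -> 987
# -> 42
# -> 694
```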
 
If you know all the lines have that format then a dirty hacky method based on that is

Code:
awk -F '[^0-9]+' '{ print $2 }'

but you are probably better off doing something like:

Code:
awk '{ gsub(/[^0-9]+/, " "); print }'

which will replace each run of non-digit characters on the line with a single space and then print the resulting string. Exactly what you want to use depends on the format of the incoming data and on exactly what you want to get out, of course.
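
To show how the two approaches differ, here is a rough sketch run against an input with several digit runs. The function names and the test string are only for illustration. The `$1 = $1` line is an extra step of mine that makes awk rebuild the line from its fields, which drops the leading space the gsub leaves behind.

```shell
# Hypothetical helper: field-split on runs of non-digits and print field 2.
# Field 1 is the (empty) text before the first non-digit run.
second_field() {
    awk -F '[^0-9]+' '{ print $2 }'
}

# Hypothetical helper: replace non-digit runs with spaces, then tidy up.
digits_spaced() {
    awk '{
        gsub(/[^0-9]+/, " ")
        $1 = $1    # rebuild $0 from its fields; trims leading/trailing spaces
        print
    }'
}

printf '%s\n' 'xyzzy987foo42at694' | second_field
# -> 987
printf '%s\n' 'xyzzy987foo42at694' | digits_spaced
# -> 987 42 694
```

The -F version only returns the first number when the line starts with non-digits, while the gsub version keeps every number on the line.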

HTH,

Andrew
 
Andrew, that is what I was looking for. I went the much longer route and created a long nested exclusion with:
Code:
awk '{ sub(/[ffa-z\$.................]+/, " "); print }'

But your code is exactly what I was trying to accomplish.

Thank you very much! I will be able to get this done a little quicker now, since I can reuse this example in many other ways to parse the same file!

Jose
 