
exi

Posting here looking for the more technical folks who might know how to do this.

Have hundreds of contacts. Looking to transition away from Apple Contacts.

Exporting vCard, version 3.0.

Imported into any other software, some phone numbers are formatted (123) 456-7890; others, 1234567890.

Looking at the vCard file in TextEdit shows the same for the same people, of course.

Question is: how could I run through the vCard file and somehow automatically apply formatting such that every ten-digit string of numbers is formatted as (123) 456-7890?

There are far too many to do this manually.
 
What I would do is convert the vCard file to a comma-separated values (CSV) file. Then open the CSV file in a spreadsheet application, do the search and replace you want, and save. Then convert the CSV back to vCard and import it into your new app.

If you Google "convert vcard to csv" you can find apps and even web sites that will convert for you. Sorry I don't have a specific recommendation since I have not used any of the apps.
 
That sounds great, but the search and replace would require manual entry of hundreds of phone numbers. Suppose there's a way to search for ten-digit strings and then add formatting?
 
An 'awk' script might work. Or perl, which I don't know.

Awk can be told to match a pattern. The pattern is given as a regular expression, and "10 digits" is pretty simple as regexes go.

If you're willing to post some sample data, I can take a shot at an 'awk' script.

The data should be vcard text. If it's really long, post it as an attachment, otherwise up to ~50 lines can be inline in a post if it's pasted within CODE tags. The CODE tags don't have an intrinsic length limit AFAIK, it's just a pain to deal with copy/paste when a file download is an option.
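
As a rough illustration of what "match a pattern" means here, the sketch below runs awk over three made-up lines and prints only the one with ten digits in a row. The sample data and the function name `match_ten` are invented for this example; it is not the conversion script itself.

```shell
# Hypothetical sample lines; only the second has ten digits in a row.
sample='TEL;type=CELL:(123) 456-7890
TEL;type=HOME:1234567890
EMAIL;type=INTERNET:email@domain.com'

# Print only the lines that contain ten consecutive digits.
# The digit class is spelled out ten times for portability; awks that
# support interval expressions also accept the shorter /[0-9]{10}/.
match_ten() {
  awk '/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/'
}

printf '%s\n' "$sample" | match_ten
```

With this sample, only the `TEL;type=HOME` line should come out, since the already formatted number never has more than four digits in a row.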
 
Thanks, both of you.

@chown33: thanks for the offer. Have only cursory familiarity with regex but not with awk or any real implementation of such things.

Here's some sample data with placeholder text by me, of course. It just repeats in one big wall of text, as you know, for following contacts -- and with additional information where relevant (type=WORK and whatnot).

The actual vCard file in question has many hundreds of entries. Happy to do whatever I can to help make sense out of it.

Code:
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.11.6//EN
N:lastname;firstname;;;
FN:firstname lastname
EMAIL;type=INTERNET;type=HOME;type=pref:email@domain.com
TEL;type=CELL;type=VOICE;type=pref:(123) 456-7890
ADR;type=HOME;type=pref:;;street;city;two-letter state;zip;country
BDAY:yyyy-mm-dd
X-ABOrder:FIRST
CATEGORIES:category
UID:[string]
X-ABUID:[string]:ABPerson
END:VCARD

Some contact entries in the vCard file read as such:
Code:
...
TEL;type=CELL;type=VOICE;type=pref:1234567890
...

Of course, I'm trying to make the number formatting consistent.
 
Thanks for posting the example vcard data. It clarifies exactly what output format you want. I'll post the conversion commands a little later.

First, I'm wondering how many entries this will need to deal with, and whether it might miss anything simply because you're not aware of them. So please do the following and post the results.

1. Export all the Contacts records to "test.vcf" stored on your Desktop.

2. Paste the following commands into a Terminal window, exactly as given:
Code:
grep 'TEL' ~/Desktop/test.vcf | grep -E -c '[0-9]{5,}'
grep 'TEL' ~/Desktop/test.vcf | grep -E -c '[0-9]{10}'

3. The output should be two numbers. They should be the same. Please post them.
Below is the 'awk' conversion script, and instructions on how to use it.

1. Paste the following into a plain text file:
Code:
#!/usr/bin/awk -f

## Input is a vcard file or stream.
##
## Output on stdout is the input with all 10-digit TEL numbers
## converted to: (nnn) nnn-nnnn
##
## Only lines whose 1st field contains "TEL" are converted.
## Only the last field on a TEL line will be converted.
## Only numbers with exactly 10 digits are converted.
## All other lines, values, and numbers are copied verbatim.


## Set delimiters for breaking lines into fields and
## building back into lines to be a semicolon.
BEGIN  {
  FS=";"
  OFS=";"
}


## Only convert lines whose 1st field contains "TEL".
$1 ~ /TEL/  {
  ## Only where last field has a 10-digit number.
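  ## Skip fields with a run of 11 or more digits, so only numbers
  ## with exactly 10 digits are converted.
  match( $NF, /[0-9]{10}/ )
  if ( RSTART != 0 && $NF !~ /[0-9]{11}/ )  {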
    prefix = substr( $NF, 1, RSTART - 1 )
    numstr = substr( $NF, RSTART, RLENGTH )
    suffix = substr( $NF, RSTART + RLENGTH )

    num_A = substr( numstr, 1, 3 )
    num_B = substr( numstr, 4, 3 )
    num_C = substr( numstr, 7 )

    result = "(" num_A ") " num_B "-" num_C
 
    $NF = "" prefix "" result "" suffix
  }

  print $0
  next
}


## Any lines not matching a pattern above will print the line verbatim.
{  print $0;  }

2. Save this plain text file on your Desktop as "tele-10.txt".

3. In Contacts.app, export all your data as vCard, storing it in a file on your Desktop named "all.vcf".

4. In a Terminal window, paste this exact command line:
Code:
awk -f ~/Desktop/tele-10.txt ~/Desktop/all.vcf >~/Desktop/new.vcf

5. If the command runs with no error messages in the Terminal window, the output should now be in "new.vcf" on your Desktop.

6. You can drag "new.vcf" onto TextEdit.app and it will open it as a text file. Find one of the entries that you knew to be a 10-digit number, and confirm that the TEL line is now formatted as desired.

7. Use "new.vcf" as your new full list of contacts.


If there are any error messages from the command in #4, copy and paste the complete exact message and post it here.
 
Thanks again for your help!

As for the two initial commands: the numbers are 80 and 78, respectively -- what's that all about / will your script still function as expected? Have not yet tried. For context, this will be running through a vCard file with about 700 entries, many hundreds of which have phone numbers and not just other contact information, and many of those have more than one number. Am looking up the details of the functions you're using. Good to learn.
 
The first number (80) is the count of "TEL" lines that have 5 or more digits in a row. Note that 10-digit "TEL" lines fall into this category.

The second number (78) is the count of "TEL" lines that have 10 or more digits in a row. Assuming none of your numbers runs longer than 10 digits, this is the category you wanted to be reformatted.

I chose 5-or-more because the format you wanted "(nnn) nnn-nnnn" has at most 4 digits in a row. That is, no properly formatted phone number will have 5 or more digits in a row. Only improperly formatted phone numbers will.

The conclusion drawn from these two numbers is that you have 2 "TEL" lines with 5 or more digits in a row (80 minus 78), and they aren't 10-digits (78). As a result, there are 2 "TEL" lines that will not be formatted in your desired format.

This is only an approximation; it might be wrong. It's conceivable that the substring "TEL" appears in some non-telephone lines, along with a number of 5 or more digits. The 'awk' program will NOT convert such lines, as its search for "TEL" is more specific than the 'grep' commands shown.
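
A small made-up file shows how the two counts can differ. Everything in this sketch (the temporary file, its contents, the variable names) is invented for illustration; the two `grep` pipelines are the same ones given earlier.

```shell
# Build a throwaway sample file (hypothetical data, not real contacts).
sample=$(mktemp "${TMPDIR:-/tmp}/telcount.XXXXXX")
cat > "$sample" <<'EOF'
TEL;type=CELL;type=VOICE;type=pref:(123) 456-7890
TEL;type=HOME;type=VOICE:1234567890
TEL;type=WORK;type=VOICE:123456-7890
EMAIL;type=INTERNET:email@domain.com
EOF

# Count TEL lines containing a run of 5 or more digits.
five_plus=$(grep 'TEL' "$sample" | grep -E -c '[0-9]{5,}')
# Count TEL lines containing a run of 10 (or more) digits.
ten_plus=$(grep 'TEL' "$sample" | grep -E -c '[0-9]{10}')

echo "5 or more in a row: $five_plus"
echo "10 in a row:        $ten_plus"
rm -f "$sample"
```

Here the first count is 2 and the second is 1. The difference of 1 is the oddly formatted `123456-7890` line, which has too many digits in a row to be properly formatted but too few to be picked up as a 10-digit number.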

I recommend that you test "new.vcf" in whatever your Contacts replacement will be. Don't commit to it until you're certain it works correctly.


If you want to find those 5-or-more "TEL" lines, paste this command line into a Terminal window:
Code:
grep 'TEL' ~/Desktop/new.vcf | grep -E '[0-9]{5,}'

The output will be only the "TEL" lines. You can then open "new.vcf" in TextEdit.app and search for the numbers, to see what the complete vcard entries are.

Note that it uses the converted "new.vcf" file as input.
 
Ah, I see. I have some numbers left which are pager numbers which may not always follow convention and/or are nonstandard. Easily 98%+ of the file is the same case -- mostly (nnn) nnn-nnnn, some nnnnnnnnnn, all of which should be the former.

Getting an error message using the script in the post above. It's pasted below.

Code:
awk: syntax error at source line 1 source file /Users/Exi/Desktop/tele-10.txt

context is

    >>> {\ <<< rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf470

awk: illegal statement at source line 2 source file /Users/Exi/Desktop/tele-10.txt

awk: illegal statement at source line 2 source file /Users/Exi/Desktop/tele-10.txt
 
Yeah, I thought that might happen.

You're not converting it to plain text first. Instead, it's being saved as an RTF file (I can tell by the "rtf1" in the error message).

Make absolutely sure it's plain text before saving it.
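
In TextEdit, Format > Make Plain Text converts the document before saving. To check a saved file from Terminal, one option is to look at its first few characters, since an RTF file begins with `{\rtf`. The sketch below assumes that signature check is good enough; `looks_like_rtf` and the two sample files are made up for the example.

```shell
# Hypothetical helper: succeeds if the file starts with the RTF signature.
looks_like_rtf() {
  [ "$(head -c 5 "$1")" = '{\rtf' ]
}

# Demonstrate on two throwaway files.
dir=$(mktemp -d "${TMPDIR:-/tmp}/rtfcheck.XXXXXX")
printf '%s\n' '{\rtf1\ansi\ansicpg1252 ...}' > "$dir/rich.txt"
printf '%s\n' '#!/usr/bin/awk -f' > "$dir/plain.txt"

for f in "$dir/rich.txt" "$dir/plain.txt"; do
  if looks_like_rtf "$f"; then
    echo "$f: RTF, re-save as plain text"
  else
    echo "$f: plain text"
  fi
done
rm -rf "$dir"
```

On the real script, the same check would be `looks_like_rtf ~/Desktop/tele-10.txt`.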
 
Sigh. I saw that too and thought I had corrected it. Silly.

Ran the script. Have the output file. Ran your additional line and found the two TEL lines in question -- one, a number formatted as nnnnnn-nnnn (!?); the other, an international number formatted as nnn nn nnnn nnnnnn. Fixed them in the output file by hand.

Spot checked a few contacts with numbers I know are formatted incorrectly in the original file. They now appear as they should as (nnn) nnn-nnnn.

Thanks again for your help. Obviously, you know far more about such things than I do -- the one thing I would wonder is whether there's anything that could be done or if there is any utility in somehow running something to verify data in the new file. My very rudimentary understanding of grep and your script is that it shouldn't be destructive in any way or modify anything aside from formatting as you've mentioned in the comments in the script, but I know what I don't know, so to speak.

As an aside, just to give me an idea, what would be required if I were to want to append a country code -- +1 in my case -- to all numbers throughout the file? If it takes more than two minutes, no bother. You've gone way beyond as it is.
 
Thanks again for your help.
You're welcome. It was an interesting diversion.

Obviously, you know far more about such things than I do -- the one thing I would wonder is whether there's anything that could be done or if there is any utility in somehow running something to verify data in the new file.
One thing I learned when looking into vCard is that there's really no guarantee of compatibility. It all depends on what app is producing or consuming it. For example:
https://alessandrorossini.org/the-sad-story-of-the-vcard-format-and-its-lack-of-interoperability/

Reading the vCard specs is just as disheartening. At least the conversion task in this case was simple and narrowly defined. I'd hate to have to write a more general "reformat phone numbers" conversion (or worse: street addresses).


My very rudimentary understanding of grep and your script is that it shouldn't be destructive in any way or modify anything aside from formatting as you've mentioned in the comments in the script, but I know what I don't know, so to speak.
I intentionally made the 'awk' script very discriminating about what patterns to match, and what to modify when it found a fully qualified match. The 'grep' cmds were less discriminating, but useful for testing.

If you want, I can modify the 'awk' script so instead of modifying the vcard data it simply outputs the lines that match. Then you can visually confirm that only those lines will be converted.

Let me know if you want that, it's pretty easy to change the script for it.


As an aside, just to give me an idea, what would be required if I were to want to append a country code -- +1 in my case -- to all numbers throughout the file? If it takes more than two minutes, no bother. You've gone way beyond as it is.
Well, it's definitely more than two minutes work.

The main reason is that everything non-trivial has to change:
1. The pattern-matching is different.
It has to match a "(nnn) nnn-nnnn" pattern, rather than 10-digits.

2. The action taken on finding a match is different.
The breaking and reassembly is completely different.

So pretty much everything other than the script line with BEGIN and the last catch-all action will have to be changed. Plus there's the testing.
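
For a sense of what those two changes might look like, here is a rough sketch, not a tested replacement for the script. It assumes every number is already in "(nnn) nnn-nnnn" form, that only the first such number in the last field matters, and that "+1 " followed by a space is an acceptable prefix for whatever app imports the file. The name `add_country_code` is made up.

```shell
# Hypothetical helper: reads vCard text (stdin or files), writes it to stdout
# with "+1 " placed before "(nnn) nnn-nnnn" numbers on TEL lines.
add_country_code() {
  awk '
    BEGIN { FS = ";"; OFS = ";" }

    # Only lines whose 1st field contains "TEL".
    $1 ~ /TEL/ {
      # Look for "(nnn) nnn-nnnn" in the last field.
      if ( match( $NF, /\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]/ ) ) {
        prefix = substr( $NF, 1, RSTART - 1 )
        rest   = substr( $NF, RSTART )
        # Leave numbers alone if they already carry the +1.
        if ( prefix !~ /\+1 ?$/ )
          $NF = prefix "+1 " rest
      }
    }

    # Every line, changed or not, is printed.
    { print }
  ' "$@"
}

# Try it on one made-up TEL line.
printf '%s\n' 'TEL;type=CELL;type=VOICE;type=pref:(123) 456-7890' | add_country_code
```

Numbers in any other form, such as international ones, are passed through untouched, so they would still need to be handled by hand.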
 
Ah, yes, I actually meant data integrity in the sense that something which could verify that the numbers before and after are the same -- that is, nobody's phone number was somehow changed in the process. Which of course is almost certainly prevented by a discerning script, but just curious.

Thinking of exporting to Fastmail, actually, which I find to be very well-supported and coded as far as what they do.

I would definitely be interested in that modified awk script -- for that and for my own edification. I enjoy these sorts of things and am a tech sort of guy in general, but it's not my profession, and so once things turn to code and the more nitty gritty...
 
Here's the modified awk script and instructions.

1. Paste the following into a plain text file:
Code:
#!/usr/bin/awk -f

## Input is a vcard file or stream.
##
## Output on stdout is only lines with 10-digit TEL numbers.
##
## Only lines whose 1st field contains "TEL" are output.
## Only the last field on a TEL line will be tested for 10 digits.


## Set delimiters for breaking lines into fields.
BEGIN  {
  FS=";"
  OFS=";"
}


## Only for lines whose 1st field contains "TEL".
$1 ~ /TEL/  {
  ## Only where last field has a 10-digit number.
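  ## Skip fields with a run of 11 or more digits, so only numbers
  ## with exactly 10 digits are output.
  match( $NF, /[0-9]{10}/ )
  if ( RSTART != 0 && $NF !~ /[0-9]{11}/ )  {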

    print $0
  }
}

2. Save this plain text file on your Desktop as "teller.txt".

3. Use the same "all.vcf" as before.

4. In a Terminal window, paste this exact command line:
Code:
awk -f ~/Desktop/teller.txt ~/Desktop/all.vcf >~/Desktop/tens.txt

5. If the command runs with no error messages in the Terminal window, the output should now be in "tens.txt" on your Desktop.

6. Open it in TextEdit.app and you'll see the TEL lines with the 10-digit numbers.


You can manually confirm every line in "tens.txt" with "new.vcf", but here's an automated way. You can do this without producing "tens.txt" at all. That's more for your own uses.

In a Terminal window, paste this exact command line:
Code:
diff ~/Desktop/all.vcf ~/Desktop/new.vcf >~/Desktop/diffs.txt

Then open "diffs.txt" in TextEdit. It will be a list of the differences between the files.

Lines starting with '<' show what's in the 1st file (all.vcf). Lines starting with '>' show what's in the 2nd file (new.vcf). Only "TEL" lines with 10-digits should be shown coming from "all.vcf", and the changed output should be on the line after it.

A "magic code" before each < line looks like this:
157c157

This means line 157 in the 1st file changed to line 157 in the 2nd file, and the changes are shown as the < and > lines following it.

The 'diff' cmd is capable of a lot more, so you might want to play with it. Here's its man page:
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/diff.1.html
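
Another way to confirm that nobody's number changed is to ignore the formatting entirely: reduce every TEL line in both files to its digits and compare the two lists. This is only a sketch under that assumption; `tel_digits` and the before/after sample files are invented, standing in for "all.vcf" and "new.vcf".

```shell
# Hypothetical helper: print each TEL line reduced to its digits only.
tel_digits() {
  grep 'TEL' "$1" | tr -cd '0-9\n'
}

# Made-up before/after files standing in for all.vcf and new.vcf.
dir=$(mktemp -d "${TMPDIR:-/tmp}/telcheck.XXXXXX")
printf '%s\n' 'FN:firstname lastname' 'TEL;type=CELL:1234567890' > "$dir/before.vcf"
printf '%s\n' 'FN:firstname lastname' 'TEL;type=CELL:(123) 456-7890' > "$dir/after.vcf"

tel_digits "$dir/before.vcf" > "$dir/before.digits"
tel_digits "$dir/after.vcf"  > "$dir/after.digits"

# cmp -s is silent and succeeds only when the two digit lists are identical.
if cmp -s "$dir/before.digits" "$dir/after.digits"; then
  echo "digits unchanged"
else
  echo "digits differ"
fi
rm -rf "$dir"
```

With the real files, the two `tel_digits` calls would take `~/Desktop/all.vcf` and `~/Desktop/new.vcf`. Any number corrected by hand in "new.vcf" will show up as a difference if its digits were changed, not just its punctuation.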
 
This is fantastic stuff. Will play more later, but I think that should about do it. I had tried to figure out how to do this years ago when I migrated from something else to iCloud for mail/contacts/calendars and had no luck there either. Thanks again.
 