macOS perl/sed/awk/grep on "strings" file, plz

zeppenwolf · Dec 12, 2015

I have a "strings" file, like this:

Code:

//
// DMGeneralErrorDomain
kDevelopmentTestError           = "Unless you are the programmer, you should not be seeing this.";
kNilOrEmptyDictionary           = "Dictionary was nil or empty.";
kNilOrMissingResource           = "Resource was nil or missing.";
kOSFileOperationError           = "A file system operation error occurred.";
kUnreachableCodeBlock           = "An unreachable code block was reached.";

I want to run this file through grep/sed/awk/perl/whatever into xargs, then into PlistBuddy, to create a plist file containing an array whose items are just the keys:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
    <string>kDevelopmentTestError</string>
    <string>kNilOrEmptyDictionary</string>
    <string>kNilOrMissingResource</string>
</array>
</plist>

The first problem is that grep chokes. Although this file is English, the other files used to localize these keys are Japanese and French, so they are UTF-16. A UTF-16 file starts with a two-byte byte-order mark (0xFF 0xFE or 0xFE 0xFF), and every ASCII character takes two bytes, one of them zero, so grep thinks the whole thing is binary.

I suppose I need perl, but perl has always been a bit too complicated for me. Can you please help me pull just the keys out of a file like this and feed them into xargs, then PlistBuddy? Thanks.

PS: I can't find that "resolved" option???

chown33 · Dec 13, 2015

First, use the iconv command in Terminal to convert the UTF-16 data to UTF-8. After that, grep, sed, and awk can process it as ordinary text.
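
Below is a minimal sketch of that conversion step. It assumes the .strings file begins with a byte-order mark, which tells iconv the byte order. If the file has no BOM, name the byte order explicitly with UTF-16LE or UTF-16BE. The helper name strings_to_utf8 and the paths are hypothetical.

```shell
#!/bin/sh
# Sketch: convert a UTF-16 .strings file to UTF-8 on standard output.
# Assumes the file starts with a byte-order mark (BOM); iconv reads the BOM
# to pick the byte order and does not copy it to the output.
# strings_to_utf8 is a hypothetical helper name.
strings_to_utf8() {
  iconv -f UTF-16 -t UTF-8 "$1"
}

# Usage (paths are hypothetical):
#   strings_to_utf8 ja.lproj/Localizable.strings > strings.txt
```

After conversion, the keys are plain single-byte ASCII again, so grep no longer treats the file as binary.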

This wouldn't be a difficult program to write in C using stdio. Read a line with fgets(), find the first "word" by looking for whitespace (strpbrk() and friends), then output the word using fprintf().
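
The same line-by-line idea can also be sketched in plain sh instead of C (a swapped-in language, not chown33's code). Here `read` splits each line at whitespace, much as strpbrk() would, and the first word is printed when the second word is "=". The helper name print_keys is hypothetical, and the input is assumed to be UTF-8 already.

```shell
#!/bin/sh
# Sketch: print the key (first whitespace-delimited word) of every line
# shaped like   key = "value";
# Lines whose second word is not "=" (comments, blank lines) are skipped.
# print_keys is a hypothetical helper name.
print_keys() {
  # read splits on whitespace: first word -> key, second -> eq, the rest -> rest.
  # -r keeps backslashes literal; the || test handles a last line with no newline.
  while read -r key eq rest || [ -n "$key" ]; do
    if [ "$eq" = "=" ]; then
      printf '%s\n' "$key"
    fi
  done < "$1"
}

# Usage:
#   print_keys strings.txt | xargs -n 1 echo
```

Each key is a single word, so the output can be piped straight into xargs.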

Here's an awk program.

File "awkstr":

Code:

## Input is a "strings" file.
## Output is an XML plist on stdout.

BEGIN {
  print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
  print "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" " \
        "\"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">"

  print "<plist version=\"1.0\">"
  print "<array>"
}


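## Skip comment lines such as "//" and "// DMGeneralErrorDomain",
## so a long comment is never mistaken for a key line.
/^[ \t]*\/\// { next }

## Any other line with more than two fields (key, "=", value...)
## is a key line: print its first field as an array element.
{
  if ( NF > 2 )  {
    print "    <string>" $1 "</string>"
  }
}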


END {
  print "</array>"
  print "</plist>"
}

Read awk's man page to understand the code.

Example input "strings.txt":

Code:

//
// DMGeneralErrorDomain
kDevelopmentTestError           = "Unless you are the programmer, you should not be seeing this.";
kNilOrEmptyDictionary           = "Dictionary was nil or empty.";
kNilOrMissingResource           = "Resource was nil or missing.";
kOSFileOperationError           = "A file system operation error occurred.";
kUnreachableCodeBlock           = "An unreachable code block was reached.";

Command line:

Code:

awk -f awkstr  strings.txt

Example output:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
    <string>kDevelopmentTestError</string>
    <string>kNilOrEmptyDictionary</string>
    <string>kNilOrMissingResource</string>
    <string>kOSFileOperationError</string>
    <string>kUnreachableCodeBlock</string>
</array>
</plist>
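
If you'd rather not keep an intermediate strings.txt, the iconv step and the awk step can be chained in one pipeline, because awk reads standard input when it is given no file operand. Below is a sketch. The function name strings_to_plist and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert and extract in one pipeline, with no temporary UTF-8 file.
# awk reads standard input because no input file operand is given.
# strings_to_plist is a hypothetical helper name.
strings_to_plist() {
  # $1: UTF-16 .strings file, $2: awk program file (e.g. awkstr)
  iconv -f UTF-16 -t UTF-8 "$1" | awk -f "$2"
}

# Usage (file names are hypothetical):
#   strings_to_plist Localizable.strings awkstr > Keys.plist
```

On macOS, `plutil -lint Keys.plist` can then confirm that the output parses as a property list.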

Apart from its pattern-matching and regex facilities, which can get pretty involved, awk is a fairly straightforward language to learn if you already know C. Its basics are well worth learning, even though many consider it puny compared to Perl.

Online man pages:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/awk.1.html
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/iconv.1.html

zeppenwolf · Dec 14, 2015

Chown, that works great. Thank you so much for your time and patience in helping dipwits like me. Again.

In hindsight (of course!), it is clear to me now that the whole UTF-16 "problem" was more or less a red herring. The only characters in these strings files that actually need UTF-16 are in the "string" part, the translated Japanese or French text, and that is the part I am ignoring completely here. The only part I care about is the key, which by definition is the exact same identifier used elsewhere in my C/Objective-C code, so it is plain ASCII. Converting the file to UTF-8 first loses nothing I need, and that should have been obvious to me from the start.

At any rate... thanks again.

chown33 · Dec 14, 2015

You're welcome. It was an interesting little diversion. This time. And I hope I've encouraged you to study and play with awk. I think it's within the reach even of "dipwits".
