Become a MacRumors Supporter for $50/year with no ads, ability to filter front page stories, and private forums.

mikep001

macrumors newbie
Original poster
Feb 2, 2010
4
0
I wrote a small recursive algorithm to generate all permutations/combinations of a string with any of 89 characters. On my linux box with a core 2 duo and 2 gigs of ram, the program runs exponentially faster than on my imac with an i7 processor and 4 gigs of ram.

I took a look at the processor usage on the mac and it's around 13%.

I read somewhere that the disk IO on OSX takes longer than on linux, so I removed all the writing to the disk and the program is simply generating strings, not storing them anywhere.

When I run the program with the clock command on the two separate systems, the output is like this:

Linux
1 10000
2 10000
3 10000
4 110000
5 9640000

OSX 10.6.2
1 3043
2 4377
3 125890
4 9480674

Where the number on the left is the number of characters allowed in the string and the number on the right is the output of the clock function. You'll notice that 5 is missing from the mac section. This is because after 10 minutes, the mac had not finished generating strings of length 5. The linux box with inferior raw computing power generated strings of length 5 in under 30 seconds.

here is the code, it is exactly the same on both machines

Code:
#include <stdlib.h>


#include <iostream>
//#include <iomanip>
#include <string>
#include <stdlib.h>
#include <algorithm>
#include <vector>
#include <fstream>
#include <ctime>

using namespace std;

//10+26+26+27 = 89 CHARACTERS

void Tokenize(const string str,
                      vector<string>& tokens,
                      const string& delimiters = "-")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next "non-delimiter"
        pos = str.find_first_of(delimiters, lastPos);
    }
}

void generatePasswords(int depth,char chars[89], string passwd, vector<string> start, vector<string> stop)
{




	for(int x = atoi(start[depth].c_str()); x<atoi(stop[depth].c_str()); x++)
	{
		string currpass = passwd + chars[x];
		if(depth!=0)
		{
			generatePasswords(depth-1, chars, currpass, start, stop);
		}
		else
		{
		//cout<<currpass<<endl;
		//ofstream file;
		//file.open("test.txt",ios::app);
		//file <<currpass<<endl;
		//file.close();
		}
	}
}


int main()
{
 char chars [89]={'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t',
 'u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','~','!','@','#',
 '$','%','^','&','*','(',')','_','+','=','?',';',':',',','<','>','{','}','[',']','\\','/','-'};

  vector<string> start;
  vector<string> stop;

    string str("00");
    string str2("89");
    for (int i = 1; i<6; i++)
    {
        Tokenize(str, start);
        Tokenize(str2, stop);
        int length=start.size();
        int depth = i;
        generatePasswords(depth-1, chars,"", start, stop);
        cout<<i<<" "<<clock()<<endl;
        str=str+"-00";
        str2=str2+"-89";
    
    }
}

any ideas why my i7 powered osx machine is being so severely whipped by my core 2 duo linux machine?

thanks,
Mike
 
any ideas why my i7 powered osx machine is being so severely whipped by my core 2 duo linux machine?

A few possibilities, mostly derived from info you haven't provided:

1. CPU arch. 10.6.2 defaults to x86_64. I don't know what your linux box defaults to.

2. Compiler optimization level. Or compiler options in general.

3. Runtime environment. You didn't say how you're running the Mac OS build. Is it an Xcode Debug build? Release build? Under debugger or not? Command-line? Something else?

4. Different lib implementations. The linux libs could simply be more efficient for some reason.

If you want to find out where the Mac version is spending its time, I suggest profiling it. The tools are there. You may not like what the profile tells you, but it's the best way to find out why.
 
chown33 made some suggestions. I ran this briefly through instruments. Most of the time is spent in the string processing, mostly destroying the string at the end of each generatePasswords instance. Note that for 5 characters, this is 5.5 billion passwords. That's a lot of string copies, creation, destruction of strings. You might be able to tweak the way you're doing things to help this, but you're not likely to speed up the string library.

-Lee
 
1. My linux kernel is 64 bit
2. I'm not running this on either machine with any extra compiler options. Should I be?
3. I'm running the build from the command-line
4. I ran Shark and as lee1210 said, it looks like its spending most of its time in the string libraries. Is the conclusion therefore that the OSX libraries are just slower than the linux ones?

Mike
 
This is single threaded. Is one core on your core 2 duo faster than one core on your i7? What about memory setup/speed? I don't know about cache setup on either of those chips, but it's possible that the one on your linux machine may fit this task better.

There could be major differences with the libraries. I guess I'd say to be sure you'd have to put Linux on the iMac and test apples to apples. On my core duo mbp, this took 40 minutes. Memory use was very low, CPU was extremely high (using a whole core all the time).

-Lee
 
This is currently running single threaded. The core 2 duo cores on the linux box are clocked at 2.20 and the i7 cores are clocked at 2.8. The OSX machine has DDR3 ram and the linux box is using DDR2.

The memory usage should be very low as its actually storing very few variables. I would expect it to use a whole core, but the load average on my i7 is about .4 . I'm about to boot up knoppix on my imac and run the code and see how it goes.

Mike

P.S.

I changed the lines
string currpass = passwd + chars[x];
if(depth!=0)
{
generatePasswords(depth-1, chars, currpass, start, stop);
}


to

if(depth!=0)
{
generatePasswords(depth-1, chars, passwd+chars[x], start, stop);
}

Now it seems to be running faster on the OSX machine.

Linux
1 10000
2 10000
3 10000
4 50000
5 3800000

OSX
1 2844
2 2900
3 3252
4 35796
5 2886320
 
Why don't you use Shark and answer the question yourself?

Are you building a debug version by any chance?
 
I don't know why your program performs differently on the different systems. A few possibilities are:
1. There's some kind of logging/debugged going on behind the scenes on OS X -- you can look into compiler and linker options. I asusme you're not running in the debugger.
2. string class might be implemented with the copy-on-write technique in the libraries on the linux box but not in the libraries on the OS X box. See http://en.wikipedia.org/wiki/Copy-on-write for an explanation.
3. There might be differences in the compilers in terms of how they optimize code.

If you want to optimize your program, there are a few general rules of thumb you could apply.

1. Don't do heap allocation in the inner loop.
2. Move processing that is the same for every iteration of the loop outside the loop.

In this case, because iof the recursion, the whole of generatePasswords() is your inner loop.

Notice that:
* The parameters passwd, start, and stop are complex type that are passed by value. You only read from them, so you can simply pass them by reference. That way you'll avoid copying them on every iteration.

If string uses COW the improvement won't be as dramatic but it's a simple change. Also an optimizing compilter might be able to notice your code doesn't modify these values and not copy their values anyway -- but it might not. It's better to be explicit.

* You access the values in start and stop as integers -- but since they are stored as strings, you end of converting them to integeres over and over. Instead, have start and stop store ints to begin with.

Finally, you've got the expression passwd+chars[x]. That's allocating a new string which is a copy of passwd with a character appended. To avoid allocating a new sting, you couild try to convert your code to reuse a buffer allocated before the top-level call to generatePassword().

----

If your main goal is to generate these permutations as fast as possible, it is probably easier to write efficient non-recursive code that efficient recursive code -- Don't know if you are interested in a rewrite, though.
 
Theres no debugging running on either machine.

I got rid of start and stop and replaced them with integers, but the issue isn't so much with the efficiency of the code (i'll worry about that later). My issue is with the performance difference.


I'm looking into COW and some compiler options now.

Thank you,
MIke
 
Register on MacRumors! This sidebar will go away, and you'll see fewer ads.