
Spanky Deluxe

macrumors 603
Original poster
Mar 17, 2005
5,291
1,832
London, UK
I have some C code right now that is very memory intensive and uses doubles all the way through. It's simulation code, and for some experiments single precision could be sufficient. Is there a way to simply change one flag or something to tell the whole code to use floats or doubles? Or do I have to do a search/replace and create two copies of my code?

If there is a way to do it, I'd guess it would have to use something like #define. Is there a way then to pass that from the command line, i.e. application -single or application -double?

Your help would be greatly appreciated!!
 
I guess
Code:
double d=3.142;
float f=(float)d;

would work, but that isn't really what you want to do :eek:.
 
Lol, no I'm afraid not. I basically have hundreds of variables and fftw_arrays that use the type double. I think I can do something like #define SIZE double or #define SIZE float and then replace all the mentions of double with SIZE, but I'm not sure.
 
I suppose you could do something along these lines:

Code:
#ifdef USE_FLOAT
typedef float my_float;   /* single precision when USE_FLOAT is defined */
#else
typedef double my_float;  /* double precision by default */
#endif

and then declare all your floating-point variables using my_float. Then you could either add a #define USE_FLOAT to activate floats or comment it out to use doubles. gcc also has a parameter that defines preprocessor macros on the command line: -D, as in -DUSE_FLOAT.
 
I have some c code right now that is very memory intensive and uses doubles all the way through. Its simulation code and for some experiments single precision could be sufficient.


Do you want to be able to make the switch at run time or compile time? Run-time switching is harder, so I'd suggest a compile-time switch: you build two versions (from the same code) and decide which to run.

The way to go is to define a macro that selects the type and which FFTW calls to make:

#ifdef USE_DOUBLE
#define REAL double
#else
#define REAL float
#endif

Later you define your variables like this:
REAL somebigarray[100000];


Then, when you compile and want to use doubles, add the switch "-DUSE_DOUBLE" to the compiler command line.

I doubt you will notice much speedup by going with 32-bit floats.
 
That sounds doable. It's not the speedup I'm particularly interested in. This code uses a hell of a lot of memory. At the resolution that we want it to run at, it needs about 100 GB of memory right now, although we hope to bring that down to around 75 GB at least with a bit of better memory optimisation. By switching from doubles to floats, users would be able to get double the resolution with the same memory usage. Crazy amounts of memory, I know!!
 
You also need to call things like printf and scanf with the appropriate format strings, as well as calling the proper version of functions when that is the case, so you would have to write wrappers.
 
Hi

If you are feeling brave you could do this:-

#define double float

but I think you would have to be very careful with library header includes, i.e. make sure you only redefine double for your own source files and not while library headers are being included.

I must admit I think it would cause more trouble than it's worth. Best stick with the other posters' solutions, e.g. ChrisA and kpua!

b e n
 

Finite element analysis or perhaps finite volume. The variables can really stack up. To model a car in a collision you would need to break it up into millions of little tetrahedrons to analyze. Each tetrahedron would have its own shape defined by 4 points, each with 3 coordinates. Each of these points will have its own velocity in 3 dimensions (to handle deformation). Then there are material properties to keep track of, and if you want to do it correctly you can't just have one global Young's Modulus for the steel because it can change based on how much deformation the piece has already gone through.

/misses doing this stuff
 
Young's Modulus

Then there are material properties to keep track of and if you want to do it correctly you can't just have one global Young's Modulus for the steel because it can change based on how much deformation the piece has already gone through.

I think I remember this from A-Level Physics, Stress / Strain isn't it?

F
 
Hi
#define double float

Yikes... follow lazydog's own suggestion and do NOT do this. You may think it's working but it could be failing in unexpected ways.

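Also, use kupa's suggestion rather than ChrisA's: the #define will not play well with the debugger, since the preprocessor replaces it before the compiler ever sees the name, whereas a typedef name is kept in the debug information.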
 
Not stress/strain but something equally complex... turbulence. Luckily the machine I'm running this stuff on has 32 processors and currently 128 GB of RAM, although I think they want to increase the RAM. What I find amusing though is that right now this code actually runs faster on my Mac Pro than on this HPC server! Probably because the server has 1.5 or 1.6 GHz Itanium 2 processors, and while the FPU of those beasts is pretty decent, the FPU in the Core line of processors is also pretty good and my system has 2.66 GHz processors.
Unfortunately my code's not scaling very well right now over multiple processors. On my system, going from one to two processors gives me a 50% speedup. On the server I only get 25%. On the server, going from two to four only gives me an extra ~10%. I haven't been able to test four processors on my machine yet. I'm using the threaded fftw3 library and OpenMP as much as possible though. :(
 
Not stress strain but something equally complex... turbulence.

CFD. I took this job thinking I would be doing that. No such luck. They wanted me for my AppleScript.

Unfortunately my code's not scaling very well right now over multiple processors. On my system, going from one to two processors gives me a 50% speedup. On the server I only get 25%. On the server, going from two to four only gives me an extra ~10%. I haven't been able to test four processors on my machine yet. I'm using the threaded fftw3 library and OpenMP as much as possible though. :(

I haven't done much with parallelization in a long time, and then it was mostly done by the compiler.

Did you go to WWDC? There was a small piece in one of the talks about a group that had a program to line up CAT scans and search for differences. Their code originally took almost a day to run. They rewrote it to ship most of the grunt work off to the video card and it started spitting out answers in 15 minutes.
 
I didn't go to WWDC, a bit out of my budget!! This stuff isn't really well suited to video card computation right now due to the memory it needs. Although video cards are supposedly not half bad at Fourier transform type work, this code deals with some arrays several gigabytes in size, so they couldn't be loaded into the video card memory in one go.
I've now got to go through the code and manually profile it due to a lack of profiling tools on the server. Hundreds of printf statements, here I come!!
 
I didn't go to WWDC, a bit out of my budget!! This stuff isn't really well suited to video card computation right now due to the memory it needs. Although video cards are supposedly not half bad at Fourier transform type work, this code deals with some arrays several gigabytes in size, so they couldn't be loaded into the video card memory in one go.

One of the reasons for speed they gave was the incredible bus width of video cards.

I've now got to go through the code and manually profile it due to a lack of profiling tools on the server. Hundreds of printf statements, here I come!!

And don't forget the time that profiling code can add to the runs...
 