
cruzrojas
macrumors member, Original poster
Mar 26, 2007
Hi all,
I need to learn some distributed programming in C. I will probably need to develop some multithreaded code for either a cluster or a big multiprocessor machine in the future, and I was wondering where the best place to start is: OpenMP or MPI? Also, do you have any book recommendations, perhaps something that covers both schemes?

Best Regards
Jesus Cruz
 
Before you get started, what is at a premium in your problem: CPU time, memory, storage, etc.? Are you confident that multiprocessing isn't an option? (I seem to get slammed for this, but if you just want to learn multithreading, maybe it isn't the best option for you.) Especially if you intend to work on a cluster, multithreading isn't the way to go, since you can't run threads across machines (in any manner I am familiar with). Also, while you're at it, why is C the way to go? Is there no room for anything else? That is to say, is C++ exception handling too expensive? Is a JVM too heavyweight, etc.?

If you give us some information on the problem, we might be able to help more in directing you towards a straight-forward, maintainable solution.

-Lee
 
Dear Lee
Thanks for your reply. What do you mean by multiprocessing? Basically, what I work on is physics calculations. These don't require a big amount of storage, and only rarely is memory a problem; the main constraint is CPU time.

I think I used the term multithreading incorrectly. These calculations are straightforward and not user interactive. Sometimes these simulations are easy to parallelize along one or more dimensions, which is the part I'm interested in right now; eventually a more complicated parallelization scheme will be required, but not for now. I need to use a lightweight programming language because these are long runs. And given that I have the choice, I like C syntax better than C++, but that is just a personal opinion.

I'm looking into OpenMP and MPI, but I'm not sure which one fits the bill, or which books on the topic are worth reading.

Best Regards
Jesus Cruz
 
Hi Jesus

I have been working in that field, facing similar decisions. My problem basically consisted of solving partial differential equations on a grid.

Basically, what I work on is physics calculations. These don't require a big amount of storage, and only rarely is memory a problem; the main constraint is CPU time.
Don't get bitten by that assumption. When you have that 4 GB RAM workstation next to your desk, all runs that make sense to start are CPU bound. But if you move to a massively parallel supercomputer with several thousand processors, where each computing node has access to only half a gig, and you scale your computational domain accordingly, you may be surprised that all of a sudden, memory can become an issue.

I think I used the term multithreading incorrectly. These calculations are straightforward and not user interactive. Sometimes these simulations are easy to parallelize along one or more dimensions, which is the part I'm interested in right now; eventually a more complicated parallelization scheme will be required, but not for now. I need to use a lightweight programming language because these are long runs. And given that I have the choice, I like C syntax better than C++, but that is just a personal opinion.
This decision depends on a lot of different factors, like what machines you have access to (especially the number of processors: a couple dozen versus a couple thousand). OpenMP's underlying assumption is that the code runs on a shared memory machine, which means that all processors can access one large chunk of memory equally fast. While this is generally true for all "normal" computers and fairly small supercomputers, this approach doesn't scale well up to several thousand processors. On a distributed memory machine (like the IBM BlueGene and others), each processor has its own small piece of memory, where access is super fast, and data exchange between processors goes through some interconnect, which is orders of magnitude slower. Here, your mileage varies also: in a "boxed" supercomputer, the connection is as fast as technology allows; in the often found "cheap cluster with 400 nodes", the connection is typically so slow that communication between processors is the major bottleneck.

I'm looking into OpenMP and MPI, but I'm not sure which one fits the bill, or which books on the topic are worth reading.

If you are on a shared memory machine and the code has only one (fairly simple) loop that can be parallelized easily, try OpenMP, since it is very easy to use and gives you speedups very quickly (little coding time). On a distributed memory machine, use MPI. Don't go for hybrid solutions; they are generally not worth the hassle.

As far as language is concerned: use what you feel comfortable with (as long as it is C :rolleyes:). Most people underestimate the time for development and testing and overestimate the time the runs take. For OpenMP, almost any language will do; it may even be Java. For MPI on a supercomputer, the installed libraries tend to support only Fortran, C and C++. The whole MPI standard is very low level, and while C++ bindings are included in the standard, some implementations do not...um, perform well (C file output was several thousand times faster than C++ streams on our platform). We had a C version of our code and a C/C++ hybrid version, and the hybrid was faster in the end and easier to maintain. But it took some performance tuning, because the original hybrid was a factor of three slower! And the different compilers sometimes produced obscure warnings for legitimate C++ code, which tended to be a nuisance. And the core routines had to be written C-style anyway ;). And my initial language of choice for any project is always C++ over C...

As far as books go, I don't have any recommendations.

Sorry for the long post; I hope I could help.

--clemensmg
 
Hi clemensmg, thank you for your long reply.

I think MPI is more popular in the field; I just never knew why. I have worked before with similar software on a BlueGene system, so I know the memory can get limiting at times, but I never had to develop anything so far, so I wasn't sure which one to learn first. If an OpenMP/MPI hybrid is such a hassle, I think it's better if I skip OpenMP altogether, or is there any good reason to learn them together?

In your application with the C/C++ implementation, where did the major slowdowns come from?

Best Regards
Jesus Cruz
 
I think MPI is more popular in the field; I just never knew why. I have worked before with similar software on a BlueGene system, so I know the memory can get limiting at times, but I never had to develop anything so far, so I wasn't sure which one to learn first.

MPI is popular because it is standardized and gives the programmer full flexibility and control over every bit going from one specific processor to another. Unfortunately, it pretty much remains at the bit level: you can merrily send fundamental datatypes (and arrays thereof) all around and about, but if you want to send user-defined types or aggregates (classes), it is not so easy anymore. It is of course possible, but very error prone, since you have to know the memory layout of your type. And this varies from platform to platform and between compilers (some fill in padding or reorder because of alignment), meaning that your solution is not portable. I may be wrong, but four years ago, that was what I was told by a standards committee member. I tried it once and then decided not to go there. Whatever data we wanted to send, we just sent around the corresponding fundamental datatypes (or arrays thereof).

If an OpenMP/MPI hybrid is such a hassle, I think it's better if I skip OpenMP altogether, or is there any good reason to learn them together?
With OpenMP, there isn't much to learn, so you might as well play around with it, so you have it at your disposal when you need it (and the knowledge to decide when you need it). It is very handy if you have a serial program or prototype and you want to check whether your algorithm scales well, or you just want to make a port for a few processors, or you want to go farming (you start 40 runs with 40 different parameters, all take about the same time to finish, each runs on 8 processors with shared memory). Just look out for race conditions.

The learning and implementation curve for MPI is much steeper until you know all the different communication options and added the necessary methods. It makes sense if you know that the majority of your runs will be done with a couple hundred procs and not only once or twice, but repeatedly. If you need that one run and it takes six weeks on 8 processors, just go for it with OpenMP, because it may take you longer to learn MPI, rewrite your code and find a supercomputer to run it on. We did only MPI, though, and in our case, it definitely paid off.

In your application with the C/C++ implementation, where did the major slowdowns come from?

Basically from the tradeoff between expressiveness and speed. C++ sometimes wants or needs to create temporary variables, calls copy constructors, or the calls to your overloaded operators take time, and if that happens in an inner loop, it can cost a significant amount of time. Sometimes the compiler won't optimize something because of aliasing. Once you have found your hotspots, you just take that one expressive line and replace it with a C-style function, and all the rest stays nice and tidy. Sometimes you can even help the compiler to optimize certain things (talking about aliasing again), which can result in an overall speedup. It depends on how well you know the language and what the compiler will do with it. You can write down your code in Fortran, and the compiler can safely assume that you won't trick it and optimize whatever it considers helpful.

If you do C/C++, the compiler can't assume anything, and you need to negotiate and promise the compiler that you really won't change the value of that certain variable nor its address between the end of one line and the beginning of the next ;) If you put the effort in, you may be rewarded. Talk about high-maintenance relationships...

Also the C++-stream-IO was absurdly slow for reasons that were beyond me (or anyone I asked). But this must have been the library that was provided. It did come from a big company (whose name I won't tell, even though Steve ditched them :rolleyes:) and was optimized for the hardware, so we were stuck with it.

Cheers,

clemensmg
 
Hey Cruz, glad to see you are getting into parallel computing.

I just finished up a grad-level course in parallel computing working in C, and it was interesting... We mentioned OpenMP in it, but it's a bit hard to work with, and most parallelized code is in MPI. So out of all of our homework, all but two assignments were in MPI. So I'd learn MPI before OpenMP. Also, you probably want to do some background research in general parallel computing.

There's a great book on Amazon that I had to use, and it provides quite a bit of information and a few algorithms.

http://www.amazon.com/Sourcebook-Parallel-Computing-Kaufmann-Architecture/dp/1558608710/ref=sr_1_1?ie=UTF8&s=books&qid=1241761506&sr=8-1

Also, if you want to check out our course website, by all means: http://www.mgnet.org/~douglas/Classes/na-sc/2009s-index.html

Under the notes section, you'll find examples and tutorials, along with an MPI matrix multiplication code, plus CUDA code.

Good luck! I will tell you this: I struggled with the parallelizing parts of the course because of my formal training in programming.

Lastly, I've never heard a tenured professor say, "Use goto statements where possible."
 