
astrostu

macrumors 6502
Original poster
Feb 15, 2007
391
32
I'm running some high-end simulations of Saturn's rings. The code is written and highly optimized in C++, though it only runs on a single processor. I have an 8-core Mac and I want to run 6 simulations at once, leaving 2 processors for other stuff.

What I want these simulations to do is use all the processor power they possibly can: 100% of a CPU each (they're not multi-threaded due to the nature of the simulation). However, watching them in top shows they're averaging around 90-98%, and only rarely topping 99% or 100%.

They have a very small RAM footprint (52 MB VSIZE) so I don't think it's a memory issue. They're only outputting a few lines of text about once every 20 seconds, and they're outputting a 5.3 MB binary file every ~2-3 minutes. So I know that once every 20 seconds they'll slow a tiny bit, and once every 2-3 minutes they'll slow a lot since disk I/O is the slowest thing you can do.

But is there a way I can make them use more processor power while they're actually doing the simulation computations? When I was running them last week, they seemed to be finishing in about 36 hrs. Now they're taking 40, but as far as I can tell, it's the same load. That may seem like a small increase, but I have about 300 of these things to do, and they're only going to get more complicated.
 

Mechcozmo

macrumors 603
Jul 17, 2004
5,215
2
EDIT: Yeah, yeah, I misread the OP's post. I'll go sit in the corner. But first, read the thread. I didn't mess up entirely. I apologize profoundly.

100% CPU means one core is running at maximum.
200% CPU means two cores are maxed out.
etc...
Example: on my MacBook Pro, I can hit ~150% pretty easily by having an iMovie export (100% CPU) as well as having iTunes, etc. running (the other 50%). OS X automatically divvies up the threads so each core is working, and it isn't all being dumped onto one before the other starts to execute code.

If your code isn't threaded, you can't run it on more than one CPU at once. If you want to see speed increases, I'd suggest reworking your code to spawn multiple threads, one per core.
 

astrostu

macrumors 6502
Original poster
Feb 15, 2007
391
32
100% CPU means one core is running at maximum.
200% CPU means two cores are maxed out.
etc...
Example: on my MacBook Pro, I can hit ~150% pretty easily by having an iMovie export (100% CPU) as well as having iTunes, etc. running (the other 50%). OS X automatically divvies up the threads so each core is working, and it isn't all being dumped onto one before the other starts to execute code.

If your code isn't threaded, you can't run it on more than one CPU at once. If you want to see speed increases, I'd suggest reworking your code to spawn multiple threads, one per core.

I don't think you quite understand my question. I realize that 100% is per core, so theoretically, if I were using everything, 800% would be in use.

These simulations are single-threaded. The code can't be adapted to multiple threads at the moment because of the nature of the simulation (there's a computer science professor at Trinity College working on it).

So I realize that each simulation can only run at 100% max. My problem is that they're currently averaging in the low 90s%, and I want to know if there's a way to get them to run slightly faster, up to 99% or so.
 

Mechcozmo

macrumors 603
Jul 17, 2004
5,215
2
Just re-read your post... ahh....

OK, so you've got 6 of these running, OS X puts each on a separate core, and you're wondering why you've got 6 processes at 90% instead of 6 at 98%?

I honestly don't know... you could try renicing the processes so they take higher priority than everything else ("man renice" in the Terminal). I'd try that and see what it does; the other 10% of each core is probably just load-balanced threads from other things.

EDIT: This was bound to happen...
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
Your fate is basically in the scheduler's hands. You can set the priorities higher (a lower nice value means a higher priority) so the scheduler will give them more time, though it sounds like they're getting a lot as is. I don't know if OS X allows processes to be given processor affinity, but it probably wouldn't help much.

-Lee
 

Sun Baked

macrumors G5
May 19, 2002
14,941
162
If your code isn't threaded, you can't run it on more than one CPU at once. If you want to see speed increases, I'd suggest reworking your code to spawn multiple threads, one per core.

Seems he wants to make one of those resource-hog apps that use all the CPU power they can get: working around all the stuff the OS puts up to prevent it, and using about 75% of an octo's total resources.
 

tacoman667

macrumors regular
Mar 27, 2008
143
0
I believe you would need to multithread to even come close. That's why the previous poster's running iMovie and iTunes together hit 150% utilization. OS X will automatically scale down applications requesting more than the OS will allow outside of its own "selfishness". The only way to use 100% in one application would be to run an infinite loop. At least, that was my experience with Windows applications when maintaining crappy legacy code from someone who had no clue what they were doing.
 

chunkyks

macrumors newbie
May 21, 2008
1
0
The correct answer to the original question is actually "No, you cannot force an application to eat more cpu cycles than it already is, if it's already being given as much cpu as it wants". In your case, I suspect it has all the CPU it wants since you have eight cores that spend most of their time doing not much except generating heat.

A process typically will eat as much as it's able to unless it's voluntarily sleeping [most applications you ever use spend most of their time voluntarily sleeping]. If it isn't eating 100% actual processor cycles, then it's probably doing something that means it can't eat those cycles.

Since this is presumably a well-written application, it's probably grabbing stuff from main memory. If you're reading from the hard disk, you'll definitely see less than 100% utilisation, because the process requests something from disk and then sits on its ass while the machine grabs the data and shuttles it to where it's needed [although, as you mention, hard drive access is probably a smaller factor here].

They have a very small RAM footprint (52 MB VSIZE) so I don't think it's a memory issue
Well, your processors only have caches that are 4 [or 8 or 16 or something] MB in size. If the simulation is thrashing data that doesn't fit in the cache, it's still likely to spend cycles waiting while it fetches other stuff.
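To illustrate (a toy C++ sketch, nothing to do with your actual simulation): both functions below do the same arithmetic on the same 128 MB array, but the second walks it in an order that defeats the cache, so it spends far more of its time waiting on memory than computing.

Code:
#include <cstddef>

// 4096 x 4096 doubles = 128 MB, far bigger than any CPU cache.
const std::size_t N = 4096;
static double grid[N][N];

// Row-major walk: consecutive addresses, every cache line fully used.
double sum_rows() {
    double s = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            s += grid[i][j];
    return s;
}

// Column-major walk: each access jumps 32 KB ahead, so nearly every
// access misses the cache. Same arithmetic, several times slower.
double sum_cols() {
    double s = 0.0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            s += grid[i][j];
    return s;
}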


To other people: multiple processes are just as valid a way to increase utilisation as threading, and usually better, because waiting for locks is really sucky. Appropriate work division is key. I don't know if OSX explicitly supports processor affinity, but as a general rule schedulers are smart enough to take the benefits of that into account.


The short version of all this is that your system's scheduler is smarter than you, and 100% utilisation is freaking hard if not outright impossible in the real world. If you want more performance, you're sitting on two cores used for stuff that could be done on my crappy old G3 [email, browsing, ui stuff]... suck it up and run eight processes and just suffer almost imperceptible ui lag.

Gary (-;
 

hazmatzak

macrumors regular
Apr 29, 2008
135
0
What is the total CPU utilization when you are running six simulations at once, with "nothing else" running? Is it just (90% * 6 == ) 540%? Or is it really close to 600% because the other two processors are about 30% busy? Or any other combination that puts you at or slightly over 600%? If so, then that's all you're going to get. It doesn't matter that the scheduler isn't dividing the work in the simplest way possible, because in fact, the situation is not that simple.

You might compare a simple two-line program that increments the same word of memory in an infinite loop -- the simplest, busiest kind of program possible that requires some cache interaction. Run six of those at once and see if it is scheduled any differently.
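Something like this minimal C++ sketch would do (the volatile is there so the compiler doesn't optimize the loop away):

Code:
// busy.cpp -- the simplest, busiest kind of program:
// increment one word of memory in an infinite loop.
// Build: g++ -O2 busy.cpp -o busy
int main() {
    volatile unsigned long word = 0;  // volatile keeps -O2 from deleting the loop
    for (;;)
        ++word;  // hammer the same word of memory forever
}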
 

astrostu

macrumors 6502
Original poster
Feb 15, 2007
391
32
I tried renicing, and it didn't really do much. I thought it had worked, since they all started to run at 98% for a few seconds, but then half of them dropped down to 84% for two seconds, and so on...

Gary - Thanks for the info. It's an incredibly optimized application (like, down to replacing divisions with multiplications by the reciprocal, because multiplication takes less processor power than division) because it really is for cutting-edge research simulations, and speed is the limiting factor on what I can run.
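(For the curious, the trick looks something like this; an illustrative sketch, not the actual simulation code.)

Code:
#include <cstddef>

// Division costs many more cycles than multiplication on most FPUs,
// so pay for one division up front and multiply by the reciprocal.
void scale_slow(double* v, std::size_t n, double dt) {
    for (std::size_t i = 0; i < n; ++i)
        v[i] /= dt;                  // one division per element
}

void scale_fast(double* v, std::size_t n, double dt) {
    const double inv_dt = 1.0 / dt;  // the only division
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= inv_dt;              // same result, within floating-point rounding
}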

As to multithreading, as I said, it's not really possible at the moment. The guy's been working on it for about 2 years and is close, but not there yet. The problem with multi-threading the code (as he's explained it to me) is that gravitational effects on a given particle are calculated not just from all other particles in the simulation cell, but also from mirror cells on all sides. So while individual collisions between particles could be threaded, the bulk of the code - searching the tree structure for gravitational effects and finding the collisions - has yet to be multi-threaded. I'm probably not explaining it completely correctly, but that's my understanding.

Also, at the moment, at least, it's actually better for me to run several simulations rather than one large one across multiple processors. It's just a matter of using each processor to its fullest capability.

As for using the remaining two processors, I guess that's where the scheduler is "smarter" than I. I tried doing 7 simulations at once, leaving 1 processor free. When running 7, ALL of the simulations dropped down to using around 80-85% of the processing power. A 10% average drop in speed isn't really worth running 1 more simulation.
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
<snip>

As for using the remaining two processors, I guess that's where the scheduler is "smarter" than I. I tried doing 7 simulations at once, leaving 1 processor free. When running 7, ALL of the simulations dropped down to using around 80-85% of the processing power. A 10% average drop in speed isn't really worth running 1 more simulation.

Actually, it seems like it is worth it. Even if you were using 92% of 6 CPUs for the simulations, and adding one means using 80% of 7 CPUs, you're still getting more CPU time spent doing real work in the latter case. I'd say try it with 8 and see what you get. When the machine has the time, it should be able to get them all running.
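Run the numbers: 6 × 92% is 5.52 cores' worth of simulation work, while 7 × 80% is 5.60 cores' worth, so the seventh process is a small net win, with one more simulation in flight to boot.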

There's always going to be some idle time when fetching from memory, but having your real "work" running on every core at once seems like the best-case scenario. As long as you have enough "other" resources, like memory and bus bandwidth, that the processes aren't competing, you're unlikely to see dramatically diminishing returns. If you find you're only using 400% CPU time when running 8, compared to the ~550% you're getting with 6, obviously scale it back.

-Lee
 

skinnybeans

macrumors newbie
Dec 6, 2007
17
0
I would say the more optimized the application is from a computation standpoint (like you say, replacing division with multiplication), the FEWER CPU cycles you would expect it to use, because it's achieving the same result in fewer cycles.

That leaves... memory optimization. As a previous poster mentioned, depending on how you're hitting that 52 MB of RAM, you might be loading and unloading a lot of data from the cache. It should be possible to run a profiler and see how many cache misses you're getting while the app runs. I know that when I was doing game dev this was a big issue, and there was usually someone allocated the task of fixing up our memory access patterns.
 

astrostu

macrumors 6502
Original poster
Feb 15, 2007
391
32
Okay, I've learned a new skill. It's called "top -ca", which, as I understand it, gives a running average of how much CPU (%) each process has used since the command was executed.

Using it, I've determined that my "visual" averaging was a little off. When running 6 of these simulations on an 8-core machine, each one uses about 99.1% of the CPU. Since they are not multi-threaded, I don't think I really can get any more out of them.

When I run 7, they average around 94.5-95.1%. I had been using the other two cores for the past few days to run the analysis on the simulations that had already finished, but I've switched to watching YouTube on my laptop instead. ;)

I have one more question (which, if no one responds, I may take to a new thread in the OS section): what could cause Finder to take up nearly 19% of a CPU? I have no idea why the Finder is eating up so much processing power... nothing's being copied, and there's no massive emptying of the trash. :confused:
 

antibact1

macrumors 6502
Jun 1, 2006
334
0
If a process isn't hitting 100% CPU usage, there must be a bottleneck somewhere. Try profiling your code, check for cache misses, page outs, etc. Also, maximum CPU usage is a terrible indicator for overall process performance. Optimize your code if you want better performance. With respect to Finder, do you have any Finder windows open at all?
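One cheap way to see where the missing percentage goes (a sketch using the POSIX getrusage call; you'd wrap it around a chunk of your own simulation loop): compare wall-clock time against the CPU time actually charged to your process. If the ratio is well below 1.0, the difference is time spent blocked rather than computing.

Code:
#include <sys/resource.h>
#include <sys/time.h>
#include <cstdio>

// CPU seconds (user + system) charged to this process so far.
static double cpu_seconds() {
    rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

static double wall_seconds() {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main() {
    double w0 = wall_seconds(), c0 = cpu_seconds();
    // ... run one chunk of simulation work here ...
    double wall = wall_seconds() - w0;
    double cpu = cpu_seconds() - c0;
    // Utilisation well under 100% here means time spent blocked, not computing.
    std::printf("wall %.2f s, cpu %.2f s, utilisation %.1f%%\n",
                wall, cpu, wall > 0 ? 100.0 * cpu / wall : 0.0);
    return 0;
}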
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
What could cause Finder to take up nearly 19% of a CPU? I have no idea why the Finder is eating up so much processing power... nothing's being copied, and there's no massive emptying of the trash. :confused:

That does seem pretty odd.

My machine has been up for 3 days, and in that time Finder has gotten a total of 1 minute 17 seconds of CPU time. The only thing that springs to mind is if you have a lot of folders with folder actions on them... or maybe (and I'm not sure on this) Spotlight indexing. If you are generating a lot of files or constantly changing files with your simulations, that may be triggering indexing, and that's why Finder is so busy. It might not hurt to turn indexing off and see if that changes anything while you're running your sims.

-Lee
 

exabytes18

macrumors 6502
Jun 14, 2006
287
0
Suburb of Chicago
I'm kind of surprised to see that adding the 7th really doesn't degrade the performance of all the processes as much as you originally stated. It sounded like the number of memory accesses was becoming the bottleneck (which I'd expect) and the memory simply couldn't supply all the cores with data to stay 100% busy.

I suppose we've reached the day when the concern is that each of 6 cores isn't hitting 100% utilization, but merely approaching it. :p
 

astrostu

macrumors 6502
Original poster
Feb 15, 2007
391
32
That does seem pretty odd.

My machine has been up for 3 days, and in that time Finder has gotten a total of 1 minute 17 seconds of CPU time. The only thing that springs to mind is if you have a lot of folders with folder actions on them... or maybe (and I'm not sure on this) Spotlight indexing. If you are generating a lot of files or constantly changing files with your simulations, that may be triggering indexing, and that's why Finder is so busy. It might not hurt to turn indexing off and see if that changes anything while you're running your sims.

Yeah, I got it. The issue was that I had a Finder window open on the folder containing the simulations. I have Finder set to list the size of every folder, and I guess it takes 25% of a 2.8 GHz processor to keep doing that. I closed the window, and that closed the issue.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
If a process isn't hitting 100% CPU usage, there must be a bottleneck somewhere. Try profiling your code, check for cache misses, page outs, etc. Also, maximum CPU usage is a terrible indicator for overall process performance. Optimize your code if you want better performance. With respect to Finder, do you have any Finder windows open at all?

This is pretty much the first, and key, answer to take heed of in this thread.

Just because more time slices are available doesn't mean your app will take them when it runs. When you start doing I/O, your CPU usage drops while your thread waits for the results of the I/O. You call into the kernel, and time spent there is counted against the kernel, not your process. There are lots of reasons why you can be running full tilt and still not be hitting 100% on a CPU.
 

Animalk

macrumors 6502
May 27, 2007
471
19
Montreal Canada
Several months ago, I wrote a program that searched through an n-ary tree with millions of nodes and hundreds of depth levels. Some scenarios took close to 30 hours to solve. I was running this on my MBP, which is obviously a dual-core machine. Now, each core can handle two threads (thank you, HyperThreading), so I was able to run 4 instances of my program at any given moment. I experienced almost no performance loss between running 1 or 2 instances of my program. On the other hand, I experienced between 5 and 15 percent performance loss while running more than 2 instances.

I was having similar questions about CPU usage. The only thing I could think of as a possible reason for not having close to 100% CPU usage on each core is that the scheduler needs to run as well, and its job can get quite heavy in your circumstance. Also, I suspect that given the high-throughput nature of my program (and most likely yours as well), having several instances running at the same time would be very likely to create a bottleneck on the bus between cores, as well as the bus between the CPU and main memory. I would be tempted to argue that a multi-CPU system such as a Mac Pro would be even more likely to starve its processors in such conditions, due to its heavy dependence on much slower system buses.

Also, if your search algorithm employs backtracking, that would be another possible cause of a drop in CPU usage. For example, if your search goes down a wrong path and has to backtrack so far that it requires several trips to higher-level caches and main memory, you will lose a significant number of cycles.

I am not sure exactly how the scheduler in OS X works when scaling to n cores. Would it rather have 5 processes on 5 cores, or on fewer? Why?

What I am trying to say is that it is highly likely you are at, or very close to, the performance limits of your hardware for such a task. Consider this post an educated guesstimate at best.

Suggestion: why not simply leave your core program as is and thread out just the I/O? I am certain you will get noticeable performance gains through this alone. Any time you need to write some text to the console or write out a file, hand it to a thread dedicated to that. This way your program can keep chugging along at full steam even during I/O.
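A rough sketch of what I mean, assuming C++11's std::thread and a mutex-guarded queue (the names are just for illustration; you'd adapt it to however your program writes its binary dumps):

Code:
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// One long-lived writer thread drains a queue of finished output
// blocks, so the simulation loop never blocks on the disk itself.
class AsyncWriter {
public:
    explicit AsyncWriter(const std::string& path)
        : out_(path, std::ios::binary), done_(false),
          worker_(&AsyncWriter::run, this) {}

    ~AsyncWriter() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();  // flush whatever is still queued, then stop
    }

    // Called from the simulation thread: a lock and a move, no disk wait.
    void push(std::string block) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(block)); }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string block = std::move(q_.front());
                q_.pop();
                lock.unlock();  // do the slow write without holding the lock
                out_.write(block.data(), block.size());
                lock.lock();
            }
            if (done_) return;  // queue drained and shutdown requested
        }
    }

    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_;
    std::thread worker_;
};

The simulation loop would call writer.push(serialized_block) wherever it currently does the blocking write; the writer thread eats all the waiting.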
 

elppa

macrumors 68040
Nov 26, 2003
3,233
151
Ask whoever wrote Adobe Media Player.

They seem to have it down to a fine art.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Several months ago, I wrote a program that searched through an n-ary tree with millions of nodes and hundreds of depth levels. Some scenarios took close to 30 hours to solve. I was running this on my MBP, which is obviously a dual-core machine. Now, each core can handle two threads (thank you, HyperThreading), so I was able to run 4 instances of my program at any given moment. I experienced almost no performance loss between running 1 or 2 instances of my program. On the other hand, I experienced between 5 and 15 percent performance loss while running more than 2 instances.

The Core Duo and Core 2 Duo don't support HyperThreading, so running more than 2 instances meant instances were sharing cores. Your loss in performance is really from that (and you would see some of it even with HyperThreading-enabled CPUs, since two hardware threads still share one core's execution resources).

Suggestion: Why not simply leave your core program as is and only thread out all the I/O from your program. I am certain you will get noticeable performance gains through just this. Anytime you need to write some text to console or write out to a file, use a thread just for it. This way your program can keep chugging along at full steam even during I/O.

This isn't always a good thing. If you spawn a thread for each write, you have a high level of overhead. If you have a single I/O thread, it could help, as long as you aren't blocking your main thread waiting on the I/O thread anyway.

This is tricky to pull off, depending on the nature of your program design.
 