
Engali (macrumors member, original poster)
So, I'm gonna be getting a computer for data analysis. I'm a PhD student in a quant-heavy program and I need to get a desktop (already have a June 2012 Retina Pro) to run analyses while I work. I will be using LatentGOLD, a program that does Latent Class Analysis. I've run some analyses on an oldish Dell and it has literally taken days to estimate all the parameters. This may be because of the file size (145,000 cases), or the number of variables (seven scores per case), or the number of parameters that need to be estimated and compared against each other (20 to 50).

In any case, from a bit of research I've done, the biggest bottleneck for data analysis processing seems to be RAM, then processor speed. If I get a mid-level ($799) Mac mini with the 2.6GHz processor, install 16GB of RAM, and maybe even add a Samsung 840 Pro SSD to create a Fusion Drive, will that serve my needs? I need something that will last me 5 years. Or should I go for a base 21.5-inch iMac?

LatentGOLD is only made for Windows, so I will have to use Boot Camp. I don't know if that is relevant. Thanks for your help.
 
First of all, your advisor should be paying for you to have a desktop at work; then you can just remote in to do the work.

A Mac mini or iMac might sound like a good computer to buy, but they are not made for running computations that use 100% CPU and a lot of GPU. The fans are small, and when they run at full speed the computer becomes very loud and hot. Since your analyses take many hours, the lifetime of the computer will be short, not to mention the annoying noise it will make. Upgrading to an SSD and 16GB of memory is not cheap either. It looks like you'll save money and get a much better CPU and GPU if you buy a desktop PC. It should easily last 5 years. Windows doesn't require high specs nowadays; mid-range computers from 2007 still run Windows 8 fast.
 
So, I'm gonna be getting a computer for data analysis. …

Data analysis is my area of expertise. (I used to build large databases for data mining.)

Many will disagree with me, but running proper data analysis for long periods will bring the temperature of your Mac mini up to close to 100°C and the CPU will throttle. High temperatures will also shorten the life of the componentry.

It is like using a Ferrari to do the work of a Mack truck: horses for courses, and heavy lifting is not the Mac mini's design brief. If it were, there would be no need for the Mac Pro.

edit: your biggest bottleneck will be getting the data in and out of the CPU, so the speed of the data bus and the number of data paths matter. The clock cycles (latency) of your RAM become important too. Depending on the size of your data, you may not be able to keep it all in RAM, and then transfer speed from permanent storage becomes an issue. Check whether your software uses the graphics processor for number crunching (some programs do...)
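
As a back-of-the-envelope check on whether your data fits in RAM (a sketch, assuming 8-byte floats and the 145,000 cases x 7 scores you mentioned):

# rough raw footprint of the dataset (assumption: 8-byte floats per score)
cases = 145_000
scores_per_case = 7
bytes_per_score = 8
print(cases * scores_per_case * bytes_per_score / 1024**2, "MiB")  # ~7.7 MiB

If the raw scores really are that small, whatever fills RAM would be the estimation's working set, not the data itself.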
 
First of all, your advisor should be paying for you to have a desktop at work, then just remote in to do the work. …

He is :)

What specs would you recommend for a PC? I was looking at some Dells, and the specs compared to a Mac mini seem comparable even after I upgrade the mini to 16GB of RAM and add an SSD. Maybe I'm looking in the wrong place?

----------

Data analysis is my area of expertise. …

If you could point me in the direction of a PC or the specs I should look for, I would greatly appreciate it. My advisor wants me to free up one of the lab computers I've been using to run LatentGOLD. The current analysis is taking like 4 days and counting...
 
It is true that the Mac mini runs relatively hot, but it is sold in a server version that regularly runs 24/7 and has no physical differences from the normal models. There are free programs like smcFanControl that you can use to bump the fan speed beyond the normal profile if you want to run it a little cooler. The CPU is designed to go to 105°C, and the mini will keep it below that.

That said, make sure the computations you are doing are only CPU-based and not GPU-based, as the mini does not have a discrete GPU. If that is the case, I think you will be impressed with the mini. The 2.6GHz i7 model will give you over 80% of the performance of a top-of-the-line i7-3770 desktop processor, and in that regard it is more powerful than every iMac except the top-of-the-line 27" model, which is equipped with that processor. The reason is that the mini's processor supports Hyper-Threading, whereas some of the iMac processors do not. You would have to get a multi-CPU workstation to go even faster.
 
What specs would you recommend for a PC? …

If you could point me in the direction of a PC or the specs I should look for, I would greatly appreciate it. …

Check what kind of computers are used for (intraday) trading (the futures market) - those people do lots of this kind of number crunching. www.elitetrader.com has a forum on hardware recommendations.

edit: I do not know your application and what you're trying to do, but you may want to re-examine your approach. I'm concerned that you're running into a trap called "overfitting" and are using far too many variables. Most processes (decision making) can be explained with fewer than 5 variables, and the payoff drops off rapidly after that. Been there, done that.
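
One quick way to gauge how fast the information drops off is a principal component analysis of the standardized scores; a sketch with scikit-learn, where X stands in for the real 145,000 x 7 score matrix:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# placeholder for the real data: a (cases x 7) array of scores
X = np.random.default_rng(0).normal(size=(145_000, 7))

Xs = StandardScaler().fit_transform(X)  # put all seven scores on one scale
pca = PCA().fit(Xs)
# cumulative share of variance carried by the first k components;
# a sharp knee here suggests fewer effective variables than you have columns
print(np.cumsum(pca.explained_variance_ratio_))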
 
I work in data analysis - computational models, mostly of epidemics, but it's all roughly the same stuff. Without some notion of what's causing your stuff to be slow, it's hard to tell whether or not a Mac Mini will be right for you.

What's being maxed out on the current Dell, and what are its specs? Is it fine for memory, but the CPU is running at 100%? Or are you out of memory, with your processor idling while it waits for things to clear up?
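
If you don't know offhand, Task Manager will show it, or a few lines with the psutil package can log it during a run (a sketch; the sample count and interval are arbitrary):

import psutil

# watch overall CPU and memory pressure while the analysis runs
for _ in range(30):
    cpu = psutil.cpu_percent(interval=1.0)  # % across all cores, 1 s sample
    mem = psutil.virtual_memory()           # system-wide memory stats
    print(f"CPU {cpu:5.1f}%  RAM {mem.percent:4.1f}% used, "
          f"{mem.available / 1024**3:.1f} GiB available")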
 
Check what kind of computers are used for (intraday) trading… I'm concerned that you're running into a trap called "overfitting" and are using far too many variables. …

I'm running latent class analysis. Long story short, we're trying to find unique configurations of score ratings for 7 measures of aspects of personality that may indicate "classes", "types", or "profiles" of certain types of people. The way the software works is that it estimates models with X number of "clusters": it tries to fit the data to X clusters, which would represent the potential classes I referred to that may emerge from the data.

The issue is that we have to compare the log-likelihood between models to see which has the best fit, i.e. the least unexplained variance. So we need to explore the data by running a number of models and picking the one with the best fit. We did that already, and what's keeping the thing chugging now is doing 10 models times 5 levels of a grouping variable.
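
For the curious, the same compare-by-fit loop can be sketched in Python with scikit-learn's GaussianMixture standing in for the latent class model (placeholder data; BIC shown alongside the log-likelihood since it penalizes the extra parameters of bigger models):

import numpy as np
from sklearn.mixture import GaussianMixture

# placeholder for the real (cases x 7) score matrix
X = np.random.default_rng(0).normal(size=(10_000, 7))

# fit models with 1..10 clusters and compare their fit
for k in range(1, 11):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    total_ll = gm.score(X) * len(X)  # score() returns mean log-likelihood per sample
    print(f"{k:2d} clusters: logL = {total_ll:12.1f}  BIC = {gm.bic(X):12.1f}")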
 
I'm running latent class analysis. … what's keeping the thing chugging now is doing 10 models times 5 levels of a grouping variable.

These days I'm involved in another class of time-critical data analysis, but this sounds like work I did on air travelers' habits and telephone users' habits. If you have too many variables, the time involved to analyze gets just too long, and there is no solution other than ranking the most important variables and then drilling down in groups. Good luck, sounds like an interesting assignment. Have you thought about splitting into age groups as a first step and afterwards by sex?
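
The split itself is the cheap part, e.g. with pandas (file and column names are made up):

import pandas as pd

# hypothetical file with the seven scores plus demographic columns
df = pd.read_csv("scores.csv")

# drill down: fit the candidate models separately per group
for age_group, sub in df.groupby("age_group"):
    print(age_group, len(sub), "cases")
    # ...run the model-selection loop on `sub` here...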
 
I'm running latent class analysis. …
Hoo wudda thunk cumputer's cud doo stuf like dat? I thot dey wus ony fur games an streemin kool stuf.
 
… Have you thought about splitting into age groups as a first step and afterwards by sex?

We aren't really looking at age because we're focusing on cultural differences and similarities. Since we're looking at personality variables, I think we're working off the conclusions of extant research that seems to suggest these variables will be stable over time. We also only have rating scores of working adults, by the nature of the data, so it's unlikely we'll see any significant developmental changes over time. Having said that, I will definitely bring it up with my advisor and see what he has to say about it. Thanks for your help.
 
Does your university not have unix servers that could run this? What department are you in? It could be worth talking with other departments if your department doesn't normally do this kind of thing. Any high-spec desktop machine would do okay, but it does depend on exactly what is limiting performance. How is your data stored, and how big is the base data? Is it in a database, some kind of proprietary file, or another flat file? If you could get the data into a structured database entirely in RAM, you might see big improvements (or you might not). The Mac mini is super-quick (for its size), but as others have said, it will just get hot and noisy if you run it flat out for more than about ten minutes.
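
For what it's worth, even Python's built-in sqlite3 can hold a table entirely in RAM (a sketch; table and column names invented):

import sqlite3

con = sqlite3.connect(":memory:")  # the whole database lives in RAM
con.execute("CREATE TABLE scores (case_id INTEGER, s1 REAL, s2 REAL, s3 REAL,"
            " s4 REAL, s5 REAL, s6 REAL, s7 REAL)")
con.executemany("INSERT INTO scores VALUES (?,?,?,?,?,?,?,?)",
                [(i, 0, 0, 0, 0, 0, 0, 0) for i in range(1000)])  # dummy rows
print(con.execute("SELECT COUNT(*) FROM scores").fetchone()[0])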
 
Does your university not have unix servers that could run this? … If you could get the data into a structured database entirely in RAM, you might see big improvements (or you might not). …

It is not the database access; it is moving the data around in memory. So the amount of memory, memory latency, the data paths from memory to CPU, the size of the CPU's internal cache, and the speed of the memory bus are important. Not to mention how many variables you're using: processing goes up exponentially with every condition added.
 
The i7 mini will go to 100°C and max the fans out long before you hit 100% CPU; more like 35% CPU will get you there. But maybe this won't bother you. If it doesn't, then the mini will do the job.

For hard-core number crunching you could build an i7 Windows machine from great parts in a few hours for $1000, or a bunch less if you care to. With a less-than-$100 aftermarket CPU cooler you can keep the delta-T on the CPU at 100% load to 25°C or less.

If it must be a Mac, the Mac Pro is the best choice, even an old one (it costs less than a mini!). I have benchmarked the 2013 3.4GHz i5 iMac, and at 70% load the CPU temps were in the 70s with no increase in fan speed. If you really max it out, though, I fear the temps will be right up there with the mini's, fans too.

This is all about thermal design: the Mac Pro will have a delta-T on the CPU of less than 25°C running max load, with very little if any increase in fan speed. The CPU cooler in the old Mac Pro is bigger than the entire mini!
 
Honestly? Data processing on Windows.

Just get a PC.

You don't need a pretty box; you are not going to be using the OS. You are going to be doing very heavy, heat-inducing work that the mini may choke on.

http://configure.dell.com/dellstore...e&model_id=xps-8700&c=ca&l=en&s=dhs&cs=cadhs1

Similar price (you can get the model with 16GB of RAM if you want, but the 12GB is in proper dual-channel mode) and a much stronger CPU, and it will cool properly. You also get a half-decent GPU, compared to the HD 4000, if you ever need it.
 