Haha, you told me #214, so I only read that one. If I had just scrolled up a little bit more, I would have seen your name and your previous post.

Yes, you did mention the drives.

About the Barefeats test: they did use AHCI drives for the benchmarks, but they did talk about NVMe.
I know the card is a bit pricey, but I believe it's worth the money.
The only thing I don't like is the green PCB, but I will do my best to hide the Squid carrier card behind my Nvidia 1080.

Yes, I know about the chip and its function, but it's good that we talk about all this for the people who don't know about it.
I wanted to thank you, because you helped me complete the puzzle; thanks to you I found the last missing piece.

Most likely I will receive the card on Monday, because it was shipped today.
If I have time I will upload some benchmarks.

Yes, there are many cards out there.

ASUS HYPER QUAD M.2
That one is very cheap, but I think it's locked to ASUS boards, plus there's the whole thing with buying a separate key. It uses bifurcation, so it needs a compatible motherboard.

HP Z TURBO QUAD PRO
That one is even more expensive than the Squid; I would say it's probably the best looking of them all. I don't know if they sell the card alone, but for that price I think it comes with some drives installed.

DELL ULTRA SPEED DRIVE QUAD
This one is BIOS-locked to Dell boards only; brown PCB.

HIGH POINT SSD7100 SERIES
They sell this one with drives and without drives; also very nice looking, but unfortunately no Mac support.

The Dell and the HP are only capable of 9 GB/s; supposedly that is the highest they can go. The ASUS and the HighPoint can go very high, up to 12,000 MB/s and beyond, depending on the drives and the CPU.

There is also another quad card out there, but I don't remember the name right now and I lost the page. The main problem is that many of those cards have not been tested on macOS or a Hackintosh; many are simply Windows-only, and some of them are not even bootable. Most of them simply will not work on macOS for one reason or another. That's why I settled on the Squid card, even with that ugly green PCB that I don't like; at least the card is good, worth the money, and bootable on macOS.
I'll be back Monday with some test results.

Hackintosh.jpg
 
The Dell and the HP are only capable of 9 GB/s; supposedly that is the highest they can go. The ASUS and the HighPoint can go very high, up to 12,000 MB/s and beyond, depending on the drives and the CPU
They probably all have similar performance if they all support PCIe 3.0 x16. The difference is just in the benchmark test used and the SSD model (manufacturer, size, etc). For example, the main page for the HP says 9 GB/s. But then they have a FAQ showing 12 GB/s. The main page might have taken the read (12 GB/s) and write (6.6 GB/s) and reported the average (9 GB/s).
It's difficult to tell if those products use bifurcation or a switch.
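A quick sanity check of that averaging guess, using the read and write figures quoted from the HP FAQ above (just an illustration of the arithmetic, not a confirmed explanation):

```python
# If HP averaged the FAQ's read and write figures, the result lands near 9 GB/s.
read_gb_s, write_gb_s = 12.0, 6.6    # figures from the HP FAQ mentioned above
print((read_gb_s + write_gb_s) / 2)  # 9.3 GB/s, close to the 9 GB/s headline number
```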
 
They probably all have similar performance if they all support PCIe 3.0 x16. The difference is just in the benchmark test used and the SSD model (manufacturer, size, etc). For example, the main page for the HP says 9 GB/s. But then they have a FAQ showing 12 GB/s. The main page might have taken the read (12 GB/s) and write (6.6 GB/s) and reported the average (9 GB/s).
It's difficult to tell if those products use bifurcation or a switch.
Yes, I thought that too, but I didn't want to risk it. There were too many variables to consider; that's why I took my time before buying anything. But if they advertise their card at a maximum of 9 GB/s, then I don't expect 12 GB/s, unless they used older, slower drives to test the card's maximum speed. They really need to update their item description and run a benchmark with newer, faster drives.
I thought those cards were probably first generation, and maybe that's why they were a bit slower than the new ones, but I wasn't really interested in a card for Windows. I really wanted a card primarily for Mac, and the Squid card was the best option, even if it was green and a bit expensive. I was looking for someone to show benchmark results like you did, showing the real numbers. Not only that, but you also have the same model and match the same number of drives that I have: 4 Samsung 960s. At first I wasn't even sure about the Squid card either; I had my doubts about the speed until I read your post, and that helped me decide.
I don't like buying something while guessing, expecting, or hoping; I really need to know and make sure first.

I think when a card requires you to change the PCIe slot to 4x4x4x4, then it requires bifurcation, like the ASUS card; that's why that particular adapter requires a board that supports that option or feature.
The ASUS card is the cheapest M.2 quad of them all, most likely because it doesn't have the switch chip; that's why it's cheaper than the others.
 
I really wanted a card primarily for Mac, and the Squid card was the best option, even if it was green and a bit expensive. I was looking for someone to show benchmark results like you did, showing the real numbers. Not only that, but you also have the same model and match the same number of drives that I have: 4 Samsung 960s. At first I wasn't even sure about the Squid card either; I had my doubts about the speed until I read your post, and that helped me decide.
I don't like buying something while guessing, expecting, or hoping; I really need to know and make sure first.
You won't have any problems with the Squid Gen 3 in a Hackintosh. The sound from the fan may be noticeable; maybe they should have used a larger, slower, side-blowing fan.

I think when a card requires you to change the PCIe slot to 4x4x4x4, then it requires bifurcation, like the ASUS card; that's why that particular adapter requires a board that supports that option or feature.
The ASUS card is the cheapest M.2 quad of them all, most likely because it doesn't have the switch chip; that's why it's cheaper than the others.
Agreed.
 
I received the card today. It's smaller than it looks in the pictures; I kind of knew that, but I didn't expect it to be that small. I'm not complaining, it's actually better that it's small. I hid the card really well behind my Nvidia 1080. The benchmark result was 10,800 MB/s with 4 Samsung EVOs, almost 11,000, and around 5,000 to 6,000 MB/s in write speed. I added my 5th drive, the one I use for W10, and now I'm hitting almost 13,000 with 5 drives and 7,000 in write. For my 5th drive I simply used one of my PCIe 4x cards. I'm going to buy another drive to make it 6, and I will probably get another Squid card, probably the 8x Gen 3 version.

I copy-pasted 60 GB in just 20 seconds, 180 GB a minute; not bad, right, with just 5 drives? I know RAID has to be an even number of drives; that's why I need drive #6. There was no change in speed going from a 16 to a 256 stripe size, and no change from HFS+ to APFS; the speed was pretty much the same the whole time. I have some thermal pads that I bought for my 4 PCIe 4x cards, and I used those instead of the ceramic thermal pads that the Squid card comes with. Sorry for my broken English. Maybe I will upload a picture with the benchmark results later, and also a video as proof; sorry, I was very busy today. Thanks joevt for your valuable information, I couldn't have done it without you.

I have the RAID in HFS+ because I'm using the card as the bootable device. Last time I checked, RAID wasn't working with APFS; I will give it another try this week when I get some time.

Here is the picture:
TEST.png
 
I received the card today. It's smaller than it looks in the pictures; I kind of knew that, but I didn't expect it to be that small. I'm not complaining, it's actually better that it's small. I hid the card really well behind my Nvidia 1080. The benchmark result was 10,800 MB/s with 4 Samsung EVOs, almost 11,000, and around 5,000 to 6,000 MB/s in write speed. I added my 5th drive, the one I use for W10, and now I'm hitting almost 13,000 with 5 drives and 7,000 in write
Which RAID utility/driver did you use? SoftRAID gave me better results than Apple's Disk Utility.

For my 5th drive I simply used one of my PCIe 4x cards. I'm going to buy another drive to make it 6
What's your CPU and motherboard and graphics card? What's in each PCIe slot? You have to know how the slots are connected to decide where to put your drives and how many you can get full performance from.

and I will probably get another Squid card, probably the 8x Gen 3 version.
I don't think the 8x is much cheaper than the 16x. Let us know what price quote you get for that. The 16x will work in an 8x electrical slot if the slot is open or if the slot is 16x physical.

I know RAID has to be an even number of drives; that's why I need drive #6
I don't think so. RAID 0 works best when the drives are the same speed. The number of drives shouldn't matter.
Look at my results for 1, 2, 3, 4 drives: 3198, 6165, 8733, 11141 MB/s. The increases are not the same, but they are pretty close:
+3198, +2967, +2568, +2408
There is little penalty from using an odd number of drives. Looking at how much each increment shrinks relative to the previous one:
-231, -399, -160
Well, the increase dropped the most going from 2 to 3 drives, so maybe there is some penalty? But a sample of one is not enough to make a conclusion. My Hackintosh motherboard wouldn't let me get max performance from more than 5 fast NVMe drives anyway (CPU x16, DMI/PCH x4).
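Here is a minimal sketch of that arithmetic in Python, using only the four measured totals above:

```python
# RAID 0 scaling from the measured totals above (MB/s for 1, 2, 3, 4 drives).
totals = [3198, 6165, 8733, 11141]

# How much each added drive contributes to the total.
increments = [b - a for a, b in zip([0] + totals, totals)]
print(increments)   # [3198, 2967, 2568, 2408]

# How much smaller each increment is than the previous one.
drops = [b - a for a, b in zip(increments, increments[1:])]
print(drops)        # [-231, -399, -160]
```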
 
I know about SoftRAID, but it's not a free app; I used Disk Utility. While SoftRAID might give better performance, I'm not going to buy the app unless it's a very nice jump in speed. When I get a chance I will create a RAID volume with SoftRAID just to run a quick test for comparison; the free trial will do just fine for that task.

Yes, I know how my PCIe lanes work and how to connect them. My graphics card and the M.2 card are both running at 16x. I still have 2 slots available on the board, but I can only use 1 of them if I want to keep the graphics card and the M.2 card at 16x each. If I use two 4x PCIe cards just for the moment, they will do just fine, but my graphics card will be downgraded to 8x because I'd be using all four slots.
16x 16x 8x = 40 lanes; this is the configuration that I want (3-way).
8x 4x 16x 4x: this one is not 40 lanes either, and if I connect anything to the second slot it downgrades the graphics card (or whatever device is in the first PCIe slot).

Right now I'm using 16x 16x 4x. I need to replace the 4x card with an 8x card and leave the 2nd PCIe slot empty,
so I can keep my graphics card and M.2 card both at 16x.

Here is a picture:

PCI-_E.png


Yes, I was thinking about getting another 16x card and running my graphics card at 8x so I can have both M.2 cards at 16x with 4 drives each; maybe I can reach 20,000. But I think I will be happy with 14,000 or 16,000, and I don't want to sacrifice my graphics card's speed; that's why I settled for just an 8x.

Yes, it's not that much of a difference, but it's about 200 dollars less.

I also know about the penalty that you're talking about. Let me see how I can explain this: it's something that is normal in RAID and also in SLI. You never get exactly double; it's always a little less, and as you keep adding, yes, the total will increase, but each drive's individual contribution will decrease.

Let me give you an example, though I'm sure you already know this: if I run a single EVO I get 3,000 MB/s read and 1,500 write, but if I put 2 in RAID 0, that takes about 100 or 200 MB/s from each drive, for a total of 5,800 or 5,600 read and 2,800 or 2,600 write. If you add more disks, say 3, then it will be about 300 or 400 less from each drive, and so on; the more drives you add, the slower each individual drive gets. Of course, when you add them together they still add up to an incredible speed, but each drive is losing, or giving up, some of its speed.

What I notice is that my 4x PCIe card is really holding back the 16x M.2 card. I installed it just to run a test, but I want to get rid of it and get another Amfeltec card.

I had the quote for the 8x Gen 3, but I erased it by mistake. It was something like 300 to 400, while the 16x is almost 600, so that's about a 200 dollar saving, plus 2 fewer drives, which is another 300.

Yes, I know that for RAID the drives have to be the same speed, but I have always seen RAID like dual-channel memory: it's not just about the same speed; it works best with the same model.

It's not the same to have 2 memory modules of the same speed as to have 2 perfectly matched modules: same brand, same speed, same size, same model = dual-channel mode.

While some people might use 2 different modules of the same speed and size, it's not the same. RAID works the same way; it works better with the same model.

RAID 0 works best with even numbers of drives. I know you might not agree with this, and that is completely fine, but it's true:

2, 4, 6, 8, etc.

Yes, you can use 3, 5, 7, etc., but you're always one drive short or one drive over. There are also some RAID types that require an odd number of drives. As soon as I order my 6th drive I will run another test, but I will only use two PCIe 4x cards temporarily, because I don't want my graphics card running at 8x. I will do it just to run a test until I get the other Amfeltec card. But if I get the money I will go for the 16x, even if I have to run it at 8x, just to be future-proof.

Well, the increase dropped the most going from 2 to 3 drives

That just confirms what I told you about odd numbers.
 
I had the quote for the 8x Gen 3, but I erased it by mistake. It was something like 300 to 400, while the 16x is almost 600, so that's about a 200 dollar saving, plus 2 fewer drives, which is another 300.
Are you talking about the 2 M.2 card (which can be x4 or x8) or the 4 M.2 card (which can be x8 or x16)?
It's hard to imagine that changing the connector would cost $200. Maybe the switch is also changed?
The 2 M.2 card probably uses a 16 lane switch for x8. It could use a 12 lane switch for x4.
The 4 M.2 card uses a 32 lane switch for x16. I suppose it could use a 24 lane switch for x8.
 
You have basically two choices (if you want to use all your lanes):

x16 graphics
x24 m.2 (x8*3, or x16 + x8)

x8 graphics
x32 m.2 (x16*2, or x16 + x8*2)

Going to x8 graphics will reduce frames per second by only 2%. You can find benchmarks that tell you that.
Going from x24 m.2 to x32 m.2 will increase hard drive speed by up to 33%. Even if it were more like 20% it might still be worth it.
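A rough sketch of the bandwidth arithmetic behind those two options (PCIe 3.0 only, ignoring per-drive limits and any protocol overhead beyond the 128b/130b line encoding):

```python
# Compare total storage-bandwidth ceilings for the two lane splits above.
PER_LANE_GBS = 8 * 128 / 130 / 8      # ~0.985 GB/s per PCIe 3.0 lane

for gpu_lanes, m2_lanes in [(16, 24), (8, 32)]:
    print(f"GPU x{gpu_lanes:>2} + M.2 x{m2_lanes}: "
          f"storage ceiling ~{m2_lanes * PER_LANE_GBS:.1f} GB/s")

print(f"x24 -> x32 is up to {32 / 24 - 1:.0%} more storage bandwidth")
```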
 
You have basically two choices (if you want to use all your lanes):

x16 graphics
x24 m.2 (x8*3, or x16 + x8)

x8 graphics
x32 m.2 (x16*2, or x16 + x8*2)

Going to x8 graphics will reduce frames per second by only 2%. You can find benchmarks that tell you that.
Going from x24 m.2 to x32 m.2 will increase hard drive speed by up to 33%. Even if it were more like 20% it might still be worth it.
Thanks for the reply. I can't use all 4 slots, because if I do, then only one of the PCIe slots will be 16x (PCIe slot #3) and all the others will be 8x.

The only way I can get dual 16x is if I do the 3-way configuration and don't connect anything to PCIe slot #2; even if I connect a 1x PCIe card in the second slot, that will reduce slot 1 to 8x.

Your numbers are correct, but there is a limitation on my board if I use the 2nd PCIe slot: if I connect anything there, then I will only have one 16x slot. The only way I can get dual 16x on the board is if I don't use PCIe slot #2.

I had two identical Nvidia 970s with a 5820K (a 28-lane CPU), and I had them running at 8x each in SLI because I had another 8x card: a RocketRAID 4520 with 8 identical SSDs in RAID 0.

First I sold the SSDs and the RAID card and bought the 4 Samsung 960s, then I sold the 5820K and bought the 5930K to go from 28 lanes to 40 lanes. Then I was able to run both graphics cards at 16x each, and yes, in some games the difference between 8x SLI and 16x SLI was minimal, just a few frames, but in some games the difference was around 20 fps.

When I found out that a single 1080 outperforms two 970s, I sold both cards and bought the 1080. Besides, Microsoft was changing some things with SLI, and I only use Windows to play some games anyway. I could only use both cards on the Mac for rendering, with Compressor and Final Cut Pro, but there is no SLI in macOS, so I decided to get a more powerful card and say goodbye to SLI. The 2nd card was pretty much useless in macOS; it was only useful for encoding or rendering, as long as the app had dual-GPU support like FCP. I was very upset when I found out about the 3.5 GB instead of 4 GB. I did get the settlement money, 30 dollars for each card; not much, but at least I got something back.

When I wrote "the M.2 card" or "the 16x M.2" I meant the Amfeltec Squid card; I just didn't want to write the wrong name, since I had a little trouble remembering it correctly.
 
When I wrote "the M.2 card" or "the 16x M.2" I meant the Amfeltec Squid card; I just didn't want to write the wrong name, since I had a little trouble remembering it correctly.
I am still confused. Amfeltec has more than one PCIe 3.0 squid card:
They have a quad (four) m.2 card that can have an x16 connector (the one we use) or an x8 connector.
They have a dual (two) m.2 card that can have an x8 connector or an x4 connector.
They have a single m.2 card with x4 connector (we're not discussing that here).
So my question is, what x8 card are you referring to for that price quote that you don't have and that is cheaper than the quad x16?

Your numbers are correct, but there is a limitation on my board if I use the 2nd PCIe slot: if I connect anything there, then I will only have one 16x slot. The only way I can get dual 16x on the board is if I don't use PCIe slot #2.
My numbers take that into consideration. I will break them down here:

1) Option 1: x16 graphics
x24 m.2 (two or three cards):
3-way: x16 + x8
4-way: x8 + x8 + x8​

2) Option 2: x8 graphics
x32 m.2 (two or three cards):
3-way: x16 + x16
4-way: x16 + x8 + x8​

The amfeltec card can be x16 quad or x8 dual. You can use an x16 quad as an x8 dual.
The 4-way requires you to have 3 amfeltec cards. So you probably want to use one of the cheaper 3-way methods.

in some games the difference between 8x SLI and 16x SLI was minimal, just a few frames, but in some games the difference was around 20 fps
So this means you would use the 3-way method with x16 graphics to keep those 20fps? In that case, I would consider a second x16 quad card for the x8 slot, and just put two drives in that to fill your x24 m.2. This way you can try the x8 graphics with x32 m.2 option later.

I was very upset when I found out about the 3.5 GB instead of 4 GB
What does this refer to?
 
Hi joevt,
maybe I'm not explaining well, or my English is not that good and you are not understanding me. Let me see if I can explain a little better.

Your first question: answer = Squid Gen 3 8x. But in the line that you quoted from me to ask this question, I was referring to the card that I just bought, the Squid Gen 3 16x.

The second line: to keep things simple here, I think I'm going for another 16x Squid Gen 3 even if I have to run it at 8x, but I think I will have both Amfeltec cards running at 16x each and the Nvidia card at 8x; it's still a powerful card even at 8x.

I know my wife will kill me, but after tasting a little over 10,000 I think I'm going to double that. Now I'm aiming for 20,000, but I will definitely stop there; that's a very good speed, not to mention that getting there costs a little over 2,000 dollars:
8 M.2 drives and 2 Amfeltec cards.

Third line: OK, I think there is some confusion in that line. I was talking about my benchmark results when I had 2 identical Nvidia 970 cards, the difference between running 2 cards in SLI at 8x vs running both cards at a full 16x. There has always been an assumption that the difference between 8x and 16x is minimal; OK, let me see how I can explain this.
It's obvious that 16x is double the bandwidth of 8x, and a single card is not the same as SLI. Maybe some games are better optimized than others, use more cores, etc., so maybe the results weren't really about the bus speed (8x vs 16x) and the difference was due to some games being better optimized than others. That's why not all games show the same pattern, but in some games the difference was more than just 2%.
Now I have a single card, a 1080. If you go to YouTube you will see that this single card beats both 970s in SLI, maybe not by much, but at least I have more power in macOS, because the Mac could only use one card, and 2 cards (dual GPU) only for rendering in FCP or Compressor. That's why I decided to sell the 970s and get the 1080; also less power consumption and less heat.

I will go for this: 2 Amfeltec Gen 3 16x cards, both at 16x, and my Nvidia card at 8x. But before I do that I have to install Windows and run a few benchmark tests to compare the difference in speed between 8x and 16x on a single card. Remember, I don't have SLI now, so SLI will not interfere with the results; it will be the same game with the same single card, and the only difference will be the PCIe bus speed, 16x vs 8x. I just hope it's not more than 10 fps. 3 to 5 frames is alright, I can live with that; if it turns out that way, then I will run both Amfeltec cards at 16x and the graphics card at 8x.

The last line: that's what is known as the Nvidia fiasco. The 970 was advertised as a 4 GB card, but it really wasn't 4 GB; it was actually 3.5 instead of 4, with 3.5 GB fast and 0.5 GB slow. There was a lawsuit against Nvidia, and Nvidia decided to give everybody who bought one of those cards compensation to settle the dispute. Since I bought 2 cards, I received a 60 dollar check. Now I have an 8 GB graphics card.

Now I'm going to break the 5-drive RAID 0 so I can use one of those 960s to install Windows and run the benchmarks on the graphics card.

For now I will only use the Amfeltec card with the 4 Samsung 960s for macOS until the other card arrives. By the time that 2nd card arrives I will have 6 Samsung 960s; the last 2 will have to wait a little, but at least I will have the 2nd card, which is more expensive and a little harder to get. The other 2 drives I can order any time after Christmas, probably at the end of January. Then I will upload the results with 2 Amfeltec cards at 16x and 8 Samsung 960s.
Thanks for everything.
 
maybe I'm not explaining well, or my English is not that good and you are not understanding me. Let me see if I can explain a little better.
Your first question: answer = Squid Gen 3 8x. But in the line that you quoted from me to ask this question, I was referring to the card that I just bought, the Squid Gen 3 16x.
Maybe I'm not explaining well:
"Squid gen 3 8x" is an insufficient description, because the following two Squid items have an 8x option:
http://amfeltec.com/products/pci-express-gen-3-carrier-board-for-4-m-2-ssd-modules/
http://amfeltec.com/products/pci-express-gen-3-carrier-board-for-2-m-2-ssd-modules/

Here is a list of all the Squid items:
http://amfeltec.com/squid-pci-express-carrier-boards-for-m-2-ssd-modules/?view=list

So I was wondering if you got the quote for an 8x card that takes two m.2 drives, or if you got the quote for the 8x card that takes four m.2 drives. I'm guessing your answer is the former, because the latter only has a connector change (8x) which shouldn't be much more expensive than the 16x connector.

Third line: OK, I think there is some confusion in that line. I was talking about my benchmark results when I had 2 identical Nvidia 970 cards, the difference between running 2 cards in SLI at 8x vs running both cards at a full 16x. There has always been an assumption that the difference between 8x and 16x is minimal; OK, let me see how I can explain this.
It's obvious that 16x is double the bandwidth of 8x, and a single card is not the same as SLI. Maybe some games are better optimized than others, use more cores, etc., so maybe the results weren't really about the bus speed (8x vs 16x) and the difference was due to some games being better optimized than others. That's why not all games show the same pattern, but in some games the difference was more than just 2%.
Sorry, I didn't consider the SLI option. I was thinking only of single card graphics. It's possible that SLI may benefit more from 16x because the software needs to upload all the textures to both cards? Or some other reason... Anyway, I was asking if you wanted to use the better performance of x16 (which exists for single card or SLI, but you're just using a single card because of macOS, because it takes fewer slots, because it is faster than two slower SLI cards, because it's cheaper, etc.) or if the performance of x8 would be good enough for you. You've answered that now.

Are all of your Samsung 960 EVO? Is there much point getting the 960 Pro?
960 EVO has good performance, but not as good as 960 Pro. However, the price per performance is better with the EVO. So it depends on how much money you're willing to part with. Search for benchmarks to determine that price per performance ratio.
 
which shouldn't be much more expensive than the 16x connector.

It costs the same, which is why practically no one orders the x8 (or x4) uplink connector.
It's possible that SLI may benefit more from 16x because the software needs to upload all the textures to both cards?

SLI does not benefit from an x16 link, and most CPUs do not allow it anyway; even Skylake only has 16 GPU lanes. In fact, to get 16/16 SLI you needed an nForce chipset until some years ago, same as for the rare x4 SLI enablement.

Rerouting SLI via PCH lanes is, however, very ugly (e.g. on Z270 with 24 lanes off the PCH) unless both cards hang off the same hub. Notably interesting "problem" boards for this are the SR-2 and certain X58 platforms with dual 5520 chipsets.

TL;DR: for SLI, x8 is plenty (and the minimum Nvidia requires), while CF will also work fine on x4 - though CF relies a LOT more on the bus, so x4 is a bad idea there; SLI with the HB bridge barely uses the PCIe bus. I run SLI with two 1070s at x8 2.0 and downgraded to 1.1 (thus effectively x4 2.0 bandwidth; x8 1.1 *does* allow SLI, x4 2.0 does not), and any drop I see in games seems as random as always.

For computing use (like mining, not reading/writing GPU memory from host) even a single unshared x1 lane is often overkill.
 
OK, this is starting to get very interesting. I'm going to ask for another quote, then I will post the price here, but I think the price quote that I had was for the Squid Gen 3 8x with two M.2, and that quote was from after I received and installed the first card that I ordered. joevt, the first link that you provided is the one that I bought: Gen 3, 16x, up to 4 M.2.

I think there won't be much difference between 16x and 8x on a single card, but the new 1080 can go a little higher than previous cards, so maybe there might be a little more difference or performance impact from that generation onward.

I couldn't install Windows to run the graphics card benchmark because 10.13.2 came out yesterday, so I was upgrading macOS, but later today I will install Windows and run the benchmark on the graphics card.

In the past I had 2 Samsung 950 Pros, but the first driver from MacVidCards was a little buggy, so I returned the drives. Later a better solution came out, but it needed to be reapplied every time Apple released a macOS update. Finally, High Sierra solved that problem by adding native 3rd-party M.2 NVMe support.

Yes, I did want the 960 Pro; they are a bit faster in both read and write, but they are also a bit more expensive, so in dollars per performance I think the EVO is the best bang for the buck. Of course the Pros are faster, but for that little bit of extra speed you have to pay almost double the money; that's why I settled for the EVOs. Also, if it were just one or two drives I would have gone with the Pros, but since I wanted 4 drives for RAID 0, I had to go with the EVOs, because I also needed money for the 16x M.2 card; what I saved by buying the EVOs instead of the Pros went toward the card. Keep in mind all these parts are expensive, and once you start adding them up it can reach thousands of dollars; you might consider some of them cheap individually, but once you put them all together, cheap goes flying out the window.

But for me, since I had had the 950 Pros and the 960 EVO is even faster than a 950 Pro, it felt like an upgrade anyway.
OK, I set up the RAID 0 yesterday with just the 16x Gen 3 Squid card and the 4 EVOs, and I removed the 5th drive so I can install Windows later. With that out of the way, here are the results, and also what I found out about the penalty, as joevt calls it.

Since it's too much trouble for me to go one by one and see how much it drops every time I add a drive, I simply made the calculation with all four drives: what it should be vs what it is.

We all know that when we add drives together in RAID 0 it will never be exactly double the speed, but at least I know how much I'm losing per drive and how much speed each drive is giving up in order to run and maintain the RAID array.

Write: 1,500 x 4 = 6,000 MB/s expected; 1,400 x 4 = 5,600 MB/s measured (100 MB/s less from each drive)

Read: 3,000 x 4 = 12,000 MB/s expected; 2,700 x 4 = 10,800 MB/s measured (300 MB/s less from each drive)
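A minimal sketch of that per-drive penalty calculation, using only the single-drive and measured RAID 0 figures above:

```python
# Per-drive RAID 0 penalty from the figures above (all values in MB/s).
def per_drive_penalty(single_drive, raid_total, drives=4):
    expected = single_drive * drives
    return (expected - raid_total) / drives   # speed each drive gives up

print(per_drive_penalty(3000, 10800))  # read:  300.0 MB/s less per drive
print(per_drive_penalty(1500, 5600))   # write: 100.0 MB/s less per drive
```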

OK, here is the tricky part. Since I'm going for another 16x Gen 3 Squid with 4 M.2, and most likely I will run both cards at 16x, let's just say that both cards will run at around the same speed: one card with 4 EVOs in RAID 0 running at 16x, and another card also with 4 EVOs running at 16x.

Since each card has already paid its penalty, they should run at about the same speed individually, a 1 TB RAID 0 per card. But what would happen if, instead of creating two 1 TB RAID 0 arrays, I create one 2 TB RAID 0 using both cards and all 8 960 EVOs? Will that create a new penalty when I add all of them together? This is the million dollar question; maybe it will, maybe it won't, and that is what I want to find out. To close this up, let's just say that both cards will run at 10,800 like my first one normally does. Adding both of them together, that would be 21,600 if no penalty is added, because each card already paid its penalty individually; but that doesn't mean that a new or slightly higher penalty won't be added when I put both cards together. Anyway, if no extra penalty is added, then it will be around 21,600, but if an extra penalty does apply, it will be less, probably around 20,000. Either way I'm happy, as long as it doesn't go below 20,000.

I hope you understand what I'm trying to say; sometimes I don't complicate things, they are just a bit complicated, or complicated to explain, but once you read and analyze them you realize they're not complicated at all.

Side note: it does bother me a little knowing that I could have gone a bit higher by buying Pros instead of EVOs, but 20,000 is more than enough, so I'm happy after all, even with the EVOs.

Thanks everybody.
 
Maybe I'm not explaining well:
"Squid gen 3 8x" is an insufficient description, because the following two Squid items have an 8x option:
http://amfeltec.com/products/pci-express-gen-3-carrier-board-for-4-m-2-ssd-modules/
http://amfeltec.com/products/pci-express-gen-3-carrier-board-for-2-m-2-ssd-modules/

There is only one true 8x card; the other one is really 16x. Yes, the 16x can also run at 8x, but it's really a 16x card, and the 8x card can also run at 4x. The confusion is that you are looking at the speeds the card is compatible with, not the card's native speed. So when I said the 8x, I meant the one that runs natively at 8x, not the 16x that can be downgraded to 8x. But in a way you are also correct: they both can run at 8x. One is 8x native, while the other one is really 16x but can also be downgraded to run at half of its maximum speed, in this case 8x. We are both correct; it just depends on the point of view, but I finally understood what you were trying to say. The quote that I had was for the 8x Gen 3 with two M.2 only. I will get another quote right now, because I will be ordering the 2nd card very soon, probably next week. I'm going to order another 16x, but for the sake of science I will also ask for a quote for the 8x Gen 3 with two M.2.

I think there is no point in ordering the 16x card to run it at 8x, because you won't be able to reach the maximum speed at 8x if you are using 4 drives, especially Samsung 960s. Maybe with older, slower AHCI drives like the Samsung SM951; 4 of those won't saturate an 8x PCIe link.

I think running the 16x at 8x will saturate the PCIe link with just 2 drives, or 3 at most, using Samsung 960s. The only reason I see for a person who has 4 Samsung 960s to buy a 16x card and run it at 8x is to get a little more storage space, but they won't be able to use the speed of the other 2 drives, only the space.

With four Samsung 960s running at 16x you will be able to use the full speed and also have all the available space.
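A rough sketch of that saturation argument, assuming about 3,000 MB/s of sequential read per 960 EVO (the single-drive figure used earlier in the thread):

```python
# How many ~3,000 MB/s drives it takes to fill a PCIe 3.0 x8 vs x16 uplink.
PER_LANE = 8000 * 128 / 130 / 8   # ~985 MB/s per PCIe 3.0 lane
PER_DRIVE = 3000                  # assumed 960 EVO sequential read, MB/s

for lanes in (8, 16):
    ceiling = lanes * PER_LANE
    print(f"x{lanes}: ceiling ~{ceiling:.0f} MB/s, "
          f"filled by ~{ceiling / PER_DRIVE:.1f} drives")
# x8 tops out near 7,900 MB/s (between 2 and 3 drives); x16 near 15,800 MB/s
```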
 
SLI does not benefit from an x16 link, and most CPUs do not allow it anyway; even Skylake only has 16 GPU lanes. In fact, to get 16/16 SLI you needed an nForce chipset until some years ago, same as for the rare x4 SLI enablement.
iamtheonlyone4ever's motherboard does have two x16 slots because the CPU has 40 lanes. So if you're saying SLI doesn't benefit from multiple x16 links because they don't exist then that's wrong. Are you saying SLI won't use x16, or that x16 doesn't improve anything? Probably you mean the latter.

Since it's too much trouble for me to go one by one and see how much it drops every time I add a drive, I simply made the calculation with all four drives.
This was really simple for me. I used SoftRAID: created a raid, ran the benchmark, unmounted, deleted the raid. Repeated for raids of 1, 2, 3, and 4 drives without restarting the computer. I repeated the benchmark to make sure I was getting the best score (repeated until the numbers stopped going up).

I think there is no point in ordering the 16x card to run it at 8x, because you won't be able to reach the maximum speed at 8x if you are using 4 drives, especially Samsung 960s. Maybe with older, slower AHCI drives like the Samsung SM951; 4 of those won't saturate an 8x PCIe link.

I think running the 16x at 8x will saturate the PCIe link with just 2 drives, or 3 at most, using Samsung 960s. The only reason I see for a person who has 4 Samsung 960s to buy a 16x card and run it at 8x is to get a little more storage space, but they won't be able to use the speed of the other 2 drives, only the space.

With four Samsung 960s running at 16x you will be able to use the full speed and also have all the available space.
The best reason the quad at x8 is better than the dual at x8 is space. If the user isn't using raid then the user can still get full performance from each of the 4 drives or from two raids with two drives each.

But there is also a slight speed benefit from a quad at x8 over the dual at x8. If the drive uses only 2/3 of an NVMe connection (x4), and the card has bandwidth for 2 NVMe connections (x8 / x4 = 2) to the slot, then you can connect 3 drives in raid to fill that (2/3 * 3 = 6/3 = 2).
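A minimal restatement of that arithmetic (the 2/3-of-x4 figure is the assumption used in the paragraph above, not a measured number):

```python
# Three drives that each use ~2/3 of a gen 3 x4 link together fill an x8 uplink.
LANE = 985                      # ~MB/s per PCIe 3.0 lane after 128b/130b encoding
x4_link = 4 * LANE              # bandwidth of one M.2 slot
x8_link = 8 * LANE              # uplink bandwidth of the card
per_drive = (2 / 3) * x4_link   # a drive that only uses 2/3 of its x4 link

print(3 * per_drive, x8_link)   # ~7880 vs 7880 -> three such drives fill x8
```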
 
Quotation (prices in USD, shipping not included):

SKU-086-32: SQUID PCIe Carrier Board for up to 2 M.2 SSD modules (M.2 key M), Gen 3, x8 PCIe upstream interface, full size bracket. Qty 1, $297.12 USD.

SKU-086-34: SQUID PCIe Carrier Board for up to 4 M.2 SSD modules (M.2 key M), Gen 3, x16 PCIe upstream interface, full size bracket. $487.35 USD.

This was really simple for me. I used SoftRAID: created a raid, ran the benchmark, unmounted, deleted the raid. Repeated for raids of 1, 2, 3, and 4 drives without restarting the computer. I repeated the benchmark to make sure I was getting the best score (repeated until the numbers stopped going up).

Yes, I've done this millions of times, but since I was thinking about physically removing and adding the drives, I forgot all about it. It's just that I have many things to do at once, and for some reason I didn't see what was right in front of my face the whole time. Haha, yes, you are right, that's the way to do it. I also do that to test the stripe size, but I can only do it when the RAID is not bootable or when I have a spare backup; after I finish testing, I build the RAID and use Carbon Copy Cloner to clone the single drive to the RAID array.

Now that I think about it, maybe the reason I didn't do it is simply that I wasn't interested in testing each drive individually; either that, or simply that I already had a bootable RAID setup and didn't want to break it. On top of everything, I had to upgrade to HS 10.13.2 last night, so maybe I was a little overloaded.

I just finished installing and updating Windows, so I can run the video card benchmark in Windows; I could do it in HS, but I want to test on both systems.

Anyway, I included the quotes for each card.
 
iamtheonlyone4ever's motherboard does have two x16 slots because the CPU has 40 lanes. So if you're saying SLI doesn't benefit from multiple x16 links because they don't exist then that's wrong. Are you saying SLI won't use x16, or that x16 doesn't improve anything? Probably you mean the latter.

Read what I wrote. I said it makes no sense, not that it is not possible. SLI runs on any x8-or-higher config with a whitelisted BIOS, CF on anything x4 and up. x16/x16 is nothing special; a quad E7 system can get you 128 PCIe lanes in x16 slots, with 4-way x16 SLI on full CPU lanes, but it shows no improvement.

The x8 vs x16 question has been tested at length, and since PCIe 3.0 at x8 is faster than x16 2.0, it is pointless.

His CPU generation is also a weird one, where the 28-lane SKUs reroute PCIe in weird ways, and only the 40-lane parts are useful.

Lastly, a 1080 Ti or even a P100 Tesla will not use x16 fully - it certainly can, by memory speed, as could any HBM-based card, but there is no end-user use case for this, even more so as local RAM is accessed faster than even x16 PCIe 3.0, and only slightly faster in quad-channel configs.
 
The Gen 3 Squid card with 4 M.2 drives running at 8x:
5,300 write
6,400 read

The same card with the same drives, same stripe size, etc.; the only difference now is that the card is running at 16x:
5,600 write
10,800 read

In this case there is an increase in performance going from 8x to 16x, unlike graphics cards, which may not benefit much going from 8x to 16x.

I did test the graphics card, but to avoid any conflict I'm not going to talk about the graphics card anymore; since the topic is the Amfeltec card, from now on I'm sticking to the topic, which is the Amfeltec card.

OK, moving on.

My 6th drive arrives today. I already have a bootable clone drive with HS 10.13.2, so I can break the RAID and create a new one with all six drives, using 2 spare PCIe 4x cards until the 2nd Amfeltec 16x Gen 3 (up to 4 M.2 drives) arrives.

As soon as I get the second card and the last 2 drives, I will run the final benchmark with both Amfeltec cards at 16x and all 8 drives in RAID 0.

My goal:
20,000 read speed
10,000 write speed

200 GB copy-paste per minute
 
The x8 vs x16 question has been tested at length, and since PCIe 3.0 at x8 is faster than x16 2.0
In #214, I got about 100 MB/s faster from PCIe 2.0 x16 than from PCIe 3.0 x8 from the NVMe drives. Maybe graphics cards are different because they don't need x16?

Maximum throughput (disregarding overhead):
PCIe 3.0 x8 is 8 * 8 GT/s * 128/130 / 8 bits/byte = 7.88 GB/s
PCIe 2.0 x16 is 16 * 5 GT/s * 8/10 / 8 bits/byte = 8 GB/s
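A minimal sketch of those two calculations, with the line encoding factored out:

```python
# Maximum PCIe throughput from lane count, transfer rate, and line encoding.
def pcie_gb_per_s(lanes, gt_per_s, encoding):
    return lanes * gt_per_s * encoding / 8   # 8 bits per byte

print(pcie_gb_per_s(8, 8, 128 / 130))   # PCIe 3.0 x8  -> ~7.88 GB/s
print(pcie_gb_per_s(16, 5, 8 / 10))     # PCIe 2.0 x16 -> 8.00 GB/s
```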

The Gen 3 Squid card with 4 M.2 drives running at 8x:
5,300 write
6,400 read

The same card with the same drives, same stripe size, etc.; the only difference now is that the card is running at 16x:
5,600 write
10,800 read

In this case there is an increase in performance going from 8x to 16x
Good stuff. Those are similar to my numbers in #214.

unlike graphics cards, which may not benefit much going from 8x to 16x

I did test the graphics card, but to avoid any conflict I'm not going to talk about the graphics card anymore
There's no conflict. We all agree that PCIe 3.0 x16 adds very little or nothing to graphics performance. We are discussing the history, extent, and details of that nothingness. You could add to that by showing the numbers you've obtained. The topic of PCIe bandwidth and its relationship to storage and graphics performance is interesting, as it influences your choice of resource allocation in a limited system.
 
In #214, I got about 100 MB/s faster from PCIe 2.0 x16 than from PCIe 3.0 x8 from the NVMe drives. Maybe graphics cards are different because they don't need x16?

This is most likely encoding overhead from the change to 128b/130b on 3.0 - in THEORY 2.0 x16 is a bit faster, as you noted, but in practice, where you actually need the speed (and especially the improved random access times), the difference is negligible.

Yes, GPUs generally do not need x16, in single or multi-GPU configs, outside of very specific workloads that are likely better served by Quadro/FirePro anyway, which offer *enterprise driver based* performance increases more than hardware ones.

Sadly, most HEDT mainboards (X99, X299) have the x16 slots hard-wired, so you cannot split to x8/x8 and bring the often x8-only slots up to x16 - PLX cards like the ones Amfeltec makes are a solution, as we see here, but not a cheap one.

Notably, if you use a PLX switch to connect a 3.0 device to a 2.0 host (not a 2.0 device to a 3.0 host, which is flawless), you are doing something PLX absolutely does not recommend - the 128b conversion needed in both directions heavily taxes the chip, cutting performance even on the 48/96-lane SKUs.

Now this is interesting... it makes me wonder what the "engineers" at Apple are thinking at times. There is plenty of space for more slots, but they decided it wasn't worth it.

Keep in mind this is lane sharing. You share a SINGLE x4 2.0 link FROM THE PCH (NOT CPU, NOT 3.0) with 3 x4 PCIe devices.

Apple, overall, does not like lane sharing like this as it cuts performance.
 
Notably, if you use a PLX switch to connect a 3.0 device to a 2.0 host (not a 2.0 device to a 3.0 host, which is flawless), you are doing something PLX absolutely does not recommend - the 128b conversion needed in both directions heavily taxes the chip, cutting performance even on the 48/96-lane SKUs.
My tests of the gen 3 NVMe drives in my gen 2 Mac Pro (#214) didn't seem to have any problems. I suppose I have to read a gen 3 PEX data book to understand the implications of what you're saying, or are these recommendations (and explanations for those recommendations) available elsewhere?

It sounds like you're saying that gen 3 to gen 3 switching doesn't require 128b conversion? Or more likely the receiver needs to decode 128b to do the switching, but it doesn't need to re-encode it to pass it on to the 3.0 device? Encoding of 8b is much less taxing (simple lookup for the 10b)? In the case of a 3.0 host and 2.0 device, doesn't the PLX need to encode 128b when transmitting from the 2.0 device to the 3.0 host? How is that less of a problem than going from 2.0 host to 3.0 device?
 
Apple, overall, does not like lane sharing like this as it cuts performance.
It doesn't really cut performance - it lets you get full performance for one device (or a limited number of devices), and you have the option of sharing that performance with more devices if TB is more important *to you* than GB/sec.

Enterprise servers massively overcommit PCIe and SAS/SATA lanes because the TBs are usually more important than the GB/sec.

The standard HPE ProLiant NVMe card is a 32 lane PLX switch wired as a PCIe 3.0 x8 slot to six PCIe 3.0 x4 drives. So, yes, you're limited to *only* 8 GB/sec - but that's really a boatload of bandwidth.

If I want GB/sec - I'll connect two drives. If I want TB, I'll connect six drives.

For systems set up as file servers, the host system has two or four 48 Gbps four lane SAS connectors per controller board. Each SAS connector can be daisy-chained to up to seven shelves (12 or 25 drives per shelf - depending on whether they're 3.5 inch or 2.5 inch). That's up to 175 drives per port, or 840 TB per port. (About 13.4 PB per low end server.)

In the real world, few (if any) workloads really require that every single component in the system must run at its theoretical max bandwidth. Some systems are OK with modest overcommitment, some with huge overcommitment.

And be sure to benchmark your actual applications. You really shouldn't care what "Black Magic" says - but instead focus on how fast your workflow runs. Synthetic disk benchmarks are terrible predictors of application performance.

The people here who diss "lane sharing" or "PCIe switches" are really victims of "can't see the forest for the trees".
 