
dmr727

Original poster
I've been gradually trying to break my dependence on Office apps, and migrated some of my documents over to Pages and Numbers. So far it's worked pretty well (my needs are simple!), and other than getting used to a different interface, it's been a decent experience. But I've also noticed that the documents are far larger - a spreadsheet I use for my budget is 16 KB with Excel but 827 KB with Numbers. A resume is 19 KB with Word but 758 KB with Pages.

Obviously we're talking trivial values by today's standards in an absolute sense, but the nerdy part of me is curious about what's going on behind the scenes. I can understand some level of bloat because of Apple's insistence on making everything look 'nice', but we're talking a 50x and 40x increase in size, respectively.

Any ideas?
 
Interesting. I never noticed this difference before, but I can confirm that I got 25 KB with Excel vs. 350 KB with Numbers for the same file. With all the extra functionality built into Excel, this result is quite counterintuitive.
 
I know Microsoft's latest file formats (docx, xlsx, pptx, etc.) are zip files. For example, if you want to copy all the images out of a PowerPoint or Word file, you can rename it to .zip, open (or extract) it, and copy the images out.
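
The rename-and-extract trick can also be scripted. Here is a minimal Python sketch, assuming the file really is a zip container (as .docx, .xlsx, and .pptx are); the function name `extract_images`, the list of image suffixes, and the example paths are illustrative choices, not anything defined by Office.

```python
import zipfile
from pathlib import Path

# Illustrative set of suffixes to treat as images; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".emf", ".wmf"}

def extract_images(office_file, dest_dir):
    """Copy every image entry of a zip-based Office file into dest_dir.

    Entries are flattened to their base name, so two images with the same
    name in different folders would overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(office_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = Path(info.filename).name
            if Path(name).suffix.lower() in IMAGE_SUFFIXES:
                target = dest / name
                target.write_bytes(archive.read(info))
                extracted.append(target)
    return extracted

# Hypothetical usage:
# extract_images("Slides.pptx", "Slides_images")
```

No renaming is needed this way, since the zip reader only cares about the file's contents, not its extension.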
 
Newer Microsoft formats use lightweight compression, so a .doc is larger than a .docx. Compare a Pages file with a .doc file and I bet they are similar.

[Edit]: Yeah, the Pages and Numbers are way bigger than even the older Microsoft formats. Never noticed that.
 
The MS file format being a .zip file got me curious and I opened one of the Pages documents with BBEdit, which was smart enough to detect that it was actually a container of a bunch of different stuff. And right there is the answer - the container has a lot of .png and .jpg files that add up to 709 KB of the 758 KB. A couple are preview files so the OS can show you what the document 'looks like' inside, but curiously 6 of the .jpg files are titled "PresetImageFillx" (with x being 0-5), and appear to just be variously colored large squares of noise like this - each about 80 KB in size:

PresetImageFill0-24.jpg


Dunno what they're there for, but these squares are the bulk of the problem, at least in my case!
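
To reproduce this kind of breakdown without a text editor, here is a rough Python sketch that totals the stored size of each file type inside the container. It assumes the document is saved as a single zip file rather than as a package folder; `size_by_extension` and the file name in the usage comment are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

def size_by_extension(container):
    """Sum the compressed (on-disk) size of the entries per file extension."""
    totals = Counter()
    with zipfile.ZipFile(container) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            totals[suffix] += info.compress_size
    return totals

# Hypothetical usage: print the extensions, largest total first.
# for suffix, size in size_by_extension("Resume.pages").most_common():
#     print(f"{suffix:8} {size / 1024:8.1f} KB")
```

The sketch uses compressed sizes because those are what count toward the size of the file on disk; JPEG and PNG data barely shrinks inside a zip, so for images the two figures are usually close.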
 
I know Microsoft's latest file formats (docx, xlsx, pptx, etc.) are zip files. For example, if you want to copy all the images out of a PowerPoint or Word file, you can rename it to .zip, open (or extract) it, and copy the images out.

Correct. Another way to do it is to open Terminal, type unzip (be sure there's one white space after), then drag the Office file into Terminal and hit Enter. That way you don't have to change the file extension twice. But note this will unzip the file into Terminal's current directory (your home directory in a new window), not necessarily the directory where your Office file is stored, unless you first type cd (again, be sure there's one white space after), drag the directory into Terminal, and hit Enter.

I've used that many times to slim down bloated PPT files (normally caused by high-res images that don't need to be high-res; this makes it easy to identify the largest assets).
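
If the goal is only to spot the biggest assets, the archive does not have to be unpacked at all. A small Python sketch, assuming a zip-based file such as .pptx; `largest_entries` and the example file name are made up for illustration.

```python
import zipfile

def largest_entries(container, count=5):
    """Return (name, compressed size in bytes) for the biggest entries."""
    with zipfile.ZipFile(container) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
    files.sort(key=lambda info: info.compress_size, reverse=True)
    return [(info.filename, info.compress_size) for info in files[:count]]

# Hypothetical usage:
# for name, size in largest_entries("Quarterly.pptx"):
#     print(f"{size / 1024:8.1f} KB  {name}")
```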
 
How are you migrating your documents? If you are simply opening the .DOCX files in Pages and then saving them as .pages files, then I suspect that there's a whole lot of cruft embedded because of the format conversion.

A good test would be:
  1. Open the .docx version of your resume in MS Word
  2. Select all, copy to clipboard, and paste into a text editor like "TextEdit" or "CotEditor".
  3. Create a new document in Pages
  4. Go to your text editor, select all, and copy to clipboard.
  5. Go to Pages and paste.
  6. Manually format the text: font, weight, colors, etc. to match the Word version.
  7. Save the file and compare the file sizes.
 
How are you migrating your documents? If you are simply opening the .DOCX files in Pages and then saving them as .pages files, then I suspect that there's a whole lot of cruft embedded because of the format conversion.

You're exactly right - I did as you mentioned and the PresetImageFill images are gone, making the file size 205 KB. Most of that is the remaining .jpg files used for the thumbnail and 'preview' of the document. Interesting!
 
Pages doesn't compress any image you insert. I think this accounts for the large file size, mostly.
It's not just that. I have a simple text-only document I wrote in Pages (about 600 words) and then had to export to .docx for submission. The .docx file is 10 KB, while the .pages file is 888 KB. No images in the document whatsoever.

How are you migrating your documents? If you are simply opening the .DOCX files in Pages and then saving them as .pages files, then I suspect that there's a whole lot of cruft embedded because of the format conversion.

The document above was created from scratch in Pages. No conversion involved!
 
You're exactly right - I did as you mentioned and the PresetImageFill images are gone, making the file size 205 KB. Most of that is the remaining .jpg files used for the thumbnail and 'preview' of the document. Interesting!
I wonder how it would work in the opposite direction, opening a Pages document in Excel and saving it as an Excel file.
 
I just did a test - I created a document in Pages with the text "This is a test document." and saved it. Did the same with Word. Here are the contents of the two unzipped files (the file on the bottom is the actual document package).

Word document (12 KB zipped / 90KB unzipped)
Screen Shot 2022-08-30 at 3.07.46 PM.png


Pages document (91 KB zipped / 131KB unzipped)
Screen Shot 2022-08-30 at 12.23.49 PM.png


It looks like DocumentStylesheet.iwa is the main culprit, followed by Metadata.iwa
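
For anyone who wants to repeat the zipped vs. unzipped comparison on their own files, here is a minimal Python sketch. It assumes both documents are single zip files; `archive_totals` and the file names in the usage comment are hypothetical.

```python
import zipfile

def archive_totals(container):
    """Return (compressed bytes, uncompressed bytes) summed over all entries."""
    with zipfile.ZipFile(container) as archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
    packed = sum(info.compress_size for info in infos)
    unpacked = sum(info.file_size for info in infos)
    return packed, unpacked

# Hypothetical usage:
# for path in ("Test.docx", "Test.pages"):
#     packed, unpacked = archive_totals(path)
#     print(f"{path}: {packed / 1024:.0f} KB zipped / {unpacked / 1024:.0f} KB unzipped")
```

The compressed total will come out slightly below the size Finder reports, because the zip headers and directory are not counted.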
 