
electronpusher

macrumors newbie
Original poster
Nov 17, 2008
1
0
Hello, all

I'm parsing large XML files (e.g. 50-200 MB) and I find that memory usage is huge during the parse and stays huge even after I release the NSXMLDocument or the NSXMLParser.

With alloc/init of a large XML file with NSXMLDocument I find that the memory consumed is about 7-10 times the size of the file.

With alloc/init of large XML with NSXMLParser the memory consumed is about 4X the size of the file.

After releasing either the NSXMLDocument or the NSXMLParser (and thus triggering dealloc) I find that about 80% of the memory used by that object is still consumed.

I am sure that my retain/release are balanced and I'm sure that dealloc is getting triggered on both types of parsers.

I am using Activity Monitor to view the memory consumption of the process. I've also used Instruments/ObjectAlloc to see what AppKit is doing with the memory. The breakdown is as follows:
GeneralBlock-<some number>: takes up about half or more of the memory consumed by the NSXML parser
CFString: takes up about a quarter
CFDictionary: takes up about a quarter if you use NSXMLParser; if you use NSXMLDocument as the parser, the remaining quarter is in GeneralBlock instead

It seems to me like the memory in GeneralBlock-<some number> is a buffer that the NSXML parser uses to take bytes from the file and then create Foundation/AppKit objects from.

Whether or not this is true, why am I not recovering all of the memory once I'm done parsing the large XML? Note: I'm not keeping references to or retaining any of the objects in the object graph rooted at the NSXMLDocument or NSXMLParser instance.

Is Activity Monitor giving me a false picture of the memory consumption?

Thanks in advance for help!
Code on!
-Michael C Gilson
 

kainjow

Moderator emeritus
Jun 15, 2000
7,958
7
I'm not sure if this would help but you could try adding your own autorelease pool and releasing it when the XML is done parsing.
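
A minimal sketch of that idea, assuming manual retain/release (no garbage collection) and an NSXMLDocument parse. ParseLargeXML and its path argument are hypothetical names made up for illustration, not anything from the original poster's code:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses the XML file at `path` inside its own
// autorelease pool, so temporaries autoreleased during the parse are
// released when the pool drains rather than when the run loop's pool does.
static BOOL ParseLargeXML(NSString *path)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    NSError *error = nil;
    NSURL *url = [NSURL fileURLWithPath:path];
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithContentsOfURL:url
                                                              options:NSXMLNodeOptionsNone
                                                                error:&error];
    BOOL ok = (doc != nil);
    if (ok) {
        // ... walk the tree here and copy out only the values you need ...
    } else {
        // `error` is autoreleased, so use it before the pool is drained.
        NSLog(@"Parse failed: %@", error);
    }

    [doc release];   // balances the alloc/init above
    [pool drain];    // releases everything autoreleased during the parse
    return ok;
}
```

Anything you want to keep after the function returns has to be retained or copied before the pool is drained. This only helps if autoreleased temporaries are part of the problem; it won't change what the allocator does with the freed memory.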
 

garethlewis2

macrumors 6502
Dec 6, 2006
277
1
Reboot your machine.

Write down the figures Activity Monitor records when displaying memory, e.g. Free, Wired, Active, Inactive and Used, run your program, then write down the figures again. I am only guessing, but I would expect that most of the memory is in the wired section. This isn't lost. It can still be used by the OS and by you when you need memory allocated.
 

Catfish_Man

macrumors 68030
Sep 13, 2001
2,579
2
Portland, OR
Why on earth would NSXMLParser/Document be wiring memory?

Anyway, how are you measuring memory usage? I've found that the rprvt numbers in top provide a good first estimate. If that seems odd, the 'heap' and 'vmmap' tools can give more detailed views (along with Instruments, etc., which you've already tried). One thing to consider when looking at heap output is the percentage of the heap that's in use after freeing the NSXMLParser; it's possible that the memory allocator is not returning the RAM to the system (either intentionally for performance, or unintentionally due to heap fragmentation).
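
For a rough in-process version of the same check, here is a sketch using malloc_zone_statistics from <malloc/malloc.h>. This is a swapped-in technique, not what the heap tool itself prints, and LogHeapUsage is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>
#import <malloc/malloc.h>

// Hypothetical helper: logs how much malloc memory is live versus how much
// the allocator has reserved from the system. Passing NULL as the zone
// aggregates the statistics over all malloc zones in the process.
static void LogHeapUsage(NSString *label)
{
    malloc_statistics_t stats;
    malloc_zone_statistics(NULL, &stats);
    NSLog(@"%@: %lu bytes in use, %lu bytes reserved by the allocator",
          label,
          (unsigned long)stats.size_in_use,
          (unsigned long)stats.size_allocated);
}
```

Call it before the parse, after the parse, and after releasing the parser. If the in-use figure drops back down after the release while the reserved figure stays high, the objects really were freed and the allocator is simply holding on to the pages.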
 

garethlewis2

macrumors 6502
Dec 6, 2006
277
1
Even if memory is wired after use, it is still free for another program to use. It is cached: since you loaded up such a massive amount of data, you might do it again. How the hell do you think that apps load so much quicker than the very first time they load on OS X after a reboot?
 

garethlewis2

macrumors 6502
Dec 6, 2006
277
1
I bow to your superior knowledge.

Now you can answer the poster's question about why the memory is being held. I believe it to be cached. So it doesn't show up in the free pool, but it can be allocated to another task if required. But if the OP runs the original program and OS X decides to load the program into the same memory location, then it can use the cached memory.
 

Catfish_Man

macrumors 68030
Sep 13, 2001
2,579
2
Portland, OR
Certainly could be caching, either at the filesystem level (iirc OS X's fs cache is called the 'unified buffer cache') or at the memory allocator level. I think ruling out mismeasurement or heap fragmentation first would be good though.
 