You might be surprised to find out that it understands the stream and re-encodes it to a matrixed pseudo-surround output (ProLogic-like) that is compatible with stereo devices. Matrixing the sound is actually more complex than converting to AC3. The hacking community has clearly shown that the hardware can do this with very little overhead.
Actually this is not entirely correct. As I understand it, AppleTV only transmits the first two channels of a six-channel AAC bitstream. Those two channels, if they come from almost any film soundtrack produced in the last 20 years, are sufficient to reconstruct a Dolby Surround analog mix. This works because the phase-shifted surround is already stereo-matrixed into the Left and Right front channels.
As one who produces Dolby Digital content under a Trademark Service Agreement with Dolby Laboratories, I can tell you this for a fact: the majority of film soundtracks carry, for the sake of backward compatibility with Dolby ProLogic receivers, a Dolby Surround mix embedded in the Left and Right front channels.
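To make the matrixing concrete, here is a minimal sketch of the 4:2 Dolby Surround matrix idea. This is an illustration I am adding, not Dolby's actual encoder: the real encoder applies a +/-90-degree phase shift to the surround channel, which I stand in for with a simple sign flip, and the function names are my own.

```python
import math

K = 1 / math.sqrt(2)  # Center and Surround are mixed in at -3 dB

def matrix_encode(l, r, c, s):
    """Simplified 4:2 matrix: fold Center and Surround into two channels.

    A sign flip stands in for the real encoder's 90-degree phase shift.
    """
    lt = l + K * c + K * s
    rt = r + K * c - K * s
    return lt, rt

def matrix_decode(lt, rt):
    """A passive decoder recovers Center as the sum of the transmitted
    channels and Surround as their difference."""
    return lt + rt, lt - rt

# A surround-only signal survives the two-channel trip: it cancels out
# of the Center sum and reappears in the Surround difference.
lt, rt = matrix_encode(0.0, 0.0, 0.0, 1.0)
center, surround = matrix_decode(lt, rt)
```

This is why an ordinary stereo track can carry a surround mix: nothing beyond two channels is transmitted, and the decoder's sum/difference arithmetic pulls the extra channels back out.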
While it is not impossible to create a Dolby Surround analog mix on the fly, it also requires licensing. AppleTV may or may not contain the hardware or software to do Dolby Surround analog encoding, but a) it doesn't need to and b) they would STILL have to license the technology from Dolby Labs. Furthermore, the only "evidence" I've seen from the hacker community is their supposition as to what seems to be happening. But what I've just described to you is the more plausible and less complicated explanation: The Dolby Surround analog mix is already present in the Left and Right front channels.
That being said, if your goal is to reconstruct the Dolby Surround analog mix, this makes it absolutely unnecessary to encode the audio as six-channel AAC when two-channel AAC will contain the same information! Don't believe me? Try it... take a few films and encode them in Handbrake 0.9.2 as AAC + AC-3. Now flip your receiver to Dolby ProLogic mode and set the AppleTV to output only the AAC 2-channel... it's not magic, and it's not the receiver interpolating. It's because the Dolby Surround analog mix is IN that two-channel mix. Another example of this can be found in the recordings of Isao Tomita, which can be purchased off iTunes. It's ordinary two-channel AAC... no special encoding required by iTunes or AppleTV. Just play it through as stereo AAC to your Dolby ProLogic capable receiver and it'll do the rest.
As for the Dolby Digital end of things, it is actually more convoluted to transcode the AC-3 into AAC and then back into AC-3 again so that a Dolby Digital decoder on a receiver can do something with it. It is even more convoluted to require receivers to be capable of decoding six-channel AAC... and for reasons I will point out below, a step backward in soundtrack reproduction. Of these various options, it is easiest to encapsulate the AC-3 and pass it through to the licensed decoder on a Dolby Digital receiver.
In summary, AAC gives the advantage of better file compression for the same quality, removes the need for a second sound stream, and adheres to MPEG4 standards better than using AC3. Unless Apple wanted files to only play at full quality on an Apple TV, only licensing was in the way.
While it's true that AAC, developed jointly by Apple, Fraunhofer IIS and Dolby Laboratories, is a direct descendant of AC-3, and is superior to AC-3 at the same bitrates, AAC lacks certain parameters that make AC-3 more efficient and well-suited for film at those bitrates.
These parameters include:
1. Dialnorm - Dialogue normalization. During encoding, the mastering lab is advised by Dolby Laboratories to measure the A-weighted average loudness of the soundtrack (in dB relative to full scale, or dBFS). This value is entered into the Dialnorm parameter at the encoding stage and stored as metadata in the AC-3 track. The metadata tells the Dolby Digital decoder what the average loudness of the track is, so that from one Dolby Digital track to the next the dialogue can be normalized relative to the foley and music. The parameter was developed for Digital Television applications so that the user would not have to constantly adjust the volume from one channel to the next to compensate for variations in dialogue levels. It has the added advantage of keeping dialogue audible throughout a program no matter how loud the foley and score.
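The arithmetic behind dialnorm is simple enough to sketch. A Dolby Digital decoder shifts each program so its dialogue sits at a common reference level (-31 dBFS, with -31 meaning "apply no attenuation"); the function below is my own illustrative naming, not a Dolby API.

```python
def dialnorm_gain_db(dialnorm_dbfs: float, reference_dbfs: float = -31.0) -> float:
    """Gain (in dB) a decoder applies so dialogue lands at the reference.

    dialnorm_dbfs is the measured average dialogue loudness carried in
    the AC-3 metadata (legal range -1 to -31 dBFS).
    """
    return reference_dbfs - dialnorm_dbfs

# A program mastered with dialogue at -27 dBFS is attenuated by 4 dB;
# one already at the -31 dBFS reference passes through unchanged.
print(dialnorm_gain_db(-27.0))  # -4.0
print(dialnorm_gain_db(-31.0))  # 0.0
```

Because every program is pulled to the same dialogue reference, channel-hopping (or track-hopping) never requires reaching for the volume knob.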
2. Dynamic Range Compression - There are several presets, including Film Standard, Film Light, Music Standard and Music Light compression, chosen by the sound engineer during mastering. This information, along with the reference monitor type (e.g. X-curve, commonly used in theatrical sound mastering) and peak monitoring level (in dB SPL), is stored as metadata. The metadata tells the Dolby Digital decoder which dynamic range compression profile to apply, extending the capabilities of Dolby Digital well past those of standard AAC in reproducing a wide dynamic range with minimal distortion. That is, it can span a wider range from softest to loudest sounds, while increasing or decreasing the compression above and below the baseline during different parts of the program to compensate for amplitude inconsistencies that would otherwise create distortion or drown out dialogue.
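The shape of such a profile can be sketched as a gain curve. The ratios and the width of the unaffected band below are hypothetical round numbers of my choosing, not the actual Film Standard or Music Light parameters; the point is only the structure: loud material is cut, quiet material is boosted, and a band around the dialogue baseline passes untouched.

```python
def drc_gain_db(level_db: float,
                boost_ratio: float = 0.5,
                cut_ratio: float = 0.5,
                null_band_db: float = 5.0) -> float:
    """Hypothetical DRC profile: gain to apply at a given signal level.

    level_db is measured relative to the dialogue baseline (0 dB).
    Within +/- null_band_db nothing happens; outside it, the excess is
    scaled down (cut) or up (boost) by the profile's ratio.
    """
    if level_db > null_band_db:                       # loud passage: cut
        return -cut_ratio * (level_db - null_band_db)
    if level_db < -null_band_db:                      # quiet passage: boost
        return -boost_ratio * (level_db + null_band_db)
    return 0.0                                        # near baseline: unity

print(drc_gain_db(15.0))   # -5.0 (an explosion is pulled down)
print(drc_gain_db(-20.0))  # 7.5 (a whisper is lifted)
print(drc_gain_db(2.0))    # 0.0 (dialogue passes unchanged)
```

Because only the profile selection travels in the metadata, the decoder can apply heavy compression for late-night listening or none at all for a calibrated home theater, from the same bitstream.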
3. Low-pass filter - By default, a 20kHz low-pass filter is applied to eliminate frequencies above the audible range, in keeping with the basic principles of digital sampling. As Harry Nyquist of Bell Labs showed in the early 20th century, any frequency above half the sampling rate (the Nyquist limit) folds back into the band as an aliased frequency. The 20kHz low-pass filter therefore removes content that could produce aliases and falsely color the soundtrack. This procedure is also standard practice in the professional mastering of sound recordings, for the exact same reason... and it is why most of the talk of aliasing amongst audiophiles is a load of nonsense.
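The folding behavior is easy to compute. This small sketch (my own helper, assuming an ideal sampler) gives the frequency at which a tone reappears after sampling, which shows why the filter must run before the converter:

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a tone after ideal sampling at fs_hz.

    Content above the Nyquist limit (fs/2) folds back toward 0 Hz;
    content below it passes through at its true frequency.
    """
    k = round(f_hz / fs_hz)          # nearest multiple of the sample rate
    return abs(f_hz - k * fs_hz)     # distance to it = folded frequency

# At a 48 kHz sample rate the Nyquist limit is 24 kHz. A 30 kHz
# component masquerades as 18 kHz, right in the audible band:
print(alias_frequency(30_000, 48_000))  # 18000
# In-band content is untouched:
print(alias_frequency(18_000, 48_000))  # 18000
```

Once the 30 kHz component has folded to 18 kHz it is indistinguishable from a real 18 kHz tone, which is why no filter after sampling can undo it.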
There are other features including an RF intermodulation filter, a 120Hz low-pass filter for the LFE channel, a DC offset filter (which removes any constant offset the recording hardware introduces into the signal), and so on.
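As one example from that list, a DC offset filter is typically a one-pole DC-blocking filter. The sketch below is the textbook recurrence y[n] = x[n] - x[n-1] + r*y[n-1], which I am offering as a generic illustration, not as Dolby's specific implementation:

```python
def dc_block(samples, r: float = 0.995):
    """One-pole DC-blocking filter: passes audio, decays any constant
    offset toward zero. r close to 1 keeps the cutoff near 0 Hz."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

# Feed in a pure DC offset: the filter lets the initial step through,
# then bleeds the constant component away.
settled = dc_block([1.0] * 1000)[-1]
```

An offset removed this way stops wasting amplifier headroom and eliminating it before encoding keeps the bit budget on audible content.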
AAC, in short, is not optimized for film soundtrack reproduction the way AC-3 is. Encoding AC-3 rather than six-channel AAC is therefore preferable. Not only that, but because of the additional parameters mentioned above, the threshold for acoustic transparency sits at a lower bitrate in AC-3 than it does in AAC. AAC is superior to AC-3 only when these parameters are not taken into account.