Distance from the sensor to the lens, I would think, comes down to physics, but I'm open to being corrected if that's not the correct term.
The smallest 1080p sensor / lens array I'm aware of commercially is almost twice as thick as the MB camera and costs a small fortune (in webcam terms).
I for one would welcome a return to the external iSight camera for those who really "need" better cameras.
I usually avoid using my background as an argument, but half of my PhD was in optics, and I have both designed optical systems and lectured on optics. From that perspective, I would like to try to shed some light on what physics and engineering say about image quality and camera size.
Physics sets two hard limits:
1. The amount of light entering the system depends on the physical aperture (lens size in millimetres). The number of photons entering the system sets a lower limit to the quantum noise (shot noise, "graininess") of the image due to the quantum nature of light.
2. Image sharpness depends on the physical aperture ("diffraction limit") due to the wave nature of light.
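Both limits fit in one-liner form. Here is a minimal sketch (Python; the 550 nm wavelength and the 0.75 mm aperture are just example values of mine):

```python
import math

# Limit 1, shot noise: collecting n photons gives a signal-to-noise
# ratio of sqrt(n), so halving the aperture diameter quarters the
# photon count and halves the SNR. No processing can undo this.
def shot_noise_snr(n_photons: float) -> float:
    return math.sqrt(n_photons)

# Limit 2, diffraction: angular radius (radians) of the Airy disc
# first minimum for a circular aperture of diameter d_aperture
# (wavelength and diameter in the same units).
def airy_first_min_rad(wavelength: float, d_aperture: float) -> float:
    return 1.22 * wavelength / d_aperture

print(shot_noise_snr(10_000))                # 100.0
print(airy_first_min_rad(550e-9, 0.75e-3))   # ~8.9e-4 rad
```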
Modern camera systems are diffraction-limited, i.e. they are limited by the second limit (diffraction), not by engineering. Modern camera sensors are also quite close to the theoretical maximum efficiency (a photon hitting a pixel very likely produces a charge carrier), but there is a bit more room for improvement there. Not easy, and not an order of magnitude by any means.
Let's see some comparisons between the diffraction limit and the MBA camera module. I am afraid I have to guess most specifications for the module, but these are my rather optimistic guesstimates:
focal length: 1.5 mm
f-number: f/2
number of pixels: 1280 x 720
field of view: 50°
With the given f-number, the diffraction limit on the sensor is approximately 2.5 um (Airy disc first-minimum diameter, green light).
On the other hand, the image on the sensor is approximately 1.3 mm x 0.7 mm (from the FOV and the focal length). When this is divided by the number of pixels, the pixel size should be around 1.0 um. This seems reasonable; the smallest pixel sizes in any camera sensors are just in that range.
Note that the pixel size is actually well below the diffraction limit. Smaller pixels would not give any more information, just more noise. No point adding pixels with such a small aperture (physical aperture 1.5 mm / 2 = 0.75 mm).
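For anyone who wants to check the arithmetic, here is the whole estimate as a short script (all inputs are the guesstimates listed above; the image width uses the small-angle approximation width ≈ f × FOV, which reproduces the 1.3 mm figure):

```python
import math

f = 1.5e-3               # focal length (guess), m
f_number = 2.0           # f-number (guess)
fov = math.radians(50)   # horizontal field of view (guess)
pixels_h = 1280
wavelength = 520e-9      # green light

# Airy disc first-minimum *diameter* on the sensor: 2 * 1.22 * lambda * N
airy_diameter = 2 * 1.22 * wavelength * f_number
print(f"diffraction spot: {airy_diameter * 1e6:.1f} um")  # ~2.5 um

# Image width on the sensor, small-angle approximation
width = f * fov
print(f"image width: {width * 1e3:.2f} mm")               # ~1.3 mm

# Pixel pitch: image width / horizontal pixel count
pitch = width / pixels_h
print(f"pixel pitch: {pitch * 1e6:.2f} um")               # ~1.0 um

# The pitch is well below the diffraction spot, i.e. the sensor
# already oversamples what the lens can resolve.
print(f"oversampling factor: {airy_diameter / pitch:.1f}x")
```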
The same problem (the small physical aperture) also causes image noise, as very few photons find their way into the optical system. So, we could try to keep the focal length intact and increase the aperture. This would both collect more light (less noise) and give a sharper image.
Unfortunately, this is easier said than done, and here we enter the engineering part. With classical lens-based optics it is possible to get down to f/1 apertures, depending on the field of view. However, doing that while maintaining the physical size of the optical system is very hard, and even with larger lenses the image quality usually suffers. Apple worked extremely hard to get the f/1.6 lens into the iPhone 12 Pro Max.
In theory, it should be possible to use non-classical optics or to combine images from a large number of sensors. Non-classical solutions (based on diffractive optics or even on negative-refractive-index metamaterials) are theoretically possible but years and years away from being useful in this application. Combining images from several sensors is an interesting possibility, but there would need to be a lot of them, and that would cause a lot of other problems.
It might be interesting to compare different setups. As I hate the MB webcam image "quality", I often use a Logitech StreamCam as a quick replacement. The image-quality difference is night and day. The Logitech lens is specified as f = 3.7 mm, f/2 (1.85 mm aperture). As the focal length is 3.7/1.5 = 2.5-fold and the f-number is the same, the lens collects approximately 6 times (2.5^2) as much light as the built-in webcam, and the optical resolution is approximately 2.5 times better. And that shows.
Sometimes I have tried to use EpocCam and an iPhone. My iPhone XS Max seems to have an f = 4.25 mm, f/1.8 lens (2.4 mm aperture), which is again somewhat better than the StreamCam (more than the f-number alone would indicate, but that comes from other factors). But when I really need decent image quality, I use a D7500 DSLR with a zoom lens. The lens is not a fast one, but at f = 35 mm, f/5.6 the physical aperture is 35/5.6 ≈ 6.3 mm. The light-collecting area is thus (6.3 mm / 0.75 mm)^2 ≈ 70-fold compared to the built-in webcam. That is a huge difference, and with the roughly 8-fold increase in sharpness as well, it shows.
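Putting these comparisons side by side (the MBA values are my guesses from above; the rest are the quoted lens specs):

```python
# Physical aperture = focal length / f-number. Light gathered scales
# with aperture *area*, optical resolution with aperture *diameter*.
cameras = {
    "MBA built-in (guess)": (1.5, 2.0),   # (focal length mm, f-number)
    "Logitech StreamCam":   (3.7, 2.0),
    "iPhone XS Max":        (4.25, 1.8),
    "D7500 zoom at 35 mm":  (35.0, 5.6),
}

ref = 1.5 / 2.0  # built-in webcam aperture, 0.75 mm
for name, (focal, f_number) in cameras.items():
    aperture = focal / f_number
    print(f"{name}: aperture {aperture:.2f} mm, "
          f"light x{(aperture / ref) ** 2:.0f}, "
          f"sharpness x{aperture / ref:.1f}")
```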
The sensor in the D7500 produces a 2160p image, but I downscale it to 720p. Crisp, sharp, well-illuminated 720p is good for anything short of producing HD video.
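If you do the downscale on the computer side, an area-averaging filter is the right tool: it averages several source pixels into each output pixel, which also averages away noise. A minimal sketch with OpenCV (the file names are placeholders):

```python
import cv2

# Placeholder file names; in practice this would run per frame
# in the capture pipeline.
frame = cv2.imread("frame_2160p.png")

# INTER_AREA averages the source pixels covered by each output pixel,
# which is the right interpolation choice for downscaling.
small = cv2.resize(frame, (1280, 720), interpolation=cv2.INTER_AREA)

cv2.imwrite("frame_720p.png", small)
```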
Now, there are things that can be done in image processing, and the M1 MBA utilises those. The image quality may become visually more pleasing, but there is no more information in the image. If someone really wants 1080p, the image can be upscaled using super-resolution algorithms, but it won't actually look any better. And noise is poison to those algorithms.
So, due to physics and known engineering limits, you can do the following:
1. Get enough light from the right direction. This makes the image tolerable.
2. Get a good webcam. And, of course, keep the lighting good.
3. Get a real camera.
Apple could make the lid thicker and use a thicker module.
From my point of view, there are two easy opportunities that have been completely missed. Apple really should make it possible to use an iPhone as a webcam without flaky third-party solutions. And DSLR and compact camera manufacturers should implement the UVC protocol over their cameras' USB interfaces; then any camera could be plugged in as a webcam. A $300/300€ compact camera would be a fabulous webcam with zoom and aperture control.