Physically, our eyes do two things when we focus at different distances:
- They use muscles to reshape their biological lenses to optically focus the light from a point at a given distance onto our retinas (this is accommodation). It's how we make images sharp and clear at different distances, and it's the part that glasses and contact lenses correct when our eyes are imperfect.
- They pivot relative to each other so both eyes are looking at the same thing (vergence); this is a dominant method (among others) that our brain uses to figure out how far away something is. (Our brain is heavily involved here though, so it's more than just a simple triangulation-- see the rough sketch after this list. We also use scale, lighting, parallax, and other cues to infer distance, which is why we can get a sense of 3D space from a 2D image.)
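As a rough illustration of just the triangulation piece (ignoring all the other cues the brain uses), here's a toy Python sketch-- the interpupillary distance and vergence angles are made-up example numbers, not measurements of anything:

```python
import math

def vergence_distance(ipd_m: float, vergence_angle_deg: float) -> float:
    """Distance estimate from vergence alone.

    Treats the two eyes and the fixation point as an isosceles triangle:
    each eye rotates inward by half the total vergence angle, so
    distance ~= (ipd / 2) / tan(total_angle / 2).
    """
    half_angle = math.radians(vergence_angle_deg) / 2.0
    return (ipd_m / 2.0) / math.tan(half_angle)

ipd = 0.063  # ~63 mm interpupillary distance, an illustrative value
for angle in (7.2, 3.6, 1.8, 0.9):  # total vergence angle in degrees
    print(f"vergence {angle:4.1f} deg -> ~{vergence_distance(ipd, angle):.2f} m")
```

The takeaway is just that small changes in angle map to large changes in estimated distance once things are more than a meter or two away, which is part of why the brain leans so heavily on the other cues too.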
The cameras capturing live-action 3D movies have to make similar choices-- they're spaced apart a certain amount to give a sense of depth, but they have to choose a focal point, which is why, even though you can look around a 3D movie and get a sense of depth, only part of it is in sharp focus.
The independent displays for each eye give the sense of depth by getting our eyes to pivot relative to each other. But those displays sit at one apparent distance, and that's where our eyes will focus their lenses.
I say apparent distance because the optics between the eyes and the displays change the light field and put the focal point farther away than the actual display. We're not focusing an inch in front of our eyes, even if that's where the display is, because the optics extend the focal distance to some farther point. This is why, even if you're nearsighted (you see well up close but poorly far away), you need corrective lenses in the AVP to see displays that are physically close to your eyes-- the optics in between have changed the focus.
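To make "apparent distance" concrete, here's a back-of-the-envelope thin-lens calculation (the focal length and display distance are invented numbers, not the actual AVP optics): put the display just inside the focal length of a positive lens and you get a virtual image much farther away, and that virtual image is what the eye has to accommodate to.

```python
def image_distance(focal_len_m: float, object_dist_m: float) -> float:
    """Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for d_i.

    With the object (the display) just inside the focal length, d_i comes
    out negative: a virtual image on the same side as the display but much
    farther from the eye.
    """
    return 1.0 / (1.0 / focal_len_m - 1.0 / object_dist_m)

# Invented example: a 40 mm focal-length lens with the display 39 mm away.
d_i = image_distance(0.040, 0.039)
print(f"virtual image ~{abs(d_i):.2f} m in front of the eye")  # ~1.56 m
```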
Focus is a 3D phenomenon, so there's no way to adjust it just by adjusting the image on the display itself. There's been some work done on light field imaging, which captures vector light fields so you can refocus a captured image later (or extend the depth of field)-- presumably a similar technique could be used at the display to change the apparent point of focus, but those approaches trade pixel density to get that effect, and the AVP displays are already remarkably dense just to hit retina resolution at that distance. This is why some folks are looking at raster-scanning lasers that draw the image directly onto your retina: there's no focal point in that case. That just makes me uncomfortable for probably irrational reasons, though.
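For what it's worth, the refocusing trick on a captured light field is usually some variant of "shift and sum": shift each sub-aperture view in proportion to its position in the aperture, then average. Here's a toy sketch of that idea (not any product's actual pipeline); the extra (u, v) views are exactly where the spatial pixel density goes.

```python
import numpy as np

def refocus(light_field: np.ndarray, slope: float) -> np.ndarray:
    """Toy shift-and-sum refocus of a 4D light field L[u, v, y, x].

    Each sub-aperture image (indexed by u, v) is shifted proportionally to
    its offset from the aperture center, then all views are averaged.
    Different `slope` values bring different depths into sharp focus.
    """
    U, V, H, W = light_field.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - cu)))
            dx = int(round(slope * (v - cv)))
            out += np.roll(light_field[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)

# Made-up data: a 5x5 grid of 64x64 sub-aperture views.
lf = np.random.rand(5, 5, 64, 64)
near_focus = refocus(lf, slope=1.5)
far_focus = refocus(lf, slope=-0.5)
```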
Unless the optics in the AVP are motorized, there's no real way to change the focus point for each eye. I'd guess they picked a neutral distance to minimize the amount of muscle strain needed to focus there-- it will be different from looking at a display close to your face, where you have to constantly exercise your eye muscles just to stay focused.
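A quick way to see why a farther neutral distance is easier on the eye: accommodation demand in diopters is just 1 divided by the distance in meters, so a book at 30 cm asks for several diopters of effort while a virtual image a couple of meters out asks for well under one. (The distances below are illustrative; I don't know Apple's actual focal plane.)

```python
def accommodation_demand(distance_m: float) -> float:
    """Accommodation needed to focus at a distance: D = 1 / d (in meters)."""
    return 1.0 / distance_m

# Illustrative distances, not Apple's numbers.
for label, d in [("book at 30 cm", 0.30),
                 ("monitor at 60 cm", 0.60),
                 ("virtual image at 1.5 m", 1.50),
                 ("virtual image at 2.0 m", 2.00)]:
    print(f"{label:24s} -> {accommodation_demand(d):.2f} diopters")
```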
So, I'm not an ophthalmologist, but all of the above leads me to guess that there won't be eye strain in the same way we get it from reading too close to our faces. What I don't know is whether the brain is sensitive to the fact that your eyes are focusing at one distance and triangulating to another. It wouldn't surprise me if it is-- we use all sorts of hints in the brain to measure our surroundings. Some of these things our brain just seems to adapt to quickly; some of them lead to distress that takes the form of nausea or disorientation.
And since I've already written too much, I'll throw this in because I think it's an interesting theory about why we get nauseous in these situations: our brains are highly evolved for the natural world and know how to expect it to behave. When we sit in the back of a car reading a book, or a headset lags behind our head motion, our eyes and the fluid in our ears disagree about how we're moving-- but we didn't evolve for cars or head-tracking displays. What in prehistoric nature would lead to mismatched sensor inputs? Neurotoxins. So our body's response is to make itself puke up the poison.