Friday, March 15, 2019

graphics programming - Why are 3d projections on a 2d screen not like what the human eye sees?


Firstly, I'm sorry if this question doesn't make much sense or is poorly written - I have almost no experience with programming, and none at all with game development. I'm also not sure it's an appropriate question for gamedev stack exchange, so sorry if that's the case.


Anyway, the question is this - what are some ways of rendering the player's view in an fps game?


If I understand it correctly, the problem can be stated the following way - given an environment described in 3d coordinates, and a camera with some direction and a certain field of vision, our goal is to somehow project that environment onto a 2d screen, similar to how our eyes or a camera would do it.


One way to do this is the following - whatever is exactly in the direction of the camera will get displayed exactly to the middle of our 2d screen (let's presume the 2d screen is round for simplicity, with radius 1), and the rest of the field of vision will get projected linearly based on the angle. For example, if the field of vision is 90 degrees (anything within 45 degrees of the direction of the camera is in the field of vision), then things 45 degrees away from the direction of the camera will get displayed on the outer circle/edge of the 2d screen, and things 22.5 degrees away will get displayed on the circle with radius 0.5. This might be similar to how a camera or our eye does it, but I'm not sure.
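A rough sketch of that angular mapping, purely for illustration (the function name is made up, and the numbers match the 90 degree example above):

```python
def angular_radius(angle_from_center_deg, half_fov_deg=45.0):
    """Equidistant (angular) mapping: a feature's angle away from the view
    direction maps linearly to a radius on a round screen of radius 1."""
    return angle_from_center_deg / half_fov_deg

print(angular_radius(45.0))   # 1.0 -> outer edge of the round screen
print(angular_radius(22.5))   # 0.5 -> halfway out, as in the example above
print(angular_radius(0.0))    # 0.0 -> dead center
```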


Now in reality, this doesn't seem to be how it's done. To give an example, if I take something like DOOM (or, I think, any modern fps game), stand in front of a very long wall that's the same height everywhere, and look perpendicularly towards it, the height of the wall will be the same everywhere in my 2d projection - in reality though, my eye (or a camera, for example) would show the parts of the wall furthest from the center of vision as shorter. But, given a small enough field of vision, the difference seems negligible.
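To make that observation concrete, here's a small sketch using a standard pinhole/linear perspective model (the wall height of 2 and distance of 3 are made-up numbers): the projected height stays constant along the wall, while the angle the wall actually subtends at the eye shrinks with distance from the center.

```python
import math

def screen_top(h, d, x, f=1.0):
    """Linear perspective: where the wall's top edge (height h) lands on a
    flat screen at focal distance f, when the wall plane z = d is parallel
    to the screen. The horizontal offset x never enters the formula."""
    return f * h / d

def angular_top_deg(h, d, x):
    """Angle the wall's top edge subtends above the eye's horizontal plane."""
    return math.degrees(math.atan2(h, math.hypot(x, d)))

for x in (0.0, 5.0, 50.0):
    print(f"x={x:5.1f}  screen height {screen_top(2.0, 3.0, x):.3f}"
          f"  angular height {angular_top_deg(2.0, 3.0, x):6.2f} deg")
# Screen height is constant along the wall, but the subtended angle shrinks.
```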


Either way, it seems like fps games don't really account for this, and I want to ask - why? Maybe the difference is barely noticeable, but rendering it correctly is much more demanding on performance? In any case, I would appreciate an explanation, or information about how exactly the human eye or a camera projects a 3d environment onto a 2d picture.



Answer




If we're willing to approximate the viewer as a cyclops whose single eye has a pinhole pupil (and this turns out to be a much better approximation than it sounds like - more on that later), then I'd argue that both projection methods you describe are correct. For different shapes of screen.




Thinking of the problem angularly gives the correct result if the screen is a spherical dish, hemisphere, or globe centered on the viewer's eye.


As the viewer sweeps their gaze around the image (taking care to keep their pupil at the exact center of the sphere to not upset the theoretical mathematician who put them here), each degree of travel translates to a constant length across the surface of our curved screen.


Since the entire monitor's surface is the same distance from their eye, no part of the image is foreshortened or otherwise scaled/distorted from their perspective. So as the infinite wall recedes away from them, taking up an ever narrower arc of their visual field, it must also span an ever smaller height on the screen, until eventually its top and bottom lines - which had been parallel dead ahead - meet at a sharp vanishing point 90 degrees to the left. The lines have to be curved to make this work (specifically, straight lines in the rendered world will map to great circles on our screen, and groups of parallel lines will map to great circles meeting at a common pair of diametrically opposite poles, like lines of longitude).
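A quick numerical sketch of that convergence, assuming the eye at the origin looking down +z, a wall at distance 3 with its top edge at height 2 (all made-up numbers), and longitude/latitude coordinates on the spherical screen:

```python
import math

def spherical_image(point):
    """Project a world point onto a unit sphere centered on the eye:
    return (longitude, latitude) in degrees. Longitude 0 is straight
    ahead (+z), latitude 0 is the horizon."""
    x, y, z = point
    r = math.sqrt(x*x + y*y + z*z)
    return math.degrees(math.atan2(x, z)), math.degrees(math.asin(y / r))

d, h = 3.0, 2.0                              # wall distance and height (assumed)
for x in (0.0, 3.0, 30.0, 3000.0):
    lon, lat = spherical_image((-x, h, d))   # top edge, receding to the left
    print(f"offset {x:7.1f}:  lon {lon:7.2f}  lat {lat:6.2f}")
# Latitude falls toward 0 and longitude approaches -90 degrees: the top and
# bottom edges meet at a vanishing point 90 degrees to the left of center.
```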


[Image: Visualization of the image of a wall projected onto a spherical screen]




Now, I don't know about you, but I don't own a spherical screen. When I play games, they're typically on a flat screen. Which means it's not a constant distance from my eye at every angle: as a feature recedes away to my left, so does the screen itself!


[Image: Visualization of the same wall projected onto a planar screen]


I'll spare you the equations for now, but if our monitor/viewport is exactly parallel to the wall, and we use linear perspective to draw the wall at exactly the same height on the screen all the way across, then from the viewer's perspective it will shrink in visual angle in exact proportion to the virtual wall it represents, as the two recede into the distance to the left.
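If you want to check that claim numerically, here's a small sketch (same made-up wall numbers as before, focal distance 1): the angle subtended by the wall's top edge and the angle subtended by its image on the flat screen come out identical at every offset.

```python
import math

def wall_angle(h, d, x):
    """Angle the wall's top edge subtends above the horizon at offset x."""
    return math.atan2(h, math.hypot(x, d))

def screen_image_angle(h, d, x, f=1.0):
    """Angle subtended by the *image* of that edge: the image sits at
    screen height f*h/d and screen offset f*x/d, on a flat screen at
    distance f from the eye."""
    y_img, x_img = f * h / d, f * x / d
    return math.atan2(y_img, math.hypot(x_img, f))

for x in (0.0, 4.0, 40.0):
    print(math.degrees(wall_angle(2, 3, x)),
          math.degrees(screen_image_angle(2, 3, x)))
# The two angles agree exactly: the flat-screen image foreshortens by the
# same factor as the wall it depicts.
```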



The two parallel lines never need to meet at a point in this view, because even the absurdly-widescreen monitor I've drawn in the diagram above still has finite width. We'd fall off the edge of the screen long before we ever reached the vanishing point (infinitely long before, in fact!). If we did somehow have an infinitely wide flat screen, we'd see the vanishing point arise naturally - not because it's built into the projection of the world onto our screen, but because the projection of the distant screen in our eye affects lines on the screen in exactly the same way as any other lines in the world, including the original wall we're rendering.


Of course, if we turn our viewport (game camera) so that it's no longer parallel to the wall, then linear perspective will replace those parallel lines of the wall with ones that literally converge to a vanishing point on our screen, as we'd expect, and again the net effect of the size of the feature on screen and the distance to the screen itself will exactly match the right amount of shrinkage/foreshortening for a given angle.




This is because both projections work by considering where a line from the viewer's eye to each feature point on the object (the cyan lines in my diagrams above) intersects the surface of the screen, and that's where they place the image's corresponding feature point. Since these lines point directly out from the eye, they don't travel across our visual field - each one maps to precisely one point in the 2D space of our vision, no matter the depth. So as long as we draw each feature on the same "eye ray" as its original, it will be placed in the correct part of our visual field for whatever kind of screen we're intersecting with / displaying on.
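In code terms, the only thing that changes between the two projections is which surface you intersect the eye ray with - a sketch, with illustrative numbers:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def intersect_plane(ray_dir, screen_z):
    """Where an eye ray (from the origin) hits a flat screen at z = screen_z."""
    t = screen_z / ray_dir[2]
    return tuple(t * c for c in ray_dir)

def intersect_sphere(ray_dir, radius):
    """Where the same eye ray hits a spherical screen of the given radius."""
    return tuple(radius * c for c in normalize(ray_dir))

feature = (4.0, 2.0, 3.0)          # some feature point in the world
ray = normalize(feature)           # the eye ray toward that feature
print(normalize(intersect_plane(ray, 1.0)))
print(normalize(intersect_sphere(ray, 1.0)))
# Both images lie on the same eye ray, so they occupy the same spot in the
# viewer's visual field; only the screen surface they land on differs.
```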


To see the equivalence another way, imagine nesting these two diagrams - so the spherical screen is displaying an image of the flat screen displaying an image of the wall, or vice versa. I used the same eye rays to feature points in both cases, so they'll line up exactly, and we get the same image on the spherical screen using angular rendering whether we're trying to draw the wall itself, or the image of the wall we captured on the flat screen (if we could extend it to an infinite flat plane).


So, neither linear perspective nor this spherical/angular projection is a "more correct" way to project the scene; which one we want just depends on our rendering intent. The spherical projection has to work a little harder to "bake in" all of the perspective shrinkage effects for every screen normal direction instead of just one normal in the linear case, though in exchange it's able to render a complete wraparound view if you're lucky enough to have hardware that can display it. :)




So, using linear perspective on a flat monitor is no more approximate than using spherical perspective on a spherical monitor. They're both geometrically correct for the simplified viewer we set out to model.


It's not necessary to model the roundness of the retina, the angular swivel of our gaze, or the distortion of our lenses in the image we render on the screen, because you're already viewing the screen with your angularly-swiveling, lens-distorting, round-retina real eyes! So all that gets applied as a "post-post effect" by our own physiology. ;)


All we need to do - in either model - is correctly plot on the screen what you'd see if your view ray continued "through" the surface of the monitor in a straight line and hit the corresponding point in the virtual world on the other side.



Now of course, having two eyes and the ability to move in space complicates this - a single image will be correct from only one viewpoint. So in non-stereoscopic games we typically choose a compromise viewpoint that's "good enough" - and the approximation comes from that choice of viewpoint, not from the perspective projection math that uses that viewpoint as an input. Stereoscopic games for VR headsets or the 3DS have a better ability to locate the player's actual eyes relative to their display and show each one a custom-tailored image, so they can get much closer.


The last detail is that our pupils aren't infinitely narrow pinholes, so we can only focus on a single focal plane at a time, and experience depth of field blurring closer and further away, in addition to lens aberration effects. This affects the sharpness of features in our visual field, but not the location of the center of their circle of confusion, so again it's not a perspective projection approximation, and the pinhole cyclops turns out to be a pretty decent approximation for figuring out where to draw stuff. XD




One last note:


There had briefly been a comment below asking about the stretching we see at the edges of the screen in some games using linear perspective. Although it's deleted now, I think it raises a point worth describing in a bit more depth:


As mentioned above, both of these styles of projection are only geometrically correct when viewed from one point, where all those "eye rays" intersect.


Since most games don't do head-tracking to move this point dynamically to match your actual position (though the effect can be dramatic and extremely convincing when they do!) they tend to pick this viewpoint based more on the aesthetics of the game and the informational needs of the gameplay.


Shooters tend to push this further, since having more peripheral vision available can be a life-or-death matter in these games. So they'll choose a large field of view value for their camera - that corresponds to a hypothetical viewer whose eyeball is very close to the screen, so they have to rotate their eye further to sweep their gaze from center to edge. When viewed from this idealized close position, the extra stretching at the edges of the screen gets foreshortened by the extreme angle at which we're viewing it, and the effect is correct - like standing in the right place for an anamorphic chalk painting, the perspective of the screen and the image on it exactly complement each other and the scene snaps into correct perspective.
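That "idealized close position" is easy to compute: the eye distance at which a flat screen is geometrically correct is half the screen width divided by the tangent of half the horizontal FoV. A sketch with made-up screen sizes:

```python
import math

def ideal_eye_distance(screen_width, horizontal_fov_deg):
    """Distance from a flat screen at which its rendered horizontal FoV
    matches the angle the screen actually subtends at your eye."""
    return (screen_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)

# A 60 cm wide monitor rendered with a wide horizontal FoV is only
# geometrically correct from quite close up (values are just examples):
for fov in (60, 90, 110):
    print(fov, "deg ->", round(ideal_eye_distance(60.0, fov), 1), "cm")
```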


But if we move back from the screen to a more comfortable play position, we're bringing more of its area into the narrow forward cone of our vision, where it's more perpendicular to our view and less foreshortened than the math was made for, and the distortion becomes more apparent.


Fortunately, our brains are pretty malleable, and players will tend to get used to even fairly significantly exaggerated FoV values when playing on a flat screen, so we tend to mainly notice the effect in stills. The same doesn't hold for VR though, where matching the field of view to the actual device viewing conditions is critical for maintaining player comfort, so don't push the FoV there. ;)


