There is a reason Photoshop and ICE don't offer georeferenced output: it is really hard (nearly impossible) to reconcile the cumulative error introduced into the stitch job without taking 3D effects into account.
At this point you can get either georeferenced, 3D-reconstruction-based stitching, which will almost always exhibit the artifacts you are pointing out, or simple 2D feature-matching-based stitching that is not georeferenced. I am not aware of anything that does a hybrid.
The issue is perspective and parallax. Yes, in your picture on the right, the image looks great, but what if it had been taken at a more oblique angle? Then it would look all wonky and not match up well with the next image over.
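To put a rough number on the parallax problem, here is a minimal sketch using the standard relief displacement relation for a truly vertical photo, d = r * h / H, where r is the distance of the object from the image nadir, h is the object's height and H is the flying height above ground. The tree height, altitude and pixel positions are made-up values for illustration, and oblique shots will be worse than this.

```python
def relief_displacement_px(radial_px, object_height_m, flying_height_m):
    """Displacement (in pixels) of an object's top away from the nadir
    in a vertical photo: d = r * h / H."""
    if flying_height_m <= 0:
        raise ValueError("flying height must be positive")
    return radial_px * object_height_m / flying_height_m


# Hypothetical scene: a 10 m tree photographed from 100 m above ground.
# Signed positions along the flight line, measured from each image's nadir.
shift_in_image_a = relief_displacement_px(1500, 10, 100)   # tree ahead of nadir
shift_in_image_b = relief_displacement_px(-500, 10, 100)   # tree behind nadir

# The treetop leans in opposite directions in the two frames, so a purely
# 2D stitch cannot line it up in both at once.
mismatch_px = shift_in_image_a - shift_in_image_b

print(shift_in_image_a, shift_in_image_b, mismatch_px)
```

The point is that the mismatch depends on the height of each object, so no single 2D transform removes it for the whole overlap.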
Video pulls are too compressed and too low-res to be used reliably in 3D-reconstruction-based techniques. Yes, 4K video has the resolution on paper, but if you grab a frame and compare it to a still, it ends up being equivalent only after the resolution is reduced by a factor of 16 or so.
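As a back-of-the-envelope check on what that factor means, the sketch below assumes a 3840x2160 UHD frame and reads "a factor of 16" as a reduction in total pixel count (so 4x per axis). Both of those are my assumptions for illustration; the factor itself is a rough rule of thumb, not a measured value.

```python
import math


def effective_size(width_px, height_px, pixel_count_factor):
    """Image size left after cutting the total pixel count by the given
    factor, keeping the aspect ratio (each axis shrinks by sqrt(factor))."""
    per_axis = math.sqrt(pixel_count_factor)
    return round(width_px / per_axis), round(height_px / per_axis)


# Assumed UHD frame grab and the rough factor of 16 from the text.
width, height = effective_size(3840, 2160, 16)
megapixels = width * height / 1e6

print(width, height, megapixels)
```

Under those assumptions a 4K frame grab carries about as much usable detail as a still of roughly half a megapixel, which is very little for feature matching in a 3D reconstruction.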


