Camera drones allow exploring remote scenes that are inaccessible or inappropriate to visit in person. However, these exploration experiences are often limited by the insufficient scene information provided by front-facing cameras, which supply only 2D images or video. Combining camera drone vision with haptic feedback would augment users’ spatial understanding of the remote environment, but such designs are usually difficult for users to learn and apply, owing to system complexity and non-fluid UAV control. In this paper, we present a new telepresence system for remote environment exploration, with a drone agent controlled by a VR mid-air panel. The drone generates real-time location and landmark details using integrated Simultaneous Localization and Mapping (SLAM). The SLAM module produces point clouds from RGB input, and the results are passed to a Generative Adversarial Network (GAN) to reconstruct 3D remote scenes in real time. The reconstructed objects are then used by haptic devices to improve the user experience through haptic rendering. By providing both visual and haptic feedback, our system allows users to examine and exploit remote areas without being physically present. An experiment has been conducted to verify the usability of the 3D reconstruction results in haptic feedback rendering.
With the rapid progress of unmanned aerial vehicle (UAV) and Virtual Reality (VR) technologies, many researchers have used camera drones to help users explore remote scenes. These systems usually provide an egocentric view that assists users’ exploration task as they fly the drone remotely. While the vision from drones extends the range of places users can investigate, these systems do not allow users to feel and examine the objects located in that area, which constrains users’ spatial understanding. Previous work proposed a technique that generates haptic feedback for the remote scene to enhance the immersive experience by recognizing the scene and mapping its spatial understanding to simple primitive haptic feedback such as circles and squares. However, such simple haptic feedback cannot deliver a spatial understanding of a complex scene to users.
In this paper, we present a new technique that enhances the user experience in a remote environment by augmenting spatial understanding with a camera drone and mid-air haptic feedback. Unlike previous work that reduced the scene information to simple primitive haptic rendering, we render haptic feedback in mid-air directly from the 3D reconstruction of the remote environment. Our method converts recognized objects from 2D RGB images into 3D reconstructed objects in real time by combining a Generative Adversarial Network (GAN) with drone-integrated Simultaneous Localization and Mapping (SLAM). The 3D reconstructed objects are then sent to a mid-air haptic rendering device that generates the corresponding force feedback for users. We have conducted a preliminary experiment to confirm that the haptic rendering results from the proposed method enhance the user experience.
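The front half of this pipeline can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the function names are our own, and the trained GAN generator is stood in for by a simple one-voxel dilation so the example stays self-contained. The real system would replace `complete_with_gan` with a forward pass through the generator network.

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int = 16) -> np.ndarray:
    """Bin an (N, 3) SLAM point cloud into a binary occupancy grid.

    Points are normalized into the unit cube, then assigned to
    resolution^3 voxels; this sparse grid is the GAN's input.
    """
    mins = points.min(axis=0)
    spans = np.maximum(points.max(axis=0) - mins, 1e-9)
    idx = ((points - mins) / spans * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def complete_with_gan(sparse_grid: np.ndarray) -> np.ndarray:
    """Stand-in for the trained GAN generator (hypothetical): dilate
    occupied voxels by one cell in the 6 axis directions, mimicking
    how the generator fills in surface missing from the sparse cloud.
    """
    filled = sparse_grid.copy()
    occ = np.argwhere(sparse_grid > 0.5)
    r = sparse_grid.shape[0]
    for step in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
        nbr = occ + step
        nbr = nbr[np.all((nbr >= 0) & (nbr < r), axis=1)]
        filled[nbr[:, 0], nbr[:, 1], nbr[:, 2]] = 1.0
    return filled

# Sparse ring of points, as SLAM might produce for a cylindrical object.
theta = np.linspace(0, 2 * np.pi, 50, endpoint=False)
cloud = np.stack([np.cos(theta), np.sin(theta), np.zeros(50)], axis=1)
sparse = voxelize(cloud, resolution=16)
dense = complete_with_gan(sparse)  # denser grid to be rendered haptically
```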
In the following sections, we introduce related work in the field, then explain our system design, report our preliminary evaluation, and discuss the future steps of this research.
We have proposed a new spatial exploration system for VR environments based on 3D reconstruction and haptic feedback. With a VR drone controller, users can navigate the drone around a remote area while the real-time frontal camera view is streamed to their view. A real-time SLAM system is integrated into the pipeline so that drone and landmark poses can be tracked over time. Scene exploitation and reconstruction are realized with a trained GAN that takes the point clouds generated by SLAM as input and outputs reconstructed 3D voxel grids. The reconstructed 3D models provide a means to render complex tactile feedback to users. Compared with existing systems that integrate similar features, our system (1) does not rely on external or depth-sensing devices and (2) infers 3D information from 2D captures, helping users explore and exploit the scene on-the-fly.
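One plausible way to turn a reconstructed voxel grid into mid-air haptic render targets is to keep only the boundary voxels and project the layer under the user's hand to device focal points. The sketch below assumes this approach; the function names, the per-layer selection, and the voxel size are our own illustrative choices, not details from the paper.

```python
import numpy as np

def surface_voxels(grid: np.ndarray) -> np.ndarray:
    """Return indices of occupied voxels with at least one empty
    6-neighbor; these boundary cells are candidate haptic targets."""
    padded = np.pad(grid, 1)  # surround with empty space
    surf = []
    for x, y, z in np.argwhere(grid > 0.5) + 1:
        neighbors = [padded[x - 1, y, z], padded[x + 1, y, z],
                     padded[x, y - 1, z], padded[x, y + 1, z],
                     padded[x, y, z - 1], padded[x, y, z + 1]]
        if min(neighbors) < 0.5:
            surf.append((x - 1, y - 1, z - 1))
    return np.array(surf)

def focal_points(grid: np.ndarray, hand_z: int,
                 voxel_size: float = 0.01) -> np.ndarray:
    """Surface voxels in the layer at index hand_z (a simplification of
    hand tracking) become focal-point coordinates in metres."""
    surf = surface_voxels(grid)
    return surf[surf[:, 2] == hand_z] * voxel_size

# A solid 4x4x4 cube in an 8^3 grid: interior voxels are culled,
# and the bottom face becomes the rendered layer.
g = np.zeros((8, 8, 8), dtype=np.float32)
g[2:6, 2:6, 2:6] = 1.0
surf = surface_voxels(g)
targets = focal_points(g, hand_z=2)
```

Culling interior voxels keeps the number of focal points small, which matters because mid-air haptic devices can drive only a limited set of points at once.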
The authors would like to thank Salar H. Khorasgani, Zhiwei Liu and Zhi Qi Liu for their help with the experiments. This project was partially supported by the Osaka University Researcher Development Fund (grant no. J18110201).