Making video communication feel more like real life

If the metaverse is an uncanny valley and standard video calls are laggy and tiresome, what about creating immersive, life-sized experiences instead?

Meta announced in October 2022 that the avatars in its virtual reality space Horizon Worlds would soon have legs. This may have been too little, too late: media buzz around the metaverse quieted down throughout 2023. But with the release of the Apple Vision Pro, the metaverse is back in the news, and no longer just as a world of cartoons. The new version of FaceTime lets Vision Pro users interact with their friends as a “digital Persona,” built on a “dynamic, natural representation of your face and hand movements.”

Opinions about the metaverse aside, Meta and Apple are both attempting to solve one of humanity’s long-standing frustrations: the curse of distance. The most important moments of life happen as social interactions, and a key use of technology has been to collapse physical space so we can better interact with people who are far away from us. All communications technology serves this goal: even papyrus and the printing press carried thoughts to people far away.

Until someone invents full teleportation, communications technology must ultimately trick us into believing that we’re interacting with someone who is not next to us. Unfortunately, our brains easily detect fraud. The low fidelity of phone calls made people sound distant. As predicted in 2001: A Space Odyssey, the basic video call was a major improvement. We can easily understand talking to someone’s head and torso as if they were the host of a TV news program.

But now, a decade-plus in, video calls have come to feel as limited as phone calls: we know that our experience is being mediated. The metaverse’s proposed next step is for us to replicate social interaction inside a digital space, experienced immersively through goggles strapped to our heads. When we are forced to inhabit digital personas, however, the experience descends into the uncanny valley and can feel artificial.

tonari: spatial communication for real humans

We at tonari also want to solve the curse of distance, but we’re taking a different, more human-centered approach. We don’t put the burden on users to suspend their disbelief and accept that they are digital avatars. We want technology to get out of the way. Sitting in front of tonari, you should feel like you’re really there with someone, far beyond what can be achieved with a normal video call. This is much more difficult than just stretching a standard video call out to a big screen. Over the last few years, we’ve had to come up with many new technical and design solutions.

Life-sized screens

In real-life conversations, it’s easy to get lost focusing on the other person, but you’re always looking at them within the wider context of the entire room behind and beside them. Immersive VR gets this right: you should be able to look beyond a tiny static screen for it to feel like reality.

At tonari, we built our portals to be floor-to-ceiling. This allows people to see not only their conversation partners but also what’s going on behind them, as if they were peering into the other room through a giant doorway. And to feel like real life, the video needs to be 1:1 scale. With the camera placed in the middle of the screen, we calibrated the settings so that objects all appear the right size, as if you’re really looking at them.

Another advantage of full-size screens is that they’re just more enjoyable. Being more immersed in something makes focusing much easier, probably for the same reason we enjoy movies more on large screens and music more on big speakers. It also removes the temptation to multitask, which can be a major source of fatigue. With a large screen, you can keep a smooth conversation going on tonari while collaborating on your laptop, without the compromise of switching tabs or losing where you are.

Eye-to-eye contact — without distraction

In real life, we make eye contact with each other when we talk. This has been difficult to replicate with systems where the camera is on top of the screen. We put the camera right in the middle of tonari so that when you look at someone in the center of the screen, they feel your natural gaze.

Another underappreciated part of real-life conversations is that you can’t see yourself (other than the rare cases of being in front of a mirror). Yet seeing your own image is now a standard feature of video call software. This is distracting for most people. tonari portals don’t provide a way to see yourself, just like real life, which helps you focus 100% on the person on the other side of the screen.

High resolution, low latency

Many newer devices claim to have 4K cameras, but they have no capability to actually send that 4K-quality video to other devices in real time. The net result is usually 1080p at 30fps at best, and typically more like 720p at 20fps. tonari’s hardware and software actually transmit 3K 60fps video and high-bitrate 48kHz stereo audio with extremely low latency and high reliability. The resulting smoothness and fidelity help it feel closer to real life.

But being bigger and higher resolution still isn’t enough to feel natural. When you talk to someone next to you, words travel at the speed of sound, which is fast enough at short distances that you never notice a lag. With any modern-day communication tool, however, signals from elsewhere must travel through hardware, software, an internet connection, potentially a server in the middle, and another internet connection, then back into the software and hardware of your device before getting to you. Surprisingly, traveling through fiber optic internet cables around the world is the fast part. The lag you experience with standard video calls tends to be caused by the hardware and software.

By tightly integrating the hardware and software on our own platform built from the ground up, tonari achieves sub-100ms glass-to-glass latency, which is close enough to make it feel like a real-life conversation. In our tests, the latency of standard video call services like Teams and Zoom can get as bad as 300-500ms on the very same connection. That level creates awkward timings when making jokes and causes the common problem of talking over each other. Very low latency is critical for smooth conversations. Humans are sensitive, and anything above 150ms is often enough to throw us off. tonari can easily stay under this threshold, even across longer distances like Tokyo to San Francisco or New York to London.
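To put rough numbers on why the cable is the fast part, here is a back-of-the-envelope sketch in Python. It is an illustration under stated assumptions, not a description of how tonari measures latency: the city coordinates, the refractive index of 1.47 for optical fiber, and the helper names are all assumptions made for this example, and real cable routes are longer than the straight great-circle path used here.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for silica fiber
EARTH_RADIUS_KM = 6371.0            # mean Earth radius
SPEED_OF_SOUND_M_S = 343.0          # in air at about 20 degrees Celsius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest surface distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def fiber_delay_ms(distance_km):
    """One-way time for light to cover the distance inside optical fiber."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000

# Approximate city-center coordinates (assumed for illustration).
tokyo_sf_km = great_circle_km(35.6762, 139.6503, 37.7749, -122.4194)
ny_london_km = great_circle_km(40.7128, -74.0060, 51.5074, -0.1278)

tokyo_sf_ms = fiber_delay_ms(tokyo_sf_km)
ny_london_ms = fiber_delay_ms(ny_london_km)

# What remains of a 100 ms glass-to-glass target for cameras, encoding,
# networking equipment, decoding, and display on the Tokyo to San Francisco route.
remaining_budget_ms = 100 - tokyo_sf_ms

# For comparison: sound crossing one meter between two people in a room.
in_person_ms = 1 / SPEED_OF_SOUND_M_S * 1000

print(f"Tokyo to San Francisco: {tokyo_sf_km:.0f} km, {tokyo_sf_ms:.1f} ms in fiber")
print(f"New York to London: {ny_london_km:.0f} km, {ny_london_ms:.1f} ms in fiber")
print(f"Left over from a 100 ms target: {remaining_budget_ms:.1f} ms")
print(f"Sound across 1 m: {in_person_ms:.1f} ms")
```

Under these assumptions, light needs on the order of 40ms to cross the Pacific in fiber, so more than half of a 100ms target is still available for everything the hardware and software have to do at both ends. Seen the other way, a call that takes 300-500ms on the same route is spending most of that time outside the cable.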

In other words, compared to the video calls you’re used to, tonari sends about 10 times as many pixels to your eyes every second while also being three times faster. The performance and comfort are on a whole new level, like upgrading from a bus ride to a bullet train.
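The pixel comparison can be sanity-checked with simple arithmetic. The sketch below is hedged: “3K” has no single standard definition and the exact frame size is not stated here, so the 2880x1620 figure is an assumption chosen for illustration, as is the helper name `pixels_per_second`.

```python
def pixels_per_second(width, height, fps):
    """Raw pixel throughput of a video stream, ignoring compression."""
    return width * height * fps

# Typical video call outcomes mentioned above.
call_1080p30 = pixels_per_second(1920, 1080, 30)
call_720p20 = pixels_per_second(1280, 720, 20)

# Assumed 3K frame size (hypothetical), at 60 frames per second.
stream_3k60 = pixels_per_second(2880, 1620, 60)

ratio_vs_1080p30 = stream_3k60 / call_1080p30
ratio_vs_720p20 = stream_3k60 / call_720p20

print(f"vs 1080p at 30fps: {ratio_vs_1080p30:.1f}x")
print(f"vs 720p at 20fps: {ratio_vs_720p20:.1f}x")
```

With that assumed frame size, the stream carries between roughly 4.5 and 15 times as many pixels per second as the two typical cases, which brackets the “about 10 times” figure.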

Making technology better, until it disappears

Solving these technical challenges allows tonari to better remove the feeling of distance between two people. Our hope is that tonari users don’t even notice what is happening behind the scenes. They just show up, interact like normal, and have conversations. They can get up and move around or lie down on couches. The best part is that these solutions all exist in areas where the technology is certain to get better over time. Cameras will get sharper, audio will get crisper, and hardware will get faster. So, with every passing year, we are confident that the experience will feel closer and closer to real life.

If you enjoyed this and want to learn more about tonari, please visit our website and follow our progress via our monthly newsletter. And if you have questions, ideas, or words of encouragement, please don't hesitate to reach out at 👋

Find us 💙

Facebook: @heytonari Instagram: @heytonari X: @heytonari