What’s the Difference Between Speaker Tracking and Speaker Framing?

Jay Brant • Aug 06, 2021

Video Conferencing

Video conferencing cameras have come a long, long way in a short while. From greatly improved video quality to simpler connectivity to integration with unified communications technologies, cameras for video calls are getting better and better.

One of the primary areas of improvement is making conversations feel more natural. Video conferencing, as we cover in our Video Conferencing Buyer’s Guide, is already the most natural form of long-distance communication available today. But there are still some hiccups that disrupt the natural feel of a video call. For example, it can be tough for the far end to pick out the active speaker in a group video call. Wouldn’t it be great to have a director in the room, adjusting the camera to pick out whoever’s speaking, making sure there isn’t a whole bunch of wasted space on the edges of the picture, and so on?

But who can afford to have a director in the room? Let the tech do the work. We’re going to cover three technologies here that give you the experience of having a director without you having to do anything:

Automatic Speaker Tracking
Automatic Speaker Framing
Automatic Group Framing

Poly Studio E70 Video Conferencing Camera

Automatic Speaker Tracking

Automatic speaker tracking means that the camera picks out the active speaker from a group of people and focuses on them. It gives the conversation a natural flow, letting the far end see the speaker and their body language better. As each new person starts speaking, the camera switches to them. In other words, it tracks active speakers. Often, when no one is speaking, it switches to framing the entire group. We cover group framing further down.

Speaker tracking works by integrating beamforming microphones with the camera. The microphones pick out where speech is coming from and direct the camera to focus there. The camera might physically pan, tilt and zoom, or it might use digital pan-tilt-zoom. In either case, you get an experience like a director is controlling the camera.
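To make the idea concrete, here is a minimal Python sketch of the digital pan-tilt-zoom case. It assumes the microphone array has already estimated the speaker's direction as an angle from the camera's centerline (less than 90° to either side), and it uses a simple pinhole-camera model to turn that angle into a horizontal crop. The function name, parameters and numbers are illustrative assumptions, not how any particular camera implements tracking.

```python
import math


def crop_for_azimuth(azimuth_deg, hfov_deg, sensor_width_px, crop_width_px):
    """Return the (left, right) pixel columns of a crop centered on a sound source.

    azimuth_deg: assumed direction of the speaker, 0 = straight ahead,
                 positive = to the right of the camera axis.
    hfov_deg:    horizontal field of view of the lens.
    """
    half_width = sensor_width_px / 2
    # Pinhole model: focal length in pixels, derived from the field of view.
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2)
    # Project the speaker's direction onto the sensor and keep it in the image.
    center = half_width + focal_px * math.tan(math.radians(azimuth_deg))
    center = min(max(center, 0), sensor_width_px)
    # Slide the crop window so it never runs off the edge of the sensor.
    left = min(max(center - crop_width_px / 2, 0), sensor_width_px - crop_width_px)
    left = round(left)
    return left, left + crop_width_px
```

With a 4000-pixel-wide sensor and a 120° field of view, `crop_for_azimuth(0, 120, 4000, 1920)` returns `(1040, 2960)`, a crop centered on the image. A speaker at the far right edge of the view pushes the crop to `(2080, 4000)`. A real camera would also smooth the movement over time so the picture doesn't jump with every cough.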

For example, Poly Studio E70 uses two lenses, each with a 20 MP sensor, in conjunction with DirectorAI technology to provide exceptional speaker tracking. 20 MP is about two-and-a-half times the resolution of 4K Ultra HD, which is roughly 8 MP. This means that the Studio E70 has plenty of headroom for digital zoom, so the picture stays detailed even when the camera crops in on a speaker.
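The arithmetic behind that headroom is easy to check. The short Python sketch below compares a sensor's pixel count with the exact pixel count of a 4K Ultra HD frame (3840 × 2160) and converts the result into a linear zoom factor. It assumes the sensor has the same shape as the output picture, which may not hold for a real camera, and the function name is made up for illustration.

```python
import math

UHD_4K_PIXELS = 3840 * 2160  # 8,294,400 pixels, usually rounded to 8 MP


def zoom_headroom(sensor_megapixels, output_pixels=UHD_4K_PIXELS):
    """Return (pixel_ratio, linear_zoom) for a sensor versus an output format.

    pixel_ratio: how many times more pixels the sensor has than the output.
    linear_zoom: how far you could crop in (per side) before the crop has
                 fewer pixels than the output, assuming matching aspect ratios.
    """
    pixel_ratio = sensor_megapixels * 1_000_000 / output_pixels
    return pixel_ratio, math.sqrt(pixel_ratio)
```

Using the exact 4K pixel count, a 20 MP sensor has roughly 2.4 times the pixels of a 4K frame, which works out to a little over 1.5x of digital zoom before the crop drops below 4K. Against a 1080p output (`output_pixels=1920 * 1080`), the same sensor allows about 3.1x.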

Jabra PanaCast 20 Webcam

Automatic Speaker Framing

Automatic speaker framing is useful for personal communications, such as when you’re using a webcam at home. It sets the frame so your face is centered and cropped effectively. You can think of it as the difference between a snapshot and a headshot. With a headshot, the professional photographer positions you and their camera so you look amazing.

Now that we’re all too familiar with video conferencing meetings where people are at odd angles or distant from the camera, you can understand why automatic speaker framing is a wonderful thing to have.
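As a rough illustration of what speaker framing does, the Python sketch below takes a face bounding box (from whatever face detector the webcam uses, which is assumed here) and returns a 16:9 crop that centers the face and sizes it to a set share of the picture height. The function name, the 40% default and the box format are assumptions for the example, not any vendor's actual settings.

```python
def frame_face(face, image_size, face_fraction=0.4, aspect=16 / 9):
    """Return a crop (left, top, width, height) that centers one face.

    face:       (x, y, w, h) bounding box of the detected face, in pixels.
    image_size: (width, height) of the full camera image.
    """
    x, y, w, h = face
    img_w, img_h = image_size
    # Size the crop so the face fills `face_fraction` of its height,
    # but never make the crop larger than the image itself.
    crop_h = min(h / face_fraction, img_h, img_w / aspect)
    crop_w = crop_h * aspect
    # Center the crop on the face, then slide it back inside the image.
    left = min(max(x + w / 2 - crop_w / 2, 0), img_w - crop_w)
    top = min(max(y + h / 2 - crop_h / 2, 0), img_h - crop_h)
    return round(left), round(top), round(crop_w), round(crop_h)
```

For a 160 × 180 pixel face at `(880, 300)` in a 1920 × 1080 image, `frame_face((880, 300, 160, 180), (1920, 1080))` returns `(560, 165, 800, 450)`. If you sit far from the camera, your face box is smaller, so the crop tightens to compensate.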

Yealink UVC84 Video Conferencing Camera

Automatic Group Framing

Group framing is similar to speaker framing. Rather than framing a single person, however, it frames the entire group. It does this by recognizing the figures in the picture as people and adjusting the frame accordingly.

Group framing compensates for the different arrangements of people in a room. For example, Yealink ZVC840 uses the UVC84 camera to give you a professional Zoom video conference for large to extra-large rooms. With that many people, you can almost never get everyone sitting in the right place, at just the right distance from the camera. So instead of looking at an awkward arrangement of people, the far end gets to see your team looking its best.
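Group framing can be sketched the same way. The Python example below assumes a person detector has already produced one bounding box per participant. It takes the smallest rectangle that contains everyone, adds a margin, and widens or heightens it to a 16:9 picture. The function name, the 10% margin and the box format are illustrative assumptions.

```python
def frame_group(people, image_size, margin=0.1, aspect=16 / 9):
    """Return a crop (left, top, width, height) that contains every person.

    people:     list of (x, y, w, h) bounding boxes, in pixels.
    image_size: (width, height) of the full camera image.
    """
    img_w, img_h = image_size
    if not people:
        return 0, 0, img_w, img_h  # nobody detected: show the whole room
    # Smallest rectangle that contains all of the person boxes.
    x0 = min(x for x, _, _, _ in people)
    y0 = min(y for _, y, _, _ in people)
    x1 = max(x + w for x, _, w, _ in people)
    y1 = max(y + h for _, y, _, h in people)
    # Add a margin, then grow the shorter side to reach the output shape.
    w = (x1 - x0) * (1 + 2 * margin)
    h = (y1 - y0) * (1 + 2 * margin)
    w, h = max(w, h * aspect), max(h, w / aspect)
    # Shrink uniformly if the padded frame is larger than the image.
    scale = min(1, img_w / w, img_h / h)
    w, h = w * scale, h * scale
    # Center on the group, then slide the frame back inside the image.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = min(max(cx - w / 2, 0), img_w - w)
    top = min(max(cy - h / 2, 0), img_h - h)
    return round(left), round(top), round(w), round(h)
```

With two people at `(400, 300, 200, 400)` and `(1000, 350, 200, 350)` in a 1920 × 1080 image, the function returns `(320, 230, 960, 540)`: a frame that holds both of them with a little breathing room and none of the empty wall on either side.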
