How long will this take? Latency in broadcast control

Latency in control systems for broadcast applications is a key variable in producing smooth television. Once you start controlling live systems over large distances, the problem gets worse.

So what is acceptable latency? Well, it's hard to say. In live television, when you want to move a camera live on air, the smaller the better. But what is the best we can get using today's technology?

This post is an exercise to find out what the shortest possible RTT (round trip time) is for controlling a device and seeing the result to complete the feedback loop for the operator.

Here is an example of moving a camera live on air, when the operator is based in London and the studio is based in Hong Kong.

I will preface this by saying this is something I have actually had to develop and build. I think a Europe to Asia example should represent something close to a worst-case scenario anywhere in the world.

All timing assumptions are based on SD video using h.264; your mileage will vary (a lot) with other video codecs.

Latency - London <> Hong Kong

Purple - The Operator

Here we look at the cognitive psychological process. There is some evidence that it takes around 300ms to recognise an object, so we should add in some "reaction time".

A quick Google search brings up "The reaction time tester".

Here you are asked to hit the mouse button when the green light comes on. I scored 0.2942 seconds (about 294ms) as an average over 5 tries.

Online Reaction Time Test

Of course there are many other factors, such as communication overload, operator skill, training, etc. But this is meant to be a best-case study, so I'll use 300ms for ideal circumstances.

Blue - Video Encoding

Well, it seems the standard nowadays is h.264 over TCP/IP. The raw encode/decode latency spec'd for a Cisco D9093/D9094 IP codec is 580ms for PAL and 550ms for NTSC in IP low mode. Why IP low mode? In practical experience this is the lowest-latency setting that causes no problems in the picture. Any lower and you start seeing dropped frames.

Cisco Encode / Decode times

Green - The decode process

Well I've sort of lumped the h.264 encoding / decoding latency together in the section above, but we should still add another 40ms to take any video processing (such as video synchronisers or standards converters) into account.
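To put these figures in terms an operator sees on screen, it can help to convert milliseconds into frames. Below is a minimal sketch, assuming 25 fps for PAL and 30000/1001 fps for NTSC. The name `ms_to_frames` is just an illustrative helper, not something from the codec documentation. At 25 fps the 40ms allowance works out to exactly one frame.

```python
from fractions import Fraction

# Assumed frame rates: PAL at 25 fps, NTSC at 30000/1001 (about 29.97) fps.
FRAME_RATES = {
    "PAL": Fraction(25),
    "NTSC": Fraction(30000, 1001),
}


def ms_to_frames(latency_ms, standard):
    """Convert a latency in milliseconds to a number of video frames."""
    return float(Fraction(latency_ms) * FRAME_RATES[standard] / 1000)


if __name__ == "__main__":
    # Figures quoted in the text for encode/decode and video processing.
    for label, ms, standard in [
        ("encode/decode", 580, "PAL"),
        ("encode/decode", 550, "NTSC"),
        ("video processing", 40, "PAL"),
    ]:
        print(f"{label:>16} {ms:4d}ms {standard:<4} = "
              f"{ms_to_frames(ms, standard):.2f} frames")
```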

Yellow - TCP/IP network latency

I'm lucky and have access to a dedicated network. Using the standard ICMP ping tool available in any OS, I get about 300ms RTT (round trip time) between London and Hong Kong, and 40ms RTT between London and Milan.

Over the Internet, I get the following result between my house in the UK and a server in Taiwan (that's the nearest server I could get to Hong Kong) - The Global Broadband Quality Test

So I think best case is 300ms.
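If you want to script this kind of measurement yourself, ICMP ping usually needs raw sockets and elevated privileges. As a rough alternative, timing a TCP handshake gives a comparable round-trip figure from application code. Below is a minimal sketch using only the Python standard library. The function name is illustrative, and the result includes a little connection setup overhead on top of the pure network RTT.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host, port, samples=5, timeout=2.0):
    """Return the median time (ms) to complete a TCP handshake with host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the three-way handshake has completed,
        # which takes roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    return statistics.median(timings)
```

A call would look like `tcp_connect_rtt_ms("example.com", 443)`, where the host and port are placeholders for a server you are allowed to probe at the far end of the link.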

What makes up the network latencies? Well a number of factors.

Using Wolfram|Alpha I get the following two results.

london to milan - Wolfram|Alpha
london to hong kong - Wolfram|Alpha

The key piece of information here is the speed of light in fibre. Now of course optical fibres don't go in a straight line, and the light also has to be re-clocked and regenerated along the way. But this is pretty much the best time you're ever going to get (theoretically speaking).

The rest of the delay is made up of packet inspection and packet queuing in routers.
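The split between fibre propagation and "everything else" can be estimated with a few lines of arithmetic. This is a sketch under stated assumptions: a great-circle path (real cable routes are longer), a fibre refractive index of about 1.468, and approximate city-centre coordinates that I have picked for illustration. The measured values are the ping figures quoted earlier.

```python
import math

C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_INDEX = 1.468          # assumed refractive index of silica fibre
EARTH_RADIUS_KM = 6371.0     # mean Earth radius

# Approximate city-centre coordinates (latitude, longitude) in degrees.
CITIES = {
    "London": (51.5074, -0.1278),
    "Milan": (45.4642, 9.1900),
    "Hong Kong": (22.3193, 114.1694),
}


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def fibre_rtt_ms(distance_km):
    """Theoretical round trip time in ms over an ideal straight fibre."""
    return 2 * distance_km / (C_VACUUM_KM_S / FIBRE_INDEX) * 1000


if __name__ == "__main__":
    # Measured RTTs are the ping results quoted in the text.
    for dest, measured_ms in [("Milan", 40), ("Hong Kong", 300)]:
        km = great_circle_km(CITIES["London"], CITIES[dest])
        ideal = fibre_rtt_ms(km)
        print(f"London <> {dest}: {km:.0f} km, ideal RTT {ideal:.1f}ms, "
              f"remainder {measured_ms - ideal:.1f}ms")
```

Whatever is left after subtracting the ideal figure from the measured ping is the share taken by longer real-world routes, regeneration, and router processing.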

Network jitter is a very important subject, but it is outside the scope of this post.


So what is the best RTT latency time I can expect?

To visualise this I created the video below, using audio tones to represent the times shown in the flow chart diagram at the top of this post.


Using current technology with real world values, I think achieving anything less than 1740ms is unlikely, especially at the video quality that broadcasters expect. Remember too that I have only used SD video as an example; expect HD to increase this time. I also had the luxury of using a very good network. If you're using the Internet (which has its own QoS issues), you're looking at adding at least 200ms.

Using a second, low-quality return video feed purely for monitoring may also lower this time, but using two telco circuits will increase the cost.
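For anyone wanting to play with their own numbers, a latency budget like this is easy to tally in code. The sketch below only uses the four stage values quoted in the text (PAL figures), and the stage names are my own labels. The flow chart may count legs of the path that are not itemised in the prose, so this tally is not meant to reproduce the 1740ms figure.

```python
# Stage values quoted in the text, in milliseconds (PAL, IP low mode).
STAGES_MS = {
    "operator reaction": 300,
    "h.264 encode/decode": 580,
    "video processing": 40,
    "network RTT London <> Hong Kong": 300,
}

# Extra allowance the text suggests when using the public Internet.
INTERNET_PENALTY_MS = 200


def total_latency_ms(stages, extra_ms=0):
    """Sum all stage latencies plus an optional extra allowance."""
    return sum(stages.values()) + extra_ms


def shares(stages):
    """Return each stage's fraction of the summed stage latencies."""
    total = sum(stages.values())
    return {name: ms / total for name, ms in stages.items()}


if __name__ == "__main__":
    for name, share in shares(STAGES_MS).items():
        print(f"{name:<34} {STAGES_MS[name]:4d}ms  {share:5.1%}")
    print("itemised total:", total_latency_ms(STAGES_MS), "ms")
    print("over the Internet:",
          total_latency_ms(STAGES_MS, INTERNET_PENALTY_MS), "ms")
```

Swapping in your own codec and network figures shows quickly which stage dominates. With the values above, the encode/decode stage is the largest single contributor.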