The Xbox Companion for Durango transforms the tablet and phone you already own into an input and display device for Durango, enabling multi-screen gaming and entertainment experiences. As tablets and smartphones enjoy ever-increasing popularity, Companion lets you tap into this potential by lighting up gamers’ device investments with innovative new experiences.

Touch input can be a trigger. Tablets can be private strategy screens. Smartphones can serve as radar displays. Motion sensors plus the Kinect Sensor can make Xbox Companion a portal to exploring the game world. There is no limit on the imaginative possibilities with this input medium and its screen real estate.

Xbox Companion for Durango also allows gamers to stay engaged when they are away from their consoles. By using the next Xbox Companion, they can interact with their Durango consoles from any location with Internet access.


The Xbox Companion for Durango is an application that is delivered and maintained by Microsoft for use on tablets and phones running Windows 8, Windows Phone, iOS, and Android. Users can download the app from their devices’ respective app stores. Xbox Companion for Durango will supersede the existing Xbox Companion, subsuming and extending its functionality. The term Companion refers to an instance of the Durango Xbox Companion application running on a tablet or phone; the term Companion device means the tablet or phone it runs on.

As on Xbox 360, we use a single tablet or phone application to host experiences that are appropriate to whatever title is running on the console. This provides the user with a common entry point for interacting with his or her console, and it also allows title developers to project their content onto these devices without needing to integrate with multiple stores and marketplaces. This application also abstracts differences between device platforms from the title, thereby giving developers a consistent, predictable, and reliable environment to work with.

Durango offers a simple experience that allows users to bind their devices to a specific console. Binding creates an association between the user of a device and a console, or between the device and the console. The functionality available on the Companion depends on which kind of binding is used: user or device. On the console, users will be able to manage which devices and Companion users are associated with it. Binding functionality is completely handled by the system, much as it is for gamepads today.

A user who has been bound via Companion to a console can be given system privileges, including the ability to turn on the console through the Companion device, sign in on the console by starting the Companion on a device, and interact with the console from a remote location (with some constraints).

From a title’s perspective, a Companion device shows up like any other controller, and it will typically have a user associated with it. If there is no user associated, the title can prompt for sign-in. As with a controller, the title is notified if the Companion becomes disconnected. Users will also be able to pass around a Companion device and change the identity associated with it.
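The controller-like model described above can be sketched as a small event handler. This is only an illustration: the names CompanionDevice, TitleSession, and the on_* callbacks are hypothetical and are not part of any Durango API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CompanionDevice:
    """Hypothetical stand-in for a Companion device as seen by a title."""
    device_id: str
    user: Optional[str] = None  # None means no user is associated yet


class TitleSession:
    """Tracks connected Companion devices the way a title might track controllers."""

    def __init__(self) -> None:
        self.devices: Dict[str, CompanionDevice] = {}
        self.log: List[str] = []

    def on_connected(self, device: CompanionDevice) -> None:
        self.devices[device.device_id] = device
        if device.user is None:
            # No user associated: the title can prompt for sign-in.
            self.log.append(f"prompt sign-in on {device.device_id}")

    def on_user_changed(self, device_id: str, user: str) -> None:
        # The device was passed to someone else, or a user signed in.
        self.devices[device_id].user = user
        self.log.append(f"{device_id} now belongs to {user}")

    def on_disconnected(self, device_id: str) -> None:
        # The title is notified, just as for a controller.
        self.devices.pop(device_id, None)
        self.log.append(f"{device_id} disconnected")
```

The point of the sketch is that the user identity is a property of the connection that can change over time, so a title should look it up on each event rather than caching it at connect time.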

There are two ways that titles can provide Companion experiences: by remote rendering with DirectX, or by serving HTML5 and JavaScript.

Remote rendering & input

Unlike typical controllers, Companion features rich output through XAudio2 and a Direct3D surface. The title renders graphics and audio to the Companion just as it does to the main screen and audio system. This output is encoded as H.264 and transmitted over a Wi-Fi connection to the device, where it is decoded and displayed.

Companion captures touch, accelerometer, gyroscope, and text input and transmits them over a Wi-Fi connection to the title for processing.

HTML5 and JavaScript

Alternatively, titles can serve HTML5 and JavaScript content to the Companion. By using the JavaScript libraries available in the Companion app, developers can design their titles to process device-side events and to send and receive messages. The HTML and JavaScript are served over the Internet from the developers’ web servers to the devices, and messages are sent and received through the Companion Xbox LIVE service or, when it is available, over the local subnet.


Companion scenarios rely on a Wi-Fi connection with low latency and reliably high bandwidth between the console and the Companion devices, which connect through a Wi-Fi access point by default. The data stream to and from the console is encrypted.

If network conditions through the access point aren’t favorable, the user will be prompted to connect directly to the console over peer-to-peer Wi-Fi. For the most accurate simulation of the consumer experience, do not attempt to connect in infrastructure mode (that is, through the access point) if your access point is several rooms away, or if you are in a setting with high levels of Wi-Fi traffic.


Encoding and decoding of video and audio, the wireless network connection, and touch-based input all introduce latency that must be considered when designing Companion experiences. There are two important kinds of latency to consider: input and end-to-end.

Input latency is the time it takes for input on the device to be dispatched to title code on the console. Input is polled every 8 ms. A user’s touch input must be processed by the device, polled, encoded, transmitted to the console, decoded, and then dispatched to the title. The device input subsystem, the network, and the console CPU are all factors that affect input latency.
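The 8 ms polling interval by itself bounds one part of input latency: the wait between a touch and the next poll. The following is a minimal sketch of that arithmetic. It assumes polls fire at fixed multiples of the interval, which the text does not state, and the durations of the other stages are left to the caller because no figures are given for them here.

```python
POLL_INTERVAL_MS = 8.0  # from the text: input is polled every 8 ms


def polling_wait_ms(touch_time_ms: float, interval_ms: float = POLL_INTERVAL_MS) -> float:
    """Time a touch waits until the next poll tick.

    Assumes polls fire at 0, interval, 2 * interval, and so on
    (an illustrative assumption about the poll phase).
    """
    remainder = touch_time_ms % interval_ms
    return 0.0 if remainder == 0 else interval_ms - remainder


def mean_polling_wait_ms(interval_ms: float = POLL_INTERVAL_MS) -> float:
    """Expected wait when touches are spread uniformly across the interval."""
    return interval_ms / 2.0


def input_latency_ms(touch_time_ms: float, other_stages_ms) -> float:
    """Polling wait plus caller-supplied durations for the remaining stages
    (device processing, encode, transmit, decode, dispatch)."""
    return polling_wait_ms(touch_time_ms) + sum(other_stages_ms)
```

Under these assumptions, polling contributes up to 8 ms and 4 ms on average; the remaining stages add to that figure.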

End-to-end latency is the time from input being received on the Companion device to the console’s rendered response being displayed back on the device. This latency is affected by several factors, in addition to those mentioned for input latency: the update and rendering latency of the title, the scheduling of the hardware video encoder (which will also have system tasks running on it), the network latency, and the device-side buffering that accounts for error resilience and jitter. Many of these items can be pipelined.

Given the additional latency introduced in Companion rendering experiences, titles should avoid interaction patterns that emphasize the latency. For example, rendering an object that is being dragged under a user’s finger will highlight latency when the object shows up behind the moving finger. The converse is also true: interaction patterns that mask latency make it less noticeable. For example, playing a sound on the console to give the user feedback is a way to improve perceived performance.

Rendering quality of graphics

The rendering quality of graphics can be affected by several related factors. Resolution and bit rate will decrease as the number of Companion devices being rendered increases (assuming all devices are updated simultaneously). A single device has a network bandwidth of approximately 6 Mbps. Bandwidth for all devices combined may be quite low, depending on the network conditions. The aggregate number of pixels per second across all devices should not exceed the equivalent of 720p at 30 fps, which is roughly half the title’s encoder budget.
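The pixel ceiling above can be checked with simple arithmetic. The sketch below assumes the usual meanings of 720p (1280 x 720) and 1080p (1920 x 1080), takes the 1080p at 30 fps encoder budget from the performance discussion in this document, and uses a hypothetical CompanionSurface type that is not part of any Durango API.

```python
from dataclasses import dataclass
from typing import Iterable

# 720p at 30 fps: the aggregate ceiling stated in the text.
MAX_PIXELS_PER_SECOND = 1280 * 720 * 30  # 27,648,000
# 1080p at 30 fps: the title's video encoder budget.
ENCODER_BUDGET_PIXELS_PER_SECOND = 1920 * 1080 * 30  # 62,208,000


@dataclass(frozen=True)
class CompanionSurface:
    """Hypothetical description of one remotely rendered surface."""
    width: int
    height: int
    fps: int

    @property
    def pixels_per_second(self) -> int:
        return self.width * self.height * self.fps


def aggregate_pixels_per_second(surfaces: Iterable[CompanionSurface]) -> int:
    """Total pixel rate when all surfaces are updated simultaneously."""
    return sum(s.pixels_per_second for s in surfaces)


def fits_budget(surfaces: Iterable[CompanionSurface]) -> bool:
    """True if the combined pixel rate stays within the 720p30 ceiling."""
    return aggregate_pixels_per_second(surfaces) <= MAX_PIXELS_PER_SECOND


def encoder_share(surfaces: Iterable[CompanionSurface]) -> float:
    """Fraction of the 1080p30 encoder budget the surfaces would consume."""
    return aggregate_pixels_per_second(surfaces) / ENCODER_BUDGET_PIXELS_PER_SECOND
```

For example, four 640 x 360 surfaces at 30 fps total exactly 27,648,000 pixels per second and just fit, while a single 1280 x 720 surface at 60 fps does not. The ceiling itself works out to 4/9 (about 44%) of the encoder budget, which matches the "roughly half" figure.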

Naturally, the content being rendered also affects bit rate and quality. Richly detailed, 3D content with a lot of camera movement consumes more of the network bandwidth and encoder budget than static, text-heavy content.

The encoding update rate also affects rendering quality. The system supports configuring a target update rate of 15, 20, 30, or 60 fps. Titles may render in bursts. If a game stops rendering to a Companion device, the system will continue updating to progressively improve a frame until it is perfect; however, these system-supplied frames consume the title’s encoding budget.
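Because only four target update rates are supported, a title that computes a desired rate needs to map it onto one of them. The helper below is one possible policy, not the system’s own behavior: it picks the highest supported rate that does not exceed the desired one, and the function names are hypothetical.

```python
SUPPORTED_UPDATE_RATES_FPS = (15, 20, 30, 60)  # from the text


def choose_update_rate(desired_fps: float) -> int:
    """Highest supported rate not exceeding desired_fps.

    Falls back to the lowest supported rate when desired_fps is below it.
    """
    candidates = [r for r in SUPPORTED_UPDATE_RATES_FPS if r <= desired_fps]
    return max(candidates) if candidates else min(SUPPORTED_UPDATE_RATES_FPS)


def frame_interval_ms(rate_fps: int) -> float:
    """Time between encoded frames at a supported update rate."""
    if rate_fps not in SUPPORTED_UPDATE_RATES_FPS:
        raise ValueError(f"unsupported update rate: {rate_fps}")
    return 1000.0 / rate_fps
```

Rounding down rather than to the nearest rate is a deliberate choice in this sketch, since a lower rate leaves more bit rate available per frame.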

Network conditions

Real-world network conditions are variable. To prevent a poor experience with remotely rendered content, the latency and error rate of the network connection between the Companion device and the console are monitored. If the average round-trip latency (including encoding time) rises above a certain threshold for a long enough period, or if there are too many client requests for a full-frame refresh within a certain time window, the remote rendering connection will be dropped, and the title will be notified. When the system determines that the connection is too degraded, the Companion will prompt the user to create a peer-to-peer Wi-Fi connection with the console.
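The two drop conditions described above can be modeled with a pair of sliding windows. This is a sketch only: the text does not give the actual thresholds or window lengths, so every default value below is a hypothetical placeholder, as is the ConnectionMonitor class itself.

```python
from collections import deque


class ConnectionMonitor:
    """Illustrative monitor for the two drop conditions.

    All thresholds and window lengths are hypothetical placeholders.
    """

    def __init__(self, rtt_limit_ms=100.0, rtt_window_s=2.0,
                 max_refreshes=5, refresh_window_s=1.0):
        self.rtt_limit_ms = rtt_limit_ms
        self.rtt_window_s = rtt_window_s
        self.max_refreshes = max_refreshes
        self.refresh_window_s = refresh_window_s
        self._rtts = deque()        # (timestamp_s, rtt_ms) samples
        self._refreshes = deque()   # timestamps of full-frame refresh requests
        self._first_sample_s = None

    def _prune(self, now_s):
        # Discard entries that have fallen out of their sliding windows.
        while self._rtts and self._rtts[0][0] < now_s - self.rtt_window_s:
            self._rtts.popleft()
        while self._refreshes and self._refreshes[0] < now_s - self.refresh_window_s:
            self._refreshes.popleft()

    def record_rtt(self, now_s, rtt_ms):
        if self._first_sample_s is None:
            self._first_sample_s = now_s
        self._rtts.append((now_s, rtt_ms))
        self._prune(now_s)

    def record_refresh_request(self, now_s):
        self._refreshes.append(now_s)
        self._prune(now_s)

    def should_drop(self, now_s):
        self._prune(now_s)
        # Condition 1: average round-trip latency above the limit, judged
        # only once a full window of observation time has elapsed.
        observed_long_enough = (
            self._first_sample_s is not None
            and now_s - self._first_sample_s >= self.rtt_window_s
        )
        if observed_long_enough and self._rtts:
            average = sum(rtt for _, rtt in self._rtts) / len(self._rtts)
            if average > self.rtt_limit_ms:
                return True
        # Condition 2: too many full-frame refresh requests in the window.
        return len(self._refreshes) > self.max_refreshes
```

A title cannot observe these internal measurements directly; the sketch is meant to show why a brief latency spike need not drop the connection while a sustained one does.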

The system will dynamically adjust the bit rate and update rate in response to network conditions; the system will not decrease the resolution of the surface being rendered.

On the device side, the Companion performs a certain level of buffering to account for network jitter. Because low latency is important to the Companion scenario, the encoded stream runs at a constant bit rate and avoids H.264’s larger I-frames at the expense of being forward only. Frames are divided into slices to enable parallelization of encoding and decoding (thus reducing latency). The encoding/decoding algorithm has several layers of efficient resiliency, including slice synthesis when packets are lost. If corruption is detected, the client will request an intra-block refresh. After the refresh, the final output frame is known to be free of propagated corruption. Less error correction will be built into the audio encoding and decoding algorithms, so audio may glitch when data is lost.


Aside from the work required of the CPU and GPU for remote rendering, there will be no additional performance costs to the title from incorporating the Companion. The swap chain’s present semantics will align with expected behavior, allowing for double or triple buffering. Encoding is done by the console’s hardware encoder, and the work comes out of the title’s video encoder budget, which is 1080p at 30 fps. In this model, there are no stalls for the CPU, the GPU, or the hardware video encoder.

The battery in the Companion device is another resource to be aware of in remote rendering scenarios. Decreasing the bit rate and varying the frequency of rendering (for example, by decreasing the rate to one or two frames per second when possible) will increase battery life.