Xbox One (Durango) Next-Generation Kinect Sensor


Launched in November 2010, the Xbox 360 Kinect Sensor set a world record for the fastest-selling consumer electronics device, selling ten million units in its first four months alone. Natural user interface (NUI) experiences using Kinect have proved a compelling and inspiring feature of this console generation. Xbox One (Durango) provides the opportunity to push innovative NUI further. Every Xbox One (Durango) console will come with a next-generation Kinect sensor. Developers can integrate Kinect functionality, confident that it provides a great experience for all Durango users.

Durango Sensor

The next-generation sensor improves on the current sensor in many areas:

  • Improved field of view results in much larger play space.
  • RGB stream is higher quality and higher resolution.
  • Depth stream is much higher resolution and able to resolve much smaller objects.
  • Higher depth stream accuracy enables separating objects in close depth proximity.
  • Higher depth stream accuracy captures depth curvature around edges better.
  • Active infrared (IR) stream permits lighting-independent processing and feature recognition.
  • End-to-end pipeline latency is improved by 33 ms.

Sensor Characteristics Summary vs Xbox 360 Kinect Sensor

Feature | Xbox 360 Kinect Sensor | Durango Sensor
Field of View (FOV) | 57.5˚ horizontal by 43.5˚ vertical | 70˚ horizontal by 60˚ vertical
Resolvable Depth | 0.8 m -> 4.0 m | 0.8 m -> 4.0 m
Color Stream | 640 x 480 x 24 bpp 4:3 RGB @ 30 fps; 640 x 480 x 16 bpp 4:3 YUV @ 15 fps | 1920 x 1080 x 16 bpp 16:9 YUY2 @ 30 fps
Depth Stream | 320 x 240 x 16 bpp, 13-bit depth | 512 x 424 x 16 bpp, 13-bit depth
Infrared (IR) Stream | No IR stream | 512 x 424, 11-bit dynamic range
Registration | Color <-> depth | Color <-> depth and active IR
Audio Capture | 4-mic array returning 48 kHz audio | 4-mic array returning 48 kHz audio
Data Path | USB 2.0 | USB 3.0
Latency | ~90 ms with processing | ~60 ms with processing
Tilt Motor | Vertical only | No tilt motor
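
The table gives resolution, bits per pixel and frame rate for each image stream, so multiplying them out gives the raw, uncompressed payload each stream represents. The Python sketch below does only that arithmetic; the 30 fps depth rate comes from the depth stream description, and nothing here describes how the sensor actually encodes or transmits its data.

```python
def stream_rate_mbit_s(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Uncompressed payload in megabits per second (1 Mbit = 10**6 bits)."""
    return width * height * bits_per_pixel * fps / 1e6

# (width, height, bits per pixel, frames per second), taken from the table and text.
STREAMS = {
    "Xbox 360 color, RGB 24 bpp @ 30 fps": (640, 480, 24, 30),
    "Durango color, YUY2 16 bpp @ 30 fps": (1920, 1080, 16, 30),
    "Durango depth, 16 bpp @ 30 fps": (512, 424, 16, 30),
}

for name, (w, h, bpp, fps) in STREAMS.items():
    print(f"{name}: {stream_rate_mbit_s(w, h, bpp, fps):.1f} Mbit/s")
```

Under these assumptions the Durango color stream alone works out to roughly 995 Mbit/s uncompressed, about 4.5 times the Xbox 360 RGB mode (roughly 221 Mbit/s) and more than USB 2.0's 480 Mbit/s signaling rate. That is consistent with the move to USB 3.0, although the source does not give this reasoning itself.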

Play Space and Field of View

With a 70-degree horizontal and 60-degree vertical FOV, and a depth range of 0.8 m to 4.0 m, the sensor captures a much larger area than the Xbox 360 Kinect Sensor. At 0.8 m, this area is 1.12 m wide by 0.92 m high; at 4 m, it is 5.6 m wide by 4.6 m high. This much larger play space fits multiple players: four players should fit comfortably, and the play space can accommodate up to six.
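
The quoted dimensions follow from simple pinhole geometry: at distance d, a field of view θ spans 2·d·tan(θ/2). A minimal Python sketch, assuming an ideal pinhole camera with no lens distortion, reproduces the figures and compares them with the Xbox 360 sensor's FOV from the table:

```python
import math

def footprint(distance_m: float, h_fov_deg: float, v_fov_deg: float) -> tuple[float, float]:
    """Width and height (m) visible at distance_m for an ideal pinhole camera."""
    width = 2 * distance_m * math.tan(math.radians(h_fov_deg / 2))
    height = 2 * distance_m * math.tan(math.radians(v_fov_deg / 2))
    return width, height

# Horizontal and vertical FOV in degrees, from the summary table.
SENSORS = {
    "Durango": (70.0, 60.0),
    "Xbox 360": (57.5, 43.5),
}

for name, (h_fov, v_fov) in SENSORS.items():
    for d in (0.8, 4.0):
        w, h = footprint(d, h_fov, v_fov)
        print(f"{name} at {d} m: {w:.2f} m wide x {h:.2f} m high")
```

For Durango this prints 1.12 m by 0.92 m at 0.8 m and 5.60 m by 4.62 m at 4.0 m (the text rounds the latter height to 4.6 m). The Xbox 360 FOV gives roughly 4.4 m by 3.2 m at 4.0 m, so the Durango area at that distance is nearly twice as large.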

The next-generation sensor will have a form factor similar to the current sensor: a wired unit, separate from the console. However, it will not have a tilt motor. The wider vertical FOV should let the sensor be placed and oriented, without later adjustment, to capture a large enough area in a typical room for the vast majority of users, whose heights fall in the range 1 m to 1.83 m. The sensor can cover that full height range from just 1.58 m away.
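
The 1.58 m figure can be reproduced with the same pinhole geometry, assuming the sensor is aimed so that the full 1.83 m height fits exactly inside the 60-degree vertical FOV. This is a simplified sketch: real mounting height and downward tilt change the exact distance.

```python
import math

def min_distance_for_span(span_m: float, fov_deg: float) -> float:
    """Closest distance (m) at which a field of view of fov_deg spans span_m (pinhole model)."""
    return span_m / (2 * math.tan(math.radians(fov_deg / 2)))

TALLEST_USER_M = 1.83     # top of the height range quoted above
VERTICAL_FOV_DEG = 60.0   # Durango vertical FOV from the summary table

print(f"{min_distance_for_span(TALLEST_USER_M, VERTICAL_FOV_DEG):.2f} m")  # prints "1.58 m"
```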

User studies from the Kinect program, coupled with the requirement to gather well-separated depth information, suggest that the best position for the sensor will be above the display, looking downward toward the players. This position maximizes the available depth information and minimizes joint occlusion in seated skeleton tracking (ST) scenarios.

The improved FOV means:

  – Titles can be played in a much larger selection of homes, usually without moving furniture.
  – The complexities of dynamic play space setup and tilt motor handling are removed.
  – Gameplay with players of different heights is much easier.
  – Fitting two or more players in the play area is much more practical.

Sensor Data Streams – Color

The sensor can return a full HD resolution (that is, 1920 x 1080) color stream at 30 frames per second, returned in YUY2 format. YUY2 format packs two pixels as four 8-bit components: Y1, U, Y2, V where Y1 and Y2 are individual pixel luminance values, and U and V are shared chrominance values for the two pixels. Quality and resolution are considerably improved over the current generation sensor, especially in low-light situations.
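
The YUY2 layout described above can be unpacked as in the following Python sketch. The source does not state which YUV-to-RGB matrix the sensor uses; the sketch assumes the common BT.601 limited-range integer conversion, so treat the exact colors as an assumption.

```python
def clamp8(x: int) -> int:
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, x))

def yuv_to_rgb(y: int, u: int, v: int) -> tuple[int, int, int]:
    """Convert one YUV sample to RGB, assuming BT.601 limited-range coefficients."""
    c, d, e = y - 16, u - 128, v - 128
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return r, g, b

def yuy2_to_rgb(buf: bytes, width: int, height: int) -> list[tuple[int, int, int]]:
    """Unpack a YUY2 frame (Y1 U Y2 V per pixel pair) into a flat list of RGB tuples."""
    if width % 2 or len(buf) != width * height * 2:
        raise ValueError("YUY2 needs an even width and 2 bytes per pixel")
    pixels = []
    for i in range(0, len(buf), 4):
        y1, u, y2, v = buf[i], buf[i + 1], buf[i + 2], buf[i + 3]
        # Both pixels of the pair share the same U and V chrominance values.
        pixels.append(yuv_to_rgb(y1, u, v))
        pixels.append(yuv_to_rgb(y2, u, v))
    return pixels
```

For example, `yuy2_to_rgb(bytes([235, 128, 16, 128]), 2, 1)` returns `[(255, 255, 255), (0, 0, 0)]`: one white and one black pixel sharing neutral chroma. A full 1920 x 1080 frame is 4,147,200 bytes in this layout.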

Sensor Data Streams – Depth

The sensor returns a 512 x 424 16-bit depth stream at 30 frames per second. The bit layout is exactly the same as on the current Kinect Sensor: 13 bits of depth information and a 3-bit segmentation mask. In addition to higher resolution, the depth sensor is more precise. For example, at 3.5 m it can resolve objects two to three times smaller than the current sensor can.
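
A 16-bit depth pixel can be split into its two fields with a shift and a mask. The source says only that the layout is 13 bits of depth plus a 3-bit segmentation mask; this Python sketch assumes the depth occupies the high 13 bits and the mask the low 3 bits (the arrangement used by the Kinect for Windows SDK 1.x), and the millimeter unit in the example is likewise an assumption.

```python
DEPTH_BITS = 13
SEGMENT_BITS = 3
SEGMENT_MASK = (1 << SEGMENT_BITS) - 1   # 0b111

def unpack_depth_pixel(raw: int) -> tuple[int, int]:
    """Split a 16-bit pixel into (depth, segment), assuming depth in the high 13 bits."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return raw >> SEGMENT_BITS, raw & SEGMENT_MASK

def pack_depth_pixel(depth: int, segment: int) -> int:
    """Inverse of unpack_depth_pixel, under the same assumed layout."""
    if not (0 <= depth < (1 << DEPTH_BITS) and 0 <= segment <= SEGMENT_MASK):
        raise ValueError("depth must fit in 13 bits and segment in 3 bits")
    return (depth << SEGMENT_BITS) | segment

# Hypothetical pixel: 3500 (for example, millimeters) with segmentation value 2.
raw = pack_depth_pixel(3500, 2)
print(raw, unpack_depth_pixel(raw))  # prints "28002 (3500, 2)"
```

With 13 bits the depth field tops out at 8191, which comfortably covers the 4.0 m range if the unit is millimeters. The 3-bit mask holds values 0 to 7.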

Sensor Data Streams – Active IR

As part of producing the depth stream, the sensor uses an active IR stream. This stream is 512 x 424 at 30 frames per second. The active IR stream is stable across variable lighting conditions. For example, shadows, pixel intensities, and noise characteristics are the same in a well-lit room as in a room with no light at all. As a result, this stream could be used for feature detection in situations where a color stream would be useless.
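
Many off-the-shelf feature detectors expect 8-bit grayscale input. Assuming each IR sample arrives as an unsigned integer holding the 11-bit value listed in the table (the source does not specify the container), a minimal Python sketch reduces it to 8 bits with a fixed shift:

```python
IR_BITS = 11                   # dynamic range of the active IR stream (summary table)
IR_MAX = (1 << IR_BITS) - 1    # 2047

def ir_to_8bit(samples: list[int]) -> list[int]:
    """Scale 11-bit IR intensities to 0..255 by dropping the 3 least significant bits.

    Assumes unsigned samples; anything outside 0..2047 is clamped first.
    """
    shift = IR_BITS - 8
    return [min(max(s, 0), IR_MAX) >> shift for s in samples]
```

A fixed shift, rather than per-frame contrast stretching, keeps the same scene intensity mapped to the same 8-bit value from frame to frame, which preserves the lighting stability described above.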

Sensor Data Streams – Registration

The depth stream is derived from the active IR stream. That means both IR and depth streams will have precisely the same point of view (POV), pixel for pixel; there is no transformation mechanism to introduce artefacts. The color sensor, however, is not in the same position as the depth or IR sensor. That means the color stream will appear to be from a slightly different POV. A registration mechanism will be provided that transforms the color stream to the view space of the depth and IR streams, or the other way around. Registration inevitably adds some minor artefacts to the stream being transformed.
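
Registration of this kind is commonly done by back-projecting each depth pixel to a 3D point, moving it into the color camera's frame with a rigid transform, and projecting it into the color image. The source does not describe the mechanism Durango provides, so the following Python sketch is a generic pinhole-camera illustration; the `Intrinsics` class, the function name and every calibration number in the example are hypothetical, and lens distortion is ignored.

```python
from dataclasses import dataclass

@dataclass
class Intrinsics:
    """Hypothetical pinhole parameters, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float

def depth_to_color_pixel(u: float, v: float, z_m: float,
                         depth_k: Intrinsics, color_k: Intrinsics,
                         rotation: list[list[float]],
                         translation: list[float]) -> tuple[float, float]:
    """Map depth-image pixel (u, v) at depth z_m (meters) to color-image coordinates."""
    if z_m <= 0:
        raise ValueError("depth must be positive")
    # 1. Back-project to a 3D point in the depth camera's frame.
    p = [(u - depth_k.cx) * z_m / depth_k.fx,
         (v - depth_k.cy) * z_m / depth_k.fy,
         z_m]
    # 2. Rigid transform into the color camera's frame: q = R p + t.
    q = [sum(rotation[r][c] * p[c] for c in range(3)) + translation[r] for r in range(3)]
    if q[2] <= 0:
        raise ValueError("point is behind the color camera")
    # 3. Project into the color image.
    return (color_k.fx * q[0] / q[2] + color_k.cx,
            color_k.fy * q[1] / q[2] + color_k.cy)

# Hypothetical calibration values, for illustration only.
DEPTH_K = Intrinsics(fx=365.0, fy=365.0, cx=256.0, cy=212.0)
COLOR_K = Intrinsics(fx=1060.0, fy=1060.0, cx=960.0, cy=540.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
OFFSET = [0.05, 0.0, 0.0]   # 5 cm horizontal offset between the cameras

print(depth_to_color_pixel(256.0, 212.0, 2.0, DEPTH_K, COLOR_K, IDENTITY, OFFSET))
```

With these made-up numbers, a depth pixel at the image center 2 m away lands about 26.5 pixels to the right of the color image center, and the shift halves at 4 m. Because the offset depends on depth, the two streams cannot be aligned by a single fixed image shift.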

Sensor Data Streams – Audio

The audio hardware is a four-microphone array, each microphone capturing a raw 48 kHz 24-bit stream. Multi-channel echo cancellation (MEC) is carried out on these streams by dedicated MEC hardware, not by the CPU, so it costs the title nothing. The title is presented with a noise-reduced 16 kHz 24-bit stream of voice data. The console's own audio output is included in the cancellation process.

Skeleton Tracking

The skeleton tracking system on Durango will be enhanced over the Xbox 360 system with the following new or improved features.

New features:

  • Tracking of players as short as one meter.
  • One mode for both seated and standing players.
  • Detection of hand states, for example, open or closed hands.
  • Detection of extra joints, and rotations for some joints.

Improved features:

  • Tracking of six, rather than two, active players.
  • Tracking of occluded joints, for example, an elbow occluded by a hand.
  • Detection of joint positions.
  • Detection of sideways poses.

Identity System

The NUI Identity system on Xbox 360 uses a combination of sensor inputs to recognize players. On Durango, identity will work the same way, except that the active IR stream provides an additional input that does not depend on visible light, which will make identity recognition much more robust.

Durango’s identity system will run continuously, and its allocation is part of the system reservation. For this reason, developers can think of identity as another input stream from the NUI Identity system. Titles therefore need only a significantly reduced API set, which makes integrating identity smooth and easy.

System Allocations

On Durango, from the perspective of resource allocation, the NUI architecture is split into two parts.

Core Kinect functionality that is frequently used by titles and by the system itself, including color, depth, active IR, skeleton tracking, identity, and speech, is part of the system reservation. A title pays the same memory, CPU time, and GPU time whether or not it uses these features. Keeping them in the system also has advantages. For example, because identity is handled by the system rather than by individual applications, it keeps running across application switches, so players avoid having to re-engage and sign in repeatedly.

Functionality used less often has its allocation managed in a pay-per-play model. For example, registering color to depth and active IR (or the other way around) is an infrequently used operation, so a title that uses it pays a small amount of CPU time.

System Latency

End-to-end system latency for Kinect is measured as the time from light hitting the sensor through to the display outputting an update based on that input. Improvements across the whole system on Durango are expected to remove around 33 ms from the end-to-end time. Kinect’s CPU, GPU and memory usage on Durango are part of the system reservation.
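
The latency figures in this section and in the summary table fit together with simple arithmetic, checked in the Python lines below. The observation that 33 ms is about one frame at 30 fps is an inference only; the source does not say where the saving comes from.

```python
FPS = 30
FRAME_MS = 1000 / FPS        # one frame at 30 fps, about 33.3 ms
XBOX360_LATENCY_MS = 90      # "~90 ms with processing" (summary table)
SAVING_MS = 33               # expected end-to-end improvement (this section)

expected_durango_ms = XBOX360_LATENCY_MS - SAVING_MS

print(f"one frame at {FPS} fps: {FRAME_MS:.1f} ms")
print(f"{XBOX360_LATENCY_MS} ms - {SAVING_MS} ms = {expected_durango_ms} ms (table quotes ~60 ms)")
```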

  • Kinect is good, Developers not

    I like the concept of Kinect, but I don’t like how they wasted money (our money on the previous Kinect too), effort and time to create a new one that still has a 66ms delay between your movement and the screen. They should have stuck with the 360 version and tried to improve the software and CPU/memory use for the third Xbox. The first Kinect only needed a 33ms delay and especially smart developers to add Kinect gestures to games along with the pad, instead of making games playable only with Kinect or only with voice commands.

    For example: Dead Space 3.

    Kinesis ability: Visceral Games should have used Kinect for a couple of movements with your hand/arm to move stuff from right to left, up to down, pull or push, and you won’t break a sweat with that.

    Inventory: To select stuff in your inventory or create weapons, another use that won’t make you break a sweat.

    Necromorphs: A gesture at the right moment to get rid of them instead of pressing a button, again not a movement that will burn the fat of our lazy asses.

    This is how you can make a game more interactive using Kinect, and not only for voice commands, full body games or non-stop arm movements.

    PS: Kinect haters, we know you irrationally hate it, so please don’t repeat yourself posting your hate, we would like some constructive opinions, maybe developers will read us and get some good ideas.

    • eternallord

      i dnt think they care abt what we say dude

    • Nisaaru

      I don’t hate Kinect but if it’s needed for the function of the unit I won’t buy the system and even then I would waste extra money on something I would never install. Because it’s providing the infrastructure to enable surveillance of private homes with HD Cam+Microphone while being always online. It’s easy to record and transmit an audio stream and/or send video when required through remote access. Even the possibility is unacceptable to me.

      You might think this is being too paranoid but I think it’s simply pragmatic because of our reality. If a technology allows this and it’s widely spread it will be used as it’s too attractive for the NSA to pass up.

      • Mik3t

        I think you’re being overly paranoid, there will always be the system level option to turn it off

  • DoctorFouad

    disappointing (not a huge step forward from the disappointing kinect1 technology) :
    – still too much lag (maybe due to 30fps instead of 60fps camera)
    – 30fps camera instead of 60 fps
    – not enough depth (no fingers detected separately, so still not that much less inaccurate than kinect1)

    hopefully they could improve the lag further in the future…but footprint on cpu/gpu/memory would be terrible anyway…I prefer resources being spent on better AI, physics, animations, interaction with environment and NPCs…etc

  • a11mark

    Im not a kinect guy, but playing it with my gf in an airport lounge was great. The major issue for me was having an empty 6ft by 6ft space in my living room. so hopefully this is fixed with kinect 2.0 otherwise my kinect will just be facing the wall. also my tv is up on my wall with all my wires going through the wall into a switch. I hope microsoft take this into account and come out with some extension cable. they need to think about the different viewing configurations which there will be 1000’s (not holding my breath for any of this to actually happen but I would play kinect if this did)

  • Mik3t

    Seriously if i cant buy a version without kinect im not buying one

    • sdfsf

      Even if the price of the whole package is lower than the ps4’s or at the same price?

      • Mik3t

        I just feel as a consumer it will always be something that is an “addition” to the type of games i like to play so id rather spend that money elsewhere. So for example the durango could have maybe had a better gpu or faster memory instead of the what $50 something bill of materials for kinect

        • jasonca

          apparently you didn’t see the ps4 press conference when they announced the eye (kinect clone) for the ps4 will be standard

    • willhe

      You know ps4 is coming out with a new kinect style eye bundled with it right? Whats the big deal? All these mentally challenged people getting all crazy. You don’t have to play games with it but using it for navigation is priceless. Or as you would like to say Bing Porn

      • Mik3t

        If its so priceless for navigation how do we get by without it? The answer is very easily. With the smartglass app you dont even need kinect you can launch games/apps from there

        Yes i saw the ps4 has a stereo camera with mic but i noticed that the gpu and superfast 8gb memory dont seem to be compromised to pay for it unlike durango (if rumours of specs are true)

        This isn’t really about cost i can afford both its about compromising the base specs to get an addon in the bill of materials

    • die

      The eye is a built in option built into every controller, its the move but better done. I bought a kinect, I had to return it defective after I couldn’t get it to work in any room in my apartment. My brother has one and the only way he can use it, especially for multiplayer, requires moving a loveseat and a coffee table. At least the move fucking worked. The small amount of time I’ve spent with a working kinect weren’t what they show in the commercials, constantly not reading motions right, losing sight of where I was. The kinect is a broken piece of technology.

  • xyz

    So how does this compare with the new pseye? Is it way better?

    • willhe

      They didn’t show much about the ps4 eye.

    • Mik3t

      Well the pseye is stereo camera about 4 inches apart so the depth perception on that will be minimal at best. The kinect used infra red to detect depth so only needs one camera and can also work in dark conditions as infrared doesnt need light to work (unlike for example stereo cameras)