IoT Machine Vision Finds New Perspective with Eye-Tracking Technologies in Sight


By Ryan Martin | 3Q 2016 | IN-4223


Not All Devices Are Created Equal

NEWS


The first (computers), second (TVs), and third (mobile) screens continue to grow in size, while their successors head in the other direction. Wrist-based, head-worn, and other wearable devices benefit from this shrinking form factor because of the more glanceable, interactive experiences they're designed to deliver. The challenge is that smaller screens compromise the capacity to communicate and consume information, and, until recently, there were few commercially available alternatives to bridge this gap.

I/O Optimization and Recurrent Neural Nets Add Meaning

IMPACT


The "Optimal Recognition Point" (ORP) is the point within a word to which a reader’s eyes naturally gravitate before the brain starts to process its meaning.

In reading traditionally formatted, line-by-line text, the eyes jump from one word to the next, identifying each ORP along the way, until a punctuation mark signals the brain to pause and make sense of it all. This is one of the reasons why it's hard to recite the alphabet or a song backwards; the components that make them whole are learned in sequence.
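To make the ORP concrete, the sketch below shows how an RSVP-style (rapid serial visual presentation) reader might use it: each word is padded so a heuristic ORP character lands in the same fixed column, letting successive words flash in place without eye movement. The heuristic (slightly left of center, roughly a third of the way into longer words), the column width, and the function names are illustrative assumptions, not a description of any particular product.

```python
def approximate_orp_index(word: str) -> int:
    """Heuristic ORP position: slightly left of center, roughly a third of the
    way into longer words. This is an assumed rule of thumb, not a measured value."""
    length = len(word)
    if length <= 1:
        return 0
    if length <= 5:
        return 1
    return length // 3

def align_on_orp(word: str, width: int = 21) -> str:
    """Pad a word so its ORP character lands in a fixed column, so the eye
    can stay still while words are flashed one at a time."""
    pad = max(width // 2 - approximate_orp_index(word), 0)
    return " " * pad + word

for w in ["Recurrent", "neural", "nets", "incorporate", "memory"]:
    print(align_on_orp(w))
```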

When working with sequential data—which is often the case in IoT—positional memory matters.

Regular neural nets are generally oriented around fixed-size inputs and outputs based on the unidirectional flow of input data through the hidden layer (e.g., a feedforward neural network). Recurrent neural nets (RNNs) incorporate the concept of memory. To do this, the hidden-layer state carried over from the previous timestep is combined with the current input at each timestep, so context accumulates recursively across the sequence. It's this hidden recurrence that adds both the context and the backend framework that advanced analytics and machine learning rely on, spanning everything from handwriting and image recognition to speech and natural language processing (NLP). The same overarching methodology is also getting attention from companies like Google (via DeepMind, acq. 2014) as the company's AI-inspired Garage projects make their way into medical, industrial, and retail operations, among others.
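As a minimal illustration of that hidden recurrence, the NumPy sketch below carries a hidden state forward through a toy sequence; the dimensions, weights, and data are arbitrary stand-ins rather than any production model.

```python
import numpy as np

# Toy dimensions chosen purely for illustration.
input_size, hidden_size = 8, 16

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the memory loop)
b_h = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Carry a hidden state across timesteps so each step sees prior context."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in sequence:  # one input vector per timestep
        # Current input is combined with the hidden state from the previous timestep.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return states

# Example: a 5-step sequence of random stand-in sensor readings.
sequence = rng.standard_normal((5, input_size))
hidden_states = rnn_forward(sequence)
print(len(hidden_states), hidden_states[-1].shape)  # 5 (16,)
```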

Extending the Familiarity of Tap, Touch, and Talk to Eye-Tracking Technologies

COMMENTARY


Smell may be the first of the perceptible senses, but the eye is the fastest-moving organ in the human body. While the first, second, and third screens have historically exercised this organ's capacity to digest output information, the fourth (wearables) presents an attractive first foray for sight as an input.

Companies such as California-based startup Eyefluence are on an ambitious, yet attainable, quest to capitalize on this notion by bringing eye-tracking technologies to wearable AR/VR devices, which have otherwise been constrained to tap-, touch-, and talk-based interactions. With eye tracking, the idea is to use vision as a vehicle to measure intent (the same hardware/software can also be used to enable new applications, such as iris-based authentication).
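One way eye tracking can register intent without a tap or voice command is dwell-based selection: if the gaze lingers on a target long enough, the look itself becomes the confirmation. The sketch below is a hypothetical illustration of that idea; the data structure, threshold, and function names are assumptions made for this example and do not reflect Eyefluence's or any other vendor's actual API.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    x: float     # normalized screen coordinates, 0..1
    y: float
    t_ms: float  # timestamp in milliseconds

def dwell_select(samples, region, dwell_ms=450.0):
    """Return True once consecutive gaze samples stay within `region`
    (x0, y0, x1, y1) for at least `dwell_ms` milliseconds."""
    x0, y0, x1, y1 = region
    start = None
    for s in samples:
        inside = x0 <= s.x <= x1 and y0 <= s.y <= y1
        if not inside:
            start = None
            continue
        start = s.t_ms if start is None else start
        if s.t_ms - start >= dwell_ms:
            return True
    return False

# Example: gaze hovering over a button region for ~500 ms triggers a selection.
samples = [GazeSample(0.52, 0.48, t) for t in range(0, 600, 50)]
print(dwell_select(samples, region=(0.4, 0.4, 0.6, 0.6)))  # True
```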

Alternatives, like gesture control, may be well-served in the interim given the nascent state of eye-tracking tech. The longer-term outlook is for this kind of functionality to complement, rather than compete with, the various components that make more spatially-aware, contextual computing solutions an IoT reality. The fundamental difference between such modalities is that one is based on existing hardware but requires the user to learn, initiate, and engage with the device for the interaction to occur (e.g., gestures); the other (eye tracking) is based on forthcoming hardware but allows the user to achieve the same ends by creating value from something they're already doing: looking.
