A Fresh Look at Human-Machine Interaction

2Q 2019 | IN-5504

Continuous improvement in Artificial Intelligence (AI) has ushered in a new dawn for human-machine interaction. AI vendors have not only created and deployed various solutions targeted at human-machine interaction, such as chatbots and social robots, but also contributed their underlying frameworks and tools to the open-source community. While the industry as a whole benefits from this, it is also critical to assess these developments in light of the latest advances in natural language understanding and processing, as well as AI ethics.


Rapid Growth in Edge Devices Creates Demand for Better Human-Machine Interface Solutions


The majority of Artificial Intelligence (AI) use cases in edge devices focus on human-machine interaction. According to ABI Research’s Artificial Intelligence and Machine Learning market data (MD-AIML-103), the number of edge devices that use AI for human-machine interaction will grow from 1.2 billion in 2017 to 5 billion by 2024. The popularity of voice assistants in smartphones and smart home speakers validates the importance of speech and text as the de facto interaction methods between humans and machines.

This has sparked great advancements in Natural Language Processing (NLP). At Google I/O 2018, the company demonstrated Duplex, a conversational AI system available on Google Pixel phones that can mimic human conversation and perform various tasks, such as booking restaurant reservations and scheduling appointments. A year later, thanks to its acquisition of NLP startup Semantic Machines, Microsoft showcased Cortana at Build 2019 with the capability of holding a natural conversation by assimilating and presenting contextual information in real time. At re:MARS 2019, Amazon unveiled Alexa Conversations, a new deep learning-based approach for Alexa skill developers to create more natural voice experiences with less effort, fewer lines of code, and less training data.

However, most chatbots outside smart speakers still rely on rule-based models, especially those deployed by financial institutions, government agencies, and airlines. At the same time, the market is extremely fragmented: Amazon and Google are cornering the smart home market, while Apple and Samsung are looking to strengthen their presence in smartphones and smart devices. Microsoft’s recent effort targets enterprises, competing with startups such as Affectiva, Cortical.io, Loop AI Labs, and Nuance. In China, a whole new set of players focusing on the Chinese language has emerged, such as Alibaba, Baidu, iFlytek, Rokid, and Mobvoi. Despite all these advancements, there is still a long way to go in adopting machine learning and deep learning to develop chatbots that can hold natural conversations.

More Tools at Your Disposal


ABI Research believes emerging AI technologies will bring the vision of chatbots that hold natural conversations to reality. As mentioned earlier, redesigning conversational AI systems based on machine learning can be a good place to start. While working on a new AI architecture is a daunting and time-consuming task, many cloud AI platform vendors, such as Microsoft, Google, and H2O, have made automated machine learning available for enterprises that wish to accelerate the time-to-market for their internal AI applications. By using supervised learning, conversational AI can improve itself based on well-annotated historical conversations and interactions. Naturally, enterprises will still need dedicated AI engineers to constantly prune and update their AI models, but the process can be more streamlined with automated machine learning.

A well-crafted, multi-channel communication strategy may be another approach that the industry can take. Instead of introducing another proprietary customer-facing interface, such as an applet or web-based chatbot, enterprises should consider leveraging popular social media applications, such as Messenger, WeChat, and WhatsApp, and invest in a common AI-enabled backend that can unify and process all client interactions and generate valuable consumer insights. In doing so, enterprises can leverage the expertise of cloud companies, such as Facebook and Tencent, or unified communication service providers, such as Cisco and Unified Inbox.
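As a rough illustration of what a common backend could look like, the Python sketch below maps messages arriving from different channels onto one channel-neutral record and then derives a simple insight (interaction volume per channel). Everything in it is an assumption made for illustration: the `Interaction` record, the adapter functions, and the payload field names are hypothetical and do not reflect the actual Messenger, WeChat, or WhatsApp APIs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Interaction:
    """Channel-neutral record that the shared backend works on (hypothetical)."""
    channel: str
    user_id: str
    text: str


# Hypothetical payload shapes; real channel APIs use different fields.
def from_messenger(payload: dict) -> Interaction:
    return Interaction("messenger", payload["sender"], payload["message"])


def from_wechat(payload: dict) -> Interaction:
    return Interaction("wechat", payload["from_user"], payload["content"])


def from_whatsapp(payload: dict) -> Interaction:
    return Interaction("whatsapp", payload["wa_id"], payload["body"])


# One adapter per channel; adding a channel means adding one entry here.
ADAPTERS: Dict[str, Callable[[dict], Interaction]] = {
    "messenger": from_messenger,
    "wechat": from_wechat,
    "whatsapp": from_whatsapp,
}


def normalize(channel: str, payload: dict) -> Interaction:
    """Convert a channel-specific payload into the common record."""
    try:
        adapter = ADAPTERS[channel]
    except KeyError:
        raise ValueError(f"unsupported channel: {channel}") from None
    return adapter(payload)


def interactions_per_channel(interactions: List[Interaction]) -> Counter:
    """Count unified interactions by originating channel."""
    return Counter(item.channel for item in interactions)


if __name__ == "__main__":
    events = [
        ("messenger", {"sender": "u1", "message": "Where is my order?"}),
        ("wechat", {"from_user": "u2", "content": "Reset my password"}),
        ("messenger", {"sender": "u3", "message": "Cancel my booking"}),
    ]
    unified = [normalize(channel, payload) for channel, payload in events]
    print(interactions_per_channel(unified))
```

The point of the design is that any NLP model or analytics step sits behind `normalize` and only ever sees `Interaction` records, so a new channel can be added without touching the shared backend logic.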

In addition, AI is going to become multimodal, as mentioned in ABI Research’s multimodal learning report, AI Techniques: Multimodal Learning: Technology Development and Use Case (AN-4955). Multimodal learning in AI will further enrich human-machine interaction, as it will no longer be restricted to speech and text. Images, videos, and even physical touch and senses can be integrated into the entire interaction. This type of interaction is already happening in the robotics industry, where collaborative industrial robots can work in conjunction with human employees. Equipped with Red, Green, and Blue (RGB) and Time-of-Flight (ToF) cameras with built-in rule-based or convolutional neural network-based machine vision models, these robots can be taught, configured, and adjusted in accordance with the needs of the work process. Of course, most of these interactions are happening in isolated manufacturing environments and are handled by trained professionals, unlike interactions with consumer devices. Nevertheless, having extra methods to interact with machines will make the user experience richer and more robust.

The Industry Needs to Resolve the AI Ethics Challenge


Despite all the advancements, there are significant barriers when it comes to deployment and adoption, and most of them are ethical rather than technical in nature. Right now, the entire AI industry is facing a trust deficit. When Google first introduced Duplex, there were many questions surrounding its potential misuses and abuses. Parallels can be drawn with the banning of facial recognition by public agencies in San Francisco and the lack of deployment of Amazon Web Services’ (AWS) DeepLens in the United States. Similarly, earlier this year, OpenAI unveiled GPT-2, a text generator based on unsupervised machine learning that uses a transformer neural network. Its impressive text generation capabilities have created fear around the development of malevolent chatbots that can scam the general public.

Conversational AI vendors need to be transparent about their human-machine interaction solutions. Any vendor that uses consumer data for AI model training will need to adhere to a common framework for data security, privacy, and protection. AI models deployed by vendors need to be explainable, robust, and incorruptible. The unfortunate episode involving Tay, a social chatbot that Microsoft released on Twitter in 2016, must be avoided by all means. The chatbot turned racist and extreme as it engaged with internet trolls, a clear example of how a model that learns from unvetted user interactions can become corrupted and fail to abide by social norms. Unfortunately, it will take some time for the industry to agree on a common action plan. In Europe, the General Data Protection Regulation (GDPR) plays a critical role in safeguarding data privacy and preventing abuses, but more AI-focused frameworks are needed. While several governments, including those of the United States, the United Kingdom, the European Union, and Singapore, have released AI ethical frameworks, most of the current frameworks have yet to be extended to NLP.

The commercial forces for more robust human-machine interaction will always exist. As argued in our previous insight, The American AI Initiative Highlights the Need for AI Governance on a Global Scale (IN-5408), the industry needs to agree on a common framework that governs the safety and ethics aspects of human-machine interaction. It is paramount for end users to know that the human-machine interface they are using is harmless, ethical, and compliant with all legal requirements and good industry practices. Until all these roadblocks are cleared, the road toward a seamless and immersive human-machine interface across all devices will remain bumpy.