Listen to me: machines learn to understand how we speak

  • Written by: The Conversation
Your smartphone is learning to better understand your voice commands. Image: Flickr/Kārlis Dambrāns, CC BY

At Apple’s recent Worldwide Developers Conference, one of the tent-pole announcements was a set of additional intelligent voice-recognition features for its personal assistant, Siri, in iOS 9, the latest update to its mobile operating system.

Now, instead of asking Siri to “remind me about Kevin’s birthday tomorrow”, you can rely on context and just ask Siri to “remind me of this” while viewing the Facebook event for the birthday. It will know what you mean.

Technology like this has also existed in Google devices for a little while now – thanks to OK Google – bringing us ever closer to context-aware voice recognition.

But how does it all work? Why is context so important and how does it tie in with voice recognition?

To answer that question, it’s worthwhile looking back at how voice recognition works and how it relates to another important area, natural language processing.

A brief history of voice recognition

Voice recognition has been in the public consciousness for a long time. Rather than tapping on a keyboard, wouldn’t it be nice to speak to a computer in natural language and have it understand everything you say?

Ever since Captain Kirk’s conversation with the computer aboard the USS Enterprise in the original Star Trek series in the 1960s (and Scotty’s failed attempt to talk to a 20th-century computer in one of the later Original Series movies) we’ve dreamed about how this might work.

Even movies set in more recent times have flirted with the idea of better voice recognition. The technology-focused Sneakers from 1992 features Robert Redford painstakingly collecting snippets of an executive’s voice and playing them back with a tape recorder into a computer to gain voice access to the system.

But the simplicity of the science-fiction depictions belies a complexity in the process of voice-recognition technology. Before a computer can even understand what you mean, it needs to be able to understand what you said.

This involves a complex process that includes audio sampling, feature extraction and then actual speech recognition to recognise individual sounds and convert them to text.
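
To make the first two steps concrete, here is a minimal sketch of sampling and feature extraction. It is a toy, not how any commercial recogniser works: the sample rate, the frame sizes and the two features (log energy and zero-crossing rate) are assumptions chosen for illustration, and a pure tone stands in for recorded speech.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def sample_tone(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Audio sampling: measure a pure tone at regular time intervals."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def frames(signal, size=200, step=100):
    """Cut the signal into overlapping frames (25 ms long, 12.5 ms apart)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def features(frame):
    """Feature extraction: reduce a frame to (log energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return math.log(energy + 1e-12), crossings / (len(frame) - 1)


# A low tone and a high tone stand in for two different speech sounds.
low = [features(f) for f in frames(sample_tone(200, 0.1))]
high = [features(f) for f in frames(sample_tone(1000, 0.1))]
```

Each frame of raw samples is boiled down to a couple of numbers, and the higher tone produces a higher zero-crossing rate than the lower one. A recogniser works on compact descriptions like these rather than on the raw audio.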

Researchers have been working on this technology for many years. They have developed techniques that extract features in a similar way to the human ear and recognise them as phonemes and sounds that human beings make as part of their speech. This involves the use of artificial neural networks, hidden Markov models and other ideas that are all part of the broad field of artificial intelligence.

Through these models, speech-recognition rates have improved. Error rates of less than 8% were reported this year by Google.

But even with these advancements, auditory recognition is only half the battle. Once a computer has gone through this process, it only has the text that replicates what you said. But you could have said anything at all.

The next step is natural language processing.

Did you get the gist?

Once a machine has converted what you say into text, it then has to understand what you’ve actually said. This process is called “natural language processing”. It is arguably more difficult than voice recognition, because human language is full of context and semantics that are hard for a machine to interpret.

Anybody who has used earlier voice-recognition systems can testify as to how difficult this can be. Early systems had a very limited vocabulary and you were required to say commands in just the right way to ensure that the computer understood them.

This was true not only for voice-recognition systems, but even textual input systems, where the order of the words and the inclusion of certain words made a large difference to how the system processed the command. This was because early language-processing systems used hard rules and decision trees to interpret commands, so any deviation from these commands caused problems.

Newer systems, however, use machine-learning algorithms similar to the hidden Markov models used in speech recognition to build a vocabulary. These systems still need to be taught, but they are able to make softer decisions based on weightings of the individual words used. This allows for more flexible queries, where the language used can be changed but the content of the query can remain the same.

This is why it’s possible to ask Siri either to “schedule a calendar appointment for 9am to pick up my dry-cleaning” or “enter pick up my dry-cleaning in my calendar for 9am” and get the same result.
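
A minimal sketch of that "softer" decision-making might look like the following. It is not how Siri works internally. The intent names and the word weights are hypothetical and set by hand here, whereas a real system would learn them from training data.

```python
import re

# Hypothetical word weights per intent; a real system would learn these.
WEIGHTS = {
    "add_calendar_event": {
        "calendar": 2.0, "appointment": 2.0, "schedule": 1.5, "enter": 0.5,
    },
    "set_alarm": {"alarm": 2.5, "wake": 2.0, "set": 0.5},
    "send_message": {"text": 2.0, "message": 2.0, "send": 1.0},
}


def tokenize(utterance):
    """Lower-case the utterance and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", utterance.lower())


def classify(utterance):
    """Score every intent by summing word weights; word order is ignored."""
    words = tokenize(utterance)
    scores = {
        intent: sum(weights.get(word, 0.0) for word in words)
        for intent, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.get)


def extract_time(utterance):
    """Pull out a simple clock time such as '9am', if one is present."""
    match = re.search(r"\b(\d{1,2})\s*(am|pm)\b", utterance.lower())
    return match.group(0) if match else None


first = "schedule a calendar appointment for 9am to pick up my dry-cleaning"
second = "enter pick up my dry-cleaning in my calendar for 9am"
print(classify(first), extract_time(first))    # add_calendar_event 9am
print(classify(second), extract_time(second))  # add_calendar_event 9am
```

Because the decision rests on the combined weight of the words rather than their exact order, both phrasings land on the same intent. A hard-rule system that expected "schedule ... for ..." would have rejected the second one.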

But how do you deal with different voices?

Despite these advancements there are still challenges in this space. In the field of voice recognition, accents and pronunciation can still cause problems.

Because of the way the systems work, different pronunciation of phonemes can cause the system to not recognise what you’ve said. This is especially true when the phonemes in a word seem (to non-locals) to bear no relation to the way it is pronounced, such as the British cities of “Leicester” or “Glasgow”.

Even Australian cities such as “Melbourne” seem to trip up some Americans. While to an Australian the pronunciation of Melbourne is very obvious, the different way that phonemes are used in America means that they often pronounce it wrong (to parochial ears).

Anybody who has heard a GPS system mispronounce Ipswich as “eyp-swich” knows this also goes both ways. The only way around this is to train the system in the different ways words are pronounced. But with the variation in accents (and even pronunciation within accents) this can be quite a large and complex process.

On the language-processing side, the issue is predominantly one of context. The example in the opening shows the state of the art in contextual language processing. But all you need to do is pay attention to a conversation for a few minutes to realise how much we rely on context that is never said out loud.

For instance, how often do you ask somebody:

Did you get my e-mail?

But what you actually mean is:

Did you get my e-mail? If you did, have you read it and can you please provide a reply as response to this question?

Things get even more complicated when you want to engage in a conversation with a machine, asking an initial question and then follow-up questions, such as “What is Martin’s number?”, followed by “Call him” or “Text him”.
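
To see why the follow-up is the hard part, consider this toy sketch of a system that remembers who was last mentioned. The contact book, the phone numbers and the two sentence patterns it understands are all hypothetical, and a real assistant would need far more than a single remembered name.

```python
import re

# Hypothetical contact book, for illustration only.
CONTACTS = {"martin": "555-0101", "kevin": "555-0102"}


class Conversation:
    """Remembers the last person mentioned so pronouns can be resolved."""

    def __init__(self):
        self.last_person = None

    def handle(self, utterance):
        text = utterance.lower().strip(" ?.!")
        # Accept either a straight or a curly apostrophe in "Martin's".
        asked = re.fullmatch(r"what is (\w+)(?:'s|’s) number", text)
        if asked and asked.group(1) in CONTACTS:
            self.last_person = asked.group(1)
            number = CONTACTS[self.last_person]
            return f"{self.last_person.title()}'s number is {number}"
        follow_up = re.fullmatch(r"(call|text) (him|her|them)", text)
        if follow_up:
            if self.last_person is None:
                return "Who do you mean?"
            verb = follow_up.group(1).capitalize()
            name = self.last_person.title()
            return f"{verb}ing {name} on {CONTACTS[self.last_person]}"
        return "Sorry, I didn't understand."


chat = Conversation()
print(chat.handle("Call him"))                  # Who do you mean?
print(chat.handle("What is Martin's number?"))  # Martin's number is 555-0101
print(chat.handle("Call him"))                  # Calling Martin on 555-0101
```

The same two words, "Call him", produce a different response depending on what came before. Keeping track of that history, across many topics and many possible referents, is what makes conversational context so much harder than a single command.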

Machines are improving when it comes to understanding context, but they still have a way to go!

Automatic translation

So, we have made great progress in a lot of different areas to get to this point. But there are still challenges ahead in accent recognition, implications in language, and context in conversations. This means it might still be a while before we have those computers from Star Trek interpreting everything we say.

But rest assured. We are slowly getting closer, with recent advancements from Google in automatic translation showing that, if we get it right, the result can be very cool.

Google has recently revealed technology that uses a combination of image or voice recognition, natural language processing and the camera on your smartphone to automatically translate signs and short conversations from one language to another for you. It will even try to match the font so that the sign looks the same, but in English!

So no longer do you need to ponder over a menu written in Italian, or wonder how to order from a waiter who doesn’t speak English. Google has you covered. Not quite the USS Enterprise, but certainly closer!

Michael Cowling does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.

Read more http://theconversation.com/listen-to-me-machines-learn-to-understand-how-we-speak-42812
