The Ultimate Parser - Parsey McParseface

SyntaxNet is an open-source neural network framework from Google, built on TensorFlow. TensorFlow is an open-source software library for numerical computation using data flow graphs, originally developed for machine learning and deep neural network research. Google has long been working on how a computer system can analyse and understand human language. It has now released the SyntaxNet code together with Parsey McParseface, a pre-trained parser that can be used to analyze English text. The parser is built on a machine learning model that analyzes the grammatical structure of sentences and therefore provides a strong foundation for Natural Language Understanding systems.

What is Parsing or Syntactic Analysis?

The word ‘parsing’ derives from the Latin pars orationis, meaning ‘part of speech’. Parsing is the process of analyzing a string of symbols, whether in a natural language or a computer language. In computational linguistics, it refers to the analysis of a sentence: breaking the string of words down into a parse tree that shows the syntactic relationships between them. SyntaxNet is a syntactic parser. Given a sentence, it analyzes it much as a human would, in terms of grammatical structure, parts of speech and so on.

Let us consider the sentence: “Alice saw Bob.”

Here, saw is the verb, Alice is the subject and Bob is the direct object. This is how Parsey McParseface analyzes the sentence.

Source : https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
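The dependency analysis above can be written out explicitly. A minimal sketch in the spirit of the CoNLL-style output SyntaxNet emits; the tuples here are hand-written for illustration, not produced by the parser:

```python
# Hand-written dependency parse of "Alice saw Bob":
# (index, word, POS tag, head index, relation). Head index 0 marks the root.
parse = [
    (1, "Alice", "NNP", 2, "nsubj"),   # Alice is the subject of "saw"
    (2, "saw",   "VBD", 0, "root"),    # "saw" is the main verb (root)
    (3, "Bob",   "NNP", 2, "dobj"),    # Bob is the direct object of "saw"
]

def words_with_head(parse, head_word):
    """Return the words whose syntactic head is `head_word`."""
    head_index = next(i for i, w, _, _, _ in parse if w == head_word)
    return [w for _, w, _, h, _ in parse if h == head_index]

print(words_with_head(parse, "saw"))  # the dependents of the verb
```

Reading the parse this way makes the structure queryable: both Alice and Bob hang off the verb, with the relation labels telling subject apart from object.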

Let’s see how Parsey McParseface will deal with a more complex sentence, such as: “Alice, who had been reading about SyntaxNet, saw Bob in the hallway yesterday.”

This sentence has the subject Alice and the direct object Bob. Alice is modified by the verb reading (a relative clause), and the verb saw is modified by yesterday, a temporal modifier. Basic questions like “Whom did Alice see?” or “When did Alice see Bob?” can then be answered directly from the parse.
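Answering such questions amounts to looking up edges in the dependency structure. A hand-written sketch of the relations just described (the edges are assumed for illustration, not parser output):

```python
# Hand-written dependency edges (dependent -> (head, relation)) for a sentence
# like "Alice, who had been reading about SyntaxNet, saw Bob in the hallway
# yesterday" -- assumed for illustration, not produced by the parser.
edges = {
    "Alice":     ("saw", "nsubj"),   # subject of the main verb
    "Bob":       ("saw", "dobj"),    # direct object of the main verb
    "reading":   ("Alice", "rcmod"), # relative clause modifying Alice
    "yesterday": ("saw", "tmod"),    # temporal modifier of "saw"
}

def answer(relation, head):
    """Find the dependents of `head` bearing `relation`."""
    return [dep for dep, (h, rel) in edges.items()
            if h == head and rel == relation]

print(answer("dobj", "saw"))  # Whom did Alice see?
print(answer("tmod", "saw"))  # When did Alice see Bob?
```

“Whom did Alice see?” becomes a lookup of the dobj of saw, and “When?” a lookup of its temporal modifier.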

Why is parsing difficult for computers?

The most difficult problem is that human language involves a great deal of ambiguity. A sentence of 15-20 words can have a very large number of possible syntactic structures, and unlike humans, computers find it difficult to settle on the most plausible one. During parsing, the relationships between verbs, nouns and clauses are scored; as the sentence grows longer, the number of possible relationships grows rapidly and determining the right one becomes more complicated.
For example, the following sentence has two possible dependency parses.
Source : https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

The first parse corresponds to the sensible reading that Alice is driving in her car: the prepositional phrase in her car attaches to the verb drove. In the second parse, however, the preposition in attaches street to car, yielding the absurd reading that the street is located inside the car. This kind of misinterpretation is called prepositional phrase attachment ambiguity.
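The two competing attachments can be made concrete as alternative head assignments for the preposition. A hand-written sketch (sentence and attachments as described above, not parser output):

```python
# Two candidate parses for "Alice drove down the street in her car",
# differing only in which head the preposition "in" attaches to.
parse_a = {"in": "drove"}   # "in her car" modifies the verb: Alice is in the car
parse_b = {"in": "street"}  # "in her car" modifies the noun: street inside the car

def reading(parse):
    """Paraphrase the interpretation implied by the attachment choice."""
    return ("Alice is in the car" if parse["in"] == "drove"
            else "the street is in the car")

print(reading(parse_a))  # the sensible reading
print(reading(parse_b))  # the absurd reading
```

Both structures are grammatically valid, which is exactly why the parser must score and rank them rather than accept the first one it finds.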

SyntaxNet applies neural networks to resolve this ambiguity. An input sentence is processed from left to right, and the dependencies between words are built up incrementally as each new word is considered. The network assigns a score to each possible decision, and the highest-scoring hypothesis is treated as the most probable one. The model uses beam search, a heuristic algorithm that, instead of committing to the first hypothesis that comes along, keeps the several highest-scoring partial hypotheses alive at every step and only selects the best-scoring complete one at the end.

Parsey McParseface and the other SyntaxNet models are some of the most complex networks that have been trained with the TensorFlow framework. Parsey McParseface recovers the dependencies between English words with about 94% accuracy. This is a huge feat, considering that such accuracy approaches human performance. The model could become more accurate still with the incorporation of broader world knowledge, and with support for languages beyond English.
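The left-to-right scoring with beam search can be sketched generically as follows. The scoring function here is a fixed stand-in for the neural network, and the action set is simplified; this illustrates beam search itself, not SyntaxNet’s actual transition system:

```python
import heapq

def beam_search(start, expand, beam_width=4, steps=3):
    """Generic beam search: at each step, expand every hypothesis into its
    successors, score them, and keep only the `beam_width` best."""
    beam = [(0.0, start)]  # (cumulative score, list of actions so far)
    for _ in range(steps):
        candidates = []
        for score, hyp in beam:
            for action, action_score in expand(hyp):
                candidates.append((score + action_score, hyp + [action]))
        # Keep several high-scoring hypotheses alive instead of greedily
        # committing to the single best action at each word.
        beam = heapq.nlargest(beam_width, candidates)
    return max(beam)  # best-scoring complete hypothesis

# Toy expansion: two possible actions per step with fixed scores.
# In SyntaxNet, these scores would come from the neural network.
def expand(hyp):
    return [("shift", 0.6), ("reduce", 0.4)]

score, actions = beam_search(start=[], expand=expand, beam_width=2)
print(actions)
```

With real parser scores, hypotheses that look weak early can still win later, which is why beam search outperforms greedy left-to-right decisions on ambiguous sentences.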

Source : Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source