Chomsky-normal-form grammar extraction from a parse tree

VP|<NP-PP> is one nonterminal symbol. The vertical bar does not separate alternative expansions, as it would in traditional grammar notation. Rather, NLTK puts it there to record where the symbol came from. The part before the bar is the node being split (VP), and the part in angle brackets lists the children the new symbol covers (NP and PP, joined by a hyphen). NLTK created this nonterminal, together with the new production rules that use it, when it converted your tree into Chomsky Normal Form.

Take a look at the productions of the tree, pre-CNF:

ROOT -> S
S -> NP VP
NP -> DT NNS
DT -> 'the'
NNS -> 'kids'
VP -> VBD NP PP ***
VBD -> 'opened'
NP -> DT NN
DT -> 'the'
NN -> 'box'
PP -> IN NP
IN -> 'on'
NP -> DT NN
DT -> 'the'
NN -> 'floor'

Specifically, look at the rule VP -> VBD NP PP, which is NOT in CNF. In CNF, the right-hand side of every production must be either exactly two nonterminal symbols or a single terminal symbol.
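The CNF condition can be checked mechanically. Below is a minimal pure-Python sketch (no NLTK needed). The function name `is_cnf_rule`, the `(lhs, rhs)` tuple representation, and the `TERMINALS` set are hypothetical, chosen only for illustration. The sketch treats the quoted words from the production list as terminals and every other symbol as a nonterminal.

```python
# Hypothetical representation: a production is (lhs, rhs) with rhs a tuple.
# Assumption: the terminals are exactly the words of this sentence.
TERMINALS = {"the", "kids", "opened", "box", "on", "floor"}


def is_cnf_rule(lhs, rhs, terminals):
    """Return True if lhs -> rhs has a CNF shape:
    A -> B C (two nonterminals) or A -> a (one terminal)."""
    if len(rhs) == 2:
        return all(sym not in terminals for sym in rhs)
    if len(rhs) == 1:
        return rhs[0] in terminals
    return False  # three or more symbols on the RHS (or none at all)


rules = [
    ("ROOT", ("S",)),
    ("S", ("NP", "VP")),
    ("DT", ("the",)),
    ("VP", ("VBD", "NP", "PP")),
    ("PP", ("IN", "NP")),
]

for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs), "| CNF:", is_cnf_rule(lhs, rhs, TERMINALS))
```

By this strict test the unary rule ROOT -> S fails as well. NLTK's `chomsky_normal_form()` only splits nodes with more than two children, so unary chains like this one survive the conversion (`Tree.collapse_unary()` is the separate method for those).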

The two rules in your question, (7) VP|<NP-PP> -> NP PP and (8) VP -> VBD VP|<NP-PP>, together do the same work as the single flat rule VP -> VBD NP PP, since VP|<NP-PP> can only be rewritten as NP PP.
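To see how a flat rule turns into these two rules, here is a minimal pure-Python sketch of right-factoring. The function `binarize` is hypothetical, not part of NLTK. It only mimics the symbol naming seen in NLTK's default output (`|` after the original label, remaining children joined by `-` inside `<...>`); NLTK's own options such as parent annotation are not modeled.

```python
def binarize(lhs, rhs, child_char="|"):
    """Right-factor lhs -> rhs into rules with at most two RHS symbols.

    Hypothetical helper: new nonterminals are named like NLTK's default
    output, e.g. VP|<NP-PP> for the children NP and PP of a VP.
    """
    rules = []
    current = lhs
    rhs = list(rhs)
    while len(rhs) > 2:
        # The new symbol stands for every child after the first one.
        new_sym = f"{lhs}{child_char}<{'-'.join(rhs[1:])}>"
        rules.append((current, (rhs[0], new_sym)))
        current = new_sym
        rhs = rhs[1:]
    rules.append((current, tuple(rhs)))
    return rules


for lhs, rhs in binarize("VP", ["VBD", "NP", "PP"]):
    print(lhs, "->", " ".join(rhs))
# VP -> VBD VP|<NP-PP>
# VP|<NP-PP> -> NP PP
```

Right-factoring peels off the leftmost child at each step and pushes the rest into a new symbol, which is why the new symbol's name lists the children it still has to produce.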

When VP is expanded with rule (8), the result is:

VBD VP|<NP-PP>

VP|<NP-PP> is in turn the LHS of rule (7), so expanding it gives:

VBD NP PP
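The same two-step derivation can be traced in a few lines of pure Python. This is a hypothetical sketch: `RULES` holds only rules (7) and (8), and `expand_once` rewrites the first occurrence of a symbol in a sentential form.

```python
# Rules (7) and (8) from the question, as a hypothetical lhs -> rhs table.
RULES = {
    "VP": ["VBD", "VP|<NP-PP>"],
    "VP|<NP-PP>": ["NP", "PP"],
}


def expand_once(form, symbol):
    """Rewrite the first occurrence of `symbol` in `form` using RULES."""
    i = form.index(symbol)
    return form[:i] + RULES[symbol] + form[i + 1:]


step1 = expand_once(["VP"], "VP")
step2 = expand_once(step1, "VP|<NP-PP>")
print(" ".join(step1))  # VBD VP|<NP-PP>
print(" ".join(step2))  # VBD NP PP
```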

If you isolate that rule, you can inspect its left-hand side and confirm that it is indeed a single symbol:

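>>> from nltk import Tree
>>> tree = Tree.fromstring("(ROOT (S (NP (DT the) (NNS kids)) (VP (VBD opened) (NP (DT the) (NN box)) (PP (IN on) (NP (DT the) (NN floor))))))")
>>> tree.chomsky_normal_form()  # binarizes the tree in place, returns None
>>> prod = tree.productions()
>>> x = prod[7]
>>> x
VP|<NP-PP> -> NP PP
>>> x.lhs().symbol()  # a single nonterminal symbol
'VP|<NP-PP>'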