
Lecture 22: Parsing Review, Probabilistic CFG

Objectives: Understand parsing ambiguity and probabilistic parsing
Reference: Ch.8 Analyzing Sentence Structure (8.1 Some Grammatical Dilemmas: Ubiquitous Ambiguity; 8.6 Grammar Development: Weighted Grammar)

Parser Review

Let's review the parsing results from Homework 8. You have likely realized how the Chart Parser is adept at capturing *every possible* tree structure for a given sentence, even ones that do not make much sense from a human perspective. Following are the parse trees for s6 "the big bully punched the tiny nerdy kid after school":
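They can be reproduced along these lines, where grammar1 stands for the CFG developed in the homework and is assumed to be already defined:

>>> import nltk
>>> chparser = nltk.ChartParser(grammar1)   # grammar1: the homework CFG (assumed defined)
>>> for t in chparser.parse('the big bully punched the tiny nerdy kid after school'.split()):
        print(t)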
(S
  (NP (DET the) (ADJ big) (N bully))
  (VP
    (VP (V punched) (NP (DET the) (ADJ tiny) (ADJ nerdy) (N kid)))
    (PP (P after) (NP (N school)))))
(S
  (NP (DET the) (ADJ big) (N bully))
  (VP
    (V punched)
    (NP (DET the) (ADJ tiny) (ADJ nerdy) (N kid))
    (PP (P after) (NP (N school)))))
(S
  (NP (DET the) (ADJ big) (N bully))
  (VP
    (V punched)
    (NP
      (NP (DET the) (ADJ tiny) (ADJ nerdy) (N kid))
      (PP (P after) (NP (N school))))))
Which one is the intended "correct" parse? What do you think of the other parses? Any thoughts on the limitations of the parser/grammar and how to improve it?

Also, discuss the parsing results you got for s10 "Homer and his friends from work drank and sang in the bar".

Probabilistic Grammar

A probabilistic context-free grammar (PCFG) is a context-free grammar in which each rule has a probability associated with it. In this NLTK book example, a VP has a 40% chance of being composed of 'TV NP', a 30% chance of becoming 'IV', and a 30% chance of becoming 'DatV NP NP' (TV: transitive verb, IV: intransitive verb, DatV: dative verb). Note that these add up to 100%. The grammar essentially encodes a conditional probability distribution, much like a conditional frequency distribution (CFD): given a VP node, how likely is it to have 'TV NP' vs. 'IV' vs. 'DatV NP NP' as its child nodes?
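For concreteness, the book's toy grammar can be written out in NLTK's PCFG rule notation. The three VP probabilities are the ones cited above; the remaining rules and probabilities are filled in here to match the book's toy example (they are also what makes the 0.064 figure below work out), so treat them as illustrative:

    S    -> NP VP        [1.0]
    VP   -> TV NP        [0.4]
    VP   -> IV           [0.3]
    VP   -> DatV NP NP   [0.3]
    TV   -> 'saw'        [1.0]
    IV   -> 'ate'        [1.0]
    DatV -> 'gave'       [1.0]
    NP   -> 'telescopes' [0.8]
    NP   -> 'Jack'       [0.2]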

A tree generated by such a grammar has its own probability score, which can be used for ambiguity resolution. A PCFG parser computes the probability score of an entire tree, which can then serve as a basis for ranking competing tree structures. The Viterbi parser nltk.ViterbiParser() is one such probabilistic parser.
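Assuming the rule listing above is stored in a string toy_pcfg_str (a name used here purely for illustration), the Viterbi parser can be run on it as follows; it returns the single most probable parse together with its probability:

>>> toy_grammar = nltk.PCFG.fromstring(toy_pcfg_str)  # toy_pcfg_str: the rules above in """..."""
>>> viterbi_parser = nltk.ViterbiParser(toy_grammar)
>>> for tree in viterbi_parser.parse('Jack saw telescopes'.split()):
        print(tree)

(S (NP Jack) (VP (TV saw) (NP telescopes))) (p=0.064)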

But how does one compute the probability of an entire tree? It is essentially the product of the probabilities of its component subtrees; that is, it can be obtained by multiplying together the probability of each CFG rule used in the tree. Therefore, the probability of 0.064 for the sentence "Jack saw telescopes" in this example is derived as follows:
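Using the rule probabilities from the sketch above, the tree (S (NP Jack) (VP (TV saw) (NP telescopes))) is built from five rules, and its probability is their product:

    P(tree) = P(S -> NP VP) x P(NP -> 'Jack') x P(VP -> TV NP) x P(TV -> 'saw') x P(NP -> 'telescopes')
            = 1.0 x 0.2 x 0.4 x 1.0 x 0.8
            = 0.064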

Given this, you should be able to see how a resource such as a syntactically annotated treebank can be useful: the probability of a particular context-free rule can be estimated from the corpus. Interestingly enough, in the Penn Treebank 'NP -> NP PP' has a higher likelihood than 'NP -> DT NN'; as a matter of fact, it is the highest-ranking NP rule:

NP rule             probability
NP -> NP PP         0.09222728039116507
NP -> DT NN         0.0851458438711853
NP -> -NONE-        0.05163547462485247
NP -> NN            0.04678806272129489
NP -> NNS           0.041982802225594334
Likewise, below are the top-ranked lexical adjective rules ('JJ -> ...') and their probabilities, again compiled from the Penn Treebank.
JJ rule             probability
JJ -> 'new'         0.027768255056564963
JJ -> 'other'       0.02296880356530682
JJ -> 'last'        0.014741172437435722
JJ -> 'many'        0.014226945491943779
JJ -> 'such'        0.01405553651011313
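Numbers like these can be compiled with NLTK itself. Below is a sketch that estimates rule probabilities from the 10% Penn Treebank sample bundled with NLTK, using nltk.induce_pcfg(); because it is only a sample, the estimates may differ somewhat from the figures above.

>>> from nltk.corpus import treebank
>>> productions = []
>>> for tree in treebank.parsed_sents():
        productions += tree.productions()    # every CF rule instance used in the trees

>>> tb_grammar = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)
>>> np_rules = tb_grammar.productions(lhs=nltk.Nonterminal('NP'))
>>> for p in sorted(np_rules, key=lambda p: p.prob(), reverse=True)[:5]:
        print(p)                             # the five highest-probability NP rules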
By incorporating such statistical information, a CFG such as grammar1 can be turned into a probabilistic context-free grammar. An example:

    S -> NP VP [0.7]| AUX NP VP [0.05]| NP AUX VP [0.25] 
    CP -> COMP S [1.0]
    NP -> DET N [0.13]| DET N N [0.05]| DET ADJ N [0.07]| DET ADJ ADJ N [0.03]
    NP -> N [0.12]| N N [0.05]| PRO [0.1] 
    NP -> NP PP [0.3]| NP CP [0.1]| NP CONJ NP [0.05] 
    ADJP -> ADJ [0.6]| ADJP CONJ ADJP [0.15]| ADV ADJ [0.25]
    ADVP -> ADV ADV [1.0]
    VP -> V [0.1]| V ADJP [0.1]| V NP [0.19]| V NP CP [0.1]| V NP NP [0.05]| V NP PP [0.01]
    VP -> VP PP [0.25]| VP PP PP [0.05]| VP ADVP [0.1]| VP CONJ VP [0.05] 
    PP -> P NP [1.0]
    ADJ -> 'big' [0.2]| 'happy' [0.2]| 'nerdy' [0.05]| 'old' [0.35]| 'poor' [0.15]| 'tiny' [0.05]
    ADV -> 'much' [0.4]| 'very' [0.6]
    AUX -> 'had' [0.4]| 'must' [0.2]| 'will' [0.4]
    COMP -> 'that' [1.0]
    CONJ -> 'and' [0.7]| 'but' [0.3]
    DET -> 'a' [0.25]| 'her' [0.1]| 'his' [0.1]| 'my' [0.1]| 'the' [0.35]| 'their' [0.1]
    N -> 'Homer' [0.04]| 'Lisa' [0.03]| 'Marge' [0.03]| 'Tuesday' [0.01]| 't' [0.02] 
    N -> 'bar' [0.03]| 'book' [0.05]| 'brother' [0.05]| 'bully' [0.05]| 'butter' [0.04]
    N -> 'cat' [0.1]| 'children' [0.05]| 'donut' [0.03]| 'friends' [0.1]| 'ham' [0.02]
    N -> 'kid' [0.03]| 'park' [0.03]| 'peanut' [0.02]| 'sandwich' [0.02] 
    N -> 'school' [0.1]| 'sister' [0.05]| 'table' [0.04]| 'work' [0.06]
    P -> 'after' [0.1]| 'from' [0.15]| 'in' [0.25]| 'on' [0.15]| 'to' [0.25]| 'with' [0.1] 
    PRO -> 'I' [0.3]| 'he' [0.25]| 'him' [0.2]| 'she' [0.25]
    V -> 'are' [0.2]| 'ate' [0.1]| 'died' [0.05]| 'drank' [0.06]| 'gave' [0.05]| 'given' [0.05] 
    V -> 'liked' [0.1]| 'make' [0.1]| 'play' [0.1]| 'punched' [0.04]| 'sang' [0.05]| 'told' [0.1] 

Thus, a probabilistic parser based on a PCFG computes the probability of each parse tree from the probabilities of the individual CF rules used in the tree. Syntactic ambiguities can then be resolved in favor of the parse with the highest probability among the possible trees. Our favorite Chart Parser comes with a few probabilistic variants; the "Inside Chart Parser" below is built on the PCFG above. It successfully parses s3 "Homer ate the donut on the table" and outputs a list of parses sorted by probability, highest first. The top-ranking parse is the one where 'on the table' modifies 'the donut', which matches our intuition.

 
>>> grammar_pcfg = nltk.PCFG.fromstring(pcfg_str)  # pcfg_str: the rule listing above, stored as a """...""" string
>>> pchparser = nltk.parse.pchart.InsideChartParser(grammar_pcfg)
>>> for t in pchparser.parse('Homer ate the donut on the table'.split()):
        print(t)
		
(S
  (NP (N Homer))
  (VP
    (V ate)
    (NP
      (NP (DET the) (N donut))
      (PP (P on) (NP (DET the) (N table)))))) (p=7.1369e-12)
(S
  (NP (N Homer))
  (VP
    (VP (V ate) (NP (DET the) (N donut)))
    (PP (P on) (NP (DET the) (N table))))) (p=5.94741e-12)
(S
  (NP (N Homer))
  (VP
    (V ate)
    (NP (DET the) (N donut))
    (PP (P on) (NP (DET the) (N table))))) (p=1.25209e-12)
>>> 
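For comparison, the Viterbi parser introduced earlier returns only the single most probable tree. Run on the same grammar and sentence (continuing the session above), it should produce just the top-ranked parse, i.e., the reading where 'on the table' modifies 'the donut':

>>> vparser = nltk.ViterbiParser(grammar_pcfg)
>>> for t in vparser.parse('Homer ate the donut on the table'.split()):
        print(t)   # prints the single best parse, with its probability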
If you want more details on NLTK's parsers and demos, visit http://www.nltk.org/howto/parse.html.