
Exercise 10: Trees!

Before You Begin

  • Work on a Python script for this assignment. But as always, your best bet is to switch back and forth between your Python script and shell: execute your script, try out follow-up commands in the shell, update your script with successful code, and repeat.
  • The questions in this assignment should be answered directly by your script: have the script print out the answers and/or include them as comments.
  • Document your script by inserting appropriate comments. The script is going to get long, so comments will help keep your code organized and also facilitate my grading.
  • You might not like the particular flavor of phrase structure grammar I used here. Trust me, I could (and would love to) give you a much bigger, more detailed, and overall better grammar! But do stick to this version for the purpose of this assignment: your rules and trees should exactly match the examples given.
  • In the trees and rules, use lower-case ('the', 'he') for all words, even at the beginning of a sentence. The only exceptions are the proper names ('Homer', 'Marge', etc.). This simplifies grammar development and parsing.
  • For the same reason, disregard punctuation and symbols for this assignment.
  • For your reference, all the sentences and their tree drawings used in this assignment plus the next one (HW8) can be found on this page. Make sure the trees you build match the tree representations on that page.

Tree Objects

In this exercise, you will practice building tree objects.
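As a warm-up, here is a minimal sketch of two common ways to build an NLTK tree: parsing a bracketed string with Tree.fromstring(), and combining existing trees under a new parent label with Tree(label, children). The phrase is only an illustration (it is the VP that appears in the .productions() example in this exercise), and the variable names demo_np, demo_v, and demo_vp are made up here so they do not clash with the names you are asked to use.

```python
from nltk.tree import Tree

# Way 1: parse a bracketed string into a tree object
demo_np = Tree.fromstring('(NP (DET the) (N donut))')
demo_v = Tree.fromstring('(V ate)')

# Way 2: combine existing trees as children of a new parent node
demo_vp = Tree('VP', [demo_v, demo_np])

print(demo_vp)            # bracketed string representation of the tree
print(demo_vp.label())    # label of the root node
print(demo_vp.leaves())   # the words, left to right
print(demo_vp[1])         # second child, i.e. the NP subtree
```

Either route yields the same kind of object, so you can mix them freely: build small pieces from strings, then assemble larger trees out of those pieces.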
  1. Build the following three tree objects as np, aux, and vp.
       
  2. Using them, build two tree objects, named s1 and s2, for the following sentences. The trees should look exactly like the ones shown on this page.
    (s1) Marge will make a ham sandwich
    (s2) will Marge make a ham sandwich
  3. Build a tree object named s3 for the following sentence, using its full-sentence string representation.
    (s3) Homer ate the donut on the table
  4. Build tree objects named s4 and s5 for the following sentences.
    (s4) my old cat died on Tuesday
    (s5) children must play in the park with their friends
  5. Once a tree is built, you can extract a list of context-free (CF) rules, generally called production rules, from it using the .productions() method. Each CF rule in the list is either lexical, i.e., it contains a lexical word on its right-hand side, or non-lexical:
     
    >>> print(vp)
    (VP (V ate) (NP (DET the) (N donut)))
    >>> vp_rules = vp.productions()       # list of all CF rules used in the tree
    >>> vp_rules
    [VP -> V NP, V -> 'ate', NP -> DET N, DET -> 'the', N -> 'donut']
    >>> vp_rules[0]
    VP -> V NP
    >>> vp_rules[1]
    V -> 'ate'
    >>> vp_rules[0].is_lexical()    # VP -> V NP is not a lexical rule
    False
    >>> vp_rules[1].is_lexical()    # V -> 'ate' is a lexical rule
    True
    
    Explore the CF rules of s5. Include in your script the answers to the following:
    1. How many CF rules are used in s5?
    2. How many unique CF rules are used in s5?
    3. How many of them are lexical?
  6. NLTK's Penn Treebank corpus represents its syntactic trees following this formalism. Load the corpus and explore. Hint: the sentences at indices 0, 1, 7, 9, 45, and 96 are among the shorter, more manageable ones.
     
    >>> from nltk.corpus import treebank
    >>> tb_psents = treebank.parsed_sents()    
    >>> tb_psents[45]
    Tree('S', [Tree('NP-SBJ', [Tree('DT', ['The']), Tree('JJ', ['top']), 
    Tree('NN', ['money']), Tree('NNS', ['funds'])]), Tree('VP', [Tree('VBP', 
    ['are']), Tree('ADVP-TMP', [Tree('RB', ['currently'])]), Tree('VP', [Tree('VBG', 
    ['yielding']), Tree('NP', [Tree('QP', [Tree('RB', ['well']), Tree('IN', ['over']), 
    Tree('CD', ['9'])]), Tree('NN', ['%'])])])]), Tree('.', ['.'])])
    >>> tb_psents[45].pprint()       # same thing as print(tree)
    (S
      (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
      (VP
        (VBP are)
        (ADVP-TMP (RB currently))
        (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
      (. .))
    >>> tb_psents[45].pretty_print()            # easier to grasp, maybe?
                                   S                                       
           ________________________|_____________________________________   
          |                                        VP                    | 
          |                  ______________________|____                 |  
          |                 |      |                    VP               | 
          |                 |      |         ___________|____            |  
          |                 |      |        |                NP          | 
          |                 |      |        |            ____|_______    |  
        NP-SBJ              |   ADVP-TMP    |           QP           |   | 
      ____|____________     |      |        |       ____|________    |   |  
     DT   JJ     NN   NNS  VBP     RB      VBG     RB   IN       CD  NN  . 
     |    |      |     |    |      |        |      |    |        |   |   |  
    The  top   money funds are currently yielding well over      9   %   . 
    
    >>> tb_psents[45].draw()       # A window pops up
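
Going back to the CF rule questions in step 5: counting all rules, unique rules, and lexical rules follows one simple pattern. Below is a minimal sketch on a toy tree (deliberately not s5). The names toy, rules, unique_rules, and lexical_rules are made up for illustration. It relies on NLTK production objects being hashable, so that they can be placed in a set.

```python
from nltk.tree import Tree

# A toy tree, not one of the assigned sentences
toy = Tree.fromstring(
    '(S (NP (DET the) (N cat)) (VP (V saw) (NP (DET the) (N dog))))'
)

rules = toy.productions()      # every rule used, repeats included
unique_rules = set(rules)      # duplicates collapse in a set
lexical_rules = [r for r in unique_rules if r.is_lexical()]

print(len(rules))              # total number of rules used
print(len(unique_rules))       # number of distinct rules
print(len(lexical_rules))      # distinct rules with a word on the right-hand side
```

Here "NP -> DET N" and "DET -> 'the'" each occur twice in the tree, which is why the total and unique counts differ.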
    
SUBMIT:
  • Your Python script and saved IDLE shell output (.txt file)
  • Alternatively: a Jupyter Notebook (.ipynb file), if you prefer. In-cell tree rendering will likely need additional configuration, such as installation of GhostView.