Go to: Na-Rae Han's home page  

Python 3 Notes

Tutorial 12: Reading Text

On this page: reading from a file, open(), .readlines().

Video Tutorial


Python 3 Changes

print(x,y) instead of print x, y

You may need
open(filename, encoding="utf-8")
instead of open(filename). See note below.

Video Summary

  • To open a text file within your code, use Python's built-in open() function. Pass it a string containing the path to your text file (in this case open('C:/Users/mybringback/Desktop/pg16328.txt'); your path will of course look different) and assign the result to a variable, for example book = open('C:/Users/mybringback/Desktop/pg16328.txt'). Once the file is open, you can tell Python what to do with it. For example, typing booktxt = book.readlines() reads the whole file and stores it in "booktxt" as a list of lines, so the text can be recalled line by line within your program.
  • After defining this variable, simply typing booktxt will display your entire text file from start to finish within your IDLE window. While this can be useful in other circumstances, a text is often too long and unwieldy to display this way, as is the case with the large text file used in this tutorial. Instead, you can recall individual lines by placing an index in square brackets after your variable. Indexes start at 0, so booktxt[0] recalls the first line of your file, which in the text we are using is formatting information.
  • To find the length of your text file in lines, type len(booktxt). Because booktxt is a list of lines, len() counts lines, not words. For more information on the len() function, see Tutorial 11.
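
The steps in the summary can be sketched as a short script. So that it runs on any computer, this sketch first writes a small stand-in file into a temporary folder; the name sample.txt and its three lines are made up for illustration and are not the Beowulf file from the video.

```python
import os
import tempfile

# Hypothetical stand-in for the tutorial's text file: three short lines.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample.txt')
out = open(path, 'w', encoding='utf-8')
out.write('first line\nsecond line\nthird line\n')
out.close()

# Open the file and read it into a list of lines.
book = open(path, encoding='utf-8')
booktxt = book.readlines()
book.close()

print(booktxt[0])      # the first line, including its trailing '\n'
print(len(booktxt))    # the number of lines in the file

# One way to count words instead: split each line on whitespace
# and add up the pieces.
word_count = 0
for line in booktxt:
    word_count = word_count + len(line.split())
print(word_count)
```

With your own file, replace path with the string for its location, as in the open() call shown in the summary.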

Learn More

  • If you are a Mac user, omit the drive letter "C:" from your file path. On a Mac, the directory tree simply starts from "/", which is called the root.
  • Even when following Ed's exact steps and using the exact Beowulf file, some of you (mostly on Windows) will get this surprise error message:
     
    >>> book = open('C:/Users/narae/Desktop/pg16328.txt')
    >>> booktxt = book.readlines()
    Traceback (most recent call last):
      File "<pyshell#14>", line 1, in <module>
        booktxt = book.readlines()
      File "C:\Program Files (x86)\Python35-32\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 36593: character 
    maps to <undefined> 
    >>> 
    

    That means the text file's encoding is different from your system's default encoding (cp1252 in the traceback above). Ed didn't get this error because he is working in Python 2.7, which handles text encoding differently. In Python 3, we often need to specify the encoding when opening a file, using the encoding="xxx" keyword argument:
     
    >>> book = open('C:/Users/narae/Desktop/pg16328.txt', encoding="utf-8")
    >>> booktxt = book.readlines()
    >>> booktxt[0]
    '\ufeffThe Project Gutenberg EBook of Beowulf \n'
    >>> 
    
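    Here, the file is encoded in UTF-8 (a Unicode encoding built on 8-bit units, as opposed to UTF-16 or UTF-32), so encoding="utf-8" was specified. The '\ufeff' at the start of booktxt[0] is the file's byte order mark, an invisible marker that some programs write at the beginning of a Unicode text file.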
  • It was not done in the tutorial, but a file object should be closed once you are done with it. In the tutorial, a good time to close would have been right after book.readlines() was executed. It can be done by calling book.close().
  • There are more details to learn (and battle with) in dealing with files on your local drive. See this advanced topic page: "File Path and CWD".
  • There are additional reading methods that are handy. See "File Reading and Writing Methods" for details.
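
The encoding error and its fix can be reproduced on any system with a small sketch. The file name demo.txt and its contents are invented for illustration: the text starts with a byte order mark and contains the letter Á, which UTF-8 stores using the byte 0x81 reported in the traceback above. The sketch also uses Python's "with" statement, which is one common way to make sure a file gets closed without calling close() yourself, and the "utf-8-sig" encoding, which reads UTF-8 while dropping a leading byte order mark.

```python
import os
import tempfile

# Hypothetical demo file: UTF-8 text that begins with a byte order mark
# ('\ufeff') and contains 'Á' ('\u00c1'), stored as the bytes 0xC3 0x81.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')
with open(path, 'w', encoding='utf-8') as out:
    out.write('\ufeff\u00c1 first line\nsecond line\n')

# Reading with the wrong encoding fails. cp1252 is the encoding named
# in the traceback; it has no character for the byte 0x81.
try:
    with open(path, encoding='cp1252') as book:
        book.readlines()
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print('Could not decode:', err)

# Reading with the right encoding works. The "with" statement closes
# the file automatically when the indented block ends.
with open(path, encoding='utf-8') as book:
    booktxt = book.readlines()
print(book.closed)          # True: no book.close() call was needed
print(repr(booktxt[0]))     # the line still starts with '\ufeff'

# "utf-8-sig" also reads UTF-8, but drops a leading byte order mark.
with open(path, encoding='utf-8-sig') as book:
    cleantxt = book.readlines()
print(repr(cleantxt[0]))    # same line without '\ufeff'
```

If you prefer the style used in the tutorial, book = open(...) followed later by book.close() has the same effect as the "with" block.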

Practice

Practice using the "mary-short.txt" file, linked on the left under "Code and Text Examples". First download and save it on your computer, and then read it into the IDLE shell. Everyone's system is different, so you might need to refer to these two additional tutorials: "File Path and CWD" and "File Reading and Writing Methods".
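
If Python cannot find the file, it can help to check where it is looking before you try to open anything. The following is a minimal sketch, not the only way to do it. It assumes you saved mary-short.txt to the Desktop folder inside your home folder and that the file is plain ASCII or UTF-8 text; both are assumptions, so adjust the path and the encoding to match your system.

```python
import os

# Assumption: the file was saved to the Desktop folder inside your home
# folder. Change this line if you saved it somewhere else.
path = os.path.join(os.path.expanduser('~'), 'Desktop', 'mary-short.txt')

print(os.getcwd())    # the folder Python is currently working in
print(path)           # the full path this sketch will try to open

if os.path.exists(path):
    f = open(path, encoding='utf-8')   # assumes ASCII or UTF-8 text
    lines = f.readlines()
    f.close()
    print(len(lines), 'lines read')
else:
    lines = []
    print('No file found at that path; check where you saved it.')
```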

Explore

  • Anne Dawson has many sample scripts for File I/O. Search for "open". Note that she uses the "escaped backslash" style (see this page) of Windows file path reference.
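
To make the "escaped backslash" style concrete, here is a small sketch comparing three ways of writing the same Windows path as a Python string, using the example path from earlier on this page. A plain 'C:\Users\...' string would not work, because Python treats backslash sequences such as \n and \U as special characters. Nothing is opened here, so the sketch runs on any system.

```python
# Three ways to write the same Windows path as a Python string.
escaped = 'C:\\Users\\narae\\Desktop\\pg16328.txt'   # escaped backslashes
raw = r'C:\Users\narae\Desktop\pg16328.txt'          # raw string
forward = 'C:/Users/narae/Desktop/pg16328.txt'       # forward slashes

# The escaped and raw styles produce exactly the same string.
print(escaped == raw)     # True
print(escaped)            # C:\Users\narae\Desktop\pg16328.txt

# The forward-slash style differs only in the separator character.
print(forward.replace('/', '\\') == escaped)   # True
```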