files (slides)

Announcements

  • Midterm 2 on March 26, next Wednesday

Key Concepts

  • iterating a string/list/dictionary (+ if statements)

    • while loop, for loops (two ways, difference?)

    • draw a loop table

  • purpose of iterating, for example:

    • mutate (list/dictionary methods)

    • aggregate (e.g., sum, factorial)

    • get max or min

    • concatenate strings (string methods)

CSc 110 - files

Files and file systems

  • On (at least most of) our computers, there is a file system via which we can create, save, modify, and remove files

    • On Mac: can browse with Finder

    • On Windows: can browse with windows explorer

  • File systems are often hierarchical

Opening a file

  • To open a file in a python program:

    info = open(file_name, mode)

  • file_name: the name of the file to open

    • It can also be a path
  • mode: the mode in which to open in

    • ‘a’    ‘r’    ‘w’

Absolute vs Relative path

  • An absolute path describes the location from the root directory.

    • /home/doc/example.txt
  • A relative path describes the location of a file relative to the current (working) directory.

    • if .py and example.txt are in the same folder, the relative path is the filename itself - example.txt

Opening and closing a file

info.txt

The quick brown fox
jumped over
the lazy
bear
sitting by the tree

read_file.py

info = open('info.txt', 'r')
info.close()

Reading a line

Use info object and call readline()to read a line

read_file.py

info = open('info.txt', 'r')
line = info.readline()
info.close()

print(line)
assert line == "The quick brown fox\n"
The quick brown fox

Reading three lines

info = open('info.txt', 'r')
print(info.readline())
print(info.readline())
print(info.readline())
info.close()
The quick brown fox

jumped over

the lazy

Write a function

Function name read_first_line: given a filename string, open file in read ('r') mode, read one line, close file, return that line.

Name your file first_read.py and submit to gradescope.

Reading a line example

info.txt

The quick brown fox
jumped over
the lazy
bear
sitting by the tree

first_read.py

def read_first_line(filename):
  info = open(filename, 'r')
  line = info.readline()
  info.close()
  return line

assert read_first_line('info.txt') == "The quick brown fox\n"

Reading lines

  • file.readline()

    • reads one line from a file, returns a string
  • file.readlines()

    • reads all of the lines, returns a list of strings
  • file.read()

    • returns one string

Iterating over a file

Use readlines():

info = open('info.txt', 'r')
for line in info.readlines():
    print(line)
info.close()
The quick brown fox

jumped over

the lazy

bear

sitting by the tree

Iterating over a file – an easier way

Here info is iterable, so we can use for x in file loop:

info = open('info.txt', 'r')
for line in info:
    print(line)
info.close()
The quick brown fox

jumped over

the lazy

bear

sitting by the tree

More string methods

We worked with .isnumeric(), a string method to check if a string contains only digit characters

Here are some other useful string methods:

  • string.strip(chars) – removes any of the characters in chars from the beginning or end of string, returns a string
  • string.split(chars) – splits string at the chars, returns a list

More string methods

.strip() and .split(" ")

info = open('info.txt', 'r')

line = info.readline()
print(line)

line = line.strip("\n")
print(line)

words = line.split(" ")
print(words)

info.close()
The quick brown fox

The quick brown fox
['The', 'quick', 'brown', 'fox']

Write your code

Use .strip() and .split(" ") to print a list: ['The', 'quick', 'brown', 'fox']

name = "The-quick-brown-fox$"

More string methods

Use only .strip():

name = "The-quick-brown-fox$"

print(name.strip('$'))
The-quick-brown-fox

Use .strip() and .split(" "):

name = "The-quick-brown-fox$"

print(name.strip('$').split('-'))
['The', 'quick', 'brown', 'fox']

Write a function

  • It opens the file in read mode.

  • It iterates over the lines, stripping the line breaks and splitting each line by space.

  • It returns a list of lists, where each inner-list is a line containing all the words in that line

Test case (download text file):

assert lines_and_words("info.txt") == [ ['The', 'quick', 'brown', 'fox'], 
                                        ['jumped', 'over'], ['the', 'lazy'],
                                        ['bear'], ['sitting', 'by', 'the', 'tree'] ]

Write a function – solution

def lines_and_words(file_name):
  f = open(file_name, "r")
  all_words = []
  for line in f:
    all_words.append(line.strip("\n").split(" "))
  f.close()
  return all_words
  
  
def main():
  assert lines_and_words("info.txt") == [ ['The', 'quick', 'brown', 'fox'], 
                                          ['jumped', 'over'], ['the', 'lazy'],
                                          ['bear'], ['sitting', 'by', 'the', 'tree'] ]
                                     
main()

Write a function

It takes a string file_name as argument and returns a dictionary with word counts.

Use str.lower() to convert a string to lowercase.

Test case:

assert count_words("info.txt") == {"the": 3, "quick": 1, "brown": 1,
                                   "fox": 1 , "jumped": 1, "over": 1, 
                                   "lazy": 1, "bear": 1, "sitting": 1,
                                   "by": 1, "tree": 1}

Write a function – solution

def count_words(file_name):
  f = open(file_name, "r")
  counts = {}
  for line in f:
    words = line.strip("\n").split(" ")
    for w in words:
      lower_case_w = w.lower()
      if lower_case_w not in counts:
        counts[lower_case_w] = 1
      else:
        counts[lower_case_w] += 1
  return counts

def main():
  assert count_words("info.txt") == {"the": 3, "quick": 1, "brown": 1,
                                     "fox": 1 , "jumped": 1, "over": 1, 
                                     "lazy": 1, "bear": 1, "sitting": 1,
                                     "by": 1, "tree": 1}
                                 
main()

Quiz 08

current time

You have 10 minutes to complete the quiz.

No need to write main() function.