Skip to main content

cs2370 Notes: 14 Baby Names

··1 min

More with Strings #

Starting with a file of baby names from the Social Security Administration, let’s see what we can do.

How many names? Male, female, total?

Build a histogram per first letter.

Hints:

  • split
  • strip
  • replace

Introducing Regular Expressions #

import re

text = "abc123"
re.match(r'a', text)
re.match(r'1', text)
re.match(r'[a-z]', text)
re.match(r'[a-z]+', text)
re.match(r'[a-z]*', text)
re.match(r'[0-9]+', text)
re.match(r'[0-9]*', text)
re.match(r'[a-z]+[0-9]+', text)
re.match(r'\D+\d+', text)


phone = '(603) 555-1212'
pat = r'\(\d{3}\)\s\d{3}-\d{4}'
# Does that look like a phone number?
re.match(pat, phone)
re.fullmatch(pat, phone)

tmpl = '''
Dear {name},

The warranty on your {year} {make} {model} is about to 
expire. This is your last chance to purchase extended
warantee coverage to avoid {disaster1} and {disaster2}.

Please call us at {phone} today!

{sender_name}
The Very Real Waranty Company.
'''

# Find the first template slot.
mm = re.search(r'{[a-z]+}', tmpl)

# That gives us a match object.
mm[0]   # Text of the match

# Group
mm = re.search(r'{([a-z]+)}', tmpl)
mm[0]   # Text of the match
mm[1]   # First group

re.findall(r'{[a-z]+}', tmpl)
re.findall(r'{\w+}', tmpl)
re.findall(r'{(\w+)}', tmpl)

for mm in re.finditer(r'{(\w+)}', tmpl:
    print(mm[0], mm[1])

text1 = re.sub(r'{\w+}', "BOOM", tmpl)
text1 = re.sub(r'{(\w+)}', r"[\1]", tmpl)

def cap(st):
    return st[1].capitalize()

text1 = re.sub(r'{(\w+)}', cap, tmpl)