" Live free or die ? "

[Assignment 1] Intro to Data Science In Python

Assignment 1

For this assignment you are welcome to use other regex resources, such as regex “cheat sheets” you find on the web. Feel free to share good resources with your peers in Slack!

Before you start working on the problems, here is a small example to help you understand how to write your own answers. In short, write the solution inside the given function body and return the final result. The autograder will then call the function and validate the returned result.

In [ ]:

def example_word_count():
    # This example question requires counting words in the example_string below.
    example_string = "Amy is 5 years old"
    
    # YOUR CODE HERE.
    # Write your solution here and return your result. You can comment out or delete the
    # NotImplementedError below.
    result = example_string.split(" ")
    return len(result)

    #raise NotImplementedError()

Part A

Find a list of all of the names in the following string using regex.

In [14]:

import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old. 
    Ruth and Peter, their parents, have 3 kids."""

    # YOUR CODE HERE
    # Raw string so that \w reaches the regex engine unchanged.
    name = re.findall(r'[A-Z]\w{1,4}', simple_string)
    return name

In [15]:

assert len(names()) == 4, "There are four names in the simple_string"
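The `{1,4}` quantifier works here only because no name in the string is longer than five letters. As a rough sketch (the name "Jacqueline" is used purely as an illustrative longer input, and the second pattern is one possible alternative, not the assignment's required answer), compare it with a pattern that has no length cap:

```python
import re

text = """Amy is 5 years old, and her sister Mary is 2 years old. 
    Ruth and Peter, their parents, have 3 kids."""

# Capital letter followed by one to four word characters.
bounded = re.findall(r"[A-Z]\w{1,4}", text)

# Capital letter followed by any number of lowercase letters, as a whole word.
unbounded = re.findall(r"\b[A-Z][a-z]+\b", text)

print(bounded)    # ['Amy', 'Mary', 'Ruth', 'Peter']
print(unbounded)  # ['Amy', 'Mary', 'Ruth', 'Peter']

# Illustrative longer name: the capped pattern truncates it.
truncated = re.findall(r"[A-Z]\w{1,4}", "Jacqueline")
complete = re.findall(r"\b[A-Z][a-z]+\b", "Jacqueline")
print(truncated)  # ['Jacqu']
print(complete)   # ['Jacqueline']
```

Both patterns pass the assertion above; they differ only on longer names.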

In [75]:

import re
with open("assets/grades.txt", "r") as file:
    grades = file.read()

# Print every "First Last" name found in the file as a one-element tuple.
for item in re.finditer(r"([A-Z]\w* [A-Z]\w*)", grades):
    print(item.groups())
('Bell Kassulke',)
('Jacqueline Rupp',)
('Alexander Zeller',)
('Valentina Denk',)
('Simon Loidl',)
('Elias Jovanovic',)
('Stefanie Weninger',)
('Fabian Peer',)
('Hakim Botros',)
('Emilie Lorentsen',)
('Herman Karlsen',)
('Nathalie Delacruz',)
('Casey Hartman',)
('Lily Walker',)
('Gerard Wang',)
('Tony Mcdowell',)
('Jake Wood',)
('Fatemeh Akhtar',)
('Kim Weston',)
('Nicholas Beatty',)
('Kirsten Williams',)
('Vaishali Surana',)
('Coby Mccormack',)
('Yasmin Dar',)
('Romy Donnelly',)
('Viswamitra Upandhye',)
('Kendrick Hilpert',)
('Killian Kaufman',)
('Elwood Page',)
('Mukti Patel',)
('Emily Lesch',)
('Elodie Booker',)
('Jedd Kim',)
('Annabel Davies',)
('Adnan Chen',)
('Jonathan Berg',)
('Hank Spinka',)
('Agnes Schneider',)
('Kimberly Green',)
('Rose Coates',)
('Rose Christiansen',)
('Shirley Hintz',)
('Hannah Bayer',)
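Each printed result is a one-element tuple because `groups()` returns all capture groups of a match as a tuple, and its optional argument is only a default for groups that did not participate. A small sketch of the difference between `group()` and `groups()` (the optional " Jr." suffix group is a made-up addition to show an unmatched group):

```python
import re

# Two name groups plus a hypothetical optional suffix group.
m = re.search(r"([A-Z]\w*) ([A-Z]\w*)( Jr\.)?", "Jake Wood")

whole = m.group(0)            # the entire match
first = m.group(1)            # first capture group only
all_groups = m.groups()       # every group; unmatched ones are None
with_default = m.groups("")   # same, but unmatched groups become ""

print(whole)         # Jake Wood
print(first)         # Jake
print(all_groups)    # ('Jake', 'Wood', None)
print(with_default)  # ('Jake', 'Wood', '')
```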

Part B

The dataset file grades.txt contains a line-separated list of people with their grade in a class. Create a regex to generate a list of just those students who received a B in the course.

In [64]:

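import re
def grades():
    with open("assets/grades.txt", "r") as file:
        grades = file.read()

    # YOUR CODE HERE
    # Note: this pattern matches every "First Last" name and ignores the grade,
    # so it returns all students, not only those who received a B.
    m = re.findall(r"([A-Z]\w* [A-Z]\w*)", grades)
    return m
grades()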

Out[64]:

['Ronald Mayr',
 'Bell Kassulke',
 'Jacqueline Rupp',
 'Alexander Zeller',
 'Valentina Denk',
 'Simon Loidl',
 'Elias Jovanovic',
 'Stefanie Weninger',
 'Fabian Peer',
 'Hakim Botros',
 'Emilie Lorentsen',
 'Herman Karlsen',
 'Nathalie Delacruz',
 'Casey Hartman',
 'Lily Walker',
 'Gerard Wang',
 'Tony Mcdowell',
 'Jake Wood',
 'Fatemeh Akhtar',
 'Kim Weston',
 'Nicholas Beatty',
 'Kirsten Williams',
 'Vaishali Surana',
 'Coby Mccormack',
 'Yasmin Dar',
 'Romy Donnelly',
 'Viswamitra Upandhye',
 'Kendrick Hilpert',
 'Killian Kaufman',
 'Elwood Page',
 'Mukti Patel',
 'Emily Lesch',
 'Elodie Booker',
 'Jedd Kim',
 'Annabel Davies',
 'Adnan Chen',
 'Jonathan Berg',
 'Hank Spinka',
 'Agnes Schneider',
 'Kimberly Green',
 'Rose Coates',
 'Rose Christiansen',
 'Shirley Hintz',
 'Hannah Bayer']

In [ ]:

assert len(grades()) == 16
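The assertion expects 16 names, so the pattern has to take the grade into account as well as the name. A minimal sketch, assuming each line has the form `First Last: Grade` (the layout of grades.txt is not shown here, so that format, the sample rows, and the helper name `b_students` are all assumptions):

```python
import re

# Hypothetical sample rows; assumes the layout "<First> <Last>: <Grade>".
sample = """Alice Adams: A
Brian Baker: B
Carla Chen: B
David Diaz: C"""

def b_students(text):
    # The lookahead (?=: B\b) requires ": B" right after the name
    # without including it in the returned match.
    return re.findall(r"[A-Z]\w* [A-Z]\w*(?=: B\b)", text)

result = b_students(sample)
print(result)  # ['Brian Baker', 'Carla Chen']
```

If the real file uses a different separator, only the lookahead part of the pattern would need to change.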

Part C

Consider the standard web log file in assets/logdata.txt. This file records the access a user makes when visiting a web page (like this one!). Each line of the log has the following items:

  • a host (e.g., '146.204.224.152')
  • a user_name (e.g., 'feest6811'; note: sometimes the user name is missing! In this case, use '-' as the value for the username.)
  • the time a request was made (e.g., '21/Jun/2019:15:45:24 -0700')
  • the request type (e.g., 'POST /incentivize HTTP/1.1'; note: not everything is a POST!)

Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:

example_dict = {"host":"146.204.224.152", 
                "user_name":"feest6811", 
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}

In [ ]:

import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:

assert len(logs()) == 979

one_item={'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'}
assert one_item in logs(), "Sorry, this item should be in the log results, check your formatting"
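One way to approach the task is a single pattern with named groups, so that `groupdict()` yields the dictionary directly. The sketch below assumes a Common Log Format style line; the exact layout of logdata.txt is not shown here, so the sample lines (including the trailing status and size fields and the whole second line), the pattern, and the helper name `parse_logs` are assumptions:

```python
import re

# Hypothetical sample in an assumed layout:
# host - user [time] "request" status size
sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '10.0.0.1 - - [21/Jun/2019:15:45:25 -0700] '
    '"GET /index.html HTTP/1.1" 200 512\n'
)

LOG_PATTERN = re.compile(
    r'(?P<host>\S+)'           # host: run of non-space characters
    r' - (?P<user_name>\S+)'   # user name, or "-" when it is missing
    r' \[(?P<time>[^\]]+)\]'   # timestamp between square brackets
    r' "(?P<request>[^"]+)"'   # request line between double quotes
)

def parse_logs(text):
    # One dictionary per matching line, keyed by the group names.
    return [m.groupdict() for m in LOG_PATTERN.finditer(text)]

entries = parse_logs(sample)
print(entries[0])
print(entries[1]["user_name"])  # -
```

Because a missing user name already appears as "-" in the assumed layout, no special case is needed for it in this sketch.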
