" What is shape of Time? "

[Note1] IDSP-Introduction

 

Foreword

This is a new series of my notes on the Coursera course Introduction to Data Science in Python (IDSP). As Christopher Brooks puts it:

The course will introduce data manipulation and cleaning techniques using the popular python pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively.

In other words, the course covers the fundamentals of using Python and its libraries for data science work. It runs for four weeks, with quizzes and assignments. To be honest, some assignments can be challenging, since the course level is Intermediate. Others are very practical because their datasets come from real-life topics such as census, vaccine and grade data. I think this kind of practice matters more in actual data science work than merely learning NumPy and pandas.

Over the next few weeks I will post my notes and some code, mainly to keep a record and to help me recall what I have learned.

 

1 Intro

The first part of Week 1 introduces some useful features of the Python programming language, including:

  • Types and Sequences
  • String
  • CSV files
  • Date and Time
  • Map
  • Lambda functions

 

2 The Python Programming Language: Types and Sequences

Lists are a mutable data structure, and a single list can hold elements of different types.

In [1]:

x = [1, 'a', 2, 'b']
type(x)

Out[1]:

list
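Because lists are mutable, an element can be changed in place, and two names bound to the same list both see the change. Tuples, by contrast, are immutable. A minimal sketch (the variable names are only illustrative):

```python
# Lists are mutable: elements can be reassigned in place
x = [1, 'a', 2, 'b']
x[0] = 100

# Assignment does not copy: y and x now refer to the same list object
y = x
y.append('c')            # x is now [100, 'a', 2, 'b', 'c'] as well

# list(x) (or x.copy()) makes an independent shallow copy
z = list(x)
z.append('d')
print(len(x), len(z))    # 5 6

# Tuples are immutable: item assignment raises TypeError
t = (1, 'a', 2, 'b')
try:
    t[0] = 100
except TypeError as err:
    print('TypeError:', err)
```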

Lists can be appended to, concatenated with +, and repeated with *. Do not confuse this with np.array, where + and * operate element-wise instead.

In [2]:

x.append(3.3)
print(x)
[1, 'a', 2, 'b', 3.3]

In [3]:

[1,2] + [3,4]

Out[3]:

[1, 2, 3, 4]

Use * to repeat lists.

In [4]:

[1]*3

Out[4]:

[1, 1, 1]
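One caveat with *: it repeats references, not copies. Repeating a list that contains another list gives several references to the same inner list, so a list comprehension is the usual way to build independent rows. A small sketch:

```python
# [[0] * 3] * 3 repeats a reference to ONE inner list three times
shared = [[0] * 3] * 3
shared[0][0] = 1
print(shared)        # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# A list comprehension creates a new inner list on each iteration
independent = [[0] * 3 for _ in range(3)]
independent[0][0] = 1
print(independent)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```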

Use the in operator to check if something is inside a list.

In [5]:

1 in [1, 2, 3]

Out[5]:

True

Now let’s look at strings. Use bracket notation to slice a string.

In [6]:

x = 'This is a string'
print(x[0]) #first character
print(x[0:1]) #first character, but we have explicitly set the end character
print(x[0:2]) #first two characters
T
T
Th

This will return the last element of the string.

In [7]:

x[-1]

Out[7]:

'g'

This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end.

In [8]:

x[-4:-2]

Out[8]:

'ri'

This is a slice from the beginning of the string up to, but not including, index 3, i.e. the first three characters.

In [9]:

x[:3]

Out[9]:

'Thi'

And this is a slice starting from the 4th element of the string and going all the way to the end.

In [10]:

x[3:]

Out[10]:

's is a string'
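Slices also accept an optional third value, the step, written x[start:stop:step]. A negative step walks backwards, and out-of-range slice bounds are clipped rather than raising an error, unlike indexing a single position. For example, with the same string:

```python
x = 'This is a string'

every_second = x[::2]   # 'Ti sasrn'  (step of 2)
reversed_x = x[::-1]    # 'gnirts a si sihT'  (negative step walks backwards)
last_word = x[-6:]      # 'string'
too_far = x[100:]       # ''  (slice bounds are clipped, no error)

# Indexing a single position past the end, by contrast, raises IndexError
try:
    x[100]
except IndexError as err:
    print('IndexError:', err)
```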

In [11]:

firstname = 'Christopher'
lastname = 'Brooks'

print(firstname + ' ' + lastname)
print(firstname*3)
print('Chris' in firstname)
Christopher Brooks
ChristopherChristopherChristopher
True

split returns a list of all the words in a string, or a list split on a specific character.

In [12]:

firstname = 'Christopher Arthur Hansen Brooks'.split(' ')[0] # [0] selects the first element of the list
lastname = 'Christopher Arthur Hansen Brooks'.split(' ')[-1] # [-1] selects the last element of the list
print(firstname)
print(lastname)
Christopher
Brooks
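Note that split(' ') splits on every single space, whereas split() with no argument splits on runs of whitespace and never returns empty strings. The optional maxsplit argument limits how many splits are made. A quick comparison on a hypothetical string with a double space:

```python
name = 'Christopher  Arthur Hansen Brooks'   # hypothetical input: double space after Christopher

by_space = name.split(' ')               # ['Christopher', '', 'Arthur', 'Hansen', 'Brooks']
by_whitespace = name.split()             # ['Christopher', 'Arthur', 'Hansen', 'Brooks']
first_and_rest = name.split(maxsplit=1)  # ['Christopher', 'Arthur Hansen Brooks']
last = name.rsplit(maxsplit=1)[-1]       # 'Brooks' (rsplit works from the right)
```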

Make sure you convert objects to strings before concatenating.

In [13]:

'Chris' + 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
----> 1 'Chris' + 2

TypeError: Can't convert 'int' object to str implicitly

In [14]:

'Chris' + str(2)

Out[14]:

'Chris2'

Dictionaries associate keys with values.

In [15]:

x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}
x['Christopher Brooks'] # Retrieve a value by using the indexing operator

Out[15]:

'brooksch@umich.edu'

In [16]:

x['Kevyn Collins-Thompson'] = None
x['Kevyn Collins-Thompson']

Iterate over all of the keys:

In [17]:

for name in x:
    print(x[name])
billg@microsoft.com
brooksch@umich.edu
None

Iterate over all of the values:

In [18]:

for email in x.values():
    print(email)
billg@microsoft.com
brooksch@umich.edu
None

Iterate over all of the items (key-value pairs) in the dictionary:

In [19]:

for name, email in x.items():
    print(name)
    print(email)
Bill Gates
billg@microsoft.com
Christopher Brooks
brooksch@umich.edu
Kevyn Collins-Thompson
None
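Looking up a missing key with x[key] raises a KeyError, while dict.get returns a default instead, and in tests whether a key exists. Also, since Python 3.7 dictionaries preserve insertion order, so on a recent interpreter the loops above print Christopher Brooks before Bill Gates. A sketch (the key 'Ada Lovelace' is just a hypothetical missing entry):

```python
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}
x['Kevyn Collins-Thompson'] = None

# Membership tests check keys, not values
has_bill = 'Bill Gates' in x                 # True

# get() returns None (or a supplied default) instead of raising KeyError
missing = x.get('Ada Lovelace')              # None
fallback = x.get('Ada Lovelace', 'unknown')  # 'unknown'

try:
    x['Ada Lovelace']
except KeyError:
    print('KeyError: no such key')

# Python 3.7+: iteration order is insertion order
names = list(x)   # ['Christopher Brooks', 'Bill Gates', 'Kevyn Collins-Thompson']
```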

You can unpack a sequence into different variables:

In [20]:

x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x

In [21]:

fname

Out[21]:

'Christopher'

In [22]:

lname

Out[22]:

'Brooks'

Make sure the number of values you are unpacking matches the number of variables being assigned.

In [23]:

x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
fname, lname, email = x
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
      1 x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
----> 2 fname, lname, email = x

ValueError: too many values to unpack (expected 3)
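When the number of values is not known in advance, Python 3's extended unpacking lets one starred name collect the leftover items into a list, which avoids the ValueError above:

```python
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')

# *rest absorbs every value not matched by the other names (always a list)
fname, lname, *rest = x
# fname == 'Christopher', lname == 'Brooks'
# rest == ['brooksch@umich.edu', 'Ann Arbor']

# The starred name can also sit in the middle
first, *middle, last = x
# middle == ['Brooks', 'brooksch@umich.edu'], last == 'Ann Arbor'
```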

 

3 The Python Programming Language: More on Strings

It’s important to convert data to the appropriate type before combining it.

In [24]:

print('Chris' + 2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
----> 1 print('Chris' + 2)

TypeError: Can't convert 'int' object to str implicitly

In [25]:

print('Chris' + str(2))
Chris2

Python has a built-in method, str.format, for convenient string formatting.

In [26]:

sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))
Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
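On Python 3.6 and later, the same statement can also be written as an f-string, and a format specification such as :.2f fixes the number of decimal places so floating-point totals print cleanly. A sketch of that alternative:

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}

total = sales_record['num_items'] * sales_record['price']

# f-string: expressions inside {} are evaluated in place; :.2f rounds to 2 decimals
statement = (f"{sales_record['person']} bought {sales_record['num_items']} item(s) "
             f"at a price of {sales_record['price']:.2f} each for a total of {total:.2f}")
print(statement)
# Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96

# The same format spec also works with str.format
same = '{:.2f}'.format(total)   # '12.96'
```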

 

4 Reading and Writing CSV files

Let’s import our datafile mpg.csv, which contains fuel economy data for 234 cars.

  • mpg : miles per gallon
  • class : car classification
  • cty : city mpg
  • cyl : # of cylinders
  • displ : engine displacement in liters
  • drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
  • fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
  • hwy : highway mpg
  • manufacturer : automobile manufacturer
  • model : model of car
  • trans : type of transmission
  • year : model year

In [27]:

import csv

%precision 2

with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
    
mpg[:3] # The first three dictionaries in our list.

Out[27]:

[{'': '1',
  'class': 'compact',
  'cty': '18',
  'cyl': '4',
  'displ': '1.8',
  'drv': 'f',
  'fl': 'p',
  'hwy': '29',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'auto(l5)',
  'year': '1999'},
 {'': '2',
  'class': 'compact',
  'cty': '21',
  'cyl': '4',
  'displ': '1.8',
  'drv': 'f',
  'fl': 'p',
  'hwy': '29',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'manual(m5)',
  'year': '1999'},
 {'': '3',
  'class': 'compact',
  'cty': '20',
  'cyl': '4',
  'displ': '2',
  'drv': 'f',
  'fl': 'p',
  'hwy': '31',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'manual(m6)',
  'year': '2008'}]

Opening the CSV file with pandas is more intuitive. pandas will be covered in detail later.

In [28]:

import pandas as pd

pd.read_csv('mpg.csv',index_col = 0)

Out[28]:

     manufacturer  model   displ  year  cyl  trans       drv  cty  hwy  fl  class
1    audi          a4      1.8    1999  4    auto(l5)    f    18   29   p   compact
2    audi          a4      1.8    1999  4    manual(m5)  f    21   29   p   compact
3    audi          a4      2.0    2008  4    manual(m6)  f    20   31   p   compact
4    audi          a4      2.0    2008  4    auto(av)    f    21   30   p   compact
5    audi          a4      2.8    1999  6    auto(l5)    f    16   26   p   compact
...  ...           ...     ...    ...   ...  ...         ...  ...  ...  ... ...
230  volkswagen    passat  2.0    2008  4    auto(s6)    f    19   28   p   midsize
231  volkswagen    passat  2.0    2008  4    manual(m6)  f    21   29   p   midsize
232  volkswagen    passat  2.8    1999  6    auto(l5)    f    16   26   p   midsize
233  volkswagen    passat  2.8    1999  6    manual(m5)  f    18   26   p   midsize
234  volkswagen    passat  3.6    2008  6    auto(s6)    f    17   26   p   midsize

234 rows × 11 columns

csv.DictReader has read each row of our CSV file as a dictionary. len shows that our list consists of 234 dictionaries.

In [29]:

len(mpg)

Out[29]:

234

keys gives us the column names of our csv.

In [30]:

mpg[0].keys()

Out[30]:

dict_keys(['', 'drv', 'manufacturer', 'year', 'displ', 'fl', 'cty', 'model', 'hwy', 'cyl', 'class', 'trans'])

This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float.

In [31]:

sum(float(d['cty']) for d in mpg) / len(mpg)

Out[31]:

16.86

Similarly this is how to find the average hwy fuel economy across all cars.

In [32]:

sum(float(d['hwy']) for d in mpg) / len(mpg)

Out[32]:

23.44

Use set to return the unique values for the number of cylinders the cars in our dataset have.

In [33]:

cylinders = set(d['cyl'] for d in mpg)
cylinders

Out[33]:

{'4', '5', '6', '8'}

Here’s a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.

In [34]:

CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

Out[34]:

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
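The nested loop above scans the whole list once per cylinder level. A single-pass alternative accumulates sums and counts in dictionaries keyed by cylinder, here using collections.defaultdict. So that the sketch runs on its own, it uses only the cyl and cty fields of the ten rows shown in the table above; with the full dataset you would loop over mpg instead:

```python
from collections import defaultdict

# The ten rows shown in the mpg table above (only the fields we need);
# with the full dataset, loop over `mpg` instead.
rows = [
    {'cyl': '4', 'cty': '18'}, {'cyl': '4', 'cty': '21'}, {'cyl': '4', 'cty': '20'},
    {'cyl': '4', 'cty': '21'}, {'cyl': '6', 'cty': '16'}, {'cyl': '4', 'cty': '19'},
    {'cyl': '4', 'cty': '21'}, {'cyl': '6', 'cty': '16'}, {'cyl': '6', 'cty': '18'},
    {'cyl': '6', 'cty': '17'},
]

sums = defaultdict(float)   # cylinder -> total city mpg
counts = defaultdict(int)   # cylinder -> number of cars

for d in rows:              # a single pass over the data
    sums[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

cty_mpg_by_cyl = sorted((c, sums[c] / counts[c]) for c in sums)
print(cty_mpg_by_cyl)       # [('4', 20.0), ('6', 16.75)]
```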

Use set to return the unique values for the class types in our dataset.

In [35]:

vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

Out[35]:

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

And here’s an example of how to find the average hwy mpg for each class of vehicle in our dataset.

In [36]:

HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

Out[36]:

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
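This section only reads CSV data, but the standard csv module can write it too: csv.DictWriter is the counterpart of csv.DictReader, writing a header row and then one row per dictionary. A minimal sketch, assuming two hypothetical summary rows and an in-memory io.StringIO buffer in place of a real file:

```python
import csv
import io

# Hypothetical summary rows; a real script might write the HwyMpgByClass results
rows = [{'class': 'compact', 'avg_hwy': 28.3}, {'class': 'pickup', 'avg_hwy': 16.88}]

# io.StringIO stands in for a file; for a real (hypothetical) file use
# open('out.csv', 'w', newline='') as the csv docs recommend
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['class', 'avg_hwy'])
writer.writeheader()        # first line: class,avg_hwy
writer.writerows(rows)      # one line per dictionary

# Read it back: as with mpg.csv, every value comes back as a string
buffer.seek(0)
round_trip = list(csv.DictReader(buffer))
```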

 

5 The Python Programming Language: Dates and Times

In [37]:

import datetime as dt
import time as tm

time returns the current time in seconds since the Epoch (January 1st, 1970, UTC).

In [38]:

tm.time()

Out[38]:

1605686091.61

Convert the timestamp to datetime.

In [39]:

dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

Out[39]:

datetime.datetime(2020, 11, 18, 15, 54, 51, 923187)

Handy datetime attributes:

In [40]:

dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second # get year, month, day, etc. from a datetime

Out[40]:

(2020, 11, 18, 15, 54, 51)

timedelta is a duration expressing the difference between two dates.

In [41]:

delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta

Out[41]:

datetime.timedelta(100)

date.today returns the current local date.

In [42]:

today = dt.date.today()

In [43]:

today - delta # the date 100 days ago

Out[43]:

datetime.date(2020, 8, 10)

In [44]:

today > today-delta # compare dates

Out[44]:

True
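Subtracting one date from another gives a timedelta, and strftime/strptime convert between dates and strings. A sketch using fixed dates taken from the outputs above (instead of today()) so the results are reproducible:

```python
import datetime as dt

today = dt.date(2020, 11, 18)     # fixed date instead of dt.date.today()
delta = dt.timedelta(days=100)

past = today - delta              # datetime.date(2020, 8, 10)
gap = today - past                # a timedelta; gap.days == 100

# Format a date as text, and parse text back into a datetime
label = past.strftime('%Y-%m-%d')                                  # '2020-08-10'
parsed = dt.datetime.strptime('2020-11-18 15:54', '%Y-%m-%d %H:%M')
print(label, gap.days, parsed.hour)                                # 2020-08-10 100 15
```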

 

6 The Python Programming Language: Objects and map()

An example of a class in python:

In [45]:

class Person:
    department = 'School of Information' #a class variable

    def set_name(self, new_name): #a method
        self.name = new_name
    def set_location(self, new_location):
        self.location = new_location

In [46]:

person = Person()
person.set_name('Christopher Brooks')
person.set_location('Ann Arbor, MI, USA')
print('{} lives in {} and works in the department {}'.format(person.name, person.location, person.department))
Christopher Brooks lives in Ann Arbor, MI, USA and works in the department School of Information
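Setting attributes through methods like set_name works, but the more common pattern is an __init__ method, which Python calls automatically when an object is created. A sketch of an equivalent class (the name PersonWithInit is hypothetical, chosen only to avoid clashing with Person):

```python
class PersonWithInit:
    department = 'School of Information'   # class variable, shared by all instances

    def __init__(self, name, location):    # runs when PersonWithInit(...) is called
        self.name = name                    # instance variables, one set per object
        self.location = location

person = PersonWithInit('Christopher Brooks', 'Ann Arbor, MI, USA')
summary = '{} lives in {} and works in the department {}'.format(
    person.name, person.location, person.department)
print(summary)
```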

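Here’s an example of mapping the min function over two lists, pairing their elements position by position. map uses lazy evaluation: it returns a map object, and the values are only computed when you iterate over it.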

In [47]:

store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]
cheapest = map(min, store1, store2)
cheapest

Out[47]:


Now let’s iterate through the map object to see the values.

In [48]:

for item in cheapest:
    print(item)
9.0
11.0
12.34
2.01
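Because map is lazy, it returns an iterator that can be consumed only once: a second pass over the same map object produces nothing. Wrapping it in list() stores the results if they are needed more than once. A small sketch:

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
first_pass = list(cheapest)    # [9.0, 11.0, 12.34, 2.01]
second_pass = list(cheapest)   # [] (the iterator is already exhausted)

# Store the results in a list to reuse them
cheapest_list = list(map(min, store1, store2))
```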

In [49]:

people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_and_name(person):
    title = person.split()[0]      # first word: the title
    lastname = person.split()[-1]  # last word: the last name
    return '{} {}'.format(title, lastname)

list(map(split_and_name, people))

Out[49]:

['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']

 

7 The Python Programming Language: Lambda and List Comprehensions

Here’s an example of lambda that takes in three parameters and adds the first two.

In [50]:

my_function = lambda a, b, c : a + b

In [51]:

my_function(1, 2, 3)

Out[51]:

3
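A lambda is simply an anonymous, single-expression function, so my_function behaves exactly like one defined with def. Lambdas are most handy as short arguments to other functions, like the key= used with sort in the CSV section. A sketch (the pairs reuse rounded values from the HwyMpgByClass output):

```python
my_function = lambda a, b, c: a + b

def my_function_def(a, b, c):
    return a + b          # same behavior as the lambda above

same = my_function(1, 2, 3) == my_function_def(1, 2, 3)   # True, both give 3

# Typical use: a throwaway key function, here sorting by the second field
pairs = [('pickup', 16.88), ('compact', 28.30), ('suv', 18.13)]
by_mpg = sorted(pairs, key=lambda p: p[1])
# [('pickup', 16.88), ('suv', 18.13), ('compact', 28.3)]
```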

Let’s iterate from 0 to 49 and return the even numbers.

In [52]:

my_list = []
for number in range(0, 50):
    if number % 2 == 0:
        my_list.append(number)
my_list

Out[52]:

[0,
 2,
 4,
 6,
 8,
 10,
 12,
 14,
 16,
 18,
 20,
 22,
 24,
 26,
 28,
 30,
 32,
 34,
 36,
 38,
 40,
 42,
 44,
 46,
 48]

Now the same thing but with list comprehension.

In [53]:

my_list = [number for number in range(0,50) if number % 2 == 0]
my_list

Out[53]:

[0,
 2,
 4,
 6,
 8,
 10,
 12,
 14,
 16,
 18,
 20,
 22,
 24,
 26,
 28,
 30,
 32,
 34,
 36,
 38,
 40,
 42,
 44,
 46,
 48]
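The same list can be built in other ways: filter with a lambda combines the two ideas of this section, and range with a step of 2 skips the test altogether. Comprehensions also extend to dictionaries and sets. A short comparison:

```python
evens_comp = [number for number in range(0, 50) if number % 2 == 0]
evens_filter = list(filter(lambda n: n % 2 == 0, range(0, 50)))
evens_step = list(range(0, 50, 2))
print(evens_comp == evens_filter == evens_step)   # True

# Dictionary and set comprehensions use the same syntax with braces
squares = {n: n * n for n in range(5)}    # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
remainders = {n % 3 for n in range(10)}   # {0, 1, 2}
```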