Lecture 1: Introduction to Data Science & Python#

This is the lecture note for the first class of INF80054 Data Science Fundamentals. Over the next 5 weeks, we will cover the topics of this unit in 10 face-to-face classes.

Note: This unit has no programming prerequisite, but it is intended for first-year postgraduate students with some computer literacy. Some background in (statistical) data analysis would also be very useful.

What is programming using a computer language (such as Python)?#

Programming a computer basically means telling the computer what to do using a language that the computer understands. Python is one such language. It enables us to instruct the computer, which at the lowest level only understands binary 0|1 or ON|OFF instructions, using a set of primitives and syntax (rules) which, for ease of interpretation by us, are as close as possible to a natural language such as English.

Syntax and semantics#

Comparing a natural language (such as English) with a programming language, we can see that both have a syntax, that is, a set of rules. For example, the sentence “Cats love raw fish.” is a valid English sentence, but “Cats raw fish love.” is not. In this case, the second sentence has a syntax error.

Similarly, the statement 5*2.5 is a valid Python statement, but the statement 5"Yes" is not. Again, in this case, the second statement is not syntactically valid (that is, it has a syntax error).

Primitives#

We note that the above English statements contain words. These words (and numbers) are the primitives of the English language. In comparison, Python’s primitives consist of numbers, strings, and operators.

Semantics#

Semantics refers to the meaning of a syntactically valid expression or statement. An expression can be syntactically valid and still be wrong in meaning. If a syntactically valid expression has no sensible meaning, we say that the expression has a static semantic error.

For example, the expression 5 * 2.5 is considered syntactically valid by the Python interpreter. If we indeed intend to compute the product of 5 and 2.5, then it is also semantically valid. However, the expression 5 + "yes" is syntactically valid but has a static semantic error, because it is not clear what adding a number to a string should mean. Python reports this as a TypeError when the expression is evaluated.

We also note that while an English statement may have multiple meanings (for example, “That man has a green thumb”), a Python statement can only have one meaning. Thus, in English, a logical or semantic error can arise from the multiplicity of meanings, and in such cases it is often hard to tell which meaning is intended.

In Python, a semantic error occurs when the single precise meaning of an expression is not the meaning we intended.

Consequences of different types of errors#

Syntax errors are easily detected by both the programmer and the compiler/interpreter. Many file editors and IDEs (integrated development environments) are syntax-aware and can flag syntax errors as we type.

Static semantic errors may or may not be detected by the compiler/interpreter before the program runs. However, they are usually detectable with relative ease. In contrast, semantic errors (or logical errors) are not detected by the compiler/interpreter at all, and they are often difficult for the programmer to detect.

Semantic errors can lead to program crashes, programs that hang or run forever, and incorrect outputs.
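
The three kinds of error described above can be illustrated with a short sketch. It uses the built-in compile() function only as a convenient way to check syntax without running the code, and try/except to catch the errors so that the script keeps going. The averaging example and all variable names are illustrative assumptions, not part of the unit material.

```python
# 1. Syntax error: detected before any code runs.
try:
    compile('5"Yes"', "<example>", "eval")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
print("Is 5\"Yes\" valid syntax?", syntax_ok)

# 2. Static semantic error: valid syntax, but no sensible meaning.
try:
    result = 5 + "yes"
except TypeError as err:
    result = None
    print("5 + 'yes' raised TypeError:", err)

# 3. Semantic (logical) error: runs without complaint, but is not what we meant.
# Intended: the average of 4 and 6.
wrong_average = 4 + 6 / 2      # division is done first, giving 7.0
right_average = (4 + 6) / 2    # gives 5.0
print("wrong:", wrong_average, "right:", right_average)
```

Only the third case produces no message at all, which is why logical errors are the hardest to find.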

Programming in Python: A general overview#

Python script#

A Python program (Python script) contains a sequence of definitions and statements (commands). The Python interpreter evaluates the definitions and executes the commands.

A Python program (in other words, a set of Python definitions and statements) can be typed directly into a Python console to be interpreted/executed.

A Python program can also be stored in a text file (as a Python script), which is then read and interpreted (evaluated and executed) by the Python interpreter.

Object Oriented Programming (OOP)#

Everyday objects have a type that defines what we can do with them. For example, “flowers” are objects which cannot talk or walk, but can be used as beautiful decorations or ingredients. “Humans” are objects who can talk and walk.

In a similar sense, everything in Python is an object. A Python program works by defining/initialising and manipulating data objects. For example, if X is an integer object, then we can add, subtract, multiply, divide, etc. If Y is a string object, then we can take a shorter portion of it, build a new string with some of its letters changed, etc.

In Python, there are scalar and non-scalar objects. Scalar objects cannot be subdivided. Non-scalar objects are those with an accessible internal structure.

Python’s scalar objects#

int is a Python scalar object of type integer. For example, 9 is an int object.

float is a Python scalar object of type real number. For example, 3.14 is a float object.

Note

(Note: in Python, a float literal is written with a decimal point, for example 3.0, or with an exponent, for example 1e3.)

bool is a Python scalar object of Boolean type with only two possible values: True and False.

NoneType is a special type with only one value: None.

To find the type of an object in Python, use the function: type()

Let’s open a Windows shell from Anaconda: Start -> Anaconda -> Anaconda prompt. Now, to open a Python console, enter “python” at the shell prompt (without the quotes). Then, at the Python console’s prompt, enter type(9). Python will respond with <class 'int'>.

Now, at the Python console’s prompt, enter type(3.14). Python will respond with <class 'float'>, the type of the float scalar object 3.14.

# We can also do the above type checking using Python expressions in a script file instead of interactively on console. 
# Again, to find the type of an object in Python, we can use the built-in function type() as follows 
# (Note: we use the built-in function print() to print the object type. 
# Q: What would happen if we don't use the print function?)

print(type(9))
print(type(3.14))

# We could also use an f-string (formatted string literal) if we want to provide some details to explain what is being printed
print(f'The object type of an integer number 9 is {type(9)}')
print(f'The object type of a real number 3.14 is {type(3.14)}')

<class 'int'>
<class 'float'>
The object type of an integer number 9 is <class 'int'>
The object type of a real number 3.14 is <class 'float'>
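
The same check can be extended to the other scalar types listed above. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
flag = True         # bool
nothing = None      # NoneType
whole = 3           # int: no decimal point
real = 3.0          # float: has a decimal point
scientific = 1e3    # float: written with an exponent (1000.0)

print(type(flag))
print(type(nothing))
print(type(whole))
print(type(real))
print(type(scientific))
```

Note that 3 and 3.0 have different types even though they represent the same number.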

Type casting#

It is often possible to convert an object of one type into an object of another type. For example, 3 is an int object. If we type float(3), we create a new float object with a value of 3.0.

# First, let's assign 3 to a variable x
x = 3
print(f'The type of x is {type(x)}')
print(f'Now, after casting to float, the object type of x is {type(float(x))}')
print(f'even if the value of x is still {x}')

# alternatively, we can assign the cast object to a new variable
y = float(x)
z = float(3)
print(f'The type of y is {type(y)}')
print(f'The type of z is {type(z)}')

The type of x is <class 'int'>
Now, after casting to float, the object type of x is <class 'float'>
even if the value of x is still 3
The type of y is <class 'float'>
The type of z is <class 'float'>
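
Casting in the other direction, from float to int, can lose information. The sketch below (with illustrative variable names) shows that int() simply drops the fractional part, whereas the built-in round() goes to the nearest whole number.

```python
truncated = int(3.9)             # fractional part is dropped, giving 3
truncated_negative = int(-3.9)   # truncation is towards zero, giving -3
rounded = round(3.9)             # nearest whole number, giving 4
parsed = float("2.5")            # a numeric string can be cast to float
text = str(9)                    # a number can be cast to the string "9"

print(truncated, truncated_negative, rounded, parsed, text)
print(type(text))
```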

Python expression#

We combine a sequence of Python objects and operators to form expressions.

The syntax for a simple Python expression is: <object> <operator> <object>

Each expression in Python evaluates to a value, and that value has a type.

Note

(Remember, in Python everything is an object)

Python operators for `int` and `float`#

Addition operator (+): x + y
Subtraction operator (-): x - y
Multiplication operator (*): x * y

Note

If x is an int and y is a float, then the expression has a float type.

Division operator (/): x / y

Note

A division expression always has a float type, even when both operands are int.

Modulus or remainder operator (%): x % y evaluates to the remainder of x divided by y
Power operator (**): x ** y evaluates to x to the power of y
Floor division operator (//): x // y evaluates to the floor of x divided by y

x = 5
y = 3

print(x + y)

z = x + y
print(z) 

print(f'x + y = {x + y}')
print(f'z = {z}')

print('x + y =', x + y)
print('z = ', z)

print('x - y =', x - y)
print('x / y =', x / y)

print(f'x = {x}, y = {y}, x % y = {x % y}')
print('x =', x, ', y = ', y, ', x // y =', x // y)

8
8
x + y = 8
z = 8
x + y = 8
z =  8
x - y = 2
x / y = 1.6666666666666667
x = 5, y = 3, x % y = 2
x = 5 , y =  3 , x // y = 1
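
Floor division and modulus belong together: the quotient and the remainder always rebuild the original number. The following sketch, using values chosen for illustration, also checks the result types mentioned in the notes above.

```python
x = 17
y = 5

quotient = x // y     # 3
remainder = x % y     # 2
rebuilt = quotient * y + remainder
print(quotient, remainder, rebuilt == x)

ratio = 6 / 3         # division gives a float even for two ints
mixed = x + 2.0       # int combined with float gives a float
power = 2 ** 10
print(ratio, type(ratio))
print(mixed, type(mixed))
print(power)

# Floor division rounds down, which matters for negative numbers.
neg_quotient = -7 // 2
neg_remainder = -7 % 2
print(neg_quotient, neg_remainder)
```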

Python assignment#

In the previous examples, we used statements such as x = 5 and y = 3, which assign the int value 5 to a variable x and the int value 3 to a variable y.

Formally, an assignment binds a value to a variable name. In Python, we use the equal sign (=) as the assignment operator. For example, if we want to assign the value 3.14 to a variable pi, we write the following in a Python shell or terminal (or in a Python script file):

pi = 3.14
pi

3.14

In this example, 3.14 is stored in a location in the computer memory with an address marked as pi. Entering the variable name pi retrieves the value we bound to it.

Note

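The assignment operator (=) is not the same as equality in mathematics. For example, radius = radius * 2 makes no sense as an equation, but as an assignment it computes radius * 2 and rebinds the name radius to the result. We can therefore write the following series of Python statements: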

pi = 3.14
radius = 5.0
area = pi * radius**2
print(f'area = {area}')
radius = radius * 2
area = pi * radius**2
print(f'area = {area}')

area = 78.5
area = 314.0

Rebinding: what happens to the storage in computer memory?
When we rebind a variable to a new value, the original value may still exist in memory, but we can no longer reach it through that variable name.

Note

We use variable names for the purpose of ‘abstraction’, that is we can re-assign or rebind different values to the variable. Q: what are the rules on variable naming?
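
One way to explore the naming question is to let Python check candidate names for us. The sketch below relies on the string method isidentifier() and the standard-library keyword module; the list of candidate names is made up for illustration.

```python
import keyword

candidates = ["radius", "area_2", "_hidden", "2nd_value", "my-var", "for", "print"]

usable = []   # a list collecting the names that pass both checks
for name in candidates:
    # A usable name must follow the identifier rules (letters, digits and
    # underscores, not starting with a digit) and must not be a reserved keyword.
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: usable as a variable name? {valid}")
    if valid:
        usable.append(name)

print(usable)
```

A name such as print passes both checks because it is a built-in function rather than a keyword, but binding a value to it would hide the function, so it is best avoided.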

STRING object type#

In Python, a string is defined by enclosing the elements in matching pairs of double (") or single (') quotation marks.

String elements: letters, digits, spaces, special characters.

Example 1: the addition operator + concatenates two strings.

Example 2: the multiplication operator * replicates a string when the multiplier is an integer.

Note

For more detailed examples, see the W3Schools Python tutorial on strings

# The above examples in executable Python code format

# Example 1
name ='Jack'
greet = "Hi there!"
print(greet + name)

# Example 2
greet3 = greet * 3 + " "
print(greet3 + name)

Hi there!Jack
Hi there!Hi there!Hi there! Jack

As shown in the above examples, some Python operators behave differently for strings, as defined by Python’s string class (see the Python documentation).

In the examples above and below, the addition operator + concatenates two strings and the multiplication operator * replicates a string when the multiplier is an integer.

Q: What happens if we do not add the single space character (" ")?

greet = 'Hi there!'
name = "Jack"
added = greet + " " + name
print('greet + name = ', added)

greet3 = greet * 3 + " "
print(greet3)

greet3a = (greet+" ")*3
print(greet3a)

greet + name =  Hi there! Jack
Hi there!Hi there!Hi there! 
Hi there! Hi there! Hi there! 
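
The string operators are strict about their operand types. As a hedged sketch (the variable names and the fallback values are illustrative), the code below shows that + needs two strings and * needs an integer multiplier, and that str() lets us combine a string with a number.

```python
greet = "Hi there!"
count = 3

# str + int is not allowed; cast the number with str() first.
try:
    message = greet + count
except TypeError as err:
    print("TypeError:", err)
    message = greet + " x" + str(count)
print(message)

# str * float is not allowed either; the multiplier must be an int.
try:
    repeated = greet * 2.0
except TypeError as err:
    print("TypeError:", err)
    repeated = greet * 2
print(repeated)
```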

The print() function#

We use the built-in print() function to send output to the console (or to another output destination, such as a file, using redirection: https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout).

Note that in Python 3, print is a built-in function rather than a reserved keyword. We should still avoid using print as a variable name, because doing so would hide the function.

y = 100
print(y)
print("The number is", y, ".", "y = ", y)

ystring = str(y) # Note: here we are typecasting the int object y into a string
print("The number is " + ystring + ". " + "y = " + ystring)

print(f"The number is {y}. y = {y}")

100
The number is 100 . y =  100
The number is 100. y = 100
The number is 100. y = 100
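
The extra spaces in the first form of the output come from print() placing a separator between its arguments. The sketch below shows the sep and end parameters of print(), and then uses contextlib.redirect_stdout with an in-memory io.StringIO buffer (chosen here only so the example needs no file) to send the output somewhere other than the console.

```python
import io
from contextlib import redirect_stdout

y = 100
print("The number is", y, ".")             # default separator is one space
print("The number is ", y, ".", sep="")    # no separator between arguments
print("no newline here", end="")           # end replaces the final newline
print(" ... still on the same line")

# Redirect what print() writes into a buffer instead of the console.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("y", y, sep=" = ")
captured = buffer.getvalue()
print(repr(captured))
```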

Branching (Program Control Flow)#

The `if <condition>` statements#

The if <condition> statement executes all the statements within its indented block when the specified <condition> evaluates to the Boolean value True.

[Diagram: if-condition]

For multiple alternative conditions, we can use the keyword elif as shown in the diagram below:

[Diagram: if-elif]

We can also nest if <condition> blocks inside one another, such as:

[Diagram: nesting-if]
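
A minimal sketch of the branching forms described above, written as code. The temperature scenario and all names and thresholds are invented for illustration.

```python
temperature = 31      # hypothetical reading in degrees Celsius
is_raining = False

# if / elif / else: only the first branch whose condition is True is executed.
if temperature > 35:
    advice = "very hot"
elif temperature > 25:
    advice = "warm"
else:
    advice = "cool"
print(advice)

# Nested if: the inner test is only reached when the outer condition is True.
if temperature > 25:
    if is_raining:
        plan = "warm but wet"
    else:
        plan = "go outside"
else:
    plan = "stay in"
print(plan)
```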

Comparison operators#

Let x and y be variable names bound to objects of compatible types (for example, both numbers or both strings). Then, each of the comparison expressions below evaluates to a Boolean value as described:

x < y will evaluate to True if x is strictly less than y, otherwise it evaluates to False
x >= y will evaluate to True if x is greater than or equal to y, otherwise it evaluates to False
x == y will evaluate to True if x is equal to y, otherwise it evaluates to False
x != y will evaluate to True if x is not equal to y, otherwise it evaluates to False

Logical operators: `and`, `or`, `not`#

Let x and y be variable names bound to objects of type bool. Then, each of the logical expressions below evaluates to a Boolean value as described:

not x will evaluate to True if x is False; it evaluates to False if x is True
not y will evaluate to False if y is True; it evaluates to True if y is False
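x and y will evaluate to True only if both x and y are True; otherwise it evaluates to False (for example, when x is False and y is True)
x or y will evaluate to True if at least one of x and y is True (for example, when x is False and y is True); it evaluates to False only if both are False

[Diagram: logictable]

Below are some examples of comparison expressions. For more examples of Python comparison expressions and conditional statements, see the W3Schools tutorials.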

pset_time = 15
sleep_time = 8
print (sleep_time > pset_time)

drive = True
drink = False
both = drink and drive
print(both)

False
False
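
The full truth table for the three logical operators can be generated with two nested loops. This is a small sketch; the rows list is used only to collect the results.

```python
rows = []
print("x      y      x and y  x or y  not x")
for x in (True, False):
    for y in (True, False):
        rows.append((x, y, x and y, x or y, not x))
        print(f"{str(x):6} {str(y):6} {str(x and y):8} {str(x or y):7} {not x}")
```

The only row where x and y is True is the one where both operands are True, and the only row where x or y is False is the one where both are False.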

Getting user’s input from the console#

We use the built-in function input() to get user’s input. For example,

>>> x = input("type anything: ")
>>> print(x)

Note

input() returns the user’s input as a str object (i.e. a string). If we want a numeric object type, we need to cast the input to the numeric type we want, and we need to ensure that the entered input consists of numeric characters; otherwise the cast raises a ValueError.

>>> y = input("type a number: ")
>>> print(type(y))
>>> print(f"your input number is: {y}")
>>> print(y*3)
>>> print(int(y)*3)
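
To show the difference without waiting for keyboard input, the sketch below uses fixed strings that stand in for what input() would return. The check with the string method isdigit() is one simple way to validate the text before casting; note that it only accepts non-negative whole numbers, so it is an illustration rather than a complete validation.

```python
raw = "7"        # stands in for: raw = input("type a number: ")

as_text = raw * 3         # string replication
as_number = int(raw) * 3  # arithmetic after casting
print(as_text, as_number)

# Only cast when the text consists of digits.
if raw.isdigit():
    number = int(raw)
else:
    number = None
print(number)

bad = "seven"    # stands in for a non-numeric entry
if bad.isdigit():
    bad_number = int(bad)
else:
    bad_number = None
print(bad_number)
```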

Indentation and code block#

In a Python script, indentation matters because it defines the code blocks to be evaluated. The Python style guide (PEP 8) recommends using 4 spaces per indentation level. (In Spyder and other Python-aware editors, this is usually what a single Tab keystroke inserts.)

In the example below, indentation separates the code into three levels: the top-level statements, the blocks indented once under if, elif and else, and the block indented twice under the nested if.

x = float("8")
y = float("7") 
if x == y: 
    print("x and y are equal") 
    if y != 0: 
        print("therefore, x / y is", x/y) 
elif x < y: 
    print("x is smaller") 
else: 
    print("y is smaller") 
print("thanks!")

y is smaller
thanks!

Assignment operator (=) vs Comparison operator (==)#

Q: What happens if we accidentally use = instead of == when we want to write a comparison?
A: We will get a syntax error message.

x = float("10")
y = float("12")

# Note: in a real scenario, y would come from the user, for example
# y = float(input("Enter a number for y: ")); a fixed value is used here for demonstration

if x = y: # Note: this line should be `if x == y:` in a real scenario
    print("x and y are equal") 
    if y != 0:
        print("therefore, x / y is", x/y) 
if x == y: 
    print("x and y are equal") 
    if y != 0:
        print("therefore, x / y is", x/y) 

  Cell In[11], line 7
    if x = y: # Note: this line should be `if x == y:` in a real scenario
       ^
SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='?

Loop and iteration (Program Control Flow)#

The `while` loop#

<initialisation expression>
while <condition>:
    <loop expression>
    <loop expression>
    ...

If <condition> evaluates to True, then

all the <loop expression>s are evaluated
then the <condition> is checked again
the whole process repeats until <condition> evaluates to False.

Note that we need to ensure that at some point the <condition> evaluates to False (or that a break statement is reached in the loop body); otherwise the loop runs forever. We also need to ensure that the <initialisation expression> makes the <condition> evaluate to True at the start if we want the loop body to run at all.

# See https://sentry.io/answers/print-colored-text-to-terminal-with-python/ for more text colours
RED = '\033[31m'
GREEN = '\033[32m'
RESET = '\033[0m'

print("You are in the Lost Forest")
print(GREEN + "**************************" + RESET)
print(RED + "☺" + RESET)
print(GREEN + "**************************" + RESET)
print("Go 'left' or 'right?'")

n = 'right'
n = input("You're in the Lost Forest. Go left or right? ")
while n != "left": 
    n = input("You're in the Lost Forest. Go left or right? ") 
print("You got out of the Lost Forest!")
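
The Lost Forest loop depends on what the user types, so it cannot be replayed here automatically. As a non-interactive sketch of the same idea, the loop below keeps doubling a value until it exceeds a limit; we do not know in advance how many iterations that takes. The numbers and names are chosen for illustration.

```python
value = 1          # initialisation: makes the condition True at the start
doublings = 0

while value <= 1000:
    value *= 2     # each pass moves value towards making the condition False
    doublings += 1

print(f"Doubled {doublings} times to reach {value}")
```

Because value grows on every pass, the condition eventually becomes False and the loop is guaranteed to stop.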

Using `while` loop to loop a fixed number of times#

The while loop is useful when we do not have a fixed number of iterations in mind. However, we can still use it when the number of iterations is fixed.

# Loop 5 times to print a sequence of numbers from 0 to 4
n = 0
while n < 5:
    print(n)
    n += 1 # note: n += 1 is equivalent to n = n + 1

The `for` loop#

for <variable> in range(<some_number>): 
    <loop expression>
    <loop expression>
    ...

In each for loop iteration:

<variable> takes a value from range(), starting from the lower bound of the range()
the <loop expression>s are evaluated
for the next iteration, <variable> takes the previous value plus the step value
this repeats until the next value would reach or pass the upper bound of range().

The range() “function” syntax:

range(<stop>)
range(<start>, <stop>)
range(<start>, <stop>, <step>)
<stop> must be specified; <start> and <step> are optional. The default values: <start> = 0 and <step> = 1
The values produced never include <stop> itself; with the default <step> of 1, the last value is (<stop> - 1)

(for more examples see: W3Schools for-loop tutorials)

runsum = 0
for i in range(11):
    runsum += i
    print(i, runsum)
print(runsum)

runsum = 0
for i in range(1, 11):
    runsum += i
    print(i, runsum)
print(runsum)

runsum = 0
for i in range(2,11,2):
    runsum += i
    print(i, runsum)
print(runsum)
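
To see exactly which values each range() call produces, we can convert it to a list. The sketch below does this for the three calls used above, plus a descending range added here as an extra illustration, and checks the running totals with the built-in sum().

```python
first = list(range(11))          # 0, 1, ..., 10
second = list(range(1, 11))      # 1, 2, ..., 10
third = list(range(2, 11, 2))    # even numbers from 2 to 10
descending = list(range(10, 0, -3))   # a negative step counts down

print(first, sum(first))
print(second, sum(second))
print(third, sum(third))
print(descending)
```

In every case the stop value itself is left out of the sequence.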

When to use `for` loop and when to use `while` loop#

`for` loop	`while` loop
use this if the number of iterations is fixed (known)	use this if the number of iterations is unbounded (not known in advance)
can be rewritten as a `while` loop	may not be possible to rewrite as a `for` loop
uses a counter (the loop variable), which is updated automatically	can use a counter, but it must be initialised before the loop and updated inside it

# for loop 5 times and print
for n in range(5):
    print(f"n = {n}")

# while loop 5 times and print
n = 0 # if counter is used, must be initialised
while n < 5:
    print(f"while loop's n = {n}")
    n += 1 # counter must be updated

n = 0
n = 1
n = 2
n = 3
n = 4
while loop's n = 0
while loop's n = 1
while loop's n = 2
while loop's n = 3
while loop's n = 4

The `break` statement#

We use break to immediately exit the loop in which the break statement is encountered. Any remaining expressions in that loop's code block are skipped. However, note that for nested loops, break exits only the innermost loop that contains it!

In the while loop example below, the <expression_b> is never evaluated after break is reached.

while <condition_1>:
    while <condition_2>:
        <expression_a>
        break
        <expression_b>
    <expression_c>

# break in a nested while loop
n = 0
while n<5:
    while n>2:
        print("within the break code block, but before break")
        break
        print("within the break code block, but after break")
    print("outside the break code block", "n =", n)
    n+=1

# conditional break from a loop as shown
# Q: How many iteration(s) would the for-loop below run?
mysum = 0
for i in range(5, 11, 2):
    mysum += i
    if mysum == 5:
        break
        mysum +=1
print(mysum)

outside the break code block n = 0
outside the break code block n = 1
outside the break code block n = 2
within the break code block, but before break
outside the break code block n = 3
within the break code block, but before break
outside the break code block n = 4
5
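
A common use of a conditional break is to stop searching as soon as an answer is found. The sketch below looks for the first multiple of 7 in a range and counts how many values were checked; the range and names are chosen for illustration.

```python
first_multiple = None
checked = 0

for n in range(50, 100):
    checked += 1
    if n % 7 == 0:
        first_multiple = n
        break          # stop looking: the remaining values are never visited

print(f"Found {first_multiple} after checking {checked} values")
```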

String manipulation#

Some string operators#

The + operator concatenates strings, returning a string consisting of the joined operand strings

>>> print("abc" + "def" + "ghi")
abcdefghi

The * operator creates n copies of a string, joined together:

s = "Go."
n = 4
s * n # evaluates to "Go.Go.Go.Go."
4 * s # evaluates to "Go.Go.Go.Go."

The in operator returns True if the first operand is contained within the second.

s = "burn"
s in "Swinburne" # evaluates to True
s not in "Melbourne" # evaluates to True

print("abc" + "def" + "ghi")

s = "Go."
n = 4
print(s * n)
print(4 * s)

s = "burn"
print(s in "Swinburne")
print(s not in "Melbourne")

abcdefghi
Go.Go.Go.Go.
Go.Go.Go.Go.
True
True

String indexing#

With string indexing we can refer to the specific characters in certain positions that form the string. For example,

y = "foobar"

The indexing for the string y = "foobar" is shown in the diagram below:
character:       f   o   o   b   a   r
positive index:  0   1   2   3   4   5
negative index: -6  -5  -4  -3  -2  -1

Notice that for the 6-character string "foobar", we can use positive indexing (0 to 5) or negative indexing (-6 to -1).

To perform string indexing, we specify the index we want to refer to in square brackets. For example:

y[0] # evaluates to "f" the first index position of the string "foobar"
y[-1] # evaluates to "r" the last index position of the string "foobar"
y[-3] # evaluates to "b"
y[3] # evaluates to "b"

y = "foobar"
print(f"y = {y}")
print(f"y[0] = {y[0]}") # evaluates to "f" the first index position of the string "foobar"
print(f"y[-1] = {y[-1]}") # evaluates to "r" the last index position of the string "foobar"
print(f"y[-3] = {y[-3]}") # evaluates to "b"
print(f"y[3] = {y[3]}") # evaluates to "b"

y = foobar
y[0] = f
y[-1] = r
y[-3] = b
y[3] = b
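
Positive and negative indexes are two ways of naming the same positions: for a string of length n, index i and index i - n refer to the same character. The sketch below checks this for "foobar" and shows what happens when an index falls outside the string. The variable names are illustrative.

```python
y = "foobar"
n = len(y)

last_matches = y[n - 1] == y[-1]     # both are "r"
third_matches = y[2] == y[2 - n]     # y[2] and y[-4] are both "o"
print(last_matches, third_matches)

# Valid indexes run from -6 to 5; anything else raises an IndexError.
try:
    y[6]
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)
```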

String length#

We can use the multi-purpose function len() to get the length (i.e. count number of characters) of a string.

ystr = "tomorrow"
print(len(ystr))
for i in range(0,len(ystr)):
    print("Position:", i, "Character =", ystr[i])

8
Position: 0 Character = t
Position: 1 Character = o
Position: 2 Character = m
Position: 3 Character = o
Position: 4 Character = r
Position: 5 Character = r
Position: 6 Character = o
Position: 7 Character = w

String slicing#

String slicing is used when we want to refer to more than one character position in a string. This is performed by specifying a range of index positions using [<start>:<stop>:<step>] or [<start>:<stop>] (in which case <step> is assumed to be 1).

Note: As shown in the example below, the sliced string does not include the element at the stop position.

y = "foobar"
y[1:2] # evaluates to "o"
y[-4:-1] # evaluates to "oba"

zstr = "abcdefgh"
#       01234567 (index positions)
zstr[0:4:2] # evaluates to "ac"
zstr[2:6] # evaluates to "cdef"
zstr[:] # evaluates to "abcdefgh"
zstr[::] # evaluates to "abcdefgh" (the same as zstr[0:len(zstr):1])
zstr[::2] # evaluates to "aceg"
zstr[::-1] # evaluates to "hgfedcba"

y = "foobar"
print(f"y = {y}")
print(f"y[1:2] = {y[1:2]}") # evaluates to "o"
print(f"y[-4:-1] = {y[-4:-1]}") # evaluates to "oba"

zstr = "abcdefgh"
print(zstr[0:4:2]) #evaluates to “ac”
print(zstr[2:6]) #evaluates to “cdef”
print(zstr[:]) #evaluates to “abcdefgh”
print(zstr[::]) #evaluates to “abcdefgh” (the same as zstr[0:len(zstr):1])
print(zstr[::2]) #evaluates to “aceg”
print(zstr[::-1]) #evaluates to “hgfedcba”
print(zstr[4:1:-2]) #evaluates to “ec”

y = foobar
y[1:2] = o
y[-4:-1] = oba
ac
cdef
abcdefgh
abcdefgh
aceg
hgfedcba
ec
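
A few further slicing patterns are sketched below. Leaving out <start> or <stop> means "from the beginning" or "to the end", and, unlike indexing, a slice that runs past the end of the string does not raise an error. The palindrome check is an added illustration of the [::-1] reversal.

```python
zstr = "abcdefgh"

head = zstr[:3]         # first three characters
tail = zstr[3:]         # everything from position 3 onwards
last_three = zstr[-3:]  # the final three characters
beyond = zstr[2:100]    # stop is past the end: the slice just ends at the end
empty = zstr[5:2]       # start after stop with a positive step: empty string

print(head, tail, last_three, beyond, repr(empty))

word = "level"
is_palindrome = word == word[::-1]   # same forwards and backwards?
print(is_palindrome)
```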

Strings are immutable objects#

Because a string is immutable, we cannot assign to an individual element position of a string. For example, the statement s[0] = "M" in the code below generates the error message: “TypeError: 'str' object does not support item assignment”.

If we want to modify the value of a string variable, we need to rebind the variable to a whole new string, or to a combination of new string elements and sliced portions of the original string value.

[Diagram: stringrebind]

s = "my string"
s[0] = "M" # This will result in TypeError message because s is immutable

#full rebind 
s2 = "My string"

s3 = "M" + s[1:len(s)]

s = "my string"
s[0] = "M" # This will result in TypeError message because s is immutable

#full rebind 
s2 = "My string"

s3 = "M" + s[1:len(s)]

print("s = ", s)
print("s2 = ", s2)
print("s3 = ", s3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 2
      1 s = "my string"
----> 2 s[0] = "M" # This will result in TypeError message because s is immutable
      4 #full rebind 
      5 s2 = "My string"

TypeError: 'str' object does not support item assignment

s = "my string"

#full rebind 
s2 = "My string"

s3 = "M" + s[1:len(s)]

print("s = ", s)
print("s2 = ", s2)
print("s3 = ", s3)

s =  my string
s2 =  My string
s3 =  My string
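
Python's built-in string methods follow the same rule: they never change the original string but return a new one, which we can bind to a name. A short sketch using three standard str methods (the variable names are illustrative):

```python
s = "my string"

capitalised = s.capitalize()          # new string with the first letter in upper case
replaced = s.replace("my", "your")    # new string with "my" swapped for "your"
shouted = s.upper()                   # new string in upper case

print(capitalised)
print(replaced)
print(shouted)
print(s)    # the original string is unchanged
```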

Iterating string using a `for-loop`#

Recall that we use a for loop to iterate over a sequence of numbers using the range() “function”. However, the for loop can also be used to iterate over other kinds of elements (not just numbers), such as the characters of a string, as shown in the examples below.

s = "CTI Swinburne"

#standard for loop
for index in range(len(s)):
    if s[index] == 'I' or s[index] == 'u':
        print("There is an I or u")

#a more pythonic for loop to work with strings
for char in s:
    if char == "I" or char == "u":
        print("There is an I or u")

There is an I or u
There is an I or u
There is an I or u
There is an I or u
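
Two small variations on the loops above are sketched here. The first combines character iteration with the in operator to count vowels; the second uses the built-in enumerate(), which is not covered in this lecture, to obtain each position together with its character.

```python
s = "CTI Swinburne"

# Count the vowels, ignoring upper or lower case.
vowel_count = 0
for char in s:
    if char.lower() in "aeiou":
        vowel_count += 1
print(vowel_count)

# Record the positions where an "I" or a "u" occurs.
positions = []
for index, char in enumerate(s):
    if char == "I" or char == "u":
        positions.append(index)
print(positions)
```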
