Chapter 1 Fundamentals

1.1 Library Management

1.2 Package Source

1.2.1 Conda

  • Package manager for many languages
  • Install binaries

1.2.2 PIP

  • Package manager for python only
  • Compile from source
  • PIP stands for ‘Pip Installs Packages’
  • It is Python’s officially-sanctioned package manager, and is most commonly used to install packages published on the Python Package Index (PyPI)
  • Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).

1.3 Package Management

1.3.1 Anaconda

Conda Environment

Anaconda is a popular package management system for python. Interaction with anaconda is through command prompt “conda”.

conda info ## check the installed conda version and directories
conda list ## list all installed python modules, and its version

Package Installation

Conda is recommended distribution. To install from official conda channel:

conda install <package_name>                 # always install latest
conda install <package_name=version_number>  ## install specific version

## Example
conda install scipy        ## official channel
conda install scipy=1.2.3  ## official channel

To install from conda-forge community channel:

conda install -c conda-forge <package_name>
conda install -c conda-forge <package_name=version_number>

## Example: Install From conda community:
conda install -c conda-forge plotnine
conda install -c conda-forge plotnine=1.2.3

1.3.2 PIP

PIP is python open repository (not part of conda). Use pip if the package is not available in conda.

Package Version

pip list ## list all installed module

Package Installation

pip install <package_name>
pip install <package_name=version_numner>

## Example: 
pip install plydata
pip install plydata=1.2.3

1.4 Example Libraries

1.4.1 Built-In Libraries

Here are some of the commonly used built-in libraries.

import string
import datetime as dt
import os

1.4.2 External Libraries

Here are some of the popular external libraries.

numpy

  • large multi-dimensional array and matrices
  • High level mathematical funcitons to operate on them
  • Efficient array computation, modeled after matlab
  • Support vectorized array math functions (built on C, hence faster than python for loop and list)

scipy

  • Collection of mathematical algorithms and convenience functions built on the numpy extension
  • Built upon numpy

Pandas

  • Data manipulation and analysis
  • Offer data structures and operations for manipulating numerical tables and time series
  • Good for analyzing tabular data
  • Use for exploratory data analysis, data pre-processing, statistics and visualization
  • Built upon numpy

scikit-learn

  • Machine learning functions
  • Built on top of scipy

matplotlib

  • Data Visualization

1.5 Variables

1.5.1 Variables Are Objects

Basic things about variables to keep in mind:

  • All variables in python are objects
  • Every variable assginment is reference based, that is, each object value is the reference to memory block of data. This is also true when passing variables to function.

In the below example, a, b and c refer to the same memory location:

  • Notice when an object assigned to another object, they refer to the same memory location
  • When two variable refers to the same value, they refer to the same memory location
a = 123
b = 123  
c = a
print ('Data of a =',  a,
       '\nData of b =',b,
       '\nData of c =',c,
       '\nID of a = ', id(a),
       '\nID of b = ', id(b),
       '\nID of c = ', id(c)
)
## Data of a = 123 
## Data of b = 123 
## Data of c = 123 
## ID of a =  3147564800176 
## ID of b =  3147564800176 
## ID of c =  3147564800176

Changing data value (using assignment) changes the reference

a = 123
b = a
a = 456  # reassignemnt changed a memory reference
         # b memory reference not changed
print ('Data of a =',a,
     '\nData of b =',b,
     '\nID of a = ', id(a),
     '\nID of b = ', id(b)
)
## Data of a = 456 
## Data of b = 123 
## ID of a =  3147576585008 
## ID of b =  3147564800176

1.5.2 Variable Assignment

Multiple Assignment

Assign multiple variable at the same time with same value. Note that all object created using this method refer to the same memory location.

x = y = 'same mem loc'
x       ## same value as y
y       ## same value as x
id(x)   ## same as id(y)
id(y)   ## same as id(x)
## 'same mem loc'
## 'same mem loc'
## 3147577068656
## 3147577068656

Augmented Assignment

y = 10
y += 1   ## shortcut for y = y + 1
y
## 11

Unpacking Assingment

Assign multiple value to multiple variabels at the same time.

x,y = 1,3
x    ## unpacked to 1
y    ## unpacked to 3
## 1
## 3