Chapter 1 Fundamentals
1.2 Package Source
1.2.2 PIP
- Package manager for Python packages only
- Installs prebuilt wheels when available; otherwise builds the package from its source distribution
- PIP stands for ‘Pip Installs Packages’
- It is Python’s officially-sanctioned package manager, and is most commonly used to install packages published on the Python Package Index (PyPI)
- Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).
1.3 Package Management
1.3.1 Anaconda
Conda Environment
Anaconda is a popular Python distribution, and conda is its package and environment manager. Interaction with Anaconda is through the “conda” command at the command prompt.
conda info ## check the installed conda version and directories
conda list ## list all installed packages and their versions
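The installed version of a single package can also be checked from inside Python. The following is a minimal sketch using the standard-library importlib.metadata module (Python 3.8 or later); the helper name installed_version is hypothetical and used only for illustration, and 'numpy' is just an example package name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# 'numpy' is only an example; the result depends on the active environment
print(installed_version("numpy"))
```

Returning None instead of raising keeps the check usable in scripts that only need to know whether a package is present.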
Package Installation
Conda is the recommended way to install packages. To install from the official (default) conda channel:
conda install <package_name> ## install the latest available version
conda install <package_name>=<version_number> ## install a specific version
## Example
conda install scipy ## official channel
conda install scipy=1.2.3 ## official channel
To install from conda-forge community channel:
conda install -c conda-forge <package_name>
conda install -c conda-forge <package_name>=<version_number>
## Example: Install From conda community:
conda install -c conda-forge plotnine
conda install -c conda-forge plotnine=1.2.3
1.4 Example Libraries
1.4.1 Built-In Libraries
Here are some of the commonly used built-in libraries.
import string
import datetime as dt
import os
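As a brief illustration of what these three modules offer, the sketch below uses one feature from each. The date and the file path are made-up values chosen only for the example.

```python
import string
import datetime as dt
import os

# string: ready-made character sets
lowercase = string.ascii_lowercase        # 'abcdefghijklmnopqrstuvwxyz'
digits = string.digits                    # '0123456789'

# datetime: date arithmetic
start = dt.date(2024, 1, 31)
next_day = start + dt.timedelta(days=1)   # 2024-02-01

# os: build a path using the separator of the current operating system
path = os.path.join("data", "raw", "sales.csv")   # hypothetical file path

print(lowercase, digits, next_day, path)
```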
1.4.2 External Libraries
Here are some of the popular external libraries.
numpy
- Large multi-dimensional arrays and matrices
- High-level mathematical functions to operate on them
- Efficient array computation, modeled after MATLAB
- Supports vectorized array math functions (implemented in C, hence faster than Python for loops over lists)
scipy
- Collection of mathematical algorithms and convenience functions
- Built upon numpy
Pandas
- Data manipulation and analysis
- Offer data structures and operations for manipulating numerical tables and time series
- Good for analyzing tabular data
- Use for exploratory data analysis, data pre-processing, statistics and visualization
- Built upon numpy
scikit-learn
- Machine learning functions
- Built on top of scipy
matplotlib
- Data Visualization
1.5 Variables
1.5.1 Variables Are Objects
Basic things about variables to keep in mind:
- All variables in python are objects
- Every variable assignment is reference based: a variable name holds a reference to the object (the block of memory) that stores the data. The same applies when passing variables to a function.
In the example below, a, b and c all refer to the same memory location:
- When one variable is assigned to another, both names refer to the same object
- When two variables are assigned the same small integer value, CPython reuses a cached object, so they also refer to the same memory location. This is an implementation detail and does not hold for arbitrary values
a = 123
b = 123
c = a
print('Data of a =', a,
      '\nData of b =', b,
      '\nData of c =', c,
      '\nID of a = ', id(a),
      '\nID of b = ', id(b),
      '\nID of c = ', id(c)
)
## Data of a = 123
## Data of b = 123
## Data of c = 123
## ID of a = 3147564800176
## ID of b = 3147564800176
## ID of c = 3147564800176
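Equal values do not always mean the same object. The sketch below (with illustrative variable names) contrasts the == operator, which compares values, with the is operator, which compares identity, using lists, which are never cached or shared automatically.

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]   # equal contents, but a separately created object
list_c = list_a      # another name for the same object

same_value = list_a == list_b    # True: the contents are equal
same_object = list_a is list_b   # False: two distinct objects in memory
alias = list_a is list_c         # True: both names refer to one object

print(same_value, same_object, alias)   # True False True
```

The expression x is y is equivalent to comparing id(x) == id(y) while both objects are alive.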
Changing a variable's value by assignment binds the name to a different object, so its reference changes:
a = 123
b = a
a = 456  # reassignment changed the memory reference of a
         # the memory reference of b is not changed
print('Data of a =', a,
      '\nData of b =', b,
      '\nID of a = ', id(a),
      '\nID of b = ', id(b)
)
## Data of a = 456
## Data of b = 123
## ID of a = 3147576585008
## ID of b = 3147564800176
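Reassignment is different from changing an object in place. The following sketch, using a hypothetical helper append_item, shows that a mutation made inside a function is visible through every name that refers to the same list, whereas reassignment rebinds only the name being assigned.

```python
def append_item(items):
    """Mutate the list passed in; the caller sees the change."""
    items.append(99)


a = [1, 2, 3]
b = a                        # b refers to the same list as a

append_item(a)               # in-place change, no new object is created
shared = a is b              # True: still one object
b_after_mutation = list(b)   # copy of b's contents: [1, 2, 3, 99]

a = a + [100]                # builds a new list and rebinds a to it
rebound = a is b             # False: a now refers to a different object

print(b_after_mutation, shared, rebound, b)
# [1, 2, 3, 99] True False [1, 2, 3, 99]
```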
1.5.2 Variable Assignment
Multiple Assignment
Assign the same value to multiple variables at the same time. Note that all variables created this way refer to the same memory location.
x = y = 'same mem loc'
x      ## same value as y
y      ## same value as x
id(x)  ## same as id(y)
id(y)  ## same as id(x)
## 'same mem loc'
## 'same mem loc'
## 3147577068656
## 3147577068656
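Sharing one memory location matters most for mutable values. As a minimal sketch with illustrative names, chained assignment of a list gives two names for one list, while tuple-style assignment creates two separate lists.

```python
x = y = []               # one list, two names
x.append(1)
shared_result = y        # [1]: y sees the change made through x

p, q = [], []            # two separate lists
p.append(1)
separate_result = q      # []: q is unaffected

print(shared_result, separate_result, x is y, p is q)
# [1] [] True False
```

With immutable values such as strings and numbers this sharing is harmless, because the object cannot be changed in place.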