Quantitative Methods
- Title slide
- Day 1
- Title slide
- What do we aim to achieve in this course?
- Two Types of Statistics
- Descriptive Statistics
- Graphical Displays of Quantitative Information: Common Pitfalls
- Common uses of statistical graphics
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- The problem multiplied with the coming in of spreadsheets
- Graphical Displays of Quantitative Information: Dispersion
- Day 2
Title slide slide
(org-show-animate '("Quantitative Methods, Part-II" "Vikas Rawal" "Prachi Bansal" "" "" ""))
Day 1
Title slide
(org-show-animate '("Why do financial journalists need to know quantitative methods?" "" "" ""))
What do we aim to achieve in this course? slide
Make friends with numbers
Learn how to read numbers, how to present them, and how to write about them
Learn how to use computers to work with numbers
Two Types of Statistics slide
Descriptive Statistics
Use summaries of data for the entire population to describe a population
Use summaries of sample data to describe a sample
Inferential Statistics
Use sample data to describe a population
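A minimal sketch of the distinction, assuming that the 16 salaries used later in these slides are the whole population. The variable names and the sample size of 8 are illustrative assumptions, not part of the course material.

```r
# Salaries of the 16 workers, treated here as the full population (assumption)
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Descriptive statistics: summarise every member of the population
population_mean <- mean(salary)

# Inferential statistics: estimate the population mean from a random sample
set.seed(1)                               # arbitrary seed so the draw is repeatable
sample_salary <- sample(salary, size = 8) # sample size chosen only for illustration
sample_mean <- mean(sample_salary)

c(population = population_mean, sample_estimate = sample_mean)
```

The sample estimate changes from draw to draw, and it moves a lot depending on whether the single very high salary happens to be drawn.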
Descriptive Statistics slide
- Frequency
- Measures of central tendency
- Summary positions
- Measures of dispersion
Frequency slide
library(data.table)
data.table(names=c("Anil","Neeraj","Savita","Srimati",
"Rekha","Pooja","Alex","Shahina",
"Ghazal","Lakshmi","Rahul","Shahrukh",
"Naman","Deepak","Shreya","Rukhsana"
),
salary=c(71,50,65,40,
45,42,46,43,
45,43,45,45,
850,100,46,48
)*1000,
sex=c("M","M","F","F",
"F","F","M","F",
"F","F","M","M",
"M","M","F","F"
))->workers
workers$sno<-c(1:nrow(workers))
workers[,.(sno,names,sex,salary)]
sno | names | sex | salary |
---|---|---|---|
1 | Anil | M | 71000 |
2 | Neeraj | M | 50000 |
3 | Savita | F | 65000 |
4 | Srimati | F | 40000 |
5 | Rekha | F | 45000 |
6 | Pooja | F | 42000 |
7 | Alex | M | 46000 |
8 | Shahina | F | 43000 |
9 | Ghazal | F | 45000 |
10 | Lakshmi | F | 43000 |
11 | Rahul | M | 45000 |
12 | Shahrukh | M | 45000 |
13 | Naman | M | 850000 |
14 | Deepak | M | 100000 |
15 | Shreya | F | 46000 |
16 | Rukhsana | F | 48000 |
workers[,.(frequency=length(sno)),.(sex)]
sex | frequency |
---|---|
M | 7 |
F | 9 |
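Frequencies are often easier to read as shares of the total. The following is a sketch in base R (not the data.table call used on this slide) that turns the same counts into relative frequencies; the object names are illustrative.

```r
# Sex of the 16 workers, in the same order as the table on this slide
sex <- c("M", "M", "F", "F", "F", "F", "M", "F",
         "F", "F", "M", "M", "M", "M", "F", "F")

counts <- table(sex)          # absolute frequencies: F = 9, M = 7
shares <- prop.table(counts)  # relative frequencies, which sum to 1

counts
round(100 * shares, 1)        # shares expressed as percentages
```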
Measures of Central Tendency slide
workers[,.(mean_salary=round(mean(salary),1),
median_salary=quantile(salary,probs=0.5))]
mean_salary | median_salary |
---|---|
101500 | 45500 |
workers[,.(mean_salary=round(mean(salary),1),
median_salary=quantile(salary,probs=0.5)),.(sex)]
sex | mean_salary | median_salary |
---|---|---|
M | 172428.6 | 50000 |
F | 46333.3 | 45000 |
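The gap between the mean and the median comes almost entirely from one salary of 850000. As a rough illustration in base R, dropping that single observation shows how much each measure moves; `without_top` is an illustrative name.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Remove the single highest salary (850000)
without_top <- salary[salary != max(salary)]

# All 16 workers: mean 101500, median 45500
c(mean = mean(salary), median = median(salary))

# 15 workers, highest salary dropped: mean 51600, median 45000
c(mean = mean(without_top), median = median(without_top))
```

The mean falls by almost half while the median barely changes, which is why the median is usually the safer summary for skewed data such as salaries.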
Measures of Position slide
- First quartile
- Second quartile (median)
- Third quartile
- Deciles
- Quintiles
- Percentiles
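All of these positions can be computed with `quantile()` by choosing the right probabilities. The sketch below uses base R with its default interpolation rule (`type = 7`); other rules give slightly different values for small data sets.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Quartiles: cut the sorted data into four equal parts
quartiles <- quantile(salary, probs = c(0.25, 0.5, 0.75))

# Quintiles: cut points at 20%, 40%, 60% and 80%
quintiles <- quantile(salary, probs = seq(0.2, 0.8, by = 0.2))

# Deciles: cut points at 10%, 20%, ..., 90%
deciles <- quantile(salary, probs = seq(0.1, 0.9, by = 0.1))

# A single percentile, here the 95th
percentile_95 <- quantile(salary, probs = 0.95)

quartiles   # 44500, 45500, 53750
```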
Measures of Dispersion slide
Range and other measures based on positions slide
$range=max-min$
min_salary | max_salary | range |
---|---|---|
40000 | 850000 | 810000 |
workers[,.(min_salary=min(salary),
max_salary=max(salary),
range=max(salary)-min(salary))]
Range and other measures based on positions slide
- Distance between any two positions (Deciles, Quintiles, Percentiles) can be used as a measure of dispersion.
$inter.quartile.range=Q3-Q1$
positions | lower | upper |
---|---|---|
25%, 75% | 44500 | 53750 |
10%, 90% | 42500 | 85500 |
10%, 95% | 42500 | 287500 |
25%, 95% | 44500 | 287500 |
0%, 75% | 40000 | 53750 |
## summary(workers$salary)
quantile(workers$salary,probs=c(0.25,0.75))
quantile(workers$salary,probs=c(0.1,0.9))
quantile(workers$salary,probs=c(0.1,0.95))
quantile(workers$salary,probs=c(0.25,0.95))
quantile(workers$salary,probs=c(0,0.75))
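The quantile pairs give the positions; the dispersion measure is the distance between them. The following base R sketch computes the inter-quartile range by hand, checks it against the built-in `IQR()`, and does the same for the 10% to 90% spread. Object names are illustrative.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Inter-quartile range by hand: Q3 - Q1 = 53750 - 44500
q <- quantile(salary, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

# Built-in equivalent (same default quantile rule)
iqr_builtin <- IQR(salary)

# Distance between the 10th and 90th percentiles: 85500 - 42500
decile_spread <- unname(diff(quantile(salary, probs = c(0.1, 0.9))))

c(iqr = iqr_manual, decile_spread = decile_spread)   # 9250, 43000
```

Compare these with the range of 810000: measures based on inner positions are far less affected by the single extreme salary.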
Variance, Standard Deviation and Coefficient of Variation
$variance=\frac{1}{n-1} \times \sum(x_{i}-\bar{x})^{2}$
$standard.deviation = \sqrt{variance}$
$cov=\frac{standard.deviation}{mean}$
workers[,.(var_salary=round(var(salary),1),
sd_salary=round(sqrt(var(salary)),1),
cov_salary=round(sqrt(var(salary))/mean(salary),2))
]
var_salary | sd_salary | cov_salary |
---|---|---|
40075200000 | 200187.9 | 1.97 |
workers[,.(var_salary=round(var(salary),1),
sd_salary=round(sqrt(var(salary)),1),
cov_salary=round(sqrt(var(salary))/mean(salary),2)),.(sex)]
sex | var_salary | sd_salary | cov_salary |
---|---|---|---|
M | 89680952381 | 299467.8 | 1.74 |
F | 54500000 | 7382.4 | 0.16 |
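It can help to compute the variance step by step. Note that R's `var()` divides the sum of squared deviations by n-1 (the sample variance), not by n. The base R sketch below computes both versions for all 16 salaries; the object names are illustrative.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

n <- length(salary)

# Sum of squared deviations from the mean
sq_dev <- sum((salary - mean(salary))^2)

var_sample <- sq_dev / (n - 1)   # divisor used by var(): 40075200000
var_population <- sq_dev / n     # divisor n: 37570500000

sd_sample <- sqrt(var_sample)            # same as sd(salary)
cov_sample <- sd_sample / mean(salary)   # coefficient of variation

round(c(variance = var_sample, sd = sd_sample, cov = cov_sample), 2)
```

With only 16 observations the two divisors differ noticeably; for large n the difference becomes negligible.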
Graphical Displays of Quantitative Information: Common Pitfalls
Common uses of statistical graphics slide
- To show trends over time
- To show mid-point variations across categories
- To show composition
- (less commonly, though more usefully) to show/analyse dispersion
Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
The problem multiplied with the coming in of spreadsheets slide



Graphical Displays of Quantitative Information: Dispersion slide
Histogram slide
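As an illustration, a histogram of the 16 salaries from Day 1 can be drawn in base R. The bin width of 50000 and the plot labels are assumptions made for this sketch, not taken from the original slide.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Bins of width 50000 covering 0 to 900000 (assumed bin width)
breaks <- seq(0, 900000, by = 50000)

# Count the observations falling in each bin without drawing anything
h <- hist(salary, breaks = breaks, plot = FALSE)
h$counts   # 12 salaries in the first bin, 3 in the second, 1 far to the right

# Draw the histogram: bar height = number of workers in the bin
hist(salary, breaks = breaks,
     main = "Distribution of salaries", xlab = "Salary", ylab = "Frequency")
```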

Histogram with relative densities slide
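In a density histogram the bar height is the relative frequency divided by the bin width, so the areas of the bars add up to 1. A base R sketch using the Day 1 salaries, with an assumed bin width of 50000:

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

breaks <- seq(0, 900000, by = 50000)   # assumed bin width of 50000
h <- hist(salary, breaks = breaks, plot = FALSE)

# Relative frequency of each bin, and the corresponding density
rel_freq <- h$counts / sum(h$counts)
density_by_hand <- rel_freq / diff(h$breaks)

# Total area under the bars: sum of density * bin width
total_area <- sum(h$density * diff(h$breaks))

# freq = FALSE plots densities instead of counts
hist(salary, breaks = breaks, freq = FALSE,
     main = "Distribution of salaries (density)", xlab = "Salary")
```

Because the bars are scaled by area, density histograms remain comparable across groups of different sizes and across unequal bin widths.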

Boxplot slide
- Invented by John Tukey in 1970
- Many variations have been proposed since then, though the essential form and idea have remained intact.
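The numbers behind a boxplot can be inspected with `boxplot.stats()` in base R. The sketch below applies it to the Day 1 salaries with the default whisker rule (1.5 times the distance between the hinges); object names are illustrative.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

b <- boxplot.stats(salary)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats   # 40000 44000 45500 57500 71000

# Observations beyond the whiskers, drawn as individual points
sort(b$out)   # 100000 850000

# Draw the boxplot itself
boxplot(salary, horizontal = TRUE, xlab = "Salary")
```

The hinges (44000 and 57500) are computed slightly differently from the quartiles returned by `quantile()` (44500 and 53750), so the two can disagree in small data sets.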