Descriptive Statistics
- Title slide
- Descriptive Statistics
- Graphical Displays of Quantitative Information: Dispersion
- Graphical Displays of Quantitative Information: Common Pitfalls
- Common uses of statistical graphics
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright)
- The problem multiplied with the advent of spreadsheets
- Paul Krugman on Fiscal Austerity
Title slide slide
(org-show-animate '("Quantitative Methods" "Descriptive Statistics" "Vikas Rawal" "Prachi Bansal" "" "" ""))
Descriptive Statistics slide
- Frequency
- Measures of central tendency
- Summary positions
- Measures of dispersion
Frequency slide
library(data.table)
data.table(names=c("Anil","Neeraj","Savita","Srimati",
"Rekha","Pooja","Alex","Shahina",
"Ghazal","Lakshmi","Rahul","Shahrukh",
"Naman","Deepak","Shreya","Rukhsana"
),
salary=c(71,50,65,40,
45,42,46,43,
45,43,45,45,
850,100,46,48
)*1000,
sex=c("M","M","F","F",
"F","F","M","F",
"F","F","M","M",
"M","M","F","F"
))->workers
workers$sno<-c(1:nrow(workers))
workers[,.(sno,names,sex,salary)]
sno | names | sex | salary |
---|---|---|---|
1 | Anil | M | 71000 |
2 | Neeraj | M | 50000 |
3 | Savita | F | 65000 |
4 | Srimati | F | 40000 |
5 | Rekha | F | 45000 |
6 | Pooja | F | 42000 |
7 | Alex | M | 46000 |
8 | Shahina | F | 43000 |
9 | Ghazal | F | 45000 |
10 | Lakshmi | F | 43000 |
11 | Rahul | M | 45000 |
12 | Shahrukh | M | 45000 |
13 | Naman | M | 850000 |
14 | Deepak | M | 100000 |
15 | Shreya | F | 46000 |
16 | Rukhsana | F | 48000 |
workers[,.(frequency=length(sno)),.(sex)]
sex | frequency |
---|---|
M | 7 |
F | 9 |
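The same count can be written with data.table's `.N`, which holds the number of rows in each group. A minimal sketch, rebuilding only the `sex` column from the table above; the names `workers_sex`, `freq` and `share` are illustrative and not part of the original slides.

```r
library(data.table)

# Sex column copied from the workers table above (hypothetical object name)
workers_sex <- data.table(sex = c("M", "M", "F", "F", "F", "F", "M", "F",
                                  "F", "F", "M", "M", "M", "M", "F", "F"))

# .N is the number of rows in each group
freq <- workers_sex[, .(frequency = .N), by = sex]

# Relative frequency: share of each group in the total
freq[, share := frequency / sum(frequency)]
print(freq)
```

Reporting the share alongside the count makes groups of different sizes easier to compare.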
Measures of Central Tendency slide
workers[,.(mean_salary=round(mean(salary),1),
median_salary=quantile(salary,probs=0.5))]
mean_salary | median_salary |
---|---|
101500 | 45500 |
workers[,.(mean_salary=round(mean(salary),1),
median_salary=quantile(salary,probs=0.5)),.(sex)]
sex | mean_salary | median_salary |
---|---|---|
M | 172428.6 | 50000 |
F | 46333.3 | 45000 |
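The gap between the mean and the median comes largely from one very high salary (850000). A minimal sketch of that sensitivity, using base R on the salaries from the table above; the vector names are illustrative.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# With all 16 workers
mean_all   <- mean(salary)     # 101500
median_all <- median(salary)   # 45500

# Dropping the single largest salary (850000)
trimmed        <- salary[salary != max(salary)]
mean_trimmed   <- mean(trimmed)    # 51600
median_trimmed <- median(trimmed)  # 45000

print(c(mean_all = mean_all, mean_trimmed = mean_trimmed))
print(c(median_all = median_all, median_trimmed = median_trimmed))
```

Removing one observation roughly halves the mean but moves the median by only 500, which is why the median is the more robust summary for skewed data such as salaries.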
Measures of Position slide
- First quartile
- Second quartile (median)
- Third quartile
- Deciles
- Quintiles
- Percentiles
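Each of these positions can be obtained with `quantile()`. A minimal sketch, assuming the salaries from the workers table and R's default interpolation rule (type 7); the object names are illustrative.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Quartiles: cut the sorted data into 4 equal parts
quartiles <- quantile(salary, probs = c(0.25, 0.5, 0.75))

# Quintiles: 5 equal parts (4 cut points)
quintiles <- quantile(salary, probs = (1:4) / 5)

# Deciles: 10 equal parts (9 cut points)
deciles <- quantile(salary, probs = (1:9) / 10)

# Percentiles: 100 equal parts (99 cut points)
percentiles <- quantile(salary, probs = (1:99) / 100)

print(quartiles)
print(deciles)
```

The second quartile, the fifth decile and the 50th percentile are all the same position: the median.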
Measures of Dispersion slide
Range and other measures based on positions slide
$range=max-min$
workers[,.(min_salary=min(salary),
max_salary=max(salary),
range=max(salary)-min(salary))]
min_salary | max_salary | range |
---|---|---|
40000 | 850000 | 810000 |
Range and other measures based on positions slide
- Distance between any two positions (Deciles, Quintiles, Percentiles) can be used as a measure of dispersion.
$inter.quartile.range=Q3-Q1$
## summary(workers$salary)
quantile(workers$salary,probs=c(0.25,0.75))
quantile(workers$salary,probs=c(0.1,0.9))
quantile(workers$salary,probs=c(0.1,0.95))
quantile(workers$salary,probs=c(0.25,0.95))
quantile(workers$salary,probs=c(0,0.75))
probs | lower | upper |
---|---|---|
25%, 75% | 44500 | 53750 |
10%, 90% | 42500 | 85500 |
10%, 95% | 42500 | 287500 |
25%, 95% | 44500 | 287500 |
0%, 75% | 40000 | 53750 |
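Turning a pair of positions into a dispersion measure is a single subtraction. A minimal sketch for the inter-quartile range and the distance between the first and ninth deciles, assuming the salaries from the workers table; the object names are illustrative.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Inter-quartile range computed by hand: Q3 - Q1
q          <- quantile(salary, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])   # 53750 - 44500 = 9250

# Base R's IQR() gives the same quantity
iqr_builtin <- IQR(salary)

# Distance between the first and the ninth decile
d            <- quantile(salary, probs = c(0.1, 0.9))
decile_range <- unname(d[2] - d[1])  # 85500 - 42500 = 43000

print(c(iqr = iqr_manual, decile_range = decile_range))
```

Unlike the range (810000), both measures ignore the single extreme salary and so describe the spread of the bulk of the workers.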
Variance, Standard Deviation and Coefficient of Variation
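$variance=\frac{1}{n-1} \times \sum(x_{i}-\bar{x})^{2}$
- $\bar{x}$ is the mean; `var()` in R divides by $n-1$ (sample variance), while the population variance divides by $n$.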
$standard.deviation = \sqrt{variance}$
$cov=\frac{standard.deviation}{mean}$
workers[,.(var_salary=round(var(salary),1),
sd_salary=round(sqrt(var(salary)),1),
cov_salary=round(sqrt(var(salary))/mean(salary),2))
]
var_salary | sd_salary | cov_salary |
---|---|---|
40075200000 | 200187.9 | 1.97 |
workers[,.(var_salary=round(var(salary),1),
sd_salary=round(sqrt(var(salary)),1),
cov_salary=round(sqrt(var(salary))/mean(salary),2)),.(sex)]
sex | var_salary | sd_salary | cov_salary |
---|---|---|---|
M | 89680952381 | 299467.8 | 1.74 |
F | 54500000 | 7382.4 | 0.16 |
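The figures for all workers can be reproduced step by step from the definitions. A minimal sketch in base R, assuming the salaries from the workers table; it also shows the difference between dividing by $n-1$ (what `var()` does) and dividing by $n$. The object names are illustrative.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

n  <- length(salary)
# Sum of squared deviations from the mean
ss <- sum((salary - mean(salary))^2)

var_sample     <- ss / (n - 1)   # matches var(salary): 40075200000
var_population <- ss / n         # 37570500000

sd_sample <- sqrt(var_sample)        # matches sd(salary)
cv        <- sd_sample / mean(salary)  # coefficient of variation

print(c(var_sample = var_sample, var_population = var_population))
print(round(c(sd = sd_sample, cv = cv), 2))
```

Because the coefficient of variation is unit-free, it allows the comparison made in the table by sex: the spread of men's salaries is large relative to their mean (1.74), the spread of women's salaries is small (0.16).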
Graphical Displays of Quantitative Information: Dispersion slide
Histogram slide

Histogram with relative densities slide
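A relative-density histogram rescales each bar to count / (n × bin width), so that the bar areas sum to 1. A minimal sketch using `hist()` without plotting, assuming the salaries from the workers table and an arbitrary bin width of 50000 chosen for illustration.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Bins of width 50000 (an assumption made for this example)
breaks <- seq(0, 900000, by = 50000)
h <- hist(salary, breaks = breaks, plot = FALSE)

# Frequencies: number of workers in each bin
print(h$counts)

# Relative densities: counts / (n * bin width)
print(h$density)

# The bar areas of a density histogram sum to 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```

Passing `freq = FALSE` to `hist()` draws the density version instead of the frequency version.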

Boxplot slide
- Invented by John Tukey in 1970
- Many variations have been proposed since then, though the essential form and idea have remained intact.
Boxplot of wheat yields slide
Violin plots slide
Boxplots: Useful to identify extreme values slide
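The points a boxplot draws beyond its whiskers can be listed without plotting. A minimal sketch with `boxplot.stats()`, assuming the salaries from the workers table and the default rule of 1.5 times the distance between the hinges; the object names are illustrative.

```r
# Salaries copied from the workers table above
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# coef = 1.5 is the default whisker length, stated here for clarity
bs <- boxplot.stats(salary, coef = 1.5)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# Observations lying beyond the whiskers
extreme <- bs$out
print(extreme)
```

With these data the rule flags the salaries of 850000 and 100000. `boxplot(salary)` draws the same summary as a figure.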
Boxplots: Useful for comparisons across categories slide
Violin plots slide
Graphical Displays of Quantitative Information: Common Pitfalls
Common uses of statistical graphics slide
- To show trends over time
- To show mid-point variations across categories
- To show composition
- (less commonly, though more usefully) to show/analyse dispersion
Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation slide

Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
Mis-representation: illustrations from Thomas Piketty's work (source Noah Wright) slide
The problem multiplied with the advent of spreadsheets slide



Paul Krugman on Fiscal Austerity
What does this graph show? slide
What did Paul Krugman say? slide
"Here’s what fiscal policy should do: it should support demand when the economy is weak, and it should pull that support back when the economy is strong. As John Maynard Keynes said, “The boom, not the slump, is the right time for austerity.” And up until 2010 the U.S. more or less followed that prescription. Since then, however, fiscal policy has become perverse: first austerity despite high unemployment, now expansion despite low unemployment."
How could we better show the relationship between unemployment and fiscal austerity? slide
