A simulated dataset containing sales of child car seats at 400 different stores

Carseats

2018-09-18



A “statistic” is the result of applying a function (a summary) to the data: `statistic <- function(data)`
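In R terms, any function that maps a data vector to a number qualifies. A minimal sketch, using a hypothetical helper `data_range` and made-up toy values rather than the Carseats data:

```r
# Hypothetical example statistic: the range (max minus min) of a numeric vector
data_range <- function(data) {
  max(data) - min(data)
}

toy <- c(2, 4, 4, 5, 7, 9, 10, 15)  # illustrative toy values, not real sales

data_range(toy)  # 13
mean(toy)        # 7; built-in summaries such as mean() are statistics too
```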

E.g., order (rank) statistics such as Min, Quantiles, Median and Max, plus the Mean:

```r
library(ISLR)  # provides the Carseats data set
summary(Carseats$Sales)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.390   7.490   7.496   9.320  16.270
```
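Each number in the `summary()` output is a separate statistic. A sketch that computes them one at a time on a small made-up vector; the quartiles use `quantile()`'s default method (type 7), which is also what `summary()` uses by default, so the two should agree up to print rounding:

```r
toy <- c(2, 4, 4, 5, 7, 9, 10, 15)  # illustrative toy values

five_plus_mean <- c(
  Min    = min(toy),
  Q1     = unname(quantile(toy, 0.25)),  # default type 7 interpolation
  Median = median(toy),
  Mean   = mean(toy),
  Q3     = unname(quantile(toy, 0.75)),
  Max    = max(toy)
)
five_plus_mean  # Min 2, Q1 4, Median 6, Mean 7, Q3 9.25, Max 15
```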

*Roughly*, a qua*n*tile for a proportion \(p\) is a value \(x\) for which \(p\) of the data are less than or equal to \(x\). The first qua*r*tile, median, and third qua*r*tile are the qua*n*tiles for \(p=0.25\), \(p=0.5\), and \(p=0.75\), respectively.
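The word *roughly* matters because R offers several quantile definitions (`quantile()`'s `type` argument accepts 1 through 9). Type 1 is the inverse of the empirical distribution function, i.e. the smallest data value \(x\) with at least a fraction \(p\) of the data at or below it, so it satisfies the definition exactly; the default type 7 interpolates between data points and may not. A sketch on made-up values:

```r
toy <- c(2, 4, 4, 5, 7, 9, 10, 15)  # illustrative toy values
p <- c(0.25, 0.5, 0.75)

# Type 1: smallest data value x with at least a fraction p of the data <= x
q1 <- quantile(toy, p, type = 1, names = FALSE)
q1                                        # 4 5 9
sapply(q1, function(x) mean(toy <= x))    # 0.375 0.500 0.750, each >= p

# The default (type 7) interpolates, so the fraction can fall short of p
q7 <- quantile(1:4, 0.3, names = FALSE)   # 1.9
mean(1:4 <= q7)                           # 0.25, less than 0.3
```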

```r
library(ggplot2)
summary(Carseats$Sales)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.390   7.490   7.496   9.320  16.270
```

```r
# Horizontal boxplot of Sales across all stores
ggplot(Carseats, aes(x = "All", y = Sales)) + labs(x = NULL) +
  geom_boxplot() + coord_flip()
```
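The box spans the first to third quartile, with a line at the median. Per the ggplot2 documentation for `geom_boxplot()`, each whisker extends to the most extreme value within 1.5 × IQR of its hinge, and points beyond that are drawn individually as outliers. A minimal sketch of that rule on made-up values (not ggplot2's own code), assuming the quartiles come from `quantile()`'s default method:

```r
toy <- c(2, 4, 4, 5, 7, 9, 10, 15, 30)  # illustrative toy values

q   <- quantile(toy, c(0.25, 0.75), names = FALSE)  # hinges: 4 and 10
iqr <- q[2] - q[1]                                  # 6
lower_fence <- q[1] - 1.5 * iqr                     # -5
upper_fence <- q[2] + 1.5 * iqr                     # 19

outliers <- toy[toy < lower_fence | toy > upper_fence]   # 30
inside   <- toy[toy >= lower_fence & toy <= upper_fence]
whiskers <- range(inside)                                # 2 and 15
```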

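```r
library(ggplot2)
# Boxplot relative: a jitter plot draws every individual store.
# Points are spread only along the categorical axis (height = 0), so Sales values are unchanged.
ggplot(Carseats, aes(x = "All", y = Sales)) + labs(x = NULL) +
  geom_jitter(position = position_jitter(height = 0, width = 0.25)) + coord_flip()
```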
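Per the ggplot2 documentation for `position_jitter()`, `width` and `height` set how far points may move, with noise added in both the positive and negative directions. A sketch of the same idea in base R, assuming uniform noise as produced by `jitter(x, amount = ...)` (an illustration, not ggplot2's internal code):

```r
set.seed(42)                         # reproducible illustration
sales <- c(2, 4, 4, 5, 7)            # illustrative toy values
group_pos <- rep(1, length(sales))   # the single category "All" sits at position 1

# Spread points uniformly within +/- 0.25 of the category position;
# the sales values themselves are left untouched (the height = 0 case)
jittered_pos <- jitter(group_pos, amount = 0.25)

range(jittered_pos)  # stays inside [0.75, 1.25]
```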

```r
# Histogram of Sales: the y-axis shows the number of stores per bin
ggplot(Carseats, aes(x = Sales)) + labs(y = "Count") +
  geom_histogram(aes(y = after_stat(count)))
```
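A histogram counts how many observations fall into each bin. A sketch with base R's `hist(..., plot = FALSE)` on made-up values and hand-picked breaks; `geom_histogram()` picks its own bins (30 by default, with a message suggesting a better `binwidth`), so its counts for the same data can differ:

```r
toy <- c(2, 4, 4, 5, 7, 9, 10, 15)  # illustrative toy values

# Bins (0,4], (4,8], (8,12], (12,16]: right-closed, hist()'s default
h <- hist(toy, breaks = seq(0, 16, by = 4), plot = FALSE)
h$counts        # 3 2 2 1
sum(h$counts)   # 8, one count per observation
```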

Proposal:

\[ Y = f(X) + \epsilon \]

- Here is some data
- Tell me what \(f\) is
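One way to see what "tell me what \(f\) is" means: simulate data from an \(f\) chosen in advance, then check whether a fitting method recovers it. A sketch assuming a linear \(f(X) = 3 + 2X\) and normal noise, both arbitrary choices for illustration:

```r
set.seed(1)                          # reproducible illustration
n   <- 200
x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = 1)    # the error term epsilon
y   <- 3 + 2 * x + eps               # Y = f(X) + epsilon with f(x) = 3 + 2x

fit <- lm(y ~ x)
coef(fit)  # estimates should land close to 3 and 2
```

With real data such as Carseats, the true \(f\) is unknown, so this kind of check is only possible on simulated data.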

```r
# Simple linear regression of Sales on Price
csform <- Sales ~ Price
csmod <- lm(csform, data = Carseats)
print(csmod$coefficients)
```

```
## (Intercept)       Price 
## 13.64191518 -0.05307302
```
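Reading the coefficients: the fitted line is \(\widehat{\text{Sales}} = 13.642 - 0.0531 \times \text{Price}\), so each one-unit increase in Price is associated with about 0.053 fewer units of Sales (in the ISLR `Carseats` data, `Sales` is recorded in thousands of units). A sketch that plugs two hypothetical prices into the printed coefficients:

```r
b0 <- 13.64191518   # (Intercept), from the printed output
b1 <- -0.05307302   # Price slope, from the printed output

price <- c(100, 120)  # hypothetical prices for illustration
pred  <- b0 + b1 * price
pred                  # about 8.33 and 7.27
```

With the fitted model in hand, `predict(csmod, newdata = data.frame(Price = c(100, 120)))` should give the same values up to the rounding of the printed coefficients.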

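```r
# Scatter plot with the least-squares line (and its confidence band) overlaid
ggplot(Carseats, aes(x = Price, y = Sales)) + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```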
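`geom_smooth(method = lm)` draws the ordinary least-squares line. With one predictor that line has a closed form: slope \(\hat\beta_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and intercept \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). A sketch checking this against `lm()` on made-up values:

```r
x <- c(80, 95, 110, 120, 135)      # illustrative toy prices
y <- c(10.1, 9.0, 8.2, 7.1, 6.5)   # illustrative toy sales

slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

c(intercept, slope)
coef(lm(y ~ x))  # should agree with the closed-form values
```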