 # MechaCar_Statistical_Analysis


## Project Overview

• This project uses statistics and hypothesis testing to analyze a series of datasets from the automotive industry.
• All statistical analyses and visualizations are written in the R programming language.
• Fuel efficiency (MPG): fuel efficiency measures how far a vehicle can travel per unit of fuel. Fuel-efficient vehicles need less gas to cover a given distance, so they save drivers money in the long term. In the USA, fuel efficiency is expressed as "miles per gallon" (mpg). The question is whether the mpg of MechaCars is better than that of its competitors.

## Tools and Techniques

• R programming, RStudio (R Notebook), tidyverse, ggplot2, statistical tests, hypothesis tests

## Deliverable 1 - Linear Regression to Predict MPG

Generate a multiple linear regression using the `lm` function to predict the dependent variable "MPG" from 5 independent variables: (vehicle_length + vehicle_weight + spoiler_angle + AWD + ground_clearance)

• Since no feature selection was applied to this model, some of the included variables turn out to contribute little.
• The overall model has an R-squared of 0.7149, which means that, for the given dataset, the five independent variables explain about **71%** of the variation in mpg. In general, that is a satisfactory model to use. It could be improved by collecting data on additional variables that better explain the dependent variable (mpg).
• The model rejects the null hypothesis that the slope is zero, since the relationship between Y (MPG) and some of the independent variables analyzed below is not zero. The alternative hypothesis, that at least one slope differs from zero, is therefore supported.

![](Images/LM_equation.PNG)

### Independent Variable Analysis (P-value):

1. vehicle_length. A p-value of about 0 shows that it is statistically significant in this model.
2. vehicle_weight. A p-value of about 0.0776 shows that it is statistically significant only if we assume a higher alpha (0.1 instead of 0.05). Alpha is usually set at 0.05 (5%).
3. spoiler_angle. A p-value of 0.3069 shows that it is NOT statistically significant, so it is a candidate to drop from the model.
4. ground_clearance. A p-value of about 0 shows that it is statistically significant in this model.
5. AWD. A p-value of 0.3069 shows that it is NOT statistically significant, so it is a candidate to drop from the model.

Hence, the variables that contribute a **non-random** amount of variance to mpg are vehicle_length, ground_clearance, and (if we assume alpha = 0.1) vehicle_weight. The intercept has a value of about -104.

### The equation:

mpg = (6.27 * vehicle_length) + (1.25e-3 * vehicle_weight) + (6.88e-2 * spoiler_angle) + (-3.41 * AWD) + (3.55 * ground_clearance) - 1.04e+2

### Approximate to:

Dropping the two terms with very small coefficients (vehicle_weight and spoiler_angle):

mpg = (6.27 * vehicle_length) + (-3.41 * AWD) + (3.55 * ground_clearance) - 104

Since R-squared is 0.71, about 71% of the variation in mpg can be explained by changes in vehicle length, vehicle weight, spoiler angle, drivetrain (AWD), and ground clearance. We can consider this linear model reasonably effective at predicting the mpg of MechaCar prototypes. The p-value of the overall model is 5.35e-11, which is far smaller than the assumed significance level of 0.05, so we reject the null hypothesis and accept the alternative hypothesis that the model has a non-zero slope.

Consider running another model using only the two variables that are clearly non-random with respect to MPG at alpha = 0.05: vehicle_length and ground_clearance.
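The regression workflow described in this deliverable could be sketched roughly as follows. This is a minimal illustration in base R, not the project's actual notebook code: the CSV path, the `mpg` column name, and the helper names `fit_mpg_model` and `significant_terms` are assumptions made for the example. The reduced model at the end uses vehicle_length and ground_clearance, the two predictors whose reported p-values are about 0.

```r
# Sketch only: the file path, column names, and helper names are assumptions.

# Fit the five-predictor linear model for mpg.
fit_mpg_model <- function(df) {
  lm(mpg ~ vehicle_length + vehicle_weight + spoiler_angle +
       ground_clearance + AWD, data = df)
}

# Return the names of predictors whose coefficient p-value is below alpha.
# The intercept is excluded because it is not a predictor.
significant_terms <- function(model, alpha = 0.05) {
  p <- summary(model)$coefficients[, "Pr(>|t|)"]
  names(p)[p < alpha & names(p) != "(Intercept)"]
}

csv_path <- "Resources/MechaCar_mpg.csv"  # assumed location of the dataset
if (file.exists(csv_path)) {
  mecha <- read.csv(csv_path)

  full_model <- fit_mpg_model(mecha)
  print(summary(full_model))  # coefficients, R-squared, overall p-value

  # Compare which predictors pass at the usual and the relaxed alpha.
  print(significant_terms(full_model, alpha = 0.05))
  print(significant_terms(full_model, alpha = 0.10))

  # Reduced model with only the two clearly significant predictors.
  reduced_model <- lm(mpg ~ vehicle_length + ground_clearance, data = mecha)
  print(summary(reduced_model)$r.squared)
}
```

Comparing the R-squared of the reduced model with that of the full model gives a quick sense of how much explanatory power the dropped variables actually carried.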
### Summary Statistics Table:

![](Images/LM_stats.PNG)

## Deliverable 2 - t-test on Suspension Coils

### Suspension Coil t-test

A one-sample t-test is used here to assess whether the mean of the sample dataset (suspension coil pounds per inch, PSI) differs statistically from the mean of a hypothesized population, which is given as 1,500 pounds per inch. At a significance level of 0.05, the p-value (0.06028) is above the threshold. The data are treated as normally distributed. Therefore, we do not have sufficient evidence to reject the null hypothesis, and we conclude that the two means are statistically similar.

![](Images/pop_lots.PNG)
Lot Summary

#### Design test of variance must *NOT* exceed 100 pounds per inch

As per the summary table below, the metrics are: Mean = 1498.78, Median = 1500, and Variance = 62.293656, with STD = 7.8926.

![](Images/sum_test.PNG)
Summary Stats Table of Population

The mean and the median are nearly the same, which suggests the distribution is roughly symmetric (close to zero skewness) and consistent with an approximately normal shape. This can be seen visually in the plot below.

![](Images/plot1.PNG)
Distribution PSI vs. Density

The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must NOT exceed 100 pounds per inch. Based on the summary statistics, the variance is about 62.29 pounds per inch, which is below the specified value. Hence, the current manufacturing data meet the design specification.

A true population mean is generally unknown, but in this case a value of 1500 is specified, which gives some insight into the PSI of each lot. Let's see whether each lot differs significantly from the predetermined population mean of 1500.

### Lot 1 vs. Population Mean:

The t-test has a p-value of 0.9048, which is not statistically significant, so we do not have enough evidence to reject the null hypothesis. The Lot 1 mean and the population mean are statistically similar.

![](Images/lot1.PNG)

### Lot 2 vs. Population Mean:

The t-test has a p-value of 0.3451, which is not statistically significant, so we do not have enough evidence to reject the null hypothesis. The Lot 2 mean and the population mean are statistically similar.

![](Images/lot2.PNG)

### Lot 3 vs. Population Mean:

The t-test has a p-value of 0.637, which is not statistically significant, so we do not have enough evidence to reject the null hypothesis. The Lot 3 mean and the population mean are statistically similar.

![](Images/lot3.PNG)

## Study Design

• Extend the test by adding data for different car categories (SUVs, sports cars, pickups).
• Compare the MechaCar dataset against a competitor dataset to establish whether there is a statistical difference between MechaCar and non-MechaCar products. The null hypothesis is that the mean mpg of all groups is equal; the alternative hypothesis is that at least one of these means differs from the others. We hope to reject the null hypothesis of no difference between the groups (MechaCars and the competition).
• Conduct ANOVA tests across various manufacturers versus MechaCars, analyzing more ownership factors: MPG, maintenance, depreciation, horsepower, reliability, and so on. With ANOVA the tests are run individually, so some car types may or may not show a statistically significant difference.
• Explore more data: because the intercept of the linear model was statistically significant, there may be other variables and factors (not included in this dataset) that contribute to the variation in MPG (miles per gallon).
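
For reference, the suspension coil tests from Deliverable 2 could be reproduced with a sketch like the one below. It uses only base R. The CSV path, the column names `PSI` and `Manufacturing_Lot`, and the helper function names are assumptions made for illustration; the real dataset may be laid out differently.

```r
# Sketch only: the file path, column names, and helper names are assumptions.

# One-sample t-test of PSI values against a hypothesized population mean.
psi_t_test <- function(psi, mu = 1500) {
  t.test(psi, mu = mu)
}

# Run the same one-sample test separately for each manufacturing lot.
# Returns a named numeric vector of p-values, one per lot.
lot_p_values <- function(df, mu = 1500) {
  sapply(split(df$PSI, df$Manufacturing_Lot),
         function(psi) psi_t_test(psi, mu)$p.value)
}

# Summary statistics used for the variance design check.
psi_summary <- function(psi) {
  c(mean = mean(psi), median = median(psi),
    variance = var(psi), sd = sd(psi))
}

csv_path <- "Resources/Suspension_Coil.csv"  # assumed location of the dataset
if (file.exists(csv_path)) {
  coils <- read.csv(csv_path)

  print(psi_t_test(coils$PSI))   # all lots combined vs. mean of 1500
  print(lot_p_values(coils))     # each lot vs. mean of 1500

  stats <- psi_summary(coils$PSI)
  print(stats)
  # Design specification: variance must not exceed 100.
  cat("Variance within spec:", stats[["variance"]] <= 100, "\n")
}
```

Keeping the per-lot test in its own function makes it easy to rerun the comparison if more lots are added later.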