Overview
In this individual project, you will work through various classification metrics. You will be asked to
create functions in R to carry out the various calculations. You will also investigate some functions in
packages that will let you obtain the equivalent results. Finally, you will create graphical output that
can also be used to evaluate classification models, such as binary logistic regression.
Supplemental Material
Applied Predictive Modeling, Ch. 11 (provided as a PDF file).
An Introduction to ROC Analysis (provided as a PDF file).
Web tutorials: http://www.saedsayad.com/model_evaluation_c.htm
Deliverables (60 Points)
Following the instructions below, use the R functions you create and the other stated packages
to generate the classification metrics for the provided data set. Submit a write-up of your solutions
in PDF format.
Instructions
Complete each of the following steps as instructed:
- Download the classification output data set (attached in Canvas to the assignment).
- The data set has three key columns we will use:
  - class: the actual class for the observation
  - scored.class: the predicted class for the observation (based on a threshold of 0.5)
  - scored.probability: the predicted probability of success for the observation
- Use the table() function to get the raw confusion matrix for this scored data set. Make sure you
understand the output. In particular, do the rows represent the actual or predicted class? The
columns?
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the accuracy of the predictions.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
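The confusion matrix and accuracy steps above might be sketched as follows. This is only an illustration under stated assumptions: the `toy` data frame is invented, the classes are assumed to be coded 0/1 with 1 as the positive class, and `confusion_counts` and `calc_accuracy` are hypothetical helper names, not part of the assignment.

```r
# Toy data for illustration only; it is NOT the assignment data set.
# Assumption: classes are coded 0/1 and 1 is the positive class.
toy <- data.frame(
  class        = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  scored.class = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)
)

# Naming the table() arguments labels the dimensions:
# here rows are the predicted class and columns are the actual class.
cm <- table(predicted = toy$scored.class, actual = toy$class)
print(cm)

# Hypothetical helper: count TP, FP, TN and FN for a chosen positive label.
confusion_counts <- function(df, actual = "class", predicted = "scored.class",
                             positive = 1) {
  a <- df[[actual]] == positive
  p <- df[[predicted]] == positive
  c(TP = sum(a & p), FP = sum(!a & p), TN = sum(!a & !p), FN = sum(a & !p))
}

# Accuracy = (TP + TN) / (TP + FP + TN + FN)
calc_accuracy <- function(df, ...) {
  k <- confusion_counts(df, ...)
  (k[["TP"]] + k[["TN"]]) / sum(k)
}

acc <- calc_accuracy(toy)  # (3 + 4) / 10 = 0.7 on the toy data
```

Passing named arguments to table() is one way to make the row and column question easy to answer, because the dimension names are printed with the matrix.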
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the classification error rate of the predictions.
$$\text{Classification Error Rate} = \frac{FP + FN}{TP + FP + TN + FN}$$
Verify that you get an accuracy and an error rate that sum to one.
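One possible sketch of the error rate function and the sum-to-one check, again on invented toy data. The 0/1 coding, the positive class of 1, and the helper names `confusion_counts`, `calc_accuracy` and `calc_error_rate` are assumptions made for illustration.

```r
# Toy data for illustration only; it is NOT the assignment data set.
# Assumption: classes are coded 0/1 and 1 is the positive class.
toy <- data.frame(
  class        = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  scored.class = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)
)

# Hypothetical helper: count TP, FP, TN and FN for a chosen positive label.
confusion_counts <- function(df, actual = "class", predicted = "scored.class",
                             positive = 1) {
  a <- df[[actual]] == positive
  p <- df[[predicted]] == positive
  c(TP = sum(a & p), FP = sum(!a & p), TN = sum(!a & !p), FN = sum(a & !p))
}

# Accuracy = (TP + TN) / (TP + FP + TN + FN)
calc_accuracy <- function(df, ...) {
  k <- confusion_counts(df, ...)
  (k[["TP"]] + k[["TN"]]) / sum(k)
}

# Classification error rate = (FP + FN) / (TP + FP + TN + FN)
calc_error_rate <- function(df, ...) {
  k <- confusion_counts(df, ...)
  (k[["FP"]] + k[["FN"]]) / sum(k)
}

acc <- calc_accuracy(toy)    # 7 / 10 on the toy data
err <- calc_error_rate(toy)  # 3 / 10 on the toy data

# The two numerators together cover every observation, so the sum is one.
stopifnot(isTRUE(all.equal(acc + err, 1)))
```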
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the precision of the predictions.
$$\text{Precision} = \frac{TP}{TP + FP}$$
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the sensitivity of the predictions. Sensitivity is also known as recall.
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
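Precision and sensitivity could be written along the following lines. The toy data, the 0/1 coding with 1 as the positive class, and the names `confusion_counts`, `calc_precision` and `calc_sensitivity` are illustrative assumptions. The `calc_` prefix is chosen here so the helpers would not mask similarly named functions in other packages.

```r
# Toy data for illustration only; it is NOT the assignment data set.
# Assumption: classes are coded 0/1 and 1 is the positive class.
toy <- data.frame(
  class        = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  scored.class = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)
)

# Hypothetical helper: count TP, FP, TN and FN for a chosen positive label.
confusion_counts <- function(df, actual = "class", predicted = "scored.class",
                             positive = 1) {
  a <- df[[actual]] == positive
  p <- df[[predicted]] == positive
  c(TP = sum(a & p), FP = sum(!a & p), TN = sum(!a & !p), FN = sum(a & !p))
}

# Precision = TP / (TP + FP): of the predicted positives, the share that is correct.
calc_precision <- function(df, ...) {
  k <- confusion_counts(df, ...)
  k[["TP"]] / (k[["TP"]] + k[["FP"]])
}

# Sensitivity (recall) = TP / (TP + FN): of the actual positives, the share found.
calc_sensitivity <- function(df, ...) {
  k <- confusion_counts(df, ...)
  k[["TP"]] / (k[["TP"]] + k[["FN"]])
}

prec <- calc_precision(toy)    # 3 / (3 + 2) = 0.6 on the toy data
sens <- calc_sensitivity(toy)  # 3 / (3 + 1) = 0.75 on the toy data
```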
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the specificity of the predictions.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
- Write a function that takes the data set as a dataframe, with actual and predicted classifications
identified, and returns the F1 score of the predictions.
$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
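A sketch of specificity and the F1 score in the same style. As before, the toy data, the 0/1 coding with 1 as the positive class, and the helper names `confusion_counts`, `calc_specificity` and `calc_f1` are assumptions for illustration only.

```r
# Toy data for illustration only; it is NOT the assignment data set.
# Assumption: classes are coded 0/1 and 1 is the positive class.
toy <- data.frame(
  class        = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  scored.class = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0)
)

# Hypothetical helper: count TP, FP, TN and FN for a chosen positive label.
confusion_counts <- function(df, actual = "class", predicted = "scored.class",
                             positive = 1) {
  a <- df[[actual]] == positive
  p <- df[[predicted]] == positive
  c(TP = sum(a & p), FP = sum(!a & p), TN = sum(!a & !p), FN = sum(a & !p))
}

# Specificity = TN / (TN + FP): of the actual negatives, the share correctly rejected.
calc_specificity <- function(df, ...) {
  k <- confusion_counts(df, ...)
  k[["TN"]] / (k[["TN"]] + k[["FP"]])
}

# F1 = 2 * Precision * Sensitivity / (Precision + Sensitivity)
calc_f1 <- function(df, ...) {
  k <- confusion_counts(df, ...)
  precision   <- k[["TP"]] / (k[["TP"]] + k[["FP"]])
  sensitivity <- k[["TP"]] / (k[["TP"]] + k[["FN"]])
  2 * precision * sensitivity / (precision + sensitivity)
}

spec <- calc_specificity(toy)  # 4 / (4 + 2) on the toy data
f1   <- calc_f1(toy)           # 2 * 0.6 * 0.75 / (0.6 + 0.75) on the toy data
```

Note that this sketch does not guard against a zero denominator, which occurs when there are no true positives; a fuller version would need to decide what to return in that case.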
- Let’s consider the following question: What are the bounds on the F1 score? Show that the F1
score will always be between 0 and 1. (Hint: If 0 < a < 1 and 0 < b < 1 then ab < a.)
- Write a function that generates an ROC curve from a data set with a true classification column
(i.e., class) and a probability column (i.e., scored.probability). Your function should return the plot
of the ROC curve and the calculated area under the ROC curve (AUC). Note that I recommend
using a sequence of thresholds ranging from 0 to 1 at 0.01 intervals.
- Use your created R functions and the provided classification output data set to produce all of the
classification metrics discussed above.
- Investigate the caret package. In particular, consider the functions confusionMatrix, sensitivity,
and specificity. Apply the functions to the data set. How do the results compare with your own
functions?
- Investigate the pROC package. Use it to generate an ROC curve for the data set. How do the
results compare with your own functions?
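The ROC step could be sketched in base R as below. Several things here are assumptions rather than requirements of the assignment: the toy data is invented, 1 is taken as the positive class, an observation is scored positive when its probability is greater than or equal to the threshold, the AUC is approximated with the trapezoidal rule, and the name `roc_curve` and its arguments are hypothetical. The function draws the plot as a side effect and returns the AUC together with the plotted points.

```r
# Toy data for illustration only; it is NOT the assignment data set.
# Assumption: classes are coded 0/1 and 1 is the positive class.
toy <- data.frame(
  class              = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  scored.probability = c(0.915, 0.805, 0.655, 0.355,
                         0.725, 0.555, 0.445, 0.305, 0.215, 0.105)
)

roc_curve <- function(df, actual = "class", prob = "scored.probability",
                      positive = 1, thresholds = seq(0, 1, by = 0.01)) {
  is_pos <- df[[actual]] == positive
  p <- df[[prob]]

  # True and false positive rates at each threshold (positive if p >= threshold).
  tpr <- sapply(thresholds, function(t) sum(p >= t & is_pos) / sum(is_pos))
  fpr <- sapply(thresholds, function(t) sum(p >= t & !is_pos) / sum(!is_pos))

  # Make sure the curve starts at (0, 0), then order the points left to right.
  fpr <- c(fpr, 0)
  tpr <- c(tpr, 0)
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]

  # Trapezoidal rule: width times average height of each segment.
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

  plot(fpr, tpr, type = "l",
       xlab = "False positive rate (1 - specificity)",
       ylab = "True positive rate (sensitivity)",
       main = sprintf("ROC curve (AUC = %.3f)", auc))
  abline(0, 1, lty = 2)  # reference line for a random classifier

  list(auc = auc, fpr = fpr, tpr = tpr)
}

res <- roc_curve(toy)
res$auc
```

If the pROC package is installed, a call such as `pROC::auc(pROC::roc(toy$class, toy$scored.probability))` should give a value to compare against. Small differences on real data are possible, because a fixed 0.01 threshold grid can group several observations into one step.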