Use the Kaggle Credit Card data set for this exercise. Work with both a 100K subset and the entire data set, each containing fraudulent and non-fraudulent transactions. Use the same approach to generate the test and training data sets.

1.  Perform ridge and lasso regression to reduce the input feature set. Rerun the logistic regression on the reduced feature set, and identify which input features remain.

2.  Compare the result with the raw logistic regression (the model fit on the full feature set). Total accuracy is not a good measure for this comparison; explain why. Use other measures to compare the two models.
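For task 1, the idea of using an L1 (lasso) penalty to drop features can be sketched as follows. This is a minimal pure-Python illustration on a tiny made-up data set, not the Kaggle data and not necessarily the approach expected in class. The function names, the penalty strength `lam`, the learning rate, and the data are all assumptions chosen for illustration. The sketch fits a logistic regression by proximal gradient descent; the soft-thresholding step is what sets uninformative coefficients exactly to zero.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    # L1-penalized logistic regression; the intercept b is not penalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the log-loss, then soft-threshold each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Hypothetical toy data: feature 0 determines the label,
# feature 1 is constructed to carry no information about it.
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_lasso_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("weights:", w)
print("selected feature indices:", selected)
```

On this toy data only feature 0 keeps a nonzero weight, so the reduced feature set is that single column. A ridge (L2) penalty shrinks coefficients toward zero but does not set them exactly to zero, so on its own it does not remove features.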

As explained in class, this credit card data set is unbalanced. Read the paper for a discussion of how to handle unbalanced data sets.
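The imbalance is also why total accuracy is a weak measure in task 2. The following minimal sketch uses made-up counts (1,000 transactions, 2 of them fraudulent; not taken from the Kaggle data) and hypothetical helper names to compare accuracy with precision, recall, and F1 for a model that never predicts fraud.

```python
def confusion_counts(y_true, y_pred):
    # Count true/false positives and negatives, treating label 1 as fraud.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative labels: 998 legitimate transactions and 2 frauds.
y_true = [0] * 998 + [1] * 2
# A useless model that labels every transaction as legitimate.
y_pred = [0] * 1000

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The model scores 99.8% accuracy while catching no fraud at all (recall 0.0), which is why class-sensitive measures such as precision, recall, and F1 are more informative when comparing the two models.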

3.  Make a PowerPoint presentation on the technique for handling unbalanced data described in the paper.

4.  Use the ROSE package, as discussed, to adjust for the imbalance in the credit fraud data. Run logistic regression with the new data set. Also check the linked page for a more concise explanation.
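The rebalancing idea behind task 4 can be sketched in a few lines. ROSE is an R package, and the code below is not the ROSE algorithm; it is plain random oversampling in Python, shown only as a simplified stand-in. The function name, the seed, and the toy data are assumptions made for illustration. Only the training set should be rebalanced, so that the test set keeps the original class proportions.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    # Duplicate randomly chosen minority rows until both classes are equal in size.
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    if not minority:
        raise ValueError("no minority-class rows to sample from")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]

# Hypothetical training set: 95 legitimate rows and 5 fraudulent rows.
train_rows = [[i] for i in range(100)]
train_labels = [0] * 95 + [1] * 5

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

After resampling, both classes have 95 rows, and the logistic regression would then be fit on `balanced_rows` and `balanced_labels` and evaluated on the untouched test set.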