##### 🍐 我们总结了美国代写中——CS代写的经典案例，如果你有任何Python代写的需要，可以随时联络我们。CoursePear™ From @2009。

Instruction

CMPSC 448: Machine Learning and AI

Homework 2 (Due Sunday, Feb 20th)

This HW includes both theory and implementation problems:

- You cannot look at anyone else’s code.
- Your homework must work with Python 3.5+ (you may install the Anaconda distribution of Python)
- You need to submit a report including all deliverable and figures (in PDF format in Gradescope), also three files Problem4.ipynb,homework2.py, and homework2plot.py in Canvas
- The only modules your code can import are: math, numpy, matplotlib, random, pandas, sklearn Theory and problem solving Problem 1 (20 points). In the lectures, we showed that the MLE estimator for linear regression when the random noise for each data point is identically and independently distributed (i.i.d.) from a Guassian distribution N (0, σ2) reduces to the linear regression with squared loss (OLS). In this problem, we would like to change the distribution of the noise model and derive the MLE optimization problem. In particular, let’s assume for a fixed unknown linear model w∗ ∈ Rd the response yi for each data point xi ∈ Rd in training data S = {(x1, y1), (x2, y2), . . . , (xn, yn)} is generated by y i = w ⊤∗ x i + ε i ,

where εi is generated i.i.d. from a Laplace distribution εi ∼ Laplace(0, σ) with density function:

( ) exp(ε) ifε<0 P(ε|0,σ)= 1 exp −|ε| = 1 ( σ )

2σ σ 2σexp −σε ifε≥0

for ε sampled from Laplace(0, σ).

Under above assumption on the noise model,

- Show that each yi, i = 1, 2, . . . , n is a random variable that follows Laplace(w⊤∗ xi, σ) distribution.
- Write down the MLE estimator for training data in S and derive the final optimization problem. Note that you just need to state the final minimization problem and not its solution.
- Compare the obtained optimization problem to one obtained under Gaussian noise model and highlight key differences. Problem 2 (20 points). Suppose we run a ridge regression with regularization parameter λ on a training data with a single feature S = {(x1, y1), (x2, y2), . . . , (xn, yn)} , and get coeﬀicient w1 ∈ R (for simplicity, we assumed the data are centered and no bias (intercept) term is needed).

We now include an exact copy of first feature to get a new training data as

S′ ={([x1,x1]⊤,y1),([x2,x2]⊤,y2),…,([xn,xn]⊤,yn)}

where each training example is a 2 dimensional vector with duplicate coordinates, refit our ridge regression

and get the solution [w1′ , w2′ ]⊤ ∈ R2 .

• Derive the optimal solution for [w1′ , w2′ ]⊤ and show that w1′ = w2′ . • What is relation between w1′ and w1.

Problem 3 (20 points). As we discussed in the lecture, the Perceptron algorithm will only converge if the data is linearly separable, i.e., there exists a linear classifier that can perfectly classify the training data.

If the training data S = {(x1, y1), (x2, y2), . . . , (xn, yn)} where xi ∈ Rd and yi ∈ {−1, +1} is not linearly separable, then there is a simple trick to force the data to be linearly separable and then apply the Perceptron algorithm as follows. If you have n data points in d dimensions, map data point xi to the (d+n)-dimensional point [xi,ei]⊤ ∈ Rd+n, where ei ∈ {0,1}n is a n-dimensional vector of zeros, except for the ith position, which is 1 (e.g., e4 = [0,0,0,1,0,…,0]⊤).

Show that if you apply this mapping, the data becomes linearly separable (you may wish to do so by providing a weight vector w in (d + n)-dimensional space that successfully separates the data).

Programming and experiment

Problem 4 (20 points). In this problem will use the Pima Indians Diabetes dataset from the UCI repository to experiment with the k-NN algorithm and find the optimal value for the number of neighbors k. You do not need to implement the algorithm and encouraged to use the implementation in scikit-learn. Below is a simple code showing the steps to use the NN implementation when the number of neighbors is 3:

from sklearn.neighbors import KNeighborsClassifier # Create NN classifier

knn = KNeighborsClassifier(n_neighbors = 3)

# Fit the classifier to the data knn.fit(X_train,y_train)

# Predict the labels of test data

yhat = knn.predict(X_test)

# Or, directly check accuracy of our model on the test data knn.score(X_test, y_test)

To accomplish this task, please do:

- a) Download the provided Pima.csv data file and load it using pandas. As a sanity check, make sure there are 768 rows of data (potential diabetes patients) and 9 columns (8 input features including Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, and 1 target output). Note that the data file has no header and you might want to explicitly create the header. The last value in each row contains the target label for that row, and the remaining values are the features. Report the statics of each feature (min, max, average) and the histogram of the labels (target outputs).
- b) Split the data into training and test data with 80% training and 20% test data sizes. You can easily do this in scikit-learn, e.g., from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Use 5-fold cross-validation on training data to decide the best number of neighbours k. To this end, you can use the built in functionality in scikit-learn such as cross_val_score. For k = 1, 2, 3, . . . , 15 compute the 5-fold cross validation error and plot the results (with values of k on the x-axis and accuracy on the y-axis). Include the plot in your report and justify your decision for picking a particular number of neighbors k.

c) Evaluate the k-NN algorithm on test data with the optimal number of neighbours you obtained in previous step and report the test error.

d) Process the input data by subtracting the mean (a.k.a. centralization) and dividing by the standard deviation (a.k.a. standardization) over each dimension (feature), and repeat the previous part and report the accuracy. Do centralization and standardization affect the accuracy? Why?

Running and Deliverable

You are provided with a Problem4.ipynb file to put the implementation code for all parts above. Make sure your notebook is runnable on Anaconda with Python 3.5+. The results and discussions should be included in the PDF file.

Problem 5 (40 points). In this problem, we consider a simple linear regression model with a modified loss function and try to solve it with Gradient Descant (GD) and Stochastic Gradient Descant (SGD).

In general setting, the data has the form {(x1, y1), (x2, y2), . . . , (xn, yn)} where xi is the d-dimensional feature vector and yi is a real-valued target. For this regression problem, we will be using linear prediction w⊤xi with the objective function:

1 ∑n

f ( w ) = n

∑d

g δ ( w , x i , y i ) + λ w j2

i=1

j=1

adding a constant feature, which is always equal to ONE to simplify estimation of the “intercept” term.

Gradient Descent

In this part, you are asked to optimize the above objective function using gradient descent and plot the function values over different iterations, which can be done using the python library matplotlib.

To this end, in Problem5.py, fill in the function bgd_l2(data, y, w, eta, delta, lam, num_iter) where data is a two dimensional numpy array with each row being a feature vector, y is a one-dimensional numpy array of target values, w is a one-dimensional numpy array corresponding to the weight vector, eta is the learning rate, delta and lam are parameters of the objective function. This function should return new weight vector, history of the value of objective function after each iteration (python list).

Run this function for the following settings and plot the history of objective function (you should expect a monotonically decreasing function over iteration number):

• η = 0.05, δ = 0.1, λ = 0.001, num_iter=50 • η = 0.1, δ = 0.01, λ = 0.001, num_iter=50 • η = 0.1, δ = 0, λ = 0.001, num_iter=100 • η = 0.1, δ = 0, λ = 0, num_iter=100

(y − w · x − δ)2 gδ(w, xi, yi) = 0

(y − w · x + δ)2

Please note that we simply dropped the intercept term by the simple trick we discussed in the lecture, i,e.,

if y ≥ w · x + δ if |y − w · x| < δ if y ≤ w · x − δ

Stochastic Gradient Descent

In Problem5.py fill in the function sgd_l2(data, y, w, eta, delta, lam, num_iter, i) where data, y, w, lam, delta and num_iter are same as previous part. In this part, you should use √η as a learning

t

rate, where t is the iteration number, starting from 1. The variable i is for testing the correctness of your function. If i set to −1 then you just need to apply the normal SGD (randomly select the data point), which runs for num_iter, but if i set to something else (other than −1), your code only needs to compute SGD for that specific data point (in this case, the num_iter will be 1!).

Run this function for the settings below and plot the history of objective function: • η = 1, δ = 0.1, λ = 0.5,num_iter=800

• η = 1, δ = 0.01, λ = 0.1, num_iter=800

• η=1,δ=0,λ=0,num_iter=40

• η=1,δ=0,λ=0,num_iter=800 Running and Deliverable

You are provided with a sample data sets (data.py). You can load these files into numpy array using this syntax: np.load(‘data.py’). The sample data for this homework has 100 data points (xi, yi) in a 100 × 2 numpy array. Please note that you need to add a column of all ones to data to handle the intercept term, making your data a 100 × 3 numpy array. For the plotting part, make sure that your plots have appropriate title, x and y-axis labels. You need to submit two python files Problem5.py and problem5Plot.py and include all the plots/results in the PDF file including all the plots. In Problem5.py everything except the imports should be inside the functions definition we mentioned above. The file Problem5Plot.py is where you are generating the plots.

Deliverables

This homework includes both theory and coding problems. You are asked to:

- SubmitaPDFfileincludingtheanswersfortheoryproblemsandtheresultsofcodingproblems/plots/discussion as discussed above, in a single PDF file via Gradescope. Note that your PDF should be graded in- dependently and you need to include all the results.
- All the codes needs to be submitted via Canvas. Make sure your code is running and include enough details about your code.

**CoursePear™提供各类学术服务，Essay代写，Assignment代写，Exam / Quiz助攻，Dissertation / Thesis代写，Problem Set代做等。**