Homework 1
Part A. Small Data sets
Coefficients for polynomial obtained using ‘h1_train_small.csv’.
Degree | 0 | 1 | 2 | 3 | 5 | 9 |
w0 | 1.5645e+02 | -8.2819e+01 | 2.7887e+01 | -8.5143e+01 | -6.4201e+03 | 2.0790e+06 |
w1 | 5.0372e+01 | -9.3025e-01 | 7.9680e+01 | 7.8048e+03 | -4.3771e+06 | |
w2 | 5.4003e+00 | -1.2541e+01 | -3.6479e+03 | 4.0283e+06 | ||
w3 | 1.2590e+00 | 8.2759e+02 | -2.1283e+06 | |||
w4 | -9.0909e+01 | 7.1185e+05 | ||||
w5 | 3.8820e+00 | -1.5642e+05 | ||||
w6 | 2.2595e+04 | |||||
w7 | -2.0704e+03 | |||||
w8 | 1.0927e+02 | |||||
w9 | -2.5320e+00 |
Plots of curves obtained of degrees 0, 1, 2, 3, 5 and 9
Part B: Validation
Error(Erms) Table for different data sets
Degree | 0 | 1 | 2 | 3 | 5 | 9 |
Training Set | 74.00987883822417 | 15.62451580462774 | 12.16090809108962 | 11.8421804238779 | 3.842905029648383 | 2.328903479852792e-07 |
Test Set | 629.410744713379 | 281.2525954828554 | 24.91328131435892 | 631.0638739897187 | 162838.1603881223 | 827952039.4361696 |
Validation Set | 171.8771193327797 | 44.85674911028034 | 11.09053468952674 | 32.8677297430707 | 1585.756131451526 | 216579.8980638061 |
Observation: Clearly for N=2 the error is minimum for validation set. Hence the optimal choice for N is 2 having root-mean-square error = 11.09053468952674.
Part C:
Coefficients for polynomial obtained using ‘h1_train_big.csv’.
Degree | 0 | 1 | 2 | 3 | 5 | 9 |
w0 | 256.3838619855513 | -166.2835748293977 | -19.49938856966844 | 27.56209951950596 | 353.0840806603857 | 2168.675007713666 |
w1 | 67.62678989039185 | 14.05591899268045 | -12.82358692326629 | -321.4112251789422 | -3575.612235002031 | |
w2 | 4.285669671816916 | 8.929589949019823 | 119.167071304206 | 2619.872467107589 | ||
w3 | -0.2476757481174882 | -18.91453318823929 | -1098.089751683805 | |||
w4 | 1.508508173295082 | 289.2515809628427 | ||||
w5 | -0.04683596924199019 | -49.14334550418263 | ||||
w6 | 5.355510632198883 | |||||
w7 | -0.3597561824547386 | |||||
w8 | 0.01348302183778243 | |||||
w9 | -0.0002141957434497342 |
Error(Erms) Table for different data sets
Degree | 0 | 1 | 2 | 3 | 5 | 9 |
Training Set | 149.8744235246006 | 21.28235256556897 | 10.62120040139233 | 10.42105532830979 | 10.3001406393962 | 10.25676188107689 |
Test Set | 542.4778231210471 | 166.9298734733813 | 19.2005590032111 | 77.04968265996644 | 648.6526413605375 | 4311.27782863249 |
Validation Set | 129.1376272432213 | 16.41940358517186 | 10.43865816178575 | 10.68620609463883 | 10.0173140877985 | 10.13029630990042 |
Choice of N
Here the rms error is minimal for degrees 2, 3, 5, 9. If we were to select optimal N using training set only then the best choice would be 9. But using N=2 we can get approximately same result but with low complexity. Hence optimal choice for N is 2.
Using validation set also we have 5 as the best possible choice for N. But giving the same argument, N=2 appears to be the best choice. Hence, the validation set here is not much use as the error variation in both cases is very low.
Analysis
The best guess for target function will be using N=2. Which gives the function:
y = 4.285669671816916*x^2+14.05591899268045*x-19.49938856966844
The test set has larger error fluctuations as compared to the validation set. The reason for this can be attributed to the fact that the time range is approximately the same for training set and the validation set. But in case of test set, the range slightly goes out of bound for training set. Since the polynomial is not obtained using data in that range the error is high even though there were large number of data points available for training.
To download the octave scripts used in this assignment click here.