Homework 1

Part A. Small Data sets

Coefficients for polynomial obtained using ‘h1_train_small.csv’.

        

Degree

0

1

2

3

5

9

w0

1.5645e+02

-8.2819e+01

 2.7887e+01

-8.5143e+01

-6.4201e+03

 2.0790e+06

w1

5.0372e+01

-9.3025e-01

7.9680e+01

7.8048e+03

-4.3771e+06

w2

5.4003e+00

-1.2541e+01

-3.6479e+03

4.0283e+06

w3

1.2590e+00

8.2759e+02

-2.1283e+06

w4

-9.0909e+01

7.1185e+05

w5

3.8820e+00

-1.5642e+05

w6

2.2595e+04

w7

-2.0704e+03

w8

 1.0927e+02

w9

-2.5320e+00

small_train_VS_test.png

small_train.pngsmall_test.png

Plots of curves obtained of degrees 0, 1, 2, 3, 5 and 9

small-0.pngsmall-1.pngsmall-2.pngsmall-3.pngsmall-5.pngsmall-9.png

Part B: Validationsmall_validate.png

small_train_VS_validate.png

Error(Erms) Table for different data sets

Degree

0

1

2

3

5

9

Training Set

74.00987883822417

15.62451580462774

12.16090809108962

11.8421804238779

3.842905029648383

2.328903479852792e-07

Test Set

629.410744713379

281.2525954828554

24.91328131435892

631.0638739897187

162838.1603881223

827952039.4361696

Validation Set

171.8771193327797

44.85674911028034

11.09053468952674

32.8677297430707

1585.756131451526

216579.8980638061

Observation: Clearly for N=2 the error is minimum for validation set. Hence the optimal choice for N is 2 having root-mean-square error = 11.09053468952674.

Part C:

        big_train_VS_test.pngbig_train.pngbig_test.png

big_train_VS_validate.pngbig__validate.png

Coefficients for polynomial obtained using ‘h1_train_big.csv’.

        

Degree

0

1

2

3

5

9

w0

256.3838619855513

-166.2835748293977

-19.49938856966844

27.56209951950596

353.0840806603857

2168.675007713666

w1

67.62678989039185

14.05591899268045

-12.82358692326629

-321.4112251789422

-3575.612235002031

w2

4.285669671816916

8.929589949019823

119.167071304206

2619.872467107589

w3

-0.2476757481174882

-18.91453318823929

-1098.089751683805

w4

1.508508173295082

289.2515809628427

w5

-0.04683596924199019

-49.14334550418263

w6

5.355510632198883

w7

-0.3597561824547386

w8

 0.01348302183778243

w9

-0.0002141957434497342

Error(Erms) Table for different data sets

Degree

0

1

2

3

5

9

Training Set

149.8744235246006

21.28235256556897

10.62120040139233

10.42105532830979

10.3001406393962

10.25676188107689

Test Set

542.4778231210471

166.9298734733813

19.2005590032111

77.04968265996644

648.6526413605375

4311.27782863249

Validation Set

129.1376272432213

16.41940358517186

10.43865816178575

10.68620609463883

10.0173140877985

10.13029630990042

Choice of N

Here the rms error is minimal for degrees 2, 3, 5, 9. If we were to select optimal N using training set only then the best choice would be 9. But using N=2 we can get approximately same result but with low complexity. Hence optimal choice for N is 2.

Using validation set also we have 5 as the best possible choice for N. But giving the same argument, N=2 appears to be the best choice. Hence, the validation set here is not much use as the error variation in both cases is very low.

big-0.pngbig-1.pngbig-2.pngbig-3.pngbig-5.pngbig-9.png

Analysis

The best guess for target function will be using N=2. Which gives the function:

y = 4.285669671816916*x^2+14.05591899268045*x-19.49938856966844

The test set has larger error fluctuations as compared to the validation set. The reason for this can be attributed to the fact that the time range is approximately the same for training set and the validation set. But in case of test set, the range slightly goes out of bound for training set. Since the polynomial is not obtained using data in that range the error is high even though there were large number of data points available for training.

          

To download the octave scripts used in this assignment click here.