What is the best approach for hyperparameter tuning of a neural network? | Coursera Community

What is the best approach for hyperparameter tuning of a neural network?

  • 8 November 2018
  • 2 replies

I am using sklearn's MLPRegressor. Here's what I am interested in knowing:

  1. What are the most important hyperparameters to focus on tuning?
  2. What are the suitable ranges of values for each hyperparameter?
  3. What are the expected results for each hyperparameter? (e.g. with more nodes in the network we might expect increased accuracy. I guess this comment is more about NN architecture than hyperparameter tuning: do you tune hidden layer sizes in the same way that you tune other hyperparameters?)
  4. For which hyperparameters will tuning potentially increase our chance of overfitting/underfitting?

2 replies

Regarding #1-3:

I don't know the details of that algorithm. But the values you tweak are things like how many layers the network has, what the activation function is going to be (e.g. ReLU), and so on. The model itself should learn all those thousands of weights in the hidden layer(s).

Figuring out the best params for your problem is most likely going to take a whole lot of trial and error.
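
One way to make that trial and error systematic is a randomized search with cross-validation. Below is a minimal sketch using scikit-learn's `RandomizedSearchCV` around `MLPRegressor`. The synthetic data and the value ranges are assumptions chosen for illustration, not recommended defaults.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# MLPs are sensitive to feature scale, so scale inside the pipeline.
pipeline = make_pipeline(
    StandardScaler(),
    MLPRegressor(max_iter=500, random_state=0),
)

# Illustrative search space only; adjust to your problem.
param_distributions = {
    "mlpregressor__hidden_layer_sizes": [(16,), (64,), (32, 32)],
    "mlpregressor__activation": ["relu", "tanh"],
    "mlpregressor__alpha": loguniform(1e-5, 1e-1),
    "mlpregressor__learning_rate_init": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=8,                          # number of sampled configurations
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    random_state=0,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score:", search.best_score_)
```

Sampling `alpha` and `learning_rate_init` on a log scale is a common choice because their useful values tend to span several orders of magnitude.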

Regarding #4:

If you split your data into training, dev, and test sets, you probably don't have to worry that much about over-fitting (the idea being that you experiment with different parameters on the dev set).

If you constantly tweak parameters on the dev set and NOT the test set, then you reduce the risk of 'cherry picking' the best hyperparameters for a given test set.

When tweaking hyperparams, in some sense the programmer is his own worst enemy; if you keep on tweaking and use results on the test set as evidence of doing the right thing, then you of course risk over-fitting. So really, the trick is to know when to stop.
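
As a rough sketch of that workflow: split once into train/dev/test, compare candidate settings on the dev set only, and score the test set a single time at the end. The 60/20/20 split, the synthetic data, and the candidate layer sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# 60% train, 20% dev, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Fit the scaler on the training set only, then apply it everywhere.
scaler = StandardScaler().fit(X_train)
X_train, X_dev, X_test = (scaler.transform(a) for a in (X_train, X_dev, X_test))

# Compare candidates on the dev set; the test set is never used here.
candidates = [(16,), (64,), (32, 32)]
best_score, best_model = float("-inf"), None
for hidden in candidates:
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    dev_score = model.score(X_dev, y_dev)  # R^2 on the dev set
    if dev_score > best_score:
        best_score, best_model = dev_score, model

# Touch the test set exactly once, after all tuning decisions are made.
test_score = best_model.score(X_test, y_test)
print("dev R^2:", best_score, "test R^2:", test_score)
```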

## And lastly

If you can acquire more data (e.g. via data augmentation techniques) then in a lot of cases more data will outperform hyperparam tweaking. So yeah, don't get carried away with hyperparams when there is a lot more "simple stuff" you could be doing to improve performance.
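
One hedged way to check whether more data is likely to help before spending time on hyperparams is a learning curve: if the validation score is still climbing as the training set grows, more data is probably worth chasing. This sketch uses scikit-learn's `learning_curve`; the data and model settings are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption); replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)

# Train on 20%, 50% and 100% of each training fold and score by R^2.
sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=[0.2, 0.5, 1.0], cv=3, scoring="r2"
)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_train={n}: train R^2={tr:.3f}, validation R^2={va:.3f}")
```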
Userlevel 3
Badge +5
Neural networks have more hyperparameters than other models. Some of them are: regularization constant (lambda), dropout rate, learning rate, number of epochs, optimizer, activation function, and any additional parameters of the optimizer. The important ones that affect accuracy are: regularization constant, dropout rate, learning rate, number of epochs, and optimizer.
I don't have exact ranges of values to suggest. Start from the commonly used value ranges.
As the regularization constant increases, training accuracy decreases: over-fitting decreases and under-fitting increases. The same applies to the dropout rate. Increasing the number of epochs increases training-set accuracy but also increases over-fitting. Dropout rate and regularization constant determine the under-fitting/over-fitting of the model more than the other parameters do.
I don't choose the network architecture in the same way as I tune other hyperparameters. Try a few neural network architectures and choose one of them, then tune the hyperparameters for that chosen architecture.
L2 regularization and dropout are the major factors in determining accuracy on the cross-validation and test sets.
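
To see the regularization effect described in this reply on your own data, you can sweep the L2 constant and compare training and validation scores. In sklearn's `MLPRegressor` the L2 constant is the `alpha` parameter, and there is no dropout parameter, so this sketch covers only the L2 part. The data and the `alpha` grid are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption); replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Illustrative grid from very weak to very strong L2 regularization.
alphas = [1e-4, 1e-2, 1.0, 100.0]

train_scores, val_scores = validation_curve(
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    X,
    y,
    param_name="alpha",
    param_range=alphas,
    cv=3,
    scoring="r2",
)

# Average over the CV folds for each alpha value.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
for a, tr, va in zip(alphas, mean_train, mean_val):
    print(f"alpha={a:g}: train R^2={tr:.3f}, validation R^2={va:.3f}")
```

A large gap between training and validation scores at small `alpha` suggests over-fitting; both scores dropping together at large `alpha` suggests under-fitting.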