What should I do if there is no data for some features in some training examples? | Coursera Community


  • 11 September 2019
  • 1 reply
  • 51 views

I'm working on a machine learning problem with 100,000 training examples and about 100 features.

The label Y is a single number for each training example, either 1 or 0.

I'm going to use a basic neural network to solve it.

However, there is no data for some features in some training examples.

For example, feature_1 : 1,0,0,NA,0,0,1,1,1,0,NA,0,1,1,NA,......

What should I do in this situation?

I have drafted three plans. Can someone tell me whether they would work?

Or maybe you have a better idea?

1. Ignore the NA values, like this:

[1, 0.5, NA, 0] * [theta1, theta2, theta3, theta4] = 1*theta1 + 0.5*theta2 + 0*theta3 + 0*theta4

But this would teach the network that NA = 0, so I think it would work only if the feature never takes the value 0, and fail otherwise. Am I right?
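The concern in plan 1 can be checked with a small sketch. It assumes missing values are stored as NaN, and the weights in `theta` are made up for illustration. After zero-filling, a row with a missing third feature and a row whose third feature is a genuine 0 produce the same weighted sum, so the model cannot tell them apart.

```python
import math

NA = float("nan")  # stand-in for a missing value (an assumption of this sketch)

def zero_fill(row):
    """Replace missing entries with 0.0, as in plan 1."""
    return [0.0 if math.isnan(x) else x for x in row]

def dot(row, theta):
    """Plain weighted sum of one example's features."""
    return sum(x * t for x, t in zip(row, theta))

theta = [0.2, -0.4, 0.7, 0.1]      # hypothetical weights, not from the thread
row_missing = [1.0, 0.5, NA, 0.0]  # third feature is unknown
row_zero = [1.0, 0.5, 0.0, 0.0]    # third feature is a real 0

out_missing = dot(zero_fill(row_missing), theta)
out_zero = dot(zero_fill(row_zero), theta)

# Both rows collapse to the same input, so the outputs are identical.
print(out_missing == out_zero)
```

Whether this matters depends on the feature: if 0 never occurs as a real value, zero-filling effectively reserves 0 for "missing".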

2. Replace NA with a placeholder value, like this:

First, look at a feature, for example:

feature_1 : 1,0,0,NA,0,0,1,1,1,0,NA,0,1,1,NA

Then I set NA to -1, 2, or 100 (a number that is not close to any value the feature can take). Would this work? And which of -1, 2, and 100 do you think is better?
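Plan 2 might look like the sketch below, assuming missing values are stored as NaN. The function names are hypothetical. The second function is not one of the three plans in this post: it is a commonly used alternative that keeps a fill value and adds a separate 0/1 "is missing" column, shown here only for comparison.

```python
import math

NA = float("nan")  # stand-in for a missing value (an assumption of this sketch)

def encode_with_sentinel(column, sentinel=-1.0):
    """Plan 2: replace each missing entry with a value the feature never takes."""
    return [sentinel if math.isnan(x) else x for x in column]

def encode_with_indicator(column, fill=0.0):
    """Alternative (not from the post): return (value, is_missing) pairs."""
    return [(fill, 1.0) if math.isnan(x) else (x, 0.0) for x in column]

feature_1 = [1.0, 0.0, 0.0, NA, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, NA, 0.0, 1.0, 1.0, NA]

sentinel_encoded = encode_with_sentinel(feature_1)
indicator_encoded = encode_with_indicator(feature_1)

print(sentinel_encoded[:5])
print(indicator_encoded[:5])
```

On the choice between -1, 2, and 100: a placeholder far outside the feature's range, such as 100 for a 0/1 feature, acts like an outlier and can dominate the weighted sum unless the inputs are rescaled, so a value near the real range is probably the safer starting point.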

3. Use a recommender system to estimate the NA values first, then feed the completed data to the neural network.
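The "estimate first, then train" idea in plan 3 can be sketched as a two-step pipeline. A recommender system is too large for a short example, so the sketch below swaps in a much simpler estimator, the per-column mean of the observed values. The tiny matrix `X` and the function names are made up for illustration, and NaN is again assumed to mark missing entries.

```python
import math

NA = float("nan")  # stand-in for a missing value (an assumption of this sketch)

def column_means(rows):
    """Mean of the observed (non-missing) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute(rows, means):
    """Fill each missing entry with the estimate for its column."""
    return [[means[j] if math.isnan(x) else x for j, x in enumerate(r)]
            for r in rows]

# Hypothetical data: 3 examples, 3 features.
X = [[1.0, 0.5, NA],
     [0.0, NA, 2.0],
     [1.0, 1.5, 4.0]]

means = column_means(X)      # estimated from the training data only
X_filled = impute(X, means)  # this completed matrix is what the network would see
print(X_filled)
```

A recommender-style estimator would replace `column_means` with a model that predicts each missing entry from the other features of the same example. In either case the estimates should be computed from the training set and reused unchanged on validation and test data.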

1 reply

I would go with the second option, using a negative number. Make sure you don't have outliers, which can cause problems.

Using a recommender system could be tricky and would complicate things, but it should perform better (you should measure this). Overall, try the different approaches; if the performance gain from the recommender system, which might be small, matters to you, go for it.
