I was recently experimenting with sklearn Pipelines. I made a preprocessing pipeline that one-hot encodes the categorical data, and I split my training set into a training set and a dev set. The problem is that some categories only end up in the dev set, so the encoder fitted on the training split has no column for them, and my program fails :/. Is there a way to get the preprocessed dataset back from the Pipeline object? That way I could preprocess all the data first, split it afterwards, and still use my pipeline.
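Here is a rough sketch of what my setup looks like. The column names ("color", "size") and the toy data are made up for illustration, and I'm assuming a ColumnTransformer wrapping OneHotEncoder inside the Pipeline. The category "blue" only appears in the dev split, so transforming the dev data fails:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: the category "blue" only appears in the dev split.
train = pd.DataFrame({"color": ["red", "green", "red"], "size": [1.0, 2.0, 3.0]})
dev = pd.DataFrame({"color": ["blue", "red"], "size": [4.0, 5.0]})

# Preprocessing pipeline: one-hot encode the categorical column,
# pass the numeric column through unchanged.
preprocess = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(), ["color"])],
        remainder="passthrough",
    )),
])

preprocess.fit(train)  # the encoder only learns "green" and "red"

try:
    preprocess.transform(dev)
    failed = False
except ValueError as err:
    # OneHotEncoder defaults to handle_unknown="error",
    # so the unseen category "blue" makes transform raise here.
    failed = True
    print(err)
```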
P.S. Sorry if it's a rookie mistake, I'm new.