- karlduane
Mis-Misinformation
Updated: Jan 19, 2021
I've spent the last two weeks reworking a group project into something I can deploy. The group decided to build a Covid-19 Misinformation Classifier that could give a quick 'misinformation check' on a news title. After several iterations of NLP preprocessing and competing models, we developed a model with a 99/92% accuracy split.
The struggle came when I went to deploy the model. Development of the Flask app and pickling of the model went smoothly, but when I tested the model on current news titles and verified misinformation, I was consistently getting the opposite response from what I expected. The model was still getting its 99/92 accuracy split, yet true titles were consistently classified as misinformation and false titles were classified as valid. I was almost ready to give it up as "garbage in, garbage out" when I realized that the consistency of the errors pointed to the real problem.
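For context, the deployment itself is straightforward: unpickle the trained model once at startup and serve predictions from a Flask route. Here is a minimal sketch of that kind of setup; the filename, the route, and the assumption that the pickled object is a full scikit-learn pipeline (vectorizer plus classifier) are illustrative, not the project's exact code:

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the pickled model once at startup. This assumes the pickle
# holds a full pipeline that accepts raw title strings.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict')
def predict():
    title = request.args.get('title', '')
    pred = int(model.predict([title])[0])  # 0 or 1 from the classifier
    return jsonify({'prediction': pred})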
My data was labelled using the following map:
# 'true' -> 0; 'false' and 'misleading' -> 1
df['label'] = df['label'].map({'true': 0, 'false': 1, 'misleading': 1})
And my predictions were referencing this dictionary to generate a human-readable prediction:
r_dict = {0: "misinformation.", 1: "valid."}
OPE. See the problem? The label map encodes 'true' as 0, but r_dict translates 0 as "misinformation." No wonder my Flask API was giving me consistently wrong predictions. The model was predicting and performing as expected; my code then told the browser to reverse the label.
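The fix, keeping the original label map, is simply to invert the display dictionary so the two agree:

# Training labels: 'true' -> 0, 'false'/'misleading' -> 1,
# so 0 must render as valid and 1 as misinformation.
r_dict = {0: "valid.", 1: "misinformation."}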
Ultimately, this was an object lesson in keeping variable encodings consistent across a project. The best model in the world, generating predictions with 100% accuracy on unseen data, simply isn't useful if the human then reverses the data dictionary...
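One illustrative guard against this kind of drift (a pattern, not the project's actual code) is to define the encoding in exactly one place, derive the display labels from it, and assert the round trip:

# Hypothetical guard: a single source of truth for the encoding.
label_map = {'true': 0, 'false': 1, 'misleading': 1}

# Derive the display strings from the same encoding rather than
# hand-writing a second, independent dictionary.
r_dict = {label_map['true']: "valid.", label_map['false']: "misinformation."}

# Sanity check: known labels must round-trip to the expected strings.
assert r_dict[label_map['true']] == "valid."
assert r_dict[label_map['misleading']] == "misinformation."

Had a check like this run at deploy time, the reversed dictionary would have failed loudly instead of silently flipping every prediction.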