karlduane


Updated: Jan 19


I've spent the last two weeks reworking a group project into something I can deploy. The group decided to build a Covid-19 Misinformation Classifier that could give a quick 'misinformation check.' After several iterations of NLP and competing models, we developed a model with a 99/92% accuracy split.

The struggle came when I went to deploy the model. Building the Flask app and pickling the model went smoothly. However, when I tested the model on current news titles and on titles already verified as misinformation, I consistently got the opposite of the response I expected. The model still had its 99/92 accuracy split, yet true titles were being classified as misinformation and false titles were being labelled as true. I was almost ready to write it off as "garbage in, garbage out" when I realized that the consistency of the errors pointed to the real problem.

My data was labelled using the following map:

df['label'] = df['label'].map({'true': 0, 'false': 1, 'misleading': 1})

And my predictions were referencing this dictionary to generate a human readable prediction:

r_dict = {0: "misinformation.", 1: "valid."}

OPE. See the problem? The label map encodes true titles as 0, but r_dict renders 0 as "misinformation." No wonder my Flask API consistently gave the wrong prediction: the model was predicting and performing as expected, and then my code told the browser to reverse the label.
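A minimal reproduction of the mismatch, using only the two dictionaries quoted above and no model at all: a title whose ground truth is 'true' is encoded as 0, so a model that predicts it correctly outputs 0, and the display dictionary then turns that correct prediction into "misinformation."

```python
# Label encoding applied to the training data (as in the post)
label_map = {'true': 0, 'false': 1, 'misleading': 1}

# Dictionary the Flask app used to render predictions (as in the post)
r_dict = {0: "misinformation.", 1: "valid."}

# A correctly classified true title comes out of the model as 0 ...
encoded_true = label_map['true']
# ... and is then displayed with the opposite meaning.
print(encoded_true, r_dict[encoded_true])  # 0 misinformation.

# The same inversion happens for false titles.
encoded_false = label_map['false']
print(encoded_false, r_dict[encoded_false])  # 1 valid.
```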

Ultimately, this was an object lesson in keeping variables consistent. Even a model that predicts unseen data with 100% accuracy simply isn't useful if the human then reverses the data dictionary...
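One way to guard against this, sketched here as an assumption rather than what the project actually shipped, is to keep a single encoding map, derive the display dictionary from it, and check the round trip before deploying. The names `LABEL_MAP`, `DISPLAY` and `build_decoder`, and the display wording, are hypothetical.

```python
# Single source of truth for the encoding (hypothetical constant; same values as the post's map)
LABEL_MAP = {'true': 0, 'false': 1, 'misleading': 1}

# Human-readable text for each *raw* label (hypothetical wording)
DISPLAY = {'true': "valid.", 'false': "misinformation.", 'misleading': "misinformation."}


def build_decoder(label_map, display):
    """Invert label_map into {encoded_value: display_text}, rejecting conflicting texts."""
    decoder = {}
    for raw, code in label_map.items():
        text = display[raw]
        if code in decoder and decoder[code] != text:
            raise ValueError(f"code {code} maps to both {decoder[code]!r} and {text!r}")
        decoder[code] = text
    return decoder


r_dict = build_decoder(LABEL_MAP, DISPLAY)
print(r_dict)  # {0: 'valid.', 1: 'misinformation.'}

# Sanity check before deploying: every raw label decodes to its intended text.
for raw, code in LABEL_MAP.items():
    assert r_dict[code] == DISPLAY[raw], raw
```

Because the decoder is built from the same map used to label the training data, changing the encoding later cannot silently flip the text shown in the browser.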
