Revisiting and Refactoring
Updated: Jan 19
Over the weekend I took a second look at an older project with the idea of turning it into a better portfolio piece. Many people are familiar with the dataset: the Ames, Iowa housing dataset, in which a data scientist examines home sale records and builds a model to predict sale price. The first time through, I spent a long time on exploratory data analysis, iterated on the model, and fit a single linear regression model, with fairly reasonable results.
This time around, I wanted to apply some of the newer, more advanced tools I'd learned since getting a result I was satisfied with the first time. Back then I had barely heard of a neural network, I hadn't encountered Principal Component Analysis (PCA) for dimensionality reduction, and I didn't know as much about data loading with the pandas .read_csv() function. I thought that, going through it again, it would be easy to beat my previous model. It turns out that while the process was faster, it didn't help my model at all; even with PCA and neural networks, I still haven't beaten my original score.
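For anyone unfamiliar with PCA, the idea is to project the feature matrix onto the directions of greatest variance, reducing dozens of housing features to a handful of components. Here's a minimal sketch using only NumPy; in a real project you'd typically reach for sklearn.decomposition.PCA instead, and the toy data below is made up purely for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (illustrative sketch)."""
    # Center each feature so the principal axes pass through the data's mean
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Keep only the first n_components directions
    return X_centered @ Vt[:n_components].T

# Toy example: 5 samples, 3 features, two of them strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 1))
X = np.hstack([base, 2 * base + 0.01 * rng.normal(size=(5, 1)),
               rng.normal(size=(5, 1))])

X_2d = pca_reduce(X, 2)
print(X_2d.shape)  # (5, 2)
```

The catch, and part of why it didn't help here, is that PCA only looks at variance in the inputs; it knows nothing about which features actually drive sale price.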
I guess it just goes to show that a newer technique is no substitute for domain knowledge and a robust exploratory data analysis to fully understand the nature of the data.