karlduane

Revisiting and Refactoring

Updated: Jan 19, 2021

Over the weekend I took a second look at an older project with the idea of turning it into a better portfolio piece. Many people are familiar with the dataset: the Ames, Iowa housing dataset, where a data scientist looks at home sales records and builds a model to predict the sale price of each home. The first time I worked with it, I spent a long time on exploratory data analysis, iterated on the model, and ended up fitting a single linear regression model, with fairly reasonable results.

This time around, I wanted to use some of the newer, more advanced tools I've learned since that first attempt. Back then I had barely heard of a neural network, I hadn't encountered Principal Component Analysis (PCA) for dimensionality reduction, and I didn't know as much about data loading with the pandas .read_csv() function. I had thought that, going through it again, it would be easy to beat my previous model. It turns out that while the process was faster, it didn't help my model at all. In fact, even using PCA and neural networks, I still haven't beaten my original score.

I guess it just goes to show that a newer technique is no substitute for domain knowledge and a thorough exploratory data analysis that helps you understand the nature of the data.
