I've heard that 80% of Data Science is cleaning up messy data full of typos and rows out of order. I'm about to start the Fast.AI course, which I hear is amazing, but I just can't seem to finish these graphs of my solar panel data. It's a mess! (Thanks, Dad.)
I planned on using this data for the Seaborn Kaggle Micro-Course and finishing quite quickly, but the data is significantly messier than I expected.
Good news though!
I've been reusing all sorts of Python and pandas functions from my notes. A lot of it has been coming back to me, and I've been writing new helpers too, like a "row context" function that pulls up the rows around a row that looks off.
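For anyone curious what a "row context" helper might look like, here's a minimal sketch in pandas. It isn't the exact function from my notes. The name `row_context`, the `n` parameter, and the `date`/`kwh` columns in the example data are all hypothetical, and it assumes the DataFrame has a unique index.

```python
import pandas as pd


def row_context(df: pd.DataFrame, label, n: int = 2) -> pd.DataFrame:
    """Return the row with index `label` plus up to `n` rows on either side.

    Hypothetical helper; assumes `df` has a unique index.
    """
    if not df.index.is_unique:
        raise ValueError("row_context assumes a unique index")
    pos = df.index.get_loc(label)      # position of the suspicious row
    start = max(pos - n, 0)            # don't run past the top of the frame
    stop = min(pos + n + 1, len(df))   # don't run past the bottom
    return df.iloc[start:stop]


# Made-up example readings; the negative kWh value is the "something off" row.
readings = pd.DataFrame(
    {
        "date": pd.date_range("2019-06-01", periods=8, freq="D"),
        "kwh": [12.1, 11.8, 12.4, -3.0, 12.0, 11.6, 12.3, 12.2],
    }
)

# Show the neighbourhood around every row with a negative reading.
for label in readings.loc[readings["kwh"] < 0].index:
    print(row_context(readings, label))
```

Slicing by position with `iloc` (after looking up the position with `get_loc`) means the window works the same whether the index is a plain range or something like dates, and clamping `start`/`stop` keeps it from breaking on the first or last row.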
So, if Data Science really is mostly cleaning data, then I've just learned a ton about it, and I'm excited to use it on other projects. With any luck, the practice I've been getting for at least a week now will make life easier.