My BFF loves organizing and tidying up. I, on the other hand, like things tidy but don’t like spending my precious binge-watching time on it.
The same goes for data cleaning. Everyone finds data availability exciting, but how do we get rid of the logical incompatibilities, formatting errors, typos, and structural and veracity issues?
We know there are some standard issues that come with most data. Most of us think we will just finish the job at hand quickly and move on to analysis, whether because data cleaning is boring, because we have a deadline, or for some other reason.
Instead, draw a data flow diagram. Write a few little scripts to go with it. Test and implement.
Life will be easier, I guarantee.
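To make "a few little scripts" concrete, here is a minimal sketch of what one might look like in Python. The field names (name, city, age), the sample records, and the 0 to 120 age range are all made up for illustration; your own data flow diagram decides which steps you actually need.

```python
import re


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_record(record: dict) -> dict:
    """Apply the same small fixes to every record (field names are hypothetical)."""
    cleaned = {key: normalize_whitespace(str(value)) for key, value in record.items()}
    # Formatting issue: make city casing consistent.
    cleaned["city"] = cleaned["city"].title()
    # Structural issue: age must be a whole number, otherwise mark it as missing.
    cleaned["age"] = int(cleaned["age"]) if cleaned["age"].isdigit() else None
    return cleaned


def is_plausible(record: dict) -> bool:
    """Logical check: drop records with a missing or impossible age (assumed range)."""
    return record["age"] is not None and 0 <= record["age"] <= 120


# Made-up sample data, only to show the steps running end to end.
raw = [
    {"name": "  Ada   Lovelace ", "city": "LONDON", "age": "36"},
    {"name": "Bob", "city": "paris", "age": "abc"},
]

cleaned = [clean_record(r) for r in raw]
valid = [r for r in cleaned if is_plausible(r)]
```

Each function maps to one box in the diagram, so you can test them one at a time and reuse them on the next dataset.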