This repo is basically just going to be a place for me to store random experiments I do with data, whether it's techniques I've not tried before or playing with new tools.
This experiment tests whether grouping categorical features with large numbers of unique values into a smaller number of unique values results in a more predictive feature. This is pretty well established when the sub-groups are logical (for example, mapping "Doctor" and "Captain" to "Professional", and "Lord" and "Baron" to "Aristocrat", in Kaggle's Titanic), but I wanted to see whether the same applies when there are no logical groups.
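
For reference, the "logical groups" case from the Titanic example can be sketched roughly like this. It is only a minimal illustration, not the code used in the experiment: the `TITLE_GROUPS` mapping, the `group_categories` helper and the `"Other"` fallback are names I've made up for the example.

```python
# Hypothetical mapping from fine-grained titles to broader groups,
# based on the Titanic example described above.
TITLE_GROUPS = {
    "Doctor": "Professional",
    "Captain": "Professional",
    "Lord": "Aristocrat",
    "Baron": "Aristocrat",
}


def group_categories(values, mapping, default="Other"):
    """Replace each value with its group; unmapped values fall back to `default`."""
    return [mapping.get(value, default) for value in values]


# Illustrative input only, not real Titanic data.
titles = ["Doctor", "Lord", "Captain", "Baron", "Mr"]
grouped = group_categories(titles, TITLE_GROUPS)

# The grouped feature has fewer unique values than the original one.
print(len(set(titles)), "unique values before,", len(set(grouped)), "after")
```

The open question is what to put in the mapping when there is no obvious, meaningful way to assign values to groups.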