From the course: Executive Guide to Predictive Modeling Strategy at Scale
Aggregate and restructure
- [Narrator] In my experience, some of the most important variables get generated when you convert a very tall transactional data set into a case-level data set. But you lose a lot of information when you go from many rows to fewer rows, so what kind of information do you want to keep? A lot of this will seem obvious: you might compute total purchases, or median or mean purchases. What might be less obvious at first is that it won't be clear to the data scientist which of these variables is going to work best until they're in there assessing and exploring the data. For instance, somewhat famously in statistics, means are sensitive to outliers. The presence of just a few outliers can change the mean substantially, but it doesn't change the median as much. Now, an experienced data scientist will just grab a few of these, but there are some that they won't know about until they look. For instance, looking for something like the number of…
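To make the tall-to-case-level idea concrete, here is a minimal sketch in Python using only the standard library. The customer IDs, purchase amounts, and the function name `aggregate_to_case_level` are all hypothetical, invented for illustration; they do not come from the course. The sketch collapses one row per purchase into one row per customer and keeps the count, total, mean, and median, so the effect of a single outlier on the mean versus the median can be seen side by side.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical tall transactional data: one row per purchase (customer_id, amount).
transactions = [
    ("A", 20.0),
    ("A", 25.0),
    ("A", 30.0),
    ("B", 20.0),
    ("B", 25.0),
    ("B", 30.0),
    ("B", 925.0),  # a single outlier purchase
]


def aggregate_to_case_level(rows):
    """Collapse one-row-per-transaction data into one summary row per customer."""
    amounts_by_customer = defaultdict(list)
    for customer_id, amount in rows:
        amounts_by_customer[customer_id].append(amount)

    # Keep several candidate summaries; which one works best is only
    # clear after exploring the data.
    return {
        customer_id: {
            "n_purchases": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for customer_id, amounts in amounts_by_customer.items()
    }


case_level = aggregate_to_case_level(transactions)
for customer_id, features in case_level.items():
    print(customer_id, features)
```

In this made-up data, customers A and B have the same three ordinary purchases, but B's one large purchase moves B's mean from 25.0 to 250.0 while the median only moves from 25.0 to 27.5. That is the kind of difference between candidate variables that tends to show up only once someone is exploring the data.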