Chief Scientist Claudia Perlich will present “Predictability and other Predicaments” at MLconf in New York. MLconf aims to create an environment for discussing recent research on machine learning methods and practices, and how they are applied in industry today. Each event is a single-track, one-day conference of 14 to 16 presentations averaging 25 minutes each. The goal of this format is for attendees to take home practical tips and methods to apply in their own work, as well as cited papers, code samples, and other work to reference in their own research.
In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: build the model with the highest predictive performance. The rise of ‘big data’ has in fact vastly improved our ability to predict human behavior, thanks to the introduction of much more informative features. In practice, however, the picture is more nuanced. In many applications, the same observed outcome occurs for very different reasons. In such mixed scenarios, the model automatically gravitates toward whichever cause is easiest to predict, at the expense of the others. This holds even when the predictable cause is far less common or less relevant. We present several applications where this happens: ad clicks made ‘intentionally’ vs. ‘accidentally’, consumers actually visiting store locations vs. their phones merely pretending to be there, and customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of mixed outcome causes and highly informative features can significantly reduce the usefulness of predictive modeling.
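As a rough illustration of the mixed-scenario effect described in the abstract, the toy simulation below (not taken from the talk; all data, feature names, and parameters are hypothetical) assigns the same positive label for two different reasons. Rare ‘bots’ are easy to spot through a strong feature (`bot_signal`), while more common ‘genuine’ converters are only hinted at by a weak feature (`intent_signal`). A plain logistic regression, fit by full-batch gradient descent on the combined label, tends to lean on the bot feature, so bots end up over-represented among its top-scored rows compared with their share of all positives.

```python
import math
import random


def simulate(n=10_000, p_bot=0.02, p_genuine=0.05, seed=0):
    """Generate a hypothetical dataset where one label has two causes.

    bot_signal    strongly separates bots (mean 3.0 vs 0.0)
    intent_signal weakly separates genuine converters (mean 0.5 vs 0.0)
    label = 1 for bots AND for genuine converters.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_bot = rng.random() < p_bot
        is_genuine = (not is_bot) and rng.random() < p_genuine
        bot_signal = rng.gauss(3.0 if is_bot else 0.0, 1.0)
        intent_signal = rng.gauss(0.5 if is_genuine else 0.0, 1.0)
        label = 1 if (is_bot or is_genuine) else 0
        cause = "bot" if is_bot else ("genuine" if is_genuine else "none")
        rows.append(((bot_signal, intent_signal), label, cause))
    return rows


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, lr=1.0, epochs=100):
    """Full-batch gradient descent on the mean log loss (no regularization)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y, _cause in rows:
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w[0] -= lr * g0 / n
        w[1] -= lr * g1 / n
        b -= lr * gb / n
    return w, b


rows = simulate()
w, b = fit_logistic(rows)

# Rank all rows by model score and inspect the top 2%.
scored = sorted(
    rows, key=lambda r: w[0] * r[0][0] + w[1] * r[0][1] + b, reverse=True
)
top = scored[: len(rows) // 50]
positives = [r for r in rows if r[1] == 1]

bot_share_positives = sum(r[2] == "bot" for r in positives) / len(positives)
bot_share_top = sum(r[2] == "bot" for r in top) / len(top)

print(f"weights: bot_signal={w[0]:.2f}, intent_signal={w[1]:.2f}")
print(f"bot share among all positives: {bot_share_positives:.0%}")
print(f"bot share among top-scored 2%: {bot_share_top:.0%}")
```

The exact numbers depend on the seed and the made-up parameters; they are not results from the talk. The point is only qualitative: optimizing a single label that mixes causes rewards the model for finding the easiest cause, not the one the business cares about.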