Many customers use web data feeds to train and power machine learning models for a variety of artificial intelligence applications. As described in our sentiment classification tutorial using webhose.io data, you’ll need at least three distinct datasets to develop a machine learning classification model:
- Training dataset for each class
- Testing dataset to confirm performance
- New dataset for a “live” test
For example, suppose you want to train a machine learning model to forecast whether new content will go viral before you post it. The first step is to feed the model blog posts you know have been widely shared (the viral class). The second step is to feed it posts that have gone 30 days without being widely shared (the unpopular class). Once the model is trained, run it on a separate testing dataset to measure its performance. Finally, run the model on new content posted just minutes ago, and poll social signal data a few days later to check whether its forecasts held up. Rinse and repeat until your model can forecast effectively.
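The workflow above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the posts are invented toy strings rather than real feed data, the classifier is a simple bag-of-words naive Bayes chosen for brevity (not necessarily the method used in the tutorial), and the names `train`, `predict`, and `accuracy` are hypothetical helpers defined here.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled_posts):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training posts
    for text, label in labeled_posts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0)
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, labeled_posts):
    correct = sum(predict(model, text) == label for text, label in labeled_posts)
    return correct / len(labeled_posts)

# Invented toy posts standing in for real feed data (assumption).
viral = [
    "ten shocking secrets you must see",
    "you will not believe this shocking trick",
    "shocking secrets everyone must share",
    "must see trick you will love",
]
unpopular = [
    "quarterly maintenance report for servers",
    "minutes of the maintenance committee",
    "server report and committee minutes",
    "quarterly committee report",
]

# Dataset 1: training data for each class.
training_set = [(t, "viral") for t in viral[:3]] + \
               [(t, "unpopular") for t in unpopular[:3]]
# Dataset 2: held-out testing data to measure performance.
testing_set = [(viral[3], "viral"), (unpopular[3], "unpopular")]
# Dataset 3: "live" content whose true label is not known yet.
live_posts = ["shocking trick you must share"]

model = train(training_set)
print("test accuracy:", accuracy(model, testing_set))
for post in live_posts:
    print(post, "->", predict(model, post))
```

In a real project, the live forecasts would be stored and compared against the social signals polled a few days later, and any misclassified posts would be added to the training data for the next round.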