Web genre dataset
This is a dataset of 1,539 webpages manually classified into 20 genres. It is
intended as a training set for classification algorithms.
The list of genres
Online access to the dataset
Files
- short description of the dataset preparation and notes on the gathered data [pdf]
- table of webpages and genres [xls] [mdb] [csv, delimited by semicolons instead of commas]
- cached webpages, 126 MiB [zip]
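
Because the csv file uses semicolons rather than commas, a default CSV reader would put each row into a single field. The following is a minimal sketch of one way to read such a table in Python. The column names `url` and `genre`, the genre labels, and the in-memory sample rows are placeholders for illustration only; the real file's header and layout may differ and should be checked first.

```python
import csv
import io
from collections import Counter


def read_semicolon_table(stream):
    """Parse a semicolon-delimited table into a list of dicts keyed by header."""
    reader = csv.DictReader(stream, delimiter=";")
    return list(reader)


def count_by_column(rows, column):
    """Count how many rows carry each value in the given column."""
    return Counter(row[column] for row in rows)


# Placeholder data standing in for the real file. The column names and
# genre labels here are assumptions, not taken from the dataset.
SAMPLE = (
    "url;genre\n"
    "http://example.com/a;genre_a\n"
    "http://example.com/b;genre_b\n"
    "http://example.com/c;genre_a\n"
)

rows = read_semicolon_table(io.StringIO(SAMPLE))
counts = count_by_column(rows, "genre")
print(counts.most_common())
```

To read the downloaded file instead of the sample, pass an open file handle, for example `open(path, newline="", encoding=...)`, where the path and the encoding depend on how the file was saved.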