Web genre dataset
This is a dataset of 1,539 webpages manually classified into 20 genres. It is intended as a training set for classification algorithms.
The list of genres
Online access to the dataset
Files
- short description of the dataset preparation and notes on the gathered data [pdf]
- table of webpages and genres [xls] [mdb] [csv, with semicolons instead of commas]
- cached webpages, 126 MiB [zip]
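The CSV version of the table uses semicolons as the field separator, so a standard CSV reader has to be told about the delimiter explicitly. The sketch below shows one way to do this with Python's standard `csv` module. It assumes the first row is a header. The column names `url` and `genre`, and the sample rows and genre labels, are hypothetical placeholders, not taken from the actual file. Check the real header (and the file's encoding, which is not stated here) before using it.

```python
import csv
import io
from collections import Counter


def load_genre_table(f):
    """Read a semicolon-delimited table into a list of dicts.

    Assumes the first row is a header; the keys of each dict are
    whatever column names that header contains.
    """
    reader = csv.DictReader(f, delimiter=";")
    return list(reader)


def count_by_genre(rows, genre_column):
    """Count how many rows carry each value in the given genre column."""
    return Counter(row[genre_column] for row in rows)


# Hypothetical sample: column names "url"/"genre" and the genre labels
# are illustrative only and do not come from the real dataset file.
sample = io.StringIO(
    "url;genre\n"
    "http://example.com/a;blog\n"
    "http://example.com/b;faq\n"
    "http://example.com/c;blog\n"
)
rows = load_genre_table(sample)
print(count_by_genre(rows, "genre"))
```

For the downloaded file, pass an open file handle instead of the in-memory sample, e.g. `open(path, newline="", encoding=...)`. The `newline=""` argument is what the `csv` module documentation recommends. If a webpage can carry more than one genre label in the real table, the counting step would need to be adapted accordingly.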