A repetition based measure for verification of text collections and for text categorization
Document clustering based on non-negative matrix factorization
Beyond independent relevance
Query length in interactive information retrieval
Stuff I’ve seen