Sorry, I agree with the GP. This was a popular book for learning ML with Weka (which is still around):

There is also the Knowledge Discovery in Databases (KDD) term which is still around via:

I recommend starting with Weka and this great book:
some other helpful books:

- Data Mining, by Witten and Frank; covers the basics with rigor, including how to use Weka, which they wrote

a couple of Java-based books from Manning:

- Collective Intelligence in Action (by Satnam Alag) and

- Algorithms of the Intelligent Web (Marmanis, Babenko)


Spot on. OP: Are you asking how basic tf-idf works, or is there something you can't get Lucene / Solr / Sphinx / tsearch to do easily?

Nevertheless, here are some good background materials (search Amazon for "data mining")

Also, Collective Intelligence in Action by Satnam Alag is quite good (a lot of Java code to wade through, though).
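
In case the question really is how basic tf-idf works, here is a minimal Python sketch. It assumes the documents are already tokenized, and it uses the plain textbook weighting: term frequency as count divided by document length, and idf as an unsmoothed log(N / df). The function name `tf_idf` and the toy documents are made up for illustration; Lucene, Solr and the rest use their own variants of this weighting, so don't expect identical scores.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (hypothetical input format).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = relative frequency in this doc; idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = tf_idf(docs)

# Terms of the first document, highest weight first.
ranked = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

The point to notice is that a term found in only one document ("cat") outranks a term that is frequent but spread across documents ("the"), and a term that appears in every document gets a weight of zero.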

