I can pretty much guarantee there are elements of this you're not considering which are addressed there (though there are also elements which Farmer and Glass don't hit either). But it's an excellent foundation.
Second: If you're going to have a quality classification system, you need to determine what you are ranking for. As the Cheshire Cat said, if you don't know where you're going, it doesn't much matter how you get there. Rating for popularity, sales revenue maximization, quality or truth, optimal experience, ideological purity, etc., are all different.
Beyond that I've compiled some thoughts of my own from 20+ years of using (and occasionally building) reputation systems myself:
"Content rating, moderation, and ranking systems: some non-brief thoughts"
⚫ Long version: Moderation, Quality Assessment, & Reporting are Hard
⚫ Simple vote counts or sums are largely meaningless.
⚫ Indicating levels of agreement / disagreement can be useful.
⚫ Likert scale moderation can be useful.
⚫ There's a single-metric rating that combines many of these fairly well -- yes, Evan Miller's lower bound of the Wilson score confidence interval.
⚫ Rating for "popularity" vs. "truth" is very, very different.
⚫ Reporting independent statistics for popularity (n), rating (mean), and variance or controversiality (standard deviation) is more informative than a single statistic.
⚫ Indirect quality measures also matter. I should add: a LOT.
⚫ There almost certainly isn't a single "best" ranking. Fuzzing scores with randomness can help.
⚫ Not all rating actions are equally valuable. Not everyone's ratings carry the same weight.
⚫ There are things which don't work well.
⚫ Showing scores and score components can be counterproductive and leads to various perverse incentives.
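A few of the points above (the Wilson lower bound, reporting n / mean / standard deviation separately, and fuzzing scores) can be sketched in a few lines of Python. This is a minimal illustration, not anyone's production code: the function names, the z = 1.96 default (roughly a 95% interval), and the `spread` parameter are all my own assumptions.

```python
import math
import random
from statistics import mean, pstdev


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for up/down votes.

    z = 1.96 (about 95% confidence) is an assumed default.
    """
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom


def summarize(ratings):
    """Return (n, mean, standard deviation) as three separate statistics."""
    if not ratings:
        return (0, None, None)
    return (len(ratings), mean(ratings), pstdev(ratings))


def fuzzed(score: float, rng: random.Random, spread: float = 0.05) -> float:
    """Add a small uniform jitter so near-ties don't always rank the same way.

    `spread` is a hypothetical tuning knob.
    """
    return score + rng.uniform(-spread, spread)
```

The point of the lower bound is that one upvote out of one ranks well below 90 out of 100, even though its raw ratio is higher. And `summarize([1, 5, 1, 5])` and `summarize([3, 3, 3, 3])` share a mean but differ in standard deviation, which is exactly the controversy signal a single averaged number hides.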
I'm also increasingly leaning toward a multi-part system, one which rates:
1. Overall favorability.
2. Any flaggable aspects. Ultimately, "ToS" is probably the best bucket, comprising spam, harassment, illegal activity, NSFW/NSFL content (or improperly labeled same), etc.
3. A truth or validity rating. Likely rolled up in #2. But worth mentioning separately.
4. Long-term author reputation.
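To make the four parts concrete, here's one way the record might look. Everything in it is hypothetical: the `Flag` members, the 1-to-5 scales, and especially the moving-average update for author reputation are my assumptions for illustration, not a worked-out design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Flag(Enum):
    """Hypothetical ToS buckets (part 2)."""
    SPAM = "spam"
    HARASSMENT = "harassment"
    ILLEGAL = "illegal"
    MISLABELED_NSFW = "mislabeled_nsfw"


@dataclass
class Rating:
    favorability: int                              # part 1: assumed 1..5 scale
    flags: Set[Flag] = field(default_factory=set)  # part 2: ToS flags, if any
    validity: Optional[int] = None                 # part 3: optional 1..5 truth rating


@dataclass
class Author:
    name: str
    reputation: float = 0.0                        # part 4: long-term reputation

    def absorb(self, rating: Rating, rate: float = 0.1) -> None:
        """Fold one rating into reputation with an exponential moving average.

        Assumption: any flagged rating counts as the lowest favorability.
        """
        signal = 1 if rating.flags else rating.favorability
        self.reputation = (1 - rate) * self.reputation + rate * signal
```

Keeping flags and validity as separate fields means a post can be popular and still flagged, or unpopular and still true, without one number swallowing the other.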
There's also the general problem associated with Gresham's Law, which I'm increasingly convinced is a general and quite serious challenge to market-based and popularity-based systems. Assessment of complex products, including especially information products, is difficult, which is to say, expensive.
I'm increasingly in favour of presenting newer / unrated content to subsets of the total audience, and increasing its reach as positive approval rolls in. This seems like a behavior HN's "New" page could benefit from. Decrease the exposure for any one rater, but spread ratings over more submissions, for longer.
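A rough sketch of that staged-exposure idea, under my own assumptions: the 5% starting floor, the add-one smoothing, and the "10 ratings" ramp constant are made-up tuning values, and hashing user/item IDs is just one convenient way to pick a stable audience subset.

```python
import hashlib


def exposure_fraction(approvals: int, ratings: int, floor: float = 0.05) -> float:
    """Share of the audience shown an item; starts at `floor`, grows with approval."""
    if ratings == 0:
        return floor
    # Add-one smoothing so a single early vote doesn't swing reach wildly.
    rate = (approvals + 1) / (ratings + 2)
    # Hypothetical ramp: trust the rate more as ratings accumulate.
    confidence = ratings / (ratings + 10)
    return min(1.0, floor + (1 - floor) * rate * confidence)


def is_shown(user_id: str, item_id: str, fraction: float) -> bool:
    """Deterministically place each (user, item) pair in a bucket in [0, 1)."""
    digest = hashlib.sha256(f"{user_id}:{item_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction
```

Because the bucket depends on both IDs, each new submission lands in front of a different slice of users, which spreads the rating load around rather than asking the same few people to review everything.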
And there are other problems. Limiting individuals to a single vote (or negating the negative effects of vote gaming) is key. Watching the watchmen. Regression toward mean intelligence / content. The "evaporative cooling" effect (http://blog.bumblebeelabs.com/social-software-sundays-2-the-...).
Google Tech Talk: http://www.youtube.com/watch?v=Yn7e0J9m6rE
See also: "YouTube: Five Stars Dominate Ratings" (http://youtube-global.blogspot.com/2009/09/five-stars-domina...)
If you read that book and then look at HN, it's clear how its design encourages behaviors that are not aligned with the goals of the community managers.
Just yesterday I noticed that O'Reilly recently published Building Web Reputation Systems (Randy Farmer, Bryce Glass).
Randy was consulting with a company I was at a couple of years ago and while the two of us certainly didn't see eye-to-eye on a lot of things, it encouraged a lot of thought and debate.
I'll be ordering the book today.