I am reading Sex by Numbers, which I enjoy a lot. It's a touchy subject with a lot of data of varying quality. A naive approach would be to take all of it at face value. A dogmatic one would be to set an arbitrary (and subjective!) threshold separating "good" from "bad" data. I love the way the book handles it instead, i.e. by grading sources on a scale:
4: numbers that we can believe (e.g. births and deaths)
3: numbers that are reasonably accurate (e.g. well-designed and well-conducted surveys, such as the Natsal report)
2: numbers that could be out by quite a long way (e.g. non-uniform sampling, the Kinsey report)
1: numbers that are unreliable (e.g. surveys from newspapers, even with huge sample sizes)
0: numbers that have just been made up (e.g. "men think of sex every 7 seconds")
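To make the idea concrete for data work, here is a minimal sketch of how such grades could travel with the data instead of being collapsed into a single good/bad cut-off. It is only an illustration: the names `Reliability`, `Source`, `describe` and `at_least` are hypothetical and do not come from the book, and the entries are just the examples listed above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    """The 0-4 grading listed above; the member names are my own labels."""
    MADE_UP = 0
    UNRELIABLE = 1
    COULD_BE_WAY_OUT = 2
    REASONABLY_ACCURATE = 3
    BELIEVABLE = 4


@dataclass(frozen=True)
class Source:
    """A data source together with its reliability grade (hypothetical type)."""
    name: str
    grade: Reliability


# Only the examples given for each grade in the list above.
SOURCES = [
    Source("births and deaths records", Reliability.BELIEVABLE),
    Source("Natsal report", Reliability.REASONABLY_ACCURATE),
    Source("Kinsey report", Reliability.COULD_BE_WAY_OUT),
    Source("newspaper survey", Reliability.UNRELIABLE),
    Source('"men think of sex every 7 seconds"', Reliability.MADE_UP),
]


def describe(source: Source) -> str:
    """Report a source with its grade attached, rather than silently dropping it."""
    return f"[{source.grade.value}] {source.name}"


def at_least(sources, minimum: Reliability):
    """Explicit, adjustable filter; the grade stays attached to whatever is kept."""
    return [s for s in sources if s.grade >= minimum]


if __name__ == "__main__":
    for s in SOURCES:
        print(describe(s))
```

The point of the design is that any threshold becomes a visible choice made at analysis time (the `minimum` argument), while every number keeps its grade when it is reported.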
Just have a peek at the first chapter, which is freely accessible and is exactly about data reporting, data reliability and dealing with subjective questions. A lot of thought is given to the possible biases (e.g. people who are less likely to respond, or who would like to downplay or exaggerate some things) and to the consistency of measurements.
So, in short: I started reading to get curious facts about sex, and ended up recommending the book to my data science students and mentees. (The vast majority of problems start with how you collect data, how you interpret it, and how well you are aware of its shortcomings.)