I ask because my research proposal involves a population of texts produced over a long period of time—a much, much longer period of time than this blog represents, but not so different from a methodological point of view. Let's say you have a research question that involves changes over time, meaning that you want to compare the content of posts produced in September, October, November, and December. (You could stratify them some other way, but let's keep this simple.) When I submit this post, the count will look like this:
(Okay, so there's no need to sample from such a small population, but play along.) Can you simply take a random sample of, say, 10 posts from each month? If you did, your sample would have something like 33% of October's posts and 88% of December's posts. Then again, since the purpose of the study is comparison and not overall generalization, you don't have to worry about the sampling fractions being uneven. As long as you can generalize about October from the October sample, you're fine. Still, maybe it would be better to take 50% of each month. I don't know. If you start taking percentages, do you risk some kind of distortion? Is 6 of 12 as representative as 15 of 30? I need to go back to the textbooks.
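One rough way to put numbers on the "6 of 12 versus 15 of 30" question is to compare standard errors. The sketch below assumes simple random sampling without replacement within each month, and it assumes the thing being estimated is a proportion (say, the share of posts coded as having some feature). The function name `se_proportion` and the worst-case value `p = 0.5` are my own choices for illustration, not anything from a textbook design.

```python
import math


def se_proportion(n, N, p=0.5):
    """Standard error of a sample proportion for a simple random sample
    of n items drawn without replacement from a population of N items.

    p = 0.5 is an assumed worst case (it maximizes p * (1 - p)).
    """
    if n >= N:
        # A full census of the stratum has no sampling error.
        return 0.0
    # Finite population correction: shrinks the error as n approaches N.
    fpc = (N - n) / (N - 1)
    return math.sqrt(p * (1 - p) / n * fpc)


# The two cases from the question: same 50% fraction, different sizes.
for n, N in [(6, 12), (15, 30)]:
    print(f"{n} of {N}: fraction={n / N:.0%}, worst-case SE={se_proportion(n, N):.3f}")
```

Under these assumptions the two samples share a 50% sampling fraction but not the same precision: the smaller month comes out with the larger standard error. That suggests equal percentages do not by themselves make the monthly samples equally trustworthy.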
For what it's worth, I think that my research design is going to involve nonproportionate stratified sampling. That would be relevant to this hypothetical study if, for example, October had 30 posts and September had 5. There's just no way you could sample from 5 and get a representative picture of the content of the blog posts. I suppose, though, that I have to think about the ways in which nonproportionate sampling constrains and affects the comparison.
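Here is a minimal sketch of what nonproportionate stratified sampling could look like for the hypothetical above, using the 5-post September and 30-post October. The post IDs, the per-month sample sizes (all 5 from September, 10 from October), and the helper names `stratified_sample` and `design_weights` are assumptions made up for illustration. The weights are the usual population-over-sample ratio for each stratum.

```python
import random


def stratified_sample(population, sizes, seed=0):
    """Draw a simple random sample without replacement from each stratum.

    population: dict mapping stratum name -> list of items
    sizes: dict mapping stratum name -> requested sample size
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = {}
    for stratum, posts in population.items():
        n = min(sizes[stratum], len(posts))  # cannot draw more than exist
        sample[stratum] = rng.sample(posts, n)
    return sample


def design_weights(population, sample):
    """Weight for each stratum: how many population posts each sampled post stands for."""
    return {s: len(population[s]) / len(sample[s]) for s in sample}


# Hypothetical post IDs; only the counts (5 and 30) come from the example.
population = {
    "September": [f"sep-{i}" for i in range(5)],
    "October": [f"oct-{i}" for i in range(30)],
}
sizes = {"September": 5, "October": 10}  # assumed: census of the small month

sample = stratified_sample(population, sizes)
weights = design_weights(population, sample)
print(weights)  # {'September': 1.0, 'October': 3.0}
```

For a month-to-month comparison the weights can be ignored, since each month is described from its own sample. They only matter if you later pool the months into one overall estimate, where each October post in the sample would have to count three times as much as each September post.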