tag:blogger.com,1999:blog-4903856565127937599.post7468332349201847805..comments2018-05-24T00:31:00.005-04:00Comments on Numoracle Recipes: SamplingRamkumar Krishnan (Ram)http://www.blogger.com/profile/16930865713058805668noreply@blogger.comBlogger3125tag:blogger.com,1999:blog-4903856565127937599.post-23860793284887103292007-12-21T15:21:00.000-05:002007-12-21T15:21:00.000-05:00Chen, Upon inquiry, I found that the answer to thi...Chen, Upon inquiry, I found that the answer to this is non-trivial and a factor of implementation of row storage (not just specific to Oracle, but any DB that has a hierarchy of granularity for storing rows/columns). The short answer to this question is that for the sample clause implementation to consider both performance and "fairness" in sampling - accepting the probability that a row will be present in the sample is better - rather than a specific number of rows. From a usage standpoint, the good news is that the number of rows returned are pretty close to the % requested - for e.g., I input 10% over a 105 row table, and got 13 rows sample. Over several trials of sampling, the number of rows returned will eventually equal the % of rows in the table - that is 11 rows - even if the interpretation of this input appears to be different. Hope this helps. +RamRamnoreply@blogger.comtag:blogger.com,1999:blog-4903856565127937599.post-83349456242069376542007-12-21T12:58:00.000-05:002007-12-21T12:58:00.000-05:00Hi, Good question. One clarification before I inve...Hi, Good question. One clarification before I investigate the answer and get back to you. Note<BR/>that the percentage value is not the percentage of rows from the table - as in "sample and return 30% of the rows in a table". Acc to the docs "This percentage indicates the probability of each row, or each cluster of rows in the case of block sampling, being selected as part of the sample." - as in "return rows that have a 30% chance of being sampled". I'll get back with an answer shortly.Ramnoreply@blogger.comtag:blogger.com,1999:blog-4903856565127937599.post-47544708550456287162007-12-19T16:21:00.000-05:002007-12-19T16:21:00.000-05:00I always wondered why the sample clause allows spe...I always wondered why the sample clause allows specifying percentage but not number of rows in sample.<BR/><BR/>If you know something about the distribution, you will be able to calculate how many rows you'll need to get specific bias. Why force us to convert the number into percentage?Anonymousnoreply@blogger.com