Actually, I disagree with Jon on the computational intensity point. Even
using 10,000 simulations, which is more than enough to get a good answer (I
can justify this claim rigorously if you want), it takes under 3 minutes to
run all group sizes from 1 to 100. Dropping to a more reasonable
1,000 sims takes under 15 seconds. If your code is taking that long, the
inefficiency is probably in the function that checks for three matches in
the room. If you write this by hand with for-loops or a vector of counters
or something, instead of using nested "unique" functions, of course it'll
take forever. That function is where you want to spend the most time on
efficiency, because it's the one called most often inside your loops.
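A minimal sketch of two loop-free versions of that check, assuming birthdays are coded as integers 1 to 365 and leap years are ignored; the function names are hypothetical. The first is the nested unique() idea: keep the repeated birthdays with duplicated(), then ask whether unique() shortens that vector. The second uses the built-in tabulate() as its vector of counters, so no R-level for-loop is involved. The last lines check that the two agree on the same random rooms.

```r
# Hypothetical helpers: birthdays are integers 1..365 (leap years ignored).

# Nested unique() check: `same` keeps every repeat of a birthday after its
# first occurrence, so a birthday shared by 3+ people appears in `same` at
# least twice and unique(same) comes out shorter than same.
has_triple_unique <- function(room) {
  same <- room[duplicated(room)]
  length(unique(same)) < length(same)
}

# Counter check: tabulate() counts how many people have each of the 365
# birthdays in a single call.
has_triple_tab <- function(room) {
  any(tabulate(room, nbins = 365) >= 3)
}

# Sanity check: both functions should give identical answers on the same rooms.
set.seed(1)
rooms <- replicate(1000, sample(1:365, 60, replace = TRUE), simplify = FALSE)
agree <- identical(vapply(rooms, has_triple_unique, logical(1)),
                   vapply(rooms, has_triple_tab, logical(1)))
print(agree)
```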
Also, a note on the variability point, to clarify what I said earlier. I
was recommending that you use 10 or 100 sims only while you're testing the
code for bugs, i.e., to make sure it runs properly and returns what you
want it to return. Of course you should increase the number of sims for the
final run.
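A minimal sketch of that debug-then-scale workflow, assuming uniform birthdays over 365 days; sim_prob() and the room size of 50 are illustrative choices, not from the thread. The standard-error line is one conventional way to quantify the variability point (not necessarily the justification J had in mind): for an estimate near 0.5 it is about 0.05 with 100 sims and about 0.005 with 10,000.

```r
# Hypothetical helper: estimate P(3+ people share a birthday) in a room of
# n people from `sims` simulated rooms (birthdays uniform on 1..365).
sim_prob <- function(n, sims) {
  hits <- replicate(sims, {
    room <- sample(1:365, n, replace = TRUE)
    any(tabulate(room, nbins = 365) >= 3)
  })
  mean(hits)
}

set.seed(2008)

# Debugging run: just enough sims to confirm the code runs and returns
# a single number between 0 and 1.
p_test <- sim_prob(50, sims = 100)
stopifnot(length(p_test) == 1, p_test >= 0, p_test <= 1)

# Final run: many more sims. The Monte Carlo standard error of a simulated
# proportion is sqrt(p * (1 - p) / sims), which is at most 0.5 / sqrt(sims).
sims <- 10000
p_final <- sim_prob(50, sims)
se_final <- sqrt(p_final * (1 - p_final) / sims)
cat("estimate:", p_final, " standard error:", se_final, "\n")
```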
All best,
J
On Thu, Feb 21, 2008 at 12:21 PM, Jon Bischof <jbischof at fas.harvard.edu> wrote:
Hey guys,
While I admire everyone's attempts to create a complicated algorithm
to test multiple room sizes and return a vector of probabilities, I
think that approach is wildly inefficient for this particular problem.
The problem is that, for your program not to take a whole afternoon to
run, you will need to restrict yourself to a small number of
iterations for each room size. Since the variation of your estimate
will be high with so few trials, it's going to be difficult to
pinpoint the right number, esp. when you don't have a good idea where
to start.
I would recommend just guessing and testing different numbers with
medium-size samples (maybe 1,000) to home in on the correct answer. As
you get closer to 0.5, you can increase the number of iterations
dramatically (perhaps to 1,000,000) to get the answer to whatever
level of precision you want. This method should greatly reduce the
difficulty of this problem, esp. if you're new to R programming.
Best,
Jon
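A rough sketch of that guess-then-refine strategy, assuming uniform birthdays; the helper sim_prob(), the coarse guesses 60 to 100, and the 10,000 refinement sims are illustrative choices rather than anything from the thread. Step 1 uses medium-size samples to find roughly where the estimate crosses 0.5; step 2 re-estimates the sizes just below that point with many more sims.

```r
# Hypothetical helper: estimated P(3+ shared birthdays) for n people.
sim_prob <- function(n, sims) {
  mean(replicate(sims, any(tabulate(sample(1:365, n, replace = TRUE),
                                    nbins = 365) >= 3)))
}

set.seed(21)

# Step 1: a few coarse guesses, 1,000 sims each.
guesses <- seq(60, 100, by = 10)
coarse  <- sapply(guesses, sim_prob, sims = 1000)

# First guess whose estimate reached 0.5 (this sketch assumes one did).
upper <- guesses[which(coarse >= 0.5)[1]]

# Step 2: re-check every size in the ten just below that guess with more
# sims. (Jon suggests going as high as 1,000,000 near 0.5 for more precision.)
fine_sizes <- (upper - 9):upper
fine       <- sapply(fine_sizes, sim_prob, sims = 10000)
answer     <- fine_sizes[which(fine >= 0.5)[1]]
print(data.frame(size = fine_sizes, estimate = fine))
```

Raising the number of sims in step 2 shrinks the standard error at each size, which is what lets you separate neighbouring sizes whose true probabilities are both close to 0.5.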
2008/2/21, Quan Li <quanli at mail.ucf.edu>:
This is how I did it; please let me know if there is any logical error
in it. I ran a double loop starting with a group of size 3: I sample
from this group and calculate the probability of 3 or more people
sharing a birthday. If this probability is less than 0.5, the group
size is increased by 1, and I sample and calculate the probability
again. The loop stops when the probability reaches 0.5, and the final
group size is the minimum group size I need.
quan
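The double loop described above might look like the following minimal sketch, assuming uniform birthdays; sim_prob(), the seed, and sims = 2000 are illustrative. The inner loop simulates rooms of a fixed size; the outer while loop grows the group by one until the estimate reaches 0.5.

```r
# Hypothetical helper: inner loop over `sims` simulated rooms of n people.
sim_prob <- function(n, sims) {
  hits <- logical(sims)
  for (s in 1:sims) {
    room <- sample(1:365, n, replace = TRUE)
    hits[s] <- any(tabulate(room, nbins = 365) >= 3)
  }
  mean(hits)
}

set.seed(3)
sims <- 2000
n <- 3                       # start with a group of size 3
p <- sim_prob(n, sims)
while (p < 0.5) {            # outer loop: add one person until p reaches 0.5
  n <- n + 1
  p <- sim_prob(n, sims)
}
cat("smallest group size with estimated probability >= 0.5:", n, "\n")
```

The logic is sound, but each size gets a fresh noisy estimate and the loop stops at the first one at or above 0.5, so with few sims it can stop a size or two early or late near the threshold. That is the variability concern Jon raises.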
>>> Keith Schnakenberg <keith.schnakenberg at gmail.com> 2/21/2008 2:11 AM >>>
Is there some way to index the group sizes systematically?
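One way to index group sizes systematically, sketched under the assumption of uniform birthdays: loop over the sizes themselves and use each size as its own position in a preallocated result vector. The variable names and sims = 1000 are illustrative.

```r
set.seed(53)
sims  <- 1000
sizes <- 1:100
probs <- numeric(length(sizes))       # one slot per group size

for (n in sizes) {
  # For this group size, simulate `sims` rooms and record the share
  # with 3+ people having the same birthday.
  hits <- replicate(sims, {
    room <- sample(1:365, n, replace = TRUE)
    any(tabulate(room, nbins = 365) >= 3)
  })
  probs[n] <- mean(hits)              # sizes start at 1, so n is the index
}

# Smallest size whose estimated probability reached 0.5.
min_size <- sizes[which(probs >= 0.5)[1]]
```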
On Feb 20, 2008, at 10:58 PM, Jeremy Hodgen wrote:
> But you want to check the group sizes systematically, not randomly,
> so you need to count how many of your draws for each group size
> have 3 or more birthdays the same.
>
> Then, rather than being interested in the number, as in your code:
>>> 3same <- length(same) - length(unique(same))
> we're interested simply in whether the second is less than the
> first.
>
> Jeremy
>
> On 21 Feb 2008, at 05:25, Keith Schnakenberg wrote:
>
>> sample(x, sample(20:100)) generates random samples of random size. I
>> checked to make sure. I'll get back to you on the last question.
>>
>> On Feb 20, 2008, at 9:12 PM, Joseph Williams wrote:
>>
>>> I didn't try for such an elegant solution. I just plugged in
>>> different values for "people" until I got a value for "sameday"
>>> that gave me the desired probability. It only takes a couple of
>>> guesses to get the numbers right.
>>>
>>> I am not sure you can write sample(20:100) inside "room"
>>>
>>> Where is your if statement tallying the number of rooms?
>>>
>>> Joe
>>>
>>> -----Original Message-----
>>> From: gov2001-l-bounces at lists.fas.harvard.edu
>>> [mailto:gov2001-l-bounces at lists.fas.harvard.edu] On Behalf Of Keith
>>> Schnakenberg
>>> Sent: Wednesday, February 20, 2008 11:21 PM
>>> To: gov2001-l at lists.fas.harvard.edu
>>> Subject: [gov2001-l] problem 2
>>>
>>> Classmates,
>>>
>>> I am trying to avoid over-using the list, but I am not working with
>>> anyone and I don't know people who know this stuff, so this is the
>>> only form of collaboration available to me. I am working on problem
>>> 2, and I'm struggling with how to vary the sample size
>>> appropriately.
>>> Here is what I've been doing:
>>>
>>> sims <- 100000
>>> alldays <- seq(1, 365, 1)
>>> for (i in 1:sims){
>>> room <- sample(alldays, sample(20:100), replace=TRUE)
>>> size <- length(room)
>>> same <- x[duplicated(room)]
>>> 3same <- length(same) - length(unique(same))
>>> }
>>>
>>> I am getting syntax errors, but you can see the logic of what I'm
>>> trying to do: I'm drawing random samples of random size and then
>>> storing the length and the number of samples with three or more same
>>> birthdays in vectors so that I can calculate the probabilities for
>>> each sample size. This is my third or fourth approach, but I am
>>> having difficulty getting the syntax to implement my ideas. I thought
>>> of indexing the room size so that it increased by 1 with each new
>>> simulation, but I could not figure out how to make that happen in
>>> terms of syntax.
>>>
>>> Is there a less cumbersome approach that I am missing?
>>>
>>> Thanks,
>>> Keith
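For comparison, a corrected sketch of Keith's loop that keeps his design (random room sizes, results stored in vectors, one probability per size). The changes are assumptions about what was intended: sample(20:100, 1) draws a single size (sample(20:100) by itself returns all 81 sizes in random order); room replaces x, which is never defined; the count 3same, which R rejects because a name cannot start with a digit, becomes the logical three[i] using the comparison Jeremy suggests; and tapply() computes the per-size probabilities. The seed and the vector names size and three are illustrative.

```r
set.seed(2001)
sims    <- 100000
alldays <- 1:365
size  <- integer(sims)   # room size used in simulation i
three <- logical(sims)   # TRUE if 3+ people in that room share a birthday

for (i in 1:sims) {
  n    <- sample(20:100, 1)                  # draw ONE random room size
  room <- sample(alldays, n, replace = TRUE)
  same <- room[duplicated(room)]             # `room`, not the undefined `x`
  size[i]  <- n
  three[i] <- length(unique(same)) < length(same)  # replaces `3same`
}

# Share of rooms with a triple birthday, computed separately for each size.
probs <- tapply(three, size, mean)
head(probs)
```

With random sizes, each size gets only about 1,200 of the 100,000 rooms, so the per-size estimates are noisier than looping over the sizes directly with the same total number of sims.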
>> _______________________________________________
>> gov2001-l mailing list
>> gov2001-l at lists.fas.harvard.edu
>> http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l
>>
Dr Jeremy Hodgen
Senior Lecturer in Mathematics Education
King's College London
Department of Education and Professional Studies
Franklin-Wilkins Building
Waterloo Bridge Wing
150 Stamford Street
London SE1 9NH
Tel: 020 7848 3102
Fax: 020 7848 3182
E-mail: jeremy.hodgen at kcl.ac.uk
--
Jon Bischof
Graduate Student
Department of Government
Harvard University