Some Brain Teasers From Scientific American

This article in Scientific American, by Keith Stanovich, makes some solid points about the limits of IQ.

No doubt you know several folks with perfectly respectable IQs who repeatedly make poor decisions. The behavior of such people tells us that we are missing something important by treating intelligence as if it encompassed all cognitive abilities.

I highly recommend going over to SA and reading the whole article. But just for fun, here are some “brain teasers” that Mr. Stanovich included in his article, along with his answers, given in the footnotes.


1. Jack is looking at Anne, but Anne is looking at George. Jack is married, but George is not. Is a married person looking at an unmarried person?

A) Yes
B) No
C) Cannot be determined

Answer. ((More than 80 percent of people choose C. But the correct answer is A. Here is how to think it through logically: Anne is the only person whose marital status is unknown. You need to consider both possibilities, either married or unmarried, to determine whether you have enough information to draw a conclusion. If Anne is married, the answer is A: she would be the married person who is looking at an unmarried person (George). If Anne is not married, the answer is still A: in this case, Jack is the married person, and he is looking at Anne, the unmarried person. ))
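The footnote’s case analysis can be brute-forced; here is a quick Python sketch (the dictionary encoding is just for illustration) that enumerates both possibilities for Anne:

```python
# Enumerate both possibilities for Anne, the only person whose status is unknown.
results = []
for anne_married in (True, False):
    married = {"Jack": True, "Anne": anne_married, "George": False}
    looking_at = [("Jack", "Anne"), ("Anne", "George")]
    # Is some married person looking at some unmarried person?
    results.append(any(married[a] and not married[b] for a, b in looking_at))
print(results)  # [True, True] -- "yes" in both cases
```

Either way Anne’s status falls, one of the two gazes crosses the married/unmarried boundary, so the answer is A.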

2. A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?

Answer. ((Many people give the first response that comes to mind—10 cents. But if they thought a little harder, they would realize that this cannot be right: the bat would then have to cost $1.10, for a total of $1.20. The correct answer is 5 cents: a $0.05 ball plus a $1.05 bat makes $1.10.))
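The footnote’s reasoning amounts to a one-unknown equation, ball + (ball + 1.00) = 1.10; a quick Python sketch, working in cents to sidestep floating-point rounding:

```python
# ball + bat = 110 cents, with bat = ball + 100 cents
total, difference = 110, 100
ball = (total - difference) // 2   # 5 cents
bat = ball + difference            # 105 cents
print(ball, bat, ball + bat)       # 5 105 110
```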

3. I’m going to skip question three, because although it illustrates an interesting and important finding about how partisanship kills thoughtfulness, it’s not fun as a stand-alone brain-teaser. But you can read it at Scientific American. ((Filler footnote.))

4. Imagine that XYZ viral syndrome is a serious condition that affects one person in 1,000. Imagine also that the test to diagnose the disease always indicates correctly that a person who has the XYZ virus actually has it. Finally, suppose that this test occasionally misidentifies a healthy individual as having XYZ. The test has a false-positive rate of 5 percent, meaning that the test wrongly indicates that the XYZ virus is present in 5 percent of the cases where the person does not have the virus.

Next we choose a person at random and administer the test, and the person tests positive for XYZ syndrome. Assuming we know nothing else about that individual’s medical history, what is the probability (expressed as a percentage ranging from zero to 100) that the individual really has XYZ?

Answer. ((The most common answer is 95 percent. But that is wrong. People tend to ignore the first part of the setup, which states that only one person in 1,000 will actually have XYZ syndrome. If the other 999 (who do not have the disease) are tested, the 5 percent false-positive rate means that approximately 50 of them (0.05 times 999) will be told they have XYZ. Thus, for every 51 patients who test positive for XYZ, only one will actually have it. Because of the relatively low base rate of the disease and the relatively high false-positive rate, most people who test positive for XYZ syndrome will not have it. The answer to the question, then, is that the probability a person who tests positive for XYZ syndrome actually has it is one in 51, or approximately 2 percent.))
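The arithmetic in the footnote is a standard base-rate calculation; a quick Python sketch of it:

```python
# P(XYZ | positive test) depends on the base rate, not the false-positive rate alone.
base_rate = 1 / 1000      # 1 person in 1,000 has XYZ
sensitivity = 1.0         # the test never misses a real case
false_positive = 0.05     # 5% of healthy people test positive anyway

p_positive = base_rate * sensitivity + (1 - base_rate) * false_positive
p_xyz_given_positive = base_rate * sensitivity / p_positive
print(round(p_xyz_given_positive, 3))  # about 2 percent
```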

5. An experiment is conducted to test the efficacy of a new medical treatment. Picture a 2 x 2 matrix that summarizes the results as follows:

                       Improvement   No Improvement
Treatment Given            200              75
No Treatment Given          50              15

As you can see, 200 patients were given the experimental treatment and improved; 75 were given the treatment and did not improve; 50 were not given the treatment and improved; and 15 were not given the treatment and did not improve. Answer this question with a yes or no: Was the treatment effective?

Answer. ((Most people will say yes. They focus on the large number of patients (200) in whom treatment led to improvement and on the fact that of those who received treatment, more patients improved (200) than failed to improve (75). Because the probability of improvement (200 out of 275 treated, or 200/275 = 0.727) seems high, people tend to believe the treatment works. But this reflects an error in scientific thinking: an inability to consider the control group, something that (disturbingly) even physicians are often guilty of. In the control group, improvement occurred even when the treatment was not given. The probability of improvement with no treatment (50 out of 65 not treated, or 50/65 = 0.769) is even higher than the probability of improvement with treatment, meaning that the treatment being tested can be judged to be completely ineffective.))
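The comparison in the footnote takes two lines of arithmetic; a quick Python sketch:

```python
# Compare improvement rates for treated patients vs. the untreated control group.
rate_treated = 200 / (200 + 75)       # ~0.727
rate_untreated = 50 / (50 + 15)       # ~0.769
print(rate_treated < rate_untreated)  # True: the control group did better
```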

[Diagram: four cards on a table, showing A, K, 8, and 5]

6. As seen in the diagram, four cards are sitting on a table. Each card has a letter on one side and a number on the other. Two cards are letter-side up, and two of the cards are number-side up. The rule to be tested is this: for these four cards, if a card has a vowel on its letter side, it has an even number on its number side. Your task is to decide which card or cards must be turned over to find out whether the rule is true or false. Indicate which cards must be turned over.

Answer. ((Most people get the answer wrong, and it has been devilishly hard to figure out why. About half of them say you should pick A and 8: a vowel to see if there is an even number on its reverse side and an even number to see if there is a vowel on its reverse. Another 20 percent choose to turn over the A card only, and another 20 percent turn over other incorrect combinations. That means that 90 percent of people get it wrong.

Let’s see where people tend to run into trouble. They are okay with the letter cards: most people correctly choose A. The difficulty is in the number cards: most people mistakenly choose 8. Why is it wrong to choose 8? Read the rule again: it says that a vowel must have an even number on the back, but it says nothing about whether an even number must have a vowel on the back or what kind of number a consonant must have. (It is because the rule says nothing about consonants, by the way, that there is no need to see what is on the back of the K.) So finding a consonant on the back of the 8 would say nothing about whether the rule is true or false. In contrast, the 5 card, which most people do not choose, is essential. The 5 card might have a vowel on the back. And if it does, the rule would be shown to be false because that would mean that not all vowels have even numbers on the back. In short, to show that the rule is not false, the 5 card must be turned over.))
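The footnote’s principle (turn a card only if its hidden face could falsify the rule) can be written out as a quick Python sketch:

```python
# Wason selection task: which cards could falsify "vowel -> even number"?
def must_turn(visible):
    if visible.isalpha():
        # Hidden face is a number; only a vowel's hidden face can break the rule.
        return visible in "AEIOU"
    # Hidden face is a letter; an odd number could have a vowel behind it.
    return int(visible) % 2 == 1

cards = ["A", "K", "8", "5"]
print([c for c in cards if must_turn(c)])  # ['A', '5']
```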

This entry was posted in Mind-blowing Miscellania and other Neat Stuff.

16 Responses to Some Brain Teasers From Scientific American

  1. Pesho says:

    Ah, this takes me back. I have seen similar sets of ‘brain teasers’ many times in my life, and it’s absolutely striking what diverse arguments people will try to support with them.

    I got them all right this time – such questions are always trivial, if you know the trick: remember that your intuition is wrong, and simply work them out, without trying to take any shortcuts. By the way, the first question should specify that all the named entities are humans, because if Anne is a dog, it is neither married nor unmarried, and the third question is dumb to begin with, given that a car which is likely, due to its weight and solid frame, to kill the occupants in the other car may be perfectly desirable.

    But I think it is worth sharing where I encountered these ‘puzzles’ before.

    The first time, it was in the army, right after I had arrived in my new unit, together with a bunch of other ‘hot shots’ from the ’88 crop of Bulgarian conscripts. We were given a dozen similar questions which were presented to us as warm-up questions to the real tests, and we promptly managed to get most of them wrong. The instructors had a good laugh at us, and used them to illustrate the dangers of overconfidence. Months later, they brought them back, to explain the psychology behind the intuitive mistakes, and to suggest how we could use it to manipulate people into a particular behavior. I remember a colonel telling us, off-hand, “And if nothing else, these are a great way to casually and publicly destroy someone’s authority and self-confidence in thirty seconds flat.”

    The second time I encountered a set of very similar teasers was in a statistics class at MIT. On the first day, we were allowed to pick one of two tests – the easy one or the hard one – depending on how well we felt we were prepared for the class (some statistics and discrete math were prerequisites). The professor said that he did not even care about us writing our names in, because it did not affect our final grade… You could turn in the ‘easy’ test as soon as you were done, but he said he expected those of us who took the hard test to take the full 15 minutes. (There were five questions, three of which were direct analogs of 2, 4, and 6.) When it turned out that the two tests were the same, and that the people taking the ‘hard’ test had scored over 90% while the people taking the ‘easy’ one had scored under 50%, he used them to illustrate the value of concentration and of working things out by the book. As an added bonus, choosing to put one’s name down correlated with scoring higher on the ‘hard’ test, and inversely correlated with scoring on the ‘easy’ test.

    And nowadays, my wife uses similar questions to illustrate to her students how human brains work, together with some sleights of hand and card tricks I’ve taught her. (It’s nice when your class sizes are limited to 17)

    All this verbal diarrhea to say that I find it disingenuous to use such brain teasers to illustrate anything about I.Q.

    We are humans. We are hardwired to take shortcuts: in most cases, those shortcuts are beneficial. It is trivial to design situations where our instincts will lead us in the wrong direction. It is only slightly harder to design tests which will result in lower-than-average performance by people who test higher on most common I.Q. tests.

    This says nothing about the usefulness of I.Q. tests, or the worth and even existence of an I.Q. The best I can say about I.Q. tests of any kind is that their results correlate with a person’s ability to prosper in the society that has approved the specific I.Q. test.

    I seldom hire more than one person per year, but when I do, I take the time to throw a few of these questions at them. Not to count it against them if they fail, but to see whether they recognize the questions, whether they are inclined to work out something that appears simple, and how they deal with being told they got things wrong. On the other hand, not being able to admit they were wrong or grasp why they were wrong is an automatic failure of the interview.

  2. Harlequin says:

    I’d be interested in the repeatability of these tests…namely, do the same people get similar questions right every time, or is it situational? I could see how frustration/tiredness could affect one’s score more strongly here than on a traditional IQ test.

    In any case I feel like a couple of these are unfair as measures of reasoning, since they require familiarity with probability and statistics that most people don’t have. (To be fair, I’d be happier if more people realized how bad they are on these topics, but I don’t think it’s in some way a measure of innate ability if they can’t do them.)

  3. Pesho says:

    There should be absolutely no repeatability for failure on these particular tests, at least not for anyone able to learn from mistakes.

    Once people are aware of the fact (and it is a fact) that intuition will sometimes steer them wrong, most will take the time to solve the problem properly, especially without extreme time pressure. All of these problems (except for #3, which Ampersand properly excluded) can be solved in three lines, as long as you have a basic understanding of discrete math. You just have to force yourself to look at the numbers, and if you do it enough times, it will take you very little time to come up with the equation, the ratio, or whatnot.

    As for intuition, it will keep steering you wrong, unless you keep encountering situations that are resolved in the improbable direction.

    For a very similar example – I have been trained to pickpocket. I’m reasonably good at it, and it is a great party trick. It is a lot of fun explaining to people how you are distracting them by waving your other hand, and then doing it again, only to see them still glance up as you take what you’ve announced you’ll take. I am not immune to it – my eyes will also follow the hand tracing a high arc while the other hand is taking my stuff. I will also pay attention to the leg bump as opposed to the light touch on my chest or my butt. This said, an instant later, I will recognize the distraction for what it is, and check my pockets as I head after the guy who’s trying to get away. I do not know whether the Tech still keeps the issues from the Fall of 1994, but if it does, I have proof that reason can fix what instinct screws up.

    Also, I would guess that if you play the same trick on a person for long enough, they will learn to ignore the distraction. And it would not be such a great thing, because there is a reason that our eyes track objects moving in arcs.

  4. Harlequin says:

    Well: no repeatability if they’re telling people the right answers after they’ve done the quiz. If they just hand them a test with some of these scattered in with more standard problems, and don’t give them the answers to any, you could give different but similar problems a few weeks or months later and see how they do.

  5. Pesho says:

    Actually, and I am guessing here, if you scatter these problems amongst similar problems that do not trigger intuition, and require working things out instead, I expect to see fewer failures. If you hide the bat-and-ball problem amongst four problems that require constructing equations with one unknown, people will not let their intuition lead them astray; they will just solve yet another x + (x + 1) = 1.1 equation.

    In the same way, if you hide #6 amongst problems that require translating English statements into symbolic logic, people will just check
    1. (A) V ^ E or V ^ |E
    2. (K) |V ^ E or |V ^ |E
    3. (8) V ^ E or |V ^ E
    4. (5) V ^ |E or |V ^ |E
    against
    V -> E, which is invalidated by V ^ |E, which only appears in 1 and 4.
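Run mechanically, the check above is a truth-table filter; a quick Python sketch:

```python
from itertools import product

# Rows (V, E) where the rule V -> E is false: exactly the V ^ |E row.
violations = [(v, e) for v, e in product([True, False], repeat=2)
              if not ((not v) or e)]
print(violations)  # [(True, False)] -- a vowel with an odd number behind it
```

Only the A and 5 cards could hide that row, which is why they are the ones to turn.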

    The common thing in all of these questions is not that they are hard, but that they tempt us into different shortcuts for which we are hardwired. If we are in a mode in which we are not taking shortcuts, we will not get them wrong. But guess what? People who are dealing with something they believe is easy, in an area in which they believe they are skilled, take shortcuts.

    And designing tasks so that apparent shortcuts lead you astray has been the bread and butter of strategists, crooks, illusionists, etc… from time immemorial.

  6. Simple Truth says:

    Mensa member here: I suck at this particular type of thinking. In particular, they always strike me as useless. Want to know if Anne is married? Ask. Why was the car causing such fatalities? That would make a difference in policy concerns. The sample size is too small for the untreated group to fairly compare with the treated group and when we’re talking about an issue of health (life vs. death maybe?) it’s probably better to treat someone than not. Lastly, turn over all 4 cards. You know nothing until you’ve checked that all conform.

    All of these are designed to test a certain type of thinking – what the article calls dysrationalia (cute) – but what they’re actually doing is the same basic function as a riddle: can you go down the same path as the author of the riddle? It’s not an IQ test – it’s a way to see if your thinking conforms to a specific target. Oftentimes, high IQ is known for the ability to recognize patterns and sequences, but also for being very non-conforming.

  7. Pesho says:

    Simple Truth, I will assume that you are being serious in what you just wrote, and I will actually answer you.

    In real life, when resources are limited, and they always, always, ALWAYS are, we do not have the luxury to trust everyone, treat everyone, collect all the pertinent information, etc… Instead of married or unmarried, Anne may be vaccinated or not, and we may have to decide on quarantine periods depending on whether we have a new infection. (q#1) Turning a card over may have a high cost, and yes, we do know enough without having to turn them all. (q#6) We may have the medicine to treat someone who is 95% likely to be infected, but not enough to treat everyone who is 2% likely to be infected. (q#4) And we definitely do not want to administer a treatment that is actually decreasing the patients’ chance of improving. (q#5)

    These are not riddles where you have to guess that I am irrigating my red strawberry with gasoline. These are very realistic questions that come up very often, have a clear, single, correct answer… and our intuition steers us away from it. If people get them wrong, there may be a cost in time, resources, lives… and by the way, it is not always the right thing to spend material resources on a task, even if it will undoubtedly result in saving lives. We also have to look at the lives that could be saved by spending those resources elsewhere.

    So if you have gotten most of these questions wrong, the lesson to take is not “I am a unique, non-conforming person with the ability to recognize patterns”, nor “I’m stupid”. The lesson is “Our initial impulse may be wrong, and when there is a lot at stake, it may be worth taking the time to look twice.”

    ———

    By the way, in case it’s unclear so far:

    I am absolutely not agreeing with the linked article about the meaning that can be drawn from the way most people answer these questions. But there is nothing wrong with the questions themselves, and I think it is well worth taking a bit of time to reflect on them, and learning to recognize situations in which our first impulse may be wrong.

  8. Simple Truth says:

    Pesho: I remain absolutely serious, and thank you for responding. I missed your previous response to Harlequin by not refreshing the page before I hit submit, so your viewpoint makes more sense to me knowing where your frame of thinking and training is coming from.

    I agree that these questions are a useful teaching tool for logic. However, just because these questions have an answer that you can logic out doesn’t mean that they necessarily lead exclusively to the answers given, or that you should take the added time to figure them out in the way they are presented. For example, in the pickpocket scenario – knowing the logic of it doesn’t help. The practical application of checking your pockets does. In Anne’s scenario, just as you said, Anne could be a dog, or – a truly American scenario – she could be in a legal domestic partnership equal to marriage. The practical application (asking) in this case works better than the logical deduction (binary married/unmarried).

    I work with databases on a regular basis. You are always, always limited by three things: accuracy of the data, accuracy of the request/query, and accuracy of your understanding of the results. As with any system with moving parts, the more flaws you add into it, the less efficient the system becomes. To try and reduce real-world scenarios to math is admirable, if mostly not realistic.

    These are very realistic questions that come up very often, have a clear, single, correct answer… and our intuition steers us away from it.

    Do you have a specific scenario in mind? I’m not sure I can think of any real-world situations that have a clear, single, correct answer where your first intuition would be wrong. It sounds interesting, though.

  9. Harlequin says:

    There are cases of people making stupid statistical arguments in court, based on incorrect gut-level understanding of statistics, with results that were very bad for the people on trial–the case I’m thinking of in particular is one where a woman had two children die of SIDS, and was put on trial for their murder, and a doctor testified that there was a 1 in 73 million chance of this happening at random. The thinking there is that since there’s about a 1/8,500 chance of a single child dying of SIDS, the chance of two is that probability squared–but that ignores 1) that two children in the same household share many risk factors, so the probabilities are not independent, and 2) that even if correct, you must weigh that 1 in 73 million chance against the also very unlikely chance that a mother kills two of her children. The woman went to prison, because the jury took the doctor’s expertise in medicine to imply an expertise in the statistics involved, which the doctor absolutely did not have.

    That’s the probability half of these questions, of course, not the logic half–and I stand by what I said above, that it’s not really appropriate to think of those as a matter of base intelligence. But it’s an example where there’s one obvious right answer, which was not the answer accepted by the people involved, which ended up sending an innocent woman to prison. (Her conviction was later overturned, for other reasons; here’s Wikipedia on the case.)

  10. gin-and-whiskey says:

    If you haven’t read this, you should:
    http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
    Talks a lot about the sort of cognitive decisions we make when we decide how much of our brain power to devote to a problem like that.

  11. Pesho says:

    Do you have a specific scenario in mind? I’m not sure I can think of any real-world situations that have a clear, single, correct answer where your first intuition would be wrong. It sounds interesting, though.

    Well, I thought that question #4 is a perfect example of a real-world situation which too many people get wrong. A condition affecting 0.1% of the population isn’t uncommon. A test with a 0% false-negative / 5% false-positive rate is not that far off many existing tests. If anything, it’s quite a bit better than most tests I’ve encountered. And still, most people’s first instinct is to trust the test’s result in the described scenario, despite the fact that the odds it’s accurate are about 1 in 50. I certainly hope that trained physicians do not make that mistake, but I am pretty sure that sometimes the people making a policy decision will not be particularly well trained, in anything.

    But if you are asking for a real-world example, I’ll give you one. We have multiple plants, and two are in California. California Edison allows you to pay less for electricity if you promise to reduce your consumption to an agreed value within a time limit of getting a request to do so. If you fail to do so, you pay through the nose for every kWh over the limit. We nearly did not go with that option, because we need to keep metal molten in our induction furnaces and casting machines, or we would lose literally millions’ worth of equipment if the metal solidified. So everyone was dismissing the idea, saying that we could not afford the penalties… until our casting area superintendent shared his concerns that if things got bad enough, power could be cut whether we or Edison wanted it or not.

    To make a long story short, when three of us got together and did the math, it turned out that even with the high costs of a diesel generator and fuel, it would cost us less to switch to our own power, as long as there were no more than one six-hour FlexAlert two days out of seven for three months of the year (or something like that, I no longer remember exactly). So we did this, and there have been only two summers in which we have had to use the generator more than very rarely.

    And, a few years ago, we had a day-long outage that would have destroyed our induction furnaces, and probably our casting machines, because when it happened, the company contracted to help us dispose of hazardous materials in emergencies couldn’t deal with all the requests, and defaulted on many of its obligations.

    So here is your example. When our casting superintendent was retiring, he asked us to estimate how much money we had saved by getting a backup generator instead of staying on the No-FlexAlerts plan. It was millions (aluminium casting is a very energy-intensive process), without counting what would have happened if we had had to choose, during the long outage, between letting metal solidify inside machines and furnaces, or letting it out in an unapproved manner.

    If his concerns had not made us look again at our original decision, our intuition would have been wrong, because we all found it inconceivable that we could save money by running our own generator.

  12. Simple Truth says:

    Pesho:

    The example you gave is very interesting, and I think a good real-world application of listening to someone who has practical knowledge in the field and their concerns rather than just working out on paper what is the best plan of action. In my understanding of your example, it was a factor that no one had taken into consideration that caused the change in answer, not a logical fallacy from a fast decision.

    Big decisions need to be made slowly, and people need to have accountability in them in order to arrive at the right answer. These types of riddles don’t relate to that, in my way of thinking.

    (Sorry for the short answer – I will read anything posted though! Just getting hammered at work this week.)

  13. ballgame says:

    Well, I thought that question #4 is a perfect example of a real-world situation which too many people get wrong. … I certainly hope that trained physicians do not make that mistake, but I am pretty sure that sometimes the people making a policy decision will not be particularly well trained, in anything.

    Sadly, Pesho, many physicians do not fully understand the importance of the false positive paradox. Back in 2002, Gerd Gigerenzer documented cases of people committing suicide after a positive test result for AIDS because they had been told it was extremely likely they had the disease, when in fact the chance they actually had the disease was still quite small. He wrote an excellent book about this issue (and better ways to discuss statistics with ordinary people) in Calculated Risks.

  14. HelenS says:

    I was confused by the part in 4 that said

    Imagine also that the test to diagnose the disease always indicates correctly that a person who has the XYZ virus actually has it. Finally, suppose that this test occasionally misidentifies a healthy individual as having XYZ.

    Don’t those two sentences contradict one another?

  15. Perfidy says:

    “Don’t those two sentences contradict one another?”

    —-

    Not at all. Two different (mutually exclusive) situations are considered:

    If you HAVE the disease, the test always correctly says that you have it.

    If you DON’T have the disease, the test sometimes (wrongly) says that you do have it.

    Just because the test is correct when you have the disease doesn’t mean it has to be correct when you don’t have the disease.

  16. ballgame says:

    No, they don’t, HelenS … but that is definitely part of what confuses so many people about this. The statements are talking about what results you’ll get with the test when you test two different groups of people: the set of people who actually have the XYZ virus, vs. the set of people who don’t.

    So imagine that we have these two sets of people. Set XYZ is composed of 100 individuals who have the XYZ virus. Set H (for “healthy”) is composed of 100,000 people who are free of the virus. The statement you referred to is saying that when you test Set XYZ, you’ll get 100 correct results, all indicating that the individual tested has the virus. However, when you test Set H, who don’t have the virus, the test will incorrectly indicate that 5,000 of those 100,000 healthy people also have the virus (according to the original problem’s ‘5% false positive rate’).

    The problem in real life, of course, is that we wouldn’t know who was in which set. That would have been the whole reason we developed the test in the first place! All we would have known was that here was a group of 100,100 people that we tested to find out who has this XYZ virus. And the end result of that process is: 95,000 that the test indicated were healthy, and 5,100 people that the test indicated had the virus.

    But we know (somehow) that only 1 out of 1,000 people overall have this virus, so we know that only 100 of those 5,100 positive tests are real. We can’t tell which ones, though … all we know is that if you’re one of the folks in this group of 5,100 positive tests, your chance of actually having the virus is only 100 out of 5,100 (~2%).
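The head count above can be reproduced in a few lines; a quick Python sketch:

```python
# 100 infected + 100,000 healthy people, 5% false-positive rate, no misses.
infected, healthy = 100, 100_000
true_pos = infected                # the test catches every real case
false_pos = healthy * 5 // 100     # 5,000 healthy people flagged anyway
positives = true_pos + false_pos   # 5,100 positive tests in all
print(positives, round(true_pos / positives, 3))  # 5100, about 2 percent
```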

    If this is still confusing, imagine an alternative scenario where researchers travel to an island that they KNOW has never been exposed to the virus. So they go to Madagascar and test 100,000 known-to-be-healthy people. Because of the 5% false positive rate, the results are: 95,000 healthy (no virus) and 5,000 positive tests for the XYZ virus.

    How many of those 5,000 ‘tested positive’ Madagascarians have the XYZ virus?
