Bradel


Ceci n'est pas un cheval.

Bradel Needs CS Advice · 5:34pm Dec 5th, 2015

Hey all.

Sorry about dropping off-grid for a lot of the last month. I'd been expecting to get a lot of pony stuff done lately, but I got hit with a sudden change in my plans for the winter just before Thanksgiving. In fact, that's what I'd like to talk about here.

Just prior to Thanksgiving, I was approached by the chair of my department to teach a course in the winter quarter. Normally, I would have just turned this down—I have better things to be doing with my time right now than focusing on teaching—but the offer came with some side benefits I couldn't afford to ignore. (I suppose if I were pithier, I'd just say that my chair gave me an offer I couldn't refuse.) So it turns out that I'm going to spend January through March teaching "Introduction to Probability and Statistics for Computer Science".

This is the only statistics course most of my students will ever get, and there's a phenomenal amount of variability in how it has gotten taught. Some instructors have gone hard at the math and calculus side, some have made it a slightly ramped-up version of gen-ed intro statistics, some have focused on computer simulations, and there's some interest in making it a statistical thinking class as well. I'm doing a lot of course design work over the next five days, deciding what I want this class to be—and that's not easy. There's so much I'd like to cover, but I'll only have 18 80-minute lectures to do it in. And while I'm a dab hand at computer science, most of my CS work dates from the early 2000s and I don't have a great sense of what problems today's students are tackling.

Fimfiction, however, knows more about computer science than I do. You guys frequently knock my socks off when I learn how tech-savvy you all are. More than that, I know there are a number of computer specialists here with an interest in statistical tools who've spent time pursuing this sort of stuff on their own.

So I want to put it to you guys: if you were a CS major, taking your one and only class about statistics, what would you want the instructor to talk about? No time for deep explorations of particular tools, but what ideas do you think would be important for helping a CS major understand how statistics can be useful to them and enhance their ability to do the work they're planning to do?

Comments (34)

One thing that's always interested me is comparative popularity measurement algorithms.

Like "Score-to-views" weight. Reddit's system, for instance, is;

y = {1 if x > 0, 0 if x = 0, -1 if x < 0)
z = {1 if x < 1, otherwise x}

rank = log(z) + (y * t)/45000

Sites like Amazon, by contrast, use upvotes divided by downvotes rather than a flat subtraction, or something else less flat.
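If it helps to see it as code, here's a rough Python sketch of that formula. I'm treating x as the net score (ups minus downs) and t as the post's age in seconds relative to the site's epoch, which is just my best guess at what the variables mean, so take it as illustrative rather than gospel:

    import math

    def hot_rank(ups, downs, seconds_since_epoch):
        # Net score, its sign, and a floor at 1 so the log is defined.
        x = ups - downs
        y = 1 if x > 0 else (-1 if x < 0 else 0)
        z = x if x >= 1 else 1
        # Order of magnitude of the score, plus a recency term: every 45,000
        # seconds (about 12.5 hours) of newness is worth another factor of 10 in score.
        return math.log10(z) + (y * seconds_since_epoch) / 45000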

I suppose that's more ranking and databasing than statistics, but a lesson on the comparative weight of certain component numbers when categorizing their wholes would still be an interesting thing to look into.

Naive Bayes classifiers come to mind, too.

3594882
Good call. It'd be easy to include a module on weighted averaging. It's not a thing we ever really bother with in regular intro stats courses, but it's got a lot more applicability in computer science than it does for general education students.

Is this a lower-division or upper-division class? I'm guessing the latter.

Hmmm. They are relatively young CS students, yes? Illustrate everything with code. Have as much mathiness as you like as long as, right next to it, there's a chunk of code that does that in something like R. It's not so much that they require the code, it's that it serves as a soothing sight to convince them that all the curly mathy bits have a purpose to them.

But that's more of an aesthetic thing.

As for the actual contents of it... hmmm... Quick admission: my CS curriculum was so poorly designed that I had no stats whatsoever[1], so all I know about the subject[2] has been gleaned from various books and random sources. What I noticed in those (and in the curricula of intro-to-stats courses I skimmed online just now) is an exceptional focus on stuff like means comparison. T-tests and ANOVA/ANCOVA/MANOVA as far as the eye can see. It strikes my (extremely untrained) eye as something of more interest to people doing experiments than to your average CS student. I mean, I guess, if you are doing usability tests or maybe benchmarking you may need to do an ANOVA at some point and compute a p-value and an ω² value or something, but most of the time... not really.

What I would have wanted to hear (especially with the benefit of hindsight) is more general modeling, and especially stuff with applications in machine learning and AI. Things like Markov chains[3], HMMs, maybe Bayesian networks? Obviously, you'll have to adjust this to the expected cleverness of your students.

[1] I had plenty of maths courses, but for some reason stats wasn't even mentioned.
[2] To within a rounding error, nothing. So take everything I say with the world's largest grain of salt.
[3] PageRank uses them, if memory serves. Also you can have an algorithm-off between using Markov chains to generate spam-filter-busting slabs of statistically-natural text and spam filters. :)
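(Re: [3], if anyone wants a toy version of the text-generation half of that algorithm-off, a word-level Markov chain is only a few lines of Python. This is a sketch, not a serious spam generator:)

    import random
    from collections import defaultdict

    def build_chain(text):
        # Map each word to the list of words that follow it in the training text.
        words = text.split()
        chain = defaultdict(list)
        for current_word, next_word in zip(words, words[1:]):
            chain[current_word].append(next_word)
        return chain

    def generate(chain, start, length=20):
        # Random-walk the chain to produce statistically-natural-ish text.
        word, output = start, [start]
        for _ in range(length - 1):
            followers = chain.get(word)
            if not followers:
                break
            word = random.choice(followers)
            output.append(word)
        return " ".join(output)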

3594882
Seconded!

I'm a CS student.

#1 thing would be building models that represent practical situations. I know that's much more broad than most other suggestions, but I've yet to take a statistics course myself, and my experience is only from personal projects that I have done.

It's important to realize that the main reason CS students need to take stats is so they can be prepared to do the kind of math you need to do with computers.

For example, if you don't understand the Central Limit Theorem, you might try to create a flat distribution by adding together two or more rolls from a flat distribution, and end up with something bell-shaped instead. So that's kind of essential if you want to work with pseudorandom numbers, which come up all the time in programming.
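(One quick way to see it, as a rough Python sketch: compare one uniform roll against the sum of several, and the sum already looks bell-shaped.)

    import random

    def sample_sums(num_rolls, trials=100_000):
        # Sum `num_rolls` uniform rolls per trial; the more rolls you add,
        # the more bell-shaped (not flat!) the result gets.
        return [sum(random.random() for _ in range(num_rolls)) for _ in range(trials)]

    one_roll = sample_sums(1)    # flat on [0, 1)
    four_rolls = sample_sums(4)  # piles up around 2, already roughly normal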

I would focus on applied stats and the most important theorems, and I'd probably make sure they get some basic Bayesian bits too. (Mainly this is because people are idiots and everypony should understand the importance of Bayesian logic.) Shit like hypothesis testing and z-scores and ANOVAs might be relevant to the social sciences, but it isn't what most CS students need. I say this having degrees in both areas. :twilightsmile:

Perhaps you could pitch the class as a "data science" class. Many CS graduates will have large data sets put into their lap, and this class should cover the statistical tools required to analyze and visualize that data. This would involve some discussion of the basics of hypothesis testing and regression, but consider perhaps more advanced topics useful for large data sets such as empirical Bayes methods, clustering, and principal component analysis (though if this is a freshman class, these topics may not be as easy to teach; it's hard to discuss PCA if they haven't learned what an eigenvalue is).

Is machine learning still big in CS? That's an area where stats is important and discussing concepts like likelihood functions (e.g. via maximum likelihood estimation), regression, and Bayesian inference is probably useful.
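(Maximum likelihood in particular demos well with zero machinery. A toy Python sketch for a Bernoulli success probability, using a crude grid search purely for illustration:)

    import math

    def log_likelihood(p, data):
        # Log-likelihood of 0/1 data if each trial succeeds with probability p.
        return sum(math.log(p) if x else math.log(1 - p) for x in data)

    data = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes out of 8
    grid = [i / 1000 for i in range(1, 1000)]
    mle = max(grid, key=lambda p: log_likelihood(p, data))
    # mle comes out at 0.75, the sample proportion, exactly as the theory promises.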

3595264
Lower-division. According to the degree programs I'm looking at, they suggest this course for sophomores.


3595288
3595448
I think you're both right that a lot of the standard focus on testing (which is honestly pretty useless for everyone IMO) is extra-special useless here. Those sorts of tools are really useful, and I do think they're important to understand for anyone who's trying to learn how statistics works—but if you're just going to get a one-quarter course on statistics, I feel like you're better off focusing on concepts that are going to be important to what you are likely to do, rather than tools that probably aren't.

Which leads into:
3595755
3595288
Some of the topic ideas here are helpful in terms of knowing where students may be likely to see this stuff come up again, and where I might be able to get some examples to use—but things like clustering, PCA, hidden Markov models, support vector machines[1], all that is well beyond the scope of this class and what I'd expect students to be prepared for. The only prerequisite for this course is single-variable calculus.


Based on what y'all are saying (including 3595298 who I haven't tagged yet and 3594882 who weighed in early), I'm thinking that the following are some good places to focus:
– Statistical thinking
– Understanding data
– Basics of probability
– Building probability models
– Simulation
– Estimation

I'm leaving out the word "Bayesian" because I'm a dyed-in-the-wool Bayesian and the subjectivist approach to probability is going to heavily color anything I do. I'd like to shove some basic decision theory in there, too, but I don't know how much time I'll have.

---

[1] I keep thinking someone talked about support vector machines here, but I can't find it. Did I imagine it? Do I just see support vector machines everywhere?

3595852
Incidentally, one thing I always found useful in my own stats courses was explanations of the usage of statistics in science and polling; both are real-world things we have to deal with all the time in our life, reading papers and stuff, and it really helped me to understand how this stuff worked.

3595890
Can you expand on that or give some examples? To me, all this stuff is ingrained everywhere and I can't think about the practice of science or polling without thinking about statistics, so I don't think I've got a great sense of what's revelatory for other people.

3595906
I think the colored M&M's example (yes, I know, XKCD used jellybeans, shhh) was important for understanding why p-values aren't always as useful as they seem to be, and how they can be misused. You have to consider that, when you're searching through a broad space, you're increasing your odds of finding something which appears to be significant but which is not, in fact, significant, and that you might have to tighten your standards if you're looking at multiple things at once.
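(That one is easy to show with a simulation. Here's a rough Python sketch that runs twenty "studies" on pure noise with a crude two-sample z-test and counts how many clear p < 0.05 by luck alone:)

    import math
    import random
    import statistics

    def noise_p_value(n=50):
        # Compare two groups of pure noise with a crude two-sample z-test;
        # under the null hypothesis the p-value is roughly uniform on [0, 1].
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z = abs(statistics.mean(a) - statistics.mean(b)) / se
        return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

    hits = sum(noise_p_value() < 0.05 for _ in range(20))
    # Expect about one "significant" color per run, even though nothing is going on.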

Likewise, simply understanding why a poll can call up 1,000 properly randomized people and get a reasonably accurate picture of the views of an entire country really helped me to understand the significance of polling and why people do it.
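(The arithmetic behind that fits on a napkin. A small Python sketch, assuming a simple random sample and a proportion near 50%:)

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Approximate 95% margin of error for a sample proportion.
        # Note the size of the country never appears in the formula.
        return z * math.sqrt(p * (1 - p) / n)

    print(margin_of_error(1000))  # ~0.031, i.e. about +/- 3 points
    print(margin_of_error(100))   # ~0.098: cutting the sample to a tenth only triples the error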

My friend is currently taking a course in probability and statistics for engineers, which is obviously different from computer science, but it is also a sophomore class and might be a helpful framework. Some ideas I've gleaned from the outline of that course:

-Combinatoric counting techniques
-Conditional probabilities
-Probability distributions
-Sample mean, standard deviation, Central Limit Theorem
-Linear regression and correlation

Most of that probably falls under basics of probability, but food for thought I suppose.

Tachyon votes in favour of simulations but wishes it noted he went into science where such sims are quite useful.

I dunno, personally. I took a statistics class for my CS degree, but it was long enough ago it doesn't stick out in my memory and in the time since most of the programming work I've done hasn't really involved the subject.

Well, in ways that I have to think about, at least. I do a lot of database work, but it's mostly against an already designed system. I work in the IT side of things on a campus, and we have one of those big, all-encompassing student information systems.

I don't have anything really useful to add, as I came from a math background in college and never took any CS myself, but I feel like you ought to specifically use the Monty Hall problem as a discussion (and homework!) topic. Note especially the anecdote about Erdos in the wiki article.

You want to make a case about why CS and stats go together like chocolate and peanut butter; bait the hook with how CS can solve problems that confound mathematicians, and then set the hook with how statistics can solve problems that confound programmers.
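(And the CS half of that bait practically writes itself: a brute-force simulation settles the argument in a dozen lines. A rough Python sketch:)

    import random

    def play(switch):
        # One round of Monty Hall with 3 doors; returns True if the contestant gets the car.
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a door that is neither the contestant's pick nor the car:
        # he never shows the car, which is the whole trick.
        opened = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        return pick == car

    trials = 100_000
    print(sum(play(switch=True) for _ in range(trials)) / trials)   # ~0.667
    print(sum(play(switch=False) for _ in range(trials)) / trials)  # ~0.333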

3598330

This also properly equips the students for their ongoing online slapfights, because the Monty Hall Problem is the new Airplane on a Conveyor Belt, as far as ability to start an argument goes.

3598668
It is? But it's so easily solvable!

More people need to know about the Sleeping Beauty Problem, I guess.

3598702

So was the airplane on the conveyor belt. From what I've observed, people fail to grasp the importance of the fact that Monty will never show the car. I even watched someone get it wrong after constructing a truth table for it.

The binomial distribution will come in very helpful in CS in computing the expected number of different cases for many things, like "what are the odds of getting 3 or more bit errors in a single disk sector if ..."
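(For a concrete classroom version, something like this Python sketch; the 4096-bit sector and the per-bit error rate are numbers I made up for illustration:)

    from math import comb

    def prob_at_least(k, n, p):
        # P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k).
        return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

    # Made-up numbers: a 4096-bit sector and a 1-in-100,000 per-bit error rate.
    print(prob_at_least(3, 4096, 1e-5))  # ~1e-5, so three or more errors is already rare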

Multiple linear regression, because it's such an easy and useful way of modelling data. It would be nice to find some prediction problem with a big existing (real-life) data set which you could run thru a bunch of weka machine learning methods--neural nets, SVMs, naive bayes, classifier trees--and then show that multiple linear regression beats them all.

Tests for randomness of data.

The t-test is not commonly used in computer programming, but is vital in computer science. If you want to test whether an algorithm does better than random at a task, or compare two algorithms and know which one is better, you need to do some similar test. Algorithm X produces errors 3, 7, and 2. Algorithm Y produces errors 4, 6, and 4. Can you conclude with 95% certainty that Y is better than X? Besides, it's vital for reading papers in science at all. Even for reading the newspaper. People need to know what "statistically significant" means.
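(For what it's worth, that exact comparison is a couple of lines with scipy, if I remember the API right. A quick sketch:)

    from scipy import stats

    x_errors = [3, 7, 2]  # algorithm X
    y_errors = [4, 6, 4]  # algorithm Y

    # Two-sample t-test; with three runs per group, don't expect significance.
    t_stat, p_value = stats.ttest_ind(x_errors, y_errors)
    print(t_stat, p_value)  # p comes out around 0.7, so no, you can't conclude anything at 95%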

A test they might need to do in signal processing is to test for periodicity in a signal.

Multiple hypothesis testing and degrees of freedom. Make sure they understand that if you perform 100 different tests on your data and find 3 that are statistically significant, that isn't a big deal.
I have a book on my desk where a guy routinely says things like, "I take these 11 data points and fit them to an equation of the form m = ax + by + cz + dx*x + ey*y + f*z*z + g*x^3 + h*y^3 + j*z^3, and I find these values for a..j explain 70% of the variance in the data. This proves that x, y, and z are the crucial variables in determining m." Explain why that's stupid.
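(That one is also a five-minute demo: fit pure noise with more and more coefficients and watch the "variance explained" climb. A rough numpy sketch with made-up data, not the book's:)

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 11)
    y = rng.normal(size=11)  # 11 data points of pure noise: nothing to explain

    for degree in (1, 3, 6, 9):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        r_squared = 1 - residuals.var() / y.var()
        print(degree, round(r_squared, 2))  # R^2 marches toward 1 as coefficients pile up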

3595919
The really funny thing is that the most elementary stats courses[1] will tell you why the M&... jell.... colored confectionery example is wrong. They failed to adjust for the familywise error rate.

'Course, doing it as a dozen studies instead of as just one is a great way to hide the above.

[1] Defined here as 'stuff that even Ghost knows'


3598668
3598702
Fun fact: Some time back I got blind drunk at a party, and got into an argument about Monty Hall which I proceeded to prove, to everyone's satisfaction, on a wine-stained sheet of paper in very shaky handwriting.

Then I proceeded to talk about FTL drives for three hours, and then convinced everyone present to get into a dial-up connection sound imitation competition.

Oddly, all of these things made me popular. CS student parties, man. They are hardcore.

:pinkiehappy:

Also: Sleeping Beauty never fails to give me a splitting headache. I have to admit I like the thirder position, myself, but not unreservedly.

3599198

The binomial distribution will come in very helpful in CS in computing the expected number of different cases for many things, like "what are the odds of getting 3 or more bit errors in a single disk sector if ..."

Also great at fleecing people in Liar's Dice.

I... uh... heard somewhere.

Multiple linear regression, because it's such an easy and useful way of modelling data. It would be nice to find some prediction problem with a big existing (real-life) data set which you could run thru a bunch of weka machine learning methods--neural nets, SVMs, naive bayes, classifier trees--and then show that multiple linear regression beats them all.

Hey, I'm just now looking at a classifier tree that cleanly outperformed an (admittedly logistic) regression by about 20%. And you know what that means.

...that's right. I screwed up the cross-validation somewhere along the line.

*sigh*

:trollestia:

Tests for randomness of data.

Seconded!

The t-test is not commonly used in computer programming, but is vital in computer science. If you want to test whether an algorithm does better than random at a task, or compare two algorithms and know which one is better, you need to do some similar test. Algorithm X produces errors 3, 7, and 2. Algorithm Y produces errors 4, 6, and 4. Can you conclude with 95% certainty that Y is better than X? Besides, it's vital for reading papers in science at all. Even for reading the newspaper. People need to know what "statistically significant" means.

That's really more 'general statistical literacy' than anything. Still, granted, you do need to know that. I think Bradel can fit it in no problem. Shouldn't take you much more than an afternoon to get to grips with the t-test, surely?

Multiple hypothesis testing and degrees of freedom. Make sure they understand that if you perform 100 different tests on your data and find 3 that are statistically significant, that isn't a big deal.

I have a book on my desk where a guy routinely says things like, "I take these 11 data points and fit them to an equation of the form m = ax + by + cz + dx*x + ey*y + f*z*z + g*x^3 + h*y^3 + j*z^3, and I find these values for a..j explain 70% of the variance in the data. This proves that x, y, and z are the crucial variables in determining m." Explain why that's stupid.

I think I can do it in about five words: 'parsimony-adjusted measures of fit.' If you add more predictors, your R^2[1] will go up. Simple as that. It's why things like AIC and BIC were invented. Someone published a book without taking this into account? Seriously? (Also on second reading, n=11! 11! That's a miserable sample size. If he had any more coefficients to play with he could have run a Lagrange interpolating polynomial through the whole mess and claimed 100% variance explained. Feh. Didn't even see the sample size first time 'round.)

(Fun fact: I seem to recall that regression model fitting information criteria are Bradel's area of research. So I'm sure he'll have quite a few things to say re: the above example.)

[1] Which is bandied about with 'explains' all the time. 'Accounts for' is more accurate. It doesn't necessarily explain anything.

It's the morning, so I don't want to go around posting actual useful stuff, but...

3599399

Also: Sleeping Beauty never fails to give me a splitting headache. I have to admit I like the thirder position, myself, but not unreservedly.

Halfer for life, and IIRC it's all about decision theory. Think about it this way: all bets pay out equally, but on one track SB has to make one bet, and on the other track she has to make two bets (she gets asked the question twice). SB having credence 1/2 induces a 1/3 / 2/3 response pattern. When you recognize that your utility function isn't balanced and separate the action SB takes from the credence she has in heads/tails, it's bloody obvious that her credence ought to be 1/2. Anything else would just make her behave stupidly.

Essentially, the thirder position wants to treat the three wakings as independent events, but two of them are perfectly correlated. People freak at the idea that P(H&Mo) = P(T&Mo) = P(T&Tu) = 1/2, but they ignore the fact that {T&Mo} <=> {T&Tu}.

The question of whether SB ought to say "Monday" or "Tuesday" is purely semantic and driven by decision theory. What she believes about the coin, on the other hand, is obvious. Even the thirder position depends on her believing that the coin is fair.

<shots fired>

3599722

When you recognize that your utility function isn't balanced and separate the action SB takes from the credence she has in heads/tails, it's bloody obvious that her credence ought to be 1/2. Anything else would just make her behave stupidly.

If they make bets about it, the halfer consistently loses money to the thirder, which is the definition of behaving stupidly.

The question of whether SB ought to say "Monday" or "Tuesday" is purely semantic and driven by decision theory.

She's never asked whether it's Monday or Tuesday.

3599876
If they make bets about it, the halfer that recognizes Tails pays out 2:1 does just fine, thank you.

3599399

(Also on second reading, n=11! 11! That's a miserable sample size. If he had any more coefficients to play with he could have run a Lagrange interpolating polynomial through the whole mess and claimed 100% variance explained. Feh. Didn't even see the sample size first time 'round.)

It's worse than that. Sometimes the number of coefficients in his equation is larger than his sample size--and yet he still manages to explain only .3 to .7 of the data! He couldn't have had the computing power to optimize all those coefficients anyway, so he must have picked them heuristically.

Bradel, I'd love to have some guidelines as to how much of the variance of a dataset I should expect to explain with best fit to an equation with N variables. Say a polynomial. (What's the name for a polynomial with no xy terms?) Add whatever assumptions/restrictions are useful.

3599906 Someone who does that is a thirder, not a halfer. Stop using the word "credence". That stuff is irrelevant and ungrounded language games. I'm uninterested in aspects of SB that aren't defined operationally. When you define them operationally, you must be a thirder, IIRC.

Eh, we had this argument already.

3600172
No, he has a point. You can't conflate p(a fair coin flipped heads) and p(I am awake because a coin flipped heads). This strikes me as being like the boy-girl problem, where the correct answer changes based on the way the question is defined, and the question specifically asks what you believe the odds are.

Thought experiment: the setup is identical except that the drug does not cause amnesia, and when you are woken up for the first time (on Monday) you must place a bet for $X on whether the coin came up heads or tails. If you are woken up on Tuesday, then at that time Monday's bet will be placed again for you, with no opportunity to change it, with your answer and the stakes identical. If you are not woken up on Tuesday, no second bet is placed. This is a totally identical formulation in terms of outcomes (assuming that you are betting methodically rather than randomly: i.e. assuming that your amnesia-Tuesday response would be identical to your amnesia-Monday response).

On Monday, the odds of the coin flip coming up heads are 50%. However, you will lose $2x if you bet on heads and you're wrong, and you lose only $1x if you bet on tails and you're wrong. So what you believe about the odds and what you should bet about the odds are not identical.

3601180 No one has ever suggested that anyone believes the odds are anything other than 50/50. The question therefore can't be what you believe about the odds of the coin flip.

So what you believe about the odds and what you should bet about the odds are not identical.

What you "believe about the odds" hasn't been defined other than as what you should bet about the odds. "Belief" isn't mystical. It has to refer back to the real world. If you define "belief about the odds" clearly, I think it will turn out either to be the thirder position on betting, or the trivial statement that the coin flip has 1:1 odds, or some other uncontroversial observation. There's no paradox here, only confusion over words, enabled because we can make endless ungrounded claims and define arbitrary new terms about what's going on inside someone's head. Operationalize "belief" and I think the confusion will vanish.

In other words, I don't want to argue about what "belief" or "credence" should mean. Describe the situation, and put a question to SB in terms such that we can evaluate her answer as correct or incorrect, and we should agree on what she should answer. If that isn't done, then it isn't a logic problem.

3601336

There's no paradox here, only confusion over words.

On that, we wholly agree. I would put myself in the halfer camp exactly because:

If you define "belief about the odds" clearly, it will turn out to be … the trivial statement that the coin flip has 1:1 odds

and that's how I parse the question being asked. You parse it differently, and you say that makes me a thirder.

Only one quibble:

What you "believe about the odds" is not defined other than as what you should bet about the odds.

So if I said "You and I are going to run this experiment on Alice 10,000 times, and I'll bet you $100 that the number of heads we flip as we perform these experiments is closer to 5000 than 3333," and you believe Alice is correct to assess p(heads)=1/3, then why wouldn't you take my bet?

Because that's the same argument I just made. Again down to definitions.

3601336
I want to agree with this, and do personally agree with this completely—but my understanding of SB is that the philosophical thirder position is essentially tantamount to saying "the coin comes up heads 1/3 of the time, because of the frame of reference".

I get the feeling, then, that we're all kind of on the same page that: (1) the coin comes up heads 1/2 the time, (2) any rational gambler should bet on tails because it pays out twice as much[1], and (3) the whole thing is basically just arguing over what words mean[2].

I'm pretty happy with that conclusion.


[1] There's basically no difference between "twice as much" and "twice as often" here, but I think that phrasing it as "twice as often" perpetuates the sort of nonsensical idea that you can treat separate wakings on the multi-wake path as meaningfully distinct from one another.

[2] I agree with you that 'credence' is an awful word and I really have no idea what it's supposed to mean. It seems to be how the SB nerds talk about things, but my experience thus far with the overlap between philosophy and probability is that philosophers use a lot of dumb words that they've decided have subtly different meanings I can't understand. Ferinstance, they seem to make a big point of choosing to talk about the "chances" of something versus the "probability" of something. Maybe this is just a way of trying to have your cake (the classical interpretation of probability) and eat it too (the subjectivist interpretation, which seems like it's the only really justifiable interpretation), but I'm not sure. In any case, I was trying to keep things to the language they use, a la Wikipedia, but I think that language is stupid and I don't know what they're on about with it.
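(For anyone who wants to check points (1) and (2) by brute force rather than by argument, a quick Python sketch, counting coin flips and per-awakening outcomes:)

    import random

    runs = 100_000
    heads_runs = 0
    awakenings = 0
    heads_awakenings = 0

    for _ in range(runs):
        heads = random.random() < 0.5
        heads_runs += heads
        wakes = 1 if heads else 2  # one waking on heads, two on tails
        awakenings += wakes
        if heads:
            heads_awakenings += wakes

    print(heads_runs / runs)              # ~0.5: the coin is fair (point 1)
    print(heads_awakenings / awakenings)  # ~0.33: per awakening, heads is outnumbered 2:1,
                                          # which is why betting tails every waking pays (point 2)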

3601377

So if I said "You and I are going to run this experiment on Alice 10,000 times, and I'll bet you $100 that the number of heads we flip as we perform these experiments is closer to 5000 than 3333," and you believe Alice is correct to assess p(heads)=1/3, then why wouldn't you take my bet?

The bit that makes people go all cray-cray over this is that there's a difference between the experimenter's perspective and Alice's perspective. But your formulation kind of ignores that, because traditionally the experimenter is considered to be optimizing for betting, so the experimenter wants to bet tails twice as high as heads, but Alice literally can't tell the difference between heads and tails.

Also, the more I think about this, the more it seems totally unworthy of discussion. Is there really any part of this that isn't bloody obvious to everyone? I can hardly figure out what people are arguing about at this point. Not in an "I'm right and you're wrong" way; I literally can't figure out what point is supposed to be getting discussed.

3601377

So if I said "You and I are going to run this experiment on Alice 10,000 times, and I'll bet you $100 that the number of heads we flip as we perform these experiments is closer to 5000 than 3333," and you believe Alice is correct to assess p(heads)=1/3, then why wouldn't you take my bet?

Alice says P(heads | I was woken up) = 1/3, which is not P(heads). The question is about conditional probability. To see it more clearly, consider these alternate SB problems:

If the coin comes up heads, SB will be woken up on Monday only. If it comes up tails, she'll be woken up every day for 20 years. If she finds she's been woken up, what should she say is the probability that the coin came up tails?

If the coin comes up tails, SB will be woken up on Monday and Tuesday. If it comes up heads, she'll be immediately shot in the head. If she finds she's been woken up, what should she say is the probability that the coin came up tails?

Do you think that she should say 1/2 in either of these cases?

3601472
After reading all this, I just feel bad for the girl and hope she's being compensated well. This is the depth of my statistical analysis of the situation.

3608456 Don't worry. We outsourced the sleeping to India. $10 is, like, a thousand rupees.
