Updated CPI plot and data

US CPI through March 2014, cropped.

March CPI was announced by the BLS today; raw and core CPI both grew by about 2.4% (this is the annualized rate). Our main graphs have been updated and this post has a cutaway of the most recent data. Light gray is raw CPI and black is core CPI.

Interesting links for, ahem, “forecasting Fridays”

Something old for blogs, but new to this one. A list of links! To posts on other blogs!

A caveat: I’ve probably only skimmed these, not read them carefully. The usual disclaimer applies: links are not endorsements. They’re even better! A link means “this might be interesting and this blog probably has interesting ideas” and not “I agree with this moron.”

If there are blogs I should know about, send me an email. I’ll try to update the blogroll to more accurately reflect what I read.

Obscure writing recommendation for math: don’t start sentences with variables

I was reminded of this rule when I read Matt Yglesias’s summary of “Capital in the 21st Century” on Vox. One of the paragraphs started “R is a more abstract idea….” But “R” hadn’t been defined anywhere. The variable “r” had been defined as the rate of return on capital, but mathematical variables are case sensitive, so referring to “r” later as “R” is incorrect. It’s also confusing to some readers. (Admittedly, probably a small minority of readers.) I spent some time looking for “R” earlier in the article before I realized what was going on. (You won’t find this line in the article now; to their credit, it was quickly corrected.)

Hence the rule: do not start a sentence with a mathematical symbol. I probably learned this rule from the Handbook of Writing for the Mathematical Sciences by Nicholas Higham (p. 29; I’m mentioning this book because you should own it) and it’s accepted enough that it’s in the Chicago Manual of Style. So you do a little write-around: if the sentence were, “The rate of return, ‘r,’ is a more abstract idea…” there’s no confusion. It’s slightly wordier, but better. (The Vox article does an even better write-around now.) Using a semicolon instead of a period can also work, but that’s not going to help at the beginning of the paragraph.

If you didn’t know the rule before, now you do.


UConn over Kentucky, statistical graphics

I put together some graphs for the Final Four and the championship game for fun. Each of them plots the winning team’s lead over the course of the game. (I.e., when the line moves up by 2, that means the eventual winner scores; when it moves down, the loser scores.)

Don’t worry, these graphs won’t be a regular occurrence. (The code is in a lot of disarray, and there’s no way I’m putting in the effort to maintain it right now.) I have a lot of family in Connecticut and friends who are big fans too, so it’s great to see UConn win!

One thing to notice (just to tie the post back to econometrics): if we think that the distribution of “baskets” is more or less stationary or I(0), then it’s clear that each of the teams’ scores is a unit root process (with drift). Subtracting one score from the other, like we’re doing here, still gives a unit root process. What that means is that we’re likely to see lots of things like “runs” just from the fact that it’s a unit root process. So patterns like we see at the beginning of the second half of the Wisconsin – Kentucky game, where it looks like Kentucky gets hot but then loses momentum, might be consistent with just a constant drift term (so there’s no momentum and no getting hot). Anyway, it might be worth looking into. (I’m not going to.)
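
Just to illustrate that last point, here is a quick simulation sketch in Python rather than the R/Julia used elsewhere on this blog (the scoring probabilities are invented for illustration): generate the winner’s lead as a pure random walk with drift, with no momentum anywhere in the data-generating process, and count how often long “runs” of consecutive scores appear anyway.

```python
import random

def simulate_lead(n_possessions=70, p=0.52, seed=1):
    """Winner's lead as a unit root with drift: each possession the lead
    moves +2 (eventual winner scores) with probability p, else -2.
    There is no momentum anywhere in this data-generating process."""
    rng = random.Random(seed)
    lead, path = 0, []
    for _ in range(n_possessions):
        lead += 2 if rng.random() < p else -2
        path.append(lead)
    return path

def longest_run(path):
    """Longest stretch of consecutive moves in the same direction."""
    steps = [b - a for a, b in zip([0] + path, path)]
    best = cur = 1
    for s_prev, s_next in zip(steps, steps[1:]):
        cur = cur + 1 if s_next == s_prev else 1
        best = max(best, cur)
    return best

# Share of simulated games featuring a "run" of 5+ straight scores by one team:
runs = [longest_run(simulate_lead(seed=s)) for s in range(500)]
print(sum(r >= 5 for r in runs) / len(runs))
```

Runs like these would look like a team “getting hot” even though the drift term is constant throughout, which is exactly the point.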

Notes and links related to the unemployment report

animated gif of the us unemployment rate



These graphs plot the unemployment rate before, during, and after each recession since WW2; there’s a separate line for each recession. The data are current through the April 4th BLS announcement, which had data through March 2014. The data released today will be revised next month, so today’s estimated 6.7% unemployment rate for March may be changed. You can look up historical revisions of the unemployment rate on the St. Louis Fed’s ALFRED database. (At some point I plan to add revisions to the graphs, but… not yet.)

If the animated gif is annoying, you can see these graphs displayed in a static gallery on the dedicated unemployment graphs page. And tell me if they’re annoying. (Twitter or comment box below.) I’m trying the animations for fun and to try out new things, but I’m not sure how I feel about them myself.

Some relevant links to more data:

  • The “U6” unemployment rate, a more inclusive measure that counts marginally attached and underemployed workers as unemployed. Marginally attached workers are people who would like to work but are not actively looking for a job, so they’re typically not counted as part of the labor force; underemployed workers are people who are working part time but would like full time jobs, and they’re typically counted as “employed.” This rate was 12.7% in March. It is invariably higher than the standard unemployment rate, but it follows nearly identical patterns over time and only goes back to 1994, which is why I don’t make an effort to graph it.
  • The labor force participation rate.
  • The report itself (pdf)
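
For concreteness, here is the arithmetic behind the broader measure, sketched in Python (the counts are invented round numbers for illustration, not BLS data): the headline rate divides the unemployed by the labor force, while the broader rate adds the marginally attached and the involuntarily part-time to the numerator and the marginally attached to the denominator.

```python
# Invented round numbers (millions of people), for illustration only.
unemployed = 10.5
labor_force = 156.0
marginally_attached = 2.2
part_time_for_economic_reasons = 7.4  # the "underemployed"

# Headline rate: unemployed over labor force.
u3 = unemployed / labor_force

# Broader rate: add the extra groups to the numerator, and the marginally
# attached (who aren't in the labor force) to the denominator.
u6 = (unemployed + marginally_attached + part_time_for_economic_reasons) / (
    labor_force + marginally_attached
)

print(f"headline rate: {100 * u3:.1f}%")  # 6.7%
print(f"broader rate:  {100 * u6:.1f}%")  # 12.7%, higher by construction
```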

So, as I promised on Twitter, some thoughts on the unemployment rate. First, and I feel compelled to say this as an econometrician, this idea that we should try to pick up new trends over the last two or three months of data is a little ridiculous. There’s a lot of noise in the data, even without explicit measurement error. That said, I wouldn’t want the Fed to wait 6 months after the first clear signs of a recession — just to be sure — before acting, so there’s a tension.

In any case, these reservations are sort of hypothetical today. The recovery/expansion looks about as weak as it’s been all along. Combined with the low rates of inflation and the absence of a meaningful recovery period in GDP, you might think that there would be more urgency among policy-makers for stimulus and expansionary policy. But there clearly isn’t, either at the Fed or in Congress.

If you want more details and anything approaching a nuanced analysis, you should always go to another website. Here are some:

Visual Scoreboard R.I.P.

I ran a website called “Visual Scoreboard” (visualscoreboard.com) for a year or two, where I posted some sports-related statistical summaries. I haven’t had time for it lately, so I’ve shut down the site and will set up the domain to redirect here. But I’ll put out an announcement if I start it back up.

You can get a sense of what I was doing with the site by looking at its Twitter, @vis_sb, which I plan to leave up. The account was pretty amazing in one key metric: percentage of followers who were Edward Tufte. (I think that at one point we had eight followers and one of them was Tufte, giving us a “Tufte-rate” of 0.125 which I was pretty excited about.)

For football, we used to post game summaries like this:

This tweet plots the difference in the teams’ scores over the course of the game. If it goes up by 7, that means that Cleveland scored a touchdown and extra point (so their score increased by 7).

The line width indicated which team was on offense.

For basketball, we posted similar graphics:

If I were starting the site back up, I’d like to update graphs like that in real time and (this part is ambitious) post density forecasts for the rest of the game in real time, maybe conditional on different lineup combinations.

It was fun, but maintaining scripts to scrape different sports sites was more time consuming than you’d expect. But kind of necessary: one of the cool things about the site was that it ran on autopilot, so the mac mini attached to my TV would generate new graphs and post them to the site and to Twitter every ten minutes. It was a pretty awesome service!

p.s. to any people trying to set up a sports “data-journalism” site: the code is very disorganized but on github. You’re welcome to use it, and I’ll make some effort to be helpful if you try to set it up. Ideally you’d want legitimate access to the streaming data, and in that case you could do some pretty freaking phenomenal things.

Annotated FOMC statement and more

There was an FOMC decision and Janet Yellen press conference today. No change for now, but they may raise interest rates sooner than previously expected (beginning in slightly more than a year), and they’re continuing to taper. Some better analyses:

538’s comparative advantage

After I wrote that last post on 538 I thought a little more about what Nate Silver’s, and by extension 538’s, comparative advantage might be. (tl;dr – I don’t think it’s ‘data journalism’ necessarily.) And it’s pretty obvious: forecasting and forecast aggregation. He pretty clearly understands probability modeling and model combinations, he’s comfortable making specific predictions publicly (which is hugely underrated; lots of smart people are too worried about making a mistake to be good at forecasting), and he does a very, very good job explaining when and how forecasts change. That was the entire appeal of his election forecasts.

This article on government spending, for example, would be a lot tighter and more interesting if it made predictions about future spending over the next 15 years. Or, for that matter, if that plane article tried to predict when and where the plane would be found.

Kneejerk reaction to the new 538

In the spirit of honest disclosure, you should know that I am very jealous of Nate Silver. Keep that in mind as you read this blog.

Since it’s apparently not too early to have kneejerk reactions to the new 538 site (e.g. Krugman, Cowen), I want to get mine in too. Some quick thoughts:

  1. They’re trying to do “data journalism.” But the articles they’ve posted are pretty superficial. (This seems to be Cowen and Krugman’s main complaint.) For example: the linkbait “How Statisticians Could Help Find That Missing Plane” just talks around Bayesian statistics and spends only a few paragraphs on actual examples of statisticians finding airplanes. But that’s the interesting part! Not whether Bayesian or frequentist statistics is better for finding airplanes (which is, unfortunately, the article’s focus), but, in detail, how does “statistical airplane finding” work? What should they be doing differently?
  2. It looks like they’re aiming for the Freakonomics/Planet Money pop-quant audience. That probably makes sense financially, but I was hoping for more.
  3. They should consider hiring a PhD statistician as an ombudsman. (Well, I’d be okay with an econometrician, but this is for external credibility.) I nominate Andrew Gelman. Not that I agree with everything he writes, but he seriously knows what he’s doing, he studies political science, and he’s a Bayesian and used to blog for 538 (pre-NYT, I think) to boot.
    Look, this stuff is hard and, even though he has a good track record as a forecaster and gambler, if Nate Silver is deciding what’s good statistical practice and what’s bad practice, 538 is going to make embarrassingly preventable mistakes. The previously discussed airline-finding article is concerning: it reads like Bayesian fanfic.

    Having a “real” statistician in an oversight role would also be a great way to differentiate their site from everyone else in the pop-quant space. (Gelman, would you take the job? How much would they have to pay you? ESPN’s like, loaded. You could even write a recurring column on plagiarism and data fraud!)

  4. 538’s using wordpress? Grantland’s on wordpress too? So are re/code and MMQB? WTF? Why are all these new boutique news sites running on wordpress?

    Background: I’ve started noticing sites’ backends after reading this post by Felix Salmon. WordPress is literally the easiest way you could set up a blog (for example, this blog is on WordPress), so it makes some sense. But it locks you in. Felix talks about scaling issues, and that does seem important here. Shouldn’t 538’s goal be to get big? Or ESPN’s goal for the entirety of their news sites?

    But there’s another issue too. Since ESPN is setting up multiple boutique news sites and has a pretty massive website of its own, shouldn’t setting up a specialized CMS for media-heavy websites be a high priority? Wouldn’t that go hand in hand with what 538’s trying to do? And Grantland? Imagine how much better Zach Lowe would be if he had a good online video editor and didn’t have to embed YouTube clips in every article! Imagine if it were easy for 538’s writers to embed live interactive statistical graphics. (Which is probably doable with wordpress, but could certainly be easier.)

    Is “vanilla blogging” really the best strategy for a new news website, when there are so many other interesting and capital/tech-intensive approaches? Like, say, bl.ocks.org (run by Mike Bostock, an NYT hire) or Shiny (by RStudio, which has many Iowa State connections and employs a former student whose committee I was on; this is meant as ‘disclosure’ btw) or, more “traditionally,” Vice News.

    It’s maybe worth noting, given Salmon’s post, that Ezra Klein’s taking his news talents to Vox media.

Am I nitpicking? Oh god yes.

But none of these choices exactly scream that they have a bold new vision of web journalism and are going to push it in exciting directions. More that they think that they have a fun way to capitalize on the pop-quant/Freakonomics market and monetize Silver’s exploding reputation. Like I said at the very beginning of the post, I’m very deeply and sincerely jealous, and maybe being jealous is making me wrong about the rest of these points too. And it’s entirely possible I’m reading too much into their first day online. In any case, I hope they have higher ambitions than a newsier “Freakonomics” and that they can pull it off.

P.S. This should probably be its own post, but if I do that I won’t write it.

It’s worth pointing out that Cowen and Krugman are both economists and that’s not surprising; the level of economic discourse on the web is simply outstanding. This isn’t necessarily apparent to non-economists, but tons of the most credentialed and established academic economists are regular bloggers and hammer at substantive policy issues, new academic research, etc. And then there are the professional bloggers who are just as outstanding. (See our blogroll for a very small selection.) And then there are less credentialed bloggers who establish themselves with fantastic writing and analysis. And…(etc!) It is an fing tough market.

This is part of why I question the “vanilla blogging” strategy in part 4 above. For econ, at least, there are thousands and thousands of people who can do “vanilla blogging” at an absurdly high level. I’d be surprised if that weren’t true of other fields as well. I’m not sure why 538 sees that as their comparative advantage.

Slowly moving to the Julia language from R

Julia is a new computing language that’s gotten a lot of attention lately (e.g., this Wired piece) and that I’ve ignored until recently. But I checked it out a few days ago and, holy crap, it’s a nice language. I’m rewriting the code in my book to use Julia instead of R and I’m almost certainly going to use it instead of R in my PhD class next fall.

So, why Julia and why not R? (And, I suppose why not Python/Matlab/other languages?)

  • Multiple dispatch. So you can define a function
    function TrickyAlgorithm(aThing, bThing)

    differently to depend on whether aThing is a matrix of real numbers and bThing is a vector, or aThing is a vector and bThing is a matrix, or any other combination of data types. And you can do this without lots of tedious, potentially slow, and confusing (to people reading and maintaining the code) argument checks and conversions within the function.

    Note that this is kind of similar to Object Oriented Programming, but in OOP TrickyAlgorithm would need to be a method of aThing or bThing. Also note that this is present in R as well.

  • Homoiconicity — the code can be operated on by other parts of the code. Again, R kind of has this too! Kind of, because I’m unaware of a good explanation for how to use it productively, and R’s syntax and scoping rules make it tricky to pull off. But I’m still excited to see it in Julia, because I’ve heard good things about macros and I’d like to appreciate them (I’ve started playing around with Clojure and like it a lot too…). And because stuff like this is amazing:
    @devec r = exp(-abs(x-y))

    which devectorizes x and y (both vectors) and evaluates as

    for i = 1:length(x)
        r[i] = exp(-abs(x[i]-y[i]))
    end

    (This example and code are from Dahua Lin’s blog post, Fast Numeric Computation in Julia.) Note that “evaluates as” does not mean “gives the same answer as,” it means that the code r = exp(-abs(x-y)) is replaced with the loop by @devec and then the loop is what’s run.

  • Decent speed. Definitely faster than well written R; I don’t have a great feel for how well it compares to highly optimized R (using inline C++, for example), but I write one-off simulation programs and don’t write highly optimized R.

    And the language encourages loops, which is a relief. R discourages loops and encourages “vectorized” operations that operate on entire objects at once (which are then converted to fast loops in C…). But I use loops all the time anyway, because avoiding loops in time series applications is impossible. R’s poor support for recursion doesn’t help either.

    And, more to the point, I teach econometrics to graduate students. Many of them haven’t programmed before. Most of them are not going to write parts of their analysis in C++.
  • The syntax is fine and unthreatening, which will help for teaching. It basically looks like Matlab done right. Matlab’s not a bad language because its programs look like they’re built out of Legos, it’s a bad language because of its horrendous implementation of functions, anonymous functions, objects, etc. Compared to R, Matlab and Julia look downright friendly. Compared to Clojure… I can’t even imagine asking first year PhD students (some with no programming experience at all) to work with a Lisp.
  • The last point that’s always mentioned in these language comparisons: what about all of the R packages? There are thousands and thousands of statistical packages coded up for R, and you’re giving that up by moving to a different language.

    This is apparently a big concern for a lot of people, but… have you looked at the source code for these packages? Most of them are terrible! But some are good, and it might take some time to port them to Julia. Not that much time, I think, because most high-performance popular R packages are a thin layer of interoperability over a fast implementation in C or C++, so the port is just a matter of wrapping it up for Julia. And most of the well designed packages are tools for other package developers.

    That’s not quite true of R’s statistical graphics, though. They’re really great and could be hard to port. And that’s more or less the only thing that I’m sure that I’ll miss in Julia. (But hopefully not for too long.)
  • Lastly, and this is important: the same massive quantity of packages for R is a big constraint on its future development. Breaking backwards compatibility is a big deal but avoiding it too much imposes costs.
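
To make the multiple dispatch point from the first bullet concrete for readers coming from other languages, here is a toy sketch in Python (Julia gives you this natively; here it’s faked with a registry keyed on the types of both arguments, and the behaviors of tricky_algorithm are invented for illustration):

```python
# Toy multiple dispatch: implementations are registered under the types of
# BOTH arguments, roughly how Julia's method tables work natively.
_methods = {}

def register(a_type, b_type):
    def wrap(fn):
        _methods[(a_type, b_type)] = fn
        return fn
    return wrap

def tricky_algorithm(a_thing, b_thing):
    # Look up the implementation for this exact pair of argument types.
    fn = _methods.get((type(a_thing), type(b_thing)))
    if fn is None:
        raise TypeError("no method for these argument types")
    return fn(a_thing, b_thing)

@register(list, float)
def _(a_thing, b_thing):  # "vector and scalar" version
    return [x * b_thing for x in a_thing]

@register(float, list)
def _(a_thing, b_thing):  # "scalar and vector" version
    return [a_thing + x for x in b_thing]

print(tricky_algorithm([1.0, 2.0], 3.0))  # dispatches on (list, float)
print(tricky_algorithm(3.0, [1.0, 2.0]))  # dispatches on (float, list)
```

In Julia none of the registry machinery is needed: you write two `function TrickyAlgorithm(...)` definitions with different argument type annotations and the right one is selected automatically.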

Anyway, since I converted some R code to Julia I thought it would be fun to compare speeds. The first example is used to show the sampling distribution of an average of uniform(0,1) random variables. In R, we have

rstats <- function(rerror, nobs, nsims = 500) {
  replicate(nsims, mean(rerror(nobs)))
}

which is (I think) pretty idiomatic R (and is vectorized, so it’s supposed to be fast). Calling it gives

R> system.time(rstats(runif, 500))

[out]:  user  system elapsed 
       0.341   0.002   0.377

For comparison to the Julia results, we’re going to care about the “elapsed” result of 0.377 seconds; the “system” column isn’t relevant here.  Calling it for more observations and more simulations (50,000 of each) gives

R> system.time(rstats(runif, 50000, 50000))

[out]:   user  system elapsed 
      204.184   0.217 215.526

so 216 seconds overall. And, just to preempt criticism, I ran these simulations a few times each and these results are representative; and I ran a byte-compiled version that got (unexpectedly) slightly worse performance.

Equivalent Julia code is

function rmeans(dist, nobs; nsims = 500)
    means = Array(Float64, nsims)
    for i in 1:nsims
        means[i] = mean(rand(dist, nobs))
    end
    return means
end

which is pretty easy to read, but I have no idea if it’s idiomatic. This is my first code in Julia. If you’d rather minimize lines of code and skip the explicit array preallocation, Julia has list comprehensions and you can write the stylish one-line definition (which gave similar times)

rmeans_pretty(dist, nobs; nsims = 500) =
    [ mean(rand(dist, nobs)) for i = 1:nsims ]

We can time it (after loading the Distributions package):

julia> @elapsed rmeans(Uniform(), 500)

[out]: 0.093662961

so 0.09 seconds, or about a quarter of the time R took. But (I forgot to mention earlier) Julia uses a Just In Time compiler, so the 0.09 seconds includes compilation and execution. Running it a second time gives

julia>  @elapsed rmeans(Uniform(), 500)

[out]: 0.004334132

which is half the time again. (Update on 3/17: as Jules pointed out in the comments, 0.004 is 1/20th of 0.09, so this is substantially faster than I’d initially thought. So we are getting into the ~100 times faster range. That’s actually a pretty exciting speed increase, but I’ll need to look into it some more. Well, that was embarrassing.)

Running the larger simulation, we have

julia> @elapsed rmeans(Uniform(), 50000, nsims = 50000)

[out]: 77.318591953

so the R code is a little less than three times slower here. (The compilation step doesn’t make a meaningful difference.) So, Julia isn’t hundreds of times faster, but it is noticeably faster than R, which is nice.

But speed in this sort of test isn’t the main factor. I’m really excited about multiple dispatch — it’s one of the few things in R that I really, really liked from a language standpoint. I really like what I’ve read about Julia’s support for parallelism (but need to learn more). And I like metaprogramming, even if I can’t do it myself yet. So Julia’s trying to be a fast, easy to learn, and elegantly designed language. That’s awesome. I want it to work.

ps: and it’s open source! Can’t forget that.

Cajetan, an interactive macroeconomics simulation and game

I might start coding up a proof-of-concept interactive macroeconomics simulation/game. If I do, I’ll post details here. (This post is mostly a placeholder to generate a URL.)

The game should be a pretty simple implementation of the ubiquitous baby-sitting economic model (published by Sweeney and Sweeney, 1977 [pdf], and publicized by Paul Krugman). The internal logic is probably 10 lines of SQL, so it’s just a matter of taking an hour or two to set everything up.
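
If it happens, the core ledger really is tiny. Here is a minimal sketch in Python rather than SQL (the class and its rules are my own guesses at a minimal version, not anything from the Sweeney paper): each member holds scrip, and a baby-sitting transaction just moves scrip from the parent to the sitter, with the total amount of scrip fixed.

```python
# Minimal baby-sitting co-op ledger: scrip is conserved, so any "recession"
# has to come from members hoarding scrip, not from scrip disappearing.
class Coop:
    def __init__(self, members, starting_scrip=20):
        self.scrip = {m: starting_scrip for m in members}

    def babysit(self, sitter, parent, hours):
        cost = hours  # one unit of scrip per hour of sitting
        if self.scrip[parent] < cost:
            raise ValueError(f"{parent} can't afford {hours} hours")
        self.scrip[parent] -= cost
        self.scrip[sitter] += cost

coop = Coop(["ann", "bob", "cam"])
coop.babysit(sitter="ann", parent="bob", hours=3)
print(coop.scrip)  # {'ann': 23, 'bob': 17, 'cam': 20}
```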

Ok, Stalwartbucks are kind of cool

This is kind of cool: Joe Weisenthal’s forked Bitcoin:

Along with my friend Guan Yang, today I’m proud to announce a brand new digital currency that everyone can mine or trade.

See, one of the cool things about this new world of digital currencies is that they’re all based — for the most part — on open-source technology. So all you have to do is copy the code of someone else, get some servers running, and voilà, anyone can start creating the currency by having their computer solve math problems.

And, to make sure there’s a market for them:

Once every month, Guan and I will go out to get Korean barbecue — our favorite food to eat together — and invite a third person along who wants to talk about economics and technology (two of our favorite subjects). Of course, to buy your way into this dinner, you’ll have to pay using Stalwartbucks. Thus, unlike Bitcoin, we’re instantly creating a floor in the value of the coins by dint of the fact that they’ll be redeemable for something of real value.

Furthermore, I plan to auction off one post a month, where I will write on the topic of your choice: Again, you’ll have to pay in Stalwartbucks.

And hopefully it’s not just me. Other Twitterers, writers, pundits, and so forth will be encouraged to help create a thriving ecosystem whereby people can redeem their Stalwartbucks.

Maybe I should do something similar for my Principles class…

Read more: http://www.businessinsider.com/introducing-stalwartbucks-2014-1