# Slowly moving to the Julia language from R

Julia is a new programming language that’s gotten a lot of attention lately (e.g., this Wired piece) and that I’d ignored until recently. But I checked it out a few days ago and, holy crap, it’s a nice language. I’m rewriting the code in my book to use Julia instead of R and I’m almost certainly going to use it instead of R in my PhD class next fall.

So, why Julia and why not R? (And, I suppose why not Python/Matlab/other languages?)

• Multiple dispatch. So you can define a function
``function TrickyAlgorithm(aThing, bThing)``

differently depending on whether `aThing` is a matrix of real numbers and `bThing` is a vector, or `aThing` is a vector and `bThing` is a matrix, or any other combination of data types. And you can do this without lots of tedious, potentially slow, and confusing (to people reading and maintaining the code) argument checks and conversions within the function.

Note that this is similar to object-oriented programming, except that in OOP `TrickyAlgorithm` would need to be a method of `aThing` or `bThing`. Also note that R has this as well (through its S4 methods).

• Homoiconicity: code is represented as data that other parts of the code can operate on. Again, R kind of has this too! “Kind of” because I’m unaware of a good explanation of how to use it productively, and R’s syntax and scoping rules make it tricky to pull off. But I’m still excited to see it in Julia, because I’ve heard good things about macros and I’d like to appreciate them (I’ve started playing around with Clojure and like it a lot too…). And because stuff like this is amazing:
``@devec r = exp(-abs(x-y))``

which devectorizes x and y (both vectors) and evaluates as

```
for i = 1:length(x)
    r[i] = exp(-abs(x[i]-y[i]))
end
```

(This example and its code are from Dahua Lin’s blog post, Fast Numeric Computation in Julia.) Note that “evaluates as” does not just mean “gives the same answer as”; it means that `@devec` replaces the code `r = exp(-abs(x-y))` with the loop, and then the loop is what runs.

• Decent speed. Julia is definitely faster than well-written R; I don’t have a great feel for how well it compares to highly optimized R (using inline C++, for example), but I write one-off simulation programs and don’t write highly optimized R. And the language encourages loops, which is a relief. R discourages loops and encourages “vectorized” operations that operate on entire objects at once (which are then converted to fast loops in C…). But I use loops all the time anyway, because avoiding loops in time series applications is often impossible: when each observation depends on the previous one, the calculation can’t simply be done all at once. R’s poor support for recursion doesn’t help either. And, more to the point, I teach econometrics to graduate students. Many of them haven’t programmed before. Most of them are not going to write parts of their analysis in C++.
• The syntax is fine and unthreatening, which will help for teaching. It basically looks like Matlab done right. Matlab isn’t a bad language because its programs look like they’re built out of Legos; it’s a bad language because of its horrendous implementation of functions, anonymous functions, objects, etc. Compared to R, Matlab and Julia look downright friendly. Compared to Clojure… I can’t even imagine asking first-year PhD students (some with no programming experience at all) to work with a Lisp.
• The point that always comes up in these language comparisons: what about all of the R packages? There are thousands and thousands of statistical packages coded up for R, and you’re giving that up by moving to a different language. This is apparently a big concern for a lot of people, but… have you looked at the source code for these packages? Most of them are terrible! But some are good, and it might take some time to port them to Julia. Not that much time, I think, because most popular high-performance R packages are a thin layer of interoperability over a fast implementation in C or C++, so porting them is mostly a matter of wrapping that implementation for Julia. And most of the well-designed packages are tools for other package developers. That’s not quite true of R’s statistical graphics, though. They’re really great and could be hard to port. And that’s more or less the only thing that I’m sure I’ll miss in Julia. (But hopefully not for too long.)
• Lastly, and this is important: R’s massive collection of packages is also a big constraint on its future development. Breaking backwards compatibility is a big deal, but avoiding it too much imposes costs.
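To make the multiple-dispatch point concrete, here’s a minimal sketch in Python (one of the “why not?” languages). Python’s standard library has `functools.singledispatch`, but it chooses an implementation from the type of the first argument only. The sketch below assumes a hand-rolled registry that dispatches on the types of both arguments; `method`, `tricky_algorithm`, and the stand-in `Matrix`/`Vector` classes are hypothetical names made up for illustration. Unlike Julia, it only matches exact types and does no “most specific method” resolution.

```python
from typing import Callable, Dict, Tuple

# Hypothetical dispatch table: (type of a_thing, type of b_thing) -> implementation
_methods: Dict[Tuple[type, type], Callable] = {}


def method(a_type: type, b_type: type):
    """Decorator that registers an implementation for one pair of argument types."""
    def register(fn: Callable) -> Callable:
        _methods[(a_type, b_type)] = fn
        return fn
    return register


def tricky_algorithm(a_thing, b_thing):
    """Choose an implementation from the runtime types of *both* arguments."""
    key = (type(a_thing), type(b_thing))
    if key not in _methods:
        raise TypeError(f"no method for ({key[0].__name__}, {key[1].__name__})")
    return _methods[key](a_thing, b_thing)


# Stand-in types for "matrix" and "vector" (illustration only)
class Matrix(list):
    pass


class Vector(list):
    pass


@method(Matrix, Vector)
def _matrix_vector(a, b):
    return "matrix-vector version"


@method(Vector, Matrix)
def _vector_matrix(a, b):
    return "vector-matrix version"


print(tricky_algorithm(Matrix([[1, 2]]), Vector([3, 4])))  # matrix-vector version
print(tricky_algorithm(Vector([3, 4]), Matrix([[1, 2]])))  # vector-matrix version
```

Supporting a new combination, say two vectors, means registering one more function; none of the existing implementations need extra argument checks.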

Anyway, since I converted some R code to Julia I thought it would be fun to compare speeds. The first example is used to show the sampling distribution of an average of uniform(0,1) random variables. In R, we have

```
rstats <- function(rerror, nobs, nsims = 500) {
    replicate(nsims, mean(rerror(nobs)))
}
```

which is (I think) pretty idiomatic R (and the draw-and-average step inside it is vectorized, so it’s supposed to be fast). Calling it gives

```
R> system.time(rstats(runif, 500))
   user  system elapsed
  0.341   0.002   0.377
```

For comparison to the Julia results, we’re going to care about the “elapsed” result of 0.377 seconds; the “system” column isn’t relevant here.  Calling it for more observations and more simulations (50,000 of each) gives

```
R> system.time(rstats(runif, 50000, 50000))
   user  system elapsed
204.184   0.217 215.526
```

so 216 seconds overall. And, just to preempt criticism, I ran these simulations a few times each and these results are representative; and I ran a byte-compiled version that got (unexpectedly) slightly worse performance.

Equivalent Julia code is

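```
using Distributions, Statistics   # Uniform(), rand(dist, n), and mean()

# Simulate nsims sample means, each computed from nobs draws from dist
function rmeans(dist, nobs; nsims = 500)
    means = Vector{Float64}(undef, nsims)   # preallocate the result
    for i in 1:nsims
        means[i] = mean(rand(dist, nobs))
    end
    return means
end
```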

which is pretty easy to read, but I have no idea whether it’s idiomatic; this is my first code in Julia. If you’d rather minimize lines of code and skip preallocating arrays, Julia has array comprehensions, and you can write this stylish one-line definition (which gave similar times):

```
rmeans_pretty(dist, nobs; nsims = 500) =
    [mean(rand(dist, nobs)) for i = 1:nsims]
```

We can time it (after loading the Distributions package):

```
julia> @elapsed rmeans(Uniform(), 500)
0.093662961
```

so 0.09 seconds, or about a quarter of R’s time. But (I forgot to mention this earlier) Julia uses a just-in-time compiler, so the 0.09 seconds includes both compilation and execution. Running it a second time gives

```
julia> @elapsed rmeans(Uniform(), 500)
0.004334132
```

which is about a twentieth of the first run’s time, and more than 80 times faster than R.

Running the larger simulation, we have

```
julia> @elapsed rmeans(Uniform(), 50000, nsims = 50000)
77.318591953
```

so the R code is a little less than three times slower here. (The compilation step doesn’t make a meaningful difference.) So, Julia isn’t hundreds of times faster, but it is noticeably faster than R, which is nice.
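Since the post asks “why not Python?”, here is the same simulation as a plain-Python sketch, using only the standard-library `random` module. It’s a check on what the simulation is supposed to show, not an entry in the speed comparison. This `rmeans` borrows the Julia name but, as an assumption for simplicity, hard-codes Uniform(0, 1) draws. The simulated means should center on 1/2, with a standard deviation close to the theoretical sqrt(1/(12 × nobs)).

```python
import math
import random
import statistics


def rmeans(nobs, nsims=500, seed=None):
    """Return nsims sample means, each from nobs independent Uniform(0, 1) draws."""
    rng = random.Random(seed)  # separate generator so results are reproducible
    return [sum(rng.random() for _ in range(nobs)) / nobs for _ in range(nsims)]


nobs = 500
means = rmeans(nobs, nsims=500, seed=1)

# Theory: a Uniform(0, 1) draw has mean 1/2 and variance 1/12, so the average
# of nobs draws has mean 1/2 and standard deviation sqrt(1 / (12 * nobs)).
theory_sd = math.sqrt(1 / (12 * nobs))  # about 0.0129 when nobs = 500
print(statistics.mean(means), statistics.stdev(means), theory_sd)
```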

But speed in this sort of test isn’t the main factor. I’m really excited about multiple dispatch — it’s one of the few things in R that I really, really liked from a language standpoint. I really like what I’ve read about Julia’s support for parallelism (but need to learn more). And I like metaprogramming, even if I can’t do it myself yet. So Julia’s trying to be a fast, easy to learn, and elegantly designed language. That’s awesome. I want it to work.

ps: and it’s open source! Can’t forget that.

FYI, I’ve started putting economic graphs directly on Twitter instead of this blog: http://twitter.com/grayclhn. The captions turn out to not be that helpful…

# Cajetan, an interactive macroeconomics simulation and game

I might start coding up a proof-of-concept interactive macroeconomics simulation/game. If I do, I’ll post details here. (This post is mostly a placeholder to generate a URL.)

The game should be a pretty simple implementation of the ubiquitous baby-sitting economic model (published by Sweeney and Sweeney, 1977, [pdf] and publicized by Paul Krugman). The internal logic is probably 10 lines of SQL, so it’s just a matter of taking an hour or two to set everything up.

# Ok, Stalwartbucks are kind of cool

This is kind of cool: Joe Weisenthal’s forked Bitcoin:

Along with my friend Guan Yang, today I’m proud to announce a brand new digital currency that everyone can mine or trade.

See, one of the cool things about this new world of digital currencies is that they’re all based — for the most part — on open-source technology. So all you have to do is copy the code of someone else, get some servers running, and voilà, anyone can start creating the currency by having their computer solve math problems.

And, to make sure there’s a market for them:

Once every month, Guan and I will go out to get Korean barbecue — our favorite food to eat together — and invite a third person along who wants to talk about economics and technology (two of our favorite subjects). Of course, to buy your way into this dinner, you’ll have to pay using Stalwartbucks. Thus, unlike Bitcoin, we’re instantly creating a floor in the value of the coins by dint of the fact that they’ll be redeemable for something of real value.

Furthermore, I plan to auction off one post a month, where I will write on the topic of your choice: Again, you’ll have to pay in Stalwartbucks.

And hopefully it’s not just me. Other Twitterers, writers, pundits, and so forth will be encouraged to help create a thriving ecosystem whereby people can redeem their Stalwartbucks.

Maybe I should do something similar for my Principles class…

# US Inflation through November, 2013

US Consumer Price Index through Nov, 2013

Yesterday’s GDP post made me realize that I haven’t posted about CPI for a while. So…

• This figure plots US CPI over time for the indicated horizon; e.g., the bottom line plots CPI from Jan. 1947 through Jan. 1953. Each line contains one recession, and the lines are aligned horizontally so that period “0” is the month the recession ended. The periods overlap, so the second line from the bottom shares observations from 1949m2 to 1953m1 with the line below it.

• You can enlarge the graph by clicking on it.

• The black lines plot core CPI, which excludes food and energy prices; the gray lines plot raw CPI for reference. Core CPI is considered the more relevant series for policy analysis, because food and energy prices are volatile and can be driven by external factors. The recessions themselves are highlighted in light red.

• The picture uses a log scale along the vertical axis. This means that periods of constant inflation look like straight lines (i.e. an inflation rate of 3% per year looks like a straight line with a slope of 0.03). If the slope is steep, inflation was high for that period, and if the line curves upward (as in the 1961–1979 segment), inflation is rising.

Bear in mind that the scale was chosen so that most lines would be about 45 degrees. This makes it easy to see differences in slope but difficult to read the exact slope of any particular segment.

• The time periods are chosen to contain the expansions before and after each recession. Since inflation has been largely positive over this period, the higher segments are later than those below them.

• A few things to notice: the price level sometimes rises and sometimes falls during a recession; there’s no dominant tendency. Theory suggests that recessions accompanied by a falling price level are driven by demand shocks (e.g., changes in investment or consumer spending) and those accompanied by a rising price level are driven by supply shocks (e.g., changes to costs or to factors of production). For example, look at how steep the 1973–1975 recession is; it was associated with an oil crisis.

You can see the intuition by drawing simple supply and demand curves and looking at how the equilibrium, the point where the curves intersect, moves as either curve shifts to the left.

• Also notice the big change in the slope beginning in the July 1981 recession. This was a recession that the Federal Reserve started as a matter of policy (to try to lower inflation). It’s probably fair to conclude that that event marks a significant shift in the Fed’s policies.

There’s also some evidence of a change in inflation’s behavior in the 50s: the earliest post-war segments are pretty volatile, but from the mid 50s through the early 80s, inflation grew steadily. Of course, this is why the Fed felt that it had to drive inflation down in the 80s.

• Core inflation slowed down after the most recent recession, but then increased to about the same rate as we’ve had over the last 20 years. Inflation in food and energy prices has been more volatile over that same period.

• The underlying data are available from the St. Louis Fed and the R code used to make the plot is on GitHub.
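The log-scale rule of thumb above (3% inflation per year shows up as a straight line with slope 0.03) can be checked with a short Python sketch. The price indexes below are made up for illustration; they are not the CPI data. With continuous compounding the slope of log prices is exactly 0.03 per year; with 3% compounded annually it’s log(1.03) ≈ 0.0296, which is why “slope of 0.03” is an approximation. The last series has inflation rising each year, so its log line curves upward.

```python
import math


def log_slopes(series):
    """Period-to-period slopes of log(series): log(x[t]) - log(x[t-1])."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


years = range(11)
# Hypothetical price indexes (base 100), not real CPI data
annual_3pct = [100 * 1.03 ** t for t in years]               # 3% per year, compounded annually
continuous_3pct = [100 * math.exp(0.03 * t) for t in years]  # 3% per year, continuously compounded

# Inflation that rises by one percentage point each year (1%, 2%, ..., 10%)
rising = [100.0]
for t in range(1, 11):
    rising.append(rising[-1] * (1 + 0.01 * t))

print([round(s, 4) for s in log_slopes(annual_3pct)])      # every slope is about 0.0296
print([round(s, 4) for s in log_slopes(continuous_3pct)])  # every slope is 0.03
print([round(s, 4) for s in log_slopes(rising)])           # slopes increase year after year
```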

# Plot of US real GDP from Dec. 20, 2013

Real output and income for the US, through the 3rd quarter of 2013 (third release).

Sorry for the delay; the data for these graphs was released on Friday, December 20th but I’m just getting to this now (the 30th). Holidays, end of semester, etc. The usual excuses. But, after trying to do some after-the-fact research for this post, it seems like lots of people ignored the release. So, as a public service, a link to the BEA’s news release. TL;DR Real GDP growth for the third quarter was revised up to an annualized 4.1%, which is very high.
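For readers wondering what “annualized” means here: as I understand the BEA’s convention, a quarter’s growth is reported at a compound annual rate, i.e. the quarterly growth factor raised to the fourth power, minus one. A small Python sketch (the function names are mine, for illustration) shows that an annualized 4.1% corresponds to quarter-over-quarter growth of only about 1.0%.

```python
def annualize(quarterly_growth):
    """Convert quarter-over-quarter growth (0.01 means 1%) to a compound annual rate."""
    return (1 + quarterly_growth) ** 4 - 1


def deannualize(annual_rate):
    """Invert annualize(): the quarterly growth that compounds to annual_rate in a year."""
    return (1 + annual_rate) ** 0.25 - 1


q = deannualize(0.041)
print(round(q, 4))                # 0.0101, i.e. about 1.0% growth in the quarter
print(round(annualize(0.01), 4))  # 0.0406: 1% a quarter is roughly 4.1% a year
```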

### First, a generic overview of the series

• The figure above shows real output in the US over the post-war business cycles (i.e., after WWII, for those unfamiliar with the jargon). Each period was chosen to contain the expansions immediately before and after each recession, and the segments were aligned horizontally so that you can compare patterns.
• You can enlarge the graph by clicking on it.
• The black lines plot US real GDP and the gray lines plot real GDI (Gross Domestic Income) over time for the indicated horizon; e.g., the bottom lines show 1947 q1 through 1953 q2. Each period contains one recession, and the periods are aligned horizontally so that period “0” is the quarter the recession ended. The periods overlap, so the second line from the bottom shares observations from 1950 q1 to 1953 q2 with the line below it. Basic economic accounting implies that the two series, GDP and GDI, should be nearly equal, as every dollar spent in the economy is a dollar that someone else earns in income. The two series can differ in practice because they are estimated using different surveys and data sets.
• The picture uses a log scale along the vertical axis. This means that periods of constant growth rate look like straight lines (i.e. a growth rate of 3% per year looks like a straight line with a slope of 0.03). Bear in mind that the scale was chosen so that most lines would be about 45 degrees, so it’s easy to see differences in slope but probably difficult to read off the slope of any segment itself.
• A few things jump out. There hasn’t been a “V-shaped” recession in the US in about 30 years. The dominant pattern lately has been for growth to resume at about the rate it was before the recession. Compare that to any of the earlier recessions, where there’s a period of faster growth once the recession ends.
• Also, and even more important over the long run (but less viscerally noticeable at the moment), GDP growth has fallen over the last decade. You can see that, even ignoring the so-called “Great Recession,” the slope of GDP in the top period is flatter than in the previous one. This accumulates, so unless that trend changes we’ll be much worse off in 10 and 20 years than we would be otherwise.

### Now, some specific observations about the recent data

• Look at income (the gray line). The 4.1% increase in real GDP follows a pronounced dip, but real income has grown at a steady rate the whole time. I don’t see any reason to think that this high growth rate for the third quarter marks anything new; we’re probably going to see about the same economic recovery that we’ve had since the recession ended, especially since the unemployment rate is showing the same thing.
• So, should the Fed stop Quantitative Easing? Gavyn Davies at the FT has a nice blog post, “the long farewell to quantitative easing,” which has the line (paragraph)

The Fed has now declared victory, believing that there has finally been a substantial and self-sustaining improvement in the US labour market. It is hard to see how this could have occurred without QE, but the benefits of the policy are now fading. A significant number of FOMC members, led by Jeremy Stein, started to believe that the reach for yield in US credit may have been allowed to proceed too far.

• The Fed obviously looks at a lot more data than I do, but it’s hard to see that improvement in output or the unemployment rate. One presumes that there are other back-up plans in place in case the improvement isn’t as self-sustaining as hoped. Maybe those plans are better than continuing QE, but I don’t really know. It does seem hard to make the case that more QE is going to lead to a self-sustaining recovery if there’s not one by now. (No, I have no idea how to answer my own rhetorical question.)

As usual, the data are available through the St. Louis Fed (GDP, GDI) and the R code used for the plots is on GitHub.

# Literate-style argument checking in R

I’ve played around with Literate Programming since early in grad school. Literate Programming was developed by Don Knuth (who also developed TeX and is a hugely influential computer scientist) and is grounded in the idea of embedding a computer program inside its documentation, rather than the other way around. A lot of people mistakenly believe that the point is to have nicely formatted documentation (there are pretty-printers, etc.), but the big advantage is that LP tools let you write your program in whatever order makes the most sense logically, and the tools reassemble it into the order the computer needs. Norman Ramsey’s noweb program is an example of this.

Anyway, Literate Programming hasn’t caught on and I don’t do it anymore. (Tools for Reproducible Research, like Sweave and Knitr, generally don’t allow you to write the code in arbitrary order, so they’re close but don’t count.) For one thing, the tools are too specific to a particular workflow, which makes collaborating difficult. But another reason is that a lot of the benefits are available without using dedicated Literate Programming tools. R packages, for example, let you organize code logically and will get the order “correct” when it’s time to call them.

But one thing I miss is the construction (using R and Noweb’s syntax):

```
myfunction <- function(argument) {
    <<extensive error checking of arguments>>
    # Code that does the analysis goes here
    # ...
}

<<extensive error checking of arguments>>=
# Make sure that the arguments make sense, and reformat
# them if necessary.
# For example:
argument <- as.data.frame(argument)
@
```

and then call

``myfunction(xargument)``

where (if you’re not familiar with LP syntax), the code between `<<extensive error checking of arguments>>=` and `@` will be written into the appropriate part of `myfunction` before the code is executed. In my experience, separating this code visually makes it easier to understand the logic in `myfunction` and encourages me to write more error checking, since it won’t pollute the main function. (I’ve read this in a Knuth interview too, but I can’t find the source right now.)
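For readers who haven’t used noweb, the “tangling” step that reassembles the chunks can be sketched in a few lines of Python. This is a simplified, hypothetical `tangle` function, not noweb’s actual implementation. It assumes chunk definitions start with `<<name>>=` on their own line, a line containing only `@` returns to documentation, and references appear alone on a line; it ignores noweb features such as escapes and `@ %def` lines.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # start of a code chunk definition
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # a chunk reference, alone on its line


def tangle(source, root):
    """Expand noweb-style chunk references, starting from the chunk named root."""
    chunks = {}
    current = None  # name of the chunk being read, or None inside documentation
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])  # chunks with the same name are concatenated
        elif line.strip() == "@":
            current = None  # back to documentation
        elif current is not None:
            chunks[current].append(line)

    def expand(name, indent=""):
        out = []
        for line in chunks[name]:
            ref = CHUNK_REF.match(line)
            if ref:
                # Splice the referenced chunk in, keeping the reference's indentation
                out.extend(expand(ref.group(2), indent + ref.group(1)))
            else:
                out.append(indent + line)
        return out

    return "\n".join(expand(root))


doc = """\
Documentation for myfunction, in whatever order reads best.

<<myfunction>>=
myfunction <- function(argument) {
    <<extensive error checking of arguments>>
    # Code that does the analysis goes here
}
@
The error checking is explained separately, later in the document.

<<extensive error checking of arguments>>=
argument <- as.data.frame(argument)
@
"""

print(tangle(doc, "myfunction"))
```

The printed result has the `as.data.frame` line spliced into the body of `myfunction`, at the indentation of the reference.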

But R is flexible, and we can mimic that structure by abusing environments. So I’ve written some code to do that. Using that code, we can write

```
myfunction <- function(argument) {
    ExtensiveChecking()
    # Code that does the analysis goes here
    # ...
}

ExtensiveChecking <- raincheck({
    # Make sure that the arguments make sense, and reformat
    # them if necessary.
    # For example:
    argument <- as.data.frame(argument)
})
```

and call

```
myfunction(xargument)
```

where the line `argument <- as.data.frame(argument)` executes inside `myfunction`. `raincheck` is (obviously) a cute name for a function that constructs these sorts of functions.

The code for `raincheck` is surprisingly simple:

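```
raincheck <- function(expr) {
    # Capture the code block without evaluating it
    e <- substitute(expr)
    # Return a function that evaluates the block in its caller's environment
    function(env = parent.frame()) {
        eval(e, env)
        invisible(TRUE)
    }
}
```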

`raincheck` returns a function that is intended to be called inside another function. `expr` is a block of unevaluated R code that will be executed inside that other function. (The line `env = parent.frame()` means that the R code will be executed in the calling environment by default, but that can be overridden by supplying another environment as an argument.)

I’ve made the `raincheck` function into a minimal R package that’s available on GitHub, cleverly titled Raincheck. The package also has a `scold` function that can be used to issue warnings and errors that appear to come from the top function, e.g. if we had written

```
ExtensiveChecking <- raincheck({
    # Make sure that the arguments make sense, and reformat
    # them if necessary.
    # For example:
    argument <- as.data.frame(argument)
    scold("error message")
})
myfunction(xargument)
```

we would get

```
Warning message:
In myfunction(xargument) : error message
```

instead of

```
Warning message:
In eval(expr, envir, enclos) : error message
```

which is what would appear if we had used `warning("error message")`. This makes the messages more informative to end users — `eval(expr, envir, enclos)` could be anywhere. The `scold` function uses information from Hadley’s (2013) book, Advanced R Programming, especially the chapter on exceptions and debugging.
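Python has a built-in analog of what `scold` does: `warnings.warn` takes a `stacklevel` argument that controls which caller a warning is attributed to. Here’s a minimal sketch; `scold`, `extensive_checking`, and `myfunction` are hypothetical stand-ins for the R functions. Python doesn’t make it easy to run a block of code inside the caller’s local scope the way `raincheck` does, so the check lives in an ordinary helper function instead.

```python
import warnings


def scold(message, frames_up=2):
    """Issue a UserWarning attributed to a frame above scold's caller.

    stacklevel=1 would point at the warnings.warn line below and stacklevel=2
    at scold's caller; frames_up climbs further. The default (2) skips the
    checking helper and the function it checks for, so the warning points at
    the line where the user called that function.
    """
    warnings.warn(message, UserWarning, stacklevel=2 + frames_up)


def extensive_checking(argument):
    """Hypothetical checking helper playing the role of ExtensiveChecking()."""
    if not isinstance(argument, list):
        scold("argument is not a list; converting it")
        argument = list(argument)
    return argument


def myfunction(argument):
    argument = extensive_checking(argument)
    # Code that does the analysis goes here
    return len(argument)
```

Calling `myfunction((1, 2, 3))` from a script reports the warning at that call’s line rather than at the `warnings.warn` line inside `scold`, which is the same idea as seeing `In myfunction(xargument)` instead of `In eval(expr, envir, enclos)`.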

Obviously this was more of an educational exercise than anything else, but I will start using the package and see if it’s useful. Let me know if you have any suggestions.

# US unemployment rate through Nov, 2013

Plot of the US unemployment rate over the most recent recession (red) and recovery (black). The blue lines indicate past recoveries for comparison.

• This figure plots the US unemployment rate over the last recession and the subsequent recovery. The recession is highlighted in light red, and its beginning and end are marked with small circles. The light blue lines plot other recessions for visual reference.
• You can enlarge the graph by clicking on it.
• A few things to notice: unemployment rose a lot during the last recession; it has been falling at about the same rate as in the previous two recessions, but is still very high because its peak was so much higher. Before the last few recessions, there was a substantial recovery period where unemployment fell rapidly before slowing down, but that hasn’t happened since the 1981 recession.
• There have been some short upticks, but otherwise the unemployment rate has fallen steadily since its peak.
• The data are available from the St. Louis Fed and the R code to generate the graph is on GitHub.
• Please comment below with any other insights or questions you [dear reader] might have.

# New plot of US real GDP over the business cycle

These graphs are generated and posted automatically and the captions are written in advance. I will post more specific analysis later.

Graphs of real GDP and GDI

• The figures above show real output in the US over the post-war business cycles (i.e., after WWII, for those unfamiliar with the jargon). Each period was chosen to contain the expansions immediately before and after each recession, and the segments were aligned horizontally so that you can compare patterns. There are additional graphs that emphasize one or the other series below.
• You can enlarge the graphs by clicking on them.
• The black lines plot US real GDP and GDI (Gross Domestic Income) over time for the indicated horizon; e.g., the bottom lines show 1947 q1 through 1953 q2. Each period contains one recession, and the periods are aligned horizontally so that period “0” is the quarter the recession ended. The periods overlap, so the second line from the bottom shares observations from 1950 q1 to 1953 q2 with the line below it. Basic economic accounting implies that the two series, GDP and GDI, should be nearly equal, as every dollar spent in the economy is a dollar that someone else earns in income. The two series can differ in practice because they are estimated using different surveys and data sets.
• The picture uses a log scale along the vertical axis. This means that periods of constant growth rate look like straight lines (i.e. a growth rate of 3% per year looks like a straight line with a slope of 0.03). Bear in mind that the scale was chosen so that most lines would be about 45 degrees, so it’s easy to see differences in slope but probably difficult to read off the slope of any segment itself.
• A few things jump out. There hasn’t been a “V-shaped” recession in the US in about 30 years. The dominant pattern lately has been for growth to resume at about the rate it was before the recession. Compare that to any of the earlier recessions, where there’s a period of faster growth once the recession ends.
• Also, and even more important over the long run (but less viscerally noticeable at the moment), GDP growth has fallen over the last decade. You can see that, even ignoring the so-called “Great Recession,” the slope of GDP in the top period is flatter than in the previous one. This accumulates, so unless that trend changes we’ll be much worse off in 10 and 20 years than we would be otherwise.
• The data are available through the St. Louis Fed (GDP, GDI) and the R code used for the plots is on GitHub.

# Lecture by Christina Romer, “Monetary Policy in the Post-Crisis World: Lessons Learned and Strategies for the Future”

The text from a very clear and interesting lecture by Christina Romer. Spoiler: here are the lessons (verbatim from the paper, obviously):

1. Financial crises can be very painful.
2. The zero lower bound on interest rates is a bigger constraint than we thought.
3. Expectations management is essential, but difficult.
4. Monetary policy can and should help ease the pain of deficit reduction.

Next, go to her webpage and read things at random.  There’s a lot of interesting stuff at different levels of mathematical sophistication.

# Interesting report by the CEA: “Economic Activity During the Government Shutdown and Debt Limit Brinksmanship”

The short report is here (14 page pdf).  A summary by Jim Stock starts,

The government shutdown and debt limit brinksmanship have had a substantial negative impact on the economy. A new report released today by the Council of Economic Advisers (CEA) attempts to estimate the actual impact of the shutdown and default brinksmanship on economic activity as measured by eight different daily or weekly economic indicators. Overall it finds that a range of eight economic indicators in what we call a “Weekly Economic Index” are consistent with a 0.25 percentage point reduction in the annualized GDP growth rate in the fourth quarter and a reduction of about 120,000 private sector jobs in the first two weeks of October (estimates use indicators available through October 12th). These estimates very likely understate the full economic effects of the episode because of its effects that continued, and will continue, past October 12th.

# jobs report and unemployment graphs

Unemployment rate across the most recent recession and expansion. The recession is in red, and the light blue lines plot the unemployment rate over past episodes.

• A full set of graphs of the unemployment rate is available in this post. That post was uploaded automatically after the jobs report was released this morning (pdf of the report here).
• The first two sentences of the report are, “Total nonfarm payroll employment rose by 148,000 in  September, and the unemployment rate was little changed at 7.2 percent, the U.S. Bureau of Labor Statistics reported today.  Employment increased in construction, wholesale trade, and  transportation and warehousing.”  That’s probably a decent summary overall.
• If you look at the graph above, which emphasizes 2001 through September, you’ll see that we’re on essentially the same trajectory we’ve been on since the recession ended (nationally, at least; I haven’t looked at state-level data). Bear in mind that when the employment numbers are reported as “disappointing” or (in the past, at least) “surprisingly good,” that’s relative to forecasts, not an absolute statement (i.e., “disappointing” means “fewer jobs than expected,” not necessarily “very few jobs”).
• In absolute terms, these numbers are bad, but bad in more or less the same way that the recovery’s been bad all along.
• Looking around the FRED database at other measures of joblessness hasn’t really changed my view on that; they all look like slow recoveries (e.g., U6, the median duration of unemployment, the percent unemployed for 27 weeks or more).

# Updated plot of US unemployment rate over the business cycle

Update: Just realized that the plots are slightly mislabeled. These graphs contain today’s release, so they’re through “2013 m9.”  The labels have been corrected (~11:30 central).

• These graphs and captions are generated automatically. I’ll post more specific analysis later (update: it’s later).
• These figures plot the US unemployment rate over time for the indicated horizon; each plot contains one recession, and they’re aligned horizontally so that period “0” is the month the recession ended. The periods overlap.

The recessions themselves are highlighted in light red and their beginning and end are marked with a small circle. The light blue lines plot the other segments, for visual reference.

Click on the graphs to enlarge them.

• The time periods are chosen to contain the expansions before and after each recession. The pictures are organized in chronological order.
• A few things to notice: unemployment rose a lot during the last recession; it has been falling at about the same rate as in the previous two recessions, but is still very high because its peak was so much higher. Before the last few recessions, there was a substantial recovery period where unemployment fell rapidly before slowing down, but that hasn’t happened since the 1981 recession.
• Data are available from the St. Louis Fed and code is on GitHub.

# Playing with more GDP graphs

Just playing with more graphs of GDP. No new data have been released since the last graph, but I added income data for fun. Details below:

• The figures above show real output in the US over the post-war business cycles (i.e., after WWII, for those unfamiliar with the jargon). Each period was chosen to contain the expansions immediately before and after each recession, and the segments were aligned horizontally so that you can compare patterns.

• The black lines plot US real GDP and GDI (Gross Domestic Income) over time for the indicated horizon; e.g., the bottom lines show 1947 q1 through 1953 q2. Each period contains one recession, and the periods are aligned horizontally so that period “0” is the quarter the recession ended. The periods overlap, so the second line from the bottom shares observations from 1950 q1 to 1953 q2 with the line below it.

Basic economic accounting implies that the two series, GDP and GDI, should be nearly equal, as every dollar spent in the economy is a dollar that someone else earns in income. The two series can be different in reality because they are estimated using different surveys and data sets.

• The picture uses a log scale along the vertical axis. This means that periods of constant growth rate look like straight lines (i.e. a growth rate of 3% per year looks like a straight line with a slope of 0.03). Bear in mind that the scale was chosen so that most lines would be about 45 degrees, so it’s easy to see differences in slope but probably difficult to read off the slope of any segment itself.

• A few things jump out. There hasn’t been a “V-shaped” recession in the US in about 30 years. The dominant pattern lately has been for growth to resume at about the rate it was before the recession. Compare that to any of the earlier recessions, where there’s a period of faster growth once the recession ends.

• Also, and even more important over the long run (but less viscerally noticeable at the moment), GDP growth has fallen over the last decade. You can see that, even ignoring the so-called “Great Recession,” the slope of GDP in the top period is flatter than in the previous one. This accumulates, so unless that trend changes, we’ll be much worse off in 10 and 20 years than we would be otherwise.

• The data are available through the St. Louis Fed (GDP, GDI) and the R code used for the plots is on GitHub.
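To make the log-scale point concrete, here is a minimal sketch (Python, standard library only; the 100-unit starting level and the 3% rate are made-up illustrative numbers, not GDP data, and `log_slopes` is a hypothetical helper). A series growing at a constant rate has constant first differences in logs, and that common difference is the slope you see on a log-scale plot.

```python
import math

def log_slopes(series):
    """Period-to-period differences of log(series): the slope on a log-scale plot."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

# Hypothetical series: 100 units growing at exactly 3% per year for 10 years.
level = [100 * 1.03 ** t for t in range(11)]
slopes = log_slopes(level)

# Every slope equals log(1.03), about 0.0296, which is why "about 0.03" is the right reading.
print(round(slopes[0], 4))  # 0.0296
print(max(slopes) - min(slopes) < 1e-12)  # True: the line is straight
```

For small rates, log(1 + g) is close to g, which is why the slope and the growth rate can be read interchangeably.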

# More on the Econ Nobel Prize

Some links related to yesterday’s Nobel Prize (in addition to yesterday’s post).

• I’ve been unable to find a good description or review of Hansen’s GMM. Part of the problem is that it’s material we typically teach to graduate students but not undergrads, so the overviews are mathematically technical. The reviews that aren’t too technical have focused on the statistical aspects of GMM, not the economic aspects, and so they miss the point a bit. Just reading Hansen and Singleton’s (1982) application of GMM to rational expectations models should at least give you an idea of how it’s applied and why it’s interesting (JSTOR link, in case the first doesn’t work).

• Neil Irwin has an interview with Shiller on Wonkblog.  A quote:

NI: What do you see as the biggest implications of your conclusions on these market inefficiencies for policy and how policymakers should think?

RS: I have a very idiosyncratic recommendation. I talk about it in my book “Subprime Solution.”

People should be encouraged to get professional help with their investing. We should be subsidizing financial advisers. In this country we seem to have come around to the idea that there might be a role for the government in subsidizing medical advice, though that is controversial, too. There might also be a role for subsidizing financial advice.

It’s already tax deductible, but that only helps people with significant incomes. The system is not arranged so that low-income people have any subsidy for financial advice. That should change. I’d like to see more low-income people getting good financial advice.

• Mark Thoma has lots of links, if you want to just read today.
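Since accessible write-ups of Hansen’s GMM are scarce, here is its core in two equations. This is my own compressed textbook summary with assumed notation, not something taken from the Nobel materials or from Hansen’s paper. The model implies moment conditions $E[g(x_t, \theta_0)] = 0$ for some function $g$ of the data $x_t$ and parameters $\theta$, and GMM picks the $\theta$ that makes their sample average as close to zero as possible under a weighting matrix $W$:

$$\hat\theta = \arg\min_{\theta}\ \bar g_n(\theta)'\, W\, \bar g_n(\theta), \qquad \bar g_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} g(x_t, \theta).$$

In Hansen and Singleton’s application, the moment conditions come from a consumption Euler equation. With power utility (discount factor $\beta$, relative risk aversion $\gamma$), consumption $c_t$, an asset’s gross return $R_{t+1}$, and instruments $z_t$ known at time $t$, the condition reads:

$$E\left[\left(\beta \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\right) z_t\right] = 0.$$

A big part of the economic appeal is that you can estimate and test the household’s first-order condition directly, without solving for the full equilibrium or specifying the distribution of every shock.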

# Econ Nobel Prize

You probably already know that the winners of the Nobel Prize in economics were announced this morning: Eugene Fama (who, I just learned 30 seconds ago, went to Tufts for undergrad), Lars Hansen (not a Tufts alum), and Robert Shiller (also not from Tufts). I haven’t met any of them personally, but Shiller gave a seminar at UCSD in my first year or two of grad school (so, this would have been in 2004 or 2005 or so), in which he explained how and why there was a housing bubble, and some of the audience expressed skepticism. Fama and Shiller are probably the household names, but Hansen completely deserves this for his work on GMM, and would have been a reasonable co-winner of Sims and Sargent’s earlier Nobel Prize.

A summary of their research as a whole is available at the Nobel’s website (it’s a pdf) and is worth reading (but I’ve only started it so far).  And, thanks to John Cochrane’s blog for letting me know about these summaries.

update: Apparently there’s a more accessible summary of their research as well (also pdf).

# Another new plot: US unemployment rate across the business cycle

• These figures plot the US unemployment rate over time for the indicated horizon; each plot contains one recession, and they’re aligned horizontally so that period “0” is the month the recession ended. The periods overlap.

The recessions themselves are highlighted in light red and their beginning and end are marked with a small circle. The light blue lines plot the other segments, for visual reference.

• The time periods are chosen to contain the expansions before and after each recession. The pictures are organized in chronological order.

• A few things to notice: unemployment rose a lot during the last recession; it has been falling at about the same rate as in the previous two recessions, but is still very high because its peak was so much higher. Before the last few recessions, there was a substantial recovery period where unemployment fell rapidly before slowing down, but that hasn’t happened since the 1981 recession.

# New plot, US CPI over the business cycle

US price levels over the business cycle. Expansions are the black lines and recessions the blue lines. Data are available from the St. Louis Federal Reserve at http://research.stlouisfed.org/fred2/series/CPIAUCSL

• This figure plots US CPI over time for the indicated horizon: e.g., the bottom line plots CPI from Jan. 1947 through June 1953. Each line contains one recession, and they’re aligned horizontally so that period “0” is the month the recession ended. The periods overlap, so the second line from the bottom shares observations from Nov. 1949 through June 1953 with the line below it.

The recessions themselves are highlighted in light blue and their beginning and end are marked with small circles.

• The picture uses a log scale along the vertical axis. This means that periods of constant inflation look like straight lines (e.g., an inflation rate of 3% per year looks like a straight line with a slope of about 0.03). If the slope is steep, inflation was high for that period, and if the slope is curving upwards (as in the 1961–1979 segment), inflation is rising.

Bear in mind that the scale was chosen so that most lines would be at about 45 degrees. This makes it easy to see differences in slope but difficult to read off the exact slope of any particular segment.

• The time periods are chosen to contain the expansions before and after each recession. Since inflation has been largely positive over this period, the higher segments are later than those below them.

• A few things to notice: the price level sometimes rises and sometimes falls during a recession; there’s no dominant tendency. Theory suggests that recessions accompanied by a falling price level are driven by demand shocks (e.g., changes in investment or consumer spending) and those accompanied by a rising price level are driven by supply shocks (e.g., changes to costs or factors of production). For example, look at how steeply prices rose during the 1973–1975 recession, which was associated with an oil crisis.

You can see the intuition by drawing a simple supply and demand curve and looking at the behavior of the equilibrium—the point where the curves intersect—as either curve falls.

• Also notice the big change in the slope beginning in the July 1981 recession. This was a recession that the Federal Reserve started as a matter of policy (to try to lower inflation). It’s probably fair to conclude that that event marks a significant shift in the Fed’s policies.

There’s also some evidence of a change in inflation’s behavior in the 50s: the earliest post-war segments are pretty volatile, but from the mid 50s through the early 80s, inflation grew steadily. Of course, this steady growth is why the Fed felt that it had to drive inflation down in the 80s.

• The underlying data are available from the St. Louis Fed and the R code used to make the plot is on GitHub.
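Here’s the supply-and-demand drawing exercise from the bullet above, written out as a minimal sketch. It assumes linear demand Q = a − bP and linear supply Q = c + dP; the function name `equilibrium` and all parameter values are hypothetical and purely illustrative. Shifting demand down lowers both price and quantity, while shifting supply down raises price and lowers quantity.

```python
def equilibrium(a, b, c, d):
    """Intersection of demand Q = a - b*P and supply Q = c + d*P (with b, d > 0)."""
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity

base = equilibrium(a=10.0, b=1.0, c=2.0, d=1.0)          # P = 4, Q = 6
demand_shock = equilibrium(a=8.0, b=1.0, c=2.0, d=1.0)   # demand falls: P = 3, Q = 5
supply_shock = equilibrium(a=10.0, b=1.0, c=0.0, d=1.0)  # supply falls: P = 5, Q = 5

print(base, demand_shock, supply_shock)  # (4.0, 6.0) (3.0, 5.0) (5.0, 5.0)
```

Output falls in both cases; the direction of the price change is what distinguishes the two kinds of shock, which is the logic behind reading the CPI segments this way.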

# James Surowiecki: Government Shuts Down; Now the Dangerous Debt Ceiling Fight

The U.S. markets had been closed for several hours when Congress, at midnight, let the government shut down, but, even so, they already reflected how things were going in Washington. Stocks were down, continuing a slow-motion slide that’s seen the S. & P. 500 drop on eight of the past nine days. It’s hardly been a momentous decline so far—the S. & P. has fallen about two and a half per cent from its all-time high, and is still up for the month—but it seems clear that markets are getting a little queasy about the shutdown.

Even if the shutdown is resolved, though, investors have a bigger concern on their minds: namely, the possibility that Republicans might actually refuse to raise the nation’s debt ceiling in a couple of weeks. The ceiling is the legal limit on the amount of money that the government is allowed to borrow, and raising it is necessary not just to keep the government running in the future but to allow it to pay for obligations it’s already incurred. As Justin Wolfers and Betsey Stevenson convincingly showed last year, the 2011 imbroglio over the debt ceiling put a significant dent in both business and consumer confidence, held back hiring, and further weakened the recovery. It also sent the stock market tumbling—even though a debt-ceiling deal was eventually reached, the Dow fell almost fourteen per cent in less than a month during the crisis, in part because it made people realize that a U.S. default was no longer unthinkable. (It also led to the first downgrade of the U.S.’s credit rating in history.) So it’s hardly surprising that the standoff in Washington is spooking—if not yet terrifying—investors. Markets dislike uncertainty, and what the Republican hard-liners in the House of Representatives have done, most significantly, is to make the future look uncertain by suggesting that, if they do not get the concessions they want (above all, the repeal of Obamacare) they are willing to let the U.S. default…

The rest of Surowiecki’s post is available here.

# Ng and Wright: Facts and Challenges from the Great Recession for Forecasting and Macroeconomic Modeling

The abstract:

This paper provides a survey of business cycle facts, updated to take account of recent data. Emphasis is given to the Great Recession which was unlike most other post-war recessions in the US in being driven by deleveraging and financial market factors. We document how recessions with financial market origins are different from those driven by supply or monetary policy shocks. This helps explain why economic models and predictors that work well at some times do poorly at other times. We discuss challenges for forecasters and empirical researchers in light of the updated business cycle facts.

The paper is available through IDEAS or Serena’s web page.  It’s interesting, if very table heavy.

# Plot of real GDP over the business cycle

• The black lines plot US real GDP over time for the indicated horizon: e.g., the bottom line plots GDP from 1947 Q1 through 1953 Q2. Each line contains one recession, and they’re aligned horizontally so that period “0” is the quarter the recession ended. The periods overlap, so the second line from the bottom shares observations from 1950 Q1 through 1953 Q2 with the line below it.
• The light blue lines just repeat the black lines, but recentered vertically for visual reference—each black line is represented by one of the blue lines behind the 1947–1953 plot, for example, so we can see that the recovery after that recession was faster than the others.
• The time periods are chosen to contain the expansions before and after each recession plotted.
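The alignment and recentering described above can be sketched as follows. This is a minimal Python illustration under my own assumptions, not the R code used for the plots: `align_segment` and `recenter` are hypothetical helper names, the input is a list of log GDP values indexed by quarter, and “recentered vertically” is taken to mean subtracting each segment’s value in period 0.

```python
import math

def align_segment(values, end_index, before, after):
    """Return (offset, value) pairs so that offset 0 is the quarter the recession ended.

    values: observations indexed by quarter
    end_index: position of the recession's final quarter in `values`
    before, after: how many quarters to keep on each side of period 0
    """
    start = max(0, end_index - before)
    stop = min(len(values), end_index + after + 1)
    return [(i - end_index, values[i]) for i in range(start, stop)]

def recenter(segment):
    """Shift a segment vertically so that its value at offset 0 is zero."""
    anchor = dict(segment)[0]
    return [(offset, v - anchor) for offset, v in segment]

# Hypothetical quarterly real GDP levels (not actual data); the recession ends at index 4.
gdp = [100, 101, 100, 99, 99.5, 101, 102.5, 104]
log_gdp = [math.log(x) for x in gdp]
segment = align_segment(log_gdp, end_index=4, before=3, after=3)
centered = recenter(segment)
print([offset for offset, _ in segment])  # [-3, -2, -1, 0, 1, 2, 3]
```

Working in logs means a recentered segment reads as cumulative percentage growth since the recession ended, so segments from very different GDP levels can share one vertical axis.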

# Statistics and R resources

I’m working with an undergraduate student who, quite reasonably, asked for some recommendations for books to learn more about R and statistical analysis. Naturally, I stalled because I didn’t have a great answer off the top of my head. Now that I’ve looked into it some more, I still don’t have a great answer, but I might have an acceptable one.

## Statistics & Graphics

This is probably not a standard list, but… I’ve learned an immense amount from:

• Cleveland’s books, especially Visualizing Data
• Manski’s Identification Problems in the Social Sciences
• Tufte’s four books:
  ◦ The Visual Display of Quantitative Information
  ◦ Envisioning Information
  ◦ Visual Explanations
  ◦ Beautiful Evidence
• Howard Wainer’s books are excellent, especially Picturing the Uncertain World
• Frank Harrell’s Regression Modeling Strategies is very good.

At first, I really disliked Stephen Few’s Show me the numbers: Designing Tables and Graphs to Enlighten and viewed it as Tufte-lite. I still kind of think that, but some people might need to start with Tufte-lite, so there’s probably a role for the book. Plus, his blog is enjoyably opinionated (but, damn, the only “sharing” options on each post are del.icio.us and Digg This. Makes me feel old.)

## Using R

I really don’t know; I’d been programming for a long time before I started using R, so I’ve never had a “complete beginner” perspective. These look decent, though, and are either free or cheap:

Free:

Cheap:

• The R Graphics Cookbook
• The R Cookbook
• R in Action

I should note that the Iowa State Library claims to have online access for all three of them. The “cookbooks” really have a “cookbook” approach: you look up what you want to do and it gives you the steps in R.

## Both

• Data analysis and graphics using R: an example-based approach looks interesting and Cambridge University Press publishes some pretty spectacular and reasonably priced textbooks, so it may be worth checking out.
• I’ve heard good things about Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models, and I like Gelman’s blog, so it might be worth looking at more.

# More on numerical instability in Incanter

I wrote an earlier post on Incanter that drew a response from an Incanter user (who wants to stay anonymous, or I’d just quote the email). In short, the email pointed out that, if you know how Clojure handles dependencies and libraries, it’s not hard to verify that Incanter’s solve uses LAPACK’s DGESV from JBLAS to invert the $X'X$ matrix using an LU factorization, which is the exact same algorithm as R’s solve, so my suspicion there was misplaced. Great!

Obviously my first reaction to the email was astonishment that anyone’s read my blog. But I think my original point still stands. Looking for a variable named `xtxi` used to estimate OLS is a quick and dirty way to evaluate a statistics package, because inverting the $X'X$ matrix is numerically unsound compared to other methods of estimating OLS. R, for example, does not use `solve` for OLS; it uses the QR decomposition.

Here’s some R code where the difference matters (I don’t know Clojure, but this uses the same algorithms). This isn’t quite linear regression; it’s a comparison of different methods for constructing projection matrices, $P=X(X'X)^{-1}X'$ (the fitted values from regressing $y$ on $X$ are just $Py$, so it’s basically the same computation). Here are three different methods:

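``````
# Three ways to compute the projection matrix P = X (X'X)^{-1} X'

# 1. Form the inverse of X'X explicitly (solve() uses an LU factorization), then multiply
projection.LU1 <- function(x) x %*% solve(crossprod(x)) %*% t(x)

# 2. Solve the linear system (X'X) B = X' instead of forming the inverse; P = X B
projection.LU2 <- function(x) crossprod(t(x), solve(crossprod(x), t(x)))

# 3. Use the QR decomposition of x and compute P = Q Q'
projection.QR <- function(x) {
  QR <- qr(x)
  tcrossprod(qr.Q(QR)[, QR$pivot[seq_len(QR$rank)], drop = FALSE])
}
``````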

The first inverts $X'X$ using the same algorithm as Incanter; the second is a slightly better variant (it solves a linear system instead of forming the inverse explicitly) but is basically the same; and the third uses the QR decomposition, just like R.

From the mathematical definition, we can see that $P=PP$, a property called idempotence, which is easy to verify numerically. Here’s a set of 51 observations on 11 regressors: column $p+1$ is $z^p$ for $p=0,1,\ldots,10$, with $z$ running from 0 to 1 in steps of 0.02.

``````X <- outer(seq(0, 1, 0.02), 0:10, "^")
``````

And now we can “verify” idempotence (up to numerical tolerance):

``````> all.equal(projection.LU1(X), projection.LU1(X) %*% projection.LU1(X))
[1] "Mean relative difference: 0.0002737938"

> all.equal(projection.LU2(X), projection.LU2(X) %*% projection.LU2(X))
[1] "Mean relative difference: 0.0001990939"

> all.equal(projection.QR(X), projection.QR(X) %*% projection.QR(X))
[1] TRUE
``````

All of the code is available for download here: https://gist.github.com/grayclhn/5717763

You can see (and verify it for yourself by downloading the code) that the first two methods of calculating $P$, which invert $X'X$ using the LU factorization just like Incanter, are not idempotent. The third method, which uses the QR decomposition just like R, is idempotent. So in this example, the QR decomposition works and the LU factorization doesn’t.
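If you’d like to check this outside R, here’s a minimal pure-Python sketch of the same experiment (standard library only, written for clarity rather than speed). It’s my own translation under stated assumptions, not the code from the gist. The hypothetical `projection_normal_eq` forms $(X'X)^{-1}$ by Gauss-Jordan elimination with partial pivoting, standing in for the LU-based `solve`. The hypothetical `projection_householder` builds $P = QQ'$ from a Householder QR factorization without column pivoting, which is not the routine R’s `qr` calls. Both use the same 51 × 11 polynomial design as above.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]

def projection_normal_eq(X):
    """P = X (X'X)^{-1} X', inverting the cross-product matrix explicitly."""
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)

def householder_q(X):
    """Return the n x p matrix Q with orthonormal columns from a Householder QR of X."""
    n, p = len(X), len(X[0])
    R = [row[:] for row in X]
    vs = []
    for k in range(p):
        x = [R[i][k] for i in range(k, n)]
        norm = sum(t * t for t in x) ** 0.5
        v = x[:]
        v[0] += norm if x[0] >= 0 else -norm  # sign choice avoids cancellation
        vnorm2 = sum(t * t for t in v)
        vs.append(v)
        for j in range(k, p):  # reflect the remaining columns of R
            s = 2 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vnorm2
            for i in range(len(v)):
                R[k + i][j] -= s * v[i]
    # Apply the reflectors in reverse order to the first p columns of the identity.
    Q = [[float(i == j) for j in range(p)] for i in range(n)]
    for k in reversed(range(p)):
        v = vs[k]
        vnorm2 = sum(t * t for t in v)
        for j in range(p):
            s = 2 * sum(v[i] * Q[k + i][j] for i in range(len(v))) / vnorm2
            for i in range(len(v)):
                Q[k + i][j] -= s * v[i]
    return Q

def projection_householder(X):
    """P = Q Q', where the columns of Q are an orthonormal basis for the columns of X."""
    Q = householder_q(X)
    return matmul(Q, transpose(Q))

def idempotence_error(P):
    """Largest absolute entry of P - P P (exactly zero in exact arithmetic)."""
    PP = matmul(P, P)
    return max(abs(a - b) for ra, rb in zip(P, PP) for a, b in zip(ra, rb))

# Same design as the R example: z = 0, 0.02, ..., 1 and columns z^0, ..., z^10.
X = [[(0.02 * i) ** p for p in range(11)] for i in range(51)]
err_lu = idempotence_error(projection_normal_eq(X))
err_qr = idempotence_error(projection_householder(X))
print(err_lu, err_qr)
```

The exact size of `err_lu` depends on the elimination details, and it isn’t directly comparable to the R output (`all.equal` reports a mean relative difference, not a maximum absolute one), but it should come out many orders of magnitude larger than `err_qr`.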

This example is obviously contrived, but it’s not isolated. Chapter 11 of Seber and Lee’s (2003) Linear Regression Analysis shows the same thing: that if the regressors are “badly” distributed, the QR decomposition is more reliable. (In the interest of full disclosure, I should admit, embarrassing as it is, that chapter 11 of Seber and Lee, along with the paper “What every computer scientist should know about floating point arithmetic,” is all I know about these issues, so I’m not claiming a lot of expertise).

As one last point, let me preempt anyone who might respond, “yes, these issues matter in those particular examples, but that will never come up in real research.” Try to guess why I look to see how a stats package calculates the linear regression coefficients, and why I have this particular criterion that I care about instead of any other, and why….

It’s obvious, right? This is something I’ve personally screwed up before. An early version of my job-market paper had fantastic empirical results that turned out to be entirely an artifact of using $(X'X)^{-1}$ to calculate the F-test statistic instead of using the QR decomposition. The “projection.QR” function in the code example is copied directly from that project (a later version). I was lucky and paranoid enough to catch it before circulating the paper, but the event definitely left an emotional impression.

# It seems pretty lame to miss out on PRISM, so…

Links:

Ironically, going to all of those newspapers exposes you to social-media “sharing” buttons that send your IP address to Google, Facebook, Twitter, Yahoo, etc. (although Twitter lets you opt out of this tracking in your settings).

A few quick thoughts,

• Unless you are a wealthy government (so, probably none of this site’s readers), I think you have to assume that if the NSA or another equivalent agency wants to know what you in particular are doing online, it already knows, and can find out lawfully under pretty much any sensible legal regime. Once you’re deemed “of interest,” it strikes me as absurd to expect a court to be a significant hurdle.
• That’s much different from routine data collection on essentially everyone who uses the internet. My biggest concern isn’t that the information would be useful for catching terrorists; of course it would be useful for that. My biggest concern now is that data mining algorithms are pretty badly behaved. Even assuming for argument’s sake that the NSA has exceptional data analysts (they probably do), I suspect that plenty of other people also get access to parts of PRISM’s output and have absolutely no idea how to interpret it. I also suspect that this has led, and will lead, to false positives.
• Even “reasonable” people have completely legitimate reasons to be concerned about privacy; it seems like one prosecution strategy is to collect private information that would be inadmissible in court, and leak it (the easiest example I could think of is from Dominique Strauss-Kahn’s case; he’s so unsympathetic a figure in the US that I hesitated to use him as an example, but that’s kind of the point: we know a lot of details about his life that have no bearing on his case). If you’re concerned about privacy, you should probably take further steps to be private on the internet, but that feeds into the problems in the previous bullet. You probably won’t pull off being completely private; instead you’ll just look secretive.
• That said, you can view using Tor, Ghostery, OTR, etc., as political statements, and removing social-media sharing buttons from websites too. I would count avoiding Chrome, Safari, and Internet Explorer in that category (leaving… Firefox, and possibly Opera), and avoiding Android and the iPhone as well (leaving… what exactly, I’m not sure, but Firefox OS is looking better than it did earlier this week). It’s probably unrealistic to think that it’s more than a political statement, though.
• One downside of the Patriot Act/NCTC/other post-9/11 intelligence agency consolidation is that the PRISM information almost certainly will spill over to other agencies, local police, and the FBI. Whatever organization handles all of this information (presumably the NSA is the natural one to do it) really needs to be quarantined from the rest; see, e.g., this article from the New Yorker (with links to others).
• These are just preliminary thoughts; I expect to have better ones later.

A surprisingly stupid interactive graphic from the Economist on when China’s production will overtake the US’s: http://www.economist.com/node/21578919

Everything we know about economic growth implies that China’s growth in productivity (per capita output) will slow down as it gets closer to the US (this is the [conditional] “convergence” hypothesis).

It might be an interesting exercise to plot population growth and productivity growth separately and combine them into the sort of plot they display here under different scenarios… but that’s not what they did.
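Here’s a minimal sketch of that exercise. Every number in it is invented for illustration (none are estimates for China or the US), and `years_until_overtake` and its parameters are hypothetical. Total output is population times per-capita output, so the log of the output ratio is the sum of the log population ratio and the log per-capita ratio; per-capita growth in the poorer country is assumed to exceed the richer country’s by an amount proportional to the remaining log gap, a crude stand-in for conditional convergence. Different scenarios are just different arguments.

```python
import math

def years_until_overtake(pop_ratio, income_ratio, pop_growth_gap,
                         growth_b, convergence_speed, max_years=200):
    """Years until country A's total output passes country B's (hypothetical model).

    pop_ratio:         population of A divided by population of B
    income_ratio:      per-capita output of A divided by that of B (< 1 means A is poorer)
    pop_growth_gap:    A's annual population growth minus B's (as a log change)
    growth_b:          B's annual per-capita growth (as a log change)
    convergence_speed: share of the log per-capita gap that A closes each year
    Returns (years, A's per-capita growth path); years is None if A never catches up.
    """
    log_pop, log_income = math.log(pop_ratio), math.log(income_ratio)
    growth_path = []
    for year in range(max_years + 1):
        if log_pop + log_income >= 0:  # log of (A's total output / B's total output)
            return year, growth_path
        # A's extra growth is proportional to the remaining gap, so it slows as A catches up.
        growth_a = growth_b + convergence_speed * (-log_income)
        growth_path.append(growth_a)
        log_income += growth_a - growth_b
        log_pop += pop_growth_gap
    return None, growth_path

# Invented illustrative parameters, not estimates for any real country.
years, path = years_until_overtake(pop_ratio=4.0, income_ratio=0.15,
                                   pop_growth_gap=-0.005, growth_b=0.02,
                                   convergence_speed=0.03)
print(years, round(path[0], 3), round(path[-1], 3))
```

The point of separating the pieces is that the per-capita growth path declines year by year, instead of being extrapolated at its current rate.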

# http daringfireball net 2013 05 google versus I’m…

http://daringfireball.net/2013/05/google_versus

I’m surprised Gruber left off Google Docs, which seems like it was conceived/acquired only to choke off MS Office.

# Moving this blog, but with nuance

Update on 6/5
Okay, so that didn’t last very long. I’ve discovered that I still want some form of link blog; Twitter isn’t really long enough, and my homepage is too professional. So I’m reopening this blog.

But, with a catch: I’ve tried (and let me know if I’ve failed) to set it up so that anyone with a WordPress account can post here. So let’s try to make it a shared link blog and discussion site for people interested in Econometrics.

Original:
I’m moving this blog to my main website: http://gray.clhn.co/blog
Maintaining two different domains was annoying. I’ve moved some of the posts over and deleted the copy here (if I need to update or correct a post, I’ll almost certainly forget to do it in both places), but I’ll keep notes for my classes, etc., up here for a little while.

# Thoughts on economics grad school

A friend emailed me to ask about grad school; I thought I might as well share my thoughts here too. First, I should make it clear that I basically never recommend getting a PhD to anyone. Grad school pays almost nothing, is very stressful, damages most people’s relationships, exacerbates any latent depression or anxiety issues you might have, and the only payoff is that you’ll get to learn about one or two things in intense, intense detail for years. So I emphasize the negatives, reasoning that if you can hear all the downsides and still think, “screw it, sounds like fun”… then going to grad school might be a good idea. But if I can scare you away, then you never should have tried to get a PhD in the first place.

That said, here are some more thoughts, in no particular order…

• If you’re going to get a PhD in the social sciences, or anything close to the social sciences, it should be in economics. You could do pretty much any of the other disciplines, but you’ll get paid considerably more and there are more academic positions, private-sector and government jobs, etc., available. Doing economics in a business school may be even better.
• I’d only suggest getting a PhD if you want a job teaching and/or doing research as a professor or at a research institute or think-tank. I don’t think that there are that many other jobs where the training is going to help. I’m sure there are jobs where having a PhD would be a nice addition for a candidate, but probably not so much that it’s worth 5-6 years of full time school. Of course, you might know of particular positions that I don’t.
• If you actually, you know, have a job, career, life, etc., that you’d be putting on hold for grad school (which is the case for the friend I mentioned above, but wasn’t for me when I applied) you should probably be pretty selective about where you’d go; so top 10 or so only. Basically, you want a setup where, even if grad school goes badly and the job market is terrible when you graduate, you can have a job “better” than what you have now on graduation. That’s probably true if you go to Harvard, MIT, Stanford, etc., but not necessarily if you go to, say, UCSD. Obviously, you want to know who you plan to work with, but people change their minds once they get to grad school all the time, so being at a great school will give you options there too.
• If you’re coming straight out of college, or if you have a bad job with poor career prospects (I was working as a file clerk in the post-bubble Bay Area two years after I graduated from college), you can take a bigger risk and go to a top 30 school that matches your presumed research interests. UCSD turned out to be a great fit for me, mostly because I was right when I expected to do econometrics.
• The first year of grad school (in Econ) will be really tough and unpleasant if you’ve been out of school for a little while. There’s a lot of math and it may have been a little while since you sat down and worked through proofs, etc. Expect that the core sequence of classes will be tough.

Update on 6/2: I should’ve mentioned this post by Noah Smith, who’s much more positive than I am.

# Git workflow and links

I’m setting up a real, honest-to-goodness public open-source project, which is an opportunity to learn more about the software that I kind of take for granted. I use Git for version control (you should use it too) and store a lot of files on GitHub. Apparently people really like “no fast-forward” merges, and I’ve never really understood why, so I looked into it some more and decided that I’m quite happy rebasing heavily (it seems like you can get the apparent benefits of no-ff just by tagging a lot, so I’m still a little mystified by its attraction).

This post is really an excuse for me to have a place to put these links, obviously.

# More news

More links to the news.  It’s just like classes were in session!  As always, I’m intentionally avoiding overtly political news: the IRS scandal, etc.

# News links

The semester is just about over; I still need to grade exams, etc., but last week was the last week of lectures.  And so I don’t feel the same sense of urgency to look for interesting news articles for my students.  But I’ll try to keep it going because it’s been fun and kept me in touch with what actually has happened in the world.

# NYT NFL draft charts and things

A post from visualscoreboard.com:

This is an “old” (by news terms at least, April 25… what does it say about the world that I feel like I need to apologize for writing about a two-week-old article?) interactive graphic from the NYT about the NFL draft: historically, where in the draft have the best players been picked? They also have a write-up of the thought process behind the chart on Kevin Quealy’s chartsnthings blog; read it!

Anyway, just a few thoughts on the chart, which I like a lot. I know they need to make things visually interesting, but I’m not a fan of color coding each round (in the print version) or shading the first round (in the online version). That information’s already in the ordering, but color coding it subtly encourages viewers to emphasize comparisons within groups and deemphasize comparisons across groups. I’d prefer just to add some whitespace between the groups. Also, the rows are taller than they need to be in the online version (the row height is uniform and adds no information). It’s a minor issue, but my laptop screen can’t display the whole chart this way, and I assume smartphones wouldn’t be able to either.

# May 1st lecture prep

### Class today

We’ll start by talking about fixed exchange rates vs. floating exchange rates. Canada and the US have had floating exchange rates since 1971, while China pegs its currency to the US dollar.

Canada / US exchange rate

China / US exchange rate

If there’s time, we’re going to talk about Purchasing Power Parity; the easiest way to start to understand it is to play with The Economist’s interactive “Big Mac Index” (it’s exactly what it sounds like).
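The Big Mac index boils down to two lines of arithmetic. Here’s a minimal sketch with hypothetical prices and a hypothetical exchange rate (not The Economist’s figures) and hypothetical helper names. The implied PPP exchange rate is the local price divided by the US price, and converting the local price to dollars at the market rate shows whether the local currency looks over- or undervalued.

```python
def implied_ppp_rate(local_price, us_price):
    """Local currency units per US dollar that would equalize Big Mac prices."""
    return local_price / us_price

def valuation_vs_dollar(local_price, us_price, market_rate):
    """Percent over (+) or under (-) valuation of the local currency against the dollar.

    market_rate is quoted as local currency units per US dollar.
    """
    dollar_price = local_price / market_rate  # local Big Mac price converted to dollars
    return 100 * (dollar_price / us_price - 1)

# Hypothetical numbers: a Big Mac costs 20 units of local currency abroad and $4 in the US,
# and the market exchange rate is 8 local units per dollar.
print(implied_ppp_rate(20.0, 4.0))          # 5.0 local units per dollar
print(valuation_vs_dollar(20.0, 4.0, 8.0))  # -37.5: the currency looks 37.5% undervalued
```

If the market rate equaled the implied PPP rate (5 here), the valuation would be zero, which is exactly what PPP predicts in the long run.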

# Identification in Macro

A student made the following comment about the conclusion of my Reinhart and Rogoff post: “Looking at the data in order to determine your identification strategy seems pretty suspicious to me.”

So I have a few thoughts on this. First, I’m sympathetic to the general idea; choosing a model after you’ve seen the data is a good way to fit your model to noise and spurious artifacts of the dataset, and well-run experiments try to minimize this sort of contamination (through double-blind designs, etc.). But that’s not really an option in Macro because there’s only one dataset (even if you view the individual countries as different datasets, they’re highly interdependent: there was only one Great Depression; there’s been only one global financial crisis; and so on).

One consequence is that formal statistical tests, the kind that I devote an enormous amount of time to studying and teaching, have very little influence on Macro theory. You can see this throughout the development of the RBC-style microfoundation literature, for example. Macro develops in phases: there are relatively calm periods where people tinker with and build on whatever models are established and in fashion; and there are periods of crisis or near-crisis that highlight the shortcomings of those models: the Great Depression; the Great Inflation; Japan’s lost decade; the global financial crisis; etc. A lot of existing models get thrown out when the crisis reveals that they’re missing important aspects of the economy (or are just fragile), and then new models get proposed and developed following the crisis. So you should view almost all of Macro as exploratory analysis and the behavior during a new crisis as potentially confirmatory analysis. It’s unfortunate that most journals require a veneer of statistical inference before they’ll publish this exploratory analysis, but that doesn’t change its fundamental nature.

It would be interesting to try to formalize this approach as an actual experimental strategy, but I have almost no idea how you’d do it.

And remember, it’s usually not the established researchers that are going to develop this new theory, it’s the current grad students, future grad students, and a few professors at various stages in their careers. So this process doesn’t require any established economist to change his or her mind (thank god).

# Notes for April 26th Macro lecture

### Today’s lecture

• We’re going to start covering Chapter 19, open-economy Macroeconomics
• This covers trade and foreign investment
• Next week, we’ll see how the exchange rate comes into play and how exchange rate policy affects monetary policy
• Chapters 9 and 19 are the only new chapters we’ll cover since the last midterm; we won’t get to Chapter 17, so it’s not going to be on the final exam.

# Notes before today’s Macro lecture

We’re going to finish chapter 9 in class today and talk about policies that can encourage growth.

# Some thoughts on the Reinhart and Rogoff debate

I’ve linked to some of the debate over Reinhart and Rogoff’s suddenly suspect results on debt and economic growth but haven’t said much other than that and was happy to leave it that way. But… I’m teaching a PhD time series class this semester and we just spent about a week on identification in SVARs (structural vector autoregressions) and then a student asked me about this follow-up “time series” analysis by Deepankar Basu that tries to get at causality (i.e. whether high debt causes low growth or low growth causes high debt) and there’s also this statistical analysis by Arindrajit Dube and… damn it, I probably need to actually have a professional opinion on this whole mess now (check out Felix Salmon’s summary of Dube’s results and Justin Wolfers’s of Basu’s too).

Here’s a really short summary of Basu’s results (since that’s what my student asked about). He looks at the annual growth rate of real GDP and the annual debt/GDP ratio and tries to forecast them with past values of both variables. He finds that the growth rate of GDP seems to have predictive power for the debt/GDP ratio and that the debt/GDP ratio doesn’t have statistically significant predictability for GDP growth. Taken at face value, this would be moderately convincing. It’s a bland truism that “correlation isn’t causation,” but sequential timing can help and unless you believe that the growth rate of GDP moves down in anticipation of high future values of the debt/GDP ratio then this suggests that the low GDP growth is causing the rise in the debt/GDP ratio. It’s not hard to tell stories where that sort of anticipation happens, though, since Macro and financial variables are often forward looking: if households save in anticipation of a high debt/GDP ratio, that would cause aggregate demand to fall, causing lower GDP growth. Note that that’s more or less the story that pro-Austerity politicians and pundits have been telling (essentially Paul Krugman’s confidence fairy), and it’s completely consistent with Basu’s model and statistical results.

That’s probably worth repeating: since investors and other economic actors try to anticipate the future state of the economy, current events are effectively caused by expected future events all the time.
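To see how anticipation can flip the apparent direction of causality, here is a small simulation of a hypothetical economy. It is my own toy model, not Basu’s specification or data. News `s` arrives in year t about next year’s debt; growth drops as soon as the news arrives, and debt only rises a year later. So expected debt drives growth by construction, yet in the predictive regressions growth “predicts” debt while debt has no predictive power for growth. The parameter values and function names are illustrative assumptions, and the sketch uses NumPy.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors.

    Uses np.linalg.lstsq (an SVD-based least-squares solver) rather than
    inverting X'X."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_anticipation(T=5000, rho=0.9, beta=1.0, mu=2.0, seed=0):
    """Hypothetical toy economy: news s_t about next year's debt lowers
    growth right away, and debt only rises a year later."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(T)  # news in year t about debt in year t+1
    e = rng.standard_normal(T)  # growth shocks unrelated to debt
    d = np.zeros(T + 1)         # debt/GDP ratio (deviation from its mean)
    for t in range(T):
        d[t + 1] = rho * d[t] + s[t]
    g = mu - beta * s + e       # growth falls when the news arrives

    # Debt equation: d_{t+1} on const, d_t, g_t
    debt_coef = ols(d[1:], d[:-1], g)
    # Growth equation: g_{t+1} on const, g_t, d_t
    growth_coef = ols(g[1:], g[:-1], d[:T - 1])
    return debt_coef, growth_coef

debt_coef, growth_coef = simulate_anticipation()
print("lagged growth in debt equation:", round(debt_coef[2], 3))
print("lagged debt in growth equation:", round(growth_coef[2], 3))
```

With these settings the coefficient on lagged growth in the debt equation should come out near -0.5 (that is, -β/(β² + 1) with unit-variance shocks), while the coefficient on lagged debt in the growth equation should be near zero. That is the same qualitative pattern Basu reports, even though in this toy economy the causality runs from expected debt to growth.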

So, for that reason alone, you shouldn’t take Basu’s result at face value. There are other reasons too: the debt/GDP ratio is highly persistent and has extreme starting points (I’ll have pictures later in the post), and either of these can cause problems for these test statistics (this issue is discussed in Elliott and Stock’s 1994 paper and Cavanagh, Elliott, and Stock’s 1996 paper). The same persistence issue raises statistical problems with the rest of the analysis too. There are also more conceptual problems, so you can basically ignore the impulse response functions (IRFs): the idea behind presenting IRFs is to show the effect of an economic shock, but as conducted here, they don’t tell you any more than the tests of predictability. (It’s hard to give an accessible explanation for that, but here’s where “correlation is not causation” is somewhat helpful. The data can only tell us about correlation, and you need extra knowledge about the system, maybe that the data come from a controlled experiment, to infer causation from that correlation. Basu estimates correlations from the data, then tries to get the data to identify the causal structure too without making any other explicit assumptions. This task is literally impossible.)
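The “literally impossible” claim is the standard counting argument for SVARs. Here it is for two variables, in generic textbook notation rather than Basu’s, with $y_t$ stacking the debt/GDP ratio and GDP growth:

$$
\begin{aligned}
y_t &= c + A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t, \qquad \operatorname{E}[u_t u_t'] = \Sigma, \\
u_t &= B \varepsilon_t, \qquad \operatorname{E}[\varepsilon_t \varepsilon_t'] = I_2 \quad\Longrightarrow\quad \Sigma = B B'.
\end{aligned}
$$

The data pin down the $A_j$ and $\Sigma$. Because $\Sigma$ is symmetric, it has only 3 distinct elements, while $B$ has 4. Worse, for any orthogonal matrix $Q$, $(BQ)(BQ)' = BQQ'B' = BB' = \Sigma$, so $B$ and $BQ$ fit the data equally well but imply different impulse responses. Choosing among them takes at least $n(n-1)/2 = 1$ extra restriction, such as a recursive (Cholesky) ordering, and that restriction is an assumption about the economy, not something the data estimate.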

The same issues are present to a lesser extent in Dube’s analysis; I think his main result (his Figure 2) is less affected by the persistence issues, though the timing issues are still there. If you wanted to, you could probably reconcile his Figure 2 with a confidence fairy argument, meaning that it doesn’t establish causality either.

So, what would I do? Well, remember:

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. (John Tukey)

We have annual data on economic growth and debt for 20 countries; without a lot more nuanced data and information, we’re never going to have a bulletproof analysis, so don’t hold that up as the goal. Accept that with this data set, you’re not going to disprove the confidence fairy (ironically, if you want to understand the debt/growth relationship and how it should affect policy, you’d probably want to do a deep qualitative analysis of different periods of debt, as exemplified by… Reinhart and Rogoff’s This Time Is Different).

A first step, and I’m going to argue that for my purposes (writing a blog post) this is a sufficient step, is to look at the data. I downloaded the dataset and Herndon, Ash, and Pollin’s code here (see also their readme.txt) and generated some very basic plots. The first gallery plots the GDP growth rate over time for each country, using line color to show the years in which debt was higher than 90% of GDP (those years are red).

And, look. For countries where the 90% threshold is exceeded, it happens at the very beginning of the sample (i.e., WWII de-escalation and rebuilding) or towards the end. For some countries (Italy and Japan, for example) there’s a clear downward trend over the last 50 years; so of course if the high debt is at the end of the sample, it’s going to be correlated with lower growth. Literally nothing in these pictures makes me especially concerned about debt over 90% of GDP (obviously I’ve played around with other thresholds too and found similar results). The R code used to generate these plots is straightforward and is available here.
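Drawing each series in two colors mostly comes down to splitting the years into consecutive runs that are either above or below the threshold. The original plots were made in R; this is a hypothetical Python helper of my own that does only the run-splitting step, uses made-up numbers, and leaves the drawing itself to whatever plotting library you prefer. A gap in the years also starts a new run, since the dataset doesn’t cover every year for every country.

```python
def flagged_runs(years, flags):
    """Group consecutive years into (start, end, flagged) runs.

    Each run can be drawn as one line segment, e.g. red when flagged
    (debt/GDP above 90%) and black otherwise. A missing year starts a new run.
    """
    runs = []
    for year, flag in zip(years, flags):
        if runs and runs[-1][2] == flag and runs[-1][1] + 1 == year:
            start, _, _ = runs[-1]
            runs[-1] = (start, year, flag)  # extend the current run
        else:
            runs.append((year, year, flag))  # start a new run
    return runs

years = list(range(1946, 1956))
debt_ratio = [1.20, 1.05, 0.95, 0.88, 0.80, 0.75, 0.92, 0.70, 0.65, 0.60]  # made up
high_debt = [d > 0.90 for d in debt_ratio]
print(flagged_runs(years, high_debt))
# [(1946, 1948, True), (1949, 1951, False), (1952, 1952, True), (1953, 1955, False)]
```

The same helper works for the flipped plots by flagging years with GDP growth below 1% instead.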

Remember, each plot shows annual GDP growth for the listed country, with red indicating years where the debt/GDP ratio is greater than 90%.

Now we can flip the roles of growth and debt. The next gallery plots the debt/GDP ratio for each country and uses red to indicate years where GDP growth was below 1%. The figures are below. Unlike before, the low growth periods are scattered through the series. We also see results that are at least suggestive: for many countries (Denmark, Canada, Belgium, the US, Sweden, and others), low growth in the early 80s was followed by an increase in the debt/GDP ratio. The same thing happens with Sweden, Finland, and Japan in the 90s. But, again, this doesn’t disprove the confidence fairy. The R code for these plots is here as well.

But we actually can learn something new from these plots. Notice that GDP growth moves in broadly the same direction across different countries. You can see that there’s some systematic comovement in the GDP plots, and you can also see that the red lines are pretty clustered in the debt ratio graph. And, this is the key, you see clustering at the same point in time, but not at the same level of debt. If the lower level of GDP growth anticipated a higher level of debt, we’d see more red lines before the higher debt levels. Instead, we see that the red lines happen before an increase in the debt level, but it doesn’t matter whether it’s an increase to a high level of debt or to a low level of debt.

So, those are the two key things that jump out of the graphs, particularly the “All countries” panel.

• Low growth periods happen at roughly the same time in different countries, suggesting that there’s a common element that’s at least partially responsible. The debt/GDP ratio has common patterns across countries, but at very long horizons, so it seems unlikely to be that common element.
• The low growth periods happen before an increase in the debt/GDP ratio, but it doesn’t appear to matter whether it’s an increase to a low or high level of debt/GDP. Confidence fairy stories seem like they’d imply that low growth should happen before a change to a high level of debt/GDP and not be as likely before a change to a low level of debt/GDP, which we don’t see at all in the data.
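The change-versus-level point in the second bullet can be made a bit more concrete with a simple tabulation: for each low-growth year, record the debt/GDP level at the time and how much it changed over the next few years. This is a hypothetical descriptive helper of my own, not the code behind these plots; the 1% cutoff matches the plots, but the three-year horizon and the toy numbers are arbitrary assumptions. The data are dicts keyed by year so that gaps in a country’s series are handled.

```python
def debt_changes_after_low_growth(growth, debt, cutoff=1.0, horizon=3):
    """For each year t with GDP growth below `cutoff` (percent), return
    (t, debt[t], debt[t + horizon] - debt[t]), with debt in percent of GDP.

    `growth` and `debt` are dicts keyed by year; years where debt[t] or
    debt[t + horizon] is missing are skipped."""
    episodes = []
    for t, g in sorted(growth.items()):
        if g < cutoff and t in debt and t + horizon in debt:
            episodes.append((t, debt[t], debt[t + horizon] - debt[t]))
    return episodes

# Made-up numbers for a single hypothetical country.
growth_toy = {1980: 0.5, 1981: -0.3, 1982: 2.5, 1983: 3.0, 1984: 3.1, 1985: 2.8}
debt_toy = {1980: 40.0, 1981: 42.0, 1982: 46.0, 1983: 50.0, 1984: 52.0, 1985: 53.0}
print(debt_changes_after_low_growth(growth_toy, debt_toy))
# [(1980, 40.0, 10.0), (1981, 42.0, 10.0)]
```

Pooling these tuples across countries, you could compare episodes that end above 90% of GDP with episodes that end below it. This is still descriptive, not an identification strategy.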

It might be possible to formalize either of those observations into an academically rigorous identification strategy; the second bullet especially lines up with statistical tools for empirical macro, although you’d need to actually write down a model that pins down the change vs. level distinction. Right now, this is just somewhat informed speculation. Of course, since there’s been a lot of structural change in the last 70 years, if we really want to understand our policy options, it’s probably best to look in detail at the last 20 years or so and draw conclusions from that. The aggregate statistical evidence is probably best as supporting, not primary, evidence.

Please let me know when you find errors; other comments and suggestions would be great too.

Update: Further comments on identification in general here.

# Brief notes before April 17th macro lecture

## Next lecture

We’re going to continue to talk about economic growth (obviously), i.e., very general causes of growth and “growth accounting.”

## Other interesting stuff

So, the big news in Economics blogging is that a fairly influential study by Carmen Reinhart and Ken Rogoff was alleged in a new paper by Herndon, Ash, and Pollin to have a lot of errors, stuff like using the wrong part of key Excel files. (I originally wrote “found” instead of “alleged,” but honestly I haven’t worked through the errors myself, and I don’t care enough about this issue to take the time to do it properly. Plus it’s been less than a day, so the new paper could have errors too.)

• The new critical working paper that discusses the errors is here: Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff
• Reinhart and Rogoff’s response is available through Business Insider (I’m not sure why, but I can’t find a “more original” copy of their response). I feel obliged to point out that it’s very unusual for R&R to claim that their upcoming Journal of Economic Perspectives paper will vindicate their American Economic Review paper (which is the one alleged to have errors); the AER is almost certainly the most prestigious journal in economics, and the JEP is meant mainly to disseminate research to a wide audience, not so much for original findings. So the AER paper should be the best and most correct version of the project. (This was so unusual, in fact, that I verified that the “AER” publication was really a Papers and Proceedings article… and no one still reading cares about that distinction, so I’ll stop there.) (The JEP paper is available here.)
• Some comments by Tyler Cowen and Paul Krugman (and another one after their response).
• I’ve read a lot of posts on this, and there’s a clear political angle in the people attacking the paper: R&R’s paper had been interpreted as giving evidence that high debt was historically very bad and was used as justification for austerity policies, so people who opposed the austerity policies are happy to see this paper go down. I’ve tried to find some conservative econ bloggers to balance out these links, but haven’t been able to find anyone discussing the paper. Regardless, it’s seemed all along to me that this paper and R&R’s research program weren’t going to tell us very much about the effects of high debt now, even without any errors; they’re looking at historical aggregates, but this seems like an issue where specifics matter a lot (and I know that it’s easy for me to make that claim now, don’t worry).

# MyIDEAS: your personal space on IDEAS

Gray:

I just discovered this service and am pretty excited about it. We’ll see how well it works over time, but I’ve wanted to follow individual authors in RePEc for a while (I even wrote a script a few years ago to scrape pages and print out diffs, but never maintained it).

Originally posted on The RePEc Blog:

We are proud to introduce an important new feature to the IDEAS website. MyIDEAS is a personal space for the IDEAS user where she can save the papers and articles found on the site and organize them into folders. Think of it like navigating an online store and selecting items for purchase. The difference is that your “cart” is a list of references that you can sort at will into categories you can name.

In addition, MyIDEAS allows you to follow additions to JEL codes, series and journals, as well as what authors may have added to their RePEc profiles.

To use this new service, authentication is necessary; as with other services, this happens through an account in the RePEc Author Service. Once cleared, the user finds on all relevant IDEAS pages the option to “add” or “follow” the displayed person or item. This works thanks to a cookie whose sole purpose is to identify the user as he navigates the site. It does not, however, track the user. Contents of the MyIDEAS accounts remain entirely private.


# Clojure, Incanter, and xpxi

## Updates

6/5: I’ve been told by an Incanter user that this post is too pessimistic. I’ll look into his specific critiques and write another post soon.

later: the new post is up.

## Original post

I’ve been casually interested in Lispy languages for a while; i.e., I’d like to learn one and am not going to let the fact that I only know bits of e-lisp after 15 years of using Emacs deter me. Clojure seems hot, and I really like the talks I’ve seen by Rich Hickey, its creator. Simple Made Easy is especially good. Plus, Clojure even has a well-regarded statistics library, Incanter, so awesome.

Anyway, procrastinating tonight, I decided to check out Incanter’s source code on Github. I have a really simple method for evaluating open source statistics packages: find the linear regression function, look for variables named xpxi or xtxi, and, if they exist, basically avoid the package (for some reason, these variable names are ubiquitous). Inverting the $X'X$ matrix is a pretty bad idea: it is a numerically unstable way of calculating the regression coefficients that (in problems I’ve worked on) sometimes leads to a non-idempotent projection matrix $X(X'X)^{-1}X'$ (or, using less terminology, $X'X(X'X)^{-1}$ may not equal the identity matrix). Needless to say, this results in pretty bad estimates of the OLS coefficients. Douglas Bates talks about performance issues in this R-news article too, but I’m much more concerned about numeric instability. I don’t necessarily have the most informed opinion about the best way to get the OLS estimates, but I’ve gotten good results from the QR decomposition.
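To make the xpxi point concrete, here is a small comparison of the two approaches on a nearly collinear design. It is a sketch in NumPy, not Incanter or R: `ols_via_inverse` forms and inverts $X'X$ explicitly, while `ols_via_qr` uses the QR factorization from `np.linalg.qr`, which calls LAPACK’s Householder routines. The design matrix is an arbitrary made-up example with an exact fit, so any error in the estimated coefficients is purely numerical.

```python
import numpy as np

def ols_via_inverse(X, y):
    """The 'xpxi' approach: form X'X and invert it explicitly."""
    xpxi = np.linalg.inv(X.T @ X)
    return xpxi @ (X.T @ y)

def ols_via_qr(X, y):
    """Factor X = QR (reduced form) and solve R b = Q'y; X'X is never formed."""
    Q, R = np.linalg.qr(X)  # default mode='reduced': Q is n x k, R is k x k
    return np.linalg.solve(R, Q.T @ y)

# Made-up, nearly collinear design: x2 differs from x1 by about 1e-6.
n = 100
x1 = np.linspace(0.0, 1.0, n)
x2 = x1 + 1e-6 * np.cos(np.arange(n))
X = np.column_stack([np.ones(n), x1, x2])
beta = np.array([1.0, 2.0, 3.0])
y = X @ beta  # exact fit, so any error below is purely numerical

inv_err = np.max(np.abs(ols_via_inverse(X, y) - beta))
qr_err = np.max(np.abs(ols_via_qr(X, y) - beta))
print(f"condition number of X:   {np.linalg.cond(X):.1e}")
print(f"max error, inverse(X'X): {inv_err:.1e}")
print(f"max error, QR:           {qr_err:.1e}")
```

QR works with $X$ directly, so its accuracy is governed by the condition number of $X$; the normal equations are governed by the condition number of $X'X$, which is the square of it. On a typical machine the QR error here should be several orders of magnitude smaller. `np.linalg.solve` doesn’t exploit the triangular structure of `R`; `scipy.linalg.solve_triangular` would, if SciPy is available.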

As of today, as you can probably guess, Incanter fails this test. The source code and documentation are pretty unconcerned with the actual implementation of OLS, and I can’t figure out exactly what algorithm “solve” uses (I’m unpersuaded by the claim that it is “equivalent to R’s solve function” and can’t really track it down any further than that part of the code).

These details are important! I mean, I appreciate the effort and the good intentions that go into developing open source packages like this. But if you’re developing statistical software for other people to use, you really need to understand the numeric properties of the routines you’re writing, and you need to transparently communicate that understanding to other people who might use your code. So I guess I’ll stick with R for a while longer.