I've been reading through a metrics text and thinking about traditional significance testing. One of the problems is that you can data-mine your way to significance whether or not your theory actually explains what's happening. Other tests compare two different estimators to determine whether one does a better job of modeling the data than the other.
This got me thinking. How about starting with a theory, building a model based on that theory, and using your data to estimate the parameters of that function? Next, set that aside and throw raw statistics at your data set to find the best estimator (or a few best estimators, based on multiple criteria) to model that data. Feel free to fold, spindle, and mutilate as much as your heart pleases: use parametric and nonparametric estimators, whatever functional forms you can think of, whatever, just get a good R^2 and low p-values. (A sketch of this data-mining step follows below.)
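To make that data-mining step concrete, here's a minimal sketch in Python. The data (K, L, y) are simulated stand-ins for a real data set, and the three candidate specifications are just illustrative; the only point is that every candidate regresses the same y, so the in-sample R^2 values are comparable.

```python
import numpy as np
import statsmodels.api as sm

# Fake data standing in for the firm data set (an assumption, not real data).
rng = np.random.default_rng(0)
n = 200
K = rng.uniform(1.0, 10.0, n)                        # capital input
L = rng.uniform(1.0, 10.0, n)                        # labor input
y = 2.0 * K**0.3 * L**0.7 + rng.normal(0.0, 0.5, n)  # observed output

# Candidate specifications: the "fold, spindle, and mutilate" menu.
# All regress the same y on transformed inputs, so R^2 is comparable.
candidates = {
    "linear":     np.column_stack([K, L]),
    "quadratic":  np.column_stack([K, L, K**2, L**2, K * L]),
    "log-inputs": np.column_stack([np.log(K), np.log(L)]),
}

best_name, best_fit = None, None
for name, X in candidates.items():
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    if best_fit is None or fit.rsquared > best_fit.rsquared:
        best_name, best_fit = name, fit

print(f"best in-sample model: {best_name}, R^2 = {best_fit.rsquared:.3f}")
```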
For example, suppose you are trying to estimate a firm's production function with a large chunk of data on inputs, factor prices, costs, output, etc. Theory might lead you to believe the firm has a Cobb-Douglas production function of the form y = A*K^(p)*L^(1-p), and you use your data to estimate A and p (K, L, and y are observed in the data, not estimated). Next, you pull out all the stops and try as many models as possible to estimate y as accurately as you can from the available data, regardless of the functional form or transformations necessary to get "good" results. This second model becomes your null hypothesis against which you test your Cobb-Douglas estimate on a new data set.
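Since the Cobb-Douglas form is log-linear, the theory-based step can be a single OLS regression. Taking logs of the form above gives ln(y/L) = ln A + p*ln(K/L), so the intercept recovers ln A and the slope recovers p. A sketch, again on simulated data:

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in data with true A = 2 and p = 0.3 (an assumption).
rng = np.random.default_rng(1)
n = 200
K = rng.uniform(1.0, 10.0, n)
L = rng.uniform(1.0, 10.0, n)
y = 2.0 * K**0.3 * L**0.7 * np.exp(rng.normal(0.0, 0.1, n))

# Regress ln(y/L) on ln(K/L): intercept = ln A, slope = p.
X = sm.add_constant(np.log(K / L))
fit = sm.OLS(np.log(y / L), X).fit()
ln_A, p_hat = fit.params
print(f"A-hat = {np.exp(ln_A):.3f}, p-hat = {p_hat:.3f}")
```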
Finally, the real test is to take your theory-based model and your statistics-based models and apply them to a new data set (preferably with at least some observations out of sample relative to the data you built the original models on). Accept your theory-based model if it does a better job of predicting the dependent variable than the purely statistical models, reject it if it doesn't, and think carefully about it if you get a mixed bag of results. This would seem to cut down on Type I errors, but would it give you more Type II errors? I'm thinking it would act something like a traditional significance test, except we replace the null hypothesis of zero with a null hypothesis of a data-mined model, and I would think the result would be somewhat more informative.
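The horse race itself might look like the sketch below. All names here (A_hat, p_hat, predict_mined) are hypothetical placeholders; the only substance is that both models are scored on the same holdout observations with the same loss.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared prediction error on the holdout set."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def horse_race(y_new, K_new, L_new, A_hat, p_hat, predict_mined):
    """Score the theory-based model against the data-mined null on new data.

    A_hat and p_hat come from the theory-based fit; predict_mined is
    whatever callable the data-mined winner provides (hypothetical names).
    """
    theory_pred = A_hat * K_new**p_hat * L_new**(1.0 - p_hat)
    mined_pred = predict_mined(K_new, L_new)
    # Reject the theory model only if the data-mined null predicts better.
    return rmse(y_new, theory_pred) < rmse(y_new, mined_pred)  # True = accept
```

A mixed bag of results would show up here if you scored the two models with more than one error metric and they disagreed.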
Thursday, October 20, 2011