Iasonas Lamprianou | 21 Mar 2012 09:59

Re: regression issues

Dear all,
I have a question that extends to regression modelling in general. If you feel
that this question is beyond the scope of this list, please say so and I will
apologize. However, it does have to do with teaching.

Question 1: I am reviewing a paper in which the author runs a logistic regression on a sample of around
50,000 cases, using 22 independent variables. Using too many independent variables may cause collinearity
problems. Beyond this, however, I am not aware of any other problems caused by
using too many variables in a model. That said, this is also related to the problem of throwing dozens
of variables into a model en masse and then waiting for statistically significant results. Can anyone suggest
relevant literature to give to my students to read?

Question 2: Some coefficients of a different logistic model in the same paper are marginally significant,
e.g. b=-0.18 and se=0.08. The only reason this is significant is that the researcher used a large
sample size in this model (around two thousand cases, N=2000). The bound of the confidence interval
nearest zero is almost zero. Can anyone suggest a good reference to say that in such a case we should
also check the "practical significance" and that, since the interval comes so close to zero, we should
be careful about what we claim about the effect?
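
The arithmetic behind Question 2 can be checked by hand. The following is a minimal sketch in Python, assuming a Wald test with a normal approximation and a 95% interval (the paper's actual method is not stated here). Only b and se come from the question; all variable names are illustrative.

```python
import math

# Values quoted in Question 2
b = -0.18   # logistic regression coefficient (log-odds scale)
se = 0.08   # its standard error

def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Wald z statistic and two-sided p-value (assumed test, normal approximation)
z = b / se
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))

# 95% confidence interval on the log-odds scale
z_crit = 1.959964  # 97.5th percentile of the standard normal
ci_low = b - z_crit * se
ci_high = b + z_crit * se

# The same quantities expressed as odds ratios
odds_ratio = math.exp(b)
or_ci_low = math.exp(ci_low)
or_ci_high = math.exp(ci_high)

print(f"z = {z:.2f}, p = {p_value:.4f}")
print(f"95% CI for b: ({ci_low:.3f}, {ci_high:.3f})")
print(f"odds ratio = {odds_ratio:.3f}, 95% CI: ({or_ci_low:.3f}, {or_ci_high:.3f})")
```

Reading the interval on the odds-ratio scale is one way to frame the practical-significance question: the end of the interval nearest 1 shows how small the effect could plausibly be.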

Thank you for your time
Jason  

 
Dr. Iasonas Lamprianou
Department of Social and Political Sciences
University of Cyprus


Donald Pianto | 21 Mar 2012 15:50

Re: regression issues

Dear Jason,

In relation to question 1, I believe that what is critical is the final use
of the regression. If one is making causal claims, then it is very
important to understand the causal structure since conditioning on
inappropriate variables can lead to nonsense results. If the use of the
regression is purely descriptive then collinearity may be the only problem,
but one must be careful not to make causal interpretations. The book
"Causality: Models, Reasoning and Inference" by Judea Pearl discusses the
causal question and is full of references.
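
One way that conditioning on an inappropriate variable can mislead is collider bias. The following is a minimal simulated sketch in Python, under made-up assumptions: x and y are generated independently, c is a hypothetical variable caused by both, and "conditioning" is done crudely by keeping only cases with c above 1. The names, seed, and threshold are all illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2012)  # fixed seed so the run is repeatable
n = 20000

# x and y are independent by construction
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
# c is a common effect (collider) of x and y, plus noise
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

# Correlation in the full sample: close to zero
r_all = pearson(x, y)

# Correlation after conditioning on the collider (keeping only c > 1)
kept = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > 1.0]
r_selected = pearson([k[0] for k in kept], [k[1] for k in kept])

print(f"all cases:      n = {n}, r = {r_all:.3f}")
print(f"only c > 1:     n = {len(kept)}, r = {r_selected:.3f}")
```

In the selected subsample, x and y appear negatively related even though neither affects the other, which is the kind of nonsense result that can arise when a model conditions on a consequence of the variables of interest.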

In relation to question 2, I've seen this issue mentioned in many
books. In his Introductory Econometrics book, Wooldridge draws the
distinction between statistical and economic significance; however, I don't
know if he cites any particular paper on the subject.

Good teaching,
Donald Pianto
Department of Statistics
University of Brasília


Iasonas Lamprianou | 22 Mar 2012 07:53

Re: regression issues

Thanks, I will have a look. Judea Pearl's book seems to be famous!

 
Dr. Iasonas Lamprianou
Department of Social and Political Sciences
University of Cyprus


Jeff Laux | 21 Mar 2012 16:03

Re: regression issues

Q1:  Using many independent variables does not *cause* 
multicollinearity.  These are simply different issues.  It would lead to 
the problem of multiple comparisons; however, with 50k cases, it's hard 
to see that as a likely issue here.  Given the number of cases, this 
must be observational research with archival data.  That can lead to a 
number of issues if the researcher is trying to make causal claims.  
Some good background books for addressing such issues would be:
     - Shadish, Cook & Campbell, "Experimental & Quasi-experimental Designs"
     - Rosenbaum, "Observational Studies"
     - Pearl, "Causality"
     - Rothman, et al., "Modern Epidemiology"
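
On the multiple-comparisons point in Q1, the scale of the problem can be sketched with simple arithmetic. This Python sketch assumes 22 tests at a 0.05 level, all null hypotheses true, and independent tests. These are simplifying assumptions for illustration, not a description of the paper under review, and the variable names are made up.

```python
# Assumed setup: one test per independent variable, conventional 0.05 level
n_tests = 22
alpha = 0.05

# Expected number of false positives if every null hypothesis is true
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, assuming independent tests
familywise_error = 1.0 - (1.0 - alpha) ** n_tests

# Bonferroni-corrected per-test threshold that keeps the familywise rate at or below alpha
bonferroni_alpha = alpha / n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one false positive): {familywise_error:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.5f}")
```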

Q2:  It is quite reasonable to imagine that variables are often, in 
reality, related to each other in some way, and all significance tests 
do is assess the power of your study.  With 2k cases, I would suspect 
that the estimate of the association may be fairly accurate (although 
this could depend on various factors regarding how the data were 
handled).  The question of practical significance is an important one.  
An easy first step is to look at Kirk's famous paper, "Practical 
significance: A concept whose time has come".  A Google Scholar search 
could tell you about subsequent papers that cited it, and give you a 
start through the related literature.

Best,
Jeff


