ouyeyu panyu | 26 Jul 00:59 2012

how weka predicts in logistic regression?

Hi Mark and Michael,

I'm trying to do logistic regression via WEKA.

First I trained the model against trainData.
    import weka.classifiers.functions.Logistic
    // Train a logistic regression model on the training set
    val cf_train: Logistic = new Logistic()
    cf_train.buildClassifier(trainData)

WEKA generated the model below.
Variable      Coefficient
=========================
price              0
sqft               0.0003
nod               -2.8426
ltv               -0.6966
age                0.0014
chd                0.0001
asl               -0.0024
Intercept          4.4403


Then I ran prediction against testData.
    for (i <- 0 until testData.size()) {
      // distributionForInstance returns one probability per class
      val predictedValue: Array[Double] = cf_train.distributionForInstance(testData.get(i))
    }
For one specific record, the predicted value is 0.07563123767893659.

I wanted to verify that this value is correct, so I did a manual calculation by substituting the trained coefficients into the following formula:
manual_value = intercept + price*testPrice + sqft*testSqft + nod*testNod + ltv*testLtv + age*testAge + chd*testChd + asl*testAsl
             = 4.4403 + 0 + 0.0003*1693 + 0 - 0.6966*2 + 0.0014*4 + 0.0001*1947 - 0.0024*0.0961
             = 3.75506936

My manual_value is completely different from Weka's predictedValue.
Do you know why this happens?
Thanks in advance.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
ouyeyu panyu | 27 Jul 02:18 2012

Re: how weka predicts in logistic regression?

Hi there,

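Do you know how Weka computes predictions in logistic regression?
The common formula for the predicted probability is:

    \mathbb{E}[Y_i|\mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}

However, when I plug the Weka model coefficients and the test data into this formula, I do NOT get the same probability as Weka yields.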

An example:
variable               model_coeffs       test_data
===================================================
Intercept              -4.440248
price                   0.00000271591     474700
sqft                   -0.0003140287      1693
ltv_new                 0.6966173         2
age_cat                -0.001434963       4
current_hold_days      -0.0001069998      1947
appr_since_last_sqr     0.002447026       0.0961

BX = -4.440248 + 0.00000271591 *474700 - 0.0003140287*1693 + 0.6966173*2 - -0.001434963*4 - -0.0001069998*1947 + 0.002447026*0.0961 = -3.779602634944

probabilityOfTestData = 1/(1+e^3.779602634944) = 0.0223

However, Weka's distributionForInstance method yields 0.07563123767893659 for this record.

Do you know why the two values are so different?
This issue is important to my work; any ideas would be appreciated.
Thanks in advance.


ouyeyu panyu | 27 Jul 19:46 2012

Re: how weka predicts in logistic regression?

Can anyone help me with this issue?


Christian Schulz | 31 Jul 09:32 2012

Re: Re: how weka predicts in logistic regression?

Hi,

The coefficients might be for the reference class '0'?

 > -4.440248 + (0.00000271591*474700) + (-0.0003140287*1693) + (0.6966173*2) + (-0.001434963*4) + (-0.0001069998*1947) + (0.002447026*0.0961)
[1] -2.503255

p = 1/(1+exp(-2.5035)) = 0.9243868
1-p = 0.0756132
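
The arithmetic above can be cross-checked with a minimal Scala sketch. It uses only the coefficients and the test record quoted in this thread and does not call Weka; the names LogisticCheck, linearPredictor and sigmoid are illustrative, not part of any library. Under these assumptions the linear predictor z comes out near -2.5033, sigmoid(z) is close to the 0.0756 that distributionForInstance reported, and sigmoid(-z) corresponds to p above, i.e. the probability obtained when every coefficient sign is flipped, as in Weka's printed model.

```scala
// Illustrative cross-check; values are copied from the thread, names are hypothetical.
object LogisticCheck {
  // Intercept and per-attribute coefficients from the model_coeffs table.
  val intercept: Double = -4.440248
  val coeffs: Array[Double] =
    Array(0.00000271591, -0.0003140287, 0.6966173, -0.001434963, -0.0001069998, 0.002447026)
  // Test record: price, sqft, ltv_new, age_cat, current_hold_days, appr_since_last_sqr.
  val record: Array[Double] = Array(474700.0, 1693.0, 2.0, 4.0, 1947.0, 0.0961)

  // Linear predictor: intercept plus the sum of coefficient * attribute value.
  def linearPredictor(b0: Double, b: Array[Double], x: Array[Double]): Double =
    b0 + b.zip(x).map { case (bi, xi) => bi * xi }.sum

  // Standard logistic function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  val z: Double = linearPredictor(intercept, coeffs, record)
  val pLow: Double = sigmoid(z)    // probability under the coefficients as listed
  val pHigh: Double = sigmoid(-z)  // probability under sign-flipped coefficients; equals 1 - pLow

  def main(args: Array[String]): Unit = {
    println(f"z = $z%.6f")
    println(f"sigmoid(z) = $pLow%.6f, sigmoid(-z) = $pHigh%.6f")
  }
}
```

Because the two values always sum to 1, comparing both against the array returned by distributionForInstance is one way to see which class a given set of coefficients refers to.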

HTH
Christian

> Anyone can help me on this issue?
>
>
> 2012/7/26 ouyeyu panyu <ouyeyu <at> gmail.com>
>
>     Hi there,
>
>     Do you know how weka predicts in logistic regression?
>     The common formula to calculate prediction probability is as below.
>
>         \mathbb{E}[Y_i|\mathbf{X}_i] = p_i =
>         \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i)
>         = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}
>

ouyeyu panyu | 1 Aug 01:15 2012

Re: Re: how weka predicts in logistic regression?

Hi Christian,

Thank you for the info, it helps.

By the way, a second question: how can I get model coefficients with better precision?
For example, for the independent variable "price", R outputs the coefficient as 2.715910e-06, while Weka rounds it to 0.
However, when a price is large enough, 2.715910e-06*price is much greater than 0.

Currently I view Weka's trained logistic model with println(trainedModel.toString()).
Is there a way in Weka to get model coefficients with the same precision as R's?
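
To illustrate why the rounding matters, here is a small Scala sketch using only the price coefficient reported by R and the test record's price from this thread. It does not reproduce Weka's output code; the four-decimal format is an assumption chosen to mimic the rounded printout, and PrecisionCheck is a hypothetical name.

```scala
// Illustrative only: shows the effect of four-decimal rounding on a small coefficient.
object PrecisionCheck {
  val priceCoeff: Double = 2.715910e-06  // coefficient as reported by R in the thread
  val price: Double = 474700.0           // price of the test record in the thread

  // Four fixed decimals (assumed to resemble the rounded model printout).
  val rounded: String = "%.4f".formatLocal(java.util.Locale.ROOT, priceCoeff)
  // Scientific notation keeps the significant digits.
  val scientific: String = "%.6e".formatLocal(java.util.Locale.ROOT, priceCoeff)

  // Contribution of the price term to the linear predictor.
  val contribution: Double = priceCoeff * price

  def main(args: Array[String]): Unit = {
    println(s"rounded = $rounded, scientific = $scientific")
    println(f"priceCoeff * price = $contribution%.4f")
  }
}
```

With these numbers the price term contributes roughly 1.29 to the linear predictor, even though the coefficient prints as 0.0000 at four decimals.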




Mark Hall | 7 Aug 11:11 2012

Re: Re: how weka predicts in logistic regression?

On 1/08/12 11:15 AM, ouyeyu panyu wrote:
> Hi Christian,
>
> Thank you for the info, it helps.
>
> Btw, a 2nd question is how to get model coefficients with better precision?
> For example, for the independent variable "price", R outputs the coeff
> as 2.715910e-06, while Weka rounds the coeff to 0.
> However, when a price is big enough, 2.715910e-06*price will be much
> greater than 0.
>
> Now I view Weka's trained logistic models by
> println(trainedModel.toString()).
> Is there a way in Weka to get model coefficients with same precisions as
> R's?

Not at present (without changing the code). It's on the to-do list to 
make precision for numeric output user configurable via a system 
property or something.

Cheers,
Mark.
