Attila Balogh | 22 Oct 13:03 2013

GradientBoostingRegressor with LogisticRegression

Hi all,

First of all, thanks to all the developers for working on scikit-learn; it is a wonderful library.
I have been struggling for a while now with the following problem:
I am trying to use GBR with LR (LogisticRegression) as the base estimator, and I get the following error:

  File "main.py", line 110, in main
    score = np.mean(cross_validation.cross_val_score(rd, X, y, cv=4, scoring='roc_auc'))
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1060, in _cross_val_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 890, in fit
    return super(GradientBoostingClassifier, self).fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 613, in fit
    random_state)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 486, in _fit_stage
    sample_mask, self.learning_rate, k=k)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 172, in update_terminal_regions
    y_pred[:, k])
IndexError: too many indices

I found a similar problem on Stack Overflow (http://stackoverflow.com/questions/17454139/gradientboostingclassifier-with-a-baseestimator-in-scikit-learn) and tried to implement the adaptor suggested there, but it didn't help; the error remained the same.

Does anyone have any ideas how to resolve this?

Cheers;
Attila
------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk
Peter Prettenhofer | 22 Oct 13:16 2013

Re: GradientBoostingRegressor with LogisticRegression

Hi Attila, 

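Please use the following adaptor::

    class Adaptor(object):
        # Wraps a classifier so that predict() returns class probabilities.
        def __init__(self, est):
            self.est = est
        def predict(self, X):
            # shape (n_samples, n_classes)
            return self.est.predict_proba(X)
        def fit(self, X, y):
            self.est.fit(X, y)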

The one in the stackoverflow question returns an array of shape (n_samples,) but it should rather be (n_samples, n_classes).
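
To make the shape requirement concrete, here is a minimal NumPy sketch (not scikit-learn code) of why a 1-D prediction array can trigger the IndexError in the traceback: the expression `y_pred[:, k]` needs two axes. The array names and sizes below are made up for illustration.

```python
import numpy as np

k = 0  # class index, as in y_pred[:, k] from the traceback

# 1-D predictions, shape (n_samples,): indexing with two axes fails.
y_pred_1d = np.zeros(5)
raised_index_error = False
try:
    y_pred_1d[:, k]
except IndexError:
    raised_index_error = True

# 2-D predictions, shape (n_samples, n_classes): the same expression works.
y_pred_2d = np.zeros((5, 2))
column = y_pred_2d[:, k]

print(raised_index_error, column.shape)
```

The exact wording of the IndexError message depends on the NumPy version, so the sketch only checks that the exception is raised.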

PS: I still need to fix the init issue but any solution will most likely make the GBRT slower at prediction time (especially for single instance prediction).

best, 
 Peter


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




--
Peter Prettenhofer
Attila Balogh | 22 Oct 14:08 2013

Re: GradientBoostingRegressor with LogisticRegression

Hi Peter,

Thanks for your answer. I had tried this before as well, and in that case I get
ValueError: operands could not be broadcast together with shapes (74) (148), because the prediction array has shape (74,2) and is raveled to 148 elements.
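
The mismatch can be reproduced with NumPy alone. This is only a sketch under the assumption that the loss flattens the predictions before subtracting them from y, as the `negative_gradient` line in the full traceback later in this thread suggests; the arrays are placeholders, not real data.

```python
import numpy as np

y = np.zeros(74)               # one label per sample
pred = np.full((74, 2), 0.5)   # two columns per sample, as predict_proba returns
flat = pred.ravel()            # 74 * 2 = 148 values

error_message = None
try:
    y - 1.0 / (1.0 + np.exp(-flat))
except ValueError as exc:
    error_message = str(exc)

print(flat.shape)
print(error_message)
```

With one column per sample the flattened array would have 74 elements and the subtraction would broadcast without error.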

Do you need a self-contained test case that reproduces this error?

Cheers;
Attila


Peter Prettenhofer | 22 Oct 14:18 2013

Re: GradientBoostingRegressor with LogisticRegression

Right, I thought you were using the multi-class loss function.

Please send me a test case so that I can investigate the issue.

thanks, 
 Peter


Peter Prettenhofer | 22 Oct 15:11 2013

Re: GradientBoostingRegressor with LogisticRegression

OK, below is the adaptor that will work. The code requires that the output of predict is 2-D.

Thanks for the test-case.

best, 
 Peter

import numpy as np

class Adaptor(object):

    def __init__(self, est):
        self.est = est
    def predict(self, X):
        return self.est.predict_proba(X)[:, np.newaxis]
    def fit(self, X, y):
        self.est.fit(X, y)



Attila Balogh | 22 Oct 15:16 2013

Re: GradientBoostingRegressor with LogisticRegression

Hm, maybe I'm doing something wrong but I'm still getting the error:
ValueError: operands could not be broadcast together with shapes (3) (6)

I am using scikit-learn 0.14.1.

Full stacktrace:

Traceback (most recent call last):
  File "GB_problem.py", line 46, in <module>
    main()
  File "GB_problem.py", line 43, in main
    score = np.mean(cross_validation.cross_val_score(rd, X, y, cv=4, scoring='roc_auc'))
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 136, in __init__
    self.results = func(*args, **kwargs)
  File "C:\Python27\lib\site-packages\sklearn\cross_validation.py", line 1060, in _cross_val_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 890, in fit
    return super(GradientBoostingClassifier, self).fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 613, in fit
    random_state)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 466, in _fit_stage
    residual = loss.negative_gradient(y, y_pred, k=k)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 349, in negative_gradient
    return y - 1.0 / (1.0 + np.exp(-pred.ravel()))
ValueError: operands could not be broadcast together with shapes (3) (6)
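
One possible reading of the (3) (6) mismatch, sketched with NumPy only: `predict_proba(X)[:, np.newaxis]` on a two-column array yields shape (3, 1, 2), which still ravels to 6 values. The sketch below is an untested assumption, not a confirmed fix for 0.14.1. `StubClassifier` and `ColumnAdaptor` are hypothetical names; the stub stands in for a fitted binary classifier such as LogisticRegression. It also assumes the loss expects log-odds, since the `negative_gradient` line in the traceback applies a sigmoid to the predictions.

```python
import numpy as np


class StubClassifier(object):
    """Hypothetical stand-in for a fitted binary classifier."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        n_samples = np.asarray(X).shape[0]
        p = np.linspace(0.2, 0.8, n_samples)   # made-up P(class 1)
        return np.column_stack([1.0 - p, p])   # shape (n_samples, 2)


class ColumnAdaptor(object):
    """Hypothetical adaptor returning an (n_samples, 1) array of log-odds."""

    def __init__(self, est):
        self.est = est

    def fit(self, X, y):
        self.est.fit(X, y)
        return self

    def predict(self, X):
        proba = self.est.predict_proba(X)
        # Keep only the positive class and avoid log(0).
        p = np.clip(proba[:, 1], 1e-12, 1.0 - 1e-12)
        return np.log(p / (1.0 - p))[:, np.newaxis]


X = np.zeros((3, 4))
y = np.array([0, 1, 0])

# The indexing from the posted adaptor keeps both class columns.
as_posted = StubClassifier().predict_proba(X)[:, np.newaxis]

# One column per sample ravels to the same length as y.
init_pred = ColumnAdaptor(StubClassifier()).fit(X, y).predict(X)
residual = y - 1.0 / (1.0 + np.exp(-init_pred.ravel()))

print(as_posted.shape, init_pred.shape, residual.shape)
```

Whether the estimator should return probabilities or log-odds depends on how the library consumes the initial prediction, so that choice should be checked against the installed version before relying on it.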

