Luis Villa | 12 Nov 20:02 2012

objective criteria for license evaluation

On Mon, Nov 12, 2012 at 9:03 AM, Tzeng, Nigel H. <Nigel.Tzeng <at> jhuapl.edu> wrote:
> If you really want to present a neutral "Popular/Widely Used/Strong" list
> then use one developed from actual metrics of what is widely used.  The
> OSRC one is the one I typically refer to.  You'd want to remove the
> non-OSI approved licenses off that list.
>
> http://osrc.blackducksoftware.com/data/licenses/
>
> If you don't like that data set then pick another comprehensive data set
> to base the widely used list on.

As previously suggested, replacing the existing "by categories"
groupings is an extremely fraught issue, and one that I think will
require a long-term discussion to address.

That said, there is clearly interest in having objective criteria of
some sort, either to replace or to supplement the current criteria. I
thank Nigel for not just saying that, but actually providing a
specific example :)

So, here's another call for discussion:

What objective, factual criteria would you use to supplement or
replace the current categories?

Ideally, suggestions for criteria would include either:

1) a reliable third-party data source (like the blackduck survey Nigel
pointed to) or


John Cowan | 13 Nov 07:25 2012

Re: objective criteria for license evaluation

Luis Villa scripsit:

> What objective, factual criteria would you use to supplement or
> replace the current categories?
> 
> Ideally, suggestions for criteria would include either:
> 
> 1) a reliable third-party data source (like the blackduck survey Nigel
> pointed to)

Well, let's examine the current top 14 licenses in Blackduck's list, and
compare it with the OSI "popular, widely used, or with strong communities"
category.  Let's further combine the two versions of GPL into one, and
likewise with LGPL, as well as the 2-clause and 3-clause BSD licenses.
If we then merge the ordered Blackduck top-14 list with the unordered
OSI category and put them in Blackduck order, we get:

GPL, Apache, MIT, BSD, Artistic*, LGPL, EPL, CPOL**, MS-PL*, MPL, CDDL.

[*] Not in the OSI category.

[**] Not OSI certified at all; somewhat similar to the Apache license.

So it's basically a distinction without a difference.

-- 
"The serene chaos that is Courage, and the phenomenon   cowan <at> ccil.org
of Unopened Consciousness have been known to the        John Cowan
Great World eons longer than Extaboulism."
"Why is that?" the woman inquired.

Tzeng, Nigel H. | 13 Nov 17:18 2012

Re: objective criteria for license evaluation

Unless you do open source using Perl or C#.  Two widely used languages
with strong communities backing them.

Since it is a distinction without a difference in your opinion then may we
assume that you should have absolutely no problems with adopting such a
metrics driven list?

On 11/13/12 1:25 AM, "John Cowan" <cowan <at> mercury.ccil.org> wrote:

>Luis Villa scripsit:
>
>> What objective, factual criteria would you use to supplement or
>> replace the current categories?
>> 
>> Ideally, suggestions for criteria would include either:
>> 
>> 1) a reliable third-party data source (like the blackduck survey Nigel
>> pointed to)
>
>Well, let's examine the current top 14 licenses in Blackduck's list, and
>compare it with the OSI "popular, widely used, or with strong communities"
>category.  Let's further combine the two versions of GPL into one, and
>likewise with LGPL, as well as the 2-clause and 3-clause BSD licenses.
>If we then merge the ordered Blackduck top-14 list with the unordered
>OSI category and put them in Blackduck order, we get:
>
>GPL, Apache, MIT, BSD, Artistic*, LGPL, EPL, CPOL**, MS-PL*, MPL, CDDL.
>
>[*] Not in the OSI category.
>

John Cowan | 13 Nov 17:46 2012

Re: objective criteria for license evaluation

Tzeng, Nigel H. scripsit:

> Unless you do open source using Perl or C#.  Two widely used languages
> with strong communities backing them.

AFAIK most Perl work is done using a GPL/Artistic disjunction.
I know there is a lot of C# in the world as a whole; how heavily is
it used for open-source work, and how much of that is under the MS-PL?
(These questions are not rhetorical.)

> Since it is a distinction without a difference in your opinion then
> may we assume that you should have absolutely no problems with adopting
> such a metrics driven list?

Personally I would have no problem with it, excluding of course any
licenses that are not OSI certified.

The problem of course is when to stop.  I would be content to chop off
all licenses with less than 5% market share at Blackduck, which would
give a short and sweet list:  GPL (43%), Apache (13%), MIT (11%), LGPL
(9%), BSD (7%), Artistic (6%).  I think all further concerns would be
satisfied by a strong recommendation that if you are working within
a particular community, to use the standard license of that community
whatever it is.  To meet the objection that some of these licenses are
legacy, it would be interesting to see a crosstab of number of projects
started in a given year vs. their licenses, assuming that relicensing
events are rare enough to ignore.
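
A rough sketch, in Python, of the cut I have in mind (the family totals
are the Blackduck-derived percentages quoted above; the split between
versions inside each family is my own guess and doesn't affect the result):

    # Rough sketch: collapse license versions into families, then apply a
    # market-share cutoff.  Per-version splits are guesses; the family
    # totals match the percentages quoted above.
    shares = {
        "GPL 2.0": 32.0, "GPL 3.0": 11.0,
        "LGPL 2.1": 6.0, "LGPL 3.0": 3.0,
        "BSD 3-clause": 6.0, "BSD 2-clause": 1.0,
        "Apache 2.0": 13.0, "MIT": 11.0, "Artistic": 6.0,
        "EPL": 2.0, "MS-PL": 1.0, "MPL": 1.0, "CDDL": 0.5,
    }
    families = {
        "GPL 2.0": "GPL", "GPL 3.0": "GPL",
        "LGPL 2.1": "LGPL", "LGPL 3.0": "LGPL",
        "BSD 3-clause": "BSD", "BSD 2-clause": "BSD",
    }

    combined = {}
    for name, pct in shares.items():
        family = families.get(name, name)
        combined[family] = combined.get(family, 0.0) + pct

    THRESHOLD = 5.0  # percent market share
    shortlist = sorted((f for f, pct in combined.items() if pct >= THRESHOLD),
                       key=lambda f: -combined[f])
    print(shortlist)  # ['GPL', 'Apache 2.0', 'MIT', 'LGPL', 'BSD', 'Artistic']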

(Note: I got the ordering wrong in my last post through failing to add
LGPL 2.1 and LGPL 3.0 numbers.)

Tzeng, Nigel H. | 13 Nov 18:48 2012

Re: objective criteria for license evaluation

Top 10 seems reasonable. If you collapse GPL into 1 category and LGPL into
one then that leaves:

GPL 44%
Apache 13%
MIT 11%
LGPL 9%
BSD 7%
Artistic 6%
EPL 2%
MS-PL 1%
MPL 1%
CDDL <1%

You could do top 9 with a 1% threshold but 10 is a rounder number and
there still are some significant projects/technologies using CDDL
(NetBeans, etc).  To not include MPL in the list would also strike me as
odd given the significance of some of the projects under MPL and the
(IMHO) importance of that license to Open Source in general.

As for C# these days I only loosely follow MonoMac and MonoTouch (mostly
from a lack of desire to learn ObjC) so I'm not the right one to ask.

That MS-PL is used more often than MPL surprises me.  I would not have
guessed that.

On 11/13/12 11:46 AM, "John Cowan" <cowan <at> mercury.ccil.org> wrote:

>Tzeng, Nigel H. scripsit:
>

Luis Villa | 13 Nov 18:58 2012

Re: objective criteria for license evaluation

At least some of these slightly unusual/surprising results are
probably a result of methodology; e.g., many CPOL "projects" are just
a code snippet, and most are just a file or two, so treating each of
them the same as other projects probably doesn't reflect the real
scope of license usage there.
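
To make that concrete, here is a toy sketch (Python; every number in it
is invented) of how a raw project count and a size-weighted count can
diverge:

    # Toy illustration: raw project counts vs. size-weighted counts.
    # All figures below are invented purely to show the divergence.
    projects = [
        # (license, lines_of_code)
        ("CPOL", 150), ("CPOL", 80), ("CPOL", 200),   # many tiny snippets
        ("MPL", 400000), ("MPL", 250000),             # a few large codebases
    ]

    def share(projects, weight):
        totals = {}
        for lic, loc in projects:
            totals[lic] = totals.get(lic, 0) + weight(loc)
        grand = sum(totals.values())
        return {lic: round(100.0 * t / grand, 2) for lic, t in totals.items()}

    print("by project count:", share(projects, lambda loc: 1))    # CPOL 60%, MPL 40%
    print("by lines of code:", share(projects, lambda loc: loc))  # CPOL ~0.07%, MPL ~99.93%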

As a general matter, I'd be hesitant to rely on any one source for
popularity numbers without a fair amount of transparency around the
data gathering methodology.

On Tue, Nov 13, 2012 at 9:48 AM, Tzeng, Nigel H. <Nigel.Tzeng <at> jhuapl.edu> wrote:
> Top 10 seems reasonable. If you collapse GPL into 1 category and LGPL into
> one then that leaves:
>
> GPL 44%
> Apache 13%
> MIT 11%
> LGPL 9%
> BSD 7%
> Artistic 6%
> EPL 2%
> MS-PL 1%
> MPL 1%
> CDDL <1%
>
> You could do top 9 with a 1% threshold but 10 is a rounder number and
> there still are some significant projects/technologies using CDDL
> (NetBeans, etc).  To not include MPL in the list would also strike me as
> odd given the significance of some of the projects under MPL and the
> (IMHO) importance of that license to Open Source in general.

Tzeng, Nigel H. | 13 Nov 19:52 2012

Re: objective criteria for license evaluation

I am unaware of another comprehensive dataset but I also haven't seriously
gone looking for one either.  The raw data is available and there is an
API to access it.
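
(I haven't actually scripted against it; something along the lines below
ought to work, but the endpoint parameters and the JSON field names are
placeholders you would have to check against their real API documentation.)

    # Placeholder sketch only: the query parameters and JSON field names
    # are guesses, not the documented OSRC API; check the real API docs.
    import requests

    OSRC_URL = "http://osrc.blackducksoftware.com/data/licenses/"  # hypothetical JSON endpoint

    resp = requests.get(OSRC_URL, params={"format": "json"}, timeout=30)
    resp.raise_for_status()
    for entry in resp.json():  # assumed shape: [{"license": ..., "share": ...}, ...]
        print(entry["license"], entry["share"])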

Ideally the OSI would collect such metrics but I understand the financial
limitations that the OSI is operating under.

On the other hand things like Ohloh and the OSRC metrics really should
have been OSI initiatives and perhaps with the affiliates and individual
memberships you can now afford to do things like this.

This is off-topic but I would be more inclined to join as an individual
member if there was more clarity into what concrete benefits the OSI will
provide (beyond advocacy) and what membership governance really means. Do
members get to vote for board members?

I can easily give the OSI $40 a year but I don't care to pay for just
advocacy.  Although I do wonder why membership costs $5 more to join the
OSI than the ACLU or NRA with actual DC lobbies and lawyers...the AARP is
a bargain at $16.

You guys should seriously consider an associate/student membership at a
$15-$20 rate.

On 11/13/12 12:58 PM, "Luis Villa" <luis <at> tieguy.org> wrote:

>At least some of these slightly unusual/surprising results are
>probably a result of methodology; e.g., many CPOL "projects" are just
>a code snippet, and most are just a file or two, so treating each of
>them the same as other projects probably doesn't reflect the real

Luis Villa | 5 Dec 00:28 2012

Re: objective criteria for license evaluation

Sorry about the long lag in response here; I thought some followup had
gone to this list but apparently not.

On Tue, Nov 13, 2012 at 10:52 AM, Tzeng, Nigel H.
<Nigel.Tzeng <at> jhuapl.edu> wrote:
> I am unaware of another comprehensive dataset but I also haven't seriously
> gone looking for one either.  The raw data is available and there is an
> API to access it.

Sure! Obviously this is better than nothing; I'm just hoping others
can point to other resources we could blend/combine with this one.
(And of course we can ask Ohloh about providing more context so that
we can more reliably use the existing numbers.)

> Ideally the OSI would collect such metrics but I understand the financial
> limitations that the OSI is operating under.
>
> On the other hand things like Ohloh and the OSRC metrics really should
> have been OSI initiatives and perhaps with the affiliates and individual
> memberships you can now afford to do things like this.

It is certainly the kind of thing that I'd love to see us push for
with more resources. And it isn't just a resource thing; I think in
general OSI is ready to be more proactive about this kind of thing
than we have been in years past - we see more things, like this one,
as potential topics for action. I definitely hope we're not just
limited to advocacy.

Anyone else have other suggestions for objective criteria we could
use? I know some folks here have been thinking about this issue for

Karl Fogel | 5 Dec 16:23 2012

Re: objective criteria for license evaluation

Luis Villa <luis <at> tieguy.org> writes:
>Anyone else have other suggestions for objective criteria we could
>use? I know some folks here have been thinking about this issue for
>some time.

Number of "forks" of software under a given license on GitHub, adjusted
for license popularity across GitHub?  (And the equivalent calculation
for other sites, where possible.)
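
Roughly something like the following back-of-the-envelope calculation;
the input numbers would have to come from the GitHub API or a crawl,
and the ones below are made up:

    # Back-of-the-envelope sketch of "forks adjusted for license popularity":
    # forks observed per license, normalized by how many repos carry that
    # license at all.  Every number below is made up.
    forks_by_license = {"MIT": 900000, "GPL": 600000, "Apache": 450000}
    repos_by_license = {"MIT": 300000, "GPL": 150000, "Apache": 120000}

    total_repos = sum(repos_by_license.values())
    for lic in forks_by_license:
        popularity = repos_by_license[lic] / total_repos   # share of all repos
        forks_per_repo = forks_by_license[lic] / repos_by_license[lic]
        print("%-7s %.1f forks/repo (license share %.0f%%)"
              % (lic, forks_per_repo, popularity * 100))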

Matthew Flaschen | 5 Dec 22:36 2012

Re: objective criteria for license evaluation

On 12/05/2012 10:23 AM, Karl Fogel wrote:
> Luis Villa <luis <at> tieguy.org> writes:
>> Anyone else have other suggestions for objective criteria we could
>> use? I know some folks here have been thinking about this issue for
>> some time.
> 
> Number of "forks" of software under a given license on GitHub, adjusted
> for license popularity across GitHub?  (And the equivalent calculation
> for other sites, where possible.)

That could be misleading, depending on what we want to measure.  There
are a lot of forks doing real work (either true forks, or those that do
ongoing pull requests to keep synced).

However, there are also people that fork and make one or two changes, or
none at all.  There's nothing wrong with that, it just might not be a
meaningful metric for this purpose.

Matt Flaschen

Karl Fogel | 7 Dec 00:35 2012

Re: objective criteria for license evaluation

Matthew Flaschen <matthew.flaschen <at> gatech.edu> writes:
>On 12/05/2012 10:23 AM, Karl Fogel wrote:
>> Luis Villa <luis <at> tieguy.org> writes:
>>> Anyone else have other suggestions for objective criteria we could
>>> use? I know some folks here have been thinking about this issue for
>>> some time.
>> 
>> Number of "forks" of software under a given license on GitHub, adjusted
>> for license popularity across GitHub?  (And the equivalent calculation
>> for other sites, where possible.)
>
>That could be misleading, depending on what we want to measure.  There
>are a lot of forks doing real work (either true forks, or those that do
>ongoing pull requests to keep synced).
>
>However, there are also people that fork and make one or two changes, or
>none at all.  There's nothing wrong with that, it just might not be a
>meaningful metric for this purpose.

Of course.  I meant that as a direction to look in, not as a literal
suggestion of methodology.  By number of forks at GitHub, I meant "look
at the forks, using some kind of intelligent criteria, statistical
methods, etc".

This is non-trivial work, of course, which is why it is so hard to get
good stats on license popularity and why the notion is rife with
fundamental definitional questions.

Luis Villa | 9 Dec 19:46 2012

Re: objective criteria for license evaluation

I'm a little surprised at how quiet this thread has been, especially
since I know some members of this list have been calling for objective
criteria for a while.

So let me restate the question to broaden it a bit. If you had a
*blue-sky dream* what subjective information would you look at?

For example, if you had the resources to scan huge numbers of code
repositories, what numbers would you look for?

* ranking by LoC under each license
* ranking by "projects" under each license
* ... ?

Similarly, if you could declare objective criteria for textual license
analysis and had the time/resources to read all of them, what would
those criteria be? e.g.,

* has/has not been retired by the author
* has/has not been obsoleted by a new license published by the same author
* has/doesn't have an explicit patent grant
* ... ?

These examples assume that quantitative measures of adoption, the text
itself, and the explicit actions of the author are the only things about
a license that can actually be measured, but I am probably thinking
small; other examples welcome.
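
(To make the textual-criteria idea concrete: I imagine each license
eventually carrying a small structured record of such facts, something
like the sketch below; the field names and the MPL 1.1 example values
are just mine, not any official schema.)

    # Illustrative only: one possible record of objective, textual facts
    # about a license.  Field names and sample values are not an OSI schema.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LicenseFacts:
        spdx_id: str
        retired_by_author: bool
        obsoleted_by: Optional[str]    # successor license from the same author, if any
        explicit_patent_grant: bool

    mpl11 = LicenseFacts(
        spdx_id="MPL-1.1",
        retired_by_author=False,
        obsoleted_by="MPL-2.0",        # newer license published by the same author
        explicit_patent_grant=True,
    )
    print(mpl11)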

[As a reminder, this is not a purely theoretical exercise; I agree
with many on this list that a license process based on more objective

Lawrence Rosen | 9 Dec 23:14 2012

Re: objective criteria for license evaluation

Hi Luis,

There are many useful ways to cut the data. Even raw statistics on number of
lines of code under each license; number of independent foundations/projects
that have adopted each license; types of software under each license; etc.
can be interesting. I'd like to know which licenses are used by government
agencies; for-profit software companies; non-profits. Most useful would be a
way of listing "large" or "important" projects and the licenses they use, as
long as the list of such projects is broad and comprehensive. 
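
Even a simple cross-tabulation would tell us a lot, if the underlying
data existed. For illustration only (the rows below are invented, not
collected data):

    # Illustrative cross-tab of adopter type vs. license.  The rows are
    # invented examples standing in for real collected data.
    from collections import Counter

    adoptions = [
        ("government", "Apache"), ("government", "BSD"),
        ("for-profit", "GPL"), ("for-profit", "Apache"), ("for-profit", "MIT"),
        ("non-profit", "GPL"), ("non-profit", "MPL"),
    ]

    crosstab = Counter(adoptions)  # (adopter type, license) -> count
    for (adopter, lic), n in sorted(crosstab.items()):
        print("%-12s %-8s %d" % (adopter, lic, n))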

I have no idea how Black Duck or others calculate their statistics nor what
is included in their samples, so the lack of methodological openness is more
of a problem than the availability of "statistics". I hope that OSI can
address these questions as scientists would, rather than as religious
zealots for one sect or another.

Regarding the classification of licenses, I think it is most important to
categorize licenses in the same business-related terminology that relates to
business models. So you need to identify which licenses ignore or have
antiquated provisions regarding patents, and why that might matter; which
licenses require reciprocity; whether that reciprocity includes use by third
parties over a network or whether it is a "strong" or "weak" reciprocity;
which licenses contain defensive suspension provisions (patent only or
copyright also) that require due diligence before reliance on that software;
which licenses are definitely incompatible with each other for derivative
work purposes; which licenses are approved for use by the US or other
governments; which contain attribution requirements beyond a subset of basic
requirements; which contain jurisdiction or governing law provisions; etc.
Of course, OSI should identify licenses that have been superseded or
withdrawn by the author.

Engel Nyst | 27 Dec 18:46 2012

Re: objective criteria for license evaluation

Hello license-discuss,

As a software developer interested in raising awareness of open
licensing, building a community around an Open Source project, and
helping myself and everyone involved understand and choose their open
licenses, I very much welcome this discussion. I admit I have been
looking for somewhat more guidance on the OSI pages over the last couple
of years than is currently available. Please don't take that as
criticism; it is simply a need for pointers.

If I may share a few thoughts from this user-side experience: I think
the OSI pages could greatly help if they contained hints or assistance,
in particular for:

On 12/10/12, Lawrence Rosen <lrosen <at> rosenlaw.com> wrote:
> Regarding the classification of licenses, I think it is most important to
> categorize licenses in the same business-related terminology that relates
> to
> business models. So you need to identify which licenses ignore or have
> antiquated provisions regarding patents, and why that might matter; which
> licenses require reciprocity; whether that reciprocity includes use by
> third
> parties over a network or whether it is a "strong" or "weak" reciprocity;

I quote this first and foremost for the reciprocity criterion. I think
it's essential, not only for developers looking for a license, but also
for developers and the wider community to understand open licenses,
their effects, and their goals; in short, for an educational purpose.

My own (poor) attempt at it has been the simplified and easy to

Gervase Markham | 10 Dec 11:57 2012

Re: objective criteria for license evaluation

On 09/12/12 18:46, Luis Villa wrote:
> So let me restate the question to broaden it a bit. If you had a
> *blue-sky dream* what subjective information would you look at?
>
> For example, if you had the resources to scan huge numbers of code
> repositories, what numbers would you look for?
>
> * ranking by LoC under each license
> * ranking by "projects" under each license
> * ... ?

If we are blue-sky dreaming, then I would like to rank by "_useful_, 
unique lines of code under each license". "Useful" in the sense that 
some half-finished barely-compiling "my first Windows CD player" on 
Sourceforge counts for nothing, whereas jQuery counts for a lot. 
"Unique", in the sense that I shouldn't be able to game the stats by 
going to github and forking every project with my preferred license.
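
(By way of illustration only: "unique" could be approximated by hashing 
file contents across the whole corpus and counting each distinct blob 
once, along the lines below; a real crawler would obviously need to be 
much smarter about near-duplicates.)

    # Sketch: count lines of code once per unique file content, so that a
    # thousand byte-identical forks don't inflate a license's numbers.
    # How the corpus is gathered and licenses detected is hand-waved here.
    import hashlib

    def unique_loc_by_license(files):
        """files: iterable of (license, file_bytes) pairs."""
        seen = set()
        totals = {}
        for lic, blob in files:
            digest = hashlib.sha1(blob).hexdigest()
            if digest in seen:          # identical to a file already counted
                continue
            seen.add(digest)
            totals[lic] = totals.get(lic, 0) + blob.count(b"\n")
        return totals

    corpus = [
        ("MIT", b"function a() {}\n"),
        ("MIT", b"function a() {}\n"),  # byte-identical fork copy, not re-counted
        ("GPL", b"int main(void) { return 0; }\n"),
    ]
    print(unique_loc_by_license(corpus))  # {'MIT': 1, 'GPL': 1}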

> Similarly, if you could declare objective criteria for textual license
> analysis and had the time/resources to read all of them, what would
> those criteria be? e.g.,
>
> * has/has not been retired by the author

This is important; however, some licenses, such as the HPND, have no 
identified author and yet are deprecated.

> * has/has not been obsoleted by a new license published by the same author

- one can imagine a license which has been obsoleted by its author but 

Richard Fontana | 10 Dec 16:58 2012

Re: objective criteria for license evaluation

On Mon, Dec 10, 2012 at 10:57:10AM +0000, Gervase Markham wrote:
> On 09/12/12 18:46, Luis Villa wrote:
> >So let me restate the question to broaden it a bit. If you had a
> >*blue-sky dream* what subjective information would you look at?
> >
> >For example, if you had the resources to scan huge numbers of code
> >repositories, what numbers would you look for?
> >
> >* ranking by LoC under each license
> >* ranking by "projects" under each license
> >* ... ?
> 
> If we are blue-sky dreaming, then I would like to rank by "_useful_,
> unique lines of code under each license". "Useful" in the sense that
> some half-finished barely-compiling "my first Windows CD player" on
> Sourceforge counts for nothing, whereas jQuery counts for a lot.
> "Unique", in the sense that I shouldn't be able to game the stats by
> going to github and forking every project with my preferred license.

I can also imagine other metrics of license popularity. Download
statistics are problematic, but they are the usual metric for distro
popularity. One might be able to measure the size of contributor and
user communities (numbers of committers, numbers of unique patch
authors for a given release, subscriptions to mailing lists...?).
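
(As one concrete example, counting unique commit authors between two
release tags of a git repository is little more than a wrapper around
git shortlog; the tag names below are just examples.)

    # Count unique commit authors between two release tags of a git repo.
    # Tag names are examples; run against a local clone of the project.
    import subprocess

    def unique_authors(repo_path, old_tag, new_tag):
        out = subprocess.run(
            ["git", "-C", repo_path, "shortlog", "-sne", "%s..%s" % (old_tag, new_tag)],
            capture_output=True, text=True, check=True,
        ).stdout
        return len(out.splitlines())  # shortlog -s prints one line per author

    print(unique_authors(".", "v1.0", "v2.0"))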

[...]
> I think there is also a place for "lawyers generally think it's
> vague and has sub-optimal word choice", which might apply to e.g.
> Artistic v1.


Luis Villa | 10 Dec 18:23 2012

Re: objective criteria for license evaluation

On Mon, Dec 10, 2012 at 2:57 AM, Gervase Markham <gerv <at> mozilla.org> wrote:
> On 09/12/12 18:46, Luis Villa wrote:
>>
>> So let me restate the question to broaden it a bit. If you had a
>> *blue-sky dream* what subjective information would you look at?

By the way, I think this was probably obvious from the rest of the
email, but I meant *objective* here.

>> For example, if you had the resources to scan huge numbers of code
>> repositories, what numbers would you look for?
>>
>> * ranking by LoC under each license
>> * ranking by "projects" under each license
>> * ... ?
>
>
> If we are blue-sky dreaming, then I would like to rank by "_useful_, unique
> lines of code under each license". "Useful" in the sense that some
> half-finished barely-compiling "my first Windows CD player" on Sourceforge
> counts for nothing, whereas jQuery counts for a lot. "Unique", in the sense
> that I shouldn't be able to game the stats by going to github and forking
> every project with my preferred license.

How to define "useful" objectively? Size is the obvious,
plausibly-obtainable proxy here for "useful"- "projects over X LOC" or
something like that. I suppose if you had a custom crawler that had
knowledge of git/svn/cvs/etc., you could do "projects over 5
committers" or "projects with over 100 commits" or something along
those lines. Richard suggests community size, which would be great but

Gervase Markham | 10 Dec 18:31 2012

Re: objective criteria for license evaluation

On 10/12/12 17:23, Luis Villa wrote:
> How to define "useful" objectively? Size is the obvious,
> plausibly-obtainable proxy here for "useful"- "projects over X LOC" or
> something like that. I suppose if you had a custom crawler that had
> knowledge of git/svn/cvs/etc., you could do "projects over 5
> committers" or "projects with over 100 commits" or something along
> those lines. Richard suggests community size, which would be great but
> is probably not computable, no matter how many people/how much money
> you throw at it.

Perhaps we could have multiple criteria: either size, or being used 
in > N other projects, if there were some way of detecting that. 
Some modern SCMs now allow you to explicitly pull in other repos; 
perhaps that could be detected.
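
(Git submodules would be one detectable signal, for instance: a crawler
could do something as crude as the following, tallying how often each
external repository URL is referenced from .gitmodules files across a
corpus of local checkouts.)

    # Crude sketch: tally how often each external repository URL appears in
    # .gitmodules files found under a local corpus of checked-out projects.
    import configparser
    import os
    from collections import Counter

    def submodule_url_counts(corpus_root):
        counts = Counter()
        for dirpath, _dirs, files in os.walk(corpus_root):
            if ".gitmodules" not in files:
                continue
            cfg = configparser.ConfigParser()
            cfg.read(os.path.join(dirpath, ".gitmodules"))
            for section in cfg.sections():      # e.g. 'submodule "libfoo"'
                url = cfg.get(section, "url", fallback=None)
                if url:
                    counts[url] += 1
        return counts

    print(submodule_url_counts("/path/to/corpus").most_common(10))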

>> This is important; however, some licenses, such as the HPND, have no
>> identified author and yet are deprecated.
>
> Deprecated by *who*? :) (Note that we don't even have a "deprecated"
> category right now; we've only gotten as far as "redundant with more
> popular licenses.")

Well, http://opensource.org/licenses/HPND says:

"This License has been voluntarily deprecated by its author."

:-P

>>> * has/doesn't have an explicit patent grant
>>

