anatoly techtonik | 18 Jul 2012 19:30

'' in 'abc' == True

I've just spotted an inconsistency between string and list handling:

>>> '' in 'abc'
True
>>> '' in 'abc'.split()
False
>>> [] in ['a', 'b', 'c']
False

Why do strings behave differently here than other sequence types? Is that
by design?

Re: '' in 'abc' == True

Strings have a common use case which lists do not: finding
subsequences/substrings.

Consider the following:

>>> string = 'abcdefghijklmnop'
>>> 'def' in string
True
>>> list('def') in list(string)
False

For a string, the contains operator ("in") has a different meaning than it
has for a list. A list contains an object if (and only if) that
object is a single element of the list. A string contains another string
if (and only if) the other string is a substring of the first string.
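
As a minimal sketch of that distinction, element membership and a
contiguous-subsequence search can be put side by side. The helper name
`contains_sublist` is hypothetical (it is not part of Python); it only
mimics for lists what `in` already does for strings:

```python
def contains_sublist(haystack, needle):
    """Return True if needle occurs as a contiguous run inside haystack.

    Hypothetical helper for illustration; lists have no built-in
    counterpart to the substring test that str provides through `in`.
    """
    n = len(needle)
    # When needle is empty, the range always contains position 0 and the
    # empty slice compares equal to it, so the empty list always matches.
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


string = 'abcdefghijklmnop'

substring_hit = 'def' in string                    # substring test
element_hit = list('def') in list(string)          # element test
sublist_hit = contains_sublist(list(string), list('def'))
empty_sublist_hit = contains_sublist(['a', 'b', 'c'], [])
empty_substring_hit = '' in 'abc'
```

Under this reading, `'' in 'abc'` corresponds to asking whether the empty
run occurs somewhere in the string, not whether some element equals `''`.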

Matthew Lefavor

NASA GSFC [Microtel, LLC]
Mail Code 699.0/Org Code 582.0
matthew.lefavor@...
(301) 614-6818 (Desk)
(443) 758-4891 (Cell)

On 7/18/12 1:30 PM, "anatoly techtonik" <techtonik@...> wrote:

>I've just spotted inconsistency between string and lists handling:
>
>>>> '' in 'abc'
>True

Masklinn | 18 Jul 2012 19:43

Re: '' in 'abc' == True

On 2012-07-18, at 19:30 , anatoly techtonik wrote:

> I've just spotted inconsistency between string and lists handling:
> 
>>>> '' in 'abc'
> True
>>>> '' in 'abc'.split()
> False
>>>> [] in ['a', 'b', 'c']
> False
> 
> Why strings here behave differently than other sequence types? Is that
> by design?

Erm… yes? `in` would not be very useful for strings if you could only use
it to check for a single character, would it?

Masklinn | 18 Jul 2012 19:58

Re: '' in 'abc' == True

On 2012-07-18, at 19:43 , Masklinn wrote:
> On 2012-07-18, at 19:30 , anatoly techtonik wrote:
> 
>> I've just spotted inconsistency between string and lists handling:
>> 
>>>>> '' in 'abc'
>> True
>>>>> '' in 'abc'.split()
>> False
>>>>> [] in ['a', 'b', 'c']
>> False
>> 
>> Why strings here behave differently than other sequence types? Is that
>> by design?
> 
> Erm… yes? `in` would not be very useful for strings if you could only use
> it to check for a single character would it?

In fact, things used to work that way in older versions of Python; this was
specifically changed to the current behavior, *as noted in the documentation*:

> When s is a string or Unicode string object the in and not in
> operations act like a substring test. In Python versions before 2.3, x
> had to be a string of length 1. In Python 2.3 and beyond, x may be a
> string of any length.
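
The substring test described in that passage can be related to `str.find`:
the current language reference describes `x in y` for strings as equivalent
to `y.find(x) != -1`. A short sketch of that equivalence (the variable
names are chosen here purely for illustration):

```python
s = 'abc'

# For strings, `x in s` agrees with `s.find(x) != -1`.
candidates = ['', 'a', 'bc', 'abc', 'ac', 'abcd']
results = {x: (x in s, s.find(x) != -1) for x in candidates}

# The empty string is found at index 0, which is why '' in 'abc' is True.
empty_index = s.find('')
```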

A Python string, you may want to note, is a string. Not a sequence of
characters. The first item of a 1-character string is itself, all basic
(step-less) slices of a string are contained in itself (including itself
and the empty string), you can infinitely get the first item of a

Devin Jeanpierre | 18 Jul 2012 20:06

Re: '' in 'abc' == True

On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
> A Python string, you may want to note, is a string. Not a sequence of
> characters.

It's both (with the caveat that, in Python, a character is just a
string of length 1).

(See: http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy)
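
A small sketch of the "it's both" point, assuming Python 3 (where the
abstract base class lives in `collections.abc`): a `str` passes the
sequence check, and each of its items is again a `str` of length 1.

```python
from collections.abc import Sequence

s = 'abc'

is_sequence = isinstance(s, Sequence)   # str is registered as a Sequence
first = s[0]                            # indexing yields another str
first_type = type(first)
nested = s[0][0][0]                     # a 1-character str indexes to itself
self_contained = first in first         # and contains itself
```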

-- Devin

Masklinn | 18 Jul 2012 20:16

Re: '' in 'abc' == True

On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>> A Python string, you may want to note, is a string. Not a sequence of
>> characters.
> 
> It's both (with the caveat that, in Python, a character is just a
> string of length 1).

That's playing with words, especially comparing strings with Python 3
binaries which *do* actually have a separate "character" type
(reified to an integer).

So Python strings don't have reified characters: a string's item and a
slice of size 1 are essentially identical, which is pretty much unique
to them (as far as my knowledge of Python's sequences goes).

Which is not a bad thing, mind you, it makes working with strings much
more pleasant.
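
The item-versus-slice observation can be checked directly. This is a
sketch assuming Python 3 semantics for `bytes`; the variable names are
illustrative only:

```python
text = 'abc'
data = b'abc'
items = ['a', 'b', 'c']

str_same = text[0] == text[0:1]      # 'a' compared with 'a'
bytes_same = data[0] == data[0:1]    # 97 compared with b'a'
list_same = items[0] == items[0:1]   # 'a' compared with ['a']

byte_item_type = type(data[0])       # int in Python 3
byte_slice_type = type(data[0:1])    # bytes
```

Only for `str` do the item and the length-1 slice compare equal; for
`bytes` and `list` the item has a different type from the slice.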

Devin Jeanpierre | 18 Jul 2012 20:31

Re: '' in 'abc' == True

On Wed, Jul 18, 2012 at 2:16 PM, Masklinn <masklinn@...> wrote:
>> It's both (with the caveat that, in Python, a character is just a
>> string of length 1).
>
> That's playing with words, especially comparing strings with Python 3
> binaries which *do* actually have a separate "character" type
> (reified to an integer).

No it isn't. Strings are adherents to the sequence protocol. The
Python datatype reference echoes what I said, nearly exactly.

http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy

> So Python strings don't have reified characters, a string's item and a
> slice of size 1 are essentially identical which is pretty much unique
> to them (as far as my knowledge of Python's sequences go).

Nothing about that feature makes them not-sequences; instead, it makes
them a rather special kind of sequence.

-- Devin

Masklinn | 18 Jul 2012 21:02

Re: '' in 'abc' == True

On 2012-07-18, at 20:31 , Devin Jeanpierre wrote:

> On Wed, Jul 18, 2012 at 2:16 PM, Masklinn <masklinn@...> wrote:
>>> It's both (with the caveat that, in Python, a character is just a
>>> string of length 1).
>> 
>> That's playing with words, especially comparing strings with Python 3
>> binaries which *do* actually have a separate "character" type
>> (reified to an integer).
> 
> No it isn't. Strings are adherents to the sequence protocol. The
> Python datatype reference echoes what I said, nearly exactly.

This has no relevance to my messages, I have not claimed anywhere that
strings weren't sequences.

>> So Python strings don't have reified characters, a string's item and a
>> slice of size 1 are essentially identical which is pretty much unique
>> to them (as far as my knowledge of Python's sequences go).
> 
> Nothing about that feature makes them not-sequences; instead, it makes
> them a rather special kind of sequence.

I'm not sure why you're saying that. Again, I have never once claimed they
were not sequences (quite the opposite in fact). Why the strawmanning?

Devin Jeanpierre | 18 Jul 2012 22:02

Re: '' in 'abc' == True

On Wed, Jul 18, 2012 at 3:02 PM, Masklinn <masklinn@...> wrote:
> This has no relevance to my messages, I have not claimed anywhere that
> strings weren't sequences.

Sorry, I misinterpreted what you said. I'm tired. :(

-- Devin

Ethan Furman | 18 Jul 2012 21:32

Re: '' in 'abc' == True

Masklinn wrote:
> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>>> A Python string, you may want to note, is a string. Not a sequence of
>>> characters.
>> It's both (with the caveat that, in Python, a character is just a
>> string of length 1).
> 
> That's playing with words, especially comparing strings with Python 3
> binaries which *do* actually have a separate "character" type
> (reified to an integer).

Python 3 does not have a 'character' type; it has 'str' which is made up 
of more 'str's, and it has 'bytes' which is made up of 'int's (annoyingly).
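
A brief sketch of what each type yields when iterated in Python 3, along
with the two forms of `in` that a `bytes` object accepts (an `int` element
test and a `bytes` subsequence test):

```python
text = 'abc'
data = b'abc'

text_item_types = {type(ch) for ch in text}   # items of a str are str
data_item_types = {type(b) for b in data}     # items of bytes are int

int_member = 97 in data       # element test with an int; ord('a') == 97
sub_member = b'ab' in data    # subsequence test with a bytes object
empty_member = b'' in data    # empty bytes, mirroring '' in 'abc'
```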

~Ethan~

Georg Brandl | 20 Jul 2012 22:56

Re: '' in 'abc' == True

On 07/18/2012 09:32 PM, Ethan Furman wrote:
> Masklinn wrote:
>> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
>>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>>>> A Python string, you may want to note, is a string. Not a sequence of
>>>> characters.
>>> It's both (with the caveat that, in Python, a character is just a
>>> string of length 1).
>> 
>> That's playing with words, especially comparing strings with Python 3
>> binaries which *do* actually have a separate "character" type
>> (reified to an integer).
> 
> Python 3 does not have a 'character' type; it has 'str' which is made up 
> of more 'str's, and it has 'byte' which is made up of 'int's (annoyingly).

That's what he said.  Could we stop the annoying "but I know it better
than you without reading your message" please?

Georg

Ethan Furman | 20 Jul 2012 23:18

Re: '' in 'abc' == True

Georg Brandl wrote:
> On 07/18/2012 09:32 PM, Ethan Furman wrote:
>> Masklinn wrote:
>>> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
>>>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>>>>> A Python string, you may want to note, is a string. Not a sequence of
>>>>> characters.
>>>>
>>>> It's both (with the caveat that, in Python, a character is just a
>>>> string of length 1).
>>>
>>> That's playing with words, especially comparing strings with Python 3
>>> binaries which *do* actually have a separate "character" type
>>> (reified to an integer).
>>
>> Python 3 does not have a 'character' type; it has 'str' which is made up 
>> of more 'str's, and it has 'byte' which is made up of 'int's (annoyingly).
> 
> That's what he said.  Could we stop the annoying "but I know it better
> than you without reading your message" please?

I am having trouble equating what I said with what Masklinn said. 
Perhaps you could explain how they say the same thing instead of 
assuming I didn't read his message?

~Ethan~

Antoine Pitrou | 20 Jul 2012 23:27

Re: '' in 'abc' == True

On Fri, 20 Jul 2012 14:18:20 -0700
Ethan Furman <ethan@...> wrote:

> Georg Brandl wrote:
> > On 07/18/2012 09:32 PM, Ethan Furman wrote:
> >> Masklinn wrote:
> >>> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
> >>>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
> >>>>> A Python string, you may want to note, is a string. Not a sequence of
> >>>>> characters.
> >>>>
> >>>> It's both (with the caveat that, in Python, a character is just a
> >>>> string of length 1).
> >>>
> >>> That's playing with words, especially comparing strings with Python 3
> >>> binaries which *do* actually have a separate "character" type
> >>> (reified to an integer).
> >>
> >> Python 3 does not have a 'character' type; it has 'str' which is made up 
> >> of more 'str's, and it has 'byte' which is made up of 'int's (annoyingly).
> > 
> > That's what he said.  Could we stop the annoying "but I know it better
> > than you without reading your message" please?
> 
> I am having trouble equating what I said with what Masklinn said. 
> Perhaps you could explain how they say the same thing instead of 
> assuming I didn't read his message?

"Python 3 binaries" probably means "Python 3 bytes objects" above.


Andrew Svetlov | 20 Jul 2012 23:19

Re: '' in 'abc' == True

Masklinn's explanation is comprehensive and clear to me.

On Fri, Jul 20, 2012 at 11:56 PM, Georg Brandl <g.brandl@...> wrote:
> On 07/18/2012 09:32 PM, Ethan Furman wrote:
>> Masklinn wrote:
>>> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
>>>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>>>>> A Python string, you may want to note, is a string. Not a sequence of
>>>>> characters.
>>>> It's both (with the caveat that, in Python, a character is just a
>>>> string of length 1).
>>>
>>> That's playing with words, especially comparing strings with Python 3
>>> binaries which *do* actually have a separate "character" type
>>> (reified to an integer).
>>
>> Python 3 does not have a 'character' type; it has 'str' which is made up
>> of more 'str's, and it has 'byte' which is made up of 'int's (annoyingly).
>
> That's what he said.  Could we stop the annoying "but I know it better
> than you without reading your message" please?
>
> Georg

Steven D'Aprano | 21 Jul 2012 13:19

Re: '' in 'abc' == True

Andrew Svetlov wrote:
> Masklinn's explanation is comprehensive and clear to me.

I'm glad that it's clear to someone, because to me the straight-forward, 
literal meaning of Masklinn's explanation (that Python 3 has a character type, 
and they're integers) is wrong. Python has no built-in "Char" type, under any 
spelling, let alone one which is also a subset of int. The non-literal meaning 
is hard to understand. I *guess* that Masklinn is trying to get across that 
Python 3 strings are Unicode strings, and characters in Unicode are actually 
code points, which are implemented at the C level as integers. If not that, I 
have no idea.
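
For reference, the relationship between 1-character strings and integer
code points mentioned in that guess can be sketched with the built-ins
`ord` and `chr`. This only illustrates the `str` side of the discussion
and makes no claim about what Masklinn meant:

```python
ch = 'a'

code_point = ord(ch)            # integer code point of a 1-character str
round_trip = chr(code_point)    # back to a str of length 1
ch_type = type(ch)              # a "character" is still just a str
code_points = [ord(c) for c in 'abc']
```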

I've more or less forgotten why this was important, but I am enjoying watching 
people try to out-pedant each other :)

> On Fri, Jul 20, 2012 at 11:56 PM, Georg Brandl <g.brandl@...> wrote:
>> On 07/18/2012 09:32 PM, Ethan Furman wrote:
>>> Masklinn wrote:
>>>> On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
>>>>> On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@...> wrote:
>>>>>> A Python string, you may want to note, is a string. Not a sequence of
>>>>>> characters.
>>>>> It's both (with the caveat that, in Python, a character is just a
>>>>> string of length 1).
>>>> That's playing with words, especially comparing strings with Python 3
>>>> binaries which *do* actually have a separate "character" type
>>>> (reified to an integer).
>>> Python 3 does not have a 'character' type; it has 'str' which is made up
>>> of more 'str's, and it has 'byte' which is made up of 'int's (annoyingly).
>> That's what he said.  Could we stop the annoying "but I know it better

Devin Jeanpierre | 21 Jul 2012 15:59

Re: '' in 'abc' == True

On Sat, Jul 21, 2012 at 7:19 AM, Steven D'Aprano <steve@...> wrote:
> Andrew Svetlov wrote:
>>
>> Masklinn's explanation is comprehensive and clear to me.
>
>
> I'm glad that it's clear to someone, because to me the straight-forward,
> literal meaning of Masklinn's explanation (that Python 3 has a character
> type, and they're integers) is wrong. Python has no built-in "Char" type,
> under any spelling, let alone one which is also a subset of int. The
> non-literal meaning is hard to understand. I *guess* that Masklinn is trying
> to get across that Python 3 strings are Unicode strings, and characters in
> Unicode are actually code points, which are implemented at the C level as
> integers. If not that, I have no idea.

You're pretty far off. He was talking about bytes objects, not str objects.

-- Devin

Serhiy Storchaka | 21 Jul 2012 21:36

Re: '' in 'abc' == True

On 21.07.12 14:19, Steven D'Aprano wrote:
> I'm glad that it's clear to someone, because to me the straight-forward,
> literal meaning of Masklinn's explanation (that Python 3 has a character
> type, and they're integers) is wrong.

Yes, but Masklinn did not claim this.
