Opened 4 years ago

Closed 4 years ago

#553 closed defect (fixed)

Make inferring Python str type unsafe

Reported by: craigcitro
Owned by: somebody
Priority: major
Milestone: 0.13
Component: Code Generation
Keywords:
Cc:

Description

Currently, the type inferencer is happy to infer any Python type. However, once it infers that something is of type str, Cython no longer allows automatic coercion between that object and a char *. While one could argue that this is morally a good idea, it's going to break huge amounts of existing user code (for instance in Sage).

An easy fix is to simply make the inference of str unsafe. This means that it's off by default, so users won't hit it by accident. Of course, this is frustrating in the case that a user does want to try out "unsafe" type inference, or if we want to start inferring str for other reasons. There are at least three obvious things we can do to fix this in the long term:

  • Revisit the issue of str <--> char * coercion, at least for Py2
  • Separate the "infer str type" decision from other type inference, via any of the usual mechanisms
  • Make people change their existing code.
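As a rough illustration of what the third option could look like in user code, here is a minimal sketch in plain Python. The helper name as_bytes and the default encoding are assumptions made for this example, not anything from Cython or Sage. The idea is to convert text to a byte string explicitly before handing it to anything that expects a char *.

```python
def as_bytes(value, encoding="utf-8"):
    """Return `value` as a byte string, encoding text explicitly.

    Hypothetical helper for illustration only. On Python 2, `str` is
    `bytes`, so the first branch handles it and nothing is encoded.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Python 3 text string: the encoding has to be chosen explicitly.
        return value.encode(encoding)
    raise TypeError("expected a text or byte string, got %s" % type(value).__name__)


name = as_bytes("abc")        # text in, bytes out
raw = as_bytes(b"abc")        # bytes pass through unchanged
```

In Cython, the result of such a helper is a bytes object, which is the kind of value that can be coerced to a char * regardless of the Python version.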

Change History (5)

comment:1 Changed 4 years ago by scoder

If you allow str<->char* coercion to make the code work in Py2, people will write code that will only work in Py2, and will then have to fix their code when they notice that they may actually want it to run in Py3 as well. So disallowing it saves time and effort.

comment:2 Changed 4 years ago by scoder

Personally, I think "str" is something that users should really only use with great care. It's a plain Python type that cannot be mapped to any C type. Even worse, it's not a fixed type at all, but a kind of meta type that changes depending on the C compile time/Python runtime environment, and that behaves differently in different environments. For example, str.decode() does not exist in Py3, whereas str.encode() behaves unpredictably in Py2. So the actual usefulness of this type is clearly limited, almost exclusively to error messages and docstrings, IMHO.
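The "meta type" point can be checked from plain Python. The following is only a small sketch; the variable names are made up for this example. It inspects what str means in whichever interpreter runs it.

```python
import sys

PY3 = sys.version_info[0] >= 3

# On Python 2, `str` is the byte string type; on Python 3 it is the text type.
str_is_bytes = str is bytes

# str.decode() exists on Python 2 only; str.encode() exists on both.
has_decode = hasattr(str, "decode")
has_encode = hasattr(str, "encode")

print("str is bytes:", str_is_bytes)
print("str has decode():", has_decode)
print("str has encode():", has_encode)
```

The first two values flip between Py2 and Py3, which is why code that relies on a particular meaning of str is not portable.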

The problem here is not the type inference, but the type mutation of str. The type inference actually does the right thing and makes user errors more easily apparent.

comment:3 Changed 4 years ago by robertwb

Certainly str should be used with great care, but as of http://wiki.cython.org/enhancements/stringliterals there are many things that are (implicitly) typed as str. We could just disable this for literals, whereas explicitly typed strs (and the whole inference mechanisms) would not have to be changed.

comment:4 Changed 4 years ago by scoder

Special-casing literals would lead to other hard-to-understand behaviour. For example, this currently works correctly:

cdef char* cs = "abc"

whereas this will lead to a Cython compile error, as it is not Py3 compatible:

pys = "abc"
cdef char* cs = pys

Not inferring 'str' for literals would mean that the latter would compile in Cython and then fail when running the code in Py3. I think that failing early is a virtue here.
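For comparison, the kind of late failure described above can be reproduced in plain Python with ctypes, which also has to turn a Python object into a char *. This is only an analogy and does not show how Cython itself behaves; the helper name to_char_p is made up for this sketch.

```python
import ctypes


def to_char_p(obj):
    """Try to hand `obj` to C as a char*; return None if ctypes refuses it."""
    try:
        return ctypes.c_char_p(obj)
    except TypeError:
        # On Python 3, a text string is rejected at runtime.
        return None


accepted = to_char_p(b"abc")   # byte string: accepted
rejected = to_char_p("abc")    # text string on Python 3: TypeError, so None
```

On Python 3 the second call only fails when that line is actually executed, which is the situation that a compile-time error avoids.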

comment:5 Changed 4 years ago by robertwb

  • Resolution set to fixed
  • Status changed from new to closed

While not all corner cases have been resolved, this has been fixed enough for the latest release.
