Ticket #46 (closed defect: fixed)

Opened 5 years ago

Last modified 5 years ago

slicing string literals is broken

Reported by: Stefan Behnel
Owned by: somebody
Priority: major
Milestone: 0.9.8.1
Component: Code Generation
Keywords:
Cc:

Description

I added a test case "run/literalslicing.pyx" to show that slicing string literals is broken. Basically, when you do

py_result = "abc"[2]

Cython will generate C code that accesses the C byte string directly and retrieves the char at index 2 (the third one). Fast and simple, but this triggers two bugs.

1) The result is a C char, not a Python string. Converting it to a Python object will return an int instead of a single-character string.

2) Doing the same with a unicode string will access the (multi-byte) UTF-8 representation of the string and thus return the wrong byte if the string contains non-ASCII characters.
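Both effects can be reproduced in plain Python 3 as a rough analogy, since indexing a bytes object there also yields an integer. This is only a sketch of the symptoms, not the C code Cython generates; the variable names and the sample string "a\u00e4b" are made up for illustration.

```python
# Bug 1 analogy: indexing a byte string yields an integer,
# while indexing a text string yields a one-character string.
byte_item = b"abc"[2]   # integer code of "c"
char_item = "abc"[2]    # one-character string "c"

# Bug 2 analogy: indexing the UTF-8 encoding is not the same
# as indexing the characters of the unicode string.
text = "a\u00e4b"                # "a", a-umlaut, "b": three characters
encoded = text.encode("utf-8")   # four bytes, the umlaut takes two
char_at_2 = text[2]              # third character, "b"
byte_at_2 = encoded[2]           # second byte of the encoded umlaut

print(byte_item, repr(char_item))
print(repr(char_at_2), hex(byte_at_2))
```

With the byte-level access, index 2 lands in the middle of the encoded non-ASCII character, so neither the type nor the value matches what character indexing would give.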

Change History

Changed 5 years ago by Stefan Behnel

Regarding the unicode issue, I think it might be better to have a dedicated unicode type after all. That would allow us to give it non-C semantics while keeping the fast C semantics for byte strings.

Changed 5 years ago by robertwb

  • status changed from new to closed
  • resolution set to fixed

Changed 5 years ago by robertwb

  • milestone set to 0.9.8.1