[melkjug-dev] 2-3x speedup from using utf-8 throughout pylons instead of unicode?
Joshua Bronson
2009-03-07 02:41:15 UTC
From http://pylonshq.com/irclogs/%23pylons/%23pylons.2009-03-06.log.html#t2009-03-06T08:39:13:

<ianbicking> okay; I was just planning on doing decoding in WebOb
[08:43:28] <zepolen> ianbicking: please don't
<zepolen> urls don't have encoding
[08:44:18] <ianbicking> zepolen: sure they do, just like POST vars etc
[08:44:40] <zepolen> but implicitly assuming they are utf8 is imho wrong
[08:45:08] <ianbicking> it'd be the same policy as other unicode in webob
[08:45:20] <ianbicking> you set request.charset or request.default_charset, and you start getting unicode
<zepolen> what if you dont want unicode? can you set request.charset = None?
<ianbicking> zepolen: yeah
[08:48:24] <zepolen> ok then
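
The policy ianbicking describes (set a charset and values come back as unicode, set it to None and they stay raw) might look roughly like the sketch below. It is an illustration only, not WebOb's code: SketchRequest and its params() method are hypothetical names. It is written for Python 3, where bytes and str play the roles that str and unicode play in the Python 2.5 under discussion.

```python
from urllib.parse import unquote_to_bytes


class SketchRequest:
    """Hypothetical stand-in for a request object; NOT WebOb's implementation."""

    def __init__(self, query_string, charset=None):
        self.query_string = query_string  # raw, percent-encoded query string
        self.charset = charset            # None means "leave values as bytes"

    def params(self):
        """Return (key, value) pairs, decoded only when a charset is set."""
        pairs = []
        for field in self.query_string.split("&"):
            if not field:
                continue
            key, _, value = field.partition("=")
            # Percent-decoding always yields bytes; no encoding is assumed yet.
            raw_key = unquote_to_bytes(key.replace("+", " "))
            raw_value = unquote_to_bytes(value.replace("+", " "))
            if self.charset is None:
                pairs.append((raw_key, raw_value))
            else:
                pairs.append((raw_key.decode(self.charset),
                              raw_value.decode(self.charset)))
        return pairs
```

With charset left as None the caller gets bytes and decides when, or whether, to decode; with charset="utf-8" every value is decoded up front.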
[08:49:26] <zepolen> you probably think im crazy to use utf8 instead of unicode, but the performance gain isn't trivial
[08:50:01] <zepolen> no decoding requests, no encoding template renders, no decoding (and then reencoding) database results
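
To get a feel for the decode/re-encode cost zepolen is talking about, one could time a pass-through of UTF-8 bytes against a round trip through a text object. The sketch below is a micro-benchmark under assumed conditions (the ROW payload and the function names are made up for illustration, and it runs on Python 3 rather than the Python 2.5 of the discussion). It measures only the codec work, so it neither confirms nor refutes the 2-3x figure for a whole application.

```python
import timeit

# Assumed payload: UTF-8 bytes as they might arrive from a database driver.
ROW = "Gr\u00fc\u00dfe aus K\u00f6ln, ".encode("utf-8") * 50


def passthrough(raw):
    """Hand the UTF-8 bytes straight to the response, untouched."""
    return raw


def roundtrip(raw):
    """Decode to a text object on the way in, re-encode on the way out."""
    text = raw.decode("utf-8")
    return text.encode("utf-8")


def compare(number=20000):
    """Time both paths; absolute numbers depend on machine and payload."""
    return (timeit.timeit(lambda: passthrough(ROW), number=number),
            timeit.timeit(lambda: roundtrip(ROW), number=number))


if __name__ == "__main__":
    fast, slow = compare()
    print("passthrough: %.4fs  roundtrip: %.4fs" % (fast, slow))
```

Both paths produce identical output bytes; the only difference is the extra work done in between.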
[08:50:49] <steve9001> no .__cmp__() or .startswith() or ...
[08:51:21] <zepolen> not really, i just decode when i need to do string manipulation/comparison
[08:51:57] *** vpol has joined #pylons
[08:51:57] <steve9001> i don't have to worry about decoding any time i do something that depends on the value. it's a tradeoff I guess, performance versus ease of programming and code clarity
[08:52:46] <zepolen> it's actually clearer - you never have to guess what something is
[08:53:02] <zepolen> but the issue is more py2.5's fault
[08:53:58] <zepolen> i'll probably use unicode with py3's better distinction between unicode and bytes
[08:54:22] *** anilm has joined #pylons
[08:54:31] <zepolen> but that's years away
[08:54:48] <steve9001> yeah i think i'm looking forward to it though
[08:55:16] <zepolen> in the mean time im happy that the app is 2-3x faster using straight utf8
[08:55:59] <zepolen> there are so many places it helps you don't even fathom
[08:56:35] <zepolen> storing unicode strings in memcache means a pickling/depickling+encoding on output overhead
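
The memcache point can be sketched with a plain dict standing in for the cache. The assumption, taken from zepolen's remark and not checked against any particular client library, is that text values get pickled on the way in while byte strings are stored as-is. All function names here are hypothetical, and the code is Python 3 (str for text, bytes for UTF-8 data).

```python
import pickle


def store_text(cache, key, text):
    """Text path: the value is pickled before storage (assumed client behaviour)."""
    cache[key] = pickle.dumps(text, protocol=pickle.HIGHEST_PROTOCOL)


def load_text_as_bytes(cache, key, charset="utf-8"):
    """Unpickle the text, then encode it again before writing the response."""
    return pickle.loads(cache[key]).encode(charset)


def store_bytes(cache, key, raw):
    """Bytes path: the UTF-8 data goes into the cache unchanged."""
    cache[key] = raw


def load_bytes(cache, key):
    """Nothing to undo: the stored bytes are already the output bytes."""
    return cache[key]
```

The text path performs three extra steps per value (pickle, unpickle, encode) to arrive at the same bytes the second path stores and returns directly.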
<steve9001> i profiled my app a bit with repoze the other day and the function that is called more than any other is isinstance() to check if a string is unicode. i think from the webhelpers.html.literal function
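
A comparable measurement can be approximated with the standard library profiler. The sketch below does not use repoze or webhelpers: literal_sketch is a hypothetical helper that type-checks each value, loosely mimicking the pattern steve9001 describes, and profile_type_checks counts how often isinstance() shows up in the profile (Python 3).

```python
import cProfile
import pstats


def literal_sketch(value):
    """Hypothetical helper that type-checks every value it is given."""
    if isinstance(value, str):
        return value
    return value.decode("utf-8")


def profile_type_checks(values):
    """Profile literal_sketch over values; return the isinstance() call count."""
    profiler = cProfile.Profile()
    profiler.enable()
    for value in values:
        literal_sketch(value)
    profiler.disable()
    stats = pstats.Stats(profiler).stats
    # Keys are (filename, lineno, function name); entry[1] is the call count.
    return sum(entry[1] for key, entry in stats.items()
               if key[2].endswith("isinstance>"))
```

Every value costs one type check regardless of whether it needed decoding, which is how such a check can end up at the top of a call-count listing.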
