Dowemo
0 0 0 0


Question:

While making my App Engine app I suddenly ran into an error which shows every couple of requests:

    run_wsgi_app(application)
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
    run_bare_wsgi_app(add_wsgi_middleware(application))
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/util.py", line 118, in run_bare_wsgi_app
    for data in result:
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/appstats/recording.py", line 897, in appstats_wsgi_wrapper
    result = app(environ, appstats_start_response)
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 717, in __call__
    handler.handle_exception(e, self.__debug)
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 463, in handle_exception
    self.error(500)
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 436, in error
    self.response.clear()
  File "/home/ubuntu/Programs/google/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 288, in clear
    self.out.seek(0)
  File "/usr/lib/python2.7/StringIO.py", line 106, in seek
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 208: ordinal not in range(128)

I really have no idea where this could be, it only happens when I use a specific function but it's impossible to track all string I have. It's possible this byte is a character like ' " [ ] etc, but only in another language

How can I find this byte and possibly other ones?

I am running GAE with python 2.7 in ubuntu 11.04

Thanks.

*updated*

This is the code I ended up using: from codecs import BOM_UTF8 from os import listdir, path p = "path"

def loopPath(p, times=0):
    for fname in listdir(p):
        filePath = path.join(p, fname)
        if path.isdir(filePath):
            return loopPath(filePath, times+1)
        if fname.split('.', 1)[1] != 'py': continue
        f = open(filePath, 'r')
        ln = 0
        for line in f:
            #print line[:3] == BOM_UTF8
            if not ln and line[:3] == BOM_UTF8:
                line = line[4:]
            col = 0
            for c in list(line):
                if ord(c) > 128:
                    raise Exception('Found "'+line[c]+'" line %d column %d in %s' % (ln+1, col, filePath))
                col += 1
            ln += 1
        f.close()
loopPath(p)

Best Answer:


When I translated UTF-8 files to latin1 LaTeX I had similar problems. I wanted a list of all evil unicode characters in my files.

It is probably even more you need, but I used this:

UNICODE_ERRORS = {}
def fortex(exc):
    import unicodedata, exceptions
    global UNICODE_ERRORS
    if not isinstance(exc, exceptions.UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    l = []
    print >>sys.stderr, "   UNICODE:", repr(exc.object[max(0,exc.start-20):exc.end+20])
    for c in exc.object[exc.start:exc.end]:
        uname = unicodedata.name(c, u"0x%x" % ord(c))
        l.append(uname)
        key = repr(c)
        if not UNICODE_ERRORS.has_key(key): UNICODE_ERRORS[key] = [ 1, uname ]
        else: UNICODE_ERRORS[key][0] += 1
    return (u"gpTastatur{%s}" % u", ".join(l), exc.end)
def main():    
    codecs.register_error("fortex", fortex)
    ...
    fileout = codecs.open(filepath, 'w', DEFAULT_CHARSET, 'fortex')
    ...
    print UNICODE_ERROS

helpful?

Here is the matching excerpt from the Python doc:

codecs.register_error(name, error_handler) Register the error handling function error_handler under the name name. error_handler will be called during encoding and decoding in case of an error, when name is specified as the errors parameter.

For encoding error_handler will be called with a UnicodeEncodeError instance, which contains information about the location of the error. The error handler must either raise this or a different exception or return a tuple with a replacement for the unencodable part of the input and a position where encoding should continue. The encoder will encode the replacement and continue encoding the original input at the specified position. Negative position values will be treated as being relative to the end of the input string. If the resulting position is out of bound an IndexError will be raised.




Copyright © 2011 Dowemo All rights reserved.    Creative Commons   AboutUs