Python
Archived Posts from this Category
Posted by Jim DeLaHunt on 31 Jul 2022 | Tagged as: Python, robobait, software engineering
I recently started a filing project. It requires labels printed on slips of paper, each with a name and ID number in nice big letters. I authored my labels in the SVG graphics format. But editing SVG files for each label is impractical, so I searched for a way to treat the SVG as a template, and fill it out with a spreadsheet of data. I found a ridiculously easy way to do it in Python — with only 9 lines of clever code.
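The post's own nine lines are not shown in this excerpt. As a hedged sketch only, here is one way to treat an SVG as a template in Python: put $-placeholders in the SVG text, use the standard library's string.Template, and fill it once per row of a CSV export of the spreadsheet. The template text, the placeholder names $name and $id, and the data are all hypothetical.

```python
import csv
import io
from string import Template
from xml.sax.saxutils import escape

# Hypothetical label template: $name and $id are placeholders chosen for
# this sketch, not taken from the original post.
SVG_TEMPLATE = Template(
    '<svg xmlns="http://www.w3.org/2000/svg" width="400" height="120">\n'
    '  <text x="10" y="50" font-size="40">$name</text>\n'
    '  <text x="10" y="105" font-size="40">$id</text>\n'
    '</svg>\n'
)

# Hypothetical spreadsheet exported as CSV; the header row names the placeholders.
CSV_DATA = "name,id\nAda Lovelace,1001\nAT&T Archive,1002\n"


def fill_labels(template, csv_text):
    """Return one filled-in SVG document per CSV data row."""
    labels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape &, < and > so a value cannot break the SVG's XML syntax.
        safe_row = {key: escape(value) for key, value in row.items()}
        labels.append(template.substitute(safe_row))
    return labels


labels = fill_labels(SVG_TEMPLATE, CSV_DATA)
for number, svg in enumerate(labels, start=1):
    print(f"--- label {number} ---")
    print(svg)
```

In real use you would read the SVG and CSV from files and write each result to its own .svg file. The escaping step matters because SVG is XML, so a name containing "&" would otherwise produce an invalid file.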
Continue Reading »

Posted by Jim DeLaHunt on 31 Jan 2022 | Tagged as: i18n, Python, software engineering, technical support, Unicode, web technology
I have been active on StackOverflow for more than twelve years. StackOverflow is a phenomenally successful question and answer website, aimed at software developers seeking technical answers. Part of what makes StackOverflow successful is that it gamifies “reputation”: your reputation goes up when you write good answers, and ask good questions, and otherwise help. On 23 December 2021, my StackOverflow reputation rose past 10,000. This is a gratifying milestone.
I am user Jim DeLaHunt on StackOverflow. I apparently posted my first question there on 23 November 2009. I asked if anyone could point me to “an XML language for describing file attributes of a directory tree?” I did not get a good direct answer. I did get a reference to the XML-dev email list, which I follow to this day. My first answer was to my own question about the XML language. My first answer to someone else’s question was about three weeks later, and it was about detecting a character encoding.
Over twelve years, I have written 133 answers, most of which languish in obscurity. Three have earned an unusually large number of upvotes (and, between them, over 40% of my reputation):
StackOverflow turns the reputation score into a variety of rankings. They put me in the top 4% for reputation overall. This sounds very impressive, until you learn that I am ranked only 24,308th among all participants. Mind you, there are over 16 million participants. I imagine there is a long, inactive tail, compared to which my small activity looks great.
In a similar vein, StackOverflow ranks me among the top 5% in the topics of “Python” and “MySQL”; the top 10% in “Unicode”; and the top 20% in “Internationalization”, “UTF-8”, and “Django”. That reflects some combination of effort on my part, and flattery due to the long, inactive tail.
I put a lot of work, 8-10 years ago, into answering questions and building my reputation. Now I find that upvotes trickle in for my existing 133 answers. My reputation rises surprisingly steadily, even if I don’t contribute anything new, giving me a kind of StackOverflow pension. But I still get satisfaction from plugging away there every now and again, trying to find a good question and write a clear answer. Maybe, in less than 12 more years, I will reach StackOverflow 20,000.
Posted by Jim DeLaHunt on 31 Jan 2017 | Tagged as: Python, robobait, software engineering
Recently I was writing a Python-language tool, and some of my doctests (test fixtures within docstrings) were failing. When I tried to import the StringIO module in my test, I got a quite annoying message, “Got nothing”, and the test didn’t work as I wanted. I asked StackOverflow. User wim there gave me a crucial insight, but didn’t explain the underlying cause of my problem. I read the doctest code, and came up with an explanation that satisfied me. I am posting it here, as an aid to others.

The gist of the insight: what looks like a multi-line doctest fixture is in fact a succession of single-line doctest “Examples”, some of which return no useful result but set up state for later Examples. Each single-line Example should have a >>> prefix, not a ... prefix. But there are Examples that require the ... prefix. The difference lies in Python’s definition of an interactive statement.
I posted a question much like this to StackOverflow:
Why is importing a module breaking my doctest (Python 2.7)?
I tried to use a StringIO instance in a doctest in my class, in a Python 2.7 program. Instead of getting any output from the test, I get a response, “Got nothing”.
This simplified test case demonstrates the error:
#!/usr/bin/env python2.7
# encoding: utf-8

class Dummy(object):
    '''Dummy: demonstrates a doctest problem
    >>> from StringIO import StringIO
    ... s = StringIO()
    ... print("s is created")
    s is created
    '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Expected behaviour: test passes.
Observed behaviour: test fails, with output like this:
% ./src/doctest_fail.py
**********************************************************************
File "./src/doctest_fail.py", line 7, in __main__.Dummy
Failed example:
    from StringIO import StringIO
    s = StringIO()
    print("s is created")
Expected:
    s is created
Got nothing
**********************************************************************
1 items had failures:
   1 of   1 in __main__.Dummy
***Test Failed*** 1 failures.
Why is this doctest failing? What change do I need to make in order to be able to use StringIO-like functionality (a literal string with a file interface) in my doctests?
(I had originally suspected the StringIO module of being part of the problem. My original question title was, “Why is use of StringIO breaking my doctest (Python 2.7)”. When I realised that suspicion was incorrect, I edited the question on StackOverflow.)
StackOverflow expert wim was quick with the crucial insight: “It’s the continuation line syntax (...) that is confusing doctest parser.” Wim then rewrote my example so that it functioned correctly. Excellent! Thank you, wim.
I wasn’t satisfied, however: wim’s answer didn’t explain the underlying cause of my problem. I read the doctest code, and came up with an explanation that satisfied me. Below is an improved version of the answer I posted to StackOverflow at the time.
The example fails because it uses the PS2 syntax (...) instead of the PS1 syntax (>>>) in front of separate simple statements.
Change ... to >>>:
#!/usr/bin/env python2.7
# encoding: utf-8

class Dummy(object):
    '''Dummy: demonstrates a doctest problem
    >>> from StringIO import StringIO
    >>> s = StringIO()
    >>> print("s is created")
    s is created
    '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Now the corrected example, renamed doctest_pass.py, runs with no errors. It produces no output, meaning that all tests pass:
% ./src/doctest_pass.py
Why is the >>> syntax correct? The Python Library Reference for doctest, section 25.2.3.2, “How are Docstring Examples Recognized?”, should be the place to find the answer, but it isn’t terribly clear about this syntax.
Doctest scans through a docstring, looking for “Examples”. Where it sees the PS1 string >>>, it takes everything from there to the end of the line as an Example. It also appends any following lines which begin with the PS2 string ... to the Example (See: _EXAMPLE_RE in class doctest.DocTestParser, lines 584-595). It takes the subsequent lines, until the next blank line or line starting with the PS1 string, as the Wanted Output.
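One way to check this reading of the parser is to ask doctest itself. The sketch below assumes Python 3, so io.StringIO stands in for the Python 2 StringIO module. It feeds both versions of the docstring text to DocTestParser.get_examples() and prints each Example's source and wanted output.

```python
import doctest

# Docstring text from the question: one PS1 line followed by PS2 lines.
failing = '''
>>> from io import StringIO
... s = StringIO()
... print("s is created")
s is created
'''

# Docstring text from the answer: every statement gets its own PS1 prompt.
passing = '''
>>> from io import StringIO
>>> s = StringIO()
>>> print("s is created")
s is created
'''

parser = doctest.DocTestParser()
for label, text in (("PS2 continuation lines", failing), ("PS1 prompts", passing)):
    examples = parser.get_examples(text)
    print(f"{label}: {len(examples)} Example(s)")
    for example in examples:
        # source is the code doctest will compile; want is the expected output.
        print(f"    source={example.source!r} want={example.want!r}")
```

The PS2 version produces a single Example whose source holds all three statements, while the PS1 version produces three Examples, only the last of which expects any output.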
Doctest compiles each Example as a Python “interactive statement”, using the compile() built-in function in an exec statement (See: doctest.DocTestRunner.__run(), lines 1314-1315).
An “interactive statement” is a statement list ending with a newline, or a compound statement. A compound statement, e.g. an if or try statement, “in general, […spans] multiple lines, although in simple incarnations a whole compound statement may be contained in one line.” Here is a multi-line compound statement:
if 1 > 0:
    print("As expected")
else:
    print("Should not happen")
A statement list is one or more simple statements on a single line, separated by semicolons.
from StringIO import StringIO
s = StringIO(); print("s is created")
So, the question’s doctest failed because it contained one Example with three simple statements, and no semicolon separators. Changing the PS2 strings to PS1 strings succeeds, because it turns the docstring into a sequence of three Examples, each with one simple statement. Although these three lines work together to set up one test of one piece of functionality, they are not a single test fixture. They are three tests, two of which set up state but do not really test the main functionality.
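To see this distinction directly, you can pass each kind of source to compile() in "single" mode, the mode doctest uses for each Example. This is a sketch that assumes Python 3; the source strings are made up for illustration. Under Python 3 the compiler rejects three newline-separated simple statements with a SyntaxError, so the question's doctest would fail with an exception there rather than with “Got nothing”.

```python
# Each string is the kind of source doctest would hand to compile() as one Example.
candidates = {
    "single simple statement": 'x = 1\n',
    "statement list (semicolons)": 's = []; s.append("created")\n',
    "compound statement": (
        'if 1 > 0:\n'
        '    y = "As expected"\n'
        'else:\n'
        '    y = "Should not happen"\n'
    ),
    "three simple statements, no semicolons": (
        'import io\n'
        's = io.StringIO()\n'
        'print("s is created")\n'
    ),
}

results = {}
for label, source in candidates.items():
    try:
        # "single" mode accepts exactly one interactive statement.
        compile(source, "<example>", "single")
        results[label] = "compiles"
    except SyntaxError as err:
        results[label] = "SyntaxError: " + str(err.msg)
    print(f"{label}: {results[label]}")
```

The first three sources are each a single interactive statement and compile; only the last, which is what the question's PS2 lines produced, is rejected.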
By the way, you can see the number of Examples which doctest recognises by using the -v flag. Note that it says, “3 tests in __main__.Dummy”. One might think of the three lines as one test unit, but doctest sees three Examples. The first two Examples have no expected output. When the Example executes and generates no output, that counts as a “pass”.
% ./src/doctest_pass.py -v
Trying:
    from StringIO import StringIO
Expecting nothing
ok
Trying:
    s = StringIO()
Expecting nothing
ok
Trying:
    print("s is created")
Expecting:
    s is created
ok
1 items had no tests:
    __main__
1 items passed all tests:
   3 tests in __main__.Dummy
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
Within a single docstring, the Examples are executed in sequence. State changes from each Example are preserved for the following Examples in the same docstring. Thus the import statement defines a module name, the s = assignment statement uses that module name and defines a variable name, and so on. The doctest documentation, section 25.2.3.3, “What’s the Execution Context?”, obliquely discloses this when it says, “examples can freely use … names defined earlier in the docstring being run.”
The preceding sentence in that section, “each time doctest finds a docstring to test, it uses a shallow copy of M’s globals, so that … one test in M can’t leave behind crumbs that accidentally allow another test to work”, is a bit misleading. It is true that one test in M can’t affect a test in a different docstring. However, within a single docstring, an earlier test will certainly leave behind crumbs, which might well affect later tests.
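Both behaviours, shared state within a docstring and a fresh copy of the globals for each docstring, can be checked with a small sketch. It assumes Python 3; the functions first and second are hypothetical, and the sketch drives DocTestFinder and DocTestRunner directly rather than calling testmod().

```python
import doctest


def first():
    """Defines a name, then uses it in a later Example of the same docstring.

    >>> counter = 41
    >>> counter + 1
    42
    """


def second():
    """A separate docstring: it starts from a fresh copy of the globals.

    >>> 'counter' in dir()
    False
    """


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
totals = []
for func in (first, second):
    # Each DocTest gets its own shallow copy of the globs passed here.
    for test in finder.find(func, globs={}):
        totals.append(runner.run(test))

for result in totals:
    print(result)
```

The second Example in first() passes only because the first Example left counter behind; second() cannot see that name, because its docstring runs against its own copy of the globals.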
But there is an example doctest, in the Python Library Reference for doctest, section 25.2.3.2, “How are Docstring Examples Recognized?”, which uses ... syntax. Why doesn’t it use >>> syntax? Because that example consists of an if statement, which is a compound statement on multiple lines. As such, its second and subsequent lines are marked with the PS2 strings. It’s unfortunate that this is the only example of a multi-line fixture in the documentation, because it can be misleading about when to use PS1 instead of PS2 strings.
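For contrast, here is a case where ... is the right choice. In this hedged sketch, assuming Python 3, the if/else is one compound statement, so its continuation lines take the PS2 prompt and doctest treats the whole statement as a single Example. The docstring text and the test name compound_example are made up for illustration.

```python
import doctest

docstring = '''
>>> x = 12
>>> if x > 10:
...     print("big")
... else:
...     print("small")
big
'''

parser = doctest.DocTestParser()
# get_doctest(string, globs, name, filename, lineno) builds a runnable DocTest.
test = parser.get_doctest(docstring, {}, "compound_example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(test)
print(len(test.examples), result)
```

Here doctest finds two Examples: the simple assignment, and the entire if/else statement, which compiles as one interactive statement.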
Posted by Jim DeLaHunt on 31 May 2015 | Tagged as: Python, software engineering
In recent weeks, I have been working intensively with SQLAlchemy for a consulting client. SQLAlchemy is a Python-language toolkit for using SQL databases in applications. Although I’ve used Python, SQL databases, SQL queries, and a different Python-language toolkit for using SQL databases in applications, this was my first in-depth encounter with SQLAlchemy. I had to do a lot of learning. SQLAlchemy, despite its scads of documentation and good tutorials, didn’t explain some important concepts. Here’s a brief list, in an attempt to gather my thoughts and insights.
This list doesn’t include the important concepts the documentation does cover, just what it (to my reading) left out. And I haven’t attempted to flesh out these points; that might make a good future blog post. These are concepts that I wish I had learned earlier and more easily.
Posted by Jim DeLaHunt on 25 Jan 2015 | Tagged as: Python, robobait
I just resolved a problem installing the Python module pycdio on my Mac OS X 10.10.1 “Yosemite” operating system. The error messages were obscure: “Error: Unable to find ‘python.swg’”, and “Error: Unable to find ‘typemaps.i’”. The solution involved something non-obvious about how MacPorts handles swig. Here are my notes, in hopes of helping others seeing this error.
Posted by Jim DeLaHunt on 31 May 2013 | Tagged as: Python, software engineering, Unicode
I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my top-voted answer on StackOverflow so far.
The question, Django headache with simple non-ascii string, was asked by user Ezequiel in January 2010. In abbreviated form, it was:
Posted by Jim DeLaHunt on 29 Feb 2012 | Tagged as: Python, software engineering
There is a special place in heaven for those who make free-libre software engineering tools available to journeyman programmers like me. I’m grateful to the Eclipse project for their comprehensive integrated development environment. A few years ago, when I chose Eclipse as my Python-language programming environment, Eclipse wasn’t very easy to install, especially on the Mac. Into the gap rode the EasyEclipse project. They offered distributions of Eclipse and related modules, targeted at various kinds of developer and at various language preferences, in packaging that was simple and ready to go. I used their EasyEclipse for Python 1.3.1 product as my primary development environment for several years, and it was great for me.
Alas, the EasyEclipse project appears to be stagnating. They haven’t updated their builds to the latest version of Eclipse and language-specific plug-ins. (They still use Eclipse 3.3; the current version is 3.7.) Their Eclipse build is throwing errors in the Software Update feature, because the latest plugins are too new for their old Eclipse core. They aren’t responding to bug reports and forum posts. They haven’t even replied to the message I sent in response to their plea for helpers to take over the project.
In the meantime, the Eclipse project’s distributions are now easier to use. You can download Eclipse builds for Mac OS. They have builds targeted to various segments of developers. They have extensive documentation. They have an update manager within Eclipse, to make it easier to stay current.
So, the question is: is the core Eclipse project now easy enough to install that there’s no more need for a project like EasyEclipse?  Is Eclipse.org easy enough to eclipse EasyEclipse.org? If not, can the core Eclipse project learn lessons from EasyEclipse and become easy enough? Or is there a niche for the EasyEclipse long-term? I recently downloaded a current Eclipse build, and fitted it out for Python and PHP programming. My experience gives me opinions on these questions.
Posted by Jim DeLaHunt on 31 Aug 2010 | Tagged as: Python, robobait, software engineering, Unicode, web technology
This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding. I decided to use the Django web framework and the Python language for the Twanguages analysis code. I know Python, but I was learning Django for the first time.
Django is really, really marvellous. When I tried this expression, and got the Python array of records I was expecting,
q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)
I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”
But Django gave me fits. It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log. Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles. I share them in the hopes of helping another Django novice.
Here are my Django gotchas. They are ranked from the most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.
S2 = models.TwStatus.objects.get( key )
I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:
S2 = models.TwStatus.objects.get( id__exact=key )
The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development. Continue Reading »
Posted by Jim DeLaHunt on 06 Aug 2009 | Tagged as: Python, robobait, software engineering, web technology
One of the many nice touches of the Django framework is that it provides tools and instructions to make a standalone Django documentation set from its distribution. (Django is an application framework for the Python language that helps with database access and web applications.) Standalone docs are great for people like me who work on a laptop and are sometimes off the net. But I’m using Mac OS X, I get my code through MacPorts, and Django’s instructions don’t quite cover this case. So I just figured it out. Here are the tricks I needed. Maybe they will help you.