software engineering

Archived Posts from this Category

Python multi-line doctests, and “Got nothing” message

Posted by Jim DeLaHunt on 31 Jan 2017 | Tagged as: Python, robobait, software engineering

Recently I was writing a Python-language tool, and some of my doctests (test fixtures, within docstrings) were failing. When I tried to import the StringIO module in my test, I got quite an annoying message, “Got nothing”, and the test didn’t work as I wanted. I asked StackOverflow. User wim there gave me a crucial insight, but didn’t explain the underlying cause of my problem. I read the doctest code, and came up with an explanation that satisfied me. I am posting it here, as an aid to others. The gist of the insight: what looks like a multi-line doctest fixture is in fact a succession of single-line doctest “Examples”, some of which return no useful result but set up state for later Examples. Each single-line Example should have a >>> prefix, not a ... prefix. But there are Examples that require the ... prefix. The difference lies in Python’s definition of an interactive statement.

The Question

I posted a question much like this to StackOverflow:

Why is importing a module breaking my doctest (Python 2.7)?

I tried to use a StringIO instance in a doctest in my class, in a Python 2.7 program. Instead of getting any output from the test, I get a response, “Got nothing”.

This simplified test case demonstrates the error:

#!/usr/bin/env python2.7
# encoding: utf-8

class Dummy(object):
    '''Dummy: demonstrates a doctest problem
    >>> from StringIO import StringIO
    ... s = StringIO()
    ... print("s is created")
    s is created
    '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Expected behaviour: test passes.

Observed behaviour: test fails, with output like this:

% ./src/doctest_fail.py
**********************************************************************
File "./src/doctest_fail.py", line 7, in __main__.Dummy
Failed example:
    from StringIO import StringIO
    s = StringIO()
    print("s is created")
Expected:
    s is created
Got nothing
**********************************************************************
1 items had failures:
    1 of 1 in __main__.Dummy
***Test Failed*** 1 failures.

Why is this doctest failing? What change do I need to make in order to be able to use StringIO-like functionality (a literal string with a file interface) in my doctests?

(I had originally suspected the StringIO module of being part of the problem. My original question title was, “Why is use of StringIO breaking my doctest (Python 2.7)”. When I realised that suspicion was incorrect, I edited the question on StackOverflow.)

The Answer

StackOverflow expert wim was quick with the crucial insight: “It’s the continuation line syntax (...) that is confusing doctest parser.” Wim then rewrote my example so that it functioned correctly. Excellent! Thank you, wim.

The Explanation

I wasn’t satisfied, however. Wim’s answer didn’t explain the underlying cause of my problem, so I read the doctest code and came up with an explanation that satisfied me. Below is an improved version of the answer I posted to StackOverflow at the time.

The example fails, because it uses the PS2 syntax (...) instead of PS1 syntax (>>>) in front of separate simple statements.

Change ... to >>>:


#!/usr/bin/env python2.7
# encoding: utf-8

class Dummy(object):
    '''Dummy: demonstrates a doctest problem
    >>> from StringIO import StringIO
    >>> s = StringIO()
    >>> print("s is created")
    s is created
    '''

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Now the corrected example, renamed doctest_pass.py, runs with no errors. It produces no output, meaning that all tests pass:

% ./src/doctest_pass.py

Why is the >>> syntax correct? The Python Library Reference for doctest, section 25.2.3.2, “How are Docstring Examples Recognized?”, should be the place to find the answer, but it isn’t terribly clear about this syntax.

Doctest scans through a docstring, looking for “Examples”. Where it sees the PS1 string >>>, it takes everything from there to the end of the line as an Example. It also appends any following lines which begin with the PS2 string ... to the Example (See: _EXAMPLE_RE in class doctest.DocTestParser, lines 584-595). It takes the subsequent lines, until the next blank line or line starting with the PS1 string, as the Wanted Output.
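To see this parsing step directly, here is a small sketch using doctest.DocTestParser.get_examples(), which returns the Examples the parser finds in a string. It assumes Python 3, so io.StringIO stands in for the Python 2 StringIO module; BROKEN and FIXED are illustrative copies of the two docstrings above, not names from the doctest module.

```python
import doctest

# The question's docstring: one >>> line followed by two ... continuation lines.
# (Python 3 port: io.StringIO stands in for Python 2's StringIO module.)
BROKEN = '''
>>> from io import StringIO
... s = StringIO()
... print("s is created")
s is created
'''

# The corrected docstring: every simple statement gets its own >>> prompt.
FIXED = '''
>>> from io import StringIO
>>> s = StringIO()
>>> print("s is created")
s is created
'''

parser = doctest.DocTestParser()
for label, text in (("broken", BROKEN), ("fixed", FIXED)):
    examples = parser.get_examples(text)
    print(label, "->", len(examples), "Example(s)")
    for ex in examples:
        # ex.source is the code doctest will compile; ex.want is the expected output.
        print("  source:", repr(ex.source), " want:", repr(ex.want))
```

The broken docstring should yield a single Example whose source holds all three statements, while the fixed docstring yields three Examples, the first two with an empty expected output.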

Doctest compiles each Example as a Python “interactive statement”, using the compile() built-in function in an exec statement (See: doctest.DocTestRunner.__run(), lines 1314-1315).

An “interactive statement” is a statement list ending with a newline, or a Compound Statement. A compound statement, e.g. an if or try statement, “in general, […spans] multiple lines, although in simple incarnations a whole compound statement may be contained in one line.” Here is a multi-line compound statement:

if 1 > 0:
    print("As expected")
else:
    print("Should not happen")

A statement list is one or more simple statements on a single line, separated by semicolons.


from StringIO import StringIO
s = StringIO(); print("s is created")
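A quick way to check which sources count as one interactive statement is to call compile() in "single" mode, the mode doctest passes to compile() for each Example. This is a minimal sketch assuming Python 3; compiles_as_interactive() is a hypothetical helper name, and the three sources mirror the cases discussed above.

```python
def compiles_as_interactive(source):
    """Return True if source compiles as one interactive statement."""
    try:
        # "single" is the interactive mode doctest uses for each Example.
        compile(source, "<example>", "single")
        return True
    except SyntaxError:
        return False

# Three simple statements on three lines: not one interactive statement.
three_lines = 'x = 1\ny = 2\nprint(x + y)\n'
# A statement list: several simple statements on one line, joined by semicolons.
statement_list = 'x = 1; y = 2; print(x + y)\n'
# A compound statement spread over several lines.
compound = 'if 1 > 0:\n    print("As expected")\nelse:\n    print("Should not happen")\n'

for name, src in (("three lines", three_lines),
                  ("statement list", statement_list),
                  ("compound statement", compound)):
    print(name, "->", compiles_as_interactive(src))
```

On Python 3 the three-line source is rejected with a SyntaxError, while the statement list and the compound statement compile. The question was run on Python 2.7, where doctest reported “Got nothing” instead, so the exact failure report depends on the Python version.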

So, the question’s doctest failed because it contained one Example with three simple statements, and no semicolon separators. Changing the PS2 strings to PS1 strings succeeds, because it turns the docstring into a sequence of three Examples, each with one simple statement. Although these three lines work together to set up one test of one piece of functionality, they are not a single test fixture. They are three tests, two of which set up state but do not really test the main functionality.
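If this diagnosis is right, joining the three statements into one statement list with semicolons should also give a passing test, this time as a single Example. The sketch below checks that under Python 3 (io.StringIO again replacing the StringIO module); run_docstring() is a hypothetical helper that parses a string with doctest.DocTestParser.get_doctest() and runs it with doctest.DocTestRunner.

```python
import doctest

# Hypothetical docstring: the three statements as ONE statement list,
# hence ONE doctest Example (Python 3 port: io.StringIO replaces StringIO).
ONE_LINE = '''
>>> from io import StringIO; s = StringIO(); print("s is created")
s is created
'''

def run_docstring(text, name="one_line"):
    """Parse text as a docstring, run its Examples, and return TestResults."""
    test = doctest.DocTestParser().get_doctest(text, {}, name, "<sketch>", 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)

results = run_docstring(ONE_LINE)
print("attempted:", results.attempted, "failed:", results.failed)
```

Both forms pass; the three-prompt form simply keeps each statement on its own line.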

By the way, you can see the number of Examples which doctest recognises by using the -v flag. Note that it says, “3 tests in __main__.Dummy”. One might think of the three lines as one test unit, but doctest sees three Examples. The first two Examples have no expected output. When the Example executes and generates no output, that counts as a “pass”.
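The same count is available programmatically. This sketch, assuming Python 3, collects the corrected Dummy docstring with doctest.DocTestFinder (the class testmod() uses to gather docstrings), prints each Example’s source and expected output, and runs them with doctest.DocTestRunner. The empty expected output on the first two Examples is what the -v listing shows as “Expecting nothing”.

```python
import doctest

class Dummy(object):
    '''Dummy: the corrected doctest, ported to Python 3's io.StringIO
    >>> from io import StringIO
    >>> s = StringIO()
    >>> print("s is created")
    s is created
    '''

# Collect the docstring's tests the way testmod() would, with a fresh globals dict.
tests = doctest.DocTestFinder().find(Dummy, "Dummy", globs={})
for test in tests:
    for example in test.examples:
        # An empty want means "Expecting nothing": the Example passes if it prints nothing.
        print(repr(example.source), "expects", repr(example.want))

runner = doctest.DocTestRunner(verbose=False)
for test in tests:
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.attempted, "tests,", results.failed, "failed")
```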


% ./src/doctest_pass.py -v
Trying:
    from StringIO import StringIO
Expecting nothing
ok
Trying:
    s = StringIO()
Expecting nothing
ok
Trying:
    print("s is created")
Expecting:
    s is created
ok
1 items had no tests:
    __main__
1 items passed all tests:
    3 tests in __main__.Dummy
3 tests in 2 items.
3 passed and 0 failed.
Test passed.

Within a single docstring, the Examples are executed in sequence. State changes from each Example are preserved for the following Examples in the same docstring. Thus the import statement defines the name StringIO, the s = assignment statement uses that name and defines the variable s, and so on. The doctest documentation, section 25.2.3.3, “What’s the Execution Context?”, obliquely discloses this when it says, “examples can freely use … names defined earlier in the docstring being run.”

The preceding sentence in that section, “each time doctest finds a docstring to test, it uses a shallow copy of M’s globals, so that … one test in M can’t leave behind crumbs that accidentally allow another test to work”, is a bit misleading. It is true that one test in M can’t affect a test in a different docstring. However, within a single docstring, an earlier test will certainly leave behind crumbs, which might well affect later tests.
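A small experiment makes both halves of this concrete. Under Python 3, the sketch below builds doctests for two hypothetical functions, first() and second(), whose names and docstrings are invented for illustration. A name bound by the first Example of first() is visible to the next Example in the same docstring, but not to the Example in second(), which fails with a NameError.

```python
import doctest

def first():
    '''
    >>> crumb = "left behind"
    >>> crumb
    'left behind'
    '''

def second():
    '''
    >>> crumb
    'left behind'
    '''

finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
outcomes = {}
for func in (first, second):
    # Each DocTest works on its own copy of the globals passed to find().
    for test in finder.find(func, func.__name__, globs={}):
        report = []  # collect doctest's failure report instead of printing it
        result = runner.run(test, out=report.append)
        outcomes[func.__name__] = (result.attempted, result.failed)

print(outcomes)
```

Here first() should attempt two Examples with no failures, while second() attempts one Example and fails, because crumb never reaches its copy of the globals.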

But there is an example doctest in the Python Library Reference for doctest, section 25.2.3.2, “How are Docstring Examples Recognized?”, which uses ... syntax. Why doesn’t it use >>> syntax? Because that example consists of an if statement, which is a compound statement spanning multiple lines. As such, its second and subsequent lines are marked with the PS2 string. It’s unfortunate that this is the only example of a multi-line fixture in the documentation, because it can mislead readers about when to use PS1 instead of PS2 strings.
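For contrast, here is a compound statement written with ... prompts, loosely modelled on the documentation’s if/else example and trimmed for brevity. This sketch assumes Python 3 and uses doctest.DocTestParser.get_doctest() and doctest.DocTestRunner; because the whole if/else is one interactive statement, doctest parses it as a single Example, and it passes.

```python
import doctest

# The if/else spans several lines, so its continuation lines take ... prompts.
COMPOUND = '''
>>> x = 12
>>> if x == 13:
...     print("yes")
... else:
...     print("no")
no
'''

test = doctest.DocTestParser().get_doctest(COMPOUND, {}, "compound", "<sketch>", 0)
print(len(test.examples), "Examples")  # x = 12, then the whole if/else
result = doctest.DocTestRunner(verbose=False).run(test)
print("attempted:", result.attempted, "failed:", result.failed)
```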

“Building Localization Capacity Through Non-specialist Developers”, at IUC 39

Posted by Jim DeLaHunt on 30 Sep 2015 | Tagged as: Unicode, meetings and conferences, software engineering

I’m delighted to be presenting, once again, at the 39th Internationalization and Unicode Conference (IUC39). The conference is the gathering of my “tribe”, people who are as enthusiastic about language, text, and software as I am. If you like this stuff, it’s the best place in the world to be for those three days, so please register and join us there.

My presentation is, Building Localization Capacity Through Non-specialist Developers. Here’s the abstract: Continue Reading »

Things the docs never told me about SQLalchemy

Posted by Jim DeLaHunt on 31 May 2015 | Tagged as: Python, software engineering

In recent weeks, I have been working intensively with SQLalchemy for a consulting client. SQLalchemy is a Python-language toolkit for using SQL databases in applications. Although I’ve used Python, SQL databases, SQL queries, and a different Python-language toolkit for using SQL databases in applications, this was my first in-depth encounter with SQLalchemy. I had to do a lot of learning. SQLalchemy, despite its scads of documentation and good tutorials, didn’t explain some important concepts to me. Here’s a brief list, in an attempt to gather my thoughts and insights.

This list doesn’t include the important concepts the documentation does cover, just what it (to my reading) left out. And I haven’t attempted to flesh out these points; that might make a good future blog post. These are concepts that I wish I had learned earlier and more easily.

Continue Reading »

A Technology Globalization meetup for the Vancouver Area: (3) Where, When, and How

Posted by Jim DeLaHunt on 28 Feb 2015 | Tagged as: Unicode, Vancouver, culture, i18n, language, meetings and conferences, multilingual, software engineering

Our little meetup now has a name: Vancouver Globalization and Localization Users Group, or VanGLUG for short. Follow us as @VanGLUG on Twitter.  We had an outreach meeting in late January. So it’s long past time to conclude this series of thoughts about VanGLUG. Part 3 discusses “Where, When, and How”. Earlier in the series were A Technology Globalization meetup for the Vancouver Area: (1) What, Who (Oct 31, 2014), and A Technology Globalization meetup for the Vancouver Area: (2) Why, Naming (Dec 31, 2014).

Where

One challenge of an in-person meeting is where to hold it. The usual habit for such events is to meet in downtown Vancouver. This can be inconvenient, not to mention tedious, for those of us in Surrey or Burnaby. But I expect this is how we will start.

I would, however, be delighted if there was enough interest in other parts of the Lower Mainland to start up satellite groups in other locations as well.

Could we meet virtually?  In this day and age, it should be cheap and practical to do a simple webcast of meetings. Some may want to participate remotely. An IRC channel or Twitter “second screen” may emerge. But in my experience, the networking which I suspect will be our biggest contribution will come from in-person attendance.

When

In an era of busy schedules, finding a time to meet is likely an overconstrained problem. Our technology industry tends to hold meetings like this on weekday evenings, sometimes over beer, and I suspect that is how we will start. But it is interesting to consider breakfast or lunch meetings.

When to get started? The arrival of Localization World 2014 in Vancouver got a dozen local localization people to attend, and provided the impetus to turn interest into concrete plans. After Localization World, we started communicating and planning. The net result was a first meeting at mid-day on Monday, December 8, 2014. Despite the holiday distraction, we were able to land a spot guest-presenting to VanDev on 6 essentials every developer should know about international. Our next opportunity to meet will likely be April 2015, perhaps March.

How

The Twitter feed @VanGLUG was our first communications channel. I encourage any Twitter user interested in monitoring this effort to follow @VanGLUG. We have 37 followers at the moment. We were using the Twitter handle @IMLIG1604 before, and changed that name while keeping our followers. The present @IMLIG1604 handle is a mop-up account, to point stragglers to @VanGLUG.

We created a group on LinkedIn to use as a discussion forum. This has the snappy and memorable URL https://www.linkedin.com/groups?home=&gid=6805530. If you use LinkedIn, are in the Lower Mainland or nearby, and are interested in localization and related disciplines, we welcome you to join the LinkedIn Group. We are also accepting members from out of area (for instance, Washington and Oregon) in the interests of cross-group coordination. But for location-independent localization or globalization discussion, there are more appropriate groups already on LinkedIn.

Subsequent communications channels might perhaps include a Meetup group (if we want to put up the money), an email list, an outpost on a Facebook page, and other channels as there is interest.

GALA (the Globalization and Language Association) is one of our industry organisations. It has a membership and affiliate list that includes people from the Vancouver region. I spoke with one of their staff at Localization World. They are interested in encouraging local community groups. I believe this initiative is directly in line with their interest: we can be the local GALA community here. They have included us in a list of regional Localization User Groups. We are also on IMUG’s list of “IMUG-style” groups.
Do you want to see this meetup grow? If so, I welcome your input and participation. You can tweet to @VanGLUG, post comments on this blog, or send me email at jdlh “at” jdlh.com. Call me at +1-604-376-8953.

See you at the meetings!

A Technology Globalization meetup for the Vancouver Area: (2) Why, Naming

Posted by Jim DeLaHunt on 31 Dec 2014 | Tagged as: Unicode, Vancouver, culture, i18n, language, meetings and conferences, multilingual, software engineering

I am helping to start a regular face-to-face event series which will bring together the people in the Vancouver area who work in technology globalization, internationalization, localization, and translation (GILT) for networking and learning. This post is the second in a series where I put into words my percolating thoughts about this group.  See also, A Technology Globalization meetup for the Vancouver Area: (1) What, Who (Oct 31, 2014).

Happily, this group has already started. We held our first meeting on Monday, Dec 8, 2014. Our placeholder Twitter feed is @imlig1604; follow that and you’ll stay connected when we pick our final name. And we have a group on LinkedIn for sharing ideas. The link isn’t very memorable, but go to LinkedIn Groups and search for “Vancouver localization”; you will find us. (We don’t yet have an account on the Meetup.com service.)  If you are in the Lower Mainland and are interested, I would welcome your participation.

Continuing with my reflections about this group, here are thoughts on why this group should exist, and what it might be named.

Continue Reading »

A Technology Globalization meetup for the Vancouver Area: (1) What, Who

Posted by Jim DeLaHunt on 31 Oct 2014 | Tagged as: Unicode, Vancouver, i18n, language, meetings and conferences, multilingual, software engineering

The time has come, I believe, for a regular face-to-face event series which will bring together the people in the Vancouver area who work in technology globalization, internationalization, localization, and translation (GILT) for networking and learning.  The Vancouver tech community is large enough that we have a substantial GILT population. In the last few weeks, I’ve heard from several of us who are interested in making something happen. My ambition is to start this series off by mid-December 2014.

Continue Reading »

How to extract URLs with Apache OpenOffice, from formatted text and HTML tables

Posted by Jim DeLaHunt on 31 Mar 2014 | Tagged as: robobait, software engineering

I use and value a good spreadsheet application the way chefs use and value good knives. I have countless occasions to do ad-hoc data processing and conversion, and I tend to turn to spreadsheets even more often than I turn to a good text editor. I know a lot of ways to get the job done with spreadsheets. But recently I learned a new trick. I’m delighted to share it with you here.

The situation: you have an HTML document, with a list of linked text. Imagine a list of projects, each with a link to a project URL (the names aren’t meaningful):

The task is to convert this list of formatted links into a table, with the project name in column A, and the URL in column B.  The trick is to use an OpenOffice macro, which exposes the URL (and other facets of formatted text) as OpenOffice functions. Continue Reading »

A good-practice list of i18n API functionality

Posted by Jim DeLaHunt on 30 Nov 2013 | Tagged as: culture, i18n, meetings and conferences, multilingual, software engineering, web technology

Think of the applications programming interface (API) for an application environment: an operating system, a markup language, a language’s standard library. What internationalisation (i18n) functionality would you expect to see in such an API? There are some obvious candidates: a text string substitution-from-resources capability like gettext(). A mechanism for formatting dates, numbers, and currencies in culturally appropriate ways. Data formats for text that can handle text in a variety of languages. Some way to determine what cultural conventions and language the user prefers. There is clearly a whole list one could make.

Wouldn’t it be interesting, and useful, to have such a list?  Probably many organisations have made such lists in the past. Who has made such a list? Are they willing to share it with the internationalisation and localisation community? Is there value in developing a “good practices” statement with such a list?  And, most importantly, who would like to read such a list? How would it help them? In what way would such a list add value? Continue Reading »

Top Posts: StackOverflow “How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?”

Posted by Jim DeLaHunt on 31 Jul 2013 | Tagged as: Unicode, robobait, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my second-best-voted answer on StackOverflow so far.

The question, How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?,  was asked by user kvedananda in February 2012. In abbreviated form, it was:

Continue Reading »

Top Posts: StackOverflow “Django headache with simple non-ascii string”

Posted by Jim DeLaHunt on 31 May 2013 | Tagged as: Python, Unicode, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my top-voted answer on StackOverflow so far.

The question, Django headache with simple non-ascii string,  was asked by user Ezequiel in January 2010. In abbreviated form, it was:

Continue Reading »
