software engineering

Archived Posts from this Category

How to extract URLs with Apache OpenOffice, from formatted text and HTML tables

Posted by Jim DeLaHunt on 31 Mar 2014 | Tagged as: robobait, software engineering

I use and value a good spreadsheet application the way chefs use and value good knives. I have countless occasions to do ad-hoc data processing and conversion, and I tend to turn to spreadsheets even more often than I turn to a good text editor. I know a lot of ways to get the job done with spreadsheets. But recently I learned a new trick. I’m delighted to share it with you here.

The situation: you have an HTML document, with a list of linked text. Imagine a list of projects, each with a link to a project URL (the names aren’t meaningful):

The task is to convert this list of formatted links into a table, with the project name in column A, and the URL in column B.  The trick is to use an OpenOffice macro, which exposes the URL (and other facets of formatted text) as OpenOffice functions. Continue Reading »
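For comparison, the same conversion can be sketched outside the spreadsheet. This is not the OpenOffice macro the post goes on to describe; it is a small Python illustration using the standard library's `html.parser`, and the names `LinkTableParser` and `links_to_rows`, along with the example.org URLs, are made up for the example.

```python
from html.parser import HTMLParser


class LinkTableParser(HTMLParser):
    """Collect (link text, URL) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished (name, url) pairs, in document order
        self._href = None   # URL of the link currently open, if any
        self._text = []     # text fragments seen inside the open link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_rows(html):
    """Return a list of (name, url) tuples: column A and column B."""
    parser = LinkTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Hypothetical input; the project names and URLs are placeholders.
    sample = (
        '<ul>'
        '<li><a href="http://example.org/alpha">Project Alpha</a></li>'
        '<li><a href="http://example.org/beta">Project <b>Beta</b></a></li>'
        '</ul>'
    )
    # Tab-separated output pastes into a spreadsheet as two columns.
    for name, url in links_to_rows(sample):
        print(name + "\t" + url)
```

The tab-separated output can be pasted into a spreadsheet, where each line becomes one row with the name in column A and the URL in column B.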

A good-practice list of i18n API functionality

Posted by Jim DeLaHunt on 30 Nov 2013 | Tagged as: culture, i18n, meetings and conferences, multilingual, software engineering, web technology

Think of the applications programming interface (API) for an application environment: an operating system, a markup language, a language’s standard library. What internationalisation (i18n) functionality would you expect to see in such an API? There are some obvious candidates: a text string substitution-from-resources capability like gettext(). A mechanism for formatting dates, numbers, and currencies in culturally appropriate ways. Data formats for text that can handle text in a variety of languages. Some way to determine what cultural conventions and language the user prefers. There is clearly a whole list one could make.

Wouldn’t it be interesting, and useful, to have such a list?  Probably many organisations have made such lists in the past. Who has made such a list? Are they willing to share it with the internationalisation and localisation community? Is there value in developing a “good practices” statement with such a list?  And, most importantly, who would like to read such a list? How would it help them? In what way would such a list add value? Continue Reading »

Top Posts: StackOverflow “How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?”

Posted by Jim DeLaHunt on 31 Jul 2013 | Tagged as: Unicode, robobait, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my second best-voted answer in StackOverflow so far.

The question, How do I get SQLAlchemy to correctly insert a unicode ellipsis into a mySQL table?,  was asked by user kvedananda in February 2012. In abbreviated form, it was:

Continue Reading »

Top Posts: StackOverflow “Django headache with simple non-ascii string”

Posted by Jim DeLaHunt on 31 May 2013 | Tagged as: Python, Unicode, software engineering

I post on various forums around the net, and a few of my posts there get some very gratifying kudos. I’ve been a diligent contributor to StackOverflow, the Q-and-A site for software developers. I’m in the top 15% of contributors overall, and one of the top 25 answerers of Unicode-related questions. Here’s my top-voted answer in StackOverflow so far.

The question, Django headache with simple non-ascii string,  was asked by user Ezequiel in January 2010. In abbreviated form, it was:

Continue Reading »

Eclipse + Mac OS X 10.5.8 + Java SE 6 => 64 bits

Posted by Jim DeLaHunt on 31 Aug 2012 | Tagged as: robobait, software engineering

I recently had an adventure trying to get a plugin, PDT, installed into my Eclipse software development environment. Diagnosis was hard, and the conclusion was non-obvious, though in hindsight, reasonable. It is that if you want to use an Eclipse plug-in that requires Java SE 6 on a Mac OS X 10.5.8 computer, you will need the 64-bit version of Eclipse.

Let me explain.

Continue Reading »

Is Eclipse.org easy enough to eclipse EasyEclipse.org?

Posted by Jim DeLaHunt on 29 Feb 2012 | Tagged as: Python, software engineering

There is a special place in heaven for those who make free-libre software engineering tools available to journeyman programmers like me. I’m grateful to the Eclipse project for their comprehensive integrated development environment. A few years ago, when I chose Eclipse as my Python-language programming environment, Eclipse wasn’t very easy to install, especially on the Mac. Into the gap rode the EasyEclipse project. They offered distributions of Eclipse and related modules, targeted at various kinds of developer and at various language preferences, in packaging that was simple and ready to go. I used their EasyEclipse for Python 1.3.1 product as my primary development environment for several years, and it was great for me.

Alas, the EasyEclipse project appears to be stagnating.   They haven’t updated their builds to the latest version of Eclipse and language-specific plug-ins. (They still use Eclipse 3.3, current is 3.7.)  Their Eclipse build is throwing errors in the Software Update feature, because the latest plugins are too new for their old Eclipse core. They aren’t responding to bug reports and forum posts. They aren’t even responding to my message sent in response to their plea for helpers to take over the project.

In the meantime, the Eclipse project’s distributions are now easier to use. You can download Eclipse builds for Mac OS.  They have builds targeted to various segments of developers. They have extensive documentation. They have an update manager within Eclipse, to make it easier to stay current.

So, the question is: is the core Eclipse project now easy enough to install that there’s no more need for a project like EasyEclipse?   Is Eclipse.org easy enough to eclipse EasyEclipse.org?  If not, can the core Eclipse project learn lessons from EasyEclipse and become easy enough?  Or is there a long-term niche for EasyEclipse?  I recently downloaded a current Eclipse build, and fitted it out for Python and PHP programming.  My experience gives me opinions on these questions.

Continue Reading »

Ads Factory “GoogleX, GoogleY” means (lat, long) not (horizontal, vertical)

Posted by Jim DeLaHunt on 30 Jun 2011 | Tagged as: CMS, Joomla, robobait, software engineering, web technology

I want to pass along a tip about confusing field names used in the Ads Factory component for Joomla for geographic data.  I encountered this while customising this component for a client. At first I thought it was a bug, but now I think it’s just an odd naming convention.

Ads Factory, by Romanian developers The Factory,  is a commercial component for Joomla 1.5 which lets you add classified ads to your Joomla site. (My client had me working with version 1.x on Joomla 1.5, but I see there is also a version 2.1 of Ads Factory which is Joomla 1.6 native.) There are quite a few places where Ads Factory includes geographic information: each user record can record a latitude and longitude for that user; each ad can record a latitude and longitude for the advertised merchandise; and there is a way to make a “radius search”, i.e. find all ads within a given distance of a user-specified location.

These latitude and longitude values are stored in database fields with name suffixes “X” and “Y”. The user’s latitude and longitude are stored in fields “GoogleX” and “GoogleY” of the Ads Factory user table. Similarly, but not completely consistently, the ad’s latitude and longitude are stored in fields “MapX” and “MapY” of the Ads Factory ads table. The confusion comes in understanding which field stores the latitude, and which stores the longitude.

Latitude is, of course, the signed number of degrees north of the equator of a point on the earth’s surface. It ranges from +90.0 (the North Pole) to 0.0 (the Equator) to -90.0 (the South Pole). Thus, it’s a vertical coordinate. Longitude is the signed number of degrees east of the 0° meridian (roughly Greenwich, England). It ranges from +180.0 to -180.0. My part of North America is 122-123° west of Greenwich, so we have longitudes of -123.0 to -122.0 or so. It’s a horizontal coordinate. This is a well-established convention in many mapping standards.

Tidy Cartesian mathematicians like me use the convention of (X,Y) coordinates, where X is the horizontal coordinate and Y is the vertical coordinate. This is a well-established convention in geometry and graphics (though there are some exceptions).

My first interpretation of Ads Factory field names like  “GoogleX” and “GoogleY” was to interpret them according to the Cartesian convention: X is horizontal, and so stores longitude, while Y is vertical, and so stores latitude. Thus (MapX, MapY) would be (longitude, latitude), the opposite of what one expects from mapping. Odd. I was surprised to find some parts of the code storing latitude in X (the horizontal coordinate!) and longitude in Y (the vertical!), which was surely a bug. I was horrified when it appeared that every part of this code had the same bug!

Then I understood the convention. Ads Factory’s developers appear to have used the (X, Y) convention to indicate just the order of the coordinates, but not their Cartesian meaning.  (MapX, MapY) means (latitude, longitude), as is conventional in mapping.  X is the vertical coordinate, Y is the horizontal coordinate, in the Ads Factory context. If you remember that X means “first”, not horizontal, and Y means “second”, not vertical, the Ads Factory field names are self-consistent, and the code uses them correctly.

I haven’t seen any Ads Factory documentation which explains this, so I hope this note will help some of you Ads Factory enhancers who are using these fields.
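One way to keep the convention straight in your own code is to convert the X/Y pair into explicitly named values as soon as you read it. The sketch below is in Python for illustration only (Ads Factory itself is a Joomla component); `LatLng` and `latlng_from_xy` are hypothetical names, not part of Ads Factory. The range check doubles as a sanity test: in my part of the world a swapped pair puts -123 into the latitude slot, which fails immediately.

```python
from collections import namedtuple

# Explicit names, so nobody has to remember what "X" and "Y" mean.
LatLng = namedtuple("LatLng", ["lat", "lng"])


def latlng_from_xy(x, y):
    """Interpret an Ads Factory style (X, Y) pair as (latitude, longitude).

    X is the FIRST coordinate (latitude), Y the SECOND (longitude);
    this is not the Cartesian horizontal/vertical reading.
    """
    lat, lng = float(x), float(y)
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range (are X and Y swapped?): %r" % lat)
    if not -180.0 <= lng <= 180.0:
        raise ValueError("longitude out of range: %r" % lng)
    return LatLng(lat, lng)


if __name__ == "__main__":
    # Hypothetical stored values, e.g. GoogleX and GoogleY for one user.
    print(latlng_from_xy(49.25, -123.1))
```

Note that the check only catches swaps where the longitude lies outside ±90°; nearer the prime meridian a swapped pair would pass silently.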

Postscript: what did my client ask me to do with Ads Factory for their site?  Modify the radius search to search around the user’s latitude and longitude, instead of a location the user enters. Also, to sort the keyword and category search results by distance from the user. Quite straightforward to do, though it requires customisations to the Ads Factory code that have to be re-done every time one upgrades the Ads Factory component.
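The logic of that customisation can be sketched roughly as follows. This is an illustration in Python, not the Ads Factory code or the change made for the client: it assumes ads are plain dictionaries carrying the MapX and MapY fields described above, uses the haversine great-circle formula with a mean Earth radius of 6371 km, and the function names are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def ads_by_distance(ads, user_lat, user_lng, radius_km=None):
    """Return (distance_km, ad) pairs sorted nearest first.

    Each ad is a dict with "MapX" (latitude) and "MapY" (longitude).
    If radius_km is given, ads farther away than that are dropped.
    """
    measured = [
        (haversine_km(user_lat, user_lng, ad["MapX"], ad["MapY"]), ad)
        for ad in ads
    ]
    if radius_km is not None:
        measured = [(d, ad) for d, ad in measured if d <= radius_km]
    measured.sort(key=lambda pair: pair[0])
    return measured


if __name__ == "__main__":
    # Hypothetical ads; the coordinates are placeholders.
    ads = [
        {"title": "far", "MapX": 50.0, "MapY": -123.0},
        {"title": "near", "MapX": 49.1, "MapY": -123.0},
    ]
    for dist, ad in ads_by_distance(ads, 49.0, -123.0, radius_km=50):
        print("%s: %.1f km" % (ad["title"], dist))
```

In a real site the filtering would more likely happen in the database query than in application code; the sketch only shows the distance calculation and the ordering.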

How to resolve EasyEclipse error ‘Eclipse… requires plug-in “system.bundle”‘

Posted by Jim DeLaHunt on 31 Mar 2011 | Tagged as: robobait, software engineering, web technology

I use the EasyEclipse distribution of Eclipse, the free (libre) software development environment. I just figured out how to fix an obscure error message:

Eclipse Web tools editors (2.0.1) requires plug-in "system.bundle"
Eclipse Data Tools (1.5.1) requires plug-in "system.bundle"

When I would start up EasyEclipse (version 1.3.1 for Mac OS X, with Python, C++, Java, PHP and more support added), it would tell me that I had some outdated components, and offer to update them for me.  But when I opened the menu item Help… Software Updates… Manage configuration, I would get the ominous error alert:

“The current configuration contains errors and this operation can have unpredictable results. Do you want to continue? [Cancel] [OK]”.

I wasn’t able to  find documentation about this problem specifically. (My purpose in writing this is to help others benefit from what I learned.)

Continue Reading »

Choosing between UTF-8 and UTF-16: which has the better bytes-per-character ratio?

Posted by Jim DeLaHunt on 31 Dec 2010 | Tagged as: Unicode, i18n, language, software engineering

Software engineers sometimes are called on to specify which encoding a text file format should use.  These days, the top contenders for encoding are UTF-8 and UTF-16, both based on the Unicode Standard. One factor (amongst several, and perhaps not the most compelling) in choosing between them is storage efficiency: the number of bytes per character, or amount of storage per unit of text. If a given text takes a kilobyte of storage in UTF-8 and twice that in UTF-16, that’s a difference, which may be meaningful.

I recently looked for quantitative data about space efficiency of UTF-8 and UTF-16, and couldn’t find very much. Engineering discussions about storage efficiency are better informed by quantitative data than by opinion and supposition. I want to give one morsel of quantitative data more visibility, and clarify this issue. Continue Reading »
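To make the ratio concrete, here is a minimal sketch of how one might measure it for a given text. It is only an illustration, not the data the post refers to. It assumes that a "character" means a Unicode code point (what Python's `len()` counts on a `str`), and it encodes as UTF-16 without a byte-order mark; the function name and the sample strings are made up.

```python
def bytes_per_character(text):
    """Return average bytes per code point in UTF-8 and UTF-16.

    "utf-16-le" is used so that no byte-order mark inflates the count.
    """
    if not text:
        raise ValueError("text must not be empty")
    n = len(text)  # number of Unicode code points
    return {
        "utf-8": len(text.encode("utf-8")) / n,
        "utf-16": len(text.encode("utf-16-le")) / n,
    }


if __name__ == "__main__":
    # Placeholder samples: ASCII, Cyrillic, and CJK text.
    for sample in ("hello", "Привет", "日本語"):
        ratios = bytes_per_character(sample)
        print(sample, ratios["utf-8"], ratios["utf-16"])
```

ASCII text comes out at 1 byte per character in UTF-8 against 2 in UTF-16, Cyrillic at 2 and 2, and these CJK characters at 3 and 2. Real documents mix scripts with ASCII markup and whitespace, which is why measurements on actual corpora are more useful than such toy samples.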

11 Django gotchas

Posted by Jim DeLaHunt on 31 Aug 2010 | Tagged as: Python, Unicode, robobait, software engineering, web technology

This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding.  I decided to use the Django web framework and the Python language for the Twanguages analysis code.  I know Python, but I was learning Django for the first time.

Django is really, really marvellous.  When I tried this expression, and got the Python array of records I was expecting,

q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)

I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”

But Django gave me fits.  It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log.  Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles.  I share them in the hopes of helping another Django novice.

Here are my Django gotchas.  They are ranked from the most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.

  1. API fails unhelpfully. I wrote a simple query expression like:
    S2 = models.TwStatus.objects.get( key )

    I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:

    S2 = models.TwStatus.objects.get( id__exact=key )

    The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development. Continue Reading »
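Lacking such a debug mode, one could approximate it with a thin wrapper used during development. The sketch below is not part of Django; `checked_get` is a hypothetical helper, written for Python 3, that assumes only that the manager has a `.get(*args, **kwargs)` method. It rejects bare strings and numbers passed positionally, since those are never valid lookups, and passes everything else through unchanged.

```python
def checked_get(manager, *args, **kwargs):
    """Call manager.get(), but fail clearly on a bare positional value.

    Positional arguments to .get() are meant to be query objects, so a
    plain string or number there is almost certainly a forgotten keyword.
    """
    for position, arg in enumerate(args):
        if isinstance(arg, (str, bytes, int, float)):
            raise TypeError(
                "get() positional argument %d is a bare %s (%r); "
                "did you mean a keyword lookup such as id__exact=...?"
                % (position, type(arg).__name__, arg)
            )
    return manager.get(*args, **kwargs)


# Intended use during development (requires a Django project):
#     S2 = checked_get(models.TwStatus.objects, id__exact=key)
```

Calling `checked_get(models.TwStatus.objects, key)` would then raise a TypeError naming the offending argument, instead of the obscure failure deep in the framework.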
