software engineering

Archived Posts from this Category

Is easy enough to eclipse

Posted by on 29 Feb 2012 | Tagged as: Python, software engineering

There is a special place in heaven for those who make free-libre software engineering tools available to journeyman programmers like me. I’m grateful to the Eclipse project for their comprehensive integrated development environment. A few years ago, when I chose Eclipse as my Python-language programming environment, Eclipse wasn’t very easy to install, especially on the Mac. Into the gap rode the EasyEclipse project. They offered distributions of Eclipse and related modules, targeted at various kinds of developer and at various language preferences, in packaging that was simple and ready to go. I used their EasyEclipse for Python 1.3.1 product as my primary development environment for several years, and it was great for me.

Alas, the EasyEclipse project appears to be stagnating. They haven’t updated their builds to the latest versions of Eclipse and the language-specific plug-ins. (They still use Eclipse 3.3; the current version is 3.7.) Their Eclipse build is throwing errors in the Software Update feature, because the latest plug-ins are too new for their old Eclipse core. They aren’t responding to bug reports and forum posts. They aren’t even responding to my message sent in response to their plea for helpers to take over the project.

In the meantime, the Eclipse project’s distributions are now easier to use. You can download Eclipse builds for Mac OS. They have builds targeted to various segments of developers. They have extensive documentation. They have an update manager within Eclipse, to make it easier to stay current.

So, the question is: is the core Eclipse project now easy enough to install that there’s no more need for a project like EasyEclipse? If not, can the core Eclipse project learn lessons from EasyEclipse and become easy enough? Or is there a long-term niche for EasyEclipse? I recently downloaded a current Eclipse build, and fitted it out for Python and PHP programming. My experience gives me opinions on these questions.

Continue Reading »

Ads Factory “GoogleX, GoogleY” means (lat, long) not (horizontal, vertical)

Posted by on 30 Jun 2011 | Tagged as: CMS, Joomla, robobait, software engineering, web technology

I want to pass along a tip about confusing field names used in the Ads Factory component for Joomla for geographic data.  I encountered this while customising this component for a client. At first I thought it was a bug, but now I think it’s just an odd naming convention.

Ads Factory, by Romanian developers The Factory, is a commercial component for Joomla 1.5 which lets you add classified ads to your Joomla site. (My client had me working with version 1.x on Joomla 1.5, but I see there is also a version 2.1 of Ads Factory which is Joomla 1.6 native.) There are quite a few places where Ads Factory includes geographic information: each user record can record a latitude and longitude for that user; each ad can record a latitude and longitude for the advertised merchandise; and there is a way to make a “radius search”, i.e. find all ads within a given distance of a user-specified location.

These latitude and longitude values are stored in database fields with name suffixes “X” and “Y”. The user’s latitude and longitude are stored in fields “GoogleX” and “GoogleY” of the Ads Factory user table. Similarly, but not completely consistently, the ad’s latitude and longitude are stored in fields “MapX” and “MapY” of the Ads Factory ads table. The confusion comes in understanding which field stores the latitude, and which stores the longitude.

Latitude is, of course, the signed number of degrees north of the equator of a point on the earth’s surface. It ranges from +90.0 (the North Pole) to 0.0 (the Equator) to -90.0 (the South Pole). Thus, it’s a vertical coordinate. Longitude is the signed number of degrees east of the 0° meridian (roughly Greenwich, England). It ranges from +180.0 to -180.0. My part of North America is 122-123° west of Greenwich, so we have longitudes of -123.0 to -122.0 or so. It’s a horizontal coordinate. This is a well-established convention in many mapping standards.

Tidy Cartesian mathematicians like me use the convention of (X,Y) coordinates, where X is the horizontal coordinate and Y is the vertical coordinate. This is a well-established convention in geometry and graphics (though there are some exceptions).

My first interpretation of Ads Factory field names like  “GoogleX” and “GoogleY” was to interpret them according to the Cartesian convention: X is horizontal, and so stores longitude, while Y is vertical, and so stores latitude. Thus (MapX, MapY) would be (longitude, latitude), the opposite of what one expects from mapping. Odd. I was surprised to find some parts of the code storing latitude in X (the horizontal coordinate!) and longitude in Y (the vertical!), which was surely a bug. I was horrified when it appeared that every part of this code had the same bug!

Then I understood the convention. Ads Factory’s developers appear to have used the (X, Y) convention to indicate just the order of the coordinates, but not their Cartesian meaning. (MapX, MapY) means (latitude, longitude), as is conventional in mapping. X is the vertical coordinate, Y is the horizontal coordinate, in the Ads Factory context. If you remember that X means “first”, not horizontal, and Y means “second”, not vertical, the Ads Factory field names are self-consistent, and the code uses them correctly.
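To make the convention concrete, here is a minimal sketch in Python (Ads Factory itself is PHP; this is an illustration, not its code). The helper name `latlong_from_xy` is hypothetical. It treats X as the first coordinate (latitude) and Y as the second (longitude), and its range check catches the Cartesian misreading for any location whose longitude falls outside -90 to +90.

```python
from collections import namedtuple

# Named fields so that callers never have to remember which of X and Y is which.
LatLong = namedtuple("LatLong", ["latitude", "longitude"])


def latlong_from_xy(x, y):
    """Interpret an (X, Y) pair the Ads Factory way: X = latitude, Y = longitude.

    Hypothetical helper for illustration; not part of Ads Factory.
    """
    if not -90.0 <= x <= 90.0:
        raise ValueError("X must be a latitude in [-90, 90], got %r" % (x,))
    if not -180.0 <= y <= 180.0:
        raise ValueError("Y must be a longitude in [-180, 180], got %r" % (y,))
    return LatLong(latitude=x, longitude=y)


# A point near Vancouver: latitude first, longitude second.
here = latlong_from_xy(49.25, -123.1)
```

Passing the pair in Cartesian order, `latlong_from_xy(-123.1, 49.25)`, raises `ValueError`, because -123.1 cannot be a latitude.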

I haven’t seen any Ads Factory documentation which explains this, so I hope this note will help some of you Ads Factory enhancers who are using these fields.

Postscript: what did my client ask me to do with Ads Factory for their site? Modify the radius search to search around the user’s latitude and longitude, instead of a location the user enters. Also, to sort the keyword and category search results by distance from the user. Quite straightforward to do, though it requires customisations to the Ads Factory code that have to be re-done every time one upgrades the Ads Factory component.
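The idea behind that customisation can be sketched in a few lines. This is a Python illustration under stated assumptions, not the PHP change made for the client: records are plain dictionaries keyed by the field names above, distance is computed with the haversine formula on a spherical Earth of radius 6371 km, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of the spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ads_within_radius(user, ads, radius_km):
    """Return (distance_km, ad) pairs within radius_km of the user, nearest first.

    Follows the Ads Factory naming: GoogleX/MapX hold latitude,
    GoogleY/MapY hold longitude.
    """
    scored = [
        (haversine_km(user["GoogleX"], user["GoogleY"], ad["MapX"], ad["MapY"]), ad)
        for ad in ads
    ]
    nearby = [pair for pair in scored if pair[0] <= radius_km]
    nearby.sort(key=lambda pair: pair[0])
    return nearby
```

Sorting keyword or category results by distance is the same computation without the radius filter.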

How to resolve EasyEclipse error ‘Eclipse… requires plug-in “system.bundle”‘

Posted by on 31 Mar 2011 | Tagged as: robobait, software engineering, web technology

I use the EasyEclipse distribution of Eclipse, the free (libre) software development environment. I just figured out how to fix an obscure error message:

Eclipse Web tools editors (2.0.1) requires plug-in "system.bundle"
Eclipse Data Tools (1.5.1) requires plug-in "system.bundle"

When I would start up EasyEclipse (version 1.3.1 for Mac OS X, with Python, C++, Java, PHP and more support added), it would tell me that I had some outdated components, and offer to update them for me.  But when I opened the menu item Help… Software Updates… Manage configuration, I would get the ominous error alert:

“The current configuration contains errors and this operation can have unpredictable results. Do you want to continue? [Cancel] [OK]”.

I wasn’t able to  find documentation about this problem specifically. (My purpose in writing this is to help others benefit from what I learned.)

Continue Reading »

Choosing between UTF-8 and UTF-16: which has the better bytes-per-character ratio?

Posted by on 31 Dec 2010 | Tagged as: i18n, language, software engineering, Unicode

Software engineers sometimes are called on to specify which encoding a text file format should use.  These days, the top contenders for encoding are UTF-8 and UTF-16, both based on the Unicode Standard. One factor (amongst several, and perhaps not the most compelling) in choosing between them is storage efficiency: the number of bytes per character, or amount of storage per unit of text. If a given text takes a kilobyte of storage in UTF-8 and twice that in UTF-16, that’s a difference, which may be meaningful.

I recently looked for quantitative data about space efficiency of UTF-8 and UTF-16, and couldn’t find very much. Engineering discussions about storage efficiency are better informed by quantitative data than by opinion and supposition. I want to give one morsel of quantitative data more visibility, and clarify this issue. Continue Reading »
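As a small illustration of the measurement involved, the following Python sketch computes bytes per character for one string in both encodings. It is not the data the post goes on to discuss. It counts characters as Unicode code points and uses the `utf-16-le` codec so that no byte order mark is included in the UTF-16 count; the function name is hypothetical.

```python
def bytes_per_character(text):
    """Return average bytes per code point for UTF-8 and UTF-16 (no BOM)."""
    if not text:
        raise ValueError("text must not be empty")
    n = len(text)  # number of Unicode code points
    return {
        "utf-8": len(text.encode("utf-8")) / n,
        "utf-16": len(text.encode("utf-16-le")) / n,
    }


ascii_ratio = bytes_per_character("Hello")      # ASCII letters only
kanji_ratio = bytes_per_character("日本語")      # CJK ideographs in the BMP
```

ASCII text comes out at 1 byte per character in UTF-8 against 2 in UTF-16, while the CJK sample comes out at 3 against 2, which is why the answer depends on what kind of text the file format will carry.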

11 Django gotchas

Posted by on 31 Aug 2010 | Tagged as: Python, robobait, software engineering, Unicode, web technology

This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding.  I decided to use the Django web framework and the Python language for the Twanguages analysis code.  I know Python, but I was learning Django for the first time.

Django is really, really marvellous.  When I tried this expression, and got the Python array of records I was expecting,

q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)

I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”

But Django gave me fits.  It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log.  Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles.  I share them in the hopes of helping another Django novice.

Here are my Django gotchas.  They are ranked from the most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.

  1. API fails unhelpfully. I wrote a simple query expression like:
    S2 = models.TwStatus.objects.get( key )

    I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:

    S2 = models.TwStatus.objects.get( id__exact=key )

    The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development. Continue Reading »
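The kind of check wished for here can be sketched as a thin wrapper. This is a hypothetical helper, not part of Django, written for Python 3 and shown with a stand-in manager class so that it runs without Django installed. It assumes that a bare string or number passed positionally is always a mistake.

```python
import numbers


def checked_get(manager, *args, **kwargs):
    """Call manager.get(), but reject bare keys passed positionally.

    Hypothetical debugging aid: a string or number given as a positional
    argument is assumed to be a forgotten keyword such as id__exact.
    """
    for arg in args:
        if isinstance(arg, (str, bytes, numbers.Number)):
            raise TypeError(
                "get() got a bare value %r; did you mean get(id__exact=%r)?"
                % (arg, arg)
            )
    return manager.get(*args, **kwargs)


class FakeManager:
    """Stand-in for a Django model manager, for illustration only."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, *args, **kwargs):
        # Only the id__exact lookup is imitated here.
        return self._rows[kwargs["id__exact"]]


statuses = FakeManager({7: "status seven"})
found = checked_get(statuses, id__exact=7)
```

Calling `checked_get(statuses, 7)` fails immediately with a message naming the likely fix, instead of failing deep inside the framework.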

Why the PostScript language is Turing-complete

Posted by on 30 Apr 2010 | Tagged as: software engineering

A couple of weeks ago on the XML-dev mailing list, there was a discussion comparing declarative and procedural computer languages. Someone wondered why the PostScript language, though used mostly for declarative purposes like describing pages, was still a Turing-complete programming language. That’s actually a topic I know something about, so I contributed the following answer. I’m posting it here, lightly edited, because I thought it might be of wider interest. —JDLH

A good place to go for a discussion of why it is Turing-complete, despite being intended to describe page appearance, is in the Introduction (Chapter 1) of the PostScript Language Reference Manual.

In particular, it says, “The extensive graphics capabilities of the PostScript language are embedded in the framework of a general-purpose programming language. The language includes a conventional set of data types, such as numbers, arrays, and strings; control primitives, such as conditionals, loops, and procedures; and some unusual features, such as dictionaries. These features enable application programmers to define higher-level operations that closely match the needs of the application and then to generate commands that invoke those higher-level operations. Such a description is more compact and easier to generate than one written entirely in terms of a fixed set of basic operations.” Continue Reading »

How to make standalone Django documentation on Mac OS X 10.5 using MacPorts.

Posted by on 06 Aug 2009 | Tagged as: Python, robobait, software engineering, web technology

One of the many nice touches of the Django framework is that it provides tools and instructions to make a standalone Django documentation set from its distribution. (Django is an application framework for the Python language that helps with database access and web applications.) Standalone docs are great for people like me who work on a laptop and are sometimes off the net. But I’m using Mac OS X, I get my code through MacPorts, and Django’s instructions don’t quite cover this case. So I just figured it out. Here are the tricks I needed. Maybe they will help you.

Continue Reading »

Heads up for 1234567890 day!

Posted by on 12 Feb 2009 | Tagged as: software engineering, time, Vancouver

[Image: 1000000000 seconds since the POSIX epoch, as celebrated in Denmark in 2001]

During a high school class, my teacher interrupted his discussion of classical Greek history to say, “it’s twelve thirty-four on the fifth of June, 1978”. In other words, 12:34 5/6/78 (in the British notation). Alert people in the United States had already celebrated that moment on May 6th. If you missed that moment, you have another chance on Friday: 1234567890 day.

Humans love to find patterns, and dates have rich potential for that. For instance, I was walking through a train station on a business trip in Tokyo in February, 1990. I noticed that people were making an unusual fuss about the train tickets. 1990 was 平成2年 , or “Heisei year 2”, in the calendar based on the Japanese era name. The date was printed on the train tickets as “H2-2-2”. The symmetry made them collectors items. (I wish I could lay my hands on a ticket from that day, to convince myself I didn’t invent this memory…)

I have a fondness for finding leaks in the software engineering abstractions that represent our messy real world. I wrote last year about POSIX time, and the limitations in its representation of modern calendars and time zones. So when a leaky abstraction turns up as a pretty pattern, it’s irresistible. And that’s what happens this Friday.
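For anyone who wants to check the moment for themselves, here is a minimal Python sketch that converts a POSIX timestamp to a UTC calendar date and time. The function name is hypothetical; the conversion uses only the standard library.

```python
from datetime import datetime, timezone


def posix_to_utc(seconds):
    """Convert seconds since the POSIX epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


moment = posix_to_utc(1234567890)
print(moment.isoformat())      # 2009-02-13T23:31:30+00:00
print(moment.strftime("%A"))   # Friday
```

The result is in UTC. The local date and time of day depend on your time zone, which is exactly the kind of detail a single integer of seconds glosses over.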

Continue Reading »

Simple script-detection algorithm for font switching?

Posted by on 26 Aug 2008 | Tagged as: i18n, language, multilingual, software engineering, Unicode

Does anybody know of a simple script-detection algorithm (or heuristic) for font switching?

This came up with one of my clients. Suppose you have a guest book on your web site, and seven visitors left you the following inspiring messages:

  1. すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。
  2. 人人生而自由,在尊严和权利上一律平等。
  3. Semua orang dilahirkan merdeka dan mempunyai martabat dan hak-hak yang sama.
  4. 人人生而自由,在尊嚴和權利上一律平等。
  5. Alle Menschen sind frei und gleich an Würde und Rechten geboren.
  6. ‘Ολοι οι άνθρωποι γεννιούνται ελεύθεροι και ίσοι στην αξιοπρέπεια και τα δικαιώματα.
  7. 모든 인간은 태어날 때부터 자유로우며 그 존엄과 권리에 있어 동등하다.

(It looks like your visitors all read the Universal Declaration of Human Rights courtesy of the UDHR in Unicode project).

Now suppose you are so touched that you want to lay out all seven messages in a PDF file, and print it out as a booklet. You have a beautiful layout template, and various complementary fonts: Latin script, Japanese, Korean, Simplified Chinese, Traditional Chinese, and Greek script.

Which font do you apply to each message? More importantly, is there a simple heuristic by which software can make the choice? (More after the jump.)
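To show what such a heuristic might look like, here is one possible sketch in Python. It is not necessarily the approach discussed after the jump. It counts characters by Unicode block ranges and applies a fixed priority: any kana means Japanese, any Hangul means Korean, otherwise Han means Chinese. The function name and labels are hypothetical. Two limits are worth stating up front: Han characters alone cannot tell Simplified from Traditional Chinese, and Japanese text written entirely in kanji would be labelled Chinese.

```python
def guess_script(text):
    """Guess a font category for text by counting characters per Unicode range."""
    kana = hangul = han = greek = latin = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:          # Hiragana and Katakana blocks
            kana += 1
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul
            hangul += 1
        elif 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs
            han += 1
        elif 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:  # Greek
            greek += 1
        elif cp < 0x0250 and ch.isalpha():  # Basic Latin through Latin Extended-B
            latin += 1
    if kana:
        return "Japanese"
    if hangul:
        return "Korean"
    if han:
        return "Chinese"   # Simplified vs Traditional is not decidable here
    if greek > latin:
        return "Greek"
    if latin:
        return "Latin"
    return "Unknown"
```

Applied to the seven messages above, this separates Japanese, Korean, Greek and Latin-script text, but gives the same answer, "Chinese", for messages 2 and 4. Choosing between the two Chinese fonts needs more information than the script alone.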

Continue Reading »

“Do all languages in the world use Western numerals (1, 2, 3 etc) to express numerical values?”

Posted by on 02 Apr 2008 | Tagged as: culture, i18n, language, software engineering

One of the answers I occasionally write at LinkedIn Answers seemed worth reposting here. The question was: “Do all languages in the world use Western numerals (1, 2, 3 etc) to express numerical values?”. My answer (slightly revised):

The simple answer to your question is, “No”. Or, “Yes”. It depends which exact question you are asking.

Is it the case that all languages in the world use only Western numerals (usually known as “Arabic” or “Hindu-Arabic numerals”, by the way) to express numerical values? No. Many languages use multiple number forms, depending on context. In the English language, for example, a numerical value could be expressed with words (“one”) in text, Hindu-Arabic numerals (“1”) in a technical context, or Roman numerals (“i”, “I”) in lists. Arabic, Hindi, Japanese, and Chinese all have native characters to express numerical values, which are used in some contexts.
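Software sees these native characters too. As a small illustration (Python standard library only; the helper names are hypothetical), Python’s `int()` accepts strings of decimal digits from other scripts, and `unicodedata.numeric()` reports the numeric value that the Unicode Character Database assigns to a single character.

```python
import unicodedata


def parse_decimal_digits(text):
    """Parse a string of decimal digits from any script Unicode classes as digits."""
    return int(text)


def single_character_value(ch):
    """Return the numeric value Unicode assigns to one character, as a float."""
    return unicodedata.numeric(ch)


arabic_indic = parse_decimal_digits("١٢٣")   # Arabic-Indic digits
devanagari = parse_decimal_digits("१२३")      # Devanagari digits
han_three = single_character_value("三")      # Han numeral
roman_four = single_character_value("Ⅳ")      # ROMAN NUMERAL FOUR, U+2163
```

Note that this only covers characters with a numeric property. Multi-character forms such as Chinese “十二” or Roman “XII” written in Latin letters need real parsing, which is part of why the question has no one-line answer.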

Do all languages in the world use Western numerals sometimes, in some contexts, to express numerical values? Yes — mostly, probably. The qualifications are because I hate to make generalisations about human culture; it’s so diverse. And, note that languages without written forms probably don’t use Hindu-Arabic numerals at all.

Is it the case that Western numerals are — in all cultures, in all contexts — the idiomatic, preferred way to express numerical values? No. They aren’t even sufficient for all contexts in English (viz “one”, “i”).

Do all cultures which use Western numerals to express numerical values do so in the same way? No. In particular, the punctuation between the whole and the fractional part of a number, and the grouping of digits, differ by culture. North America uses “1,234,567.89”; many European cultures use “1.234.567,89”; I’ve seen Japanese texts that say “123,4567.89”. See the CLDR number format patterns, creating international number formats in Excel, and the user guide to ICU formatting numbers.
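The three forms quoted above differ in only three parameters: the grouping separator, the decimal separator, and the group size. Here is a minimal Python sketch that makes those parameters explicit. It is a hand-rolled illustration with a hypothetical function name, not a substitute for CLDR or ICU, which cover many more conventions than this.

```python
def format_number(value, group_sep=",", decimal_sep=".", group_size=3, decimals=2):
    """Format a number with a chosen grouping separator, decimal separator and group size."""
    whole, _, frac = ("%.*f" % (decimals, abs(value))).partition(".")
    groups = []
    while whole:
        groups.insert(0, whole[-group_size:])  # peel groups off the right-hand end
        whole = whole[:-group_size]
    sign = "-" if value < 0 else ""
    result = sign + group_sep.join(groups)
    if frac:
        result += decimal_sep + frac
    return result


north_american = format_number(1234567.89)                                  # 1,234,567.89
european = format_number(1234567.89, group_sep=".", decimal_sep=",")        # 1.234.567,89
four_digit_groups = format_number(1234567.89, group_size=4)                 # 123,4567.89
```

In a real product these parameters would come from locale data rather than from the caller.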

Let’s shift focus from expressing numbers in cultures to implementing numbers in software products.

If you were making priority decisions for a software product (that’s my background) to expand its market internationally, and that product expresses numerical values using Hindu-Arabic numerals in some contexts appropriate in North America, can you be confident that it’s the only system you’ll need to express numerical values? No. You of course need to look at the cultural requirements of each new market as you go. But I’m confident that over time, some market will require some system other than Hindu-Arabic numerals to express numerical values. So I’m confident that sooner or later, you will have to give that software product the ability to express numerical values in a variety of ways (i.e., to internationalise it).

Postscript: the questioner, LinkedIn product manager Minna King, was kind enough to mark this as the “Best Answer” of the six posted.

[Edited for clarity based on reader feedback.]
