software engineering

Archived Posts from this Category

A tale of cascading open source tasks: MacPorts documentation

Posted by on 30 Apr 2022 | Tagged as: software engineering, web technology

All I wanted to do was to add a mention of Github changeset links to the MacPorts documentation.

I like contributing to free software projects. One way I pick which specific contribution to make, is that I recall something which was an obstacle for me. I come up with a change which would reduce that obstacle. I look through the project’s open source for how to make that change. Then I make a pull request, or submit a patch, which contains that change.

Sometimes, in order to make the change, I need to learn the tools or languages which that project uses for its code or documentation. That can lead to a cascade of subsidiary tasks or related changes. This is a story of that happening to a ridiculous degree, about 20 months ago.

Continue Reading »

StackOverflow 10K

Posted by on 31 Jan 2022 | Tagged as: i18n, Python, software engineering, technical support, Unicode, web technology

I have been active on StackOverflow for more than twelve years. StackOverflow is a phenomenally successful question and answer website, aimed at software developers seeking technical answers. Part of what makes StackOverflow successful is that it gamifies “reputation”: your reputation goes up when you write good answers, and ask good questions, and otherwise help. On 23 December 2021, my StackOverflow reputation rose past 10,000. This is a gratifying milestone.

I am user Jim DeLaHunt on StackOverflow. I apparently posted my first question there on 23. November, 2009. I asked if anyone could point me to “an XML language for describing file attributes of a directory tree?” I did not get a good direct answer. I did get a reference to the XML-dev email list, which I follow to this day. My first answer was to my own question about the XML language. My first answer to someone else’s question was about three weeks later, and it was about detecting a character encoding.

Over twelve years, I have written 133 answers, most of which languish in obscurity. Three have earned particularly many upvotes (and, between them, over 40% of my reputation):

  1. How to escape apostrophe (‘) in MySql?” This is a pretty simple answer. I suspect that it gets a lot of upvotes because many people ask this question. My answer also has the virtue that it quotes a specific clause in the official documentation to prove that the answer is correct. Not all StackOverflow answers cite reliable sources. This answer has earned 226 votes to date, bringing in over 22% of my total reputation.
  2. Is there a way to pass optional parameters to a function?” This too is a simple answer to a frequently-asked question. I cited an official source in this answer also. This answer has earned 116 votes to date, bringing in over 11% of my total reputation.
  3. What exactly is a “raw string regex” and how can you use it?” I think this is the best answer of the three. It finds a way to clarify a particularly murky area of the Python language, which often baffles people. I think it is easier to understand than the official documentation. This answer has earned 108 votes to date, bringing in over 10% of my total reputation. I think it was a vote on this question which put me over 10,000. I like that.

StackOverflow turns the reputation score into a variety of rankings. They put me in the top 4% for reputation overall. This sounds very impressive, until you learn that I am only 24,308-ranked among all participants. Mind you, there are over 16 million participants. I imagine there is a long, inactive tail, compared to which my small activity looks great.

In a similar vein, StackOverflow ranks me among the top 5% in the topics of “Python” and “MySQL“; the top 10% in “Unicode“; and the top 20% in “Internationalization“, “UTF-8“, and “Django“. That reflects some combination of effort on my part, and flattery due to the long, inactive tail.

I put a lot of work, 8-10 years ago, into answering questions and building my reputation. Now I find that upvotes trickle in for my existing 133 questions. My reputation rises surprisingly steadily, even if I don’t contribute anything new, giving me a kind of StackOverflow pension. But I still get satisfaction from plugging away there every now and again, trying to find a good question and write a clear answer. Maybe, in less than 12 years from now, I might reach StackOverflow 20,000.

I learned about software engineering from that: the exception

Posted by on 31 Jul 2021 | Tagged as: robobait, software engineering

I have been writing software for a long time. But every now and then I get flummoxed when my code misbehaves. It can take me a long time to figure out what is wrong. Often, it turns out to be a combination of basic mistakes which I have made, which remind me of some basic lessons in software engineering, which I didn’t realise that I was not applying. This happened to me recently. It is a story I call, “the exception”. Maybe, if I tell you the story, you can learn from my mistakes, and I can reinforce the lesson to myself.

I had been working on a set of command-line tools for a consulting client in recent years. The purpose of the tools is to take some input files, modify them in related ways, and write out the modified files. (This whole explanation is very simplified, to protect the client’s confidentiality, and to clarify the story.) Let’s say that I started off with a tool that would merge input file A and input file B, and write output file AB. Then as an experiment, I made a tool which would modify the contents of file B during the merge with A, and write output file ABʹ. I put this experiment away in a branch for quite a while. In the meantime, I extended the tool program so that it would take a directory D in place of file B, and merge file A with every file in D. So if D contained files D1, D2, D3, …, the tool merged and wrote AD1, AD2, AD3, etc.

The tools were a set of Python packages and modules. There was a module, merge_command.py, which was a command-line tool. There was also a package, modifiers, with four modules: common.py, to do the basic work common to all tools; and merge.py, to do the specific work of the merge tool. There were also other modules for other modifications, represented by shorten.py. They don’t matter for this story. Finally, like every Python package, modifiers had an __init__.py module to provide an interface that hides the internal structure of the package.

The code structure was something like this:

src/
    merge_command.py    # command-line program
    modifiers/          # package to do file modifications
        __init__.py       # package interface 
        common.py         # basic work for all tools
        merge.py          # specific work of merge tool
        shorten.py        # specific work another tool 

There were some further details which matter to the story. Each of the modules in modifiers defined a class to do its work: Common to do the work for all tools, Merge to do the work of the merge tool, etc. Class Merge and other classes inherited from class Common. So, merge_command.py gathered and checked command-line arguments, including paths for the input and output files, then passed them to an instance of class Merge to do the work. Merge in turn did some preparation to express the work in terms of what Common could do, then called superclass methods in Common to complete the work.

merge also defined some exceptions, represented here by exception InputFileNotFound. If class Merge could not find one of the files given in its path arguments, it raised this exception. (Classes like Merge never interact with the user; their caller has that responsibility.) merge_command.py caught this specific exception, and printed out an appropriate error message, and continued with any further work it could do. merge_command.py also caught any other exception. It would be unexpected and the tool had to stop. merge_command.py printed a general error message and stopped.

Finally, there was a test module for almost every code module. The only one that matters for this story was merge_command_test.py, which had tests of running the entire command-line tool. One test was test_missing_file(), which ran the tool with a path to an intentionally-missing input file, and checked that the tool printed the correct error message. I didn’t notice at the time, but this test contained my first error, which I will discuss below.

Thus, in more detail, the structure of the code was like this:

src/
    merge_command.py    # command-line program
       imports InputFileNotFound from modifiers
       imports Merge from modifiers
       creates instance of class Merge
       calls Merge instance with file paths etc.
       handles InputFileNotFound, prints specific message
       handles any other exception, prints general message
    merge_command_test.py  # tests merge_command overall
        test_missing_files() 
            calls merge_command with path known to not exist
            confirms that merge_command raised exception
            checks error message against expected text
    modifiers/          # package to do file modifications
        __init__.py # package interface
            imports Common from common
            imports Merge from merge
            imports InputFileNotFound from merge
            # more, but details don't matter for this story
        common.py # basic work for all tools
            defines class Common
        merge.py # specific work of merge tool
            imports Common from modifiers
            defines class Merge, inherits from Common
            if file missing, raise InputFileNotFound
            defines InputFileNotFound
        shorten.py # specific work another tool
            # details don't matter for this story

At first, only class Merge worked with paths to directories. Other classes in package modifiers worked with paths to individual files. But then I wanted other classes to work with paths to directories also. At that point, I decided to abstract out the parts of Merge which worked with directories and iterated through the files, into a new class Directories, which inherited from Common. I changed Merge to inherit from Directories instead. merge_command.py did not need to change, because package modifiers protected it from the details of the package implementation. This is normal object-oriented design and normal refactoring.

I copied merge.py into a new module directories.py, and began deleting the parts that were about merging rather than about directory handling. I now defined InputFileNotFound in directories.py. I added new entries to __init__.py to import class Directories and InputFileNotFound. I changed merge.py so that class Merge inherited from class Directories instead of class Common.

But as I created directories.py, I made my second mistake: I failed to delete the existing definition of InputFileNotFound in merge.py, and I failed to delete the import of InputFileNotFound in __init__.py.

However, all my tests passed. Particularly, test_missing_file() passed. Thus I knew that the command merge_command.py still printed the same error message, when a path it was given pointed to a file which did not exist.

At this point, the code looked something like this:

src/
    merge_command.py    # command-line program
       imports InputFileNotFound from modifiers
       imports Merge from modifiers
       creates instance of class Merge
       calls Merge instance with file paths etc.
       catches InputFileNotFound, specific message
       catches any other exception, prints general message
    merge_command_test.py  # tests merge_command overall
        test_missing_files() 
            calls merge_command with path known to not exist
            confirms that merge_command raised exception
            checks error message against expected text
    modifiers/          # package to do file modifications
        __init__.py # package interface
            imports Common from common
            imports Directories from directories
            imports InputFileNotFound from directories
            imports Merge from merge
            imports InputFileNotFound from merge # mistake!
            # more, but details don't matter for this story
        common.py # basic work for all tools
            defines class Common
        directories.py # directory handling for all tools
            imports Common from modifiers
            defines class Directories, inherits from Common
            if file missing, raise InputFileNotFound
            defines InputFileNotFound
        merge.py # specific work of merge tool
            imports Directories from modifiers
            defines class Merge, inherits from Directories
            # no use of InputFileNotFound any more
            defines InputFileNotFound # mistake!
        shorten.py # specific work another tool
            # details don't matter for this story

The code stayed in this structure for quite a long time. I had made two mistakes, but I wasn’t aware of them. My tests passed. My client was satisfied.

But recently, my client asked for improvements to the tool. These involved bringing the old experiment, wrote output file ABʹ, up to date with the current code. Specifically, I merged the current code of the main branch into the old code of the experimental branch. A number of modules conflicted, as I had expected. Many of the conflicts had to do with directory handling, and class Directories. The experimental code predated this new class. I worked to resolve the conflicts. This involved looking at the main branch’s code, and the experimental code, and their differences, and deciding what to take from the main branch, what to take from the experimental branch, and what to rewrite to reconcile the two. I wanted to have one set of code on the experimental branch which preserved all the current behaviour of the tool, and also had the experimental behaviour.

As I was resolving these conflicts, I noticed that __init__.py imported InputFileNotFound twice. The second import, from merge, was no longer correct. I deleted it. I then discovered the left-over definition of InputFileNotFound in merge.py, and deleted it also.

All of a sudden, the test, test_missing_file(), began failing. The output from merge_command.py was different than my reference expected text. I checked, carefully, for the edit when the problem arose. It arose when I merged the main code into the experimental. Perhaps I had made a mistake in merging the conflicts? But none of code for directories or missing files was in conflict, because it hadn’t existed in the experimental code. I ran merge_command.py in a debugger. The InputFileNotFound was raised, and handled, (apparently) as expected.

This kind of problem is confusing, frustrating, and demoralising for a developer. There is a temptation to run the same tests the same way several times, hoping for a different result. But that is insanity.

The way out of a problem like this is similar to the way out of a frozen lake after you have fallen through the ice: crawl gradually, remaining at a low level, testing assumptions, until on solid ground.

I read the exception handling in merge_command.py, and it was simple and very clear: there was code to handle an InputFileNotFound, and I could not see how it would fail. But then I used a debugger to step through the test_missing_file(), and was astonished. The Directories class raised an InputFileNotFound exception, as expected. The handler which should have handled the InputFileNotFound unexpectedly did not handle it. The general handler, to catch any other exception, handled it instead. When I examined the exception’s value, it looked just like an InputFileNotFound. But when I used the Python expression to see if it (ex) was what it thought it was — isinstance(ex, InputFileNotFound)— the answer was False.

How could an InputFileNotFound exception not be an InputFileNotFound?

Eventually, the answer came to me. The name InputFileNotFound was defined twice: in directories.py, and an old left-over definition in merge.py. The Directories class used the name in directories.py to create and raise the exception. The modifiers package interface in __init__.py imported the name twice, the first time from directories, and the second time from merge. The second definition was what merge_command.py got when it imported the name from modifiers.

The InputFileNotFound handler in merge_command.py was matching against the exception defined in merge.py, rather than the correct one defined in directories.py. Even though they had the same name, they referred to different classes. The Computer Science name for this is “aliasing”. When used correctly, it is wonderful. When used unintentionally, it has effects that are really, really confusing.

My first software engineering mistake was to not delete the old definitions in, and imports from, merge.py when I extracted the exception to directories.py. If I had deleted the old definition, then the import in __init__.py would have caused an error, and I would have known to remove it. When I finally did this cleanup, as part of the branch merge, then merge_command.py finally got the correct definition for its InputFileNotFound name.

But that means that merge_command.py had been handling exceptions wrong the whole time. How had the tests passed? Well, that was my second software engineering mistake. I implemented merge_command.py first, then wrote the test_missing_file() code to check that it always handled the exception the same way it did at first. I failed to check the nature of that handling for correctness. I failed to check for the InputFileNotFound handler’s “specific message”. Instead, I took the (incorrect) “general message” as the reference correct output. This is sloppy test writing.

Thus, the failing test test_missing_file() was the sign of a change in the code under test, all right, but it was a change from broken to working!

So, I learned about software engineering from that. My three lessons are:

  1. When refactoring, be careful to clean up lingering old code in the original location.
  2. When writing tests, be careful to check that the code’s behaviour is in fact correct, before setting it into the concrete of the text fixture.
  3. When faced with a confusing, frustrating failure, letting go of beliefs about the code, and proceeding with deliberate debugging, is an effective way to diagnose the problem and get back to solid ground.

How to update the JRE in LiClipse

Posted by on 28 May 2021 | Tagged as: robobait, software engineering, technical support

I use the LiClipse software development environment (IDE) for most of my software development. It is a distribution of the Eclipse Project IDE, with PyDev, and its own Java runtime environment (JRE), and some extras. I like it because I can, by installing more components, work with more software languages. Most of the time, one can update the various components of an Eclipse IDE using the internal “Check for Updates” function. However, my installation had reached the point where updating the components required upgrading the JRE. The documentation for the latter upgrade was inadequate. Here is what I figured out. It worked for me. I hope these instructions are helpful for others.

Continue Reading »

How to fix table contents turned to “0” in LibreOffice

Posted by on 30 Apr 2021 | Tagged as: robobait, software engineering, technical support, web technology

Well, that was a fright. I was editing a large report with the LibreOffice word processor. I had a table of results. It consisted of a placeholder table — a header and a couple of rows — with Table Styles applied to get the rows formatted right. I pasted in dozens of rows of content from a spreadsheet. I saved the file for the night. The next morning, I opened the document, and to my horror, found that text values in many of the columns had been turned to “0”. This is LibreOffice bug 131025. And here is how I recovered from the error, and got my table contents back. I hopes it helps others who encounter this bug.

Continue Reading »

How to change and delete attributes in an XML file using XSLT

Posted by on 31 Mar 2021 | Tagged as: robobait, software engineering

I recently had a need to modify a large XML file by changing attributes of one kind of element — changing the value of a certain attribute, and deleting another attribute — while preserving everything else: element text, surrounding elements, etc. I did not readily find code for modifying an attribute value in my searches. Here is what worked for me. Perhaps it will be helpful for someone else.

Continue Reading »

Link from Trac to GitHub using changeset links

Posted by on 30 Sep 2020 | Tagged as: robobait, software engineering, web technology

There is an elegant way to link from the wiki and bug tracker called Trac to GitHub by means of changeset links. Unfortunately, Trac’s documentation on WikiFormatting and on TracLinks does not currently describe this. But one project which I use, MacPorts, uses Trac to track MacPorts bugs and to host wiki documentation. They prefer that contributors use these changeset links rather than URLs when referring to MacPorts on GitHub. So, this is the documentation for these changeset links, in a form which I wish was in the main Trac documentation. It is here to remind me. Maybe it will also help others searching for how to do this.

Continue Reading »

PostScript code converting UTF-8 to UTF-16

Posted by on 31 Aug 2020 | Tagged as: robobait, software engineering, Unicode

This is a little bit of code which was fun and nostalgic to write, even though the motivating project fell through. I wrote PostScript language functions to convert strings with UTF-8 contents, into strings with UTF-16 contents. This was intended to be part of a batch tool to convert PDF documents to PDF/A format, but that did not work out. However, the code works, and here it is.

Continue Reading »

FFmpeg’s fps filter, documented

Posted by on 30 Apr 2020 | Tagged as: FFmpeg, robobait, software engineering, web technology

The FFmpeg media editing software is a valuable tool, but its documentation is only barely adequate. It certainly does not answer all the questions I have, as a user trying to understand why FFmpeg is not doing what I want it to do.

Fortunately, FFmpeg is open source, so when the documentation fails, one can read the source. I wanted to learn about presentation time stamps and time bases. The fps video filter source code, in file libavfilter/vf_fps.c, was an instructive read.

I took what I learned from reading that source, and did a complete rewrite of the fps filter documentation. It is longer than the original fps filter documentation (as archived in April 2020 — you can check if the present documentation is any better). I believe the rewrite is more complete and more accurate. I contributed the rewrite to the FFmpeg project. I submitted it as a patch to the ffmpeg developers list. Discussion continues. I don’t know if this contribution will ultimately get accepted.

So, for the benefit of FFmpeg users who are web-searching for answers, here is my documentation of FFmpeg’s fps video filter.

Continue Reading »

Duplicate entry names in a single directory on a file server: solved!

Posted by on 31 Aug 2019 | Tagged as: robobait, software engineering

I have just seen — and solved — the most remarkable thing in a deep corner of my large archive disk: a single directory containing two entries (subdirectories) with the same name and same inode number. I will describe the problem, the diagnosis, and the cure for the benefit of others who encounter the same problem.

I was moving my archive of old files from one Network-Attached Storage (NAS) file server on my home network to another. Both old and new servers use netatalk AFP software to present Mac=style volumes to my Mac computer. Both run an underlying Unix-like OSs and file systems (but different ones for each).

I moved the archive by dragging the top-level directory data folder, using Finder on my Mac, from the old server to the new. Partway through, the copy aborted, with an error message like,  “a directory with the name .externalToolBuilders already exists”. This is remarkable. Each directory on the old server might have many entries or few, but each entry must has a different name. It is one of the fundamental rules of file systems. I was not combining two directories together, where an entry from one directory might collide with an entry with the same name from the other directory. Continue Reading »

Next Page »