Duplicate entry names in a single directory on a file server: solved!
Posted by Jim DeLaHunt on 31 Aug 2019 at 07:54 pm | Tagged as: robobait, software engineering
I have just seen — and solved — the most remarkable thing in a deep corner of my large archive disk: a single directory containing two entries (subdirectories) with the same name and same inode number. I will describe the problem, the diagnosis, and the cure for the benefit of others who encounter the same problem.
I was moving my archive of old files from one Network-Attached Storage (NAS) file server on my home network to another. Both old and new servers use netatalk AFP software to present Mac-style volumes to my Mac computer. Both run underlying Unix-like operating systems and file systems (but different ones for each).
I moved the archive by dragging the top-level data folder, using Finder on my Mac, from the old server to the new. Partway through, the copy aborted with an error message like, “a directory with the name .externalToolBuilders already exists”. This is remarkable. Each directory on the old server might have many entries or few, but each entry must have a different name. It is one of the fundamental rules of file systems. I was not combining two directories together, where an entry from one directory might collide with an entry with the same name from the other directory.
I looked at the directory on the original file server, from the Mac OS shell on my Mac. I saw a listing like this (bowdlerised to preserve my confidentiality):
% ls -liaF path/to/directory/with/duplicate/entries
total 56
1227293 drwxr-xr-x 1 jdlh staff 264 3 Jan 2016 ./
1227288 drwxr-xr-x 1 jdlh staff 264 3 Jan 2016 ../
1227364 drwxr-xr-x 1 jdlh staff 264 20 Feb 2009 .externalToolBuilders/
1227364 drwxr-xr-x 1 jdlh staff 264 20 Feb 2009 .externalToolBuilders/
1227367 -rw-rw-rw- 1 jdlh staff 859 20 Feb 2009 other_files
There are two entries named .externalToolBuilders/ in this directory. They appear to share the same inode number (in the leftmost column), so they seem to refer to the same thing, not just to two things which share the same name.
I could reproduce the error, “a directory with the name .externalToolBuilders already exists”, by copying just this directory to another file system volume using Finder. I could refer to both entries with a single wildcard, as in diff -rq .ext*, and the command treated this as expanding to two arguments. However, when I used a command such as cp -r .externalToolBuilders/ that referred to the duplicate subdirectory by name, only one of the entries would get copied.
I was baffled. I posted my findings to the Ask Different Q&A site, as “Two entries with identical name and inode in same directory?” I asked: Has anyone seen this on a Mac OS file system before? How can it be constructed at will? Are there known drawbacks of it? How can one clean it up?
I finally figured out the underlying cause. I connected to the Unix-like shell of the original server’s underlying Linux operating system. I looked at the corresponding directory in the server’s underlying file system. This is what I saw (again, bowdlerised for confidentiality and simplified for clarity):
% ls -la /share/Volume/path/to/directory/with/duplicate/entries
total 56
drwxr-xr-x 1 jdlh everyone 4096 Jan  3  2016 ./
drwxr-xr-x 1 jdlh everyone 4096 Aug 24 00:21 ../
drwxr-xr-x 1 jdlh everyone 4096 Aug 24 01:32 .externalToolBuilders/
drwxr-xr-x 1 jdlh everyone 4096 Feb 20  2009 :2eexternalToolBuilders/
-rw-rw-rw- 1 jdlh everyone 4096 Feb 20  2009 other_files
The crucial observation is the directory name, :2eexternalToolBuilders. It begins with the string “:2e”, while the other directory entry begins with the string “.”. From the point of view of the underlying operating system and file system, there are no duplicate entries in this directory. The two “externalToolBuilders” directories have different names.
The layers of software on top of the server’s operating system, quite probably the netatalk AFP software, interpret the prefix “:2e” as standing for “.”. When presenting the underlying directory entry :2eexternalToolBuilders through AFP to my Mac, that software rewrites the entry’s name to .externalToolBuilders. It fails to notice that there is another entry named .externalToolBuilders in that directory. The result is that my Mac sees, in the original server, a directory with an unexpected, and rule-violating, duplication.
I suspect that the use of prefix “:2e” in place of prefix “.” is a convention from old Server Message Block (SMB) file server software. SMB allowed Mac OS files to be stored on underlying Windows file systems. The Windows file system of the time did not permit filenames with a leading “.”. “2e” can be read as a hex ASCII code for period “.”. The colon “:” can be read as an escape character, meaning that it plus the following two hex digits should be presented as the character represented by the digits. Thus “:2e” in an underlying directory entry name stands for “.” in the directory entry name presented by the server.
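To make the escape reading concrete, here is a minimal Python sketch of the decoding described above. It assumes that an escape is a colon followed by exactly two hex digits; the function name decode_colon_hex is my own invention for illustration, and this is not taken from netatalk or any other server software.

```python
import re

# Assumption: an escape is ":" followed by exactly two hex digits.
_ESCAPE = re.compile(r":([0-9A-Fa-f]{2})")


def decode_colon_hex(name: str) -> str:
    """Replace each ':XX' escape with the character whose code is hex XX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


# Both underlying names come out as the same presented name.
print(decode_colon_hex(":2eexternalToolBuilders"))  # .externalToolBuilders
print(decode_colon_hex(".externalToolBuilders"))    # .externalToolBuilders
```

Under that reading, two different underlying names map to one presented name, which is exactly the collision my Mac saw.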
It turns out that the data on the original server was old enough to have been copied forward through multiple versions of server and server software. Sometimes I accessed it through SMB, and other times through AFP. I expect that the directory :2eexternalToolBuilders was created first, and the companion directory .externalToolBuilders was created later. They coexisted unnoticed, especially in old data which I didn’t access. Only when I used Finder to copy the directory did the conflict become apparent.
I speculate that the inconsistent behaviour I saw on my Mac, looking at the volume presented through AFP, is caused by Mac OS utilities treating directory entries differently depending on whether they look up a specific name in a directory, or enumerate all names. The software no doubt assumed there can be no duplicate names among the entries. Utilities looking for a specific name will find one or other of the duplicates, and stop. There is no reason to look for another of that name, because none should exist. Utilities enumerating all names, or all names matching a wildcard, return all matching entries, not caring that some are duplicates. The duplicate inode number can be explained by the software enumerating all names, then for each name, using that name to look up the inode number corresponding to that name. The software returning inode numbers would of course end its search with the same directory entry both times, because it was looking for the same name both times.
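The speculation above can be illustrated with a toy model in Python. This is only a sketch of the reasoning, not how Mac OS or AFP is actually implemented. The list presented, the functions lookup, enumerate_names, and list_with_inodes, and the inode number 1227999 for the second underlying directory are all invented for illustration.

```python
# Hypothetical model: the names the server presents for one directory,
# each paired with the inode of the underlying entry.
presented = [
    (".externalToolBuilders", 1227364),  # underlying ".externalToolBuilders"
    (".externalToolBuilders", 1227999),  # underlying ":2eexternalToolBuilders" (invented inode)
    ("other_files", 1227367),
]


def lookup(name):
    """Look up one specific name: return the first match and stop."""
    for entry_name, inode in presented:
        if entry_name == name:
            return inode
    return None


def enumerate_names():
    """Enumerate every presented name, duplicates included."""
    return [entry_name for entry_name, _ in presented]


def list_with_inodes():
    """Model of a listing: enumerate names, then look up each name's inode."""
    return [(lookup(name), name) for name in enumerate_names()]


for inode, name in list_with_inodes():
    print(inode, name)
```

In this model the listing shows .externalToolBuilders twice with inode 1227364 both times, because the by-name lookup stops at the first match. The second underlying directory is never reached by name.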
The solution for me was to patrol the underlying filesystem of my original server, looking for cases of duplicates separated by “.” and “:2e” prefixes. I found about five cases, with names like .externalToolBuilders, .svn, .libs, and .metadata. I used shell commands to merge all files into the entry with the “.” prefix, then delete the entry with the “:2e” prefix. This removed the duplication, and let the Finder copy succeed.
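For anyone who would rather script the patrol, here is a minimal Python sketch, not the shell commands I actually used. It is meant to run on the server's underlying file system, and it only reports collisions; it does not merge or delete anything. It assumes the same colon-plus-two-hex-digits escape described earlier, and the function names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumption: an escape is ":" followed by exactly two hex digits.
_ESCAPE = re.compile(r":([0-9A-Fa-f]{2})")


def decode_colon_hex(name):
    """Replace each ':XX' escape with the character whose code is hex XX."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def find_collisions(root):
    """Yield (directory, presented_name, raw_names) for every directory
    under root where two or more entries decode to the same name."""
    for dirpath, dirnames, filenames in os.walk(root):
        by_decoded = defaultdict(list)
        for raw in dirnames + filenames:
            by_decoded[decode_colon_hex(raw)].append(raw)
        for decoded, raws in sorted(by_decoded.items()):
            if len(raws) > 1:
                yield dirpath, decoded, sorted(raws)


def report(root):
    """Print one line per collision found under root."""
    for dirpath, decoded, raws in find_collisions(root):
        print(f"{dirpath}: {raws} would both be presented as {decoded!r}")
```

Calling report() with the top of the shared volume as seen from the server's own shell would list each affected directory. The merge and delete steps are better done by hand after inspecting each pair.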
This is perhaps a rare situation, caused by a combination of old data, NAS file servers, and a mix of SMB and AFP servers. But its rarity made it hard to learn about. I hope this report, and the answer to the Ask Different question, will make someone else’s diagnosis and fix easier.