I use and value a good spreadsheet application the way chefs use and value good knives. I have countless occasions to do ad-hoc data processing and conversion, and I tend to turn to spreadsheets even more often I turn to a good text editor. I know a lot of ways to get the job done with spreadsheets. But recently I learned a new trick. I’m delighted to share it with you here.

The situation: you have an HTML document, with a list of linked text. Imagine a list of projects, each with a link to a project URL (the names aren’t meaningful):

The task is to convert this list of formatted links into a table, with the project name in column A, and the URL in column B.  The trick is to use an OpenOffice macro, which exposes the URL (and other facets of formatted text) as OpenOffice functions.

(I described my recent real-life instance of this situation in the blog Open Data Day 2014, and a dataset dataset for Vancouver. The HTML document was the City of Vancouver’s Open Data Catalogue. It was an HTML table, not a list, which I wanted to convert to a spreadsheet — but that difference wasn’t all that material.)

An obvious way to start converting such a list is to select the list, Copy, and Paste into an empty Apache OpenOffice spreadsheet. OK so far. Unfortunately, each list item becomes one cell in the spreadsheet. The text in the cell is formatted, and the link is buried in the text formatting. It looks like this:


A B C
1 Example “Aphids” Project    
2 Example “Beeswax” Project    
3 Example “Clover” Project    

The tool we want is a spreadsheet function =CELL_URL( A1 ) , in column C, which extracts the  URL from the cell in column A and exposes it for further manipulation. Column B is the place for the link text, unformatted. The formula =A1 accomplishes that.  If we had a CELL_URL() function, our spreadsheet would look like this:


A B C
1 Example “Aphids” Project Example “Aphids” Project http://example.com/pr/aphids/
2 Example “Beeswax” Project Example “Beeswax” Project http://example.com/pr/beeswax/
3 Example “Clover” Project Example “Clover” Project http://example.com/pr/clover/

Unfortunately, the standard OpenOffice spreadsheet doesn’t include a function like CELL_URL().

I’m indebted to OpenOffice.org forum user Villeroy for their post [Calc, Basic] Introspective cell functions (Apache OpenOffice forums, Code Snippets forum, Mon Feb 18, 2008 9:53 pm). They, bless their hearts, posted a set of macros which implement pretty much the CELL_URL() function we want. Instructions for how to add a macro to an OpenOffice spreadsheet are out of scope here, and available from the OpenOffice project.

The CELL_URL() function from Villeroy’s code has the arguments:  CELL_URL( sheetnum, rownum, columnnum, paranum ). The parameters are absolute coordinates for a particular paragraph of text in a particular cell within the entire spreadsheet.  sheetnum is an integer, with 1 for the first sheet in a document, 2 for the second, and so on. rownum is 1 for row 1 of the current worksheet, 2 for row 2, etc.  columnnum is 1 for column A, 2 for column B, etc. paranum is an index into the paragraphs (strings separated by newlines) in a cell. We don’t have them in this example, but I did have them in the HTML table of the Vancouver Open Data catalogue.

Absolute coordinates are awkward. In spreadsheets we are used to relative cell references. In C1, a cell reference “A1″ in a formula means “the cell in the same row but two columns to the left”. Paste the same formula into cell E4, and the cell reference changes to C4. Fortunately, there are OpenOffice functions SHEET(), ROW(), and COLUMN(). Each takes a cell reference, and returns the absolute coordinate of that cell in the sheet, row, and column dimension. So, I found it convenient to always use these functions with relative or absolute cell references when I called CELL_URL(). The result is an expression which has the relative cell reference behaviour we want. My usage looked like: =CELL_URL(SHEET(A1);ROW(A1);COLUMN(A1);1)

Filling a formula like this into column C of our example spreadsheet gives us cells with string values containing the URLs we seek. That’s the trick. From that point, processing the URLs and link text into some useful form is straightforward. For me, as detailed in the other blog, my product was wikitext for a table.

Here is the macro code posted by OpenOffice.org forum user Villeroy in their post [Calc, Basic] Introspective cell functions (Apache OpenOffice forums, Code Snippets forum, Mon Feb 18, 2008 9:53 pm). I’m keeping a copy here as insurance against the original forum disappearing. Unfortunately, Villeroy didn’t post a formal licence with the code. It’s clear from context that they are comfortable sharing their code.

REM  *****  BASIC  *****
'from: https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=2762
'written by: Villeroy, https://forum.openoffice.org/en/forum/memberlist.php?mode=viewprofile&u=107
'Retrieved by Jim DeLaHunt, 2014-02-22, for Vancouver Open Data Day
'Usage: =CELL_URL(SHEET(A1);ROW(A1);COLUMN(A1);B1)
'Note usage of SHEET() to get current sheet number,
‘ROW() and COL() to be able to use relative references.
‘4th parameter (optional) is 1-based index of text item in cell.
   REM  *****  BASIC  *****
   REM ################### RETURNING STRING #################################################
   Function CELL_NOTE(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns annotation text
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_NOTE = v.Annotation.getText.getString
      else
         CELL_NOTE = v
      endif
   End Function
   Function CELL_URL(vSheet,lRowIndex&,iColIndex%,optional n%)
   'calls: getSheetCell
   REM returns URL of Nth text-hyperlink from a cell, default N=1)
   Dim v
      If isMissing(n) then n= 1
      If n < 1 then
         CELL_URL = Null
         exit function
      endif
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         if v.Textfields.Count >= n  then
            CELL_URL = v.getTextfields.getByIndex(n -1).URL
         else
            Cell_URL = Null
         endif
      else
         CELL_URL = v
      endif
   End Function
   Function CELL_FORMULA(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM return unlocalized (English) formula
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_FORMULA = v.getFormula()
      else
         CELL_FORMULA = v
      endif
   End Function
   Function CELL_STYLE(vSheet,lRowIndex&,iColIndex%,optional bLocalized)
   'calls: getSheetCell
   REM return name of cell-style, optionally localized
   Dim v,s$,bLocal as Boolean
      if not isMissing(bLocalized) then bLocal=cBool(bLocalized)
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         if bLocal then
            s = thisComponent.StyleFamilies("CellStyles").getByName(v.CellStyle).DisplayName
         else
            s = v.CellStyle
         endif
         CELL_STYLE = s
      else
         CELL_STYLE = v
      endif
   End Function
   Function CELL_LINE(vSheet,lRowIndex&,iColIndex%,optional n)
   'calls: getSheetCell
   REM Split by line breaks, missing or zero line number returns whole string.
   REM =CELL_LINE(SHEET(),1,1,2) -> second line of A1 in this sheet
   Dim v,s$,a(),i%
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         s = v.getString
         if not isMissing(n) then i = cInt(n)
         if i > 0 then
            a() = Split(s,chr(10))
            If (i <= uBound(a())+1)then
               CELL_LINE = a(i -1)
            else
               CELL_LINE = NULL
            endif
         else
            CELL_LINE = s
         endif
      else
         CELL_LINE = v
      endif
   end Function
   REM ################### RETURNING NUMBER #################################################
   Function CELL_ISHORIZONTALPAGEBREAK(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_ISHORIZONTALPAGEBREAK = Abs(cINT(v.Rows.getByIndex(0).IsStartOfNewPage))
      else
         CELL_ISHORIZONTALPAGEBREAK = v
      endif
   End Function
   Function CELL_ISVERTICALPAGEBREAK(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_ISVERTICALPAGEBREAK = Abs(cINT(v.Columns.getByIndex(0).IsStartOfNewPage))
      else
         CELL_ISVERTICALPAGEBREAK = v
      endif
   End Function
   Function CELL_CHARCOLOR(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns color code as number
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_CHARCOLOR = v.CharColor
      else
         CELL_CHARCOLOR = v
      endif
   End Function
   Function CELL_BACKCOLOR(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns color code as number
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_BACKCOLOR = v.CellBackColor
      else
         CELL_BACKCOLOR = v
      endif
   End Function
   Function CELL_VISIBLE(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns visibility state as number 0|1
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_VISIBLE = Abs(v.Rows.isVisible)
      else
         CELL_VISIBLE = v
      endif
   End Function
   Function CELL_LOCKED(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns locked state as number 0|1
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_LOCKED = Abs(v.CellProtection.isLocked)
      else
         CELL_LOCKED = v
      endif
   End Function
   Function CELL_NumberFormat(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM returns the number format index
   Dim v
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         CELL_NumberFormat = v.NumberFormat
      else
         CELL_NumberFormat = v
      endif
   End Function
   Function CELL_NumberFormatType(vSheet,lRowIndex&,iColIndex%)
   'calls: getSheetCell
   REM return a numeric com.sun.star.util.NumberFormat which describes a format category
   Dim v,lNF&
      v = getSheetCell(vSheet,lRowIndex&,iColIndex%)
      if vartype(v) = 9 then
         lNF = v.NumberFormat
         CELL_NumberFormatType = ThisComponent.getNumberFormats.getByKey(lNF).Type
      else
         CELL_NumberFormatType = v
      endif
   End Function
   '################### HELPERS FOR ABOVE CELL FUNCTIONS #########################################
   Function getSheet(byVal vSheet)
   REM Helper for sheet functions. Get cell from sheet's name or position; cell's row-position; cell's col-position
   on error goto exitErr
      select case varType(vSheet)
      case is = 8
         if thisComponent.sheets.hasbyName(vSheet) then
            getSheet = thisComponent.sheets.getByName(vSheet)
         else
            getSheet = NULL
         endif
      case 2 to 5
         vSheet = cInt(vSheet)
         'Wow! Calc has sheets with no name at index < 0,
         ' so NOT isNull(oSheet), if vSheet <= lbound(sheets) = CRASH!
         'http://www.openoffice.org/issues/show_bug.cgi?id=58796
         if(vSheet <= thisComponent.getSheets.getCount)AND(vSheet > 0) then
            getSheet = thisComponent.sheets.getByIndex(vSheet -1)
         else
            getSheet = NULL
         endif
      end select
   exit function
   exitErr:
   getSheet = NULL
   End Function
   Function getSheetCell(byVal vSheet,byVal lRowIndex&,byVal iColIndex%)
   dim oSheet
   '   print vartype(vsheet)
      oSheet = getSheet(vSheet)
      if varType(oSheet) <>9 then
         getSheetCell = NULL
      elseif (lRowIndex > oSheet.rows.count)OR(lRowIndex < 1) then
         getSheetCell = NULL
      elseif (iColIndex > oSheet.columns.count)OR(iColIndex < 1) then
         getSheetCell = NULL
      else
         getSheetCell = oSheet.getCellByPosition(iColIndex -1,lRowIndex -1)
      endif
   End Function