Writing a simple unicode selector

One of the emacs packages I use, counsel, has a pretty nifty command called counsel-unicode-char that lets you lookup unicode characters by code point or name and inserts them into the current buffer. In non-emacs speak, this is essentially just an easy way to type emoji, symbols, and any other unicode glyph. I thought it might be handy to write a quick script that would allow me to do this anywhere on my machine and copy the character to the clipboard.

Enter unicodedata

So… programmatically getting a Unicode character name? Naturally we turn to python - if there's a quick way to do it, python usually has it. I was glad to find out that python comes with a built-in library called unicodedata that could do exactly what I wanted. So I quickly whipped up the following:


import sys, unicodedata

# Print all named unicode chars
try:
    # Max of range chosen experimentally, lol
    for i in range(32, 918000):
        char = chr(i)
        try:
            name = unicodedata.name(char)
        except ValueError:
            # Codepoint has no name; skip it
            continue
        print("U+%x\t%s\t%s" % (i, char, name))
except (BrokenPipeError, IOError):
    # The reader went away early; close stderr to suppress
    # the interpreter's shutdown traceback
    sys.stderr.close()


We start at the first printable character, with an integer value of 32, and loop all the way up to 918000, just past the largest codepoint this library has a name for. We print the hexadecimal codepoint, the char itself, and the character name, ignoring characters that don't have a name.
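A quick interactive check of the two behaviors the loop relies on - official character names, and the ValueError raised for codepoints that don't have one. (The exact results depend on the Unicode tables bundled with your python, but these have been stable for a long time.)

```python
import unicodedata

# Assigned characters have official names, and lookups
# work in both directions:
print(unicodedata.name("λ"))          # GREEK SMALL LETTER LAMDA
print(unicodedata.lookup("SNOWMAN"))  # ☃

# But unnamed codepoints raise ValueError, which is why
# the loop needs its try/except:
try:
    unicodedata.name(chr(0x0378))  # an unassigned codepoint
except ValueError as e:
    print("no name:", e)
```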

Then we can pass this into rofi to create our selector!

unicode.py | rofi -i -dmenu | cut -d$'\t' -f2 | xclip -r -selection clipboard

This is why we're catching BrokenPipeError and IOError and closing stderr - in case we exit rofi before the python script has finished producing output.
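The failure mode can be reproduced without rofi. The sketch below (a hypothetical stand-in for unicode.py, not the real script) spawns a chatty producer, reads one line, and hangs up - exactly what happens when you quit rofi early:

```python
import subprocess, sys, textwrap

# Hypothetical stand-in for unicode.py: a producer that handles
# its reader hanging up early.
producer = textwrap.dedent("""
    import sys
    try:
        for i in range(1000000):
            print(i)
    except (BrokenPipeError, IOError):
        # Close stderr so the interpreter's shutdown doesn't
        # print a second "broken pipe" traceback
        sys.stderr.close()
""")

proc = subprocess.Popen(
    [sys.executable, "-c", producer],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
)
first_line = proc.stdout.readline()  # read a little output...
proc.stdout.close()                  # ...then hang up, like quitting rofi early
proc.wait()
err = proc.stderr.read()

print(repr(first_line))  # '0\n'
print(repr(err))         # '' - the handler silenced the traceback
```

Without the except clause in the producer, `err` would contain a BrokenPipeError traceback instead of being empty.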

Here's what this looks like:


Can we do better?

So there I was, feeling pretty good about myself - it's python, it's built-in, it's portable, what's not to love? Well, pretty often when I write a program the next question in my head is either "Can I do it in scheme?" or "Can I do it faster?".

I began looking into the source code for python's unicodedata library and found that they were manually generating a database based on unicode.org's UnicodeData.txt and a couple of other files. Very impressive, over-my-head sort of stuff. I then found that unicode.org has a C library for working with unicode, and wondered why python doesn't use that - probably something to do with not wanting to rely on an external dependency.
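For a feel of what that generation works from: UnicodeData.txt is a semicolon-separated text file, one assigned codepoint per line, with the hex codepoint and the character name in the first two fields. A minimal sketch of pulling names out of that format (the sample lines follow the file's layout, abbreviated here):

```python
import io

# A few lines in the UnicodeData.txt format; the real file has one
# such line for every assigned codepoint. Fields are ;-separated,
# hex codepoint first, character name second.
sample = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;;;;;N;;;039B;;039B
2603;SNOWMAN;So;0;ON;;;;;N;;;;;
"""

names = {}
for line in io.StringIO(sample):
    fields = line.rstrip("\n").split(";")
    names[int(fields[0], 16)] = fields[1]

print(names[0x2603])  # SNOWMAN
```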

While I could just use the C library directly, Chicken Scheme is more fun to write, and I thought it might be a useful library to have in that language too. So after way too much time and head scratching, I present: icu. Here's what the same program looks like in Chicken:

(import chicken.format
        icu)

(do ((i 32 (add1 i)))
    ((= i 918000))
  (let* ((char (integer->char i))
         (name (char-string-name char)))
    (when name
      ;; Currently printf doesn't support utf8
      ;; (printf "U+~a\t~a\t~a\n" (number->string i 16) (string char) name)
      (display "U+")
      (display (number->string i 16))
      (display #\tab)
      (display char)
      (display #\tab)
      (display name)
      (newline))))

The shebang here is from the wonderful autocompile egg. The speed gain here turned out not to be that great, maybe about 2x, but the fun gain…

Date: 2020-12-18