
Using Iconv to convert UTF-8 to ASCII (on Linux)

Ariejan de Vroom
Jack of all Trades, Professional Software Craftsman

There are situations where you want to strip all the UTF-8 goodness from a string and reduce it to plain ASCII (mostly because of legacy systems you're working with). Now, this is rather easy to do. For example, the string çéß should be converted to cess.

On my Mac, I can simply use the following snippet to convert the string:

require 'iconv'

s = "çéß"
s = Iconv.conv('ascii//translit', 'utf-8', s) # returns "c'ess" on the Mac
s = s.gsub(/\W/, '')                          # returns "cess"
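Note that Iconv was removed from Ruby's standard library in Ruby 2.0 (it is still available as the separate iconv gem). On a newer Ruby, one rough substitute, swapped in here and not what the snippet above does, is to decompose the string with String#unicode_normalize (Ruby 2.2+) and drop whatever falls outside ASCII. This is a minimal sketch under that assumption; the to_ascii name and the small EXTRA table for characters such as ß that have no decomposition are hypothetical.

```ruby
# Hypothetical table for characters that NFKD does not decompose.
EXTRA = { "ß" => "ss" }.freeze

# Rough stand-in for Iconv's 'ascii//translit' (an assumption, not the original method).
def to_ascii(str)
  s = str.gsub(Regexp.union(EXTRA.keys), EXTRA) # hand-mapped special cases first
  s = s.unicode_normalize(:nfkd)                # "é" becomes "e" plus a combining accent
  s = s.gsub(/[^[:ascii:]]/, '')                # drop the accents and anything else non-ASCII
  s.gsub(/\W/, '')                              # same clean-up as the Iconv version
end

puts to_ascii("çéß") # prints cess
```

Unlike //TRANSLIT, any character that neither decomposes nor appears in the table is simply dropped rather than approximated.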

Very nice and all, but when I deploy to my Debian 4.0 Linux system, I get an error telling me that invalid characters were present. Why? Because the Mac has Unicode support built in, while Linux often does not out of the box (glibc's iconv takes its //TRANSLIT rules from the installed locale data).

So, how do you go about solving this? Easy! Install Unicode support:

sudo apt-get install unicode

Now, try again.

Bonus

If you want to convert a sentence (or anything else with spaces in it), you'll notice that the gsub command removes the spaces too. I solve this by first splitting the string into words, converting each word, and then joining the words together again.

require 'iconv'

words = s.split(" ")
words = words.collect do |word|
  ascii = Iconv.conv('ascii//translit', 'utf-8', word) # transliterate one word
  ascii.gsub(/\W/, '')                                 # strip leftover non-word characters
end
words.join(" ")
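If you would rather not split the string at all, a simpler variant (a swapped-in alternative, not the author's code) is to leave whitespace out of the characters that gsub removes, using [^\w\s] instead of \W. This sketch shows only that clean-up step, applied to text that has already been transliterated, so it runs without Iconv; the clean_ascii name and the sample input are hypothetical.

```ruby
# Hypothetical helper: remove non-word characters but keep whitespace.
# Expects text that has already been transliterated to ASCII.
def clean_ascii(str)
  str.gsub(/[^\w\s]/, '')
end

puts clean_ascii("c'a va tr'es bien") # prints ca va tres bien
```

Unlike split(" ") followed by join(" "), this keeps runs of whitespace, as well as leading and trailing spaces, exactly as they were.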

Like this? Why not write a mix-in for String?
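Here is a minimal sketch of such a mix-in. It assumes String#unicode_normalize (Ruby 2.2+) as a stand-in for Iconv's //TRANSLIT, since newer Rubies no longer bundle Iconv; the module name AsciiConversion, the method name to_ascii_words and the SPECIAL table are all hypothetical.

```ruby
# Hypothetical mix-in: converts each word of a string to plain ASCII.
# Uses unicode_normalize instead of Iconv (an assumption, not the original method).
module AsciiConversion
  # Characters NFKD does not decompose; this small table is an assumption.
  SPECIAL = { "ß" => "ss" }.freeze

  def to_ascii_words
    words = split(" ").map do |word|
      w = word.gsub(Regexp.union(SPECIAL.keys), SPECIAL) # hand-mapped special cases
      w = w.unicode_normalize(:nfkd)                     # separate letters from their accents
      w = w.gsub(/[^[:ascii:]]/, '')                     # drop the accents
      w.gsub(/\W/, '')                                   # strip punctuation
    end
    words.join(" ")
  end
end

class String
  include AsciiConversion
end

puts "Ça va très bien, Straße".to_ascii_words # prints Ca va tres bien Strasse
```

Keeping the logic in a module rather than reopening String directly makes it easy to test on its own and to include only where it is needed.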
