The ‘ugliness’ of Python

Sunday, 23 January 2011

This is a look at functions and objects in my favorite languages. Last year as a Rubyist, I decided to learn Python. I realized that even though Ruby is a powerful language, it doesn’t have the community support I need for science coding. Now I appreciate Python’s style, and I’d like to take a fresh look at “Is Python ugly?”

Not what you think

I want to take on a big question, one that nobody’s really answered: which language is more beautiful, Ruby or Python?

You read that sentence and have already decided this is a flame war waiting to happen. But hear me out. I’m actually not going to call Python “ugly”, much as I started out thinking just that. Instead I’m going to look at differences to build a case for how each language wants us to code. A couple of notes before I start:

  • This is a big topic. Hence, a substantial article.
  • I’ve coded Ruby for 6 years and Python for 2 months. (Read: I do not know everything about Python.)
  • I’m not going to explain Ruby 1.9 block syntax. There are plenty of resources out there that explain things like [1, 2].map &:next.
  • I learned Python via The Quick Python Book (Manning).
  • I’ve picked “pivot points” that separate Ruby and Python. I’m only going to focus on those points.
  • This is a functional perspective.

With that said, let’s dig in.

Hooks & humanity

When I first came across Python’s hooks (__len__(), __str__(), __init__(), etc.), I was horrified. I sat down to pen an epic diatribe against Python and its aesthetic flaws. About how I could never actually like coding in such an ugly language. About how it just sucked.

If you’d have seen the post I had planned, it would’ve read something like:

I hate the underscore methods. Presumably they are underscored–four times, mind you–to avoid naming conflicts. Okay I get it, so if you want to define your very own init() method that isn’t a constructor, you’re good to go. Er … if, of course, you ever need to define an init() method that doesn’t, well, init. I loathe this take on code. I don’t think it’s necessary or good or humane. Ruby’s got hooks, too, after all, but they’re simple. If I ever do want to make an initialize method that doesn’t initialize, well, a) it’s not a big deal and b) I’ll man up and use a different name.

Then I talked it out with a therapist, who said I should be more worried about things like world hunger and AIDS in Africa.

Now, in my mind, these method names are decidedly ugly. They aren’t friendly, short, clean, straightforward or simple. I don’t think anyone would argue with me there. Certainly no one’s actually happy typing the extra underscores.

But maybe vagueness is the point. Maybe Python wants to steer us away from these methods, at least until we actually need them. __len__() is not something you should have to define every day, after all, so why not indicate that with underscores?

I think I can get behind that.
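To make the hooks concrete, here is a minimal sketch of a class that defines three of them. The Playlist class and its tracks attribute are made up for illustration. The point is only that len() and str() find the underscored methods on their own, so callers never type the underscores.

```python
class Playlist:
    """Hypothetical container used only to demonstrate the hooks."""

    def __init__(self, tracks):
        # Runs when Playlist(...) is called
        self.tracks = list(tracks)

    def __len__(self):
        # Called by the built-in len()
        return len(self.tracks)

    def __str__(self):
        # Called by str() and print()
        return "Playlist: " + ", ".join(self.tracks)


mix = Playlist(["Intro", "Outro"])
playlist_size = len(mix)    # 2
playlist_text = str(mix)    # 'Playlist: Intro, Outro'
```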

“You’re reading too much into this”. I’m sure you’re thinking it, and maybe you’re right. But I know one thing. When I define __init__(), it seems hackish, wrong. I feel like I’m goin’ rogue, like I’m dipping into one of Apple’s much-guarded private Cocoa APIs or possibly a secret government lab in Area 51. (Is there a difference?) I mean, what other reason would there be to surround a name with so much negative space?

(I think of it as a moat.)

And so I don’t define objects in Python. It doesn’t feel natural. I use maps and lists and functions as much as possible and leave objects to libraries. Turns out that’s a fantastic way to actually get things done.

Functions, meet objects: Python’s take

Python’s really an ideal language for functional programming. We can create pure functions without much ado and pass them like any other variable.
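A small sketch of what that looks like in practice. The names square, apply_twice and compose are illustrative and not part of any library.

```python
def square(x):
    # Pure: the result depends only on the argument
    return x * x


def apply_twice(f, x):
    # Functions are ordinary values, so f can be any callable
    return f(f(x))


def compose(f, g):
    # Build and return a new function: compose(f, g)(x) == f(g(x))
    return lambda x: f(g(x))


increment = lambda x: x + 1
square_then_increment = compose(increment, square)

fourth_power = apply_twice(square, 3)        # 81
result = square_then_increment(4)            # 17
squares = list(map(square, [1, 2, 3]))       # [1, 4, 9]
```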

But you can’t always use functions with Python’s extensive libraries. (And libraries are why I’ve learned Python at all.) Why not? Because a lot of Python’s libraries are built out of objects, and Python doesn’t let us combine our favorite functions (for example, map) with object methods.

Now, we definitely can deal with objects in Python without resorting to loops and if statements, but we do have to abandon standard techniques for that. Alternatively, we can disguise our methods as functions and hold onto map and reduce. Let’s take a quick example. Where Ruby would let me capitalize every element of a list by mapping the capitalize method onto each letter …

letters = ['a', 'b', 'c']
letters.map &:capitalize
# => ['A', 'B', 'C']

… Python makes me do one of two things:

# Option 1: List comprehension
letters = ['a', 'b', 'c']
[a.capitalize() for a in letters]
# => ['A', 'B', 'C']

# Option 2: Wrap the method in a function
capitalize = lambda x: x.capitalize()
list(map(capitalize, letters))
# => ['A', 'B', 'C']

I’m not especially happy about the extra effort. Should we really have to abandon our favorite functions just because we want to call a method instead of a function?

Update. Apparently it’s possible to map to methods. Who knew? The syntax here is something like map(str.capitalize, ['a', 'b', 'c']). That weakens this point a little, but this way of doing things complicates everything. To write this code, you really have to understand a lot about Python’s implementation before you can even start functional programming. For example, here you have to understand that str is a class that acts like a function when you need it to and like a class everywhere else. So map(str, coll) and map(str.capitalize, coll) are each valid but are doing completely different things! So I still think this is a pivot point between Ruby and Python.
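The difference between the two calls in the update is easier to see side by side. This is a sketch: coll is a made-up list, and operator.methodcaller is included as one more standard-library way to turn a method name into a function.

```python
from operator import methodcaller

coll = ['a', 'b', 1]

# str called as a function converts each element to a string
as_strings = list(map(str, coll))                      # ['a', 'b', '1']

# str.capitalize is the method looked up on the class; it takes the
# string as its first argument, so it only works on str elements
capitalized = list(map(str.capitalize, as_strings))    # ['A', 'B', '1']

# methodcaller looks the method up by name on each element instead
also_capitalized = list(map(methodcaller('capitalize'), as_strings))
```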

Now, I’m not discounting list comprehensions. They’re elegant and rather powerful. A great example is identifying palindromes:

s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]  

Assuming that p() returns true for all palindromes, that one line gives us all palindromes in s. (Try writing that one-liner in Ruby.)
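For anyone who wants to run that one-liner, here is a version with a stand-in for p(). The definition of p is an assumption, since the article never shows it. This one only accepts strings of two or more characters that read the same in both directions, which keeps empty and single-character slices out of the result.

```python
def p(candidate):
    # Hypothetical helper: a palindrome of at least two characters
    return len(candidate) > 1 and candidate == candidate[::-1]


s = 'string-with-palindromes-like-abbalabba'
l = len(s)
palindromes = [s[x:y] for x in range(l) for y in range(x, l + 1) if p(s[x:y])]
```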

Still, I don’t like using list comprehensions when a regular function should do.

In Ruby, things are different.

Hurdles for Ruby

Ruby is beautiful because it balances consistency with convenience. Consistency, though, makes functional programming a little more difficult. I see three hurdles to functional programming in Ruby:

  1. Ruby needs extra syntax for function passing
  2. There are no pure functions in Ruby
  3. Ruby will not let us map over top-level def methods (TLDMs)

1. Syntax

This is not much of an objection, but I do want to mention it. Pythonistas will say, “Python lets me pass and call functions without any effort, but Ruby makes it hard!” That’s true, function passing is painless in Python and a little harder in Ruby. But coll.map &f isn’t so bad, and for me, it’s definitely not a deal breaker. I’ve actually gotten used to it. To me, the extra syntax is the price we pay for internal consistency.

Aside. For a lot of Rubyists, the solution is blocks. That is, instead of passing functions to methods, they stick raw code inside of blocks. If blocks get too long, they refactor that code into a method or helper.

2. Pure functions

I’m going to go ahead and say it: Ruby’s got no functions. And how can you program functionally without functions? (Ouch.)

Let’s step back and think for a second.

Convention says Array#map is not a pure function. Why? Because it relies on data that you’re not passing to it as a parameter–the array itself. That is, numbers.map &:to_s could give different values in the same code–even though its argument doesn’t change–because numbers carries its own logic and state. Put another way, the map method draws on more information than you pass in to return a value.

So that’s that. Object-orientation, by definition, means no pure functions. Right?

Well, let’s think about this for a second. It would be pretty trivial to make map look like a pure function:

module Kernel
  def map(coll, &f)
    coll.map(&f)
  end
end

Ruby likes to group functionality into classes, so you don’t see this in the standard library. But this is a possible patch. And it would almost look like Python.

I think this patch has a valuable point, even if we don’t implement it.

Maybe, just maybe, we should think of coll.map &f as a real, pure function–one that takes coll and f as arguments with a special syntax. One that translates without contradiction into map(coll, &f). Is this the path to enlightenment?
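Python happens to make this reading explicit, which may help show what the translation means. A method fetched from its class is a plain function whose first parameter is the receiver, so coll.method(arg) and Class.method(coll, arg) are the same call. The sketch below uses list.count and str.join only because they are built in; call_as_function is a hypothetical helper.

```python
numbers = [3, 1, 3, 7]

# Method syntax: the receiver sits in front of the dot
method_style = numbers.count(3)

# Function syntax: the receiver is just the first argument
function_style = list.count(numbers, 3)


def call_as_function(method_name, receiver, *args):
    # Hypothetical helper: look the method up by name on the receiver's
    # class, then call it with the receiver passed explicitly
    return getattr(type(receiver), method_name)(receiver, *args)


joined = call_as_function('join', '-', ['a', 'b', 'c'])
```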

Aside. Some objects have methods that can’t be functions because of state. For example, an object that returns different to_s strings (maybe based on the flags you’ve set) can’t give you anything close to a pure function. This isn’t much of a concern, though, for data structures.
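A minimal sketch of the kind of object the aside describes, written in Python. Temperature and its fahrenheit flag are invented for the example. The same str() call returns different text once the flag changes, so the result depends on state that was modified between the two calls.

```python
class Temperature:
    """Hypothetical value whose string form depends on a mutable flag."""

    def __init__(self, celsius):
        self.celsius = celsius
        self.fahrenheit = False   # display flag, can change at any time

    def __str__(self):
        if self.fahrenheit:
            return "%.1f F" % (self.celsius * 9 / 5 + 32)
        return "%.1f C" % self.celsius


reading = Temperature(100)
before = str(reading)        # '100.0 C'
reading.fahrenheit = True
after = str(reading)         # '212.0 F'
```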

3. Top-level methods

In Python, to create an algorithm f(x) and apply it to an array of arrays, we can use the standard def syntax:

def f(x):
    # My algorithm goes here
    return x  # placeholder so the function has a body

map(f, data)
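As a concrete instance, assume data is a list of numeric rows and the algorithm is a plain average. Both are chosen here only for illustration.

```python
def mean(row):
    # Stand-in for "my algorithm": average of one inner list
    return sum(row) / float(len(row))


data = [[1, 2, 3], [10, 20], [5]]
averages = list(map(mean, data))   # [2.0, 15.0, 5.0]
```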

In Ruby, though, we can’t pass around these top-level def methods (TLDMs) because they belong to an object. And that makes functional programming difficult. If I want to create an algorithm and then apply it to a list of numbers, I have four options:

# Option 1: Bulky lambda
f = lambda do |x|
  # algorithm
end

data.map &f

# Option 2: Monkey patch
class Array
  def f
    # algorithm
  end
end

data.map &:f

# Option 3: Explicit block
def f(x)
  # algorithm
end

data.map {|x| f(x) }

# Option 4: Method method
# (Thanks to several of you for this suggestion.)
def f(x)
  # algorithm
end

data.map &method(:f)

Each of those solutions has problems or is clunky. There’s really no getting around this. It’s enough of a problem that I think Ruby needs a new keyword, one that lets us define functions. Something like defn. defn could create a named lambda (Proc), or some other object that responds to call, and insert it into the global namespace. That way, we could use a defn method as a real function, but it would integrate cleanly with existing Ruby.

So Ruby’s problem is the opposite of Python’s: In Python, functional programming is stellar, and object-oriented programming is all right. But the two don’t mix so well. In Ruby, though, functions and objects mix beautifully (if you consider Array#map a function), but pure functional programming is a little awkward and could be easier.

The more I’ve compared Ruby and Python, the more I’ve come to appreciate Ruby’s block construct. It solves several problems at once and is really pretty elegant.

General mayhem: blocks & transformations

Blocks have a big impact on Ruby code, especially object-oriented functional code. The best way to illustrate the effects I’m talking about is to dive into an example.

Example: Stock market. Suppose I have a set of news articles in Markdown format. Each article mentions a company zero or more times and links to it. Now, say I decide to count how many times a company shows up in one day’s articles. (I want to run statistics on the data, maybe to predict the stock market using a shiny new algorithm … Go big or go home, right?)

Each company name is listed by its URL each time, so if I see http://apple.com three times in an article, I know Apple is mentioned three times. (That’s “thrice” if you’re joining us from 17th century England.) After I extract all the domain names and capitalize them, I’ll compare them against a master list of companies I care about and decide which company is most popular for that day’s articles.

Now in Java, this is a task for sure. I can hear committees forming already. But in Ruby and Python, it’s a quick hack. In fact, I’d probably code it in a REPL.

Ruby’s blocks let us chain methods over several lines. More importantly, they let us mix internal object transformations with broader list transformations. This is powerful stuff:

# Companies are listed in 'companies', one per line
companies = File.read('companies').split("\n")
articles = ['article1.md', 'article2.md', 'article3.md']

# Simplified URL matcher
http_regex = /http:\/\/(?:\w+\.)*(\w+)\.(?:com|org|net)/

# NOT the simplest implementation, but it shows off blocks
# scan returns an array of capture arrays; each inner array
# holds one domain name, so flatten leaves a list of names
# (top-level locals aren't visible inside def, so pass them in)
def parse_article(a, http_regex, companies)
  File.open(a) {|f| f.read }.
    scan(http_regex).
    flatten.
    map {|x| x.capitalize }.
    select {|x| companies.include? x }
end

articles.map {|x| parse_article(x, http_regex, companies) }.
  reduce(Array.new) {|x,y| x.concat y }

In fifteen lines of code, we use blocks five times for different reasons. If we wanted, we could also use blocks to build anonymous functions or continuations, or to loop over collections with side effects (via each). For the record, though, I don’t really like each.

Before I move on to the Python implementation, I have to show you how I would actually code this. You see, blocks are awesome, but too many braces get overwhelming. It’s nice to whittle down and organize your code a little when you use them all the time. I tend to give my blocks names (essentially, they’re functions) and keep them separate from my method chains and transformations. Here is equivalent code:

important = lambda {|x| companies.include? x }
articles.map do |a|
  File.read(a).scan(http_regex).flatten.
    map(&:capitalize).select(&important)
end.flatten

Flows a little better, no? And yes, people, that transformation took just five lines of code.

Aside. The mini-point here? Blocks are so important that we have loads of ways to write them. Without a sea of braces. (Huh. Look at that. “Sea of braces” is almost a pun. Now how to make that work …)

Python doesn’t give us blocks or the end keyword. So it’s not elegant to chain transformations. If we want to do a direct translation of the code above, we can either turn to Lispy nesting or to throwaway variables. (Edit. I’ve updated the Python regular expression to actually work. Thanks to many of you for pointing me to the findall method.)

import re

with open('companies') as f:
    companies = f.read().split("\n")

articles = ['article1.md', 'article2.md', 'article3.md']

http_regex = r"http://(?:\w+\.)*(\w+)\.(?:com|org|net)"
url        = lambda x: re.findall(http_regex, x)
name       = lambda x: x.capitalize()
important  = lambda x: x in companies

def parse_article(a):
    with open(a) as f:
        # findall returns every captured domain name in the text
        return list(filter(important,
                           map(name,
                               url(f.read()))))


[y for x in map(parse_article, articles) for y in x]

# ... or, for the verbose, an alternative implementation
# of parse_article ...

def parse_article(a):
    with open(a) as f:
        urls = url(f.read())
    names = map(name, urls)
    # named 'matches' so it doesn't shadow the global 'companies'
    matches = list(filter(important, names))

    return matches

My point here? Python does fine without blocks. It can do all of the nifty stuff that blocks do in Ruby. But lambda, with, for, and list comprehensions all come with their own special syntax. Maybe that’s not necessary.
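To round out the stock market example, here is a self-contained sketch that also performs the final step described earlier: picking the most-mentioned company. It works on in-memory strings instead of files so it can be run as is. The article texts, the companies list and the helper name mentions are all made up, and collections.Counter is just one possible choice for the tally, since the article does not say how to count.

```python
import re
from collections import Counter

# Hypothetical inputs standing in for the 'companies' file and articles
companies = ['Apple', 'Google']
articles = [
    "[Apple](http://apple.com) beat [Google](http://www.google.com) today.",
    "More on [Apple](http://apple.com) and [Example](http://example.org).",
]

http_regex = r"http://(?:\w+\.)*(\w+)\.(?:com|org|net)"


def mentions(text):
    # Extract domain names, capitalize them, keep the ones we track
    names = [name.capitalize() for name in re.findall(http_regex, text)]
    return [name for name in names if name in companies]


# Flatten the per-article lists and tally them
tally = Counter(name for text in articles for name in mentions(text))
most_popular, count = tally.most_common(1)[0]   # ('Apple', 2)
```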

Chains & side effects

I want to give you one more Ruby block example. Because it’s just cool.

tap is a method that exists purely for side effects, just like each. It was designed to give us those side effects inside of a chain. tap yields its receiver to a block and then returns its receiver from that same block. You can print or examine anything about the object within that block, and it doesn’t affect the chain in the least. Say we want to debug our transformation above. Given an info method that examines our current array and prints its size, we can say:

def parse_article(a, http_regex, important)
  File.read(a).split.               tap {|x| info x }.
    map {|w| w.scan(http_regex) }.  tap {|x| info x }.
    flatten.                        tap {|x| info x }.
    map(&:capitalize).              tap {|x| info x }.
    select(&important).             tap {|x| info x }
end

parse_article 'article1.md', http_regex, important

# Hypothetical output:
# Array (1000)
# Array (1000)
# Array (30)
# Array (30)
# Array (10)

Try debugging like that in Python.
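For comparison, here is one rough way to get similar checkpoints in Python. Nothing like tap ships with the language, so the tap and pipeline helpers below are hypothetical, and the input is a small in-memory list rather than an article. The info stand-in records its message in a list instead of printing it. It is a sketch of the idea, not a claim that it reads as well as the Ruby.

```python
from functools import reduce

log = []   # collected instead of printed so the result is easy to inspect


def info(xs):
    # Stand-in for the article's info method: record type and size
    log.append("%s (%d)" % (type(xs).__name__, len(xs)))


def tap(side_effect):
    # Wrap a side effect so it passes its input through unchanged
    def step(value):
        side_effect(value)
        return value
    return step


def pipeline(value, *steps):
    # Feed value through each step in order
    return reduce(lambda acc, step: step(acc), steps, value)


companies = ['Apple']
result = pipeline(
    ['apple', 'pear', 'apple'],
    tap(info),
    lambda xs: [x.capitalize() for x in xs], tap(info),
    lambda xs: [x for x in xs if x in companies], tap(info)
)
```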

The verdict

After mulling this over for a while, I’ve reached a tentative conclusion: Python is an excellent language for functional programming and a decent one for object-oriented programming, but if you want to use both together, you have to completely change your coding style. (Update. See my earlier note. You don’t have to change your style, but you do have to really dig into Python’s implementation to do any functional programming.) This isn’t a problem if you only use functional libraries. But when you’re using libraries built out of objects, you’ll run into road blocks. Changing coding style in the middle of a project is a drag, and I think what Python really needs is an uncluttered way to mix methods and functions.

Another great feature would be a chain operator that lets us chain over multiple lines. (Clojure gives us ->.)
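Python has no such operator, but the hooks from earlier can imitate one. The Chain class below is hypothetical: it borrows the | operator through the __or__ hook so that each step sits on its own line. Treat it as a sketch of what a chain operator might feel like, not as an established idiom.

```python
class Chain:
    """Hypothetical wrapper: Chain(x) | f | g computes g(f(x))."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # Apply func to the wrapped value and keep chaining
        return Chain(func(self.value))


words = ['apple', 'pear', 'apple']

result = (
    Chain(words)
    | (lambda xs: [x.capitalize() for x in xs])
    | set
    | sorted
).value
```

The parentheses around the whole expression are what allow the chain to span several lines.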

So functional programming is beautiful in Python. Ruby’s beauty, on the other hand, comes from unity and adaptability.

Ruby is fantastically simple when you need it to be. (I know people who get by never having to pass a function.) But it scales really well with complexity. If your project is so big that you want to organize it with objects, you can. And you don’t have to sacrifice your functional style of programming to do so. If, instead, you want to organize your project around data structures and functions, you can do that too. The language is malleable beyond belief, and its unifying principles really make it fun to use.

Ruby is not as elegant as Scheme, to be sure, nor is it quite as pliable. And it could use Python’s list comprehensions, because they are phenomenal. But I’m very happy using Ruby when I can, because it adapts to me, not the other way around. What syntax Ruby does have, it uses well. For example coll.map &f is equivalent to map(coll, &f), but the first is more intuitive to lots of people.

The problem I have with Ruby is that there is no clean way to build traditional functions. Every function I try to define is tied to an object by default. And that makes concurrent programming hard. What Ruby needs is a new keyword–something like defn–that can create functions. defn would create a named function/lambda/Proc that we could access anywhere in scope and call via call. This would integrate functions seamlessly with the existing Ruby infrastructure and would make functional programming as painless as in Python.

Regardless of how much I like Ruby, work demands that I use Python. But Python isn’t nearly as ugly as I anticipated. Over time, my initial disgust has turned into curiosity. My bafflement has turned into respect. And I’ve decided: I really do like this language. There is nothing that I can’t do in Python that I could do in Ruby, and there was almost no learning curve adding it to my tool belt.

Instead of thinking Ruby or Python–or any language–is more beautiful than the other, this article has pushed me to think about how a language makes me code. In the case of Python, I’ve decided it wants to be coded like Scheme, so I don’t build object hierarchies–ever. Ruby, on the other hand, blends functions and objects seamlessly. I am okay building shallow hierarchies every once in a while because I don’t have to give up functional programming goodness. And that works, too.

In a way, this makes me a divided programmer: When I move between these similar languages, my style changes completely. But I’ve decided to accept that. It keeps the languages distinct in my mind and makes me value them both for the insight they bring to the table.

Shameless plug. This article took about a month to put together. If you like what you read, you can always give me some love on Hacker News!