Won in Translation, by David Jacobs

The ‘ugliness’ of Python

by David Jacobs January 23, 2011

This is a look at functions and objects in my favorite languages. In 2010, as a Ruby guy, I decided to learn Python. I realized that even though Ruby is a powerful language, it doesn’t have the community support I need for science coding. Now I appreciate Python’s style, and I’d like to take a fresh look at whether Python is actually ugly.

Not what you think

I want to take on a big question, one that nobody’s really answered: which language is more beautiful, Ruby or Python?

You read that sentence and have already decided this is a flame war waiting to happen. But hear me out. I’m actually not going to call Python “ugly”, much as I started out thinking just that. Instead I’m going to look at differences to build a case for how each language wants us to code. A couple of notes before I start:

This is a big topic. Hence, a substantial article.
I’ve coded Ruby for 6 years and Python for 2 months. (Read: I do not know everything about Python.)
I learned Python via The Quick Python Book (Manning).
I’ve picked some syntactic and semantic points that separate Ruby and Python. I’m only going to focus on those points.
This is a functional perspective, at least as much as possible in a multi-paradigm language.

With that said, let’s dig in.

Hooks & humanity

When I first came across Python’s hooks (__len__(), __str__(), __init__(), etc.), I was pretty horrified. I sat down to pen an epic diatribe against Python and its aesthetic flaws. About how I could never actually like coding in a language with syntax as heavy as that.

If you had seen the post I had planned, it would’ve read something like a rant:

I hate the underscore methods. Presumably they are underscored (four times, mind you) to avoid naming conflicts with methods you might want to create. Okay, I get it: if you want to define your very own init() method that isn’t a constructor, you can. That is, if you ever actually need to define an init() method that doesn’t … er … init. You can probably already tell: I come from a Ruby world. I like conventions that don’t surprise me. Ruby’s got hooks, too, after all, but they’re simple and (at least to me) make sense. If I ever do want to make an initialize method that doesn’t initialize, well, I’ll get over it and come up with a new name.

Then I talked it out with a therapist, who said I should be more worried about things like global warming and AIDS in Africa.

Now, in my mind, these method names are definitely ugly. They aren’t short, clean or straightforward. I don’t think anyone is actually happy typing the extra underscores.

But maybe ugliness is the point. Maybe Python wants to steer us away from these methods, at least until we really need them. If __len__() isn’t something you have to define every day, why not signal that with underscores?

I think I can get behind that.
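To make the hook idea concrete, here is a minimal sketch (the Playlist class is hypothetical, made up for illustration). Notice that the underscored methods are never called by name: the built-ins len() and str() look them up for you, which fits the reading that the underscores mark plumbing rather than everyday API.

```python
class Playlist:
    """A hypothetical container, used only to show Python's hooks."""

    def __init__(self, *songs):
        # Runs when the object is constructed: Playlist("a", "b")
        self.songs = list(songs)

    def __len__(self):
        # Looked up by the built-in len()
        return len(self.songs)

    def __str__(self):
        # Looked up by the built-in str() (and by print())
        return "Playlist(" + ", ".join(self.songs) + ")"


mix = Playlist("Intro", "Outro")
print(len(mix))  # 2
print(str(mix))  # Playlist(Intro, Outro)
```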

“You’re reading too much into this.” I’m sure you’re thinking it, and maybe you’re right. But I know one thing. When I define __init__(), it seems hackish, wrong. Rather than feeling like I’m using a normal feature of object-oriented program design, I feel like I’m going rogue, off the beaten path. I mean, what other reason would there be to surround a name with so much negative space?

(I think of it as a moat.)

And so I don’t define my own classes in Python. It doesn’t feel natural. I use dicts and lists and functions as much as possible and leave objects to libraries. Turns out that’s a great way to actually get things done.

Functions, meet objects: Python’s take

Python, even though it’s a multi-paradigm language, makes functional programming fun. We can create pure functions without much work and pass them around as values.
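Here is a small illustration of what passing functions around looks like in practice. The function names are made up for the example; the point is that a def’d function is an ordinary value, with no special syntax needed to store it or hand it to map.

```python
def double(x):
    return x * 2

def square(x):
    return x * x

# Functions are plain values: we can keep them in a list...
transforms = [double, square]

# ...and pass them to other functions, such as the built-in map
results = [list(map(f, [1, 2, 3])) for f in transforms]
print(results)  # [[2, 4, 6], [1, 4, 9]]
```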

But you can’t always use functions with Python’s extensive libraries. (And libraries are why I learned Python in the first place.) Why not? Because lots of Python’s libraries are built out of objects, and it’s not straightforward to combine our favorite functions (for example, map) with object methods.

Now, we can, of course, deal with objects in Python without resorting to loops and if statements, but the most comfortable way is to use list comprehensions rather than map and reduce. Let’s take a quick example. Where Ruby encourages me to capitalize every element of a list by mapping the capitalize method onto each letter …

letters = ['a', 'b', 'c']
letters.map &:capitalize
# => ['A', 'B', 'C']

… Python encourages me to use a list comprehension:

letters = ['a', 'b', 'c']
[a.capitalize() for a in letters]
# => ['A', 'B', 'C']

It is possible to map object methods onto data structures in Python:

list(map(str.capitalize, letters))
# => ['A', 'B', 'C']

However, you don’t see that much in the wild, and I’d guess that’s because Python doesn’t make it feel really natural to work this way. (In my opinion, it’s conceptually more advanced than mapping a simple function. How is self determined here, for example?)
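One way to answer the self question: looking a method up on the class (str.capitalize) gives an ordinary function whose first parameter is self, so map simply supplies each element as self. The sketch below also uses operator.methodcaller from the standard library, which calls a method by name and so works across classes that share a method name; whether it reads more naturally than a comprehension is a matter of taste.

```python
from operator import methodcaller

letters = ['a', 'b', 'c']

# str.capitalize is a plain function; its first argument becomes self
assert str.capitalize('a') == 'a'.capitalize()

# So map passes each element in as self
print(list(map(str.capitalize, letters)))  # ['A', 'B', 'C']

# methodcaller('capitalize') returns a function that calls x.capitalize()
# on whatever x it receives, without tying us to the str class
capitalize = methodcaller('capitalize')
mixed = ['a', b'b']  # a str and a bytes object
print(list(map(capitalize, mixed)))  # ['A', b'B']
```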

I’m not especially happy about using list comprehensions for basic data structure transformations. Should we really be pushed to abandon our favorite functions just because we want to call a method instead of a function? What’s more, I think that, for some things, list comprehensions distract us from the classes of data transformations that we’re actually doing.

Now, I’m not discounting list comprehensions. They’re elegant, powerful and have lots of applications. A great example is identifying palindromes:

s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]

Assuming that p() returns true for all palindromes, that one line gives us all palindromes in s.
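A runnable version of that one-liner, assuming a simple definition of p() that compares a string with its reverse (the original leaves p() unspecified). One detail worth knowing: with y starting at x, the comprehension also yields the empty string and every single character, since those are trivially palindromes. Starting y at x + 2, as below, keeps only substrings of length two or more.

```python
def p(t):
    # Assumed palindrome test: the string reads the same reversed
    return t == t[::-1]

s = 'string-with-palindromes-like-abbalabba'
l = len(s)

# Every palindromic substring s[x:y] of length >= 2
found = [s[x:y] for x in range(l) for y in range(x + 2, l + 1) if p(s[x:y])]

print('abbalabba' in found)  # True
print(max(found, key=len))   # abbalabba
```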

In Ruby, things are different.

Hurdles for Ruby

Ruby is beautiful because it balances consistency with convenience. Consistency makes functional programming accessible in many cases, but it also stops us from using some of the more advanced functional programming constructs. I see at least three hurdles to functional programming in Ruby:

Ruby needs extra syntax for function passing
There are no pure functions in Ruby
Ruby will not let us map over top-level def methods (TLDMs)

1. Syntax

This is not much of an objection, but I do want to mention it. Pythonistas will say, “Python lets me pass and call functions without any effort, but Ruby makes it hard!” That’s true: function passing is painless in Python and a little harder in Ruby. The issue is that in Ruby, naming a method automatically calls it. (This is great for writing DSLs without parentheses, but the cost is pretty high.) So to pass behavior around, you have to put an ampersand in front of a lambda or a method’s symbol (as in &f or &:capitalize). And while coll.map &f isn’t a deal breaker, it is a hurdle to function passing, and I find that most new-to-intermediate Ruby programmers tend to stick to non-reusable blocks rather than function passing for most application code.

2. Pure functions

I’m going to go ahead and say it: Ruby doesn’t have functions. And how can you program functionally without functions? (Ouch.)

Let’s step back and think for a second.

Convention says Array#map is not a pure function. Why? Because it relies on data that you’re not passing to it as a parameter–the array itself. That is, numbers.map &:to_s could give different values in the same code–even though its argument doesn’t change–because numbers carries its own logic and state. Put another way, the map method draws on more information than you pass in to return a value.
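The same point can be shown in Python, the language used for the other sketches here. A bound method carries its receiver as a hidden input, so calling it twice with the same argument can give different answers. The explicit version takes the collection as a parameter, so its result depends only on what you pass in. The names are illustrative.

```python
numbers = [1, 2, 3]

# Bound method: `numbers` is an input we never pass explicitly
count_in_numbers = numbers.count

first = count_in_numbers(1)   # 1
numbers.append(1)             # mutate the hidden receiver...
second = count_in_numbers(1)  # 2: same argument, different result

# Explicit version: every input is a parameter
def count_in(coll, x):
    return coll.count(x)

print(first, second, count_in([1, 2, 3], 1))  # 1 2 1
```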

So that’s that. Object-orientation, by definition, means no pure functions. Right?

Well, let’s think about this for a second. It would be pretty trivial to make map look like a pure function:

module Kernel
  # Forward the block with &f (a bare f would be a positional argument)
  def map(coll, &f)
    coll.map(&f)
  end
end

Ruby likes to group functionality into classes, so you don’t see this in the standard library. But this is a possible patch. And it would almost look like Python.

I think this patch makes a valuable point, even if we don’t implement it.

Maybe, just maybe, we should think of coll.map &f as a real, pure function: one that takes coll and f as arguments with a special syntax, and one that translates without contradiction into map(coll, &f). Is this the path to enlightenment?

Of course, there’s really no way of forcing a method to be a pure function in Ruby, especially because a method always has implicit access to self. However, it’s at least possible to code this way for teams that want to avoid the problems that come with mutable state.

3. Top-level methods

In Python, to create an algorithm f(x) and apply it to an array of arrays, we can use the standard def syntax:

def f(x):
    # My algorithm goes here (placeholder: returns x unchanged)
    return x

map(f, data)

In Ruby, though, we can’t pass around these top-level def methods (TLDMs) because they belong to an object. And that makes functional programming difficult. If I want to create an algorithm and then apply it to each array in data, I have four options:

# Option 1: Bulky lambda
f = lambda do |x|
  # algorithm
end

data.map &f

# Option 2: Monkey patch
class Array
  def f
    # algorithm
  end
end

data.map &:f

# Option 3: Explicit block
def f(x)
  # algorithm
end

data.map {|x| f(x) }

# Option 4: Method method
# (Thanks to several of you for this suggestion.)
def f(x)
  # algorithm
end

data.map &method(:f)

Each of those solutions has problems or is clunky. There’s really no getting around this, mainly because of the “Ruby immediately calls any method that you name” issue we talked about earlier.

So Ruby’s problem is the opposite of Python’s: In Python, function passing is easy, and object-oriented programming is okay. But the two don’t mix in a really seamless way. In Ruby, though, functions and objects mix pretty well (if you consider Array#map a function), but pure functional programming is a little awkward and could be easier.

The more I’ve compared Ruby and Python, the more I’ve come to appreciate Ruby’s block construct. It solves several problems at once and is really pretty elegant.

General mayhem: blocks & transformations

Blocks have a big impact on Ruby code, especially object-oriented functional code. The best way to illustrate the effects I’m talking about is to dive into an example.

Example: Stock market. Suppose I have a set of news articles in Markdown format. Each article mentions a company zero or more times and links to it. Now, say I decide to count how many times a company shows up in one day’s articles. Because Big Data.

Each mention of a company links to that company’s URL, so if I see https://apple.com three times in an article, I know Apple is mentioned three times. (That’s “thrice” if you’re joining us from 17th century England.) After I extract all the domain names and capitalize them, I’ll compare them against a master list of companies I care about and decide which company is most popular in that day’s articles.

Now in Java, this is a chore. I can hear committees forming already. But in Ruby and Python, it’s a quick hack. In fact, I’d probably code it in a REPL.
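To pin the task down, here is a small in-memory sketch of the whole job, including the final step of picking the most popular company. The sample articles and company list are made up, the regular expression is a simplified matcher in the same spirit as the ones in the code that follows (accepting http or https), and collections.Counter comes from the standard library.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for the master list and one day's articles
companies = ['Apple', 'Google']
articles = [
    "[Apple](https://apple.com) beat [Google](https://www.google.com). "
    "More at https://apple.com and https://example.org.",
    "Analysts like [Apple](http://apple.com).",
]

# Simplified URL matcher: the one capture group is the domain name
http_regex = r"https?://(?:\w+\.)*(\w+)\.(?:com|org|net)"

# Extract domains, capitalize them, keep only companies we track
mentions = [
    domain.capitalize()
    for text in articles
    for domain in re.findall(http_regex, text)
    if domain.capitalize() in companies
]

counts = Counter(mentions)
print(counts.most_common(1))  # [('Apple', 3)]
```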

Ruby’s blocks let us chain methods over several lines. More importantly, they let us mix internal object transformations with broader list transformations. This is powerful stuff:

# Companies are listed in 'companies', one per line
companies = File.read('companies').split("\n")
articles = ['article1.md', 'article2.md', 'article3.md']

# Simplified URL matcher (http or https)
http_regex = /https?:\/\/(?:\w+\.)*(\w+)\.(?:com|org|net)/

# NOT the simplest implementation, but it shows off blocks.
# scan returns one array of captures per match; with a single
# capture group, flatten leaves just the domain names.
# A lambda (unlike def) can see the local variables above.
parse_article = lambda do |a|
  File.open(a) {|f| f.read }.
    scan(http_regex).
    flatten.
    map {|x| x.capitalize }.
    select {|x| companies.include? x }
end

articles.map {|x| parse_article.call x }.
  reduce(Array.new) {|x,y| x.concat y }

In fifteen lines of code, we use blocks six times for different reasons. If we wanted, we could also use blocks to build anonymous functions, continuations or loop over collections with side effects (via each). For the record, though, I don’t really like each.

Before I move on to the Python implementation, I have to show you how I would actually code this. You see, blocks are awesome, but too many braces get overwhelming. It’s nice to whittle down and organize your code a little when you use them all the time. I tend to give my blocks names (essentially, they’re functions) and keep them separate from my method chains and transformations. Here is equivalent code:

important = lambda {|x| companies.include? x }
articles.map do |a|
  File.read(a).scan(http_regex).flatten.
    map(&:capitalize).select(&important)
end.flatten

Flows a little better, no? And yes, people, that transformation took just five lines of code.

Aside. The mini-point here? Blocks are so important that we have loads of ways to write them. Without tons of braces.

Python doesn’t give us blocks or the end keyword, and whitespace matters, so chaining isn’t always awesome. If we want to do a direct translation of the code above, we can turn either to Lispy nesting or to throwaway variables. (Edit: I’ve updated the Python regular expression to actually work. Thanks to many of you for pointing me to the findall method.)

import re

with open('companies') as f:
    companies = f.read().splitlines()

articles = ['article1.md', 'article2.md', 'article3.md']

# Simplified URL matcher; the one capture group is the domain name
http_regex = r"https?://(?:\w+\.)*(\w+)\.(?:com|org|net)"
url        = lambda x: re.findall(http_regex, x)
name       = lambda x: x.capitalize()
important  = lambda x: x in companies

def parse_article(a):
    with open(a) as f:
        return \
            filter(important,
                   map(name,
                       url(f.read())))


[y for x in map(parse_article, articles) for y in x]

# ... or, for the verbose, an alternative implementation
# of parse_article ...

def parse_article(a):
    with open(a) as f:
        text = f.read()

    domains = url(text)
    names = map(name, domains)
    found = filter(important, names)

    return found

My point here? Python does fine without blocks. It can do all of the nifty stuff that blocks do in Ruby. But lambda, with, for, and list comprehensions all come with their own special syntax. Maybe that’s not necessary.

Chains & side effects

I want to give you one more Ruby block example. Because it’s just cool.

tap is a method that exists purely for side effects, just like each. It was designed to give us those side effects inside of a chain. tap yields its receiver to a block, ignores whatever the block returns, and then returns the receiver itself. You can print or examine anything about the object within that block, and it doesn’t affect the chain in the least. Say we want to debug our transformation above. Given an info method that examines our current array and prints its size, we can say:

parse_article = lambda do |a|
  # A lambda, unlike def, can see the locals http_regex and important
  File.read(a).
    scan(http_regex).   tap {|x| info x }.
    flatten.            tap {|x| info x }.
    map(&:capitalize).  tap {|x| info x }.
    select(&important). tap {|x| info x }
end

parse_article.call 'article1.md'

# Hypothetical output:
# Array (30)
# Array (30)
# Array (30)
# Array (10)

Try debugging like that in Python.

The verdict

After mulling this over for a while, I’ve reached a tentative conclusion: Python is a pretty good language for some types of functional programming and a decent one for object-oriented programming, but not many people use the two together. That means that if you want to adopt a functional style but are using libraries built out of objects, you’ll run into conceptual roadblocks. Changing coding style in the middle of a project is a drag, and I think what Python really needs is a culture around using functional techniques with objects.

Another great feature would be a chain operator that lets us chain over multiple lines. (Clojure, for example, gives us ->.)
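Python has no such operator built in, but the spirit of Clojure’s threading macros can be approximated with a small helper that feeds a value through a sequence of one-argument functions. This is a sketch, not a standard-library feature: the name thread is made up, and since each step here receives the value as its final argument via functools.partial, it behaves more like Clojure’s ->> than ->.

```python
from functools import partial, reduce

def thread(value, *fns):
    """Hypothetical helper: thread(x, f, g) == g(f(x))."""
    return reduce(lambda acc, fn: fn(acc), fns, value)

companies = {'Apple', 'Google'}  # hypothetical master list

result = thread(
    ['apple', 'ibm', 'google'],
    partial(map, str.capitalize),               # capitalize each name
    partial(filter, lambda c: c in companies),  # keep the ones we track
    list,                                       # realize the lazy iterator
)
print(result)  # ['Apple', 'Google']
```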

So functional programming is fun (and sometimes beautiful) in Python. Ruby’s beauty, on the other hand, comes from unity and adaptability.

Ruby is fantastically simple when you need it to be. (I know people who get by never having to pass a function.) But it scales really well with complexity. If your project is so big that you want to organize it with objects, you can. And you don’t have to sacrifice your functional style of programming to do so. If, instead, you want to organize your project around data structures and functions, you can do that too. The language is really malleable (sometimes too much), and its unifying principles really make it fun to use.

Ruby is not as elegant as Scheme, to be sure, nor is it quite as pliable. And it could use Python’s list comprehensions, because they are phenomenal. But I’m very happy using Ruby when I can, because it adapts to me, not the other way around. What syntax Ruby does have, it uses well. For example, coll.map &f is equivalent to map(coll, &f), but the first is more intuitive to lots of people.

The problem I have with Ruby is that it discourages functions as first-class citizens. Every function I try to define is tied to an object by default. Most of the time, that object is mutable and stateful, and that makes concurrent programming hard. In my opinion, what Ruby needs is to take a hard look at questions like “should we really invoke method names without parentheses by default?”, especially from the point of view that it makes higher-order functions harder to write.

Regardless of how much I like Ruby, work demands that I use Python. But Python isn’t nearly as ugly as I anticipated. Over time, my initial disgust has turned into curiosity. My bafflement has turned into respect. And I’ve decided: I really do like this language. There is nothing I can do in Ruby that I can’t do in Python, and there was almost no learning curve adding it to my tool belt.

Instead of thinking Ruby or Python–or any language–is more beautiful than the other, this article has pushed me to think about how a language makes me code and how syntax can really matter sometimes. In the case of Python, I’ve decided it really wants for loops to take center stage when it comes to iterating over objects. And because I prefer to think using transformations like map and reduce, I tend to stay away from building complex class definitions. Ruby, on the other hand, blends functions and objects in an intuitive way (with blocks as a great default way of mixing the two). I’m okay using classes when they make sense because I don’t have to give up functional techniques when I do. And that works, too.

And all of this has an interesting effect: When I move between these similar languages, my style changes a lot, and that’s not a bad thing. It keeps the languages separate in my mind and makes me value them both for the insight they bring to the table.

It would be interesting to take a look at how these subtle syntax changes affect Python and Ruby at the community level.