After a Hurricane or other natural disaster, debris poses a huge problem; it knocks out vital infrastructure like electricity, and blocks roads — inhibiting the work of rescue crews and cutting off victims from access to hospitals, supplies, and evacuation routes. It is vitally important that disaster relief efforts have plans for using their limited resources to efficiently clear debris off roads — giving aid to as many people as possible, as quickly as possible.
This was the scenario the Harvard Institute for Applied Computational Sciences presented to two teams of graduate students last week, as part of their first computational challenge. They were given digitized road maps of Cambridge, MA, information about the population density, and a realistic projection of the road debris that would be left behind after a major Hurricane. Each team was given two weeks to design an algorithm to efficiently clear debris, minimizing the amount of time people are cut off from access to local hospitals. I was a member of one of these teams.
Within this scenario, we have enough resources to clear a limited amount of debris each day. All of the bulldozers start off at two local hospitals (admittedly somewhat contrived — in a real scenario, relief aid would likely work their way in from outside the disaster area). At any given time, we can only clear debris on the roads immediately adjacent to roads that we have already cleared — that is, we can’t magically airdrop bulldozers into the most heavily damaged areas. Instead, we have to clear our way to these areas.
“Solutions” to the problem consist of a schedule of which roads to clear, in which order. Whichever solution is most efficient in giving as many people fast access to hospitals, wins.
Coming up with a Solution
In principle, this problem can be solved very easily — just consider every possible schedule for clearing the roads, and choose the one which restores hospital access most quickly. Unfortunately, this approach is utterly infeasible — our map of Cambridge has 604 road segments, yielding about 604! or options (thats a 1 with 1420 zeros after it). Even on the fastest computers in the world (or the future, really), this calculation would take far longer than the current age of the universe to complete. We need a more intelligent way of searching through possible solutions.
The approach we came up with turns out to be highly effective. It involves altering a given schedule to generate a new, similar, and possibly better solution. We called this “nudging” the schedule, and it works as follows:
- Take an initial schedule that solves the problem, but in a non-optimal way (these are easy to come up with).
- Truncate the schedule at some point, keeping only the first N decisions.
- Determine which points in the city do not have access to a hospital after these steps. Choose one at random.
- Clear out the most efficient path (the minimum-weight path, in Graph-algorithmic jargon) from one of the hospitals to this location. Add these decisions immediately after the partial-schedule from step 2.
- Add the rest of the decisions that were truncated in step 2 (paying careful attention to not duplicate any work that you did in step 4).
Nudging solutions has a lot of nice properties: its fast (we can easily do it ~100 times per second with our relatively slow python code), and provides a way to re-prioritize a schedule, since the location chosen step 3 is rescued earlier in the new plan than it was in the old. It isn’t too hard to convince yourself that, with enough nudging, it is possible to arrive at the globally optimum solution from any starting solution. And the choice of using the most efficient path in step 4 tends to create effective schedules which don’t waste resources.
Equipped with a strategy of nudging solutions, there are many algorithms which can search for the best strategy. We chose Simulated Annealing, which works more or less as follows:
- Start with a schedule S1
- Nudge S1 to generate a new schedule, S2
- If S2 is a better schedule, throw away S1
- If it is worse, then throw away S2 with some probability related to how much worse of a solution it is
- Repat the process with the solution that wasn’t discarded
The algorithmic showdown
After two weeks of development, each team applied their algorithm to a slightly modified map of cambridge. We were given 3 hours of computing time on one of Harvards super-computers to run our algorithm and come up with the best possible strategy.
Our solution turned out to be very effective. After only a few minutes of nudging, we had found a solution better than the competing team’s final answer. Furthermore, our strategy out-performed the solution generated by the competitions’ organizers from Georgia Tech, which was previously thought to be near-optimal.
We can put our performance in more concrete terms: with our strategy, the average resident is stranded without hospital access for 2 days and 18 hours. The previous ‘optimal’ strategy kept the average resident waiting for 2 days and 21 hours, and naive strategies (i.e. clearing off roads at random) will keep residents waiting for over 4 days on average. These extra hours would cost many lives, since common post-disaster health problems like dehydration and cholera progress on the timescale of hours to days.
I’m pretty satisfied with our work these past few weeks. This approach isn’t too different from some of the data analysis tasks I tackle within astrophysics, and it was great to see these same techniques successfully handle a problem with real humanitarian benefit. I hope our solution gets taken under consideration in future disaster relief research — to encourage this, I’ve posted our code online.
The trouble with big issues is that… well, they’re big. Big issues have many facets, which generate many different (and often contradictory) points of view among those who care. This creates problems when people use data to think about big issues. I want to focus on two:
A few months ago (at the height of the fighting and deadlock over the national debt ceiling), I decided to investigate how partisan the United States congress really is — that is, how often do congresspeople vote along strict party lines. Perhaps I’m naive, but I find it frustrating that politicians seem so much less willing to compromise than other groups of people. I would prefer a government where concern for the common good takes priority over the culture wars.
With the 2012 elections approaching, I wanted to get a better handle on how willing different congresspeople are to compromise. Fortunately, the congressional voting history is public, and govetrack.us provides convenient access to both browse and download these data.
After downloading these data into a local SQL database, I decided to take the following approach:
- For each vote since 2008, count up the total yes/no counts from the Democrat and Republican parties
- Calculate the ‘net’ party vote for each of these. A value of 1 indicates a party is unanimously in favor of a motion, bill, etc. A value of -1 indicates unanimous disfavor. A value of 0 indicates a deadlock, with half of the party voting yes and no.
- Compare the net party votes for each party.
Here’s one way to visualize this data, for the House of Representatives:
Each colored circle represents a single motion, bill, etc. The x/y locations give the net party vote for democrats/republicans. The color of the circle indicates, in cases where the majority opinion for each party differs, which party won (red=republican, blue=democrat, black= agreement among both parties). So, for example, all points in the upper right and lower left corners are black, since both parties agreed on a yes or no vote. The upper left and lower right corners are regions of disagreement between the parties.The histograms show the overal distribution of democrat (top horizontal graph) and republican (right vertical graph) votes, again color-coded by which party won the vote.
I noticed a few interesting features here:
- Very many votes are (nearly) unanimously accepted by both parties. I found this to be refreshing, as it suggests that Washington is not as deadlocked as the news may imply. Of course, many of these motions are hardly controversial. Take the title of one such resolution, passed in the House by a vote of 421-2: “On Passage – House – H.R. 2715 To provide the Consumer Product Safety Commission with greater authority and discretion in enforcing the consumer product safety laws, and for other purposes – Under Suspension of the Rules.” A good next step would be to filter out the most benign / procedural votes, to better see ideological divides.
- The votes for both parties are clustered around unanimous support or rejection, with few points near the center of the graph. I wish there was more disagreement within each party — dissenting opinions generally seem like a good idea.
- The Republican party indeed has been more of a “Party of No” lately, with a greater concentration of “No” votes. Interestingly, however, Republicans have had more success against Democrats when voting yes for something — Republicans have lost most of the votes where they have voted nearly-unanimously “No” against a Democrat “Yes.”
I haven’t calculated how partisan individual members of congress are — that’s coming soon. In the meantime, here’s the voting record for the Senate (many more blue points, due to the steady Democrat majority since 2008)