The RERO approach to software engineering -- Release Early, Release Often -- essentially means publishing a program as soon as it works, even though it's probably not optimal and you definitely haven't weeded out all of the bugs yet. This allows you to see the program fail in practice, letting you identify problems that you might not have thought of on your own; then you fix those and release it again, constantly refining it in the process.
Here's a good look at what this means, ideologically and practically, especially in contrast to traditional notions of quality control. (You should consider reading the entire series; it's quite enlightening.)
So if it works, why not extend this principle to other fields beyond software engineering?
I have long argued for an experimental and evidence-based approach to education, with short feedback-and-updating cycles as a key element. But what about, say, laws and regulations? What about currencies, or types of government?
If you had asked me about any of these in the past, I would have been extremely sceptical. Those are sensitive, high-impact areas, where even small failures can have immense effects. We're not talking about a few thousand people not being able to access their e-mail inboxes for half a day, here.
Then the DAO thing happened -- and witnessing the aftermath is slowly making me come around.
In case you haven't been following it, here's my short take on what happened:
Ethereum is a blockchain-based digital currency like Bitcoin, but with the added bonus that you can run programs on the blockchain. This enables the implementation of "smart contracts" that automatically execute when triggered by other blockchain events, such as the receipt of funds. Using this technology, the makers of Slock.it released code for a "Decentralized Autonomous Organization", intended to be a fully automated investment fund controlled by votes from token holders, with no central authority to mess things up. This seemed like such a revolutionary idea that it was able to raise a record 140 million dollars in crowdfunding before going live this month -- and then someone found an exploit and started draining money (ether) from the DAO.
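The technical details have been analyzed elsewhere, but the flaw has been widely described as a recursive-call ("reentrancy") bug: the contract sent funds out before updating the caller's recorded balance, so a malicious caller could withdraw again and again against the same balance. Here is a minimal toy model in Python -- not the actual Solidity code, and the names (ToyDAO, Attacker) are purely illustrative:

```python
# Toy model of a reentrancy-style flaw, the kind of bug reported in the DAO.
# Not the real contract: funds are sent out while the caller's recorded
# balance is still non-zero, so a hostile callback can withdraw repeatedly.

class ToyDAO:
    def __init__(self):
        self.balances = {}   # holder object -> deposited amount
        self.pool = 0        # total funds held by the "contract"

    def deposit(self, holder, amount):
        self.balances[holder] = self.balances.get(holder, 0) + amount
        self.pool += amount

    def withdraw(self, holder):
        amount = self.balances.get(holder, 0)
        if amount and self.pool >= amount:
            self.pool -= amount           # funds leave the pool...
            holder.receive(self, amount)  # ...via an external call that can re-enter withdraw()
            self.balances[holder] = 0     # the balance is only zeroed afterwards (the bug)

class HonestHolder:
    def receive(self, dao, amount):
        pass  # just accepts the payout

class Attacker:
    def __init__(self):
        self.drained = 0

    def receive(self, dao, amount):
        self.drained += amount
        if dao.pool >= amount:   # our balance hasn't been zeroed yet,
            dao.withdraw(self)   # so withdraw again before returning

dao = ToyDAO()
alice, mallory = HonestHolder(), Attacker()
dao.deposit(alice, 90)
dao.deposit(mallory, 10)
dao.withdraw(mallory)
print(mallory.drained)  # 100 -- Mallory deposited 10 but drained the whole pool
```

The fix in this toy version is a one-line reordering (zero the balance before the external call) -- but you typically only learn that you needed it once someone has found the gap, which is exactly the dynamic this post is about.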
The jury's still out on whether the attacker will ever manage to use this money. The debate is still raging over whether it's okay for the DAO creators (who had explicitly announced a completely hands-off approach beforehand) to fork the blockchain and reverse the transactions. This is critical, because much of the appeal of the DAO (and cryptocurrencies in general) rests on the assumption that no central authority can alter, block or reverse transactions on the blockchain. If the way the algorithm is coded allows something, then it's allowed by definition; if you coded the smart contract in a way that allows for exploits, then there is (or should be) no recourse to any "higher authority".
This is why I think what's happening with the DAO is so important: because thanks to its creators' RERO mentality in releasing the requisite code, we're now able to watch our first large-scale experiment in governance by algorithm fail in real time, and learn from its failures.
And there is much to learn. This Bloomberg article pretty much cuts to the heart of the matter. The main strength of smart contracts is that, like any other computer program, they do exactly what the code says, meaning they can't be tricked, can't be bribed or extorted, and can be arbitrarily complex because they don't suffer the cognitive limits constraining administration-by-humans. But this "inhuman efficiency" is also their main weakness: if there is any way to use the code that you didn't foresee, there's no human authority to appeal to saying "wait, no, I obviously didn't mean it like that". In the blog post I linked above, Ethereum inventor Vitalik Buterin implies that the problem of reliably translating intentions into isomorphic code might be AI-complete; the creators of Slock.it suggest helping matters by embedding a hash of a plain-English contract in the smart contract (but of course interpreting that contract would again require human legal assistance, negating the whole point of a smart contract).
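To make the second suggestion concrete: committing to a plain-English text by storing its hash is trivial, as the sketch below shows (the agreement wording is made up). What the hash cannot do is make the code behave as the text intends -- which is exactly the gap described above.

```python
import hashlib

# Hypothetical plain-English agreement meant to accompany a smart contract.
agreement = """Withdrawals from the fund require approval by a majority
vote of all outstanding tokens."""

# The digest could be stored on-chain as a commitment to this exact wording.
digest = hashlib.sha256(agreement.encode("utf-8")).hexdigest()
print(digest)

# Verifying the hash later only proves which text was referenced; deciding
# whether the contract code actually implements that text still requires
# human (legal) interpretation.
```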
Those are real problems, and they are very much recognized by the relevant people in the field. But I don't see anyone retreating in the face of the problem's size, or saying we should stop experimenting with smart contracts and DAOs until the control problem is solved. This is why the reaction to the DAO exploit actually increased my trust in the RERO model, even for critical areas like currencies and contract law: everybody involved knew they were participating in a huge, risky experiment, and took part anyway because they thought the direction was promising, even though it was only a first step and likely to fail. If you look at the way the creators and most stakeholders in the DAO are reacting, you'll see that it's less like damage control and more like the expected next step in an agile development process. Sure, the DAO code could have been audited (even) more closely before the launch; and sure, some of the exploits might have been fixed by that; but the only way to really know how a project like that will behave in the wild is to put it out there, and then update on what happens.
Why is this important? On the one hand, because smart contracts are important in their own right. As soon as at least some of them can be shown to work reliably, they have the potential to seriously disrupt many aspects of the economy (particularly the B2B part) by cutting out the middleman; and if distributed autonomous organizations of various shades really take off, they will be one of the biggest challenges regulators have ever faced. If you follow that line of thought, the economy of the future might be one where objects connected to the Internet of Things autonomously negotiate for the resources they need, such as grid power or bandwidth, with other programs on the supplier side, creating an Economy of Things (https://slock.it/solutions.html) that could be arbitrarily many layers deep; and even in transactions between humans, governmental regulators will have an increasingly hard time controlling, tracking and taxing economic activity.
On the other hand, as I said at the beginning, the aftermath of the DAO exploit can teach us a lot about how to apply the RERO mentality to other critical fields. One important takeaway is that participation should be voluntary -- like patients agreeing to experimental treatments knowing that they might fail. Another is that the program (or law, or whatever form it takes) needs to be transparent and open-source, so that some failure modes can be detected by the crowd before the launch, and others traced back to the source of the problem after an exploit. A third, and possibly the most controversial, takeaway is that people who find exploits need to be able to do so without fear of disproportionate retribution -- we will need white-hat hackers for every level of social organisation.
For now, the first large-scale instance of a DAO has failed -- but in dealing with the failure, its creators are doing us a service by highlighting both the problems facing us and the way out, through iterative improvement of running systems "in the wild".
Sunday, June 19, 2016
Monday, June 13, 2016
Liberating architecture
I was looking at a bush recently, and thinking "why can't cities be like this?"
The obvious answer is, of course, building regulations. (And lazy thinking. But we'll get to that.)
Now, I understand that building regulations have a point. Their most obviously legitimate purpose is to ensure the safety of both the building's inhabitants and their neighbors. You shouldn't be allowed to build a house that will collapse if you drive a nail into the wall at the wrong angle; neither should you be allowed to build a house out of flammable materials that will instantly set all neighboring buildings on fire, or fall onto them.
(From a libertarian point of view, you could say that it's only the latter consideration that counts: your buildings should not endanger others; whether they are a danger to yourself is your own problem. But then, almost nobody ever makes a building just for themselves, to be destroyed after their death; in nearly every case, someone else will also be living inside that building at some point.)
Because of this, and because humans are fallible, building regulations usually have large safety margins and limit the range of acceptable constructions to a few well-established principles that have been known to work. Of course, architects and builders can push the envelope by requesting permission for novel types of structures; but that is a lengthy and often expensive process with a high probability of failure, and as such not open to most people.
So, computers.
Existing building regulations are necessarily coarse-grained because they need to be tractable from an administrative point of view. You can't check every crazy hand-drawn plan for static viability and fire-safety and whatnot; you need applicants to operate within the limits of your well-established principles if you want to get any work done.
As long as you're a person and not a computer program, that is.
Imagine an architectural planning program that lets you do anything as long as you don't overstep the bounds of static viability and fire safety. The program would know those boundaries at a high level of detail, and it would be capable of calculating the interactions of far more variables, in many more ways, than could reasonably be expected of a human being. As a result, the program would be able to permit many constructions that human reviewers would have to reject simply because they cannot fully assess them.
Now give the program a graphical user interface with tools that make it as easy to use as the Sims construction window. Let people experiment with designs up to the very limits of feasibility. Add a cost calculator -- every construction company will be happy to supply their prices in machine-readable formats if the program allows users to directly export the specifications for every part to the construction service provider of their choice. (Ideally, those specifications wouldn't even have to be implemented by humans, but just fed directly into the machines producing the relevant parts.) Gamify the whole thing and unlock people's creativity, making cities into growing and unique places full of custom-built habitations.
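To give a flavour of what the rule-checking core of such a tool could look like, here is a deliberately tiny sketch in Python. The rules, numbers and material names are invented placeholders, not real building code; a usable version would encode actual structural and fire-safety standards at far finer granularity.

```python
from dataclasses import dataclass

@dataclass
class Wall:
    material: str             # e.g. "timber", "brick" -- placeholder names
    load_kn_per_m: float      # load the wall has to carry
    capacity_kn_per_m: float  # what this construction can carry
    cost_per_m: float         # supplier price, ideally machine-readable
    length_m: float

# Invented placeholder rule: combustible materials need distance to the neighbour.
FIRE_RESISTANT = {"brick", "concrete", "rammed_earth"}

def check_design(walls, distance_to_neighbour_m):
    """Return a list of rule violations and a rough cost estimate."""
    problems = []
    for i, w in enumerate(walls):
        if w.load_kn_per_m > w.capacity_kn_per_m:
            problems.append(f"wall {i}: load exceeds capacity")
        if distance_to_neighbour_m < 3.0 and w.material not in FIRE_RESISTANT:
            problems.append(f"wall {i}: combustible material too close to neighbour")
    cost = sum(w.cost_per_m * w.length_m for w in walls)
    return problems, cost

walls = [Wall("timber", 12.0, 20.0, 80.0, 5.0),
         Wall("brick", 30.0, 25.0, 120.0, 4.0)]
problems, cost = check_design(walls, distance_to_neighbour_m=2.5)
print(problems)  # both walls are flagged under these made-up rules
print(cost)      # 880.0 -- material cost at the quoted per-metre prices
```

The point is not these particular checks, but that a program can evaluate arbitrarily unusual designs against the actual constraints instead of against a short list of pre-approved construction types.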
(To answer the obvious objection: Yes, of course it would be possible to code aesthetic requirements into the planning program. No, of course the harshest of critics wouldn't be satisfied with any algorithmically encoded aesthetics; but then, those critics won't be happy with anything decided by a human committee, either.)
You might have noticed that I don't actually know much about architecture, and my experience with building regulations is mostly limited to frustration at the impossibility of building an Earthship in Austria. The above, then, is more of an intuition -- an intuition that I think might hold true for other fields as well: that algorithmic decision-making, contrary to the popular imagination, could actually lead to more freedom through expanding the space of tractable possibilities.
Thoughts?
Sunday, June 12, 2016
Low-hanging fruit
Every year, thousands of students in Austrian primary and middle schools take a reading test to see how well they do at distinguishing nonsensical sentences from meaningful ones. It's called the "Salzburger Lese-Screening" (SLS), and it's probably one of the most well-designed tests that these students will ever take; it's definitely much better than most of those that will ultimately determine their grades, on which the SLS has no effect.
(That said, I still don't see how it can be of much use for policymakers. Sure, you get reliable and valid results for every student at a given age -- but if you wanted to know how to make students read better, you'd have to correlate this with data on how and what they're being taught. But the school authorities, at least in Austria, do not have this information, except at the extremely coarse-grained level of school types. So, fine, students at one type of school do better than those at another type of school -- but what exactly is it that causes this difference, provided it's not all selection effects? To be clear, this is something that could be investigated, but that would require a much greater dedication to evidence-based teaching -- and policy-making -- than you tend to get in this country.)
So students take this little test, which takes all of ten minutes if you factor in preparation time and collecting the reams of paper. -- Speaking of paper: it's five sheets for every student. One with instructions, one practice sheet to familiarize students with the format of the test, and three sheets filled with increasingly complex sentences that students have to mark as either correct or incorrect. -- Now it's the responsibility of the class teacher (usually a language teacher) to evaluate the test. At most schools, teachers share a set of transparencies with the correct answers so they don't have to think about every answer for themselves. They then write down the number of correctly marked sentences for each student. Next, a handy little table included with the instructions maps this raw score to a "reading quotient", which works like an IQ score in that it is centered on the established national average of 100 points.
The final step, as far as the administering teacher is concerned, is to fill out a report sheet for the class. The report sheet does not ask for the raw data, but for the percentage (to be calculated by the teacher) of students with a reading quotient lower than 95 or higher than 115. This sheet is then tied together with the ~140 pages of tests and handed in for further processing.
The problem should be obvious. No matter how well-designed the test -- this way of collecting the results introduces multiple unnecessary points of failure, by having teachers manually perform not only the evaluation, but two linear transformations of the raw data, both of which could be done by a computer program in seconds. Everybody who has ever worked with data knows that most problems derive from input errors, i.e. human failure -- and to multiply that by introducing three separate lists of hand-written numbers, shaded tables printed on paper in ten-point font, and having language teachers calculate percentages (no offense) is just asking for trouble.
Especially when the solution is equally obvious. Coding a digital version of the test is trivial (the reading quotient might have to be recalibrated for students clicking or tapping instead of circling the correct sign, though) -- and even failing that, digitally processing printed multiple-choice tests is a routine affair these days. At the very least, just take the teachers' lists with the raw data, if you can't make them input the numbers directly into a computer, instead of wasting the time of highly educated professionals with work that has nothing to do with their core competencies.
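For what it's worth, the clerical part of the process -- mapping raw scores to reading quotients and computing the report percentages -- fits in a few lines of Python. The norm table and the scores below are invented for illustration; the real score-to-quotient mapping is the one shipped with the test materials.

```python
def reading_quotient(raw_score, norm_table):
    """Map a raw score (number of correctly marked sentences) to a reading quotient."""
    return norm_table[raw_score]

def class_report(raw_scores, norm_table):
    """Compute the two percentages the report sheet asks for."""
    quotients = [reading_quotient(s, norm_table) for s in raw_scores]
    n = len(quotients)
    return {
        "percent_below_95": round(100 * sum(q < 95 for q in quotients) / n, 1),
        "percent_above_115": round(100 * sum(q > 115 for q in quotients) / n, 1),
    }

# Invented norm table: raw score -> reading quotient (the real one is a printed lookup table).
norm_table = {score: 70 + 2 * score for score in range(0, 41)}
raw_scores = [12, 18, 25, 31, 8, 22]  # one (fictional) class's raw scores
print(class_report(raw_scores, norm_table))
# {'percent_below_95': 33.3, 'percent_above_115': 33.3}
```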
Of course, I'm saying all of this because I just wasted two hours doing just that (and explaining to colleagues how to calculate percentages). The upside? It helped me finally figure out what the overarching topic of my blog should be. So here it is: low-hanging fruit for a start, but I'll be sure to venture further out in future posts.
(More thoughts on optimizing education: here)