The first week in November I went to the 2013 Large Installation System Administration (LISA) conference, one of the Usenix Association's two annual flagship conferences (the other being the summer Annual Technical Conference, which is part of what they call "Federated Conferences Week").
I arrived in Washington on Saturday, November 2. When I arrived, I discovered that DCA’s Terminal A, where JetBlue is located, is a very long walk from Terminal B, where the MetroRail station is located. Like the MBTA, WMATA surcharges customers for every trip made with non-RFID fare media, so I bought a SmarTrip card, figuring I’d be making multiple MetroRail trips during the week. It took about an hour and a half to get from B6 baggage claim to the Marriott Wardman Park Hotel, the overpriced conference hotel, where I checked in and picked up my conference materials. The conference rate at the hotel was $259/night plus tax; I checked our business travel Web site and found no better deal in the neighborhood. (There is only one other hotel nearby, an Omni, and there’s a lot of benefit to being in or at least very near to the conference hotel.)
On Sunday I attended the configuration management workshop. There were about 30 people attending, including (it seemed) most of the bcfg2 community. The consensus seemed to be that these workshops, which have been held annually for the past several years, always cover the same topics and never really produce anything, and that this year's was more of the same. CM is still not a solved problem, and a lot of the issues result from there being too many different models and not enough standardization. (Microsoft's Desired State Configuration was mentioned as one major effort in this direction, for Windows systems only, of course.) Orchestration was a major topic, and a few new systems were mentioned, including Ansible and Salt. The organizers canvassed attendees prior to the conference for questions to discuss, and some of the questions I sent in were brought up. I felt that the scribe for the "small group" segment of the workshop had missed the point of nearly everything I said. Sunday evening I had dinner with two people from the workshop, a woman from CMU central IT and (I think) a guy from Alberta. (Apologies in advance for not remembering anyone's names!)
On Monday and Tuesday I had no conference-related activities planned, and did radio stuff instead. Scott drove down from Rochester and joined me for those two days, and returned home on Wednesday after covering the oral argument in a home-town Supreme Court case (Town of Greece v. Galloway). On Monday, we visited the WFED transmitter site in Wheaton, and the WBQH transmitter site in Silver Spring, and then had lunch at a Proper Jewish Deli (Max’s in Wheaton) before heading back into town to see the new NPR network studios on North Capitol St. and the new WAMU studios on Connecticut Ave. in the Van Ness neighborhood. (The NPR studios are the home of Morning Edition, All Things Considered, the NPR hourly newscasts, Tell Me More, and formerly Talk of the Nation. The WAMU studios are the home of the local NPR news/talk station and the Diane Rehm show; they also run an all-bluegrass service heard on translators in Maryland and NoVa.) On Monday evening we had dinner with a former colleague of Scott’s who is a producer for Morning Edition.
Tuesday morning started in Friendship Heights at the WMAL/WRQX (Cumulus) studios on Jenifer St. From there we went over to the WJLA/WRQX/WASH tower to see WRQX’s transmitter, and then up into a very ritzy part of Maryland just outside the Beltway where the WMAL (630) transmitter site is located. (We didn’t see the WMAL-FM (105.9) transmitter site as that station, which is licensed to Woodbridge, Virginia, transmits from the other side of the Potomac.) WMAL has the best AM signal in the market, but that doesn’t mean much these days, particularly in sprawling Washington, which has no full-market signals at all, AM or FM. We then made our way over to the studios of Hubbard’s WTOP-FM and WFED, where we shared some laughs at the expense of CBS Radio’s hapless WNEW-FM (an attempted competitor to WTOP which regularly shows up in the Washington ratings behind Fredericksburg and Baltimore stations). Of course, the Hubbard folks can afford to laugh when they own the #1 station in the market, which also happens to be the top-billing station in the entire country. (WFED doesn’t do much in the ratings, but its very weird format is designed to appeal to managers at federal contractors, a particularly lucrative, if only-in-Washington, market.) Tuesday evening we had dinner with Trip Ericson and his girlfriend. Pictures for all of the broadcast stuff to come some day, I hope, on The Archives @ BostonRadio.org.
The conference proper started on Wednesday, with a keynote address by Jason Hoffman of Joyent. He talked about the convergence of storage and compute power, and also at some length about the process of building a platform (SmartOS, an OpenSolaris/Illumos derivative) and a company based on that platform at the same time. (Joyent is primarily a platform-as-a-service or “cloud” provider, competing with EC2 on performance and storage services.) I then went to an Invited Talk by Mandi Walls of Opscode (the company behind Chef) entitled “Our Jobs Are Evolving: Can We Keep Up?”, which was mostly about how sysadmins can provide “strategic value” to companies, and how to make that case to corporate management. (She was also a big believer in changing jobs frequently, as I recall, and suggested that everyone should go to a job interview at least once a year. At least I think that was her.)
I spent the lunch break in the vendor exhibition; I talked with the guy from Teradactyl who isn’t Kris Webb. He was plugging his new storage system and gave me a white paper on it that he had done for another client. I also stopped by the Qualstar booth, where they were demonstrating a modular LTO-5/6 tape library that can hold more than 100 tapes in just a quarter of a rack. (It’s “modular” in that you can stack them vertically; there’s an elevator mechanism that allows tapes to move between modules. Each module can hold four drives and two import/export modules.) This prompted a visit and a follow-up email from the annoying sales guy from Cambridge Computer.
After lunch, I went to the paper and report session to see some guys from Brown University talk about PyModules, a hack similar to our .software kluge, designed for HPC-type uses where different researchers may require multiple versions of multiple software packages optimized for multiple CPU types (paper here). I left the session after that paper and went to see an invited talk by Ariel Tseitlin of Netflix about how they inject random (synthetic) failures into their production network to validate their fault-tolerance and fault-recovery designs. This can range from shutting down individual VMs to simulating the partitioning of an entire EC2 availability zone. They call this system the "Simian Army", after the first and best-known of its mechanisms, "Chaos Monkey" (a toy sketch of the idea follows this paragraph). After this, I attended the Lightning Talks session, in which people are invited to talk for five minutes or less on any relevant topic. (This used to be the "WIP" — "Work in Progress" — session.) I had hoped to talk about building storage servers, but the session ended before I could volunteer. In doing this, I skipped a talk about NSA surveillance by Bruce Schneier, which turned out to be no great loss: he wasn't physically present and gave the talk by phone, and I had already seen Cindy Cohn's talk for Big Data a few weeks previously.
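The essence of Chaos Monkey is simply to kill randomly chosen instances from services that have opted in, and only during business hours, so that engineers are around when the failure surfaces. Here is a minimal, purely illustrative sketch; the instance inventory and the terminate_instance() call are hypothetical stand-ins, since the real Simian Army drives the EC2 APIs and honors per-service configuration:

```python
import random
from datetime import datetime

# Hypothetical inventory of opted-in instance groups (stand-in for real
# auto-scaling groups that would be discovered via the cloud provider's API).
OPT_IN_GROUPS = {
    "frontend-asg": ["i-0a1", "i-0a2", "i-0a3"],
    "recommender-asg": ["i-0b1", "i-0b2"],
}

def terminate_instance(instance_id: str) -> None:
    """Placeholder for the API call that actually kills an instance."""
    print(f"terminating {instance_id}")

def during_business_hours(now: datetime) -> bool:
    # Only inject failures on weekdays during working hours, so that
    # people are on hand to observe and fix whatever breaks.
    return now.weekday() < 5 and 9 <= now.hour < 15

def chaos_round(kill_probability: float = 0.2) -> None:
    """One pass: maybe terminate one random instance per opted-in group."""
    if not during_business_hours(datetime.now()):
        return
    for group, instances in OPT_IN_GROUPS.items():
        if instances and random.random() < kill_probability:
            victim = random.choice(instances)
            print(f"[{group}] selected victim:")
            terminate_instance(victim)

if __name__ == "__main__":
    chaos_round()
```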
On Wednesday evening, I went to the LOPSA membership BoF, which was not very interesting or enlightening. LOPSA wants more members so it can arrange for better member benefits, but there's still a huge marketing gap between what LOPSA is and what would induce the target audience to join. One useful bit of information: the registration fee at LOPSA-East (a smaller-scale and cheaper event, held in the spring in New Jersey, that some people in our group might want to attend) includes a free LOPSA membership. Many of the tutorials on the LISA program are also given, by the same instructors, at LOPSA-East. When I got back to my room, I found that my home network had gone down.
Thursday morning began with a change in schedule, as the originally scheduled presenter for the morning plenary session had an emergency and was unable to attend. In her place, Brendan Gregg of Joyent gave a presentation on Flame Graphs, a new visualization for aggregated, counted stack traces, which had originally been scheduled as a shorter session later in the day (a sketch of the underlying aggregation appears after this paragraph). I then went to a session in the papers track. The first presentation, from a group at Argonne, covered the challenges of managing a private cloud, particularly when it's necessary to shed load or shut down users' VMs for hardware maintenance. They described a system called Poncho which allows users to attach SLA-type annotations to their VMs, including a notification URL that administrators can use to inform users of an impending shutdown, so that they (the clients) can checkpoint their state or remove a system from job dispatching (paper here). The second paper presented experiences diagnosing performance problems on a very-large-scale (11,520-disk) GPFS cluster, which has since been shut down. The third paper, which won the best-paper award, described block-level filesystem synchronization, which might have been interesting were it not for the fact that it required (Linux-only, natch) kernel modifications. ZFS can do the same thing with snapshots; their selling point was that they didn't need snapshots. It's not clear to me how resilient such a system would be in the face of incomplete transfers, and I haven't read the paper, but they did have some nice performance graphs. I spent Thursday lunch in the exhibit hall once more, and talked to a few other vendors, none of which led anywhere.
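The aggregation behind a flame graph is straightforward: sample stack traces, count how many times each unique stack occurs, and let the renderer draw each frame with a width proportional to its count. Here is a toy sketch of that folding step, producing the semicolon-separated lines that Gregg's flamegraph.pl tooling consumes; the sample stacks are made up, and in practice a profiler such as perf or DTrace would supply them:

```python
from collections import Counter

# Made-up stack samples, outermost frame first; in practice a profiler
# collects thousands of these per second.
samples = [
    ("main", "handle_request", "parse_headers"),
    ("main", "handle_request", "parse_headers"),
    ("main", "handle_request", "render_response"),
    ("main", "gc"),
]

def fold(stacks):
    """Collapse identical stacks into 'frame;frame;frame count' lines."""
    counts = Counter(stacks)
    return ["{} {}".format(";".join(stack), n) for stack, n in sorted(counts.items())]

for line in fold(samples):
    print(line)
# Prints the folded format a flame graph renderer takes as input:
#   main;gc 1
#   main;handle_request;parse_headers 2
#   main;handle_request;render_response 1
```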
For the first session after lunch, I saw two Invited Talks. The first was by David Thaw, a lawyer and law professor with a CS degree, who talked about an ongoing study examining the relative efficacy of two different strands of security regulation. One type, exemplified by Gramm-Leach-Bliley, HIPAA, and Sarbanes-Oxley, requires regulated entities to come up with a reasoned information security policy, disclose it to relevant market participants, and periodically revise the policy as new threats and new best practices become known. The other type, exemplified by numerous state data-breach laws like 201 CMR 17.00 in Massachusetts, functions effectively as a command: "Encrypt this data". Thaw's analysis of reported data breaches shows that entities subject to the self-regulatory kind of law had a lower rate of incidents than those subject only to directive-type legislation. Furthermore, the directive-type laws are generally written by lawyers who know nothing about technology and either specify too much (such as requiring a specific cipher, mode, and key length) or, more often, specify too little, thereby offering little actual security. (For example, some state data-breach laws would be satisfied if the sensitive data were encrypted with 56-bit DES in ECB mode, which is well within the reach of an attacker today.)
The second talk was by Sandra Bittner, who manages information security at the Palo Verde nuclear plant in Arizona, the largest nuclear power facility in the country and the only one not located near a source of water. She discussed the challenges of security in the SCADA realm, for a heavily regulated facility where changes can only be made every 18 months, and where it takes a minimum of five years for any change in control systems to be fully deployed across all three units of the station. I skipped out during the question period and spent some time talking to David Parter (UW-Madison) about the challenges of IT management, and finding competent people to do this job, in the campus environment. He spoke very strongly in favor of a model where the CIO is a dean-level position, primarily engaged in long-term planning, and a deputy CIO is responsible for day-to-day management of the university’s IT organizations.
It was about this time that the idea began to crystallize for me that our operation today is far more like HPC than like corporate IT, much more so than it was fifteen years ago when I started attending LISA. This is due in part to changes in our own operation (e.g., OpenStack, large high-performance file servers, BYOD for students) but much more to changes in the corporate IT environment: essentially no new company will ever build its own infrastructure again, unless regulatory requirements force it to keep critical business data and applications in-house. This idea came up a few more times throughout the remainder of the conference.
My last slot on Thursday was again split between two sessions; first, a report by Marc Merlin of Google about the horrifying way they switched their OS platform from very old Red Hat to more recent Debian. (Summary: Linux is just an elaborate bootloader for Google, and they use an in-house analogue of rsync for copying system images around. They first minimized the size of the Red Hat image they were using, and then slowly replaced the old Red Hat packages with new Debian packages — which they converted to RPM and then installed, one by one, in their golden OS image — until the system was all Debian. Then they got rid of the RPM infrastructure to force their internal developers to build Debian packages instead of RPMs. An interesting approach that, like most Google "solutions", is practical only for Google.) The second talk was by Matt Provost of Weta Digital, Peter Jackson's CGI firm in New Zealand. The talk, entitled "Drifting into Fragility", sounded interesting from the abstract, but the speaker had an unfortunate tendency to drone, and I was unable to keep my attention on the content of the talk.
Thursday dinner is always the conference reception, and this year’s (held in the hotel exhibit hall where the vendor exhibition had been just a few hours previously) was lamer than usual. (The food, from the hotel’s catering department, is pretty much always awful; the “nice” things are invariably ones I absolutely will not eat, and the rest consists of salad and fatty carbohydrates, like pizza and fries, that I should not have eaten but did anyway.) I did get to spend some time chatting with Brendan Gregg about DTrace and performance issues. (And throughout the whole conference, I never managed to speak with David Blank-Edelman at all, although I did deliver greetings from everyone to Tom Limoncelli, and we chatted a bit about the culture shift in moving from Google, where he was responsible for engineering one small component of a single service, to Stack Exchange, where his responsibility covers an entire software stack.)
Thursday night was dominated by the Google vendor BoF, which was held in a much-too-small room partitioned off from the exhibit hall, a few hours after dinner was over. I ran in to get my free ice cream and ran back out, having no desire to spend any more time packed like a sardine in a very loud room full of Google recruiters. I was not the only person to do so. (I and a number of other people complained about the over-abundance of vendor BoFs. The unfortunate reality, however, is that these events — which suck the life out of the BoF track — pay for about half the cost of the conference, so it’s very difficult for Usenix to cut back on them without substantially increasing the registration fee.) I did attend the very late “small infrastructures” BoF, which was run by Matt Simmons from Northeastern, where we again discussed the idea that small-scale data centers are “legacy infrastructures”, and those of us who run our own are thus “legacy sysadmins”, and should expect very poor job prospects going forward unless we learn to do something else, or else resign ourselves to being part of a low-value “platform provider”. There was some general discussion of this proposition, and no universal agreement. One guy at the BoF was an admin for a small infrastructure, as part of a very small team, for the largest online ad network in Sweden.
Friday morning, I went to Dan Kaminsky's talk about the future of security, or the security of the future, or some such. One interesting viewpoint shared by both Kaminsky and the Google speaker at the closing plenary was the notion that virtual machines are not necessarily a particularly good idea. Real processors are Hardware, and Hardware comes with software-design requirements of the form "don't do this, or something undesirable will happen". Kaminsky's view is that processor vendors are in the performance business, not the security business, and given the existence of Intel errata like the one allowing unprivileged processes to reprogram the processor microcode, it seems unlikely that any currently existing security container, whether hypervisor- or OS-based, is safe from malicious programs escaping. Kaminsky has much more confidence, oddly enough, in asm.js, and notes that all of the JavaScript security issues in browsers have been the result of native-code interfaces rather than the core JavaScript functionality. asm.js simplifies the language even further, and can be effectively optimized to within 50% of native-code execution speed.
I skipped the next session and went down to the National Building Museum, one of the DC museums I had never seen. It’s located in the old Pension Building, on Judiciary Square, which was built after the Civil War to house the Pension Bureau, which at the time accounted for fully a quarter of the entire federal budget. I took a guided tour concentrating on the history of the building itself, ate lunch at the cafe, and then saw about half of a special exhibit on the architectural history of Los Angeles. (There were about half a dozen special exhibits running at the time, so I could easily have spent all day, but I needed to rush back to the hotel for the afternoon sessions.) Mac-heads may be interested in two Invited Talks I skipped, “Managing Macs at Google Scale” and “OS X Hardening: Securing a Large Global Mac Fleet” (also at Google); the video and slides for these should be available in a few weeks, and I expect that both will be discussed at the BBLISA LISA-in-review meeting on Wednesday evening.
In the first afternoon session, I saw an Invited Talk by Zane Lackey of Etsy about what they learned while rolling out "HTTPS Everywhere" for their site. They did it in four stages: first for internal clients, then for external sellers, third for external buyers, and finally by disabling the unencrypted fallback option. Their original architecture was the common "SSL terminated at the load-balancer" (rolling eyes here as I've heard this story before), which resulted in a severe capacity restriction due to session-count limits on their load-balancer licenses; in order to meet user expectations, they had to terminate SSL on the actual servers, and then scale up their server pools significantly to account for the increased resource requirements of encryption (a generic sketch of server-side termination follows below). This session was followed by Jennifer Davis from Yahoo on "Building Large-Scale Services", but I have no recollection of what she said.
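"Terminating SSL on the actual servers" just means each application server holds its own certificate and does the TLS handshake itself, with the load balancer passing TCP connections straight through. Here is a minimal, generic sketch using Python's standard library; the certificate paths are hypothetical, and this has nothing to do with Etsy's actual stack:

```python
import http.server
import ssl

# Hypothetical certificate paths; in a real deployment, configuration
# management would distribute the cert and key to every server in the pool.
CERT_FILE = "/etc/ssl/private/www.example.com.pem"
KEY_FILE = "/etc/ssl/private/www.example.com.key"

def serve(port: int = 8443) -> None:
    handler = http.server.SimpleHTTPRequestHandler
    httpd = http.server.HTTPServer(("", port), handler)
    # The TLS handshake happens here, on the application server itself,
    # so the load balancer never has to decrypt (or re-encrypt) the traffic.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=CERT_FILE, keyfile=KEY_FILE)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()

if __name__ == "__main__":
    serve()
```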
The closing plenary address was given by Todd Underwood of Google. He made the provocative claim that system administration as a profession needs to be eliminated completely; operations is a job for automation and robots, and humans need to refocus their attention on running services, lest they be stuck in a low-value, low-skill, low-wage position as the people who wander data centers replacing servers when they fail. He also noted, almost in passing, that Google doesn't see much point in VMs. (It should be obvious to any observer that at Google's scale, where services require anywhere from tens to tens of thousands of machines, there is never a need to multiplex disparate services onto a single machine, and the overhead of a VM hypervisor is simply wasted resources that would be better put into running whatever service is the machine's business purpose.) He made several handwavy arguments that this is the future generally and not just something that only exists at Google scale. One thrust of his presentation was a justification of the SRE role, and a claim that it differs from system administration not just in scale but qualitatively. With so many current and former Googlers in the audience, he was speaking to something of a friendly crowd, and didn't receive much in the way of serious questioning during the Q&A; it would be interesting to see how the many thousands of administrators in Windows shops, very few of whom attend LISA or any other Usenix conference, would respond to these claims.
Friday night after dinner I attended the usual “dead dog” party in the conference organizer’s suite. I was very tired and rather “socialed out” so I don’t recall much of what was said. Lee Damon and I talked about a common acquaintance of ours with a cute UDel sophomore who was attending the conference for the first time. There were homemade brownies, and bottles of water, cans of soda, bottles of whisk(e)y, and bags of chips, and all the Usual Suspects were there, except Parter, who I don’t recall seeing again after speaking with him on Thursday. Adam Moskowitz (who is something of a cook but not a wine connoisseur) talked amusingly about cooking for Kirk and Eric (who definitely are wine connoisseurs).
On Saturday I flew home, and when I pulled in to my parking space, I was disturbed to see a 20-foot roll-off dumpster on the lawn next to my building. However, my doorbell was still lit, proving that my condo hadn’t burned to the ground. (In daylight, the container appeared to be full of shingles, suggesting that the association had replaced my building’s roof while I was gone.) I found my UPS feeping constantly, and power shut off to my home server and wireless access point; I power-cycled the UPS and things came back to normal.
Abstracts for all presentations, plus (eventually) slides and video, should be available on the Usenix web site.