Simple database tricks

I’ve been keeping an inventory of my personal library in a database for more than a decade. It was thus easy, when writing the previous post, to find out exactly how many books I had bought from SFBC each year. I used the following SQL query:

SELECT EXTRACT(YEAR FROM purchase_date), COUNT(DISTINCT title)
    FROM inventory
    WHERE purchased_from = 'SFBC'
    GROUP BY EXTRACT(YEAR FROM purchase_date)

The results looked like this:

 date_part | count 
-----------+-------
      2001 |     1
      2002 |    11
      2003 |    24
      2004 |    18
      2005 |    24
      2006 |     8
      2007 |    10
      2008 |     6
      2009 |     4
      2010 |     5
      2011 |     5
      2012 |     9
      2013 |     3
           |   106
(14 rows)

I might ordinarily have used an ORDER BY clause to force the results into a useful order, but in this case the query planner decided to implement the GROUP BY using a sort so I didn’t need to. The last row in the results shows all the books I bought before I started the inventory database in 2001; it corresponds to books whose purchase_date is NULL because I didn’t have any record of when they were purchased.
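
If the planner had picked a hash aggregate instead, the rows would have come back in arbitrary order, so the belt-and-suspenders version spells the ordering out. A minimal variant (assuming PostgreSQL, which is what the date_part column label in the output suggests; the NULLS LAST just makes the placement of the no-purchase-date row explicit):

SELECT EXTRACT(YEAR FROM purchase_date), COUNT(DISTINCT title)
    FROM inventory
    WHERE purchased_from = 'SFBC'
    GROUP BY EXTRACT(YEAR FROM purchase_date)
    ORDER BY EXTRACT(YEAR FROM purchase_date) NULLS LAST;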


End of the line for book clubs?

When I was growing up, pretty much every magazine had at least one double-page spread, with a response card tipped in, for a book club of one sort or another. Which club it was would depend on the readership of the magazine: Popular Science might have the “Computer Book Club” or the “Military Book Club”; a national general-interest magazine like The Atlantic or Time might have the “Literary Guild” or the “Book of the Month Club”, which was one of the biggest. Like their cousins the record-and-tape clubs, these businesses were not actually clubs at all; they were instead an outgrowth of 19th-century publishers’ business model of selling books by subscription. Unlike the original subscription-book business, where each series or title would have to be sold anew to a set of subscribers, in the book clubs, “members” would get a new book every month, selected by the club’s editors, and most could be depended on not to refuse or return their selections since it was a hassle (they had to return a postcard, or worse, mail the book back to the club’s warehouse). The clubs could then use these guaranteed sales to negotiate with publishers and authors for the right to reprint the books; by resetting all titles into just one or two different formats, they could use economies of scale to get lower production costs for their reprints; and by dealing directly with customers, they could eliminate the distributor’s and retailer’s markups (which account for 50% or more of the “cover price” of a new hardcover) and sell the books at a huge discount while still making a tidy profit for their shareholders. Because they were usually printed in a different size and with a lower-quality binding than the trade originals, the secondary market for book-club editions was (and is) relatively limited; many secondhand book dealers won’t even handle book-club books.

By the 1980s, there were three major book-club organizations: the Literary Guild, the Doubleday Book Clubs, and the Book of the Month Club and its sister clubs. Doubleday was acquired by the German media conglomerate Bertelsmann (which also purchased Columbia Records and its affiliated record-and-tape club). Eventually the former Doubleday book clubs merged with the Book of the Month Club, and many clubs were merged or shuttered; the combined organization was sold to a private-equity investor in 2008.

By this time, the book-club model had changed considerably. Although the customers would still be offered a monthly “featured selection”, the clubs maintained significant backlists from which customers could order, including many books which were never featured selections. (I am guessing that this originated as a way to sell off returns and other unsold copies of the featured selections, since as reprints they could not be returned to the publisher, but by the 1980s it was important to have a full catalogue for potential customers to order their “teaser” selections from, hence the double-page magazine spread.) Some of the book clubs were even able to commission their own editions — at first omnibus editions of shorter and classic titles, but later some published original anthologies as well.

For the run-of-the-mill reprint business, however, book clubs depended on being able to acquire books before they were published, so that they could be re-typeset and reprinted by the time the original publisher’s edition appeared in retail stores. Otherwise, many subscribers would not wait for the club edition, but would instead buy the retail edition, leading to substantially increased refusal and return rates and poorer economies of scale.

But this model still depended on the clubs competing primarily against the regular in-person retail book trade, and competing for customers who were relatively uninformed about the universe of titles available to them. In such a setting, the book club can provide value both as a source of cheaper books and as a better source of information about what books are available — particularly the genre book clubs, whose editors see a far greater fraction of the books being published in their categories than your typical Waldenbooks PFY would.

That was the situation as late as 1992, when I joined the Science Fiction Book Club, one of the Doubleday clubs. SFBC’s editor was then Ellen Asher, who was one of the most respected editors in the SF genre, having run SFBC for two decades by then. According to my library database, I have since purchased 233 distinct titles from SFBC. I have stayed with the SFBC for all these years, since it was a good source (that I didn’t have to pay for) for information about many new books, and the prices were reasonably competitive. But my purchases have declined steadily over the past decade, from a peak of 24 books a year in each of 2003 and 2005, down to just three books so far this year, and there doesn’t seem to be much prospect for more, given what sfbc.com lists as forthcoming.

What happened?

There are three reasons, two of which are more general, affecting all book clubs, and one of which is more specific to SFBC; I’ll take the third reason first. During one of the SFBC’s episodic consolidations, Ellen Asher left SFBC. Management was taken over by an editor from one of SFBC’s sister clubs — I don’t even recall which one. I’m sure the new editor, Rome Quezada, is not a bad fellow, but he doesn’t seem to have the talent for picking books that Asher and her staff had. They actually commissioned a telephone survey a couple of years ago and I told the survey-taker exactly that (and I usually don’t talk to telemarketers).

The second issue is the arrival of ebooks as a serious platform. Like book-club editions, ebooks have exactly the same creative content as physical books — sometimes even down to the same artwork, depending on the reader technology being used. Unlike book-club books or regular trade books, they cost nothing to print (since they’re not printed) and next to nothing to distribute or warehouse. Ebooks are sold by a very small number of sellers, who have a direct and ongoing relationship with their customers that often involves many other categories of goods and services. And in the specific case of the SFBC and clubs for other related genres, many of the target demo actually prefer ebooks over p-books for their convenience and portability. (I’m not one of them, but I’m also 40 years old. Nearly all of the people I know who are my age or younger are neophiles who live in rental housing; the cost of moving a large library every few years is a substantial incentive not to accumulate physical books.) Because ebooks are downloads, there is no shipping cost for purchasers to pay, and they thus cost much less than even a book-club edition hardcover; indeed, they are often cheaper than the publisher’s trade paperback editions.

The first, and most important, issue facing book clubs is the same one facing the rest of the book industry: Amazon. Amazon is the Walmart of the book world: a near-monopsony buyer of books and many other creative commodities, unafraid to use its market power to transfer authors’ and publishers’ shrinking producer surplus to the consumer. My progressive friends generally love Amazon and despise Walmart, but their role in their respective markets is essentially the same, with one exception: unlike Walmart’s shareholders, Amazon’s shareholders don’t appear to care whether the company actually makes a profit or not, enabling it to sell many goods below cost. Amazon’s market power is such that no publisher can afford not to do business with it, on its terms, and that means that Amazon’s prices on physical books, including shipping, are often lower than the price of the corresponding, lower-quality, book-club edition. This leaves the book clubs in a precarious position: unless they are selling a product that Amazon can’t — a proprietary anthology or omnibus, or a hardcover version of a title available to the trade only in mass-market paperback — they can’t compete on price. And as their margins are further squeezed by increasing costs of printing and postage, there is less and less money available to commission new editions or even to hire specialist editors to do a better job of selecting the best titles for reprints. There was maybe a window, a few years back, for book clubs to turn to a more editorially-driven model — fewer titles, perhaps, but providing a better match with the audience and better content in the “monthly” newsletter to sell them — but it seems that this window is now well shut.

What brought me to write this was the realization that Mercedes Lackey had a new Valdemar book, Bastion, coming out (DAW has consistently released a new one every autumn for the past decade) and that, for the first time in a couple of decades, SFBC was not carrying it. Lackey has legions of loyal fans, and the four previous titles in this series have by all accounts sold reasonably well, so it’s a mystery why they don’t have it. But whether they decided not to pick it up, or they couldn’t come to terms with the rights holders, the effect is the same: to drive many potential customers to other sources — like Amazon, which is where I preordered it yesterday. Now that Amazon knows I’ve bought a Lackey book, its recommendation algorithm is certain to recommend many of the other books that I would have bought from SFBC — and unlike the club newsletter, Amazon will tell me about them months in advance of their release date.

I’ll probably continue to read the SFBC newsletters if they keep sending them to me, but I am doubtful that I will have much to buy from them in the future. I’ll post again if I find something interesting.

Posted in Books

Another pearl from Anderson’s article

To summarize thus far, opaque sentences require hard work to fully (as opposed to one-sidedly) interpret. Still, the typical six-year-old is conversationally fluent in them. In [] the high-stakes world of legal reasoning, it is surprising that all the king’s horses and all the king’s men, often billing by the hour, fall short of extracting the full range of reasonable interpretations of a statute.

—Jill Anderson, supra, p. 50.


How courts are like children

Besides courts, two other much-studied populations who have trouble handling opacity are children under the age of four to six and older children with diagnoses on the autism spectrum.

—Jill Anderson, “Misreading Like a Lawyer: Cognitive Bias in Statutory Interpretation”, Harvard Law Review, forthcoming (draft on SSRN)

Hat tip: Rick Hasen


Building big NFS servers with FreeBSD/ZFS (2 of 2)

At CSAIL, we support two distributed file systems: AFS, which has a sensible security model (if somewhat dated in its implementation), but is slow and limited in storage capacity, and NFS, for people who don’t want security and do need either speed or high capacity (tens of terabytes in the worst case, as compared with hundreds of gigabytes for AFS). Historically, AFS has also been much cheaper, and scaled out much better: NFS required expensive dedicated hardware to perform reasonably well, whereas AFS could run on J. Random Server (these days, Dell PowerEdge rackmounts, because they have great rails and a usable [SSH, not IPMI] remote console interface) with internal drives; when we needed more capacity, we could just buy another server and fill it with disks. Of course, the converse to that was that AFS just couldn’t (still can’t) scale up in terms of IOPS, notwithstanding the benefit it should get from client-side caching; the current OpenAFS fileserver is said to have internal bottlenecks that prevent it from using more than about eight threads effectively. So in this era of “big data”, lots of people want NFS storage for their research data, and AFS is relegated to smaller shared services with more limited performance requirements, like individual home directories, Web sites, and source-code repositories.

In part 1, and previously on my original Web page, I described a configuration for a big file server that we have deployed several of at work. (Well, for values of “several” equal to “three”, or “five” if you include the mirror-image hardware we have installed but not finished deploying just yet as a backup.) One of the research groups we support wanted more temporary space than we would be able to allocate on our shared servers, and they were willing to pay for us to build a new server for their exclusive use. I asked iXsystems for a budgetary quote, and we’re actually going forward with the purchasing process now.

If you read my description from January, you’ll recall that we have 96 disks in this configuration, but four of them are SSDs (used for the ZFS intent log and L2ARC), and another four (one for each drive shelf) are hot spares. Thus, there are 88 disks actually used for storage. On our scratch server, we have these configured in mirror pairs, and on the other servers, we are using 8-disk RAID-Z2 — in both cases, the vdevs are spread across disk shelves, so that we can survive even a failure of an entire shelf (there’s a rough sketch of the layout after the table below). That gives us the following capacities, after accounting for overhead:

Storage array usable capacity

 Configuration | Drive size | Aggregate | Usable
---------------+------------+-----------+-----------
 mirror        | 2 TB       |  80 TiB   |  76.1 TiB
 mirror        | 3 TB       | 120 TiB   | 114.2 TiB
 RAID-Z2       | 2 TB       | 160 TiB   | 117.4 TiB
 RAID-Z2       | 3 TB       | 239 TiB   | 176 TiB
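
To make the shelf-spreading concrete, here is roughly what creating the two pool layouts looks like. This is only a sketch: the gmultipath labels are made up, I’m assuming the four SSDs split into a mirrored log pair plus two cache devices, and only the first data vdevs are shown rather than all 44 mirrors or all 11 RAID-Z2 vdevs.

# Scratch server: two-way mirrors, each pairing disks from two different
# shelves (first two of 44 mirrors shown; labels are hypothetical).
zpool create scratch \
    mirror multipath/shelf1-d0 multipath/shelf2-d0 \
    mirror multipath/shelf3-d0 multipath/shelf4-d0 \
    log mirror multipath/log-ssd0 multipath/log-ssd1 \
    cache multipath/cache-ssd0 multipath/cache-ssd1 \
    spare multipath/shelf1-d23 multipath/shelf2-d23 \
          multipath/shelf3-d23 multipath/shelf4-d23

# Production servers: 8-disk RAID-Z2 vdevs with two disks from each of
# the four shelves (first of 11 vdevs shown), so a whole-shelf failure
# costs each vdev only the two disks that RAID-Z2 can tolerate losing.
zpool create tank \
    raidz2 multipath/shelf1-d0 multipath/shelf1-d1 \
           multipath/shelf2-d0 multipath/shelf2-d1 \
           multipath/shelf3-d0 multipath/shelf3-d1 \
           multipath/shelf4-d0 multipath/shelf4-d1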

The quote that we got from iXsystems was for a system with more memory and faster processors than the servers that Quanta donated in 2012, and with 3 TB disks. All told, with a three-year warranty, it comes in at under $60,000. For this group, we’ll be deploying a “scratch” (mirrored) configuration, so that works out to be under $512/TiB, which is amazingly good for a high-speed file server with buckets of SSD cache. That’s about 47 cents per TiB-day, assuming a useful life of three years, and in reality we usually get closer to five years. (Of course, that does not include the cost of the rack space, network connectivity, power, and cooling, all of which are sunk costs for us.) In the “production” (RAID-Z2) configuration, the cost looks even better: $341/TiB or 31 cents/TiB*d. (Of course, we’d like to have a complete mirror image of a production system, which would double that price.)
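
Spelling out the arithmetic for the production numbers, taking the quote at its $60,000 ceiling and the 176 TiB usable figure from the table above (the scratch figures work the same way against 114.2 TiB):

    $60,000 / 176 TiB             ≈ $341 per TiB
    $341 / (3 years × 365 days)   ≈ $0.31 per TiB per day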

This raises an interesting question: at what point, if at all, does it make sense to build our AFS servers around a similar hardware architecture? Given the OpenAFS scaling constraints, might it even make sense to export zvols to the AFS servers over iSCSI? A fairly random Dell R620 configuration I grabbed from our Dell Premier portal (not the model or configuration we would normally buy for an AFS server, but an easy reference) comes in at nearly $960/TiB! (Nearly 88 cents per TiB*d.) Because of various brokenness in the Dell portal, I wasn’t able to look at servers with 3.5″ drive bays, which would significantly reduce the price — but not down to $341/TiB. I think the only way to get it down that low with Dell hardware is to buy a minimal-disk server with a bunch of empty drive bays, then buy salvage Dell drive caddies (thankfully they haven’t changed the design much) and fill the empty slots with third-party drives. Even if you do that, however, I think you still can’t amortize the cost of the host over enough drives to make it competitive on a per-TiB basis.

For now, we’ll be sticking with our existing AFS servers, but this will be a matter to consider seriously when we have our next replacement cycle.

Posted in Computing, FreeBSD, ZFS | 3 Comments

Building big NFS servers with FreeBSD/ZFS (1 of 2)

Over the past couple of years, I’ve had the chance to build a number of big file servers for work. Back in January, I wrote a description of the server hardware we used and some of the software configuration that was required to make it go. Since we’re in the process of actually buying one of these things for the first time (the initial hardware was donated), I figured it was time for an update.

When I first built these servers, I used FreeBSD 9.1 as the base operating system, primarily due to the combination of familiarity and ZFS support. ZFS is a big win for servers on this scale. I had hoped that FreeBSD 9.2 would be out by the summer, and we could test and deploy a new release fairly easily, but that still hasn’t happened; summer break ended three weeks ago, and with it my opportunity to test an updated software stack. As it turns out, most of the stuff that we might have cared about from 9.2 is already in my patched 9.1, and some of the patches that matter the most didn’t make it into the 9.2 release cycle at all.

The servers are all Puppetized, although some issues with Puppet have limited my ability to control as much of the configuration that way as I would have liked, and support for FreeBSD in non-core Puppet modules is still very limited. (Many Puppet modules that I’ve run across also have conceptual or data-model problems that limit their portability.)

One issue we discovered fairly early on was with the driver for the Intel 10-Gbit/s Ethernet controller. This turned out to be a misunderstanding on the part of the Intel people over memory allocation. (Specifically, for an interface with jumbo frames configured, as all but one of our servers have, they would try to allocate a “16k jumbo” buffer, which requires four contiguous pages of memory. The controller’s DMA engine has no problem doing scatter-gather across multiple physical pages, so the right thing — and the fix that I applied — was simply never to allocate more than a page-sized buffer, which will always be possible whenever there is any memory available at all.) Debugging this issue at least got me to write a munin plugin for tracking kernel memory allocations, which has proved to be useful more generally.

Once I fixed the ixgbe driver, the next issue we ran into was the fact that 96 GB of RAM just isn’t quite enough for a big ZFS file server, at least on FreeBSD. This is due in large part to the way ZFS manages memory: it requires wired kernel virtual memory for the entire Adaptive Replacement Cache (ARC), and while it tries to grow and shrink its allocation in response to demand, it often doesn’t shrink fast enough (or at all) to avoid a memory-exhaustion deadlock. (By contrast, the standard FreeBSD block cache is unified at a low level with the VM system, and stores cached data in unmapped — physical — pages, rather than wired virtual pages.) We found that our nightly backup jobs were particularly painful, as the backup system traversed large amounts of otherwise cold metadata in some very large filesystems. We ended up limiting the ARC to 64 GB of memory, which leaves just enough memory for NFS and userland applications.
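
For anyone who wants to do the same thing: the usual knob for this on FreeBSD is the vfs.zfs.arc_max loader tunable, set in /boot/loader.conf — something along these lines (value in bytes; 64 GiB):

# Cap the ZFS ARC at 64 GiB, leaving headroom for NFS and userland
vfs.zfs.arc_max="68719476736"

If I remember right, on 9.x this is a boot-time tunable rather than a writable sysctl, so the new cap takes effect at the next reboot.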

gmultipath is a bit of a sore point. It does exactly what it is supposed to in the case of an actual path failure — I tested this under load by pulling a SAS cable — but it does totally the wrong thing when the hardware reports a medium error. gmultipath appears to have no way to distinguish these errors (it may be implemented at the wrong layer to be able to do so), so it just continually retries the request, alternating paths, until someone notices that the server is really slow and checks out the console message buffer to see what went wrong. At least it does allow us to label the drive so we know which one is bad, but it would be better if it were built into CAM, which actually has sufficient topology information to do it right and can distinguish between path and medium errors before both get turned into [EIO] heading into GEOM. (This is particularly bad for disks that default to retrying every failed request for 30 seconds or more — and even modern “enterprise” drives seem to come that way.)

The overall performance of the system is quite good, although not yet as good as I would like. For data that fits in the ARC, the tweaked NFS server (with Rick Macklem’s patches) can do close to line rate on the 10-Gbit/s interface, which is more than any of our clients can use (they are limited to 1 Gbit/s for the most part). Operations that hit disk are still a bit slower than I think they should be, but the bottleneck is clearly on the host side, as the disks themselves are loafing. I’m guessing that there are some kernel bottlenecks yet to be addressed even after fixing the NFS server’s replay cache.

In part 2, I’ll look at how much it costs to build one of these things.

Posted in Computing, FreeBSD, ZFS

Molly’s Chocolate Pie, my take

There’s been a recipe — or rather, several related recipes — going around for the past month or so. I got the pointer from Diane Duane (Tumblr here) and the original recipe is at Azriona’s LJ where she has been posting a Sherlock Holmes fanfic to which it is related. I’m not really a fan of fanfic (or even profic pastiches) but I am a fan of chocolate, so I had to try it. I learned a bit along the way.

Azriona’s recipe leaves a few important details out; most notably, she doesn’t specify the size (or volume) of the pie plate or give a specific graham-cracker crust recipe. I checked a bunch of cookbooks and settled on the very simple crust in Christopher Kimball’s Yellow Farmhouse Cookbook. It just called for graham crackers, brown sugar, and melted butter, which suited my preferences. Of course, I had to buy graham crackers, and I knew I needed chocolate, milk, and eggs as well, so I went to the local Whole Foods (because I know they have good chocolate).

It was at the supermarket that I hit the first snag. Graham crackers were pretty easy to find, and I bought the same eggs as I always do, but the only milk they sold by the pint (the amount needed for this recipe) was Dean Foods. My choices from non-evil dairies were limited to quart-size packages, which meant that I would be using low-fat milk (so I could drink the rest). Hopefully that would be OK, although some recipes really could use the extra 2% milkfat from whole milk and I would have bought that if I could have gotten a pint from someone other than Dean Foods. I also picked up some chocolate fèves from the specialty department — specifically, Valrhona Manjari 64%. (They also had Guanaja 70%, which they don’t usually, and Jivara Lactee 40%, which is too weak for me.)

When I got home and opened the box of graham crackers, I found that they were an odd size, slightly smaller than I expected. (It turns out that I was wrong.) I added an extra sheet of cracker to the crumb crust recipe, which still turned out OK so far as I could tell. I still didn’t know what to bake it in. For lack of a better idea, I first tried a 9-inch tart pan, but when I dumped the crust into the pan, it nearly filled the whole thing, and I was worried that there would not be enough room for the filling even after compacting the crust. So instead I used my Pyrex-brand-but-not-borosilicate-any-more-boo! 9-1/2″ pie plate, which turned out to be the wrong thing in nearly every respect. (As it almost always is. All pie recipes in the U.S. are written for either 8″ or 9″ pie plates, never 9-1/2″, and I have no idea why the people who sell stuff under the Pyrex brand name now seem to think that everyone wants oversized pie plates. I have two of the things and nothing ever comes out right. Must get some proper-sized ones some day.)

The filling comes together surprisingly quickly, as DD notes in her adapted recipe. I had to make one other accommodation, besides using low-fat milk: I don’t drink coffee, don’t ever have any in the house, and quite frankly despise the taste of it. So I just used water where the recipe calls for coffee; in retrospect perhaps I should have made a cocoa slurry.

Blog posts about food are always supposed to include photos, so here’s what the finished pie looks like. You’ll note that I followed DD’s advice and sifted cocoa powder across the top to keep that nasty film from forming (I happened to have some Valrhona cocoa in the cupboard so I used that).

A chocolate pie in a glass pie plate with a graham-cracker crust and cocoa powder on top

Chocolate pie, Mk. I

A few defects are fairly obvious from the start. Since I didn’t know how deep the pie was going to be, I made the sides of the crumb crust much too tall. The filling set up so fast that I was unable to smooth it off properly. (Offset spatula dipped in hot water? Maybe next time.) And you’re supposed to use a white tablecloth or a black velvet mat and direct overhead light when taking pictures of food. (So sue me.)

I took some pictures from other angles, but I forgot to put the camera in aperture-priority mode so I could adjust the depth of field, with some comical results:

Photo of a chocolate pie taken with too shallow a depth of field

Bad bokeh is bad

This was taken with a portrait lens, the widest-aperture lens of any in my camera bag, and it shows — the depth of field is paper-thin, leaving the central defect quite clear and the crust blurry.

I didn’t sift any confectioner’s sugar over the top as DD did in her second recipe, mostly because I’m only one person and I’m not about to eat a whole pie in one sitting. (Well, I probably could, but I’m seeing my doctor later this week and he would raise an eyebrow.) Here’s what it looks like with one slice (somewhat clumsily) removed:

A chocolate pie with one slice removed; the photo is taken close to the horizontal to display the texture of the filling and the profile of the crust.

Pie with slice removed, showing crust profile


You can see that I didn’t do a very good job of evening out the crumb crust. Finally, here’s a single serving:
A slice of chocolate pie on a dessert plate with one bite of pie loaded on a fork.

A slice of chocolate pie, Mk. I

So how was it?

I’m a little bit disappointed in the overall chocolateyness of it. Some of that is undoubtedly due to my having left out the coffee; perhaps I should make a hot-water-and-cocoa-powder slurry to replace it rather than just plain old tap water. I think it might be worth using a higher-test chocolate, perhaps that Guanaja that I passed up at Whole Foods. But as I’ve had two slices now, I’m coming to agree with Diane Duane that the crumb crust competes a bit too much with the filling. There are no spices on the ingredients list of my graham crackers, but there is molasses, and of course the brown sugar called for in the crust also has molasses, so perhaps that’s part of it. If I try this recipe again, I’ll make at least three changes: use a pastry crust, in a tart pan, with 70% chocolate rather than 64%.

Posted in Food

Outsourcing the platform

I’ve always been leery of outsourcing any part of my Web content. A few years ago, I even installed a blog platform on my Web server with the intent of using it to publish things like this. Before I had posted a dozen things, it was inundated with comment spam, and then it stopped working entirely when I upgraded the Web server, so I shut it down. Reluctantly, I have come to the conclusion that the network effects of a shared blog platform — particularly for dealing with spam and interacting with other social media — make it impractical to run one on my own, especially when I am not going to be posting things every day.

I do plan to keep up my private server systems for other functions, where I have both simpler requirements and stronger opinions, but being able to outsource spam and security stuff will hopefully make it more likely that I actually will post to this blog in the future. My primary social-media outpost will probably remain Twitter, but there are occasional things that I want to talk about that either won’t reach the desired audience there or are just too long to fit in 140 characters.

I’m still experimenting with themes, so the appearance of this site is likely to change significantly before I’m happy with it.

Posted in Administrivia