
Skype Ruined My iPhone

02/11/2011

That is, of course, an exaggeration, but there’s a story to be told here.

Just over three years ago, I bought a then-new iPhone 3G (side note: here in Canada, we don’t have 2-year contracts, only 3-year contracts.  It sucks).  Since then, through two OS upgrades (from iOS2 to the current iOS4.2) it has only become better and more useful — and similarly with most apps as they are upgraded.

Then, in late 2008, I travelled to Australia for a few weeks, and wanted to use my iPhone for local calls and tethered 3G data for my laptop and making Skype calls back home.  At the time, it was possible to jailbreak the iPhone 3G, but there was no software unlock available.  My solution was to purchase a $20 shim which you insert with the SIM card to fool the carrier into thinking the phone is unlocked — it worked wonderfully with a 3 Australia prepaid “cap” SIM.

Unfortunately, 3G tethering and Skype-over-3G were both features that would not be “officially” possible for another two years with the release of iOS4.  But it was possible to jailbreak the iPhone and, with some unauthorized and less-than-perfect hacks, add this functionality.  To wit, this kind of example has been one of the strongest arguments for jailbreaking in the history of the iPhone: to add new functionality that is available on other (e.g. Android) devices which Apple has yet to implement.

(In fact, Apple has a solid track record of “implementing” features from the jailbreak community, even going so far as hiring developers from among them.)

My hack worked beautifully.  I have great memories of talking to my family on my iPhone while strolling through the Royal Botanic Gardens in Sydney.  This might seem unremarkable now, but at the time it was widely considered impossible; except, of course, that it wasn’t.

But something bizarre has happened since then: while iOS has steadily improved with each release, the Skype app has gotten steadily worse.  Much, much worse.  I’m not just talking about the UI becoming unnavigable, which it has, but simply the ability to have passable audio quality.  It has diminished to the point that it is effectively unusable for making local calls from my wireless network at home.

But I know for a fact that my phone can run Skype over 3G without problems — I did it almost three years ago on the other side of the world.

And of course, you can’t downgrade.

Because certainly no developer would release a newer version that is worse than the older one.  That would be crazy, right?  Thanks to the App Store model, once you upgrade an app, you’re stuck with it.  So, once again, I had to turn to jailbreaking as a solution to my problem.  You see, there does exist an application — a very contentious application, even among jailbreakers — that allows you to install App Store apps without paying for them.  But in my case, it has a handy side feature: it also provides copies of older versions of apps that are long gone from the App Store.

As it turns out, I’m heading back to Australia again this year, and once again, I’d like to be able to call home using some prepaid 3G data.  But this time, it’ll be different: my iPhone is legitimately carrier-unlocked and I can do native iOS4 tethering.  The only thing that hasn’t changed is that I’ll be running the same old version of Skype that I was in 2008.  Because guess what: unlike the latest version, it works just fine.

Good thing I’m able to break the rules to do something I did three years ago.

Addendum: I’m not sure who to blame for this frustration.  Certainly Skype for earning their reputation of producing the most terrible software possible, and then somehow being able to move backwards with it.  But I think Apple should consider making old versions of apps available on the App Store for the innumerable edge-cases like this; it’s pretty low-risk on their part.  Of course, the argument could be made that this would decrease incentive to buy new iPhones, but in places like Canada (and thanks to the corrupt CRTC), that’s not an option for many of us.

Unpacking

28/10/2011

Last week I gave a presentation at the Access 2011 conference, and I have to admit: it wasn’t my best.  The conference was excellent, among the best I’ve ever been to, but still I’ve remained frustrated by the feeling that I didn’t convey what I wanted to.  Combined with a number of comments in the past few days about clarification, I feel compelled to sit down and at least clear my head of the details bouncing around.

The title of the talk was “Big Data in Libraries: Has Open Source’s Time Arrived?” (slides) — part of the problem is that this entails a lot of different, albeit related, ideas.  Let’s go through them one at a time, in the order that I gave the slides.

“Big data” is a buzzword.  So is “cloud”.

The point of my introduction was to remind people that a few years ago, “Web 2.0” was The Next Big Thing™.  Software vendors in particular LOVE to use these terms for marketing, and “big data” is no exception.  The same way that “Web 2.0” really meant “the social/interactive web”, “big data” really means: data that is large enough to be difficult to work with using familiar tools (for the purposes of my talk, I mention relational databases. More on that shortly). I wanted to make people aware that vendors are going to try to dazzle you with these buzzwords to get your money. Don’t let them.

Big data really isn’t about size.

The next thing I tried to do was give some context of how much space in bytes we’re talking about.  I only used MARC as a touchstone because it’s something every systems librarian is familiar with.  MARC was created in 1966, and the average MARC record today is the same size it was then: about 4KB.  But the cost and physical size of storing library data have both dropped astronomically, thanks to Moore’s Law.

In 1980 (sorry, I don’t have numbers for 1966), storing a million 4KB MARC records (3.8GB) cost about $170,000.  In 2011, it costs about $18 and fits on a thumbnail-sized microSD card.  A large library catalogue of about 2.5 million records (9.5GB) would easily fit on an iPhone.  WorldCat, the world’s largest union catalogue, contains 1.5 billion records: at about 5.6 TB, that’s less than $500 in storage.
(Yeah, yeah, more for infrastructure, redundancy, etc. but you get the point.)
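
The sizes above are easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a flat 4KB (4096 bytes) per record and binary units, which is what the 3.8GB figure implies, and the function name is made up for illustration.

```python
KIB = 1024
RECORD_BYTES = 4 * KIB  # assumed average MARC record size (4KB)


def catalogue_size_gib(num_records, record_bytes=RECORD_BYTES):
    """Return the raw storage needed for num_records, in binary gigabytes."""
    return num_records * record_bytes / KIB**3


catalogues = [
    ("one million records", 1_000_000),
    ("large library catalogue", 2_500_000),
    ("WorldCat", 1_500_000_000),
]

for label, count in catalogues:
    gib = catalogue_size_gib(count)
    if gib >= KIB:
        print(f"{label}: {gib / KIB:.1f} TB")
    else:
        print(f"{label}: {gib:.1f} GB")
```

The numbers only cover raw record storage; indexes, backups and redundancy would add to them.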

Big data really isn’t about the number of records.

At least, not in libraries.  The kinds of organizations that are dealing with really big data, like Facebook and Google, are processing tens or hundreds of billions of records, most of which are much larger than 4KB: we’re talking petabytes of data, and often in near real-time.  Libraries, by contrast, are dealing with collections of fewer than 3 or 4 million records at most.  Let me reiterate: our biggest data sets are ten thousand times smaller than those of the “big data” crowd.  Keep in mind that, unlike at Google or Facebook, library data is read-only 99% of the time.

Big data isn’t really complex.

Again, not for libraries anyway.  But this is where we’ve fooled ourselves into thinking we have a problem.  If it’s not complex, then why is it cumbersome for a library catalogue to manage a million records?  The answer lies in how we represent a “record” in the tools with which we are familiar.

In 1974, Raymond Boyce and Edgar Codd (and apparently Ian Heath in 1971) popularized the relational model of data storage, including the concept of normalization and the SEQUEL (now SQL) language.  For the past 35 years, the relational model has been the paragon and de facto standard for data management.  But the relational model basically assumes all data are keys (columns) and values (rows), which is to say essentially tabular.  MARC records, devised 10 years earlier and despite all their other shortcomings, have never been tabular data.  They are hierarchical documents.

Similarly, links between library records are not normalized relations, but rather directed graphs.  This means that, in order to represent MARC records, authority records, and their relationships using relational databases, we have to transform them (using tools like object-relational mapping, for example).  I’m willing to wager that your ILS is actually using a relational database under the covers, if not something more proprietary and esoteric.
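
To make the mismatch concrete, here is a rough sketch in Python. The record is a simplified, made-up MARC-like document (the tags 100, 245 and 650 are real MARC fields, but the identifiers, the `links` key and the three-table layout are assumptions for illustration, not how any particular ILS does it). The `flatten` function shows the kind of transformation needed before such a record fits into relational tables.

```python
# A hierarchical record: fields contain subfields, and the record
# points at an authority record (a directed edge in a graph).
record = {
    "id": "rec-1",
    "fields": [
        {"tag": "100", "subfields": [{"code": "a", "value": "Melville, Herman"}]},
        {"tag": "245", "subfields": [
            {"code": "a", "value": "Moby Dick"},
            {"code": "b", "value": "or, The whale"},
        ]},
        {"tag": "650", "subfields": [
            {"code": "a", "value": "Whaling"},
            {"code": "v", "value": "Fiction"},
        ]},
    ],
    "links": ["auth-1"],
}


def flatten(record):
    """Split one nested record into rows for three hypothetical tables."""
    field_rows, subfield_rows, link_rows = [], [], []
    for f_pos, field in enumerate(record["fields"]):
        field_rows.append((record["id"], f_pos, field["tag"]))
        for s_pos, sub in enumerate(field["subfields"]):
            subfield_rows.append(
                (record["id"], f_pos, s_pos, sub["code"], sub["value"])
            )
    for target in record["links"]:
        link_rows.append((record["id"], target))  # (source, target) edge
    return field_rows, subfield_rows, link_rows


field_rows, subfield_rows, link_rows = flatten(record)
print(len(field_rows), len(subfield_rows), len(link_rows))
```

One small document becomes nine rows across three tables, and reading it back means joining them all together again. A document store would keep the `record` dictionary more or less as it is.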

In short, for the past 35 years, we have been pushing a square peg into a round hole.  To be fair, it has worked remarkably well, but historically, managing tens or hundreds of thousands of non-tabular records has been overwrought and cumbersome — not to mention slow (More on this shortly).

New tools can change how we work.

The alternative to what we’ve been doing to date, which is modifying our data, is modifying our tools.  In the past ten years or so, there has been a rise in so-called NoSQL tools.  These are essentially data management tools that don’t implement the relational model and thus are not subject to the same limitations.  As it turns out, these NoSQL tools are the same ones that Google and Facebook are using, in large part because they are the ones who developed them.

Of course, they are really effective at dealing with large numbers of hierarchical documents and directed graph relations.  In particular, in the past year or two we have tools for object-document mapping (ODM) that replace the traditional object-relational mapping (ORM) approach.

This part of the presentation uses a couple of illustrations to show how the ICA-AtoM software scales when managing a set of 3.5 million archival descriptions.  The main point to take away is that, everything else being equal, an ODM scales much better (i.e. performs faster) than an ORM: up to 10x faster, with only a tenth of the memory footprint.  This is significant because it means we can potentially improve the ability of our systems to handle substantially more data (several hundred times more) just by changing the way the data is stored.

Use the right tool for the job.

But why does an ODM scale so much better?  My hypothesis (more of an educated guess) is that a NoSQL document database much more closely fits the shape of the hierarchical, graph-like data that libraries are dealing with.  We don’t have to transform our data every time we modify it or, more importantly, every time we read/search it. (99%, remember?)

The contemporary approach is to use a relational database for storing (writing) data, and then a “shadow” index like Solr for searching (reading) data, because that is what each is good at.  But if you have a single tool that does both well, why wouldn’t you just use that?  Simplicity is good.
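
As a toy illustration of the "single tool" idea, here is a minimal in-memory sketch in Python. It is not how MongoDB, ElasticSearch or any real product works internally, and the class and method names are invented; it only shows nested documents being stored as-is while a search index is kept up to date on every write, so there is no separate shadow index to synchronize.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy store: keeps nested documents and a word index together."""

    def __init__(self):
        self.docs = {}                 # doc_id -> document
        self.index = defaultdict(set)  # lowercase word -> set of doc_ids

    def _words(self, value):
        # Walk nested dicts and lists, yielding lowercase words from strings.
        if isinstance(value, dict):
            for item in value.values():
                yield from self._words(item)
        elif isinstance(value, list):
            for item in value:
                yield from self._words(item)
        elif isinstance(value, str):
            yield from value.lower().split()

    def put(self, doc_id, doc):
        # On overwrite, drop the old document's index entries first.
        if doc_id in self.docs:
            for word in set(self._words(self.docs[doc_id])):
                self.index[word].discard(doc_id)
        self.docs[doc_id] = doc
        for word in self._words(doc):
            self.index[word].add(doc_id)

    def search(self, word):
        # Return matching documents, ordered by id for predictable output.
        ids = sorted(self.index.get(word.lower(), ()))
        return [self.docs[doc_id] for doc_id in ids]


store = TinyDocumentStore()
store.put("a", {"title": "Moby Dick", "subjects": ["Whaling", "Fiction"]})
store.put("b", {"title": "Whaling history"})
print([doc["title"] for doc in store.search("whaling")])
```

The write path and the read path share one data structure, which is the simplicity argument in miniature. Real tools add persistence, ranking, sharding and much more on top.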

Here’s my soapbox moment: I think we have been so blinded by the dogma of SQL and the relational model, probably because it was the most widely-used solution, that we haven’t been actively pursuing other tools.  In fact, I’d contend that we haven’t even been open to considering other tools, even when they are: a) growing in popularity, b) relatively easy to use, and c) free.

It’s this last point that leads me to my conclusion, and, for those who were at Access, picks up on Peter’s more philosophical talk prior to mine.

Big data is no harder than small data; a.k.a. The Cloud Is A Lie

Like the tools we’re already using (Apache, MySQL, Solr, PHP, etc.), many of the best NoSQL tools are open source and/or free.  If you’re already developing for MySQL, then MongoDB is no harder to work with.  If you’re already deploying Solr, then if anything, ElasticSearch is much easier.  Yes, it means learning new things, and learning new things is difficult when you’re busy doing old things, but the payoff is tremendous, and the risk is minimal.

As I said from the start, vendors are going to continue to use these new ideas to convince you that they are able to do something you’re not.  They are going to claim that their “horizontally-distributed, web-scale cloud platform” can magically outperform and out-scale your current application.  And I’m sure it can.  But here’s the part they probably don’t want you to know: I’ll wager that, under the covers, they are using the same open source NoSQL tools that you can implement yourself.

This isn’t to say that software-as-a-service (SaaS) isn’t a viable business offering — Artefactual offers application hosting for organizations who want it, and I happen to think we do a good, competitive job at it.  But “the cloud” is not the same as SaaS.  You don’t have to put your library data into that Black Box in the Sky™ just because it runs slowly in MySQL.  You don’t have to buy a top-of-the-line $5000 HyperScale PowerEdge server because Solr is running out of memory.

Think different.  Scale out.  Build your own damn cloud.
