Archive for Technology

Distributed k-Core Algorithm talk at CoNEXT 2012

Our short paper / extended abstract was accepted for the ACM CoNEXT 2012 student workshop in Nice, France. See also the slides for my very brief “pitch” talk (the even-numbered slides with text were my notes and, obviously, were not shown to the audience). Somehow the talk was voted one of the 8 best talks of the student workshop, and I got to give it again to the full conference the next day!

Comments (1)

Dear geek, the BBC is not your friend

The BBC have a policy of tightly controlling access to their “iPlayer” IPTV services. Last I checked, access to the HTML video “iPlayer” front-end is restricted to devices authenticated via SSL, using a vendor certificate signed by a BBC certificate authority. General web browser access to “iPlayer” is via the now obsolescent Flash applet technology, using RTMPE streams.

BBC management appear to be under the impression that Flash RTMPE secures access to the video streams. Or rather, they appear to wish to seem to believe that, because I know for certain their management are aware it does not. There is, of course, simply no way you can deliver content to a general purpose computing device AND prevent whoever controls the device from easily copying the digital content. The BBC iPlayer Flash streams are easily recorded using non-BBC-approved software. Some of that software perhaps exists to aid piracy, but some exists because the BBC decided to shut out certain users of iPlayer (e.g. those who prefer not to run insecure, proprietary software from Adobe). If you mention on BBC forums that such software exists, your comment will be deleted and you will be warned that you are violating the BBC ToS. The BBC takes a firm “head in sand” approach to the futility of trying to secure stream access, at least for the present.

To my thinking, the BBC’s current digital/ondemand strategy is anti-competitive and hence at odds with its public service remit. To the extent my previous concerns were about the use of Flash, the BBC has answered them by (it seems) moving to HTML video interfaces for 3rd party device access. However, by requiring that those devices submit to BBC type approval, and enforcing this through strong cryptographic authentication, the BBC have increased my concerns about competition. The BBC is even in the position where it is a major shareholder in “YouView”, a company that makes a cross-UK-broadcaster IPTV software platform and consumer device, dragging the BBC even further into anti-competitive and anti-public-interest commercial interests.

The BBC tries to deflect these concerns by trumpeting that there are now “an astonishing 650 connected TV devices”. Those 650 devices come from just 21 vendors, however – the few blessed by the BBC. One of the criteria for receiving this blessing is that you be large enough to make it worth the BBC’s while. I know this because the BBC refused to certify my IPTV device, on the grounds that the market I would serve was not significant enough (i.e. initially just my family).

Basically, if you’re a net-neutrality geek, or an open-access geek, or a competitive-markets economics geek, then know that the BBC is not the cuddly, friendly public champion you might think it is. Rather, the BBC’s digital wing has worked, and continues to work, hard to ensure that the future of IPTV, at least in the UK, is a tightly-controlled arena, run by the BBC and a select few large players. The BBC are working hard to ensure you lose the right to record your TV. The BBC are working very hard for a future where, if you want to watch the BBC or any TV, you must choose a locked-down device, controlled by the BBC or organisations it approves of.

If you are such a geek, know that the BBC is not your friend.

Edits: Fixed some prepositions. Removed a redundant sentence. Changed “the” in “the major shareholder” to “a”. Changed “ondemand strategy” to “digital/ondemand strategy”. Added link to the 21 vendors.

Comments (13)

Cerf and Kahn on why you want to keep IP fragmentation

In “A Protocol for Packet Network Intercommunication”, Vint Cerf and Bob Kahn explain the basic, core design decisions in TCP/IP, which they created. They describe the end-to-end principle. What fascinates me most, though, is their explanation of why they incorporated fragmentation into IP:

We believe the long range growth and development of internetwork communication would be seriously inhibited by specifying how much larger than the minimum a packet size can be, for the following reasons.

  1. If a maximum permitted packet size is specified then it becomes impossible to completely isolate the internal packet size parameters of one network from the internal packet size parameters of all other networks.
  2. It would be very difficult to increase the maximum permitted packet size in response to new technology (e.g. large memory systems, higher data rate communication facilities, etc.) since this would require the agreement and then implementation by all participating networks.
  3. Associative addressing and packet encryption may require the size of a particular packet to expand during transit for incorporation of new information.

Fragmentation generally is undesirable if it can be avoided, as it has a performance cost: the fragmenting router may do so on a slow-path, for example, and re-assembly at the end-host may introduce delay. As a consequence, end hosts have for a long while generally performed path-MTU discovery (PMTUD) to find the right overall MTU to a destination, allowing them to generate IP packets of just the right size (if the upper-level protocol doesn’t support some kind of segmentation, as TCP does, the host may still have to generate IP fragments itself), set the “Don’t Fragment” bit on all packets, and so generally avoid intermediary fragmentation. Unfortunately, PMTUD relies on ICMP messages which are sent out-of-band, and as the internet became bigger, more and more less-than-clueful people became involved in the design and administration of the equipment needed to route IP packets. Routers started to drop over-size packets without signalling the error, and (even more commonly) firewalls started to stupidly filter out nearly all ICMP – including the important “Destination Unreachable: Fragmentation Needed” message that PMTUD depends on. As a consequence, end-host path-MTU discovery can be fragile. When it fails, the end result is a “path MTU blackhole”: packets get dropped for being too big at some router, while the ICMP messages sent back to the host get dropped (usually elsewhere), so the host never learns to reduce its packet sizes. Where with IP fragmentation communication may be slow, with PMTU blackholing it becomes impossible.
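
For the curious, here is a minimal sketch (Python, Linux-only) of what an end host doing PMTUD looks like in practice: it asks the kernel to set the “Don’t Fragment” bit and then reads back the kernel’s current path-MTU estimate for a connected socket. The socket option names are Linux-specific assumptions; numeric fallbacks are included in case the Python build doesn’t expose them.

```python
import socket

# Linux socket options for PMTUD; numeric values from <linux/in.h> are used
# as fallbacks if this Python build does not define the constants.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)   # always set DF, never fragment locally
IP_MTU = getattr(socket, "IP_MTU", 14)                  # read the kernel's cached path MTU

def probe_path_mtu(host, port=443):
    """Connect and ask the kernel for its current path-MTU estimate.

    If "Fragmentation Needed" ICMP messages are filtered somewhere on the
    path, the kernel may never learn the true MTU -- the blackhole case
    described above.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect((host, port))
    mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
    s.close()
    return mtu

if __name__ == "__main__":
    print(probe_path_mtu("example.org"))
```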

As a consequence of this, some upper-level application protocols actually implement their own blackhole detection, on top of any lower-layer PMTU/segmentation support. An example is EDNS0, which specifies that EDNS0 implementations must take path-MTU into account (above the transport layer!).
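
As an illustration (not part of the EDNS0 specification itself), here is a hedged sketch using the third-party dnspython library: the client advertises a conservative EDNS0 UDP payload size so that responses should fit within typical path MTUs and avoid IP fragmentation. The buffer size and the resolver address are illustrative assumptions.

```python
import dns.message
import dns.query

# Advertise a modest EDNS0 buffer size (1232 bytes here, an assumption
# chosen to stay well under a 1500-byte path MTU) so large answers get
# truncated and retried over TCP rather than arriving as IP fragments.
query = dns.message.make_query("example.org", "TXT", use_edns=0, payload=1232)
response = dns.query.udp(query, "192.0.2.53", timeout=3)  # placeholder resolver address
print(response.answer)
```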

So now the internet is crippled by an effective 1500 byte MTU. Though our equipment generally is capable of sending much larger datagrams, we have collectively failed to heed Cerf & Kahn’s wise words. The internet cannot use the handy tool of encapsulation to encrypt packets, or to reroute them to mobile users, without running into MTU problems. Possibly the worst aspect is that IPv6 removed in-network fragmentation support completely (only the sending host may fragment, via an extension header). While there’s a good argument that end-to-end packet resizing is better than intermediary fragmentation, IPv6 still relies on out-of-band signalling of over-size packets without addressing that mechanism’s fragility, which likely means IPv6 has cast the MTU mess in stone for the next generation of inter-networking.

Updated: Some clarifications. Added consequence of how PMTU breaks due to ICMP filtering. Added how ULPs now have to work around these transport layer failings. Added why fragmentation was removed from IPv6, and word-smithed the conclusion a bit.

Comments (1)

Why don’t we just reclaim unused IPv4 addresses?

With IANA recently allocating its last 2 /8s from the IPv4 free pool to APNIC, and about to announce automatic allocation of each of the last 5 /8s to the RIRs, the end of IPv4 is truly nigh. The RIRs’ pools will run out over the course of the next year, and that’ll be it – no more v4 addresses left.

However, why can’t we just reclaim unused and reserved addresses? Surely there’s a few legacy /8s assigned to organisations that could be clawed back? Couldn’t we check whether assignments are being used and reclaim ones that aren’t? What about the large, former-class-E space? Couldn’t one or more of those buy a good bit of time for IPv4?

This post examines these questions and largely concludes “no”. The issues with IPv4 depletion are fundamental to its (fixed) size: the IPv4 address space simply is too small to accommodate the growth of the current internet, and reclamation is unlikely to buy any useful time.

NAT could buy some amount of time, but even the NAT space seems like it may be too small for developed-world levels of technology to be deployed globally.

“Unused” address space

If you were to “ping” or otherwise probe all the assigned address space, you might find significant chunks of it are not reachable from the public, global internet. E.g. the US military has large assignments which are not advertised, as do other organisations. So why don’t we reclaim those assignments, let the organisations switch to NAT, and make them available?

Well, just because address-space is not globally reachable does not mean it is not being used for inter-networking. The criterion for IPv4 assignment has always been a need for globally unique address-space, not a need for global reachability. Many organisations have a need for private inter-networking with other organisations (financial organisations notably), which is hard to do in a scalable way with NAT. So such use is justified, and we can’t really reclaim it.

Former Class-E Space

What about the 16 /8s that comprise the former Class-E space? Even ignoring 255/8, which likely will never be usable, that’s still a fairly big chunk of address space – more than 5% of the total 32bit address space. Why not re-use that? Surely it would make a big difference?

Unfortunately there are major challenges to using this address space. It has long been classed as “Reserved – for future use”, and remains so. A lot of networking code that deals with forwarding or routing packets checks that addresses are not reserved. This means that if anyone were assigned addresses from this range they would find they could not communicate with much of the rest of the internet. Even if most of the internet upgraded their software to fix this, the poor user would still find some sites were unreachable. The sites with the old software might not ever notice there was a problem, and might not even have an economic incentive to upgrade (“you want me to risk causing problems for my network with an upgrade, to fix problems only you and a few others have?”)!

If we are forced to assign from the former-Class-E space, it will be a sign that the IPv6 rollout is perhaps in serious trouble.

The core of the problem: The size of the IPv4 address space

The nub of the problem is that IPv4 is simply too small.

IPv4 has 32bit addresses, giving 4.29G addresses, roughly divided into 256 pieces, called /8s, for top-level assignments. Of this space, 18 /8s are reserved in their entirety for special purposes and will never be useful for general assignment; 1 /8 is reserved for private networking; and 16 /8s are tied up in the former Class-E space and likely not useful, as above. There are other reservations across the address space, but in quantities small enough that we can ignore their impact here. That still leaves 221 /8s = 3.71G addresses – 86% of the total address space – available for global assignment (and the private 10/8 takes some pressure off that global space). This is equivalent to a 31.79-bit address space, i.e. 2^31.79 addresses.

Now, assignment rates have been running at above 10 /8s per year throughout 2010, and approached 15 /8s per year towards the end. This means any reclamation effort has to recover at least 15 /8s per year just to break even on 2010’s growth. That’s 5.9% of the total IPv4 address space, or 6.8% of the assignable address space. Is it feasible to reclaim that much address space? Even if there were low-hanging fruit to cover the first year of new demand, what about thereafter? Worse, demand for address space has been growing supra-linearly, particularly in Asia and Latin America. So it seems highly unlikely that any reclamation project can buy anything more than a year’s worth of time (and reclamation itself takes time).

Seen another way, there are approaching 7G people in the world – 6.9G in 2010. That gives 1 address for every 1.86 people (in 2010). Even if we reclaimed the old Class-E space, IPv4 still only provides 3.98G = 2^31.89 addresses, or 1 address for every 1.73 people.

Worse, we cannot use the address space with perfect efficiency. Because of the need for hierarchical assignment, some space will be wasted – effectively some bits of addresses are lost to overheads such as routing. Empirical data suggests an HD-ratio of 0.86 is the best achievable assignment density. This means that of the 3.98G assignable addresses with the Class-E reclaim, only 3.98G^0.86 = 2^(31.89 × 0.86) = 2^27.43 ≈ 181M will actually be usable as end-host addresses, giving 1 IPv4 address for every 38 people (in 2010)!
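
For anyone who wants to check the arithmetic, here is a small sketch of the calculation above; the constants (221 + 16 assignable /8s, the 0.86 HD-ratio, the 2010 population) are the figures quoted in this post.

```python
import math

# Figures quoted in the post: /8s assignable if the former Class-E space
# were also reclaimed, the empirical HD-ratio, and the 2010 population.
assignable = (221 + 16) * 2**24      # ~3.98G addresses
hd_ratio = 0.86
population_2010 = 6.9e9

bits = math.log2(assignable)         # ~31.89 "address bits"
usable = assignable ** hd_ratio      # == 2**(bits * hd_ratio), ~181M

print(f"usable end-host addresses: {usable / 1e6:.0f}M")
print(f"people per address (2010): {population_2010 / usable:.0f}")
```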

Yet, people in the developed world today surely use multiple IP addresses per person. They have personal computers at work and home, eBook readers, mobile phones, etc. All of which depend on numerous servers which require further IP addresses. The people in the developing world surely aspire to similar standards of technology. If we assume the density of IP/person is heavily skewed towards the reported 20% of the world population who manage to earn more than $10/day, then that means that today each IP address is being used by around 7 people. If the skew is heavily biased towards just 10% of the world population, the figure would be around 4 people per address. It’d be interesting to get precise figures for this.

Can NAT save us?

Many organisations are likely to try to buy time with NAT. But how much? NAT gives us only 16 extra bits. Assuming they were free bits, that would give us 2^(27.43 + 16) ≈ 11,850G addresses. On the face of it, it seems like this would do for quite a while. It’d allow 1 connection at a time between every pair of hosts, which is still sufficient to allow all processes to communicate with each other if a higher-level multiplexer protocol is agreed on (it’d be HTTP based, given current trends).

Unfortunately though, this won’t work with TCP as it is. When TCP closes a connection it goes into a TIME_WAIT state, during which it will not allow new connections on the same (src address, src port, dst address, dst port) 4-tuple. TCP remains in this state for 1 to 2 minutes on most implementations, which means you need at least 60 ports if you want to be able to open connections to the same host at an average of 1 per second (you probably don’t want to generally, but think of bursts). At one connection every 0.5s, you need 120 ports.

In practical terms, this means probably at least 8 bits of the port-space need to be reserved for TCP on each host, leaving 8 bits to extend the address space with. This gives 2^(27.43 + 8) ≈ 46G addresses = 6.7 addresses/2010-person (NB: addresses/person instead of the people/address used above) = 0.15 people/address.

This though assumes the HD-ratio assignment-density model applies only over the scale of the IP addresses, and that the borrowed port-space will be allocated with near perfect efficiency. If that were not the case, and the extra port space were instead also subject to the HD-ratio model, then the numbers become (2^(31.89 + 8))^0.86 = 2^((31.89 + 8) × 0.86) ≈ 21.3G addresses, i.e. 3 addresses/2010-person = 0.32 people/address.
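
Again as a sanity check, here is a short sketch of the two NAT scenarios above: treating the 8 borrowed port bits as perfectly allocated, versus applying the 0.86 HD-ratio across the combined address-plus-port space. All constants are the ones used in this post.

```python
import math

hd_ratio = 0.86
population_2010 = 6.9e9
addr_bits = math.log2((221 + 16) * 2**24)   # ~31.89 bits of assignable address space
port_bits = 8                               # bits left after reserving 8 bits for TCP

# Scenario 1: ports allocated with perfect efficiency (HD-ratio on addresses only).
perfect = 2 ** (addr_bits * hd_ratio + port_bits)          # ~46G
# Scenario 2: the HD-ratio applies across the combined address+port space.
hd_everywhere = 2 ** ((addr_bits + port_bits) * hd_ratio)  # ~21.3G

for name, n in (("ports perfectly allocated", perfect),
                ("HD-ratio over ports too", hd_everywhere)):
    print(f"{name}: {n / 1e9:.1f}G addresses, "
          f"{n / population_2010:.1f} addresses per 2010-person")
```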

Is that enough? Certainly for a time. It doesn’t seem a comfortable margin though, particularly as it may require some further changes to end-hosts to be truly workable.

Errata

This blog post almost certainly has mistakes, possibly significant. Those noted so far, and any other significant edits:

  • Missing “not”: those assigned old-class-E addresses would not be able to communicate with much of rest of internet
  • Added people/address numbers in last section, for ease of comparison with previous figures.

Comments (7)

Managing the Twitter Flood

Once you’ve been on twitter a while, you’ll be following enough people that you can no longer keep up with the incoming flood of tweets to your main timeline. The good tweets – be they interesting or from people you’re interested in – get lost in the cacophony. Managing the flood is quite a challenge as twitter provides very few tools. So what, if anything, can be done? So far I’ve only found 2 broad strategies:

  1. Use Retweets effectively
  2. Use 3rd party tools

Twitter provides just one built-in filter for general tweets from people you subscribe to – the ability to view “Retweets by others” (RtsBO). To use this effectively, use “native” API retweets, and encourage others to do the same, as it only works for these native retweets: the ones where the retweeter’s name appears beside the name of the tweeter, NOT the old-style inline retweets like “RT: @whatever <original tweet>” – they won’t do! Some 3rd party clients unfortunately still do not support native retweets; bug the author or find another client.

If you think a link from a tweet is interesting, and it’s still easily to hand and you don’t have any interesting or pithy comment to add, retweet it rather than tweeting the link yourself. If the tweet you’ve got is someone doing an old-style retweet of another tweet, see if you can natively retweet the original instead.

Native retweets basically provide a way to ‘vote’ on a tweet. Twitter uses this to sort native retweets in RtsBO according to age and popularity, which can be useful. Native retweets can also be filtered out of your main timeline, while still appearing in RtsBO. Annoyingly, there doesn’t appear to be a general way to do this: for everyone you follow you have to visit their profile and click the little retweet symbol to disable it. (If using the new UI you can do this from your own “following” list – click on each name to get the mini-profile.) This obviously gets tedious very fast.
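
If you’re comfortable scripting against the Twitter API, the per-user toggle can be automated. The sketch below is a rough illustration only, assuming the REST “friendships/update” endpoint (which accepts a retweets parameter), OAuth credentials, and the third-party requests and requests_oauthlib packages; the credentials and account name are placeholders.

```python
import requests
from requests_oauthlib import OAuth1

# Placeholder OAuth1 credentials -- substitute your own application keys.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

def hide_retweets_from(screen_name):
    """Keep following screen_name, but drop their retweets from the main timeline."""
    resp = requests.post(
        "https://api.twitter.com/1.1/friendships/update.json",
        params={"screen_name": screen_name, "retweets": "false"},
        auth=auth,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. hide_retweets_from("some_noisy_friend")
```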

There are also some 3rd party tools that can help. I use paper.li and the bit.ly Chrome extension.

Paper.li will go through all the tweets you’ve received each day, collate the links it decides are newsworthy, and present them in a newspaper-style format, with summaries for each. A little button is provided under each article for functionality related to the tweet the article was discovered via, such as retweeting. E.g. see the paper created for me each day.

Bit.ly is another useful tool. Browser plugins for it allow you to easily post links with a comment to your twitter, if/when you find something interesting on the web.

What useful tools & strategies have you found?

Comments (1)

Critical comparison of benefits of 64bit to 32bit

Summary

There is perhaps a common impression that 64bit software is generally better than 32bit software on PCs. This tends to be based on:

  1. 64bit obviously provides sufficient virtual address space to allow all RAM in any PC to be easily addressable – 32bit runs into problems with modern PCs.
  2. AMD64 provides twice as many GPRs as IA32.
  3. AMD64 provides a RIP-relative addressing mode, to ease implementation of position-independent (PI) code.

This impression is not quite correct though. There are costs to 64bit, particularly in increased memory usage, which can be of consequence in certain environments, such as hosting VMs. More surprisingly, even the common wisdom that 64bit improves performance has the odd exception.

All this means that there is value in running a 32bit userspace, even on 64bit kernels on 64bit hardware. Only those applications which require 64bit pointers, or have a demonstrable performance benefit from 64bit word sizes, need be run as 64bit. Doing so can allow significant amounts of memory to be saved (30%, 60%, sometimes more).
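
As a rough, hedged illustration of the word-size cost (using Python object sizes as a stand-in for the native benchmarks in the full post), the following prints the pointer width and the overhead of a few basic objects; exact figures vary between builds, but a 64bit interpreter reports roughly double the pointer-sized overheads of a 32bit one.

```python
import struct
import sys

# Pointer width of this interpreter build: 4 bytes on 32bit, 8 on 64bit.
print(f"pointer size: {struct.calcsize('P')} bytes")

# Per-object overheads; these grow with the pointer size, illustrating
# why a 64bit userspace uses noticeably more memory for the same data.
print(f"small int:  {sys.getsizeof(1)} bytes")
print(f"empty list: {sys.getsizeof([])} bytes")
print(f"empty dict: {sys.getsizeof({})} bytes")
```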

Read on for justification and benchmark results…

Comments (2)

BBC Trust On-Demand Syndication Consultation

The below are my answers to the BBC Trust’s Consultation on On-Demand Syndication, which closes very soon, on the 21st of July. Note that the link to the PDF of the questions seems to be broken; the correct link is here. The full text leading in to the questions can be quite long, in which case I have elided all but the closing part of the question.

  • Q1. … The BBC Executive would therefore like the Trust’s on-demand Syndication Policy amended to make clear that BBC programmes should always be made available in the context of a BBC package (such as a BBC TV channel (BBC1 for example) or via the BBC iPlayer on a PC, TV or mobile phone) in order to deliver the public purposes more effectively. What are your views on this proposal?

    While I sympathise with the desire for editorial integrity, if the choice is to be a binary choice between:

    a) The ability of device manufacturers (commercial or otherwise; hardware or software) to innovate

    or

    b) The ability of the BBC to tightly control devices, and to deny 3rd party access to content

    then I would suggest the public interest is served far, far more by the former than the latter. I would suggest that the BBC Trust ensure that any guidelines seek to protect 3rd party innovation above all else.

  • Q2. Do you agree with the BBC Executive that the Trust should place more emphasis on value for money in its syndication policy?

    Yes, I do.

    A very important aspect to value for money is 3rd party access. Such access allows multiple 3rd parties to innovate and make new products available to the public, independently of the BBC and *without* the BBC having to spend money.

    As an example, 3rd party iPlayer applications have allowed people to construct ‘media-centres’ for themselves, building on commodity PCs and Linux (XBMC); have allowed a wide range of Android based phones to access iPlayer (unlike the official BBC Android iPlayer, which works only on a limited range of the newest phones); and have allowed general Free Software users to access iPlayer. All of these were produced without cost to the BBC. In all but 1 case, they enabled access on platforms which the BBC has never considered worth supporting, and in the final case it enabled broader access long before the BBC produced its limited access.

    It is clear therefore that 3rd party innovation allows greater access to BBC content by UK users who generally are entitled to access that content. This furthers the BBC’s chartered goal of providing as wide access as possible, while doing so at no extra cost to the BBC.

    Unfortunately, to date, the BBC have taken the view that such 3rd party access must be restricted as much as possible. The BBC has regularly taken technical steps to shut off access to any 3rd party apps. This is most unfortunate and, given the above, on the face of it somewhat at odds with the BBC’s remit.

  • Q3. The BBC Executive agree that wide syndication is good for audiences, but they are concerned about the cost of developing different versions of packages like the iPlayer for growing numbers of platforms and devices. To make the iPlayer or other packages of content widely available across a range of platforms in a more cost-effective way they propose to develop ‘standard’ software (notably for the iPlayer) that can work on many devices. Manufacturers can build this into their products when designing them. The BBC would publish details of how decisions on which standard software products to develop would be taken.

    What do you think about this proposal?

    This proposal is ridiculous, provably so: there are already a number of independent iPlayer applications available for a number of platforms which the BBC does not support, at no cost to the BBC. Rather than investing so much energy into denying access to these 3rd party applications, the BBC could save its resources AND promote wider platform access simply by desisting from trying to block such applications. The BBC would be much better off publishing the technical details for supported access to its streams, and communicating with 3rd party device makers (commercial or otherwise; software or hardware). This step alone would ensure that iPlayer access was effectively universal (at least, relative to what the BBC has the resources to achieve), at very little cost to the BBC.

  • Q4. Do you think that the BBC should, in principle, be prepared to invest in developing special non-standard technology for other devices at the BBC’s expense?

    False dichotomy.

    The BBC should not at all be aiming to be the developer of end-user devices, beyond whatever prototypes are required to verify its delivery infrastructure. It is quite wasteful of resources that the BBC has taken on this role for itself – a role it can never properly fulfil. Instead, the BBC should, as is its long-standing practice in broadcast TV, contribute to industry standards bodies (for the case of internet technologies, these would be the IETF and the W3, nota bene) and publish the technical information needed for devices to access its content using such standards. Thus, the responsibility for the development and marketing of devices ought to be borne by the free market. This arrangement has proven to be the most economical way of developing products for public use.

    It is disturbing that the BBC wishes, by use of the technology-shift from broadcast to online TV, to acquire complete control over TV devices. This is quite clearly antithetical to a healthy free market. That we must then talk about which platforms the BBC should or should not support – rather than allowing a free market to decide – is the result of this unhealthy control.

    In short, let the free market invest to develop that ‘non-standard’ technology. Let the BBC publish, not decisions about which platforms it deigns to support, but technical specifications that allow such a free market to operate. This is how the market for TV devices has worked for many a decade – the BBC should not be allowed to change the fundamental economics simply because the technology details change.

  • Q5. Is audience reach the best criterion for setting priorities? If so, what number of potential users should be taken as the threshold?

    Yes it is. The number of users is irrelevant – the free market should be allowed to decide, as elaborated in my answer to Q4.

  • Q6. Should the BBC also publish its criteria for prioritising any non-standard software development?

    Yes, the BBC should endeavour to operate transparently, in all things.

  • Q7. An alternative is that, provided the BBC has the resources available, a manufacturer would have to pay the BBC’s costs for the development of a customised iPlayer that worked for its platform. What do you think about this?

    See my answer to Q4. Obviously, my answer would be ‘no’. Rather, the BBC should make content available in industry standard ways (thus, IETF or W3 for internet technologies). Thus the BBC invests its resources once, so allowing many others to invest theirs (likely without even having to contact the BBC) to develop software for whatever platform they wish. The BBC should seek to support a free market in devices. The public interest is served by the innovation and breadth of products a free market brings – it is not served by having the BBC centrally plan the device market.

  • Q8. A further alternative is that manufacturers should be free to develop their own versions of the iPlayer or other technology (known as ‘self-build’) to show BBC content. The BBC Executive do not think this should be allowed because they do not think the BBC would be able to ensure editorial standards or the high quality that viewers expect from the BBC.

    What do you think about this?

    I agree there should be a free market in devices (commercial or otherwise; software and/or hardware). I believe the BBC’s role should be to support such a market by investing in industry-standard (i.e. non-discriminatory, royalty-free – W3 and IETF for internet) delivery interfaces. As an example, the BBC at present has an HTML video / CSS based iPlayer – a fully industry-standard version, which would work on a wide variety of devices which the BBC does not support at present, such as generic internet-enabled digital TV sets (which lack Flash, and lack the power for Flash) available from a variety of Asian manufacturers. The BBC could trivially make this interface available to all devices, but it chooses to restrict it to Apple iPads, Sony PS3s and a small handful of other types of devices.

    If the Trust were to decide the BBC ought not to have such tight-fisted control of device access, then the BBC could tomorrow start the process of enabling much wider access, using standard delivery technology the BBC already has developed.

Comments (1)
