High Frequency Trading Webinar
I am hosting a webinar on reducing latency in high frequency trading environments this coming Wednesday (which means, if you know me, that I am working furiously on PowerPoint slides, although I have been using Keynote more lately...). HFT is pretty interesting to me as it is one of the markets I spend a lot of time focusing on at work, and with folks from Solace Systems and Intel joining we will be chatting about how to reduce latency, specifically on the back-end connections between feed handlers, trade logic, and order execution systems. These are usually TCP based, and there is a lot of room for improvement over the base stacks, generic NICs, and messaging systems people may use.
We'll do a follow-up webinar later focusing on the market data feed handlers and scalable multicast as that is very important between the exchange and the feed handlers. If there are any other topics we should think about, let me know...
If you are interested in joining a healthy chat, please click here to register. If you can't make the live webcast, no worries - we will be archiving it and enabling the VOD to be watched in arrears.
Homebrew Render-Farm. Frankly, ‘just cause’
Ok, putting this in context- I wanted some new network icons. Somehow all the ones I used in the past were made by an art department I strangely do not have access to anymore, and I really don't want to have to pay an agency to make them for me. I could probably outsource somewhere, but don't want to have to explain what I want, so sometimes it is just easier to do it yourself. (and learn a few new things while you are at it.) Plus it was a good way to spend an evening...
So off I went, using a 3D design program known as Rhinoceros, which is an insanely cool name. (I wish I could name my products things like that... a new switch is the Raven 98000, and over here we have the Magpie 5600 connected to the Corpus Corax 11000. Ok, you get the drift: cooler names should be used in networking products, not just acronyms that need a secret-decoder ring and SKUs limited to 17 characters... (and who picked 17!)) Ok, off soapbox, don't expect this to change anywhere anytime soon.
So we have Rhino running, doing a bit of drawing, getting the shape right and such. Then you couple that with a ray-trace rendering app, in this case VRay for Rhino. You get a lot of choices about textures, lighting, etc., frankly too many for a neophyte, but these are pretty powerful programs. This is where it gets fun though - there is an option in VRay for 'distributed rendering'. Nerd alerts went off throughout my office as I madly scrambled around loading a VM with the VRay distributed rendering client onto every machine I could get my hands on. Old Mac laptops, an 8-Core MacPro, a 4-Core MacPro, even a 2-Core MacMini fell victim to loading this intimidating piece of software. I then realized that I had some network issues, so I quickly patched through a few more Cat6 ports from the office to the wiring closet, locked the ports down at 1000-FULL, and moved my IP Phone to a PoE port while I was at it...
Probably the coolest part was watching the MacPro spawn multiple execution threads, which you could see rendering in real time. Render times were cut down by about 70% compared with using just one machine.

Here you can see the active raytrace threads modeling different surface segments. I was playing with lines on the front to show hot-aisle cold-aisle airflow. FAIL.
Lessons Learned
It wasn't all roses. A few things I learned and a few things I think the SW developers should focus on in future versions.
1) VRay and Rhinoceros both do not have native Mac versions yet. This is frustrating, but you can work around it with VMware Fusion 3. They both worked pretty well in a Windows XP VM. I am still not up to Windows 7, being happy to have skipped Vista.
2) Since you are running it in a VM, note this: on the station with Rhinoceros be sure to tweak your settings to give the VM as many CPU cores as you can. I set it up for 4 cores and 3-4GB of DRAM on the VM. I need more RAM for this machine; it could easily be happy with 16GB on the VM. I am looking forward to the native version.
3) On the distributed render nodes you don't need a whole lot of memory as the work seems mostly CPU intensive, at least for the way I was using it. I set mine at 512MB of DRAM and let the other machines continue their happy servitude streaming iTunes, serving photos, keeping my Drobo happy, and generally performing well. Even the Tweetdeck machine. On these and the master you will have to move the Network Interface Card settings from NAT (default) to Bridged. You will probably have to at least go to the console and do an 'ipconfig /release' followed by an 'ipconfig /renew' to ensure the adapter comes up and you are on the same LAN segment as the physical hosts. I was not able to get it working with NAT. Also be sure to let the sockets through any host-stack firewalls; McAfee goofed me for a bit on this.
4) Room for improvement - a native MacOS client for Rhinoceros and for VRay would really help. Beyond that, the way the developers have you add distributed render nodes is archaic.
a) On the nodes themselves it spawns a text window and doesn't provide any diagnostics, just a scrolling log when it gets a job.
b) VRay requires you to hard-code the IP address of each client into the master machine. Don't you think this would work much better integrated with Bonjour or something that enables auto-discovery of potential render nodes?
c) Even smarter would be to have the render nodes run as a reduced-priority process in the taskbar. Then each machine in a studio could be helping any rendering via processor reclamation when not being dominated by the user.
d) I like the real-time display of the ray tracing going on, but put a report in there showing what system did what percentage of the work. This way I would know which ones to upgrade, find the bottlenecks, etc. A little diags would go a long way here.
e) Also, when showing the list of the servers, check availability and let me know BEFORE I start a render job... novel.
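The pre-flight availability check wished for in (e) could look something like the sketch below. This is not how VRay works; it simply tries a TCP connection to each configured node before a job starts. The function name is made up, and the node addresses and port you pass in are whatever your render nodes actually listen on (nothing here is taken from the VRay documentation).

```python
import socket


def check_nodes(nodes, timeout=1.0):
    """Return {(host, port): True/False} for whether a TCP connect succeeds."""
    status = {}
    for host, port in nodes:
        try:
            # create_connection raises OSError on refusal, timeout, or no route
            with socket.create_connection((host, port), timeout=timeout):
                status[(host, port)] = True
        except OSError:
            status[(host, port)] = False
    return status
```

Calling `check_nodes([("192.168.1.21", 9000)])` (a placeholder address and port) before kicking off a render would flag any node that is asleep, firewalled, or still stuck on NAT.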

Here you can see the finished object, ready for export to a PNG to plague PPT users everywhere... Not sure about my air inlets though...
In the end, it was fun, I will continue to use them, but there is some room to improve that would be really useful for someone like me and I imagine the IT staff at any design studio. Here's some shots of the finished products...
HP takes out 3Com- what is the next consolidation step?

HP-3Com - ushering a wave of tech consolidation?
People have been asking me for a while what the next 'shot' would be in the tech titan border-clash. Cisco entered the server market with UCS, and everyone was wondering what the response would be.
I didn't think 3Com would be taken off the market this quickly. I figured everyone would wait a year or so to see if 3Com could be successful in breaking back into the global marketplace, outside of China, with their current and new product lines. HP, taking some risk in that department, made an aggressive move knowing that HP has the global footprint and 3Com has strong roots in China that HP can leverage. I have to say it's an impressive bit of M&A.
But what is next?
The real question is how will others respond to this move. What will IBM do? What will Dell do? I have postulated for some time that we are in a phase of consolidation where the tech-titans, in order to have competitive portfolios, will acquire or build these capabilities.
Neither IBM nor Dell has a data center networking presence; both have partnerships with Juniper and with Brocade. A lot of people were betting on an HP-Brocade acquisition, as evidenced by the share price impact on Brocade today. And who can count out Oracle/Sun? They also do not have a networking footprint.
I think the major players will wait for a quarter or so, through the holiday season - evaluating their options and also seeing how HP rolls up the 3Com acquisition. If it creates competitive advantage for HP then IBM, Dell, and Oracle will follow in HP's footsteps. I can't say who will acquire whom, but there is only a small universe of potential acquisition targets as well.
Does this spell the end of independent networking?
dg
ISR G2 – what I wish it was…

Cisco ISR G2 - the best a branch can get?
Cisco announced the new ISR line recently, a 3x performance improvement for the high-end moving up to ~150Mbps. But the question I have that has been lingering with me for a while is, "Why not use an x86 processor and a decent hypervisor with that?"
Crazy, I know, right? But with the current set of Intel Nehalem cores I can get several Gb/s of sustained throughput at varying packet sizes. So it's not like I have a data plane performance issue. You can even schedule the cores to provide additional protection for mission-critical control plane processes.
Regrettably, to me, this was not the direction taken with this line. Why do I think it would be cool? For several main reasons:
One thing you could do is run several VMs for integrating neat things like Call Managers and Network Analysis. Who needs a separate co-processor when you can cost-effectively get a CPU with more than enough horsepower and DRAM to run a variety of concurrent branch office workloads?
Control Plane Performance would be through the roof - so you can actually support the market that fiber to the home is creating for Gigabit Ethernet handoffs to the home and business. This is rapidly expanding and becoming a more and more popular handoff in dense urban environments.
Killer integration - run branch office apps, run your own apps, run the routing protocol stacks, and have enough process and VM separation to guarantee performance and stability. You also wouldn't have to do special versions of IP-PBX call management for the router - you could run the full-blown image right on it. Want WAN Optimization? Load a VM. Same with Network Analysis, etc. At some point performance will peter out, but not too soon.
Sadly for me, this feels like an opportunity lost. But who knows - maybe they will pull a rabbit out of the hat with something like this someday.
dg
America’s Data Center Top 40 for QoS Implementations

a rainbow of fruit flavors for the network
Do you really use network quality of service within the glass house data center? This is a question I have been pondering pretty much all day- I get that on the costly WAN links we almost all use it. I also completely acknowledge that most engineers plan to use some of the QoS features within the data center, but what do we really use? Which features are necessary and which are just... well... extraneous?
Here's my thought- I think there are some QoS 'hit songs' implemented in the data center, and some that just are not that appropriate for this iMix.
Bad Boy Bad Boy, whatcha gonna queue....
No matter how many security products we surround ourselves with we all have the 'time out queue' for traffic that just isn't, well, good. It's for those pesky control plane DoS attacks, the misbehaving broadcast storm NIC, or that one application that just won't shut up.
Voices Carry
For some reason I have the 80's 'Til Tuesday song of the same name in my head - but most engineers I have talked with seem to set up, by default, a strict priority queue for voice traffic, then never, ever touch it again because it just works.
Marky Mark and the Funky Bunch
We all use markers, especially at the edge where we can have maximum visibility to what comes from a single host, and mark it up appropriately so that we don't have to continuously re-inspect and re-apply policy everywhere. Tag at the edge, queue in the core, shape on the WAN or point of massive congestion/cost.
(Don't) Drop it like it's Hot
No matter what religious SCSI encapsulation you love - Fibre Channel, Fibre Channel over Ethernet, iSCSI, or even file-based alternatives such as NFS or CIFS - storage just performs better when it is not dropped. As much as I love technologies like Aspera for moving large files from A-to-B, until it is integrated in my host stack (which would monumentally rock for copying my iTunes library to S3 btw, hint...) the world will be dependent on TCP-guaranteed or PFC/BCN-guaranteed storage delivery for a while, so we need a "lossless don't-drop-me-please" queue.
Shape ya Tailfeather
In a well designed data center, congestion and contention for bandwidth happen at the cost-center link, i.e. the edge router. This is where I apply large buffers, shapers, and policers to ensure that traffic can en-queue long enough to get onto the link. Here we can also adhere to Peter Lothberg's long-time rule of providing enough buffer to handle a round-trip response around the world, or ~300ms. Shaping the previously marked-up traffic onto the link while trying to get all the right traffic on the link is the job of the core routers, usually not fixed switches.
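As a rough worked example of that rule of thumb, the buffer needed to hold one round trip of traffic is just link rate times round-trip time. The sketch below takes the ~300ms figure from above; the function name and the link speeds are illustrative assumptions, not from any particular design.

```python
def buffer_bytes(link_bps, rtt_ms=300):
    """Bytes of buffer needed to hold rtt_ms worth of traffic at link_bps."""
    # bits in flight = rate * time; divide by 8 for bytes and 1000 for ms -> s
    return link_bps * rtt_ms // 8000


for name, bps in [("1Gb", 10**9), ("10Gb", 10**10)]:
    print(f"{name} link: {buffer_bytes(bps) / 1e6:.1f} MB of buffer")
# 1Gb link: 37.5 MB of buffer
# 10Gb link: 375.0 MB of buffer
```

Hundreds of megabytes per 10Gb link gives a feel for why this kind of buffering tends to sit on routers at the WAN edge rather than on fixed switches.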
That's it for now, any songs I should also think about including into the Data Center QoS Playlist?
dg
Things I Would Like to Change Part 1/N

Why someone would do this to a poor Ethernet cable is beyond me...
For those of you who know me you know I am a bit opinionated. I don't profess to be right every time, but I will pretty much always have an opinion and will always argue it to the best of my ability.
I was sitting down today and I kept coming up with a bunch of things I wanted to change. Whether in the networking world, the data center space, about virtualization platforms, or nebulous clouds, or even about tax-structures, legislation, and other things reasonably adjacent to the tech sector.
So in that vein I thought I would put this series together- Things I Would Like to Change. Feel free to suggest your own things that you'd like to change, I really don't mind - may make a post out of them too!
For the first thing I will err on the side of something techie-
EtherChannel. I would love to change the EtherChannel hashing function and do something far more intelligent, automated, and better performing. Most switches today use a simple hash based on L2, L3, or L3 plus L4 port info to determine which link to send a given traffic flow down. This link is chosen based on a hash algorithm and then stays constant unless there is a link failure in which case the traffic is remapped.
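To make the behavior concrete, here is a minimal sketch of flow-based link selection. It is not any vendor's actual algorithm: CRC32 stands in for the hardware hash, and the function name, the 3-bit bucket width, and the addresses are all assumptions for illustration.

```python
import zlib


def select_link(src_ip, dst_ip, src_port, dst_port, num_links, hash_bits=3):
    """Pick a link index for a flow from its L3 + L4 fields."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # Keep only the low hash_bits bits, giving 2**hash_bits buckets
    bucket = zlib.crc32(key) & ((1 << hash_bits) - 1)
    # Buckets are spread across the links that are currently up
    return bucket % num_links


flow = ("10.0.0.5", "10.0.1.9", 49152, 443)
print(select_link(*flow, num_links=4))  # same flow always lands on the same link
print(select_link(*flow, num_links=3))  # after a link failure the flow may be remapped
```

Note that nothing in the selection looks at how busy a link is, which is the root of the complaint here.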
Why is this not good enough? It's actually okay for some traffic. But when host interconnect speeds and uplink speeds are identical we start running into problems where a host can generate a flow that can consume an entire uplink, and then you deal with contention and buffering and all sorts of fun stuff. Today, we are seeing a convergence of host speeds and uplink speeds at 10Gb, so this problem will rear its ugly head again.
What would be better? Well, several options-
1) Wider hash. A wider hash, say 32-bit rather than 3-bit, means I can have a better granularity of traffic apportionment when I have a non base-2 number of links. It also means that link failure cases get re-apportioned much more fairly.
2) Wider hash with counters and dynamic bucket re-mapping. Came up with this idea about five years ago. Short version is that if any 'bucket' gets used at around the link speed you move other traffic to non-congested links. This allows large flows to go through unhindered and not congest multiple smaller flows. May cause some flapping if timers are tuned too tightly.
3) Out-of-Order bits. Create an OOO-bit that can be set with an ACL. Then for traffic that is not impacted by out of order delivery you can set the OOO bit with an ACL match and spray-and-pray that traffic across multiple links in an EtherChannel. This would work for video flows that are protected by FEC, DNS lookups, and some of the more elegant bulk-file movement protocols. This would not work for market data feeds where receipt is order-dependent and packet order is generally not encoded into the payload.
4) Out-of-Order with timing. Basically you run a TDR/OTDR test and determine the latency of the physical media for each link in an EtherChannel bundle. Then maintain a small re-assembly buffer on the receive side that is allocated based on the maximum delta in latency between the fastest and slowest links between two given nodes. While more complex, this allows packet-striping, ensuring almost perfect efficiency in link utilization. If the distance of the physical media is too great and the latency spread too wide, then we would be able to identify up front that the links selected were incapable of this type of advanced operation.
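Two of the options above come down to simple arithmetic, sketched here. The first function shows why hash width matters for option 1, assuming buckets are dealt out round-robin to links (an assumption; real switches may map them differently). The second estimates the re-assembly buffer for option 4 as link rate times the latency delta, per fast link; the latencies in the example are made up rather than measured, and both function names are hypothetical.

```python
def link_shares(hash_bits, num_links):
    """Fraction of hash buckets landing on each link (round-robin mapping)."""
    buckets = 1 << hash_bits
    base, extra = divmod(buckets, num_links)
    # The first `extra` links each pick up one leftover bucket
    return [(base + (1 if i < extra else 0)) / buckets for i in range(num_links)]


def reorder_buffer_bytes(link_bps, latencies_ns):
    """Bytes that can arrive on the fastest link before the slowest catches up."""
    delta_ns = max(latencies_ns) - min(latencies_ns)
    bits_times_1e9 = link_bps * delta_ns  # still scaled by 1e9 since delta is in ns
    return -(-bits_times_1e9 // (8 * 10**9))  # ceiling division to whole bytes


print(link_shares(3, 3))  # [0.375, 0.375, 0.25]
shares = link_shares(32, 3)
print(max(shares) - min(shares))  # well under a billionth
print(reorder_buffer_bytes(10**10, [1000, 1500]))  # 625
```

With 3 bits and three links, one link gets a quarter of the buckets while the others get three-eighths each; at 32 bits the imbalance is negligible. And a 500ns latency spread at 10Gb works out to only a few hundred bytes of buffer per link, so the "small re-assembly buffer" description holds as long as the links are of similar length.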
How else would you address this EtherChannel load balancing issue?
btw- my first week as a product manager in the switching group this topic came up via a large customer at an executive briefing- 2001. I had it again at my last executive briefing at Cisco- 2009.
dg
What should I do next?
In extremely non-scientific survey form, here is a poll where you can tell me what you think I should do next. While I have my own idea or two (or three... lots of possibilities), it would be interesting to see what everyone thinks would be good and why...
Update - the gang over at NetworkWorld put a poll up on their site- I think I like their choices better
I need to put this text in a post for Technorati and don't want to make it too obvious - g3rdc2a5uh
dg


