10 interesting stories served every morning and every evening.
There’s a very specific reputation I want to have on a team: “Nat helps me solve my problems. Nat gets things I care about done.”
I keep a bullet journal. I’m not one of those people you see on Pinterest with the fancy spreads — I mostly just use black ink, the standard setup, and the occasional custom collection.
Every time I join a new team, I go to the next fresh page, and on top of that page I write: “WTF - [Team Name].” Then I make a note every time I run into something that makes me go “wtf,” and a task every time I come up with something I want to change.
For two weeks, that’s all I do. I just write it down. I don’t tell the team everything that I think they’re doing wrong. I don’t show up at retro with all the stuff I think they need to change. I just watch, and listen, and I write down everything that seems deeply weird.
This is a trick I picked up from a team lead a few years ago, who learned it from a previous lead of his in turn. It’s one of my most powerful techniques for making changes on a team, and managing myself while I do it. So I’m going to walk you through how I use that list, and how it helps me to build a reputation as someone who’s really effective at getting stuff done, and avoid being someone who’s complaining all the time.
There’s always stuff that makes me go “wtf” on a new team. The team talks for an hour in retro about a serious problem, and then leaves without making any action items. The tests don’t run locally and no one seems to notice. Big chunks of the build board are always red. Only one person can do some critical, time-sensitive thing. The team is spending a bunch of time on some feature, but when I ask around, no one seems to know why it’s important or how it’ll help a customer.
Once I’ve got a nice big list, I start crossing things off. There are four reasons at this point that I might cross off something I’ve put on that list:
1. There’s actually a good reason for it
2. The team is already working on a fix
3. The team doesn’t care about it
4. It’s really easy to fix
If the tests don’t run locally, for instance, that might be a known issue that there’s an ongoing effort to address. The team might do all of their work on virtual machines, and have a simple chat command that provisions those machines for them. Or they might have a pretty good continuous integration system and good habits around making small changes, so not being able to run the tests locally isn’t stopping them from deploying multiple times a day.
Sometimes, it’ll turn out that there’s a really simple fix for some of the things I’ve identified. Maybe there’s some documentation I can write, once I know where it is, or maybe there’s an easy change once I find the right scripts. That’s not always immediately obvious when I first see a problem. When I do see an easy fix, though, I’ll just go ahead and make it.
After a few weeks, though, I’ll still have a bunch of weird, unresolved issues on that list. At this point I’ll start talking about it with other people on the team, the team lead, and my manager.
I’ll ask why things on the list are that way, and how they got to be that way. I’m trying to establish credibility as someone who’s genuinely curious and empathetic, who’s patient, and who respects the expertise of my coworkers. That’s the reputation that’s going to let me make changes later.
Generally, I’ll find out that the problems I’ve noticed are still around for one of a few reasons:
1. The team has gotten used to it
2. The problem is relatively new, and the old problem it replaced was much worse
3. They don’t know how to fix the problem
4. They’ve tried to fix the problem before and failed
On a lot of teams, when I ask questions about things that turn out to fall into the first few categories, the person I ask will just fix them immediately. Or they’ll help me figure out how to fix them. If it’s a technical problem, that means writing a story or a ticket together, and then we’ll work on it. If it’s more of a process or social problem, it means bringing it up at retro and talking about it with the whole team.
At this point I’m looking for one or two problems that have been bugging one of my new teammates for a while, and that have relatively simple solutions. I’m looking for something I can put on the retro board and know I won’t be the only person who’s bothered by that problem. Then, during the team conversation about the problem, I’ll identify something that teammate suggests as an action item that we could try immediately. That way the team starts to see me as someone who helps them solve their problems.
The feeling that I want to create, the association I want people to have with me, is, “Oh, Nat joined the team and little things started to get better, almost immediately. It feels like we’re starting to make some progress. And it’s not like they showed up and started telling me what to do, either. They’re really listening to me, they’re helping me explain myself to the rest of the team.”
Pretty soon, I’ll start to get into the really sticky issues. The problems the team knows about but is afraid of dealing with. The things that aren’t “that bad,” but that no one wants to talk about. Maybe they’re missing the technical skills to deal with the problem. Maybe there’s a knotty people problem at the center of it.
At this point I’m going to be talking to my manager. I’m going to bring them that list I’ve been working on, and I’m going to say something like, “Now that I’ve been on the team for a few weeks, this is what I’m seeing. We’re making progress on some of it, but some of these seem like they’re going to take longer. I wanted to get your thoughts before I try to do anything about them. Is there something I’m missing? Is there a particular area you’d like me to focus on?”
The reaction I’m looking for from my manager, at this point, is something like, “Wow. This is really validating. I’ve been concerned about these things but the team doesn’t seem really bothered by them, so I didn’t want to push too hard. I’m glad you’re bringing this up.”
Then we can have a conversation about what their concerns and problems are. I can do some reflective listening to help them organize their thoughts, and I can talk about what I’ve seen work well, or not, in the past. They’ll start to see me as someone with good judgement, and someone they can come to for help solving their harder problems.
There’s a very specific reputation I want to have on a team: “Nat helps solve my problems. Nat gets things I care about done.” That’s the reputation that’s going to get me the results I want in next year’s performance review. That’s the reputation that’s going to get me a referral a few years from now.
Before I started keeping this kind of list, I brought up every problem I saw immediately, as soon as I noticed it. The reputation I got was, “Nat’s always complaining about things. Nat thinks we’re never doing things right.” People stopped listening to me. I was personally frustrated, and professionally ineffective.
There’s no faster way to totally sink my credibility as a new team member than by making a huge fuss over something that’s not a problem, or that the team doesn’t see as a problem, or that there’s already an effort to fix, or that has a really simple fix I just didn’t see at first. There are always so many problems on a team, so many things that could be better, that I’m only ever going to solve a handful of them. Working on problems in the order I noticed them is rarely the most effective order. So the WTF Notebook gives me a place to park the impulse to fix it now, damn it! until I have more context for deciding what to work on first.
Instead, for two weeks, I just write things down.
Periodic reminder that Code for America is usually hiring, and they pair and write tests. Until the end of this month they have a Software Engineer role up for a team that works San Francisco hours. If you’re looking for a “show up, write code, go home” experience, and want to help Americans access food stamps and other safety net services, this is a team that can deliver it — especially if you have some experience with Rails or Spring.
If, on the other hand, you’re interested in gnarly cloud infrastructure and software problems for the Department of Defense, check out Rise8. If you’ve heard about Kessel Run, or Pivotal’s work with the Air Force generally, Rise8 is where many of those folks ended up. They also practice design thinking, test-driven development, and continuous deployment, but they’re teaching them to folks who have never used these practices before, and pairing with military service people. Their job listings mention experience at Pivotal Labs by name.
If you’ve got an active job search running and you’re struggling to keep track of it all, check out Davis Frank’s guide to Job Search Journaling with Obsidian.
I’ve mentioned Seeing Like a State before, but I reread it while we were on the road, and, man, seriously, if there’s one book I wish everyone I talk to had read, it’s this one. Nothing explains systems thinking in action better. Nothing has more useful anecdotes for illustrating how large organizations work, and why they work the way they do.
The other book I’ve read by James C. Scott is Against the Grain, and if you’re at all interested in the history of the earliest states and the initial development of human civilization, that book will absolutely blow your mind.
Ed Zitron’s piece recently about How Our Need For Attention Online Drives Us Crazy articulated a bunch of half-formed thoughts I’ve been chewing on and trying to figure out how to write about. It doesn’t mention Slack explicitly, but I’ve seen Slack drive a lot of these same processes at work.
...
Read the original on www.simplermachines.com »
Halo 2 in HD: Pushing the Original Xbox to the Limit
This blog post goes over all of the work I’ve done to add HD resolution support to the Original Xbox version of Halo 2. From patching the game to modifying the hardware of the Xbox console to writing custom tools for performance benchmarking, my goal with this project was to push the limits of both and see how far I could go. I’ve tried to keep this blog post as short as I could and only include the most technically interesting parts but even then it ended up quite long.
A long-time friend who goes by the handle “doom” has spent the past few years reverse engineering and researching the hardware and software of the original Xbox. His end goal was to learn more about PC hardware and see how far he could push the console. Some of his work includes swapping out the stock Pentium 3 CPU running at 733 MHz for a variant of the Pentium 3 running at 1.4 GHz using a custom-made CPU interposer board, and even overclocking it upwards of ~2 GHz.
Doom also wrote custom patches for the Xbox kernel to fix up timing calculations on the fly so games ran properly with the faster CPU. Combined with a few other hardware upgrades such as additional RAM and an SSD, doom started to refer to these machines as “god boxes.” The god boxes also ran a custom kernel (or BIOS image) that doom made to support all of the hardware modifications and push the hardware and software as far as they could go. One of his demos showed the opening sequence of Half-Life 2, which is notorious for abysmally slow loading times and poor performance on the Xbox, running at a solid 30 FPS and loading in a matter of seconds. But there were still additional benefits to be had. Doom wanted someone to create a proper HD resolution patch for a popular game and really utilize the hardware upgrades he had performed.
One night while talking over Discord doom asked if I would be interested in developing an HD patch for Halo 2 and in exchange he would provide me with a god box to develop it on. Halo 2 has a max supported video resolution of 480p and patching in support for 720p (and possibly 1080i) would get a lot of attention to demonstrate the benefits of all this work. We both knew that many of the community “HD” or “720p” game patches were not actually functioning correctly and that patching in HD resolution support for a game was more work than just searching for 640/480 in a disassembler and changing the resolution. These patches require a deep understanding of 3D graphics, DirectX APIs, and a lot of specific knowledge about the game and Xbox console. Having spent years reverse engineering the Xbox and Halo 2’s game engine I had the perfect background to take on the task. As doom would put it “there’s nobody more qualified than you to do it for halo 2 so that’s why I asked”. While it piqued my interest (and I was pretty jealous of these god boxes and all the experience he’d gotten developing them), I made a request/requirement before I would even entertain the idea.
The upgraded CPU has more than double the processing power of the stock CPU; however, the GPU was going to take on most of the increased load once the video resolution went up. After all, each additional pixel in the output image means more pixel shader calculations, which means more work for the GPU. If he could manage to overclock the GPU I would do it, but at stock clock speeds it wasn’t worth the time it would take to develop this patch just to have it fall over on the GPU. He said he would look into it, and after a few weeks’ time he came back and said it was done. He managed to overclock the GPU by ~15%, and said he had the “GENESIS-3” console ready for me (a nickname for the 3rd iteration of the “god box” upgrades he’d been working on).
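For a rough sense of the extra load: 640 × 480 is 307,200 pixels per frame, while 1280 × 720 is 921,600, so rendering at 720p means shading roughly three times as many pixels every frame on a GPU that would only be running about 15% faster.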
Having spent the past few years reverse engineering and re-implementing the Halo 2 rendering engine, I already had a mental list of things I’d need to change to support higher video resolutions. The first thing that needed to change was the size of the D3D front and back buffers. The setup for the D3D device has 3 functions that need to be modified in order to use the proper resolution for the current video mode. The first is _rasterizer_detect_video_mode, which checks the video mode and sets some global variables for widescreen and progressive video modes. Next is _rasterizer_init_screen_bounds, which sets up the screen dimensions used for creating the D3D device, view frustum, and a number of other things. Lastly is rasterizer_device_initialize, which is responsible for setting up the D3D device. All of the code shown in this post has been reverse engineered from assembly back into C for ease of understanding.
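To set the stage, here is a trimmed-down sketch of how the stock game behaves in the two functions that matter most for buffer size. Only the resolution-related lines are kept, and the surrounding details are inferred from the modified versions shown later in this post, so treat it as a sketch rather than a verbatim decompilation:

void _rasterizer_init_screen_bounds(int x_off, int y_off, float scale)
{
    // Stock behavior: the dimensions are always 640x480 (scale is only ever passed as 1.0f).
    rasterizer_globals.screen_bounds.x0 = 0;
    rasterizer_globals.screen_bounds.y0 = 0;
    rasterizer_globals.screen_bounds.x1 = (int)(640.0f * scale);
    rasterizer_globals.screen_bounds.y1 = (int)(480.0f * scale);
    // ...
}

bool rasterizer_device_initialize()
{
    D3DPRESENT_PARAMETERS PresentParams = {0};
    // The back buffer takes its size straight from screen_bounds, so it is always 640x480.
    PresentParams.BackBufferWidth = rasterizer_globals.screen_bounds.x1 - rasterizer_globals.screen_bounds.x0;
    PresentParams.BackBufferHeight = rasterizer_globals.screen_bounds.y1 - rasterizer_globals.screen_bounds.y0;
    // Note: D3DPRESENTFLAG_PROGRESSIVE is never set here, even with 480p enabled on the console.
    PresentParams.Flags = D3DPRESENTFLAG_LOCKABLE_BACKBUFFER;
    // ...
    g_pDirect3D->CreateDevice(0, D3DDEVTYPE_HAL, NULL, D3DCREATE_HARDWARE_VERTEXPROCESSING, &PresentParams, &g_pD3DDevice);
    // ...
}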
Halo 2 already supports 480p, or does it…
If you’ve ever looked at the back of the game case for Halo 2 you might have seen that it supports 480p. However, looking at rasterizer_device_initialize above, the D3DPRESENTFLAG_PROGRESSIVE flag is never set on the present parameters. And if we look at the call site for the _rasterizer_init_screen_bounds function, another problem shows up:
The scale parameter is always set to 1.0f, which means the screen_bounds are always set to 640×480 regardless of what video mode is set on the console. On the Original Xbox, 480p is considered to be 720×480, which means that Halo 2 does not natively render in 480p regardless of the video settings. If you enable 480p mode on the console you’ll get a 480p signal out, but that’s because after the game finishes drawing to the 640×480 back buffer, the GPU up-scales it to 720×480 before feeding it to the video encoder. I often get comments saying “that’s not a 16:9 resolution” or “that’s not real 480p,” but “480p” encapsulates a range of resolutions and aspect ratios, and 720×480 is the resolution the Xbox considers to be 480p (so take it up with Microsoft, not me…).
If you’ve ever played Halo 2 in 480p mode with wide screen enabled you may have noticed that things look a little weird. That’s because when wide screen mode is enabled the game will use an anamorphic camera with an aspect ratio of 1.33:1. That means it renders 1.3x the width into the same 640×480 surface as it would when wide screen mode is disabled. Here is a comparison showing the effect anamorphic scaling has on the Zanzibar wheel:
I’m not entirely sure why it does this and my only guess is if you set your TV to stretch mode it would “cancel out” the horizontal “squish” introduced by the anamorphic scaling and look somewhat normal. However, I personally hate it and wanted the cleanest image I could get out of the console so I added an option to disable the anamorphic scaling entirely.
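For what it’s worth, the arithmetic backs up that guess: the back buffer stays 4:3, and stretching a 4:3 image across a 16:9 panel widens it by the same 1.33x factor the anamorphic camera squeezed it by (4:3 × 1.33 ≈ 16:9), so on a TV set to stretch mode the proportions would come out roughly correct.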
To create the D3D front/back buffers with the right dimensions we’ll need to change g_progressive_scan_enabled to be set when 720p is enabled, set the screen_bounds and frame_bounds variables based on the proper resolution for the current video mode, and finally set some additional flags on the D3D present parameters depending on whether the video mode is progressive or interlaced (1080i). A shortened version of the modified functions is shown below. I ignored the scale variable in _rasterizer_init_screen_bounds because it’s only ever set to 1.0 anyway.
void _rasterizer_detect_video_mode()
{
    DWORD videoStandard = XGetVideoStandard();
    DWORD videoFlags = XGetVideoFlags();

    if (videoStandard == XC_VIDEO_STANDARD_PAL_I)
        g_refresh_rate_hz = (videoFlags & XC_VIDEO_FLAGS_PAL_60Hz) != 0 ? 60 : 50;

    g_letterbox_enabled = (videoFlags & XC_VIDEO_FLAGS_LETTERBOX) != 0;
    g_widescreen_enabled = (videoFlags & XC_VIDEO_FLAGS_WIDESCREEN) != 0;
    g_progressive_scan_enabled = (videoFlags & (XC_VIDEO_FLAGS_HDTV_480p | XC_VIDEO_FLAGS_HDTV_720p)) != 0;
}

void _rasterizer_init_screen_bounds(int x_off, int y_off, float scale)
{
    // Set default resolution to 640x480.
    float width = 640.0f;
    float height = 480.0f;

    // Adjust resolution based on current video mode set.
    DWORD videoFlags = XGetVideoFlags();
    if ((videoFlags & XC_VIDEO_FLAGS_HDTV_1080i) != 0)
    {
        width = 1920;
        height = 1080;
    }
    else if ((videoFlags & XC_VIDEO_FLAGS_HDTV_720p) != 0)
    {
        width = 1280;
        height = 720;
    }
    else if ((videoFlags & XC_VIDEO_FLAGS_HDTV_480p) != 0)
    {
        width = 720;
    }

    rasterizer_globals.screen_bounds.x0 = 0;
    rasterizer_globals.screen_bounds.y0 = 0;
    rasterizer_globals.screen_bounds.x1 = (int)width;
    rasterizer_globals.screen_bounds.y1 = (int)height;

    rasterizer_globals.frame_bounds.x0 = x_off;
    rasterizer_globals.frame_bounds.y0 = y_off;
    rasterizer_globals.frame_bounds.x1 = (int)width - x_off;
    rasterizer_globals.frame_bounds.y1 = (int)height - y_off;
}

bool rasterizer_device_initialize()
{
    D3DPRESENT_PARAMETERS PresentParams = {0};
    PresentParams.BackBufferWidth = rasterizer_globals.screen_bounds.x1 - rasterizer_globals.screen_bounds.x0;
    PresentParams.BackBufferHeight = rasterizer_globals.screen_bounds.y1 - rasterizer_globals.screen_bounds.y0;
    PresentParams.BackBufferFormat = D3DFMT_A8R8G8B8;
    PresentParams.EnableAutoDepthStencil = TRUE;
    PresentParams.AutoDepthStencilFormat = D3DFMT_D24S8;
    PresentParams.Flags = D3DPRESENTFLAG_LOCKABLE_BACKBUFFER;
    PresentParams.FullScreen_RefreshRateInHz = g_refresh_rate_hz;
    PresentParams.FullScreen_PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;

    // Check if wide screen mode is enabled.
    if (g_widescreen_enabled != 0)
        PresentParams.Flags |= D3DPRESENTFLAG_WIDESCREEN;

    // Check if the video mode supports progressive scan.
    if (g_progressive_scan_enabled != 0)
        PresentParams.Flags |= D3DPRESENTFLAG_PROGRESSIVE;

    // Check the resolution width to see if 1080i is enabled.
    if (rasterizer_globals.screen_bounds.x1 == 1920)
    {
        PresentParams.Flags &= ~D3DPRESENTFLAG_PROGRESSIVE;
        PresentParams.Flags |= D3DPRESENTFLAG_INTERLACED;
    }

    g_pDirect3D->CreateDevice(0, D3DDEVTYPE_HAL, NULL, D3DCREATE_HARDWARE_VERTEXPROCESSING, &PresentParams, &g_pD3DDevice);
    // ...
}
Booting up the game with these changes gives some less than pleasing results. Looking at the main menu, the first thing we can see is that the blue filter is now gone and there’s a repeating line pattern strewn across the screen. Looking a bit closer, we can see that part of the water geometry is also cut off, suspiciously right where the old 640-pixel width would end compared to the new width of 720.
The Xbox uses a unified memory architecture meaning the CPU and GPU share the same RAM. Unlike a PC there’s no concept of creating a D3D allocation in VRAM and having the GPU manage it. On Xbox the CPU can create an allocation for textures, render targets, vertex buffers, etc, and pass the allocation address directly to the GPU. This gives developers the ability to allocate one buffer and have multiple resource “views” that utilize the memory. Consider the following code which shows how to create a render target letting D3D do all the work and how to create a render target by hand:
// How to create a render target with D3D:
IDirect3DSurface8* pRenderTarget = NULL;
g_pD3DDevice->CreateRenderTarget(/* width */ 1024, /* height */ 1024, /* format */ D3DFMT_A8R8G8B8, NULL, FALSE, &pRenderTarget);
// How to create a render target by hand:
// Allocate and initialize the texture header.
IDirect3DSurface8* pRenderTarget = (IDirect3DSurface8*)malloc(sizeof(IDirect3DSurface8));
DWORD textureSize = XGSetTextureHeader(/* width */ 1024, /* height */ 1024, /* levels */ 0, 0, /* format */ D3DFMT_A8R8G8B8, 0, pRenderTarget, 0, 0);
// Allocate memory for the pixel buffer.
void* pSurfaceBuffer = D3D_AllocContiguousMemory(/* size */ textureSize, /* alignment */ D3DSURFACE_ALIGNMENT);
pRenderTarget->Register(pSurfaceBuffer);
While the latter looks messier, it gives the developer greater control, and it’s something Halo 2 makes great use of to conserve memory for all the render targets it uses. In total Halo 2 uses approximately 25 different render targets, but only 4-5 unique buffers are allocated for them, which saves a lot of memory. So what does this have to do with the issues we saw in the main menu? Well, if Halo 2 is creating render targets by hand, it needs to encode the width and height of the surface into the header of the render target structure. If that header is hard coded to 640×480, it can produce exactly these kinds of cut-off images and repeating line patterns, because the pitch of the surface no longer matches the pitch of the back buffer. Essentially, there are two different “views” of the same memory, but the views see the memory as having different widths, which results in misplaced pixels on every scan line.
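To make the pitch mismatch concrete, here is a small standalone illustration (not game code, just the arithmetic) of how two views that disagree about the width of the same linear 32-bit buffer drift apart row by row:

// Illustration only: two "views" of the same linear A8R8G8B8 buffer that
// disagree about its width. Builds with any C compiler.
#include <stdio.h>

int main(void)
{
    const int real_width   = 720; // width the back buffer was actually created with
    const int header_width = 640; // width hard coded into the hand-made texture header
    const int bpp          = 4;   // bytes per pixel for A8R8G8B8

    // Byte offset of the first pixel of each row under each assumption.
    for (int y = 0; y < 4; y++)
    {
        int real_offset = y * real_width * bpp;
        int view_offset = y * header_width * bpp;
        // The view falls another (720 - 640) * 4 = 320 bytes (80 pixels) behind
        // the real data on every row, which shows up on screen as the skewed,
        // repeating line pattern and the geometry cut off near the old 640 boundary.
        printf("row %d: actual start %5d, view start %5d, drift %4d bytes\n",
               y, real_offset, view_offset, real_offset - view_offset);
    }
    return 0;
}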
Looking around the D3D/raster initialization code I found a function I called rasterizer_primary_targets_initialize that does exactly this. It takes the back, front, and depth buffers created by D3D and creates additional render targets and texture views from them, using hard coded dimensions of 640×480. Here is the C representation of the disassembly:
bool rasterizer_primary_targets_initialize()
{
    // Get the back buffer, front buffer, and depth buffer surfaces.
    global_d3d_device->GetBackBuffer(0, D3DBACKBUFFER_TYPE_MONO, &global_d3d_surface_render_primary[0]);
    global_d3d_device->GetBackBuffer(-1, D3DBACKBUFFER_TYPE_MONO, &global_d3d_surface_render_primary[1]);
    global_d3d_device->GetDepthStencilSurface(&global_d3d_surface_render_primary_z);

    global_d3d_texture_render_primary[0] = (IDirect3DTexture8*)malloc(sizeof(IDirect3DTexture8));
    global_d3d_texture_render_primary[1] = (IDirect3DTexture8*)malloc(sizeof(IDirect3DTexture8));

    // Setup texture views for back/front buffers.
    for (int i = 0; i < 2; i++)
    {
        XGSetTextureHeader(640, 480, 1, 0, D3DFMT_LIN_A8R8G8B8, 0,
            global_d3d_texture_render_primary[i], global_d3d_surface_render_primary[i]->Data, 0);
    }

    // Create a render target surface for the depth buffer that matches the size and format of the back buffer.
    global_d3d_surface_z_as_target = (IDirect3DSurface8*)malloc(sizeof(IDirect3DSurface8));
    memcpy(global_d3d_surface_z_as_target, global_d3d_surface_render_primary, sizeof(IDirect3DSurface8));
    global_d3d_surface_z_as_target->Data = global_d3d_surface_render_primary_z->Data;

    // Create two textures for the depth buffer, one in ARGB format and one in ABGR format.
    global_d3d_texture_z_as_target[0] = (IDirect3DTexture8*)malloc(sizeof(IDirect3DTexture8));
    XGSetTextureHeader(640, 480, 1, 0, D3DFMT_LIN_A8R8G8B8, 0,
        global_d3d_texture_z_as_target[0], global_d3d_surface_render_primary_z->Data, 0);
...
Read the original on icode4.coffee »
In October 2022 a bird with the code name B6 set a new world record that few people outside the field of ornithology noticed. Over the course of 11 days, B6, a young Bar-tailed Godwit, flew from its hatching ground in Alaska to its wintering ground in Tasmania, covering 8,425 miles without taking a single break. For comparison, there is only one commercial aircraft that can fly that far nonstop, a Boeing 777 with a 213-foot wingspan and one of the most powerful jet engines in the world. During its journey, B6—an animal that could perch comfortably on your shoulder—did not land, did not eat, did not drink and did not stop flapping, sustaining an average ground speed of 30 miles per hour 24 hours a day as it winged its way to the other end of the world.
Many factors contributed to this astonishing feat of athleticism—muscle power, a high metabolic rate and a physiological tolerance for elevated cortisol levels, among other things. B6’s odyssey is also a triumph of the remarkable mechanical properties of some of the most easily recognized yet enigmatic structures in the biological world: feathers. Feathers kept B6 warm overnight while it flew above the Pacific Ocean. Feathers repelled rain along the way. Feathers formed the flight surfaces of the wings that kept B6 aloft and drove the bird forward for nearly 250 hours without failing.
One might expect that, considering all the time humans have spent admiring, using and studying feathers, we would know all their tricks by now. Yet insights into these marvelous structures continue to emerge. Over the past decade other researchers and I have been taking a fresh look at feathers. Collectively we have made surprising new discoveries about almost every aspect of their biology, from their evolutionary origins to their growth, development and aerodynamics.
Among the creatures we share the planet with today, only birds have feathers. It makes sense, then, that for centuries scientists considered feathers a unique feature of birds. But starting in the 1990s, a series of bombshell fossil finds established that feathers were widespread among several lineages of the bipedal, carnivorous dinosaurs known as theropods and that birds had inherited these structures from their theropod ancestors. The discovery of feathered nonbird dinosaurs sent researchers scrambling to understand the origin and evolution of feathers, especially their role in the dawn of flight. We now know many dinosaurs had feathers, and protofeathers probably go all the way back to the common ancestor of dinosaurs and their flying reptile cousins, the pterosaurs. Bristles, fuzzy coverings, and other relatively simple featherlike structures probably decorated a wide array of dinosaurs—many more than we have been lucky enough to find preserved as fossils.
The feathers on nonbird dinosaurs were not limited to bristles and fuzz, however. The flat, broad, flight-enabling feathers we see across most of the wings and much of the body surface of living birds are called pennaceous feathers. (Fun fact: these are the feathers people used to make into quills for writing, hence the word “pen.”) It turns out that these feathers, too, appeared before birds. In fact, there is an entire group of dinosaurs comprising birds as well as species such as Velociraptor that takes its name from these very feathers: the pennaraptoran clade. Fossils of early pennaraptorans show that they had feathery coverings that would have looked essentially modern at a quick glance.
The flight capacity of these early pennaraptorans has been hotly contested. Some species were clearly not fliers, given the small size of their “wings” relative to their large bodies. For those animals, pennaceous feathers were probably display pieces. But other pennaraptorans, such as the small, four-winged, forest-dwelling Microraptor, are trickier to interpret. Many of the arguments about whether this creature could fly have centered on something called vane asymmetry. The two flat “blades” of a feather on either side of the main shaft are called vanes. In living birds that fly, the feathers that arise from the hand, known as the primaries, have asymmetrical vanes: the leading vane is narrower than the trailing one. It stood to reason that vane asymmetry was important for flight. And because fossils of Microraptor and its kin show asymmetrical feathers, some researchers argued, these animals must have been able to fly.
Recent work by flight biomechanics experts, including me, has overturned this received wisdom about feather vane asymmetry. Our research shows that feather shape is largely optimized to allow the feather to twist and bend in sophisticated ways that greatly enhance flight performance. Merely being anatomically asymmetrical doesn’t mean much. What matters is that the feather is aerodynamically asymmetrical, and for this to be the case, the vane asymmetry must be at least three to one—that is, the trailing blade needs to be three times wider than the leading one. Below this ratio, the feather twists in a destabilizing rather than stabilizing way during flight.
Early pennaraptorans such as Microraptor didn’t have aerodynamically asymmetrical feathers. But that doesn’t mean they couldn’t fly. The tendency to twist (whether in a stabilizing or a destabilizing fashion) is only relevant if the feathers are separated enough to do so. Keeping feathers in a wing tip tight and overlapping makes them stable, even if they’re not asymmetrical. Asymmetry matters only if the flier spreads its primaries apart in flight like many modern raptors do—a feature called slotting. So Microraptor and its kin could probably use flapping flight, but their wing shape was necessarily different from that of today’s forest-dwelling birds of prey. Specifically, Microraptor had relatively long, narrow wings with tight, unslotted wing tips—anatomically distinct from the wings of Cooper’s Hawks and other modern-day forest hawks but aerodynamically similar.
After considering these findings on vane asymmetry, as well as new data on flight muscles in near-bird dinosaurs, a group of researchers (of which I was the senior biophysicist) led by Michael Pittman of the Chinese University of Hong Kong recently concluded that powered flight—that is, flapping flight rather than gliding flight—probably evolved multiple times in dinosaurs, with just one of those lineages surviving to the present in the form of birds. Yet only in birds did flight feathers attain the degree of shape-shifting we see today. That ability of feathers to twist in just the right way is what enabled slotting, which makes the wing much more efficient at low flight speeds. In essence, a slotted wing behaves as if it is longer and narrower than it is anatomically. Slotting also makes the wing tip very resistant to stall, whereby the airflow separates from the wing, causing a precipitous loss of the lift that keeps the bird in the air. It’s a vital adaptation that underpins an array of aerial acrobatics.
Birds typically need long, narrow wings to soar efficiently—seabirds such as albatrosses and petrels are perfect examples. The advent of wing-tip slots made it possible to soar with broader wings, paving the way for evolution of a diversity of broad-winged soarers, including vultures and hawks. The aerodynamic advantages of slotting also permit the explosive flight performance of sprinters such as grouses, which spend most of their time on the ground but burst into flight for a short distance when startled. And wing-tip slots provide much greater maneuverability for a wide array of birds that live in forests and other cluttered environments, from songbirds to toucans. In fact, the maneuverability made possible by slotted wings might have helped birds compete with pterosaurs and ultimately survive the end-Cretaceous extinction.
The pennaceous feathers we associate with flight aren’t the only type of feather birds possess. Feathers in different regions of the body vary in size, shape and function. You can think of feather form as a spectrum, with the large, relatively stiff flight feathers of the wing and tail at one end and the short, fluffy down feathers that sit close to the body at the other. All of them have a central shaft and softer “barbs” that branch out from the shaft. In flight feathers, the barbs interlock like Velcro teeth to form the smooth, windproof surface of the vanes. In down feathers, the barbs are loosely structured and fluffy to trap heat. Many of the other kinds of feathers combine aspects of these two types. The contour feathers that streamline a bird’s body, for example, have vaned tips like flight feathers and noninterlocking barbs like down ones. The bristle feathers that typically occur on the face and may serve protective and sensory purposes meld the flight feathers’ stiff shafts (called rachises) with the down feathers’ fluffy base.
In recent years researchers have begun to piece together the intricate process by which feathers develop. Like scales, spines and hairs, feathers are skin appendages. Scientists have known for a while that they arise from structures in the skin. But how can an animal produce feathers with different anatomies across its body?
My colleagues and I, led by Cheng-Ming Chuong of the University of Southern California, related the developmental biology of various kinds of pennaceous feathers to their mechanical properties. These feathers begin as a tube that essentially unzips along its length, forming the two vanes. Several genes and molecules interact with one another and with the environment to determine the amount of interlocking in the barbs that make up the vanes, the size and shape of the rachises, and whether the shaft is filled with a “foam” that makes it stiff relative to its weight. We found that different feather types have varying specializations in their overall stiffness, their tendency to twist, and the distribution of the foam in the shaft. These variations depend to some extent on the work of different genes, but most of the differentiation is the result of changes in how the genes are regulated—that is, when they are turned on or off or how active they are during feather development.
Scientists have also shown a recent surge of interest in another category of feathers: display feathers, the showy feathers that help to attract mates. Display feathers may dazzle an observer with their colors (think of a hummingbird’s glittering throat), or they may attain eye-catching proportions, like the feathers that make up a peacock’s crest and train. The conventional wisdom about display feathers holds that they are strictly products of sexual selection, in which mate choice drives the evolution of a trait. These days, however, researchers around the globe, me included, are coming to see display feathers not as exclusively sexually selected traits with no mechanical properties of interest but instead as complex compromises between the pressures of social biology and mechanobiology.
To wit: long display feathers don’t grow just anywhere on the body. They most often occur on the lower back and tail, where they interfere comparatively little with flight performance. Take, for example, the Resplendent Quetzal, a small, colorful bird native to the cloud forests of Mexico and Central America with tail feathers that can grow up to three feet long on males during the breeding season. The tail streamers might not be shaped solely by sexual selection. Evidence indicates that the streamers of some birds produce at least a little aerodynamic force, enough to support much of their added weight. The quetzal’s streamers, for their part, lost their tight interlocking structure, making the vanes a pennaceous-downy hybrid that lets much of the airflow pass through without producing much lift. This arrangement is most likely an adaptation to prevent these feathers from being highly destabilizing. These flashy feathers still increase the cost of flying because they add drag, but that cost may well be less than has been assumed.
The microstructure of display feathers, especially tail streamers, may also be more finely tuned than previously thought. Feather structure provides a balance of stiffness, weight and shape. The feathers must hold their shape well enough, even at extreme lengths, to be effective signals. But they cannot be so stiff as to destabilize the bird during gusty winds or tight maneuvers. There’s a particular range of flexibility that shows off the feather to best effect while minimizing detrimental impacts on flight performance.
One of the aspects of feathers that has long fascinated me is their adaptability. Under varying conditions and evolutionary pressures, they can become specialized for everything from speed and maneuverability to insulation or display. Some of the most fascinating adaptations can be found in owls.
Facial disks are an especially conspicuous feature of owls. These broad, semicircular fans of feathers around the eyes and ears give owls their distinctive appearance. The skull of an owl is actually quite long and narrow, but the feathers enveloping it completely change the contours of the animal. These facial disks are not just for looks. They do a remarkably good job of funneling sound to the owl’s ears. The disks, along with vertically offset ears and exceptionally sensitive middle and inner ear structures, make owls so good at determining the origin of a sound that they can zero in on prey without seeing it at all (they still use vision to make the final capture, though).
I have worked with quite a few owls over the years, particularly individuals being rehabilitated after injury. One such owl couldn’t be released because a car strike had left him completely blind. Yet if someone tossed food onto one of his perches, the gentle thud of it landing was enough for him to pounce on it perfectly. (Readers may also find solace in knowing that he still flew, having memorized his enclosure, and was regularly taken around for walks and neck scratches.)
Still, that exceptional sense of hearing wouldn’t get owls very far without some additional feather adaptations. Other nocturnal creatures can also hear very well, and an owl whose feathers were rustling in flight would be hard-pressed to get close to its vigilant prey. Furthermore, owls might not hear quietly creeping prey if their own feather sounds covered the faint noises of their targets. Owls solved both problems by evolving feather traits that make them inaudible during flight.
It is hard to appreciate just how quiet owls are. Even ultrasensitive microphones, if properly calibrated, aimed exactly right and set to maximum sensitivity in a silent space, can just barely pick up sounds from a flying owl … sometimes. For all practical purposes, owls are silent. They are so eerily noiseless that even if they fly over your head close enough for you to feel their wake, you will still hear absolutely nothing. In a dark space, they are essentially undetectable. All the owl wing sounds you hear in the Harry Potter movies and other films? Those are added in.
Owls achieve this stealth with a few different feather adaptations. To start, their feathers have a “velvety” surface that silences them when they move against one another. More important, the feathers on the leading edge of an owl’s wing have a set of comblike structures, whereas those on the trailing edge have fluffy fringes. The leading-edge comb stirs the air in a specific way called micro vorticity. These tiny, swirling streams of air cause the main flow to stick to the wing. In aerodynamic speak, we say the combs “inject vorticity into the boundary layer.” When this modified flow then passes through the trailing-edge fringes, the net result is a wake that contains no coherent waves of linear pressure and therefore no sound. Put another way, there are no vibrations from the interactions between feathers and the air capable of producing sound.
These specializations have deep roots. Modern-day owls belong to one of two groups: the tytonids (represented by Barn Owls and Bay Owls) and the strigids (all other living owls). Their last common ancestor existed at least 50 million years ago. Because owls in both groups exhibit silent flight, this trait probably dates to their common ancestor. In other words, owls have been surreptitiously coursing the night skies for more than 50 million years.
Not surprisingly, some of the most extreme feather adaptations are found in birds with the most extreme ecological specializations. One way feathers can adapt to a particular way of life is by increasing or decreasing in stiffness. Coincidentally, the stiffest feathers are found in two groups of birds that are otherwise as different as can be: hummingbirds and penguins.
Hummingbirds have ultrastiff feathers as an adaptation to the exceptionally high flapping frequencies and unusual flapping stroke they use to hover in front of flowers while sipping nectar. Unlike most birds, hummingbirds can get a substantial amount of weight support and thrust from their upstroke, not just their downstroke. They do this by rotating their shoulder to flip the wing over completely. The wing needs to be very stiff for this method to work. Reinforcements in the bones of the hummingbird wing provide some of this rigidity; feathers with extremely firm rachises provide the rest.
The flightless penguins, in contrast, have adapted to life in the water and on land. They possess some of the most specialized plumage of all, having converted their entire body covering into a densely packed mosaic of tiny feathers. These feathers are individually quite stiff, and together they form a textured surface over the wings and body that regulates the boundary layer of water against them while the penguin is swimming. In essence, they use a rough coat of feathers to catch and hold a smooth jacket of water. The net effect is a reduction in drag and therefore a lower energetic cost of swimming. The dense feathers also trap just enough air to provide some insulation without making the penguin buoyant, supplementing the fat layer that helps to keep the bird warm.
In the absence of any constraints posed by flight, penguins jettisoned the more typical feather accoutrements of their ancestors in favor of a novel suit of drag-reducing, minimum-buoyancy feathers. These feathers are a key part of the package of adaptations that have made penguins the undisputed diving champions of the avian world, capable of reaching depths of more than 1,600 feet in search of krill, fish, and other aquatic prey.
Feathers are a fantastic model system for understanding how complex structures evolve and how anatomy and behavior influence each other over time. It’s no wonder that the applied science sector has taken note of feathers’ many brilliant features. Already they have led to successful technological innovations. The Velcro-like mechanism that connects the barbs of pennaceous feathers is the basis for an advanced temporary fastening system. The silencing fringes of owl feathers have inspired ventilation-quieting systems. The surface texture and boundary-layer-control principles of penguin feathers have made their way into robotics, mostly in prototypes.
No doubt feathers will give rise to more clever inventions in the future. We have only to let our creativity take flight.
...
Read the original on www.scientificamerican.com »
...
Read the original on github.com »
On the afternoon of March 11th, 2011, Mitsuyoshi Hirai, the chief engineer of the cable maintenance ship Ocean Link, was sitting in his cabin 20 miles off Japan’s eastern coast, completing the paperwork that comes at the end of every repair. Two weeks earlier, something — you rarely knew what — damaged the 13,000-mile fiber optic cable connecting Kitaibaraki, Japan, and Point Arena, California. Alarms went off; calls were made; and the next day, Hirai was sailing out of the port in Yokohama to fix it.
A camera mounted on the KDDI Ocean Link on March 11th, 2011.
The repair was now nearly done. All that remained was to rebury the cable on the seafloor, which they were doing using a bulldozer-sized remotely operated submersible named Marcas — and, of course, the paperwork.
Suddenly, the ship began to shudder. Hirai got to his feet, found he could barely stand, and staggered out of his cabin, grasping the handrail as he pulled himself up the narrow stairway to the bridge. “Engine trouble?” Hirai asked the captain, who’d already checked and replied that everything seemed normal. The ship continued to tremble. Looking out from the bridge, the sea appeared to be boiling.
They turned on the television. An emergency alert showed that an earthquake had struck 130 miles northeast of their location. The shaking finally stopped, and in the silence, Hirai’s mind leapt to what would come next: a tsunami.
Hirai feared these waves more than most people. He had grown up hearing the story of how one afternoon in 1923, his aunt felt the ground shake, swept up her two-year-old brother, and sprinted uphill to the cemetery, narrowly escaping floods and fires that killed over 100,000 people. That child became Hirai’s father, so he owed his existence to his aunt’s quick thinking. Now, he found himself in the same position. He knew tsunamis become dangerous when all the water displaced by the quake reaches shallow water and slows and grows taller. The Ocean Link, floating in less than 500 feet of water, was too shallow for comfort.
Mitsuyoshi Hirai, the former chief engineer of the Ocean Link.
Photo by Go Takayama for The Verge
In the family tree of professions, submarine cable work occupies a lonely branch somewhere between heavy construction and neurosurgery. It’s precision engineering on a shifting sea using heavy metal hooks and high-tension lines that, if they snap, can cut a person in half. In Hirai’s three decades with Kokusai Cable Ship Company (KCS), he had learned that every step must be followed, no matter how chaotic the situation. Above all else, he often said, “you must always be cool.”
Across Ocean Link’s 400-foot deck, the ship’s 50 crew members were emerging from their cabins and workstations, trying to figure out what had just occurred. Over the intercom, the captain announced that there had been an earthquake, a tsunami was coming, and the crew should ready the ship to evacuate to deeper water. The crew fanned out to check fuel tanks and lash down machinery. Inside a darkened, monitor-filled shipping container on the starboard deck, the submersible’s pilot steered Marcas back toward the ship as fast as the bulky robot’s propellers could carry it. Minutes later, the submersible was hoisted aboard and the Ocean Link was underway.
Controls on the bridge of the Ocean Link.
Photo by Go Takayama for The Verge
View from the bridge of the Ocean Link.
Photo by Go Takayama for The Verge
The tsunami passed under them imperceptibly on their way out to sea, and when they came to a stop three hours later, the television was showing the first images of destruction. Members of the crew who weren’t working gathered on the bridge to watch the news, which continued to display a tsunami warning, a map of Japan with its eastern seaboard glowing red. They took turns trying to reach loved ones using the ship’s satellite phone, but no calls went through.
As night fell, periodic aftershocks thumped against the hull. Hirai thought about his wife, who was working at a department store in Yokohama near the Ocean Link’s port; his son, a junior in high school at the time; and his parents, whom the family lived with in his hometown of Yokosuka — none of whom he’d been able to reach. Everyone had someone they were worried about.
But Hirai also began to think about the work he knew lay ahead. The Ocean Link was one of a small number of ships that maintain the subsea cables that carry 99 percent of the world’s data. Positioned in strategic locations around the planet, these ships stand ready to sail out and fix faults the moment they are detected, and most of the time, they are more than equal to the task. But earthquakes, Hirai knew from experience, were different. They didn’t just break one cable — they broke many, and badly. If what he feared had happened, Japan risked being cut off from the world in its moment of need.
Sure enough, that night, a call came from headquarters confirming the Ocean Link was safe and directing them to remain at sea until further notice, followed by messages announcing cable failure after cable failure, including the one they had just finished repairing.
Fumihide Kobayashi standing in front of the submersible Marcas.
Photo by Go Takayama for The Verge
Cable industry professionals tend to be pragmatic people, preoccupied with the material realities of working planet-scale construction. But in conversations about landing high-bandwidth cables in digitally neglected regions or putting millions of people back in contact with every fiber strand melted together, they often hint at a sense of larger purpose, an awareness that they are performing a function vital to a world that, if they do their jobs well, will continue to be unaware of their service.
For the Ocean Link crew, this awareness was bound up in a still unfolding national tragedy. They knew that whenever they returned to land, they would have to care for their loved ones quickly, because they would soon be going back out to sea. For how long, no one knew.
The world’s emails, TikToks, classified memos, bank transfers, satellite surveillance, and FaceTime calls travel on cables that are about as thin as a garden hose. There are about 800,000 miles of these skinny tubes crisscrossing the Earth’s oceans, representing nearly 600 different systems, according to the industry tracking organization TeleGeography. The cables are buried near shore, but for the vast majority of their length, they just sit amid the gray ooze and alien creatures of the ocean floor, the hair-thin strands of glass at their center glowing with lasers encoding the world’s data.
If, hypothetically, all these cables were to simultaneously break, modern civilization would cease to function. The financial system would immediately freeze. Currency trading would stop; stock exchanges would close. Banks and governments would be unable to move funds between countries because the Swift and US interbank systems both rely on submarine cables to settle over $10 trillion in transactions each day. In large swaths of the world, people would discover their credit cards no longer worked and ATMs would dispense no cash. As US Federal Reserve staff director Steve Malphrus said at a 2009 cable security conference, “When communications networks go down, the financial services sector does not grind to a halt. It snaps to a halt.”
Active and planned fiber optic cable systems
Corporations would lose the ability to coordinate overseas manufacturing and logistics. Seemingly local institutions would be paralyzed as outsourced accounting, personnel, and customer service departments went dark. Governments, which rely on the same cables as everyone else for the vast majority of their communications, would be largely cut off from their overseas outposts and each other. Satellites would not be able to pick up even half a percent of the traffic. Contemplating the prospect of a mass cable cut to the UK, then-MP Rishi Sunak concluded, “Short of nuclear or biological warfare, it is difficult to think of a threat that could be more justifiably described as existential.”
Fortunately, there is enough redundancy in the world’s cables to make it nearly impossible for a well-connected country to be cut off, but cable breaks do happen. On average, they happen every other day, about 200 times a year. The reason websites continue to load, bank transfers go through, and civilization persists is because of the thousand or so people living aboard 20-some ships stationed around the world, who race to fix each cable as soon as it breaks.
Grapnels on the foredeck of the Ocean Link.
Photo by Go Takayama for The Verge
“Mushroom” anchors, used instead of fluked anchors to avoid entangling cables.
Photo by Go Takayama for The Verge
Bow sheave on the Ocean Link, where cables and grapnel ropes pass over into the sea.
View of the Ocean Link bridge from the foredeck.
Photo by Go Takayama for The Verge
The industry responsible for this crucial work traces its origins back far beyond the internet, past even the telephone, to the early days of telegraphy. It’s invisible, underappreciated, analog. Few people set out to join the profession, mostly because few people know it exists.
Hirai’s career path is characteristic in its circuitousness. Growing up in the 1960s in the industrial city of Yokosuka, just down the Miura Peninsula from the Ocean Link’s port in Yokohama, he worked at his parents’ fish market from the age of 12. A teenage love of American rock ‘n’ roll led to a desire to learn English, which led him to take a job at 18 as a switchboard operator at the telecom company KDDI as a means to practice. When he was 26, he transferred to a cable landing station in Okinawa because working on the beach would let him perfect his windsurfing. This was his introduction to cable maintenance and also where he met his wife. Six years later, his English proficiency got him called back to KDDI headquarters to help design Ocean Link for KCS, a KDDI subsidiary. Once it was built, he decided to go to sea with it, eventually becoming the ship’s chief engineer.
Captain Shoichi Suzuki in the bridge of the Ocean Link.
Photo by Go Takayama for The Verge
Others come to the field from merchant navies, marine construction, cable engineering, geology, optics, or other tangentially related disciplines. When Fumihide Kobayashi, the submersible operator — a tall and solidly built man from the mountain region of Nagano — joined KCS at the age of 20, he thought he would be working on ship maintenance, not working aboard a maintenance ship. He had never been on a boat before, but Hirai enticed him to stay with stories of all the whales and other marine creatures he would see on the remote ocean.
Once people are in, they tend to stay. For some, it’s the adventure — repairing cables in the churning currents of the Congo Canyon, enduring hull-denting North Atlantic storms. Others find a sense of purpose in maintaining the infrastructure on which society depends, even if most people’s response when they hear about their job is, But isn’t the internet all satellites by now? The sheer scale of the work can be thrilling, too. People will sometimes note that these are the largest construction projects humanity has ever built or sum up a decades-long resume by saying they’ve laid enough cable to circle the planet six times.
KCS has around 80 employees, many of whom, like Hirai, have worked there for decades. Because the industry is small and careers long, it can seem like everyone knows one another. People often refer to it as a family. Shipboard life lends itself to a strong sense of camaraderie, with periods of collaboration under pressure followed by long stretches — en route to a worksite or waiting for storms to pass — without much to do but hang out. Kobayashi learned to fish off the side of the ship and attempted to improve the repetitive cuisine by serving his crewmates sashimi. (His favorite is squid, but his colleagues would prefer he use the squid to catch mackerel.) Hirai, an enthusiastic athlete, figured out how to string up a net on the Ocean Link’s helideck and play tennis. Other times, he would join the crew for karaoke in the lounge, a wood-paneled room behind an anomalous stained-glass door containing massage chairs, a DVD library, and a bar. A self-described “walking jukebox,” Hirai favored Simon & Garfunkel and Billy Joel, though he said the younger members of the fleet didn’t go in for it as much.
Photo by Go Takayama for The Verge
The world is in the midst of a cable boom, with multiple new transoceanic lines announced every year. But there is growing concern that the industry responsible for maintaining these cables is running perilously lean. There are 77 cable ships in the world, according to data supplied by SubTel Forum, but most are focused on the more profitable work of laying new systems. Only 22 are designated for repair, and it’s an aging and eclectic fleet. Often, maintenance is their second act. Some, like Alcatel’s Ile de Molene, are converted tugs. Others, like Global Marine’s Wave Sentinel, were once ferries. Global Marine recently told Data Centre Dynamics that it’s trying to extend the life of its ships to 40 years, citing a lack of money. One out of four repair ships has already passed that milestone. The design life for bulk carriers and oil tankers, by contrast, is 20 years.
“We’re all happy to spend billions to build new cables, but we’re not really thinking about how we’re going to look after them,” said Mike Constable, the former CEO of Huawei Marine Networks, who gave a presentation on the state of the maintenance fleet at an industry event in Singapore last year. “If you talk to the ship operators, they say it’s not sustainable anymore.”
He pointed to a case last year when four of Vietnam’s five subsea cables went down, slowing the internet to a crawl. The cables hadn’t fallen victim to some catastrophic event. It was just the usual entropy of fishing, shipping, and technical failure. But with nearby ships already busy on other repairs, the cables didn’t get fixed for six months. (One promptly broke again.)
But perhaps a greater threat to the industry’s long-term survival is that the people, like the ships, are getting old. In a profession learned almost entirely on the job, people take longer to train than ships to build.
A powerful but delicate 12-foot diameter electro-hydraulic steel drum used for paying out and recovering cables and grapnels during repairs.
A conveyor comprised of 21 pairs of cable-gripping tires used for laying and retrieving cables.
A command center adjoining the bridge where cable tension is monitored and all cable operations are managed.
Three tanks capable of holding a total of 2,800 miles of cable.
A rolling sheave that cables and grapnel ropes are passed over.
Bow and stern thrusters are used to maneuver into wind, waves, and currents to keep the ship stationary during repairs.
Remote submersible capable of operating at up to 8,000ft. Equipped with cameras, sensors, a robotic arm, and a powerful water jet for burying cables.
“One of the biggest problems we have in this industry is attracting new people to it,” said Constable. He recalled another panel he was on in Singapore meant to introduce university students to the industry. “The audience was probably about 10 university kids and 60 old gray people from the industry just filling out their day,” he said. When he speaks with students looking to get into tech, he tries to convince them that subsea cables are also part — a foundational part — of the tech industry. “They all want to be data scientists and that sort of stuff,” he said. “But for me, I find this industry fascinating. You’re dealing with the most hostile environment on the planet, eight kilometers deep in the oceans, working with some pretty high technology, traveling all over the world. You’re on the forefront of geopolitics, and it’s critical for the whole way the world operates now.”
The lifestyle can be an obstacle. A career in subsea means enduring long stretches far from home, unpredictable schedules, and ironically, very poor internet.
Photo by Go Takayama for The Verge
“Everyone complains about that,” said Kaida Takashi, a senior advisor at KCS, who is trying to get the Ocean Link set up with Starlink. It’s a generational difference, he said. For someone like him, a 62-year-old ham radio enthusiast, Wi-Fi barely fast enough to email is a luxury. Other industry veterans reminisced about the days when they felt fortunate to get faxes on board, or waiting for the mailbag in port, or the novelty of using the very cable they were laying to make calls from the middle of the ocean. But for people who grew up with an expectation of constant connectivity, the disconnection of shipboard life can cause visible discomfort. “It’s a part of them,” one industry veteran marveled of his younger colleagues. “They can’t let it go.”
The industry’s biggest recruiting challenge, however, is the industry’s invisibility. It’s a truism that people don’t think about infrastructure until it breaks, but they tend not to think about the fixing of it, either. In his 2014 essay, “Rethinking Repair,” professor of information science Steven Jackson argued that contemporary thinking about technology romanticizes moments of invention over the ongoing work of maintenance, though it is equally important to the deployment of functional technology in the world. There are few better examples than the subsea cable industry, which, for over a century, has been so effective at quickly fixing faults that the public has rarely had a chance to notice. Or as one industry veteran put it, “We are one of the best-kept secrets in the world, because things just work.”
The Ocean Link spent two nights at sea before receiving orders to return. As they neared land, Hirai saw debris from the tsunami’s backwash floating in the water: fishing nets, tires, the roofs of buildings, the bloated body of what he guessed was a cow.
The earthquake measured 9.1 on the Richter scale, the fourth largest ever recorded and the largest to ever hit Japan. But it was the series of tsunami waves that arrived half an hour later that dealt the most destruction, surging miles inland and sweeping buildings, cars, and thousands of people out to sea. The death toll would eventually climb to nearly 20,000, and the day would become a national tragedy referred to simply as “3/11.”
The full extent of the devastation was still becoming clear when the Ocean Link returned, but the disaster had already entered a new phase. One hundred and sixty miles north of Tokyo, a 50-foot tsunami wave overtopped a seawall protecting the Fukushima power plant, swamping the emergency generators that were cooling the reactors through its automatic post-quake shutdown and precipitating a nuclear meltdown.
Hirai’s wife and son had made it back home to their house in Yokosuka, where they lived with Hirai’s parents. Kobayashi’s family, too, was safe. Some crew lost loved ones; others sent family to stay with relatives in the south out of fear of radiation. They all knew that they had only a few days before they would be sent back out to sea.
The Ocean Link in a storm in the North Pacific. Sometimes, Hirai said, storms are so bad you can’t work or sleep. All you can do is hold onto your bunk and laugh.
The Ocean Link in a storm in the North Pacific. The ship pitches wildly in the heavy swell, the waves crashing over its bow.
The disaster had severed phone lines and wrecked cell towers, causing phone service to cut out almost immediately after the earthquake struck. Instead, people turned to email, Skype, and other online services that were mostly able to route around damage to the network. There was a sense, according to one engineer’s postmortem presentation, that the internet was the only medium that survived.
But its survival was more tenuous than the public knew. While the cables connecting Japan to the rest of the world survived the initial destruction, later that night, as millions of people tried to find their way home with trains stopped and power intermittent, engineers in Tokyo network operation centers watched as one cable after another failed. By the next morning, seven of Japan’s 12 transpacific cables were severed. Engineers working through the night and following days managed to shift traffic to those that remained, but the new routes were near their maximum capacity. The head of telecom company NTT’s operation center at the time estimated that if another cable failed, it would have lost all traffic to the US. With servers for most major internet companies located there, Japan would have effectively lost the internet.
Normally, the sequence of repairs would be determined by whichever cable owner reported the fault first, but given the extraordinary circumstances, the usually self-interested cable owners agreed to defer to KCS. The priority was to repair a cable — any cable — as fast as possible.
It was impossible to know the state of the cables on the ocean floor, so like forensic investigators, Hirai and the other engineers had to work with the sparse facts available. By having the cable landing stations on either side of the ocean beam light down their end of the line and time the reflections back, they were able to locate the faults nearest to them within a few meters. Most of the faults lay in deep water, in the canyons channeling into the Japan Trench. This, plus the timing of the faults, indicated it wasn’t the quake that broke them but the underwater avalanches it triggered.
“It hasn’t changed in 150 years… The Victorians did it that way and we’re doing it the same way.”
Submarine landslides are awesome events whose existence was only discovered in the 1950s, when scientists analyzed the timing of 12 cable faults that severed communication between Europe and North America two decades earlier. Before then, according to oceanographer Mike Clare, “It was assumed that deep water was boring and nothing happens down there.” In fact, the ocean floor is riven with mountains and canyons that experience avalanches that dwarf anything found on land, cascades of sediment and debris racing for hundreds of miles. Hirai had dealt with them in Taiwan in 2006, one of the most notorious events in the annals of cable repair.
On December 26th, an earthquake dislodged sediment on Taiwan’s southern coast and sent it rushing 160 miles into the Luzon Strait, one of several global cable chokepoints. Nine cables were severed and Taiwan was knocked almost entirely offline. Banking, airlines, and communications were disrupted throughout the region. Trading of the Korean won was halted. The cables, buried under mountains of debris, were nearly impossible to find. It took 11 ships, including the Ocean Link, nearly two months to finish repairs.
Often in a multi-cable disaster like the Taiwan earthquake, every ship in the region comes to assist. But with Japan, there was an unprecedented complication: the majority of the faults were located offshore of the ongoing nuclear meltdown at Fukushima. Ship operators deemed assistance too risky, which meant that, for the time being, the Ocean Link was on its own.
The crew felt not only duty bound to work but uniquely capable of doing so. They had dealt with radiation before, though not at this scale. In 1993, shortly before the Ocean Link was to lay a cable linking Japan, Korea, and Russia, they learned the Soviets had dumped radioactive waste in the ocean along the planned route. With some trepidation, KCS proceeded with the job. They bought Geiger counters and protective gear, flew in nurses from the US with chemical weapons training, and scanned the water for radiation as they went. When none was detected, they put the gear in storage.
Now, as they readied the ship for departure, an employee was dispatched to the depot to find the old radiation gear. A local university donated a few more sensors and trained the crew on how to use them.
They decided to begin with the same cable they had just finished repairing when the earthquake struck. On a drizzling afternoon eight days after returning to port, with smoke still rising from the Fukushima power plant, the Ocean Link set back out to sea.
Photo by Go Takayama for The Verge
To the extent he is remembered, Cyrus Field is known to history as the person responsible for running a telegraph cable across the Atlantic Ocean, but he also conducted what at the time was considered an equally great technical feat: the first deep-sea cable repair.
Field, a 35-year-old self-made paper tycoon, had no experience in telegraphy — which helps explain why, in 1854, he embarked on such a quixotic mission. Though small bodies of water like the English Channel had been bridged by telegraph, failure was routine and costly. Cables shorted out, snapped under tension, snagged on rocks, were sliced by anchors, twisted by currents, tangled around whales, attacked by swordfish, and devoured by a “miserable little mollusc” called the Teredo worm with an appetite for jute insulation.
Field fared no better. Twelve years after he began, he had endured severed cables and near sinkings, and had one “success”: a cable laid in 1858 that prompted celebrations so enthusiastic that revelers set fire to New York City Hall. The cable failed weeks later.
The SS Great Eastern attempting to recover the broken transatlantic telegraph cable in 1865.
Sailors coiling the transatlantic telegraph cable aboard the SS Great Eastern in 1865.
Pieces of the first transatlantic telegraph cables and a model of the grapnel used to recover them.
Cyrus West Field, who financed and organized the laying of the first transatlantic telegraph cables.
Field tried again seven years later only for the cable to snap halfway across the Atlantic. The next year, he set out yet again, promising not only to finally lay a working transatlantic cable but to recover the broken cable and finish that one, too.
By that time, a crude method had been developed for fixing cables in shallow water. A ship would drag a hooked grapnel anchor across the seafloor, until, like the tremor of a fishing line, increasing tension showed they’d caught the cable, which they would then haul on board to fix. Field’s plan was basically this but bigger: bigger hooks, stronger rope, more powerful winding engine, all aboard the largest ship afloat, a passenger liner called the SS Great Eastern that had been retrofitted for the mission. William Thomson, the project’s scientific adviser and the future Lord Kelvin, did the math and deemed it feasible.
...
Read the original on www.theverge.com »
The following exchange can be found in the second volume of Letters of Note. Reprinted by kind permission of the Reagan Library. The picture of Ronald Reagan peering into a child’s messy bedroom—which I’m now realising, as I type, is mildly sinister?—is in fact, believe it or not, two photos mashed together. Both are from Getty, who I’m sure won’t mind.
As one would expect, Ronald Reagan was the recipient of thousands of letters each month during his presidency; a mailbag so voluminous, in fact, that a gang of patient volunteers were tasked with opening them all on his behalf and passing him approximately 30 each week to read and respond to. Letters arrived from all over the world, written by a diverse group of people: men, women, fans, critics, average Joes, celebrities, world leaders, and, marking a moment in history, a letter from a 13-year-old boy from South Carolina named Andy Smith, written exactly 40 years ago on 18 April 1984.
My name is Andy Smith. I am a seventh grade student at Irmo Middle School, in Irmo, South Carolina. Today my mother declared my bedroom a disaster area. I would like to request federal funds to hire a crew to clean up my room. I am prepared to provide the initial funds if you will provide matching funds for this project.

I know you will be fair when you consider my request. I will be awaiting your reply.
I’m sorry to be so late in answering your letter but, as you know, I’ve been in China and found your letter here upon my return. Your application for disaster relief has been duly noted but I must point out one technical problem: the authority declaring the disaster is supposed to make the request. In this case, your mother.

However, setting that aside, I’ll have to point out the larger problem of available funds. This has been a year of disasters: 539 hurricanes as of May 4th and several more since, numerous floods, forest fires, drought in Texas and a number of earthquakes. What I’m getting at is that funds are dangerously low.

May I make a suggestion? This Administration, believing that government has done many things that could better be done by volunteers at the local level, has sponsored a Private Sector Initiative Program, calling upon people to practice voluntarism in the solving of a number of local problems.

Your situation appears to be a natural. I’m sure your mother was fully justified in proclaiming your room a disaster. Therefore, you are in an excellent position to launch another volunteer program to go along with the more than 3000 already underway in our nation. Congratulations.

Give my best regards to your mother.
...
Read the original on news.lettersofnote.com »
In my last post, I discussed some work I had done building Nero, the assistant of the future that I’ve always wanted. I ended up creating an end-to-end example which used Nx, OpenAI APIs, and ElevenLabs to create an in-browser home automation assistant. For a first product, it’s decent. Nero is a neat little party trick that I can use to impress my non-tech friends. I am, however, not in this business to impress my friends. I want Nero to actually help me and actually feel like an assistant. My previous version is not that.
One missing piece is the ability to converse naturally without browser interaction. The first implementation of Nero’s “conversational” abilities relied on user interaction with the screen every time we wanted to initiate a response or action. Nero also did not retain any conversational history. In short, Nero was not a great conversational assistant. It was one of the things I wanted to fix; however, I was motivated to do it sooner rather than later after watching an impressive demo from Retell.
The Retell demo implements a conversational agent backed by their WebSocket API in a browser. The demonstration has:
* Impressive filtering (e.g. snapping and other non-voice activity doesn’t seem to throw off the agent)
Their documentation suggests they also have support for backchanneling and intelligent end-of-turn detection—two things that are essential to natural conversational feel but which are very difficult to express programmatically.
I had previously convinced myself that I could implement a passable conversational agent experience in a short amount of time. So that is what I set out to do.
The first thing that needed to change about Nero’s design was the speech-to-text pipeline. My original demonstration relied on an example from Bumblebee which implemented a speech-to-text pipeline using Whisper. The pipeline uses mouse events in a Phoenix LiveView Hook to start and stop recordings before sending them to the server to initiate transcription. If you’re not familiar, Phoenix LiveView is a server-side rendering framework built on top of Elixir. LiveView has support for client-side JavaScript hooks which support bidirectional communication between client and server.
The original speech to text implementation used a hook with an event listener attached to mousedown and mouseup on a button to start and stop recording. After recording stops, the hook decodes the recorded buffer into a PCM buffer, converts the endianness, and then pushes the buffer to the server with an upload. The original hook implements most of the functionality we want; however, we need to make some minor tweaks. Rather than trigger recordings to stop and start on mouse events, we want to trigger recordings to start and stop exactly when a person starts and stops speaking. Simple, right?
My first idea in implementing what I called “always on recording” was to monitor the microphone’s volume, and trigger a recording when the volume reached a certain threshold. The recording would stop when the volume dipped below that threshold. At this point, I learned about getUserMedia. getUserMedia prompts the user for permission to access media devices such as a microphone and/or webcam, and then produces a MediaStream. A MediaStream is a stream of media content containing information about audio and video tracks in the stream. We can use data from the MediaStream to determine speaker activity and thus trigger recordings.
To determine the volume for a given sample, we can use an AnalyserNode. Per the documentation, AnalyserNode is designed for processing generated audio data for visualization purposes, but we can use it to detect spikes in audio.
The hook uses an analyser to repeatedly check whether the microphone’s volume at a given frame exceeds a VOLUME_THRESHOLD. If it does, it checks whether we are already recording and, if not, starts the recording.
After testing a bit, I realized this implementation sucked. Of the many issues with this approach, the biggest is that there are many natural dips in a person’s volume. Checking a single frame doesn’t account for these natural dips. To fix this, I thought it would be a good idea to introduce a timeout which only stopped recording after the volume had stayed below the threshold for a certain amount of time (a SILENCE_TIMEOUT).
This actually ended up working decently, but it required tuning hyperparameters for both VOLUME_THRESHOLD and SILENCE_TIMEOUT. The challenge here is that a higher SILENCE_TIMEOUT introduces additional latency in the transition between a speaker and Nero; however, lower timeouts might be too sensitive for speakers with slower, quieter speaking rhythms. Additionally, a static VOLUME_THRESHOLD does not account for ambient noise. Despite these shortcomings, I found I was able to passably detect a single speaker in a quiet room.
After hooking this up to my existing LiveView and trying some end-to-end conversations, I realized something was significantly off: the transcriptions were wrong, and always at the beginning of the clip. Shorter audio sequences were especially affected. It turns out that the detection algorithm always truncated some amount of audio at the beginning of a clip. When a speaker starts talking, their volume ramps up — it’s not an instantaneous spike. To account for this, I introduced a pre-recording buffer which always tracked the previous 150ms of audio. Once recording started, I would stop the pre-recording buffer and start the actual recording, and eventually splice the two together to send to the server for transcription.
Overall, this actually worked okay. While there are some obvious failure modes, it worked well enough to get a passable demonstration. If you can’t tell by now, I am not an audio engineer. I learned later that this is a very naive attempt at voice activity detection. Later on in this post, I’ll run through some of the improvements I made based on my research into the field of VAD.
The demonstration I built for Nero in my first post already contained the scaffolding for an end-to-end transcription -> response -> speech pipeline. I only needed to make some slight modifications to get the phone call demo to work. End to end, the pipeline looks like this:
When our algorithm detects that speech has stopped, it invokes the stopRecording method. stopRecording takes the recorded audio, does some client-side pre-processing, and uploads it to the server. The server consumes the uploaded entry as a part of LiveView’s normal uploads lifecycle and then invokes an async task to start transcription:
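A minimal sketch of that handler, assuming a stop_recording event and a Nero.SpeechToText module (both names are my assumptions, not necessarily the post’s):

```elixir
def handle_event("stop_recording", _params, socket) do
  # Consume the uploaded entry via LiveView's uploads lifecycle.
  [audio] =
    consume_uploaded_entries(socket, :audio, fn %{path: path}, _entry ->
      {:ok, File.read!(path)}
    end)

  # Kick off transcription without blocking the LiveView process.
  {:noreply, start_async(socket, :transcribe, fn -> Nero.SpeechToText.transcribe(audio) end)}
end
```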
Note that because we did most of the pre-processing client-side, we can just consume the audio binary as an Nx.Tensor, without any additional work. The SpeechToText module implements transcription using Nx.Serving:
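A sketch of what such a serving can look like with Bumblebee (the whisper-tiny checkpoint and serving name are placeholders, and the result field names vary across Bumblebee versions):

```elixir
defmodule Nero.SpeechToText do
  @repo {:hf, "openai/whisper-tiny"}

  def serving do
    {:ok, model} = Bumblebee.load_model(@repo)
    {:ok, featurizer} = Bumblebee.load_featurizer(@repo)
    {:ok, tokenizer} = Bumblebee.load_tokenizer(@repo)
    {:ok, generation_config} = Bumblebee.load_generation_config(@repo)

    # Compiling with EXLA is what later lets the same pipeline run on a GPU.
    Bumblebee.Audio.speech_to_text_whisper(
      model,
      featurizer,
      tokenizer,
      generation_config,
      defn_options: [compiler: EXLA]
    )
  end

  def transcribe(pcm_binary) do
    # The client already produced little-endian f32 PCM, so the binary can
    # back a tensor directly with no additional conversion.
    tensor = Nx.from_binary(pcm_binary, :f32)
    %{chunks: [%{text: text} | _]} = Nx.Serving.batched_run(Nero.SpeechToTextServing, tensor)
    text
  end
end
```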
Nx.Serving is an abstraction in the Elixir Nx ecosystem for serving machine learning models directly in an Elixir application. It implements dynamic batching; encapsulates pre-processing, inference, and post-processing; supports distribution and load-balancing between multiple GPUs natively; and in general is an extremely easy way to serve machine learning models.
After transcription completes, we get an async result we can handle to initiate a response:
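Sketched as a handle_async/3 clause (assign and function names assumed):

```elixir
def handle_async(:transcribe, {:ok, text}, socket) do
  # Append the user's turn to the transcript, then generate and speak a reply.
  history = socket.assigns.messages ++ [%{role: "user", content: text}]
  response_stream = Nero.Agent.respond(history)

  {:noreply, socket |> assign(:messages, history) |> speak(response_stream)}
end
```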
Here Nero.Agent.respond/1 returns an Elixir Stream of text. For my original demonstration, I just used the Elixir OpenAI library to produce a stream from a GPT-3.5 response:
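Roughly like this; the exact call and chunk shapes depend on the client library version, so treat this as a sketch:

```elixir
defmodule Nero.Agent do
  @system_prompt "You are Nero, a helpful voice assistant. Keep replies short and conversational."

  def respond(messages) do
    # Assumes the client returns a lazy stream of parsed chunks when
    # stream: true is set.
    OpenAI.chat_completion(
      model: "gpt-3.5-turbo",
      messages: [%{role: "system", content: @system_prompt} | messages],
      stream: true
    )
    |> Stream.flat_map(fn %{"choices" => choices} ->
      for %{"delta" => %{"content" => content}} <- choices, do: content
    end)
  end
end
```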
The response stream is consumed by speak/2, which implements the text-to-speech pipeline:
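A sketch of speak/2, following the start_async structure above:

```elixir
defp speak(socket, response_stream) do
  start_async(socket, :speak, fn ->
    # stream/1 pushes text to ElevenLabs as it arrives and returns the full
    # response text once the stream has been consumed.
    Nero.TextToSpeech.stream(response_stream)
  end)
end
```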
Where Nero.TextToSpeech.stream/1 uses the ElevenLabs WebSocket API to stream text in and speech out. You can read a bit more about the implementation in my previous post.
Nero.TextToSpeech.stream/1 returns the consumed response as text so we can append it to the chat history after the :speak task finishes.
This is basically all of the scaffolding needed for an end-to-end demo, but I wanted to add a few more features. First, I wanted to support “intelligent” hang-ups. Basically, I wanted to be able to detect when a conversation was finished and stop the recording. To do that, I used Instructor.
Please ignore my wonderfully engineered prompt. This uses GPT-3.5 to determine whether or not a given conversation has ended. After every one of Nero’s turns, we check the transcript to possibly end the call:
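With Instructor, this amounts to an Ecto schema plus a single chat_completion call. A sketch along those lines (the schema, prompt, and helper names are illustrative, not the post’s exact code):

```elixir
defmodule Nero.ConversationState do
  use Ecto.Schema
  use Instructor.Validator

  @primary_key false
  embedded_schema do
    # Whether the conversation has naturally concluded.
    field :ended, :boolean
  end
end

defp conversation_over?(transcript) do
  {:ok, %{ended: ended}} =
    Instructor.chat_completion(
      model: "gpt-3.5-turbo",
      response_model: Nero.ConversationState,
      messages: [
        %{role: "user", content: "Has this conversation naturally ended?\n\n" <> transcript}
      ]
    )

  ended
end
```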
When the check comes back positive, we push a hang_up event to the socket, which stops the recording and then pushes a toggle_conversation event back to the server. toggle_conversation implements the start/stop logic on the server side.
Finally, I wanted to implement information extraction from the transcript. Again I used Instructor, defining an extraction schema and using GPT-3.5 with a rough prompt to pull the necessary details out of the transcript. Then, anytime a conversation ends, we attempt to retrieve the appointment information.
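Sketched with assumed fields:

```elixir
defmodule Nero.Appointment do
  use Ecto.Schema
  use Instructor.Validator

  @primary_key false
  embedded_schema do
    field :caller_name, :string
    field :date, :string
    field :time, :string
    field :reason, :string
  end
end

defp extract_appointment(transcript) do
  Instructor.chat_completion(
    model: "gpt-3.5-turbo",
    response_model: Nero.Appointment,
    messages: [
      %{role: "user",
        content: "Extract the appointment details from this call transcript:\n\n" <> transcript}
    ]
  )
end
```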
Now, this is essentially the exact implementation that produced this demonstration. End to end, this amounted to a couple of hours of work; however, I already had most of the basic scaffold implemented from my previous work on Nero. In my biased opinion, I think my demo is pretty good, but as others have pointed out, Retell’s demo kicks my ass in two areas: latency and conversational experience.
And so, I set out to improve my implementation — starting with latency.
Human conversations have extremely tight “time-to-turn.” In-person conversations are especially rapid because we rely on visual as well as audio signals to determine when it’s our turn to participate in a conversation. The “average” time-to-turn in a conversation can be as quick as 200ms. That means for a conversational agent to feel realistic, it needs an extremely quick turnaround on “time to first spoken word.”
After posting my original demonstration, I already knew there were some very easy optimizations I could make, so I set out to improve the average latency of my implementation as much as possible in a short amount of time. First, I needed at least some method for determining whether an optimization worked. My rudimentary approach was to use JavaScript Performance Timers and logging. Basically, I computed a startTime from the exact moment an audio recording stopped and an endTime from the exact moment an audio output started, and then I logged that time to the console.
This is a very unscientific way of doing business. In the future, I’d like to implement a much more involved profiling and benchmarking methodology. For this process though, it worked well enough.
Next, I considered all of the areas that could introduce latency into the pipeline. From the moment a recording stops, these are all of the steps we take:
Pre-process the recording client-side by converting it to a PCM buffer, and then converting endianness to match the server (if necessary)
Upload the processed buffer to the server
Perform speech-to-text on the buffer to produce a transcription
Generate a text response from the transcription (OpenAI)
Convert the response text to speech (ElevenLabs)
Stream the audio back to the client and play it
That’s a lot of steps that can introduce latency, including potentially 3 (in our case 2 because we own the STT pipeline) network calls.
Next, I wanted to establish a “baseline” of performance. To demonstrate this iterative process, I ran a baseline example on my M3 Mac CPU. Note that this is going to be slow relative to my previous demo, because the previous demo runs on a GPU. The baseline performance I got from the original demo running on my Mac was 4537 ms. A 4.5 second turnaround time. Yikes. Lots of work to do.
To start, I knew that the SILENCE_TIMEOUT used to wait for speech to end was rather long. For the original demo, I used 1000 ms, which basically means a speaker has to stop talking for a full second before we’ll even start the long response process. After some trial and error, I figured 500 ms was a “passable” hyperparameter. After adjusting this down, the latency change was almost exactly correlated to the dip: 4079 ms.
I had a hunch that my text-to-speech pipeline was not efficient. Fortunately, ElevenLabs gives us a nice Latency Guide. The first suggestion is to use their turbo model by specifying eleven_turbo_v2. I set that and we got a slight performance boost: 4014 ms.
Next, they suggest adding optimize_streaming_latency. I set the value to 3 and we get: 3791 ms. Their next suggestion is to use a pre-made voice. I actually didn’t realize until much later that I was not using a pre-made voice so I don’t have a comparison for how that change impacted latency.
Now the guide says to limit closing WebSocket connections. My current implementation opens a connection every time it speaks — which is not good: basically every “turn” has to establish a new WebSocket connection. Additionally, ElevenLabs has a timeout of 20s from when you connect, so you need to send a message at least every 20s. I considered 2 options at this point:
Open a global WebSocket connection, or maybe even a pool of connections, and try to keep the connection alive. But that seems really wasteful, and I don’t think it’s the intended use of their API
Open a WebSocket connection when convo starts. We don’t have to worry about 20s pauses
I decided to go with option 2, but I still think there are some drawbacks and considerations for a production system. The implementation I used opens a WebSocket connection on the first “speak” and stores the connection PID as an assign in the LiveView socket. If you have a system with potentially many concurrent users speaking, you run the risk of creating a potentially unbounded number of connections. A more robust solution would probably use connection pools; however, I’m not really worried about traffic or scaling here.
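A sketch of that bookkeeping, assuming the connection is wrapped in a small process (Nero.TextToSpeech.Connection is a hypothetical module):

```elixir
defp ensure_tts_connection(socket) do
  if socket.assigns[:tts_pid] do
    socket
  else
    # Open the ElevenLabs WebSocket once, on the first speak of the
    # conversation, and reuse it for every subsequent turn.
    {:ok, pid} = Nero.TextToSpeech.Connection.start_link(voice_id: socket.assigns.voice_id)
    assign(socket, :tts_pid, pid)
  end
end
```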
While adding this optimization, I struggled a bit because ElevenLabs would send the first frame back, then cut off. Then I realized that it was waiting to generate because it thought I was going to send more frames. So I needed to “flush” the generation after I finished sending my tokens. This also seemed to fix some unnatural audio problems I was having. After applying this optimization, our time to first spoken word was slightly lower, in the 3700 ms range.
After perusing their docs a bit more, I learned that ElevenLabs will send PCM buffers instead of MP3. Web browsers have to decode MP3 to PCM, which potentially introduces some overhead. One drawback is that you need to be on the independent creator tier to receive PCM instead of MP3. Now, if you’re wondering if I spent $99 to save some milliseconds on a meaningless demo, the answer is absolutely yes I did.
At this point, I believe I’ve exhausted a lot of the “easy” optimizations for TTS latency. One thing that does bother me about the ElevenLabs WebSocket API is that there’s no way to receive binary payloads instead of JSON payloads. This is probably because they send alignment data, but I’m not using the alignment data here. When handling an incoming frame from their API, we have to first decode the JSON and then decode the Base64-encoded audio buffer. I’m not sure what the latency impact is, but I’m sure we could shave some time by avoiding both of these conversions. I also think the Base64 representation results in slightly larger buffers, which could impact network latency.
The next area I looked to improve was the speech-to-text pipeline. I am using Nx.Serving specifically for speech-to-text. The benefit of this approach is that we can avoid an additional network call just for transcription. Of course, that assumes our transcription pipeline can run fast enough on our own hardware. XLA is notoriously slow on CPUs (it’s getting better). The first “optimization” I did was to switch to my GPU: 2050 ms
And that right there is a bitter lesson, because it’s the largest performance boost we’re going to get.
Next, I realized the model wasn’t using F16, which can introduce some solid speed-ups: 1800 ms. Now, there are probably some additional optimizations we could add to Nx and EXLA specifically. For example, we don’t have a flash attention implementation. Of course, XLA does a great job of applying similar optimizations as a baseline, so I’m not sure how much it would help. There are also fast JAX implementations of Whisper that claim up to 70x speed-ups. One issue with a lot of these claimed speed-ups, however, is that they are almost always for long audio sequences. GPUs and TPUs do well with large batch sizes and sequence lengths, but not with batch size 1 and the short sequence lengths we care about in this implementation. One day I may go down the performance hole of fast batch-size-1 transcription, but today is not that day.
At this point, I had moved on to improving some of the failure modes of my demo. While doing so, I learned much more about audio than I had previously known, and realized that the configuration I used to record audio can significantly improve Whisper performance as well. It turns out there’s a nice guide from somebody discussing parameters that work. Specifically, you should use a 16 kHz sample rate for transcriptions. Reducing the sample rate should also reduce network overhead because we have less data, but it could reduce the quality of the transcription. Oh well. Additionally, I realized I wasn’t using a pre-made ElevenLabs voice. After introducing both of these optimizations, I was able to achieve a 1520 ms turn time.
Finally, I realized I was doing all of my benchmarks on a development server. I switched my Phoenix environment from dev to prod and got: 1375 ms. So, with all of these optimizations, we’re sitting at about a 1.3s turnaround time in a conversation. When conversing, it starts to feel somewhat close to natural. I should also point out that this is running over Tailscale, so there is about 100 ms of ping between my Mac and the server running on my GPU. When I run this locally on my GPU, I can consistently get about 1000 ms and sometimes 900 ms turnaround time. Still, unfortunately, this does not match Retell’s latency. According to them, they are able to achieve 800 ms consistently. I have some musings at the end about how this is possible.
I believe the biggest area I could improve the implementation is to use a better VAD implementation that relies on small rolling windows of activity rather than frames. We could probably get away with using 20-30 ms windows, which could theoretically offer a 480 ms latency improvement. I would like to eventually explore this.
In all honesty though, I think that is a significant improvement, and I could probably stop right here and be done with it.
If I were to keep going, I would explore using a local LLM with Nx and Bumblebee. Nx and Bumblebee support LLMs like Mistral and Llama out of the box. And our text generation servings support streaming. That means we could possibly eliminate any network latency to OpenAI, and instead run 2 of the 3 models locally. One issue with this is that Nx currently does not have any quantized inference support (it’s coming, I promise), so my single 4090 is not sufficient to deploy both Whisper and Mistral. Fortunately, the folks at Fly.io were kind enough to give me access to some 80GB A100s. I will post a demo when I get one deployed 🙂
Maybe one day I will implement StyleTTS2 and see how efficient we can get with an entirely local inference pipeline.
Some people pointed out that my original demo did not have the same conversational experience as Retell’s, and they are absolutely right. Aside from latency, mine was prone to failure, picked up system sounds and random noises like keyboard and mouse clicks, and didn’t do well with ambient noise. They also have support for backchanneling, fillers, and interruptions, which introduces some element of “realness” when interacting with their agent.
Now I didn’t get around to adding backchannels or fillers, but I was able to make some slight improvements to the VAD algorithm I used, and I added support for interruptions.
The first failure mode that seems to happen is echo from the system sounds. Nero always records and will start transcribing after audio spikes over a certain threshold. After some digging into the getUserMedia API, I found options for echoCancellation, noiseSuppression, and autoGainControl. This is also the point at which I realized I could specify the microphone sample rate for the optimization I added in the last section. Most of these options are on by default depending on your browser, but I added them explicitly anyway.
Now, that somewhat helped, but Nero still picks up its own audio. This probably requires a more sophisticated solution, but I moved on to the next problem.
The second obvious failure mode is that it picks up keyboard clicks, and the silence timeout is hard to tune. My first attempt to fix this was to “ignore” large spikes in audio by “smoothing” the volume at each frame rather than reacting to the raw instantaneous value.
Then, with some advice from Paulo Valente, I implemented a biquad filter with a low- and high-pass in order to restrict the audio to the frequency range of human speech.
In practice, both of these solutions actually seemed to work decently, but they could absolutely be better. I know it’s possible to improve the client-side filtering using a rolling window that looks at the energy of the speaking frequencies relative to the energy of the entire sample. But there are also machine learning models that perform VAD and have 1ms inference times. I realized that it’s probably quicker to just send all of the data over the WebSocket in chunks and perform VAD on the server. I’ll discuss that implementation a little later.
Next, I wanted to add support for interruptions. In the Retell example, the speaker will cut off mid-speech if it detects that you are speaking. To implement this feature in Nero, I added a pushEvent to the Microphone hook which pushes an interrupt event to the server anytime speech is detected.
The server handles this event and broadcasts a message to the TTS channel to stop speaking, and the channel handles it by clearing out the output audio stream and queue.
Unfortunately, this does create a race condition. There’s a potential situation where a speaker interrupts and the speaking queue gets cleared on the client, but ElevenLabs is still streaming audio back to the server. The server is always going to just broadcast this info to the client, and as is, the client will process it. This potentially creates weird continuations in the audio. To get around this, I refactored the TTS implementation so that each audio broadcast appends a 6-digit token to the payload. Then, all we need to do is keep the token in sync between the client and server. On the client, when processing the audio queue, it simply checks whether or not the token at the beginning of the payload matches, and if it doesn’t, it ignores that sample.
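On the server side, that scheme might look roughly like this (helper and topic names are assumptions):

```elixir
# Each interruption rotates the token. Audio still in flight from before the
# interrupt carries the old token, so the client silently drops it.
def handle_event("interrupt", _params, socket) do
  token = random_token()
  Phoenix.PubSub.broadcast(Nero.PubSub, "tts:#{socket.id}", {:set_token, token})
  {:noreply, assign(socket, :audio_token, token)}
end

defp broadcast_audio(socket, audio_chunk) do
  # Prefix every audio payload with the current 6-digit token.
  payload = socket.assigns.audio_token <> audio_chunk
  Phoenix.PubSub.broadcast(Nero.PubSub, "tts:#{socket.id}", {:audio, payload})
  socket
end

defp random_token do
  (:rand.uniform(1_000_000) - 1)
  |> Integer.to_string()
  |> String.pad_leading(6, "0")
end
```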
The limitation with this implementation is it does not update the chat transcript. It’s entirely possible because we have access to the alignment information from ElevenLabs, but I just didn’t implement it at this time.
Another thing the Retell demo has support for is cues and hang-ups after a duration of silence. If you are silent for too long, you’ll get a cue from the AI speaker asking you if you’re still there. After another duration of silence, it will hang up. This is something that’s pretty easy to do with LiveView and Process.send_after/4:
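A sketch of the timer logic (the interval, assigns, and nudge message are assumptions):

```elixir
@nudge_ms 10_000

defp schedule_nudge(socket) do
  # Cancel any pending nudge before scheduling a new one.
  if timer = socket.assigns[:nudge_timer], do: Process.cancel_timer(timer)
  assign(socket, :nudge_timer, Process.send_after(self(), :nudge, @nudge_ms))
end

def handle_info(:nudge, socket) do
  if socket.assigns[:nudged?] do
    # Second stretch of silence: hang up.
    {:noreply, push_event(socket, "hang_up", %{})}
  else
    # First stretch of silence: ask whether the caller is still there.
    socket =
      socket
      |> speak(["Are you still there?"])
      |> assign(:nudged?, true)
      |> schedule_nudge()

    {:noreply, socket}
  end
end
```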
And then we can cancel the timer anytime we receive a transcription, and restart it after every speaking turn. Note that we can’t depend on the Phoenix speak async task ending as the trigger to send nudges. Instead, we need to push an event from the speaker hook when the audio has ended. This avoids a case where the speaker initiates a really long speech which overlaps with the nudge_ms duration. Now, we can control the number of nudges with an assign; in my case, I just used a boolean.
Somewhere along the line, I realized that my attempts at engineering solid VAD client-side were never going to deliver the experience I wanted. I discussed it a bit with Andres Alejos, and he found a Silero VAD model which is capable of performing VAD in 1ms on a single CPU thread. They also had an ONNX model—and we have a library in the Elixir ecosystem called Ortex which allows us to execute ONNX models.
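Loading and invoking the model with Ortex looks roughly like this (the input and state shapes follow the Silero VAD model card and should be treated as assumptions):

```elixir
defmodule Nero.VAD do
  @sample_rate 16_000

  def load, do: Ortex.load("priv/models/silero_vad.onnx")

  # Silero VAD is recurrent: each call takes the audio window plus the
  # previous {h, c} state and returns a speech probability and the next state.
  def initial_state do
    zeros = Nx.broadcast(Nx.tensor(0.0, type: :f32), {2, 1, 64})
    {zeros, zeros}
  end

  def speech?(model, pcm_window, {h, c}) do
    input = Nx.reshape(pcm_window, {1, Nx.size(pcm_window)})
    sr = Nx.tensor(@sample_rate, type: :s64)

    {prob, hn, cn} = Ortex.run(model, {input, sr, h, c})

    {Nx.to_number(prob[0][0]) > 0.5, {hn, cn}}
  end
end
```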
To accommodate the new VAD model, I ended up re-implementing the original LiveView as a WebSocket server. This actually works out well, because the WebSocket server is generic and can be consumed by any language with a WebSocket client. The implementation is also relatively simple, and easily expanded to accommodate other LLM, TTS, and STT models. The WebSocket implementation has low latency (when running on a GPU) and supports interrupts.
You can find the project on my GitHub as well as an example using the server.
The final implementation I ended up with still does not match the quality of the Retell demo. That said, I think it’s a solid start for future work. I believe I acted with some hubris when first posting about this project, and I would like to say that Retell’s work should not be understated. I can appreciate the attention to detail that goes into making an effective conversational agent, and Retell’s demo shows they paid a lot of attention to the details. Kudos to them and their team.
I will also admit that my demo is playing to one benchmark. I’m optimizing the hell out of latency to support a single user—me. I think this solution would change if it needed to accommodate multiple concurrent users.
Retell’s website claims they have a conversation orchestration model under the hood to manage the complexities of conversation. I had my doubts about that going into this, but I believe it now. Whether this is actually a single model or a series of models for VAD, adding backchannels, and so on, I’m not sure. I think eventually it will be a single model, but I’m not sure it is now, which leads me to my next point.
While doing all of these optimizations, I could not help but think that it will eventually be all for naught. Not because I don’t think people will find it useful, but because large models trained on lots of data simply seem to always beat engineering effort. I believe the future of this area of work is in joint models. I think the only way to achieve real-time conversations is to merge parts of the stack. I predict in less than a year we will see an incredibly capable joint speech/text model. I recently saw a large audio model called Qwen-Audio that I believe is similar to what I envision.
Specifically, if somebody were kind enough to give me some money to throw at this problem, here is exactly what I would do:
Generate an Alpaca-style and/or LLaVA-style dataset of synthetic speech. Note that it would require a bit of pre-processing to change Alpaca inputs to mirror a style compatible with spoken word. I would use ElevenLabs to generate the dataset in multiple voices. Of course, this dataset would be a bit too “clean,” so we’d need to apply some augmentations which add ambient noise, change speaking pitch and speed, etc. Bonus points: adding samples of “noise” which require no response, to merge the VAD part of the pipeline in as well. You can even throw in text prompts that dictate when and when not to respond, to support things like wake word detection without needing to train a separate model.
Create a LLaVA-style model with a Whisper or equivalent base, an LLM, and a projection layer.
Secure H100s, train model, and “turn H100s into $100s” (thank you @thmsmlr)
If you want to give me some $$$, my e-mail is smoriarity.5@gmail.com 🙂
I believe we are also close to just having full-on speech-to-speech models. A specific challenge I can see in creating these models is coming up with a high-quality dataset. I think if you make a deliberate attempt at “recording conversations” for the purposes of training, you will probably end up with a lower-quality dataset. People tend to change their behavior under observation. Additionally, conversations from movies and TV shows aren’t actually very natural. Even some podcasts have an unnatural conversational rhythm.
While watching Love is Blind with my fiancé, I realized you could probably get a decent amount of quality data from reality TV shows. The conversations in reality TV are overly dramatic and chaotic, but are (I think) closer to realistic than anything else.
I do wonder what a solid RAG implementation looks like on top of a conversational agent. RAG and complex CoT pipelines will introduce latency, which could deteriorate the conversational experience. However, there are clever ways you can hide this. In conversations between humans that require “search,” e.g. scheduling an appointment, you’ll often have one party say “one moment please” before performing a system search. Building something like that in is entirely possible. Additionally, if your agent requires information up front about an individual, it’s possible to include that in the initial prompt.
I was very excited for this problem in particular because it’s literally the perfect application of Elixir and Phoenix. If you are building conversational agents, you should seriously consider giving Elixir a try. A large part of how quick this demo was to put together is because of how productive Elixir is.
This was a fun technical challenge. I am pleased with the performance of the final demonstration. I’m also happy I was able to OSS a small library for others to build off of. If you are interested in conversational agents, I encourage you to check it out, give feedback, and contribute! I know it’s very rough right now, but it will get better with time.
Additionally, I plan to periodically build out the rest of the Nero project, so please follow me on Twitter if you’d like to stay up to date.
...
Read the original on seanmoriarity.com »
Every JRMF puzzle is hands-on, play-based, and standards-aligned.
We design our activities to have a low floor so that anyone can find a way to engage and a high ceiling so that everyone can find a meaningful challenge.
All of our puzzles come with free festival guides that help you use our activities at home, in the classroom, or during math festivals.
Can you be the first player to get 3-in-a-row?
After a long day of picking apples, can you eat the last, juiciest apple?
Embrace your inner pool shark and try to make every shot.
Can you remove all the dots?
Can you build a bridge that connects the stars?
Can you make the longest line?
Help all the chameleons on the island change into the same color.
Can you make the perfect chocolate box for your customers?
Can you avoid taking the yucky chocolate piece?
Create colorful loops so that you can always find your way back to the beginning of the path.
Be the player to take the last token!
Crack the secret code in as few guesses as you can!
Can you find a way to stack every cup?
Use every digit of your numbers to solve these unique and challenging number puzzles.
Can you tile the board using each domino exactly once?
Can you trace each doodle without lifting your marker?
Can you make a better die than your partner?
Can you help the frogs and toads get to the other side of the pond?
Can you split up each puzzle so that more than half of the groups are purple groups?
Can you gracefully label these pumpkin patches?
Learn the mysteries of the simple yet surprising hexaflexagon, and take one home to show your friends!
Can you jump your way from start to finish?
Everything is made up of one continuous line. Can you figure out what belongs in town?
Help as many ladybugs as possible land on the leaves.
Fill your entire garden with plots of carrots.
Can you arrange 5 numbers into a Magic Flower in which every group of three numbers in a line adds up to the same magic number?
Can you color each map so that no two neighboring states are colored the same color?
Can you find all the places where the Meeple can live?
Race your friends to find the patterns in these Halloween-themed SET cards!
Can you make each mosaic?
Can you make palindromes in the fewest number of swaps?
Among all the counterfeits there is only one true gold bar. Can you find it?
Can you find a way to cover the shapes with pentominoes?
Can you build a polyhedron using the shapes in each puzzle?
Learn how doctors help patients using pool testing through this engaging game.
Armed with only your understanding of prime numbers, play against your friends to find the best way to fill in our colorful cube.
Help your wolf, goat, and cabbage cross the river.
Race your friends to get the rook to the end of the chessboard. Can you find a way to win every time?
Can you make the pattern in the puzzle by rotating the cube?
Help the city planner build up the city with new skyscrapers.
Smiling is contagious! Can you turn every frown upside down?
Take turns drawing curves to connect dots. Can you be the one to draw the last curve?
Can you place the stars on the grid?
Can you find a way to place the next stepping stone?
Can you cover the entire square with pattern blocks?
How many toads can you help sunbathe by balancing them on a lilypad?
Can you make the right number of squares out of toothpicks?
Can you make the right number of triangles out of toothpicks?
Can you solve these puzzles based on an ancient legend, golden disks, and the end of the world?
Keep your sheep safe from the wolves!
...
Read the original on jrmf.org »
Google has fired 28 employees over their participation in a 10-hour sit-in at the search giant’s offices in New York and Sunnyvale, California, to protest the company’s business ties with the Israeli government, The Post has learned.
The pro-Palestinian staffers — who had donned traditional Arab headscarves as they stormed and occupied the office of a top executive in California on Tuesday — were terminated late Wednesday after an internal investigation, Google vice president of global security Chris Rackow said in a companywide memo.
“They took over office spaces, defaced our property, and physically impeded the work of other Googlers,” Rackow wrote in the memo obtained by The Post. “Their behavior was unacceptable, extremely disruptive, and made co-workers feel threatened.”
In New York, protesters had occupied the 10th floor of Google’s offices in the Chelsea section of Manhattan as part of a protest that also extended to the company’s offices in Seattle for what it called “No Tech for Genocide Day of Action.”
“Behavior like this has no place in our workplace and we will not tolerate it,” Rackow wrote. “It clearly violates multiple policies that all employees must adhere to — including our code of conduct and policy on harassment, discrimination, retaliation, standards of conduct, and workplace concerns.”
Rackow added that the company “takes this extremely seriously, and we will continue to apply our longstanding policies to take action against disruptive behavior — up to and including termination.”
The fired staffers are affiliated with a group called No Tech For Apartheid, which has been critical of Google’s response to the Israel-Hamas war.
The group had posted several videos and livestreams of the protests on its X account — including the exact moment that employees were issued final warnings and arrested by local police for trespassing.
The protesters have demanded that Google pull out of a $1.2 billion “Project Nimbus” contract — in which Google Cloud and Amazon Web Services provide cloud-computing and artificial intelligence services for the Israeli government and military.
Critics at the company raised concerns that the technology would be weaponized against Palestinians in Gaza.
The impacted workers blasted Google over the firings in a statement shared by No Tech For Apartheid spokesperson Jane Chung.
“This evening, Google indiscriminately fired 28 workers, including those among us who did not directly participate in yesterday’s historic, bicoastal 10-hour sit-in protests,” the workers said in the statement.
“This flagrant act of retaliation is a clear indication that Google values its $1.2 billion contract with the genocidal Israeli government and military more than its own workers — the ones who create real value for executives and shareholders.”
“Sundar Pichai and Thomas Kurian are genocide profiteers,” the statement added, referring to Google’s CEO and the CEO of its cloud unit, respectively.
“We cannot comprehend how these men are able to sleep at night while their tech has enabled 100,000 Palestinians killed, reported missing, or wounded in the last six months of Israel’s genocide — and counting.”
An NYPD spokesperson said the Tuesday protest “involved approximately 50 participants” in total and confirmed “four arrests were made for trespassing inside the Google building.”
The Sunnyvale Department of Public Safety said the protest in California “consisted of around 80 participants.” A total of five protesters who refused to leave the Google office were “arrested without incident for criminal trespassing,” booked and released, a spokesperson added.
It couldn’t immediately be learned if all nine arrested employees were among those who were fired. Google had earlier placed the employees on administrative leave and cut their access to internal systems.
Last month, Google fired a software engineer who publicly blasted one of the company’s Israel-based executives during a tech conference in New York City.
When reached for comment, a Google spokesperson confirmed the firings.
“These protests were part of a longstanding campaign by a group of organizations and people who largely don’t work at Google,” the spokesperson said in a statement.
“A small number of employee protesters entered and disrupted a few of our locations. Physically impeding other employees’ work and preventing them from accessing our facilities is a clear violation of our policies, and completely unacceptable behavior.”
“We have so far concluded individual investigations that resulted in the termination of employment for 28 employees, and will continue to investigate and take action as needed,” the spokesperson added.
The demonstrators stormed the personal office of Google Cloud CEO Thomas Kurian in Sunnyvale.
Kurian’s custom-made, framed Golden State Warriors jersey was visible on the office wall in the background of the livestream, and employees wrote a list of their demands on his white board.
The companywide memo can be read in its entirety below.
You may have seen reports of protests at some of our offices yesterday. Unfortunately, a number of employees brought the event into our buildings in New York and Sunnyvale. They took over office spaces, defaced our property, and physically impeded the work of other Googlers. Their behavior was unacceptable, extremely disruptive, and made co-workers feel threatened. We placed employees involved under investigation and cut their access to our systems. Those who refused to leave were arrested by law enforcement and removed from our offices.
Following investigation, today we terminated the employment of twenty-eight employees found to be involved. We will continue to investigate and take action as needed.
Behavior like this has no place in our workplace and we will not tolerate it. It clearly violates multiple policies that all employees must adhere to — including our Code of Conduct and Policy on Harassment, Discrimination, Retaliation, Standards of Conduct, and Workplace Concerns.
We are a place of business and every Googler is expected to read our policies and apply them to how they conduct themselves and communicate in our workplace. The overwhelming majority of our employees do the right thing. If you’re one of the few who are tempted to think we’re going to overlook conduct that violates our policies, think again. The company takes this extremely seriously, and we will continue to apply our longstanding policies to take action against disruptive behavior — up to and including termination.
You should expect to hear more from leaders about standards of behavior and discourse in the workplace.
...
Read the original on nypost.com »