10 interesting stories served every morning and every evening.




1 534 shares, 29 trendiness, 865 words and 9 minutes reading time

16 million Americans will vote on hackable paperless machines

Despite the obvious risk and years of warnings, at least eight American states and 16 million American voters will use completely paperless machines in the 2020 US elections, a new report by New York University’s Brennan Center for Justice found.

Paperless voting machines persist despite a strong consensus among US cybersecurity and national security experts that paper ballots and vote audits are necessary to ensure the security of the next election. The Brennan Center report points to the Senate Intelligence Committee investigation of Russian interference in the 2016 election, which also recommends paper ballots for security and verification.

America’s largest election technology company, Election Systems & Software, announced earlier this year that it would stop selling paperless machines. ES&S spokeswoman Katina Granger said the entire industry should follow suit.

“Using a physical paper record sets the stage for all jurisdictions to perform statistically valid post-election audits,” Granger told MIT Technology Review. “We believe that requiring a paper record for every voter would be a valuable step in securing America’s elections.”

ES&S, the largest election tech vendor in the country, covers 44% of American voters, a 2016 report by the University of Pennsylvania’s Wharton School found. Dominion Voting Systems covers 37% of voters, and Hart InterCivic 11%. Both still sell paperless voting machines.

Senator Ron Wyden, a leading Capitol Hill voice on election security, has persistently pushed legislation that would federally mandate paper ballots, among other security measures. The legislation has been blocked by the Republican Senate majority leader, Mitch McConnell.

“Selling a paperless voting machine is like selling a car without brakes—something is going to go terribly wrong,” Wyden says. “It is obvious that vendors won’t do the right thing on security by themselves. Congress needs to set mandatory federal election security standards that outlaw paperless voting machines and guarantee every American the right to vote with a hand-marked paper ballot. Experts agree that hand-marked paper ballots and post-election audits are the best defense against foreign hacking. Vendors should recognize that fact or get out of the way.”

Chris Krebs, the top cybersecurity official in the Department of Homeland Security, said last week that paper ballot backups are needed in 2020. Congressional and law enforcement officials have repeatedly said that foreign powers are poised to interfere in the upcoming American elections.

Most of the states using completely paperless machines in 2020 are not historically battleground states, which are seen as the most valuable targets for interference and impact. Texas, Louisiana, Tennessee, Mississippi, Kansas, Indiana, Kentucky, and New Jersey will use paperless machines.

Some of those states, however, are more closely contested than usual. Texas in particular has been turning increasingly purple. In 2018, Democratic Senate candidate Beto O’Rourke stunned national observers by putting up a close race against Republican Ted Cruz. If Texas starts producing more close races, its persistence in using paperless machines—the result of a stalled legislative battle over election security—will make it a more tempting target.

“If my goal is to throw another US election, I have limited resources. I can’t attack everywhere. I want to focus on battleground states where even a small push can have a large impact,” says Dan Wallach, a computer science professor at Rice University and a member of the Technical Guidelines Development Committee of the US Election Assistance Commission.

Security experts overwhelmingly agree that paper ballots and risk-limiting audits are necessary to secure elections in the 21st century. Emblematic of that consensus is a 2018 report from the National Academies of Sciences, Engineering, and Medicine titled “Securing the Vote: Protecting American Democracy.” Its conclusions: elections need human-readable ballots and mandated audits before election results are certified.

Elections are managed primarily by US states as opposed to the federal government. Technical and security requirements vary by state and even by local government. Despite slow movement and resistance from some corners of the country, progress away from paperless machines has been significant: 16 million Americans will vote on paperless machines in 2020, compared with 27.5 million who did so in 2016, according to the Brennan Center report.

Paper backups, however, are not a silver bullet for election security. Security experts say paper ballots are so important precisely because they make subsequent audits possible, yet 17 of the 42 states requiring paper do not require audits.

“The name of the game for election officials is to produce election results that are convincing to the loser,” Wallach says. “The winner is happy to win; the loser requires evidence. In the face of propaganda and interference, you have to produce convincing results—convincing the loser to say, ‘Yeah, I lost.’ This is why people like paper. Once paper has been printed, it’s hard for someone like Vladimir Putin to reach out over the internet and change what’s already been printed on dead trees.”

Audits offer a high chance of detecting incorrect outcomes with statistical efficiency, the National Academies of Sciences concluded.

The experts recommended that all voting machines use human-readable paper ballots and that machines that cannot be independently audited be removed from elections right away.

...

Read the original on www.technologyreview.com »

2 418 shares, 22 trendiness, 777 words and 8 minutes reading time

Researcher publishes second Steam zero day after getting banned on Valve's bug bounty program

A Russian security researcher has published details about a zero-day in the Steam gaming client. This is the second Steam zero-day the researcher has made public in the past two weeks.

However, while the security researcher reported the first one to Valve and tried to have it fixed before public disclosure, he said he couldn’t do the same with the second because the company banned him from submitting further bug reports via its public bug bounty program on the HackerOne platform.

The entire chain of events behind the public disclosure of these two zero-days has caused quite a bit of drama and discussion in the infosec community.

All the negative comments have been aimed at Valve and the HackerOne staff, with both being accused of unprofessional behavior.

Security researchers and regular Steam users alike are mad because Valve refused to acknowledge the reported issue as a security flaw, and declined to patch it.

When the security researcher — named Vasily Kravets — wanted to publicly disclose the vulnerability, a HackerOne staff member forbade him from doing so, even though Valve had no intention of fixing the issue — effectively trying to prevent the researcher from letting users know there was a problem with the Steam client at all.

Kravets did eventually publish details about the Steam zero-day, which was an elevation of privilege (also known as a local privilege escalation) bug that allowed other apps or malware on a user’s computer to abuse the Steam client to run code with admin rights.

Kravets said he was banned from the platform following the public disclosure of the first zero-day. His bug report was heavily covered in the media, and Valve did eventually ship a fix, more as a reaction to all the bad press the company was getting.

The patch was almost immediately proven insufficient, and another security researcher found an easy way around it right away.

Furthermore, Matt Nelson, a well-known and highly respected security researcher, revealed that he had found the exact same bug, albeit after Kravets, and reported it to Valve’s HackerOne program as well, only to go through a similarly bad experience.

Nelson said Valve and HackerOne took five days to acknowledge the bug, refused to patch it, and then locked the bug report when Nelson wanted to disclose the bug publicly and warn users.

Nelson later released proof-of-concept code for the first Steam zero-day, and also criticized Valve and HackerOne for their abysmal handling of his bug report.

Today, Kravets published details about a second Valve zero-day, another EoP/LPE in the Steam client that allows malicious apps to gain admin rights through Valve’s Steam app. Demos of the second Steam zero-day are embedded below, and a technical write-up is available on Kravets’ site.

A Valve spokesperson did not reply to a request for comment, but the company rarely comments on security issues.

All of Valve’s problems seem to come from the fact that the company has placed EoP/LPE vulnerabilities as “out-of-scope” for its HackerOne platform, meaning the company doesn’t view them as security issues.

Nelson, a security researcher who has made a name for himself by finding a slew of interesting bugs in Microsoft products, doesn’t agree with Valve’s decision.

EoP/LPE vulnerabilities can’t allow a threat actor to hack a remote app or computer. They are vulnerabilities abused during post-exploitation, mostly so attackers can take full control over a target by gaining root/admin/system rights.

While Valve doesn’t consider these security flaws, everyone else does. For example, Microsoft patches tens of EoP/LPE flaws each month, and OWASP considers EoP/LPE the fifth most dangerous security flaw in its infamous Top 10 Vulnerabilities list.

By refusing to patch the first zero-day, Valve inadvertently sent the message that it doesn’t care about the security of its product, putting the company’s 100+ million Windows users in danger just by having the Steam client installed on their computers.

Sure! Valve is right, in its own way. An attacker can’t use an EoP/LPE to break into a Steam user’s client. That’s a fact. But that’s not the point.

When users install the Steam client on their computers, they also don’t expect the app to be a launching pad for malware or other attacks.

An app’s and its users’ security is about more than remote code execution (RCE) bugs. Otherwise, if EoP/LPE bugs weren’t a big deal, nobody else would bother patching them either.

...

Read the original on www.zdnet.com »

3 376 shares, 29 trendiness, 173 words and 2 minutes reading time

Should the Electoral College Be Eliminated? 15 States Are Trying to Make It Obsolete

The man who helped invent scratch-off lottery tickets now has his sights set on a bigger prize: overhauling the way the United States elects presidents.

On Tuesday, Nevada became the latest state to pass a bill that would grant its electoral votes to whoever wins the popular vote across the country, not just in Nevada. The movement is the brainchild of John Koza, a co-founder of National Popular Vote, an organization that is working to eliminate the influence of the Electoral College.

If Nevada’s governor signs the bill, the state will become the 15th — plus the District of Columbia — to join an interstate pact of states promising to switch to the new system. Those states, including Nevada, have a total of 195 electoral votes. The pact would take effect once enough states have joined to guarantee the national popular-vote winner the 270 electoral votes needed to win the presidency.

Enforcement, however, could be very difficult without congressional approval, according to constitutional law experts. And the pact would be highly vulnerable to legal challenges, they say.

...

Read the original on www.nytimes.com »

4 362 shares, 16 trendiness, 331 words and 4 minutes reading time

Mozilla takes action to protect users in Kazakhstan – The Mozilla Blog

Today, Mozilla and Google took action to protect the online security and privacy of individuals in Kazakhstan. Together the companies deployed technical solutions within Firefox and Chrome to block the Kazakhstan government’s ability to intercept internet traffic within the country.

The response comes after credible reports that internet service providers in Kazakhstan have required people in the country to download and install a government-issued certificate on all devices and in every browser in order to access the internet. This certificate is not trusted by either of the companies, and once installed, allowed the government to decrypt and read anything a user types or posts, including intercepting their account information and passwords. This targeted people visiting popular sites Facebook, Twitter and Google, among others.

“People around the world trust Firefox to protect them as they navigate the internet, especially when it comes to keeping them safe from attacks like this that undermine their security. We don’t take actions like this lightly, but protecting our users and the integrity of the web is the reason Firefox exists.” — Marshall Erwin, Senior Director of Trust and Security, Mozilla

“We will never tolerate any attempt, by any organization—government or otherwise—to compromise Chrome users’ data. We have implemented protections from this specific issue, and will always take action to secure our users around the world.” — Parisa Tabriz, Senior Engineering Director, Chrome

This is not the first attempt by the Kazakhstan government to intercept the internet traffic of everyone in the country. In 2015, the Kazakhstan government attempted to have a root certificate included in Mozilla’s trusted root store program. After it was discovered that they were intending to use the certificate to intercept user data, Mozilla denied the request. Shortly after, the government forced citizens to manually install its certificate, but that attempt failed after organizations took legal action.

Each company will deploy a technical solution unique to its browser. For additional information on those solutions, please see the links below.
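
Conceptually, both solutions amount to shipping a blocklist of certificate fingerprints and refusing any connection whose chain contains the blocked government CA (Firefox distributes such blocks through its OneCRL mechanism). A minimal sketch of that idea in Rust, where the fingerprints, names, and logic are illustrative assumptions rather than the browsers’ actual code:

use std::collections::HashSet;

// Sketch only: reject a certificate chain if any certificate in it
// appears on a blocklist of SHA-256 fingerprints. The hashes below
// are placeholders, not the real certificate's fingerprint.
fn chain_is_blocked(chain: &[&str], blocklist: &HashSet<&str>) -> bool {
    chain.iter().any(|fingerprint| blocklist.contains(fingerprint))
}

fn main() {
    let blocklist: HashSet<&str> = ["aa11...placeholder-ca-hash"].into_iter().collect();
    let presented_chain = ["bb22...leaf-hash", "aa11...placeholder-ca-hash"];

    if chain_is_blocked(&presented_chain, &blocklist) {
        eprintln!("connection rejected: chain contains a blocked certificate");
    } else {
        println!("chain is not blocked; continue normal validation");
    }
}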

...

Read the original on blog.mozilla.org »

5 346 shares, 26 trendiness, 1958 words and 17 minutes reading time

Thoughts on Rust bloat

I’m about to accept a PR that will increase druid’s compile time about 3x and its executable size almost 2x. In this case, I think the tradeoff is worth it (without localization, a GUI toolkit is strictly a toy), but the bloat makes me unhappy and I think there is room for improvement in the Rust ecosystem.

For me, bloat in Rust is mostly about compile times and executable size. Compile time is on the top 10 list of bad things about the Rust development experience, but to some extent it’s under the developer’s control, especially by choosing whether or not to take dependencies on bloated crates.

Bloat is an endemic problem in software, but there are a few things that make it a particular challenge for Rust:

* Cargo makes it so easy to just reach for a crate.

One of the subtler ways compile times affect the experience is in tools like RLS.

It’s going to vary from person to person, but I personally do care a lot. One of my hopes for xi-editor is that the core would be lightweight, especially as we could factor out concerns like UI. However, the release binary is now 5.9M (release build, Windows, and doesn’t include syntax coloring, which is an additional 2.1M). I’ve done a bunch of other things across the Rust ecosystem to reduce bloat, and I’ll brag a bit about that in this post.

Of course, the reason why I’m considering such a huge jump in compile times on druid is that I want localization, an important and complex feature. Doing it right requires quite a bit of logic around locale matching, Unicode, and natural language processing (such as plural rules). I don’t expect a tiny crate for this.

One recent case where we saw a similar tradeoff was the observation that the unicase dep adds 50k to the binary size for pulldown-cmark. In this case, the CommonMark spec demands Unicode case-folding, and without that, it’s no longer complying with the standard. I understand the temptation to cut this corner, but I think having versions out there that are not spec-compliant is a bad thing, especially unfriendly to the majority of people in the world whose native language is other than English.

So, it’s important not to confuse lean engineering with a lack of important features. I would say bloat is unneeded resource consumption beyond what’s necessary to meet the requirements. Unicode and internationalization are a particularly contentious point, both because they actually do require code and data to get right, but also because there’s a ton of potential for bloat.

I would apply a higher standard to “foundational” crates, which are intended to be used by most Rust applications that need the functionality. Bloat in those is a reason not to use the dependency, or to fragment the ecosystem into different solutions depending on needs and tolerance for bloat.

I think a particular risk is crates providing generally useful features, ones that would definitely make the cut in a “batteries included” language. Some of these (bitflags, lazy_static, cfg-if, etc) are not very heavy, and provide obvious benefit, especially to make the API more humane. For others (rental, failure), the cost is higher and I would generally recommend not using them in foundational crates. But for your own app, if you like them, sure. I believe rental might be the most expensive transitive dependency for fluent, as I find it takes 27.3s (debug, Windows; 53.2s for release) for the crate alone.

I’m concerned about bloat in gfx-rs - about a minute for a debug build, and about 3M (Windows, quad example). For this reason (and stability and documentation), I’m leaning towards making the GPU renderers for piet use the underlying graphics APIs directly rather than using this abstraction layer. I’ve found similar patterns with other “wrapper” crates, including direct2d. But here the tradeoffs are complex.

I don’t have hard numbers yet, but I’ve found that the rust-objc macros produce quite bloated code, on the order of 1.5k per method invocation. This is leading me to consider rewriting the macOS platform binding code in Objective-C directly (using C as the common FFI is not too bad), rather than relying on Rust code that uses the dynamic Objective-C runtime. I expect bloat here to affect a fairly wide range of code that calls into the macOS (and iOS) platform, so it would be a good topic to investigate more deeply.

I sometimes hear that it’s ok to depend on commonly-used crates, because their cost is amortized among the various users that share them. I’m not convinced, for a variety of reasons. For one, it’s common that you get different versions anyway (the Zola build currently has two versions each of unicase, parking_lot, parking_lot_core, crossbeam-deque, toml, derive_more, lock_api, scopeguard, and winapi). Second, if generics are used heavily (see below), there’ll likely be code duplication anyway.

That said, for stuff like Unicode data, it is quite important that there be as few copies as possible in the binary. The best choice is crates engineered to be lean.

A particularly contentious question is proc macros. The support crates for these (syn and quote) take maybe 10s to compile, and don’t directly impact executable size. It’s a major boost to the expressivity of the Rust language, and we’ll likely use them in druid, though we have been discussing making them optional.

What I’d personally like to see is proc macros stabilize more and then be adopted into the language.

Digging into xi-editor, the biggest single source of bloat is serde, and in general the fact that it serializes everything into JSON messages. This was something of an experiment, and in retrospect I would say one of the things I’m most unhappy about. It seems that efficient serialization is not a solved problem yet. [Note also that JSON serialization is extremely slow in Swift.]

The particular reason serde is so bloated is that it monomorphizes everything. There are alternatives; miniserde in particular yields smaller binaries and compile times by using dynamic dispatch (trait objects) in place of monomorphization. But it has other limitations and so hasn’t caught on yet.
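
To make the monomorphization point concrete, here is a minimal Rust sketch (illustrative only, not serde’s or miniserde’s actual code). The generic function is compiled once per concrete type it is used with, while the trait-object version compiles to a single shared body at the cost of a vtable indirection:

use std::fmt::Display;

// Monomorphized: the compiler emits a separate copy of this function
// for every concrete T it is called with (here &str, i64, and f64),
// which multiplies code size across a large API surface.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// Dynamic dispatch: one compiled body shared by all callers, the
// miniserde-style tradeoff mentioned above.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}

fn main() {
    // Three separate instantiations of describe_generic...
    println!("{}", describe_generic("hello"));
    println!("{}", describe_generic(42_i64));
    println!("{}", describe_generic(3.14_f64));

    // ...versus one shared body for describe_dyn.
    println!("{}", describe_dyn(&"hello"));
    println!("{}", describe_dyn(&42_i64));
}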

In general, overuse of polymorphism is a leading cause of bloat. For example, resvg switched from lyon to kurbo [Note added: RazrFalcon points out that the big contribution to lyon compile times is proc macros, not polymorphism, and that’s since been fixed]. We don’t adopt the lyon / euclid ecosystem, also for this reason, which is something of a shame because now there’s more fragmentation. When working on kurbo, I did experiments indicating there was no real benefit to allowing floating point types other than f64, so just decided that would be the type for coordinates. I’m happy with this choice.

For a variety of reasons, async code is considerably slower to compile than corresponding sync code, though the compiler team has been making great progress. Even though async/await is the shiny new feature, it’s important to realize that old-fashioned sync code is still better in a lot of cases. Sure, if you’re writing high-scale Internet servers, you need async, but there are a lot of other cases.

I’ll pick on Zola for this one. A release build is over 9 minutes and 15M in size. (Debug builds are about twice as fast but 3-5x bigger.) Watching the compile (over 400 crates total!), it’s clear that its web serving (actix based) accounts for a lot of that, pulling in a big chunk of the tokio ecosystem as well. For just previewing static websites built with the tool, it might be overkill. That said, for this particular application perhaps bloat is not as important, and there are benefits to using a popular, featureful web serving framework.

As a result, I’ve chosen not to use async in druid, but rather a simpler, single-threaded approach, even though async approaches have been proposed.

It’s common for a crate to have some core functionality, then other stuff that only some users will want. I think it’s a great idea to have optional dependencies. For example, xi-rope had the ability to serialize deltas to JSON because we used that in xi-editor, but that’s a very heavyweight dependency for people who just want an efficient data structure for large strings. So we made that optional.
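
As a rough sketch of what that opt-in looks like in practice (the manifest lines and names below are hypothetical, not xi-rope’s actual code): the heavy dependency is marked optional in Cargo.toml, and the module that needs it is gated behind a feature flag.

// In Cargo.toml (illustrative):
//
//   [dependencies]
//   serde_json = { version = "1", optional = true }
//
//   [features]
//   serialization = ["serde_json"]
//
// The module below is compiled only when a user opts in with
// `cargo build --features serialization`, so the dependency costs
// nothing for everyone who just wants the core data structure.
#[cfg(feature = "serialization")]
pub mod serialization {
    /// Serialize a delta to a JSON string (placeholder implementation).
    pub fn delta_to_json(delta: &str) -> String {
        format!("{{\"delta\":{:?}}}", delta)
    }
}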

An alternative is to fragment the crate into finer grains; rand is a particular offender here, as it’s not uncommon to see 10 subcrates in a build. We’ve found that having lots of subcrates often makes life harder for users because of the increased coordination work making sure versions are compatible.

Another crate that often shows up in Rust builds is phf, an implementation of perfect hashing. That’s often a great idea and what you want in your binaries, but it also accounts for ~13s of compile time when using the macro version (again bringing in two separate copies of quote and syn). [Note added: sfackler points out that you can use phf-codegen to generate Rust source and check that into your repos.]

For optimizing compile times in unicode-normalization, I decided to build the hash tables using a custom tool, and check those into the repo. That way, the work is done only when the data actually changes (about once a year, as Unicode revs), as opposed to every single compile. I’m proud of this work, as it improved the compile time for unicode-normalization by about 3x, and I do consider that an important foundational crate.

Compile time and executable size are aspects of performance (even though often not as visible as runtime speed), and performance culture applies. Always measure, using tools like cargo-bloat where appropriate, and keep track of regressions.
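
For instance, a typical measurement session with cargo-bloat might look like the following (cargo install and the --crates/-n flags are the tool’s documented interface):

cargo install cargo-bloat
cargo bloat --release --crates   # which crates contribute most to binary size
cargo bloat --release -n 10      # the ten biggest functions in the release binary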

A good case study for cargo-bloat is clap, though it’s still pretty heavyweight today (it accounts for about 1M of Zola’s debug build, measured on macOS).

There’s also an effort to analyze binary sizes more systematically. I applaud such efforts and would love it if they were even more visible. Ideally, crates.io would include some kind of bloat report along with its other metadata, although using fully automated tools has limitations (for example, a “hello world” example using clap might be pretty modest, but one with hundreds of options might be huge).

Once you accept bloat, it’s very hard to claw it back. If your project has multi-minute compiles, people won’t even notice a 10s regression in compile time. Then these pile up, and it gets harder and harder to motivate the work to reduce bloat, because each second gained in compile time becomes such a small fraction of the total.

As druid develops into a real GUI, I’ll be facing many more of these kinds of choices, and both compile times and executable sizes will inevitably get larger. But avoiding bloat is just another place to apply engineering skill. In writing this blog post, I’m hoping to raise awareness of the issue, give useful tips, and enlist the help of the community to keep the Rust ecosystem as bloat-free as possible.

As with all engineering, it’s a matter of tradeoffs. Which is more important for druid, having fast compiles, or being on board with the abundance of features provided by the Rust ecosystem such as fluent? That doesn’t have an obvious answer, so I intend to mostly listen to feedback from users and other developers.

...

Read the original on raphlinus.github.io »

6 341 shares, 5 trendiness, 3186 words and 30 minutes reading time

Did North Dakota Regulators Hide an Oil and Gas Industry Spill Larger Than Exxon Valdez?

In July 2015 workers at the Garden Creek I Gas Processing Plant, in Watford City, North Dakota, noticed a leak in a pipeline and reported a spill to the North Dakota Department of Health that remains officially listed as 10 gallons, the size of two bottled water delivery jugs.

But a whistle-blower has revealed to DeSmog the incident is actually on par with the 1989 Exxon Valdez oil spill in Alaska, which released roughly 11 million gallons of thick crude.

“The Garden Creek spill is in fact over 11 million gallons of condensate that leaked through a crack in a pipeline for over 3 years,” says the whistle-blower, who has expertise in environmental science but refused to be named or give other background information for fear of losing their job. They provided to DeSmog a document that details remediation efforts and verifies the spill’s monstrous size.

Up to “5,500,000 gallons” of hydrocarbons have been removed from the site, the 2018 document states, based upon “an…estimate of approximately 11 million gallons released.”

Garden Creek is operated by the Oklahoma-based oil and gas service company, ONEOK Partners, and processes natural gas and natural gas liquids, also called natural gas condensate, brought to the facility via pipeline from Bakken wells.

Neither the National Oceanic and Atmospheric Administration (NOAA), which monitors coastal spills, nor the Environmental Protection Agency (EPA) could provide records to put the spill’s size in context, but according to available reports, if the 11-million-gallon figure is accurate, the Garden Creek spill appears to be among the largest recorded oil and gas industry spills in the history of the United States.

However, the American public is unaware, because the spill remains officially listed as just 10 gallons. That is despite the fact that a North Dakota regulator has acknowledged the spill was much larger, and even the official record, right after stating the spill was 10 gallons, notes that the area was “saturated with natural gas condensate of an unknown volume,” and thus may have been larger.

Scott Skokos, Executive Director of the Dakota Resource Council, an organization that works to protect North Dakota’s natural resources and family farms, questioned whether it was legal for the state to cover up or downplay spills.

“I have seen many instances where it appears spills are being covered up, and there appears to be a pattern of downplaying spills, which makes the narrative surrounding oil and gas development look rosy and makes the industry look better politically,” says Skokos. “If this pattern is as widespread as it seems, then we have a government that is conspiring to protect the oil industry. This is not only reckless and unethical, but also potentially illegal.”

“In my view,” Skokos added, “this is not looking out for the best interest of the state or the people who live in the state, it is only looking out for corporations. And these are not even corporate citizens of this state, they are corporate citizens of somewhere else.”

Spills are pervasive in North Dakota’s oil industry and have been the focus of numerous media reports. State regulators have often been “unable — or unwilling — to compel energy companies to clean up their mess,” ProPublica reported in a 2012 investigation.

A 2015 Inside Energy article noted state reports are “riddled with inaccuracies and estimates” and cited a 2011 spill of oil and gas wastewater by a Texas-based company listed as 12,600 gallons but later determined to be at least two million gallons. An eight-year database of spills compiled by the New York Times in 2014 showed two spills of roughly one million gallons.

But no news agency has reported on any spill in North Dakota near the magnitude of Garden Creek.

Pumpjacks and flaring in McKenzie County, North Dakota, east of Arnegard and west of Watford City. Credit: Tim Evanson

Gas processing plants are sprawling industrial facilities and contain numerous pipes and towers that help clean and separate the stream of natural gas and natural gas liquids like ethane, butane, and propane carried in gathering pipelines that originate at wellheads.

The explosion of fracking across the U.S. and the booming development of America’s gas-rich shale plays have planted gas processing plants, which emit a near-continuous stream of greenhouse gases and carcinogens, from the Pittsburgh suburbs and Ohio’s Amish country to the high plains of Colorado and the badlands of North Dakota.

“There should be ongoing investigations of these facilities regularly,” says Emily Collins, Executive Director of Fair Shake, an Ohio-based nonprofit environmental law firm. “But there isn’t.”

“There is so much to keep track of for these regulators that spills, among other things, are lost in the mix,” says Collins. “The old formula of having inspections and investigations where you show up once a year clearly doesn’t work here, not with the pace, not with how many places are at issue all of the sudden. We are just not able to handle it all.”

Map of western North Dakota that includes well density (number of wells per 5 km radius), reported brine spills from 2007 to 2015 (red circles), and sampling sites of samples collected in July 2015 (green triangles). Credit: Lauer et al. 2016

Meanwhile, examination of the industry, its spills, and its placid regulators has made its way to the U.S. Congress. The Subcommittee on Energy and Mineral Resources of the House Natural Resources Committee has been holding hearings on the impacts of oil and gas development on local communities, landowners, taxpayers, and the environment.

In May, Collins testified before the subcommittee, along with 71-year-old North Dakota farmer Daryl Peterson. He shared harrowing stories about decades of spills of toxic oil and gas industry waste on his farmland, and the utter neglect of the issue by his state’s regulators.

“In my experience, regulators have been reluctant to enforce compliance,” Peterson told Congress, “and have minimized the impacts, rather than holding the oil companies accountable.”

On April 29, 2019, oversight of spills shifted from the North Dakota Department of Health to a new agency, the Department of Environmental Quality, but the state’s Spill Investigation Program Manager has remained Bill Suess.

“I know for a fact that Bill Suess was made aware of Garden Creek’s size in October of 2018 after a 3-year investigation was completed to assess size and scope,” the whistle-blower told DeSmog. “Bill and state staff were presented an updated version of the spill size…at the state Gold Seal building in a PowerPoint presentation.”

In a phone conversation with DeSmog in mid-July, Suess explained that he had never seen a document showing the spill’s size to be any number other than 10 gallons, and he rejected the assertion that the spill was 11 million gallons.

“That would be by far the largest spill on land in U.S. history. I mean you are talking 261,000 barrels,” Suess said. “That would be significant, and I will guarantee you it is not that volume. I have received no documentation and I have no scientific evidence to show it is anywhere near that volume.”

Suess readily acknowledged that the officially listed spill size was too low. “We know it is significantly bigger than 10 gallons. We have known that since Day One,” Suess continued. Yet he defended the state’s decision to continue to list the spill as just 10 gallons.

“In North Dakota we do not regulate based on volume,” Suess added. “Whether we put a 10 there, a 100 there, a 1,000 there is not going to change our response to the spill, it is not going to change what the responsible party has to do, not going to change their remediation, it is not going to change anything other than your curiosity.”

Crestwood discovered a 1 million gallon brine spill from its Arrow pipeline on July 8, 2014. Located north of Mandaree, North Dakota, on the Fort Berthold Reservation. Mandaree is one of the six segments on Fort Berthold and where most Mandan and Hidatsa people live. Courtesy of Lisa DeVille

DeSmog presented details of the Garden Creek spill to North Dakota environmental attorney Fintan Dooley, who leads the North Dakota Salted Lands Council, an organization dedicated to remediating spills.

“You got a big fish hooked here,” he said. “This has all the signs of a civil conspiracy. If instead of 10, it was 110 or 1010 gallons, one could make the determination the original report was a mistake, but to leave uncorrected a mistake this big is not an accident, it smells of deception and deliberation, and this is not the first incident of deceptive record-keeping in North Dakota — I think a good question to ask is, how many state officials are implicated in covering up this story?”

The North Dakota Century Code, which contains all state laws, covers perjury, falsification, and breach of duty in Chapter 12.1-11. Subsection 05, “Tampering with public records,” states the following:

“A person is guilty of an offense if he: a. Knowingly makes a false entry in or false alteration of a government record; or b. Knowingly, without lawful authority, destroys, conceals, removes, or otherwise impairs the verity or availability of a government record.”

The offense, if committed by “a public servant who has custody of the government record,” is a felony. The crime carries a possible five-year prison sentence.

DeSmog confronted Suess with this portion of the code, and asked him if he believed he, or someone, was guilty of falsifying government records. “No, I am not guilty, but if I changed that number I would be,” he said. “If I were to go in there and just change that [10 gallons] to a larger number that I don’t have any scientific evidence or documentation for, then I would be falsifying it.”

The environmental attorney Fintan Dooley does not buy that officials behaved appropriately. “There has been a lot of talk around the state capitol lately about official breach of public trust, and I am just wondering how far this practice of falsification of records will be allowed to go?” he said. “The whole thing can be prosecuted, and if this presents an opportunity to prosecute, I think that is just wonderful.” Any decisions regarding prosecution, he stresses, are up to a state attorney.

When asked exactly who would be charged with a crime, Dooley said, “If anyone is going to file a criminal charge, they must file it against an individual. If there was a whole series of people involved, the best practice would be to identify all of them.”

Natural gas flares from a flare-head at the Orvis State well on the Evanson family farm in McKenzie County, North Dakota, west of Watford City. Credit: Tim Evanson

Garden Creek I became operational in January 2012. The project was applauded by state and industry officials for its ability to reduce the release of the prominent greenhouse gas methane in the oilfield by containing and processing that and other natural gas byproducts. Flaring, or burning, natural gas is common in the region’s oilfields.

“The completion of this facility is a positive step toward reducing flaring activities in North Dakota,” ONEOK president Terry Spencer told a Watford City newspaper in 2012. In 2015, at the time the spill was noticed, ONEOK was in the process of constructing a network of additional gas processing plants across the Bakken. In one industry press release, the company bragged of “better-than-expected plant performance at existing and planned processing plants.”

“There was motive to cover up the actual size of the spill to allow their infrastructure to be completed,” says the whistle-blower. Furthermore, by the summer of 2016, as the cleanup at Garden Creek I was moving along, protests against the construction of the Dakota Access pipeline (DAPL) at the Standing Rock Sioux Indian Reservation were in full swing. One major concern voiced by the tribe was that a spill could destroy farmland and contaminate drinking water for thousands of people.

On August 31, 2016, “Happy” American Horse from the Sicangu Nation locked himself to construction equipment as a direct action against the Dakota Access pipeline. Credit: Desiree Kane

“Public outcry against gas collection could have threatened ONEOK’s expansion plans and might have stood in the way of the state’s flaring reduction goals,” says the whistle-blower. “It’s also possible that it could have further galvanized public opinion against the DAPL project. In short, it’s possible that the North Dakota Department of Health faced heavy pressure from both state and industry to keep this on the down low.”

David Glatt, Director of North Dakota’s Department of Environmental Quality, said, “The state makes public all spill reports it receives, so there is no underreporting by the state.” ONEOK has not responded to DeSmog’s questions on this incident. DeSmog has filed an open records request with the State of North Dakota for additional information and details related to the Garden Creek I spill.

In July, Suess told DeSmog, “Remediation is still ongoing. It is going to be a slow process, it will be a few years, I think.” Suess said he was planning to revisit the spill site but did not expect anything he found there would lead him to alter the officially recorded spill size. “I have a schedule to go out there later this month, but I still probably wouldn’t change that 10-gallon number because I still won’t have an accurate number,” he said.

In May, just as North Dakota’s planting season was beginning, I met with several North Dakota residents whose farms or communities had been marred by oil and gas industry spills, including the land of farmer Daryl Peterson, whose 2,500 acres of grains, soybeans, and corn have been contaminated by more than a dozen spills of brine.

This oil and gas waste product is loaded with salt and also contains toxic heavy metals and radioactivity. Peterson pointed to dead zones on his land that are unfit for crops though still fit for government taxes. The spills have also tainted his groundwater.

Daryl Peterson’s North Dakota farm has suffered from more than a dozen oil and gas industry brine spills. Courtesy of Daryl Peterson

“State regulators declare most spills are cleaned up to EPA standards and land productivity is restored but very often this has not been the case,” said Peterson, who, together with his wife Christine, has farmed this land in Bottineau County, near the Canadian border, for more than 40 years.

“The oil industry controls politics in North Dakota and long-term consequences to our precious land, air, and water resources are being ignored with this gold rush mentality. With the prospect of 40,000 more wells in North Dakota, the future of our bountiful agriculture state is in great jeopardy,” said Peterson.

Suess defended his agency’s methods. “What I believe the North Dakota public wants to know is not how big is it, but is this spill a risk to me,” he said. “Personally, I have actually been told by others that we are one of the most transparent agencies out there. My boss is the North Dakota taxpayer, and my door is always open, any citizen can walk in at any time and talk to me.”

However, other North Dakota residents dealing with spills strongly disagree. In May DeSmog also toured spills on the Fort Berthold Indian Reservation, in the heart of the Bakken oil boom in western North Dakota, with Lisa DeVille and her husband Walter DeVille Sr. The couple lives in the community of Mandaree and helps lead an environmental advocacy group called Fort Berthold Protectors of Water & Earth Rights, or POWER.

“You can see the earth slowly dying,” said Lisa, who has two master’s degrees in business and returned to school to get a bachelor’s* degree in environmental science so she could better monitor all the spills and contamination on her land and advocate for her community.

“Every day we have a spill,” she said. “Whether it is frac sand spilled, trucks that stall out and drop their oil on roads, trucks wrecking on the road and spilling oil and gas waste product, or our invisible spill, the methane released into the air from flaring and venting.”

Aerial view of a 1 million gallon brine spill from Crestwood’s Arrow pipeline on July 8, 2014. Located north of Mandaree, North Dakota, on the Fort Berthold Reservation. Mandaree is one of the six segments on Fort Berthold and where most Mandan and Hidatsa peoples live. Photo credit: Sarah Christianson

“The North Dakota Spill Investigation Program Manager can say that his door is open, but North Dakota is protecting industry, not people, and it is upsetting to me,” Lisa added.

“My people — the Mandan, Hidatsa, and Arikara Nation — have been here for centuries, there have been many broken promises, and they have been lied to and are still being lied to about all this oil and gas contamination. No one knows the amount of spills on Fort Berthold because industry will lie to our tribal leaders. Also, there is no data for the public to see. There are no studies, research, or analysis to create laws or codes for environmental justice.”

In July 2014, one million gallons of oil and gas waste spilled from a pipeline and into a ravine that drains into the tribe’s main reservoir for drinking water. In a 2016 paper, Duke University researchers, including geochemist Avner Vengosh, revealed that the spill, as well as several others in the Bakken, had laced the land with heavy metals and radioactivity.

When asked in May 2019 if he was aware of this research, Glatt, director of the North Dakota Department of Environmental Quality, said he questioned Vengosh’s “initial premise” and believed the researchers were “looking for the worst case scenario.”

“I haven’t seen his report; I just didn’t even know it was out there,” said Glatt. “I knew he was in the state. This is the first time I hear that he wrote a report.”

As lawsuits against the oil and gas industry for climate impacts continue and a growing web of grassroots groups spotlights the industry’s wide arc of pollution, the uncovering of the oil and gas industry’s vast closet of toxic skeletons seems inevitable.

“Ultimately I am fed up with the rushed drilling programs and the lack of accountability when it comes to environmental impacts,” says the whistle-blower. “I am also disgusted with how state officials and city council members view these threats and deem it acceptable to potentially harm human health.”

“Why,” the whistle-blower added, “are we shielding the truth from public scrutiny?”

*Updated 8/20/19: This story has been updated to correct Lisa DeVille’s degree in environmental science, which is a bachelor’s, not a master’s.

...

Read the original on www.desmogblog.com »

7 322 shares, 21 trendiness, 4151 words and 38 minutes reading time

Building a distributed time-series database on PostgreSQL

Today we are announcing the distributed version of TimescaleDB, which is currently in private beta (public version slated for later this year).

TimescaleDB, a time-series database on PostgreSQL, has been production-ready for over two years, with millions of downloads and production deployments worldwide. Today, for the first time, we are publicly sharing our design, plans, and benchmarks for the distributed version of TimescaleDB.

First released 30+ years ago, PostgreSQL today is making an undeniable comeback. It is the fastest growing database right now, faster than MongoDB, Redis, MySQL, and others. PostgreSQL itself has also matured and broadened in capabilities, thanks to a core group of maintainers and a growing community. Yet if one main criticism of PostgreSQL exists, it is that horizontally scaling out workloads to multiple machines is quite challenging. While several PostgreSQL projects have developed scale-out options for OLTP workloads, time-series workloads, which we specialize in, represent a different kind of problem.

Simply put, time-series workloads are different from typical database (OLTP) workloads, for several reasons: writes are insert-heavy, not update-heavy, and those inserts are typically to recent time ranges; reads are typically on continuous time ranges, not random; and writes and reads typically happen independently, rarely in the same transaction. Also, time-series insert volumes tend to be far higher and data tends to accumulate far more quickly than in OLTP. So scaling writes, reads, and storage is a standard concern for time series.

These were the same principles upon which we developed and first launched TimescaleDB two years ago. Since then, developers all over the world have been able to scale a single TimescaleDB node, with replicas for automated failover, to 2 million metrics per second and 10s of terabytes of data storage. This has worked quite well for the vast majority of our users. But of course, workloads grow and software developers (including us!) always want more. What we need is a distributed system on PostgreSQL for time-series workloads.

Our new distributed architecture, which a dedicated team has been hard at work developing since last year, is motivated by a new vision: scaling to over 10 million metrics a second, storing petabytes of data, and processing queries even faster via better parallelization. Essentially, a system that can grow with you and your time-series workloads.

Most database systems that scale out to multiple nodes rely on horizontally partitioning data by one dimension into shards, each of which can be stored on a separate node.

We chose not to implement traditional sharding for scaling out TimescaleDB. Instead, we embraced a core concept from our single-node architecture: the chunk. Chunks are created by automatically partitioning data by multiple dimensions (one of which is time). This is done in a fine-grained way, such that one dataset may be comprised of 1000s of chunks, even on a single node. Unlike sharding, which typically only enables scale-out, chunking is quite powerful in its ability to enable a broad set of capabilities. For example:

* Scale-up (on the same node) and scale-out (across multiple nodes)
* Elasticity: Adding and deleting nodes by having data grow onto new nodes and age out of old ones
* Partitioning flexibility: Changing the chunk size, or partitioning dimensions, without downtime (e.g., to account for increased insert rates or additional nodes)
* Data reordering: Writing data in one order (e.g., by time) based on write patterns, but then rewriting it later in another order (e.g., by device_id) based on query patterns

A much more detailed discussion is later in this post.

While we plan to start publishing more benchmarks over the next few months, we wanted to share some early results demonstrating our distributed architecture’s ability to sustain high write rates. As you can see, at 9 nodes the system achieves an insert rate well over 12 million metrics a second:

TimescaleDB running the open-source Time Series Benchmarking Suite, deployed on AWS running m5.2xlarge data nodes and an m5.12xlarge access node, both with standard EBS gp2 storage. More on access and data nodes later in the post.

This multi-node version of TimescaleDB is currently in private beta. If you’d like to join the private beta, please fill out this form. You can also view the documentation here. The rest of this post describes the underlying design principles of our distributed architecture and how it works. There is also a FAQ at the end of the post with answers to questions we commonly hear about this architecture. Please continue reading to learn more.

The five objectives of database scaling

Based on our own experience, combined with our interactions with TimescaleDB users, we have identified five objectives for scaling a database for time-series workloads:

* Total storage volume: Scaling to larger amounts of data under management
* Insert rate: Supporting higher ingestion rates of rows or datapoints per second
* Query concurrency: Supporting larger numbers of concurrent queries, sometimes via data replication
* Query latency: Reducing the latency to access a large volume of data to handle a single query, typically through query parallelization
* Fault tolerance: Storing the same portion of data on multiple servers/disks, with some automation for failover in case of failure

Today, TimescaleDB leverages PostgreSQL streaming replication for primary/replica clustering: there is a single primary node that accepts all writes, which then streams its data (more specifically, its Write-Ahead Log) to one or more replicas. But ultimately, TimescaleDB using PostgreSQL streaming replication requires that each replica store a full copy of the dataset, and the architecture maxes out its ingest rate at the primary’s write rate and its query latency at a single node’s CPU/IOPS rate. While this architecture has worked well so far for our users, we can do even better.

In computer science, the key to solving big problems is breaking them down into smaller pieces and then solving each of those sub-problems, preferably in parallel. In TimescaleDB, chunking is the mechanism by which we break down a problem and scale PostgreSQL for time-series workloads. More specifically, TimescaleDB already automatically partitions a table across multiple chunks on the same instance, whether on the same or different disks. But managing lots of chunks (i.e., “sub-problems”) can also be a daunting task, so we came up with the hypertable abstraction to make partitioned tables easy to use and manage. Now, in order to take the next step and scale to multiple nodes, we are adding the abstraction of a distributed hypertable. Fortunately, hypertables extend naturally to multiple nodes: instead of creating chunks on the same instance, we now place them across different instances.

Still, distributed hypertables pose new challenges in terms of management and usability when operating at scale. To stay true to everything that makes hypertables great, we carefully designed our system around the following principles.

Use existing abstractions: Hypertables and chunking extend naturally to multiple nodes. By building on these existing abstractions, together with existing PostgreSQL capabilities, we provide a robust and familiar foundation for scale-out clustering.

Be transparent: From a user’s perspective, interacting with distributed hypertables should be akin to working with regular hypertables, e.g., familiar environment, commands, functionality, metadata, and tables. Users need not be aware that they are interacting with a distributed system and should not need to take special actions when doing so (e.g., application-aware shard management).

Scale access and storage independently: Given that access and storage needs vary across workloads (and time), the system should be able to scale access and storage independently. One way of doing this is via two types of database nodes, one for external access (“access node”) and another for data storage (“data node”).

Be easy to operate: A single instance should be able to function as either an access node or data node (or even both at the same time), with sufficient metadata and discovery to allow each node to play its necessary role.

Be easy to expand: It should be easy to add new nodes to the system to increase capacity, including upgrading from a single-node deployment (in which a single instance should seamlessly become a data node in a multi-node deployment).

Provide flexibility in data placement: The design should account for data replication and enable the system to have significant flexibility in data placement. Such flexibility can support collocated JOIN optimizations, heterogeneous nodes, data tiering, AZ-aware placement, and so forth. An instance serving as an access node should also be able to act as a data node, as well as store non-distributed tables.

Support production deployments: The design should support high-availability deployments, where data is replicated across multiple servers and the system automatically detects and transparently recovers from any node failures.

Now, let's look at how our distributed architecture follows these principles in practice.

Following the above design principles, we built a multi-node database architecture that allows hypertables to be distributed across many nodes to achieve greater scale and performance. Users interact with distributed hypertables in much the same way as they would with a regular hypertable (which itself looks just like a regular Postgres table). As a result, inserting data into or querying data from a distributed hypertable looks identical to doing so with a standard table. For instance, consider a table with the following schema:

CREATE TABLE measurements (
  time TIMESTAMPTZ NOT NULL,
  device_id TEXT NOT NULL,
  temperature DOUBLE PRECISION NULL,
  humidity DOUBLE PRECISION NULL
);

This table is turned into a distributed hypertable by partitioning on both the time and device_id columns:

SELECT create_distributed_hypertable('measurements', 'time', 'device_id');

Following this command, all the normal table operations still work as expected: inserts, queries, schema modifications, etc. Users do not have to worry about tuple routing, chunk (partition) creation, load balancing, or failure recovery: the system handles all these concerns transparently. In fact, users can convert their existing hypertables into distributed hypertables by seamlessly incorporating their standalone TimescaleDB instance into a cluster. Now let's look at the architecture that makes all of this possible.

At a high level, our distributed database architecture consists of access nodes, to which clients connect, and data nodes, where the data for the distributed hypertable resides. (While we are initially focused on supporting a single access node with optional read replicas, our architecture will extend to a logically distributed access node in the future.)

Both types of nodes run the same TimescaleDB/PostgreSQL stack, although in different configurations. In particular, an access node needs metadata (e.g., catalog information) to track state across the database cluster, such as the nodes in the cluster and where data resides (stored as "chunks"), so that the access node can insert data on the nodes that have matching chunks and perform query planning to exclude chunks, and ultimately entire nodes, from a query. While the access node has lots of knowledge about the state of the distributed database, data nodes are "dumb": they are essentially single-node instances that can be added and removed using simple administrative functions on the access node.

The main difference in creating distributed hypertables compared to regular hypertables, however, is that we recommend having a secondary "space" partitioning dimension. While not a strict requirement, the additional "space" dimension ensures that data is evenly spread across all the data nodes when a table experiences (roughly) time-ordered inserts.

The advantage of a multi-dimensional distributed hypertable is illustrated in the figure above. With time-only partitioning, chunks for two time intervals (t1 and t2) are created on data nodes DN1 and DN2, in that order. With multi-dimensional partitioning, by contrast, chunks are created along the space dimension on different nodes for each time interval. Thus, inserts to t1 are distributed across multiple nodes instead of just one of them.

Is this just traditional sharding? Not really. While many of our goals are achieved by traditional (single-dimensional) database "sharding" approaches (where the number of shards is proportional to the number of servers), distributed hypertables are designed for multi-dimensional chunking with a large number of chunks (from 100s to 10,000s), offering more flexibility in how chunks are distributed across a cluster. Traditional shards, on the other hand, are typically pre-created and tied from the start to individual servers. Thus, adding new servers to a sharded system is often a difficult and disruptive process that might require redistributing (and locking) large amounts of data.
By contrast, TimescaleDB's multi-dimensional chunking:

* Auto-creates chunks, keeps recent data chunks in memory, and provides time-oriented data lifecycle management (e.g., for data retention, reordering, or tiering policies)
* Increases aggregate disk IOPS by parallelizing operations across multiple nodes and disks
* Elastically scales out to new data nodes

Chunks are automatically created and sized according to the current partitioning configuration. This configuration can change over time: i.e., a new chunk can be sized or divided differently from a prior one, and both can coexist in the system. This allows a distributed hypertable to seamlessly expand to new data nodes by writing recent chunks in a new partitioning configuration that covers the additional nodes, without affecting existing data or requiring lengthy locking. Together with a retention policy that eventually drops old chunks, the cluster will rebalance over time, as shown in the figure below. This is a much less disruptive process than in a similar sharded system, since read locks are held on smaller chunks of data at a time.

One might think that chunking puts an additional burden on applications and developers. However, applications in TimescaleDB do not interact directly with chunks (and thus do not need to be aware of the partition mapping themselves, unlike in some sharded systems), nor does the system expose different capabilities for chunks than for the entire hypertable (e.g., in a number of other storage systems, one can execute transactions within shards but not across them). To illustrate that this is the case, let's look at how distributed hypertables work internally.

How it works: the life of a request (insert or query)

Having learned how distributed hypertables are created and how the underlying architecture fits together, let's look into the "life of a request" to better understand the interactions between the access node and data nodes.

In the following example, we will continue to use the "measurements" table we introduced earlier. To insert data into this distributed hypertable, a single client connects to the access node and inserts a batch of values as normal. Using a batch of values is preferred over row-by-row inserts in order to achieve higher throughput. Such batching is a very common architectural idiom, e.g., when ingesting data into TimescaleDB from Kafka, Kinesis, IoT Hubs, or Telegraf.

INSERT INTO measurements VALUES
('2019-07-01 00:00:00.00-05', 'A001', 70.0, 50.0),
('2019-07-01 00:00:00.10-05', 'B015', 68.5, 49.7),
('2019-07-01 00:00:00.05-05', 'D821', 69.4, 49.9),
('2019-07-01 00:00:01.01-05', 'A001', 70.1, 50.0);

Since "measurements" is a distributed hypertable, the access node doesn't insert these rows locally like it would with a regular hypertable. Instead, it uses its catalog information (metadata) to ultimately determine the set of data nodes where the data should be stored. In particular, for new rows to be inserted, it first uses the values of the partitioning columns (e.g., time and device_id) to map each row to a chunk, and then determines the set of rows that should be inserted into each chunk, as shown in the figure below.

If an appropriate chunk does not yet exist for some of the rows, TimescaleDB will create new chunk(s) as part of the same insert transaction, and then assign each new chunk to at least one data node. The access node creates and assigns new chunks along the "space" dimension (device_id), if such a dimension exists. Thus, each data node is responsible for only a subset of devices, but all of them will take on writes for the same time intervals.

After the access node has written to each data node, it executes a two-phase commit of these mini-batches to the involved data nodes, so that all data belonging to the original insert batch is inserted atomically within one transaction. This also ensures that all the mini-batches can be rolled back in case of a failure to insert on one of the data nodes (e.g., due to a data conflict or a failed data node).

The following shows the part of the SQL query that an individual data node receives, which is a subset of the rows in the original insert statement.

INSERT INTO measurements VALUES
('2019-07-01 00:00:00.00-05', 'A001', 70.0, 50.0),
('2019-07-01 00:00:01.01-05', 'A001', 70.1, 50.0);

One nice thing about how TimescaleDB carefully ties into the PostgreSQL query planner is that it properly exposes EXPLAIN information. You can EXPLAIN any request (such as the INSERT above) and get full planning information:

EXPLAIN (costs off, verbose)
INSERT INTO measurements VALUES
('2019-07-01 00:00:00.00-05', 'A001', 70.0, 50.0),
('2019-07-01 00:00:00.10-05', 'B015', 68.5, 49.7),
('2019-07-01 00:00:00.05-05', 'D821', 69.4, 49.9),
('2019-07-01 00:00:01.01-05', 'A001', 70.1, 50.0);

QUERY PLAN
Custom Scan (HypertableInsert)
  Insert on distributed hypertable public.measurements
  Data nodes: data_node_1, data_node_2, data_node_3
  ->  Insert on public.measurements
        ->  Custom Scan (DataNodeDispatch)
              Output: "*VALUES*".column1, "*VALUES*".column2, "*VALUES*".column3, "*VALUES*".column4
              Batch size: 1000
              Remote SQL: INSERT INTO public.measurements("time", device_id, temperature, humidity) VALUES ($1, $2, $3, $4), …, ($3997, $3998, $3999, $4000)
              ->  Custom Scan (ChunkDispatch)
                    Output: "*VALUES*".column1, "*VALUES*".column2, "*VALUES*".column3, "*VALUES*".column4
                    ->  Values Scan on "*VALUES*"
                          Output: "*VALUES*".column1, "*VALUES*".column2, "*VALUES*".column3, "*VALUES*".column4
(12 rows)

In PostgreSQL, plans like the one above are trees where every node produces a tuple (row of data) up the tree when the plan is executed. Essentially, demand starts at the root: a parent asks its child for new tuples until no more tuples can be produced. In this particular insert plan, tuples originate at the Values Scan leaf node, which generates a tuple from the original insert statement whenever its ChunkDispatch parent asks for one. Whenever ChunkDispatch reads a tuple from its child, it "routes" the tuple to a chunk, creating a chunk on a data node if necessary. The tuple is then handed up the tree to DataNodeDispatch, which buffers the tuple in a per-node buffer determined by the chunk it was routed to in the previous step (every chunk has one or more associated data nodes responsible for it). DataNodeDispatch buffers up to 1,000 tuples per data node (configurable) until it flushes a buffer using the given remote SQL. The servers involved are shown in the EXPLAIN output, although not all of them might ultimately receive data, since the planner cannot know at plan time how tuples will be routed during execution.

It should be noted that distributed hypertables also support COPY for even higher insert performance. Inserts using COPY do not execute a plan like the one shown for INSERT above. Instead, a tuple is read directly from the client connection (in COPY mode) and then routed to the corresponding data node connection (also in COPY mode). Thus, tuples are streamed to data nodes with very little overhead. However, while COPY is suitable for bulk data loads, it does not support features like RETURNING clauses and thus has limitations that prohibit its use in all cases.

Read queries on a distributed hypertable follow a similar path from access node to data nodes. A client makes a standard SQL request to an access node:

SELECT time_bucket('1 minute', time) AS minute,
       device_id, min(temperature), max(temperature), avg(temperature)
FROM measurements
WHERE device_id IN ('A001', 'B015')
  AND time > NOW() - interval '1 hour'
GROUP BY minute, device_id;

Making this query performant on distributed hypertables relies on three tactics:

* Optimally distributing and pushing down work to data nodes, and
* Executing in parallel across the data nodes

TimescaleDB is designed to implement these tactics. However, given the length of this post so far, we'll cover these topics in an upcoming article.

Selected users and customers have already been testing distributed TimescaleDB in private beta, and we plan to make an initial version of it more widely available later this year. It will support most of the good properties described above (high write rates, query parallelism and predicate push-down for lower latency), as well as some others that we will describe in future posts (elastically growing a cluster to scale storage and compute, and fault tolerance via physical replica sets).

If you'd like to join the private beta, please fill out this form. You can also view the documentation here. And if the challenge of building a next-generation database infrastructure is of interest to you, we're hiring worldwide and always looking for great engineers to join the team.

Q: What about query performance benchmarks? What about high-availability, elasticity, and other operational topics?

Given the length of this post so far, we opted to cover query performance benchmarks, how our distributed architecture optimizes queries, as well as operational topics, in future posts.

Q: How will multi-node TimescaleDB be licensed?

We'll announce licensing when multi-node TimescaleDB is more publicly available.

Q: Why didn't you use [fill in the blank] scale-out PostgreSQL option?

While there are existing options for scaling PostgreSQL to multiple nodes, we found that none of them provided the architecture and data model needed to enable scaling, performance, elasticity, data retention policies, etc., for time-series data. Put another way, we found that by treating time as a first-class citizen in a database architecture, one can enable much better performance and a vastly superior user experience. We find this true for single-node as well as multi-node deployments. In addition, there are many more capabilities we want to implement, and often these integrate closely with the code. If we hadn't written the code ourselves with these in mind, it would have been quite challenging, if not impossible, to add them.

Q: How does this fit into the CAP Theorem?

Any discussion of a distributed database architecture should touch on the CAP Theorem. As a quick reminder, CAP states that there's an implicit tradeoff between strong Consistency (linearizability) and Availability in distributed systems, where availability is informally defined as being able to immediately handle reads and writes as long as any servers are alive (regardless of the number of failures). The P states that the system is able to handle partitions, but this is a bit of a misnomer: you can design your system to ultimately provide Consistency or Availability, but you can't ultimately control whether failures (partitions) happen. Even if you try really hard through aggressive network engineering to make failures rare, failures still occasionally happen, and any system must then be opinionated for C or A.
And if you always want Availability, even if partitions are rare (so that the system can, in the normal case, provide stronger consistency), then applications must still suffer the complexity of handling inconsistent data for the uncommon case.

The summary: TimescaleDB doesn't overcome the CAP Theorem. We do talk about how TimescaleDB achieves "high availability", using the term as commonly used in the database industry to mean replicated instances that perform prompt and automated recovery from failure. This is different from formal "Big A" Availability from the CAP Theorem, and TimescaleDB today sacrifices Availability for Consistency under failure conditions.

Time-series workloads introduce an interesting twist to this discussion, however. We've talked about how their write workloads are different (inserts to recent time ranges, not random updates), and how this pattern leads to different possibilities for elasticity (and reduced write amplification). But we also see a common idiom across architectures integrating TimescaleDB, whereby the write and read data paths are performed by different applications. We rarely see read-write transactions (aside from upserts). While queries drive dashboards, reporting/alerting, or other real-time analytics, data points are often ingested from systems like Kafka, NATS, MQTT brokers, IoT Hubs, or other eventing/logging systems. These upstream systems are typically built around buffering, which greatly ameliorates availability issues if the database temporarily blocks writes (which any "C" system must do): the upstream systems will simply buffer and retry upon automated recovery.

So in short, technically TimescaleDB is a CP system that sacrifices A under failure conditions. But in practice we find that, because of upstream buffers, this is generally much less of an issue.


...

Read the original on blog.timescale.com »

8 303 shares, 14 trendiness, 4888 words and 36 minutes reading time

WebAssembly Interface Types: Interoperate with All the Things! – Mozilla Hacks

People are ex­cited about run­ning WebAssembly out­side the browser.

That ex­cite­ment is­n’t just about WebAssembly run­ning in its own stand­alone run­time. People are also ex­cited about run­ning WebAssembly from lan­guages like Python, Ruby, and Rust.

Why would you want to do that? A few rea­sons:

* Make "native" modules less complicated

Runtimes like Node or Python's CPython often allow you to write modules in low-level languages like C++, too. That's because these low-level languages are often much faster. So you can use native modules in Node, or extension modules in Python. But these modules are often hard to use because they need to be compiled on the user's device. With a WebAssembly "native" module, you can get most of the speed without the complication.

* Make it eas­ier to sand­box na­tive code

On the other hand, low-level lan­guages like Rust would­n’t use WebAssembly for speed. But they could use it for se­cu­rity. As we talked about in the WASI an­nounce­ment, WebAssembly gives you light­weight sand­box­ing by de­fault. So a lan­guage like Rust could use WebAssembly to sand­box na­tive code mod­ules.

* Share na­tive code across plat­forms

Developers can save time and reduce maintenance costs if they can share the same codebase across different platforms (e.g. between the web and a desktop app). This is true for both scripting and low-level languages. And WebAssembly gives you a way to do that without making things slower on these platforms.

So WebAssembly could re­ally help other lan­guages with im­por­tant prob­lems.

But with to­day’s WebAssembly, you would­n’t want to use it in this way. You can run WebAssembly in all of these places, but that’s not enough.

Right now, WebAssembly only talks in num­bers. This means the two lan­guages can call each oth­er’s func­tions.

But if a func­tion takes or re­turns any­thing be­sides num­bers, things get com­pli­cated. You can ei­ther:

* Ship one mod­ule that has a re­ally hard-to-use API that only speaks in num­bers… mak­ing life hard for the mod­ule’s user.

* Add glue code for every sin­gle en­vi­ron­ment you want this mod­ule to run in… mak­ing life hard for the mod­ule’s de­vel­oper.

But this does­n’t have to be the case.

It should be pos­si­ble to ship a sin­gle WebAssembly mod­ule and have it run any­where… with­out mak­ing life hard for ei­ther the mod­ule’s user or de­vel­oper.

So the same WebAssembly mod­ule could use rich APIs, us­ing com­plex types, to talk to:

* Modules run­ning in their own na­tive run­time (e.g. Python mod­ules run­ning in a Python run­time)

* Other WebAssembly mod­ules writ­ten in dif­fer­ent source lan­guages (e.g. a Rust mod­ule and a Go mod­ule run­ning to­gether in the browser)

* The host sys­tem it­self (e.g. a WASI mod­ule pro­vid­ing the sys­tem in­ter­face to an op­er­at­ing sys­tem or the browser’s APIs)

And with a new, early-stage pro­posal, we’re see­ing how we can make this Just Work™, as you can see in this demo.

So let’s take a look at how this will work. But first, let’s look at where we are to­day and the prob­lems that we’re try­ing to solve.

WebAssembly is­n’t lim­ited to the web. But up to now, most of WebAssembly’s de­vel­op­ment has fo­cused on the Web.

That’s be­cause you can make bet­ter de­signs when you fo­cus on solv­ing con­crete use cases. The lan­guage was def­i­nitely go­ing to have to run on the Web, so that was a good use case to start with.

This gave the MVP a nicely con­tained scope. WebAssembly only needed to be able to talk to one lan­guage—JavaScript.

And this was rel­a­tively easy to do. In the browser, WebAssembly and JS both run in the same en­gine, so that en­gine can help them ef­fi­ciently talk to each other.

But there is one prob­lem when JS and WebAssembly try to talk to each other… they use dif­fer­ent types.

Currently, WebAssembly can only talk in num­bers. JavaScript has num­bers, but also quite a few more types.

And even the num­bers aren’t the same. WebAssembly has 4 dif­fer­ent kinds of num­bers: in­t32, in­t64, float32, and float64. JavaScript cur­rently only has Number (though it will soon have an­other num­ber type, BigInt).
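
To make the mismatch concrete, here is a minimal Rust sketch (my own illustration, not code from the article) of a function compiled to WebAssembly whose signature uses all four number types:

// Illustrative only: Rust's i32, i64, f32, and f64 correspond directly
// to WebAssembly's int32, int64, float32, and float64.
#[no_mangle]
pub extern "C" fn mix(a: i32, b: i64, c: f32, d: f64) -> f64 {
    // Calling this export from JS is awkward precisely because of the
    // i64 parameter: a JS Number cannot represent all 64-bit integers.
    a as f64 + b as f64 + c as f64 + d
}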

The dif­fer­ence is­n’t just in the names for these types. The val­ues are also stored dif­fer­ently in mem­ory.

First off, in JavaScript any value, no mat­ter the type, is put in some­thing called a box (and I ex­plained box­ing more in an­other ar­ti­cle).

WebAssembly, in con­trast, has sta­tic types for its num­bers. Because of this, it does­n’t need (or un­der­stand) JS boxes.

This dif­fer­ence makes it hard to com­mu­ni­cate with each other.

But if you want to con­vert a value from one num­ber type to the other, there are pretty straight­for­ward rules.

Because it’s so sim­ple, it’s easy to write down. And you can find this writ­ten down in WebAssembly’s JS API spec.

This map­ping is hard­coded in the en­gines.

It’s kind of like the en­gine has a ref­er­ence book. Whenever the en­gine has to pass pa­ra­me­ters or re­turn val­ues be­tween JS and WebAssembly, it pulls this ref­er­ence book off the shelf to see how to con­vert these val­ues.

Having such a lim­ited set of types (just num­bers) made this map­ping pretty easy. That was great for an MVP. It lim­ited how many tough de­sign de­ci­sions needed to be made.

But it made things more com­pli­cated for the de­vel­op­ers us­ing WebAssembly. To pass strings be­tween JS and WebAssembly, you had to find a way to turn the strings into an ar­ray of num­bers, and then turn an ar­ray of num­bers back into a string. I ex­plained this in a pre­vi­ous post.

This is­n’t dif­fi­cult, but it is te­dious. So tools were built to ab­stract this away.

For ex­am­ple, tools like Rust’s wasm-bind­gen and Emscripten’s Embind au­to­mat­i­cally wrap the WebAssembly mod­ule with JS glue code that does this trans­la­tion from strings to num­bers.
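
For instance, with wasm-bindgen the Rust side can simply accept a string, and the generated JS glue performs the numbers-in-linear-memory translation (the #[wasm_bindgen] attribute is the tool's real API; this particular function is a minimal sketch of my own):

use wasm_bindgen::prelude::*;

// wasm-bindgen wraps this export with JS glue that encodes the incoming
// JS string into linear memory and calls the function with a
// (pointer, length) pair, then decodes the returned string the same way.
#[wasm_bindgen]
pub fn greet(name: &str) -> String {
    format!("Hello, {}!", name)
}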

And these tools can do these kinds of trans­for­ma­tions for other high-level types, too, such as com­plex ob­jects with prop­er­ties.

This works, but there are some pretty ob­vi­ous use cases where it does­n’t work very well.

For ex­am­ple, some­times you just want to pass a string through WebAssembly. You want a JavaScript func­tion to pass a string to a WebAssembly func­tion, and then have WebAssembly pass it to an­other JavaScript func­tion.

Here’s what needs to hap­pen for that to work:

1. The first JavaScript function passes the string to the JS glue code.

2. The JS glue code turns that string object into numbers and then puts those numbers into linear memory.

3. The glue code then passes a number (a pointer to the start of the string) to WebAssembly.

4. The WebAssembly function passes that number over to the JS glue code on the other side.

5. The JS glue code on the other side pulls all of those numbers out of linear memory and then decodes them back into a string object…

6. …which it gives to the second JS function.

So the JS glue code on one side is just re­vers­ing the work it did on the other side. That’s a lot of work to recre­ate what’s ba­si­cally the same ob­ject.

If the string could just pass straight through WebAssembly with­out any trans­for­ma­tions, that would be way eas­ier.

WebAssembly would­n’t be able to do any­thing with this string—it does­n’t un­der­stand that type. We would­n’t be solv­ing that prob­lem.

But it could just pass the string ob­ject back and forth be­tween the two JS func­tions, since they do un­der­stand the type.

So this is one of the rea­sons for the WebAssembly ref­er­ence types pro­posal. That pro­posal adds a new ba­sic WebAssembly type called anyref.

With an anyref, JavaScript just gives WebAssembly a ref­er­ence ob­ject (basically a pointer that does­n’t dis­close the mem­ory ad­dress). This ref­er­ence points to the ob­ject on the JS heap. Then WebAssembly can pass it to other JS func­tions, which know ex­actly how to use it.

So that solves one of the most an­noy­ing in­ter­op­er­abil­ity prob­lems with JavaScript. But that’s not the only in­ter­op­er­abil­ity prob­lem to solve in the browser.

There’s an­other, much larger, set of types in the browser. WebAssembly needs to be able to in­ter­op­er­ate with these types if we’re go­ing to have good per­for­mance.

JS is only one part of the browser. The browser also has a lot of other func­tions, called Web APIs, that you can use.

Behind the scenes, these Web API func­tions are usu­ally writ­ten in C++ or Rust. And they have their own way of stor­ing ob­jects in mem­ory.

Web APIs’ pa­ra­me­ters and re­turn val­ues can be lots of dif­fer­ent types. It would be hard to man­u­ally cre­ate map­pings for each of these types. So to sim­plify things, there’s a stan­dard way to talk about the struc­ture of these types—Web IDL.

When you’re us­ing these func­tions, you’re usu­ally us­ing them from JavaScript. This means you are pass­ing in val­ues that use JS types. How does a JS type get con­verted to a Web IDL type?

Just as there is a map­ping from WebAssembly types to JavaScript types, there is a map­ping from JavaScript types to Web IDL types.

So it’s like the en­gine has an­other ref­er­ence book, show­ing how to get from JS to Web IDL. And this map­ping is also hard­coded in the en­gine.

For many types, this mapping between JavaScript and Web IDL is pretty straightforward. For example, types like DOMString and JS's String are compatible and can be mapped directly to each other.

Now, what hap­pens when you’re try­ing to call a Web API from WebAssembly? Here’s where we get to the prob­lem.

Currently, there is no map­ping be­tween WebAssembly types and Web IDL types. This means that, even for sim­ple types like num­bers, your call has to go through JavaScript.

1. WebAssembly passes the value to JS.

2. In the process, the engine converts this value into a JavaScript type and puts it in the JS heap in memory.

3. Then, that JS value is passed to the Web API function. In the process, the engine converts the JS value into a Web IDL type and puts it in a different part of memory, the renderer's heap.

This takes more work than it needs to, and also uses up more mem­ory.

There’s an ob­vi­ous so­lu­tion to this—cre­ate a map­ping from WebAssembly di­rectly to Web IDL. But that’s not as straight­for­ward as it might seem.

For sim­ple Web IDL types like boolean and un­signed long (which is a num­ber), there are clear map­pings from WebAssembly to Web IDL.

But for the most part, Web API pa­ra­me­ters are more com­plex types. For ex­am­ple, an API might take a dic­tio­nary, which is ba­si­cally an ob­ject with prop­er­ties, or a se­quence, which is like an ar­ray.

To have a straight­for­ward map­ping be­tween WebAssembly types and Web IDL types, we’d need to add some higher-level types. And we are do­ing that—with the GC pro­posal. With that, WebAssembly mod­ules will be able to cre­ate GC ob­jects—things like structs and ar­rays—that could be mapped to com­pli­cated Web IDL types.

But if the only way to in­ter­op­er­ate with Web APIs is through GC ob­jects, that makes life harder for lan­guages like C++ and Rust that would­n’t use GC ob­jects oth­er­wise. Whenever the code in­ter­acts with a Web API, it would have to cre­ate a new GC ob­ject and copy val­ues from its lin­ear mem­ory into that ob­ject.

That’s only slightly bet­ter than what we have to­day with JS glue code.

We don’t want JS glue code to have to build up GC ob­jects—that’s a waste of time and space. And we don’t want the WebAssembly mod­ule to do that ei­ther, for the same rea­sons.

We want it to be just as easy for lan­guages that use lin­ear mem­ory (like Rust and C++) to call Web APIs as it is for lan­guages that use the en­gine’s built-in GC. So we need a way to cre­ate a map­ping be­tween ob­jects in lin­ear mem­ory and Web IDL types, too.

There’s a prob­lem here, though. Each of these lan­guages rep­re­sents things in lin­ear mem­ory in dif­fer­ent ways. And we can’t just pick one lan­guage’s rep­re­sen­ta­tion. That would make all the other lan­guages less ef­fi­cient.

But even though the ex­act lay­out in mem­ory for these things is of­ten dif­fer­ent, there are some ab­stract con­cepts that they usu­ally share in com­mon.

For ex­am­ple, for strings the lan­guage of­ten has a pointer to the start of the string in mem­ory, and the length of the string. And even if the string has a more com­pli­cated in­ter­nal rep­re­sen­ta­tion, it usu­ally needs to con­vert strings into this for­mat when call­ing ex­ter­nal APIs any­ways.

This means we can re­duce this string down to a type that WebAssembly un­der­stands… two i32s.
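
In Rust, that pointer-plus-length convention looks roughly like this (a hand-written sketch of the idea, not what any particular toolchain actually emits):

// A wasm export that receives a string as the two i32s described above:
// a pointer into linear memory plus a byte length. On a 32-bit wasm
// target, both `*const u8` and `usize` lower to i32.
#[no_mangle]
pub unsafe extern "C" fn take_string(ptr: *const u8, len: usize) {
    let bytes = std::slice::from_raw_parts(ptr, len);
    let s = std::str::from_utf8(bytes).expect("expected valid UTF-8");
    // ... use `s` as an ordinary Rust &str from here on ...
    let _ = s;
}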

We could hard­code a map­ping like this in the en­gine. So the en­gine would have yet an­other ref­er­ence book, this time for WebAssembly to Web IDL map­pings.

But there’s a prob­lem here. WebAssembly is a type-checked lan­guage. To keep things se­cure, the en­gine has to check that the call­ing code passes in types that match what the callee asks for.

This is be­cause there are ways for at­tack­ers to ex­ploit type mis­matches and make the en­gine do things it’s not sup­posed to do.

If you’re call­ing some­thing that takes a string, but you try to pass the func­tion an in­te­ger, the en­gine will yell at you. And it should yell at you.

So we need a way for the module to explicitly tell the engine something like: "I know Document.createElement() takes a string. But when I call it, I'm going to pass you two integers. Use these to create a DOMString from data in my linear memory. Use the first integer as the starting address of the string and the second as the length."

This is what the Web IDL pro­posal does. It gives a WebAssembly mod­ule a way to map be­tween the types that it uses and Web IDLs types.

These map­pings aren’t hard­coded in the en­gine. Instead, a mod­ule comes with its own lit­tle book­let of map­pings.

So this gives the engine a way to say "For this function, do the type checking as if these two integers are a string."

The fact that this book­let comes with the mod­ule is use­ful for an­other rea­son, though.

Sometimes a mod­ule that would usu­ally store its strings in lin­ear mem­ory will want to use an anyref or a GC type in a par­tic­u­lar case… for ex­am­ple, if the mod­ule is just pass­ing an ob­ject that it got from a JS func­tion, like a DOM node, to a Web API.

So modules need to be able to choose on a function-by-function (or even argument-by-argument) basis how different types should be handled. And since the mapping is provided by the module, it can be custom-tailored for that module.

How do you gen­er­ate this book­let?

The com­piler takes care of this in­for­ma­tion for you. It adds a cus­tom sec­tion to the WebAssembly mod­ule. So for many lan­guage tool­chains, the pro­gram­mer does­n’t have to do much work.

For ex­am­ple, let’s look at how the Rust tool­chain han­dles this for one of the sim­plest cases: pass­ing a string into the alert func­tion.

#[wasm_bindgen]

...

Read the original on hacks.mozilla.org »

9 283 shares, 46 trendiness, 1349 words and 11 minutes reading time

Preindustrial workers worked fewer hours than today's

from The Overworked American: The Unexpected Decline of Leisure, by Juliet B. Schor

See also: Productivity and the Workweek

and: Eight cen­turies of an­nual hours

The labour­ing man will take his rest long in the morn­ing; a good piece of the day is spent afore he come at his work; then he must have his break­fast, though he have not earned it at his ac­cus­tomed hour, or else there is grudg­ing and mur­mur­ing; when the clock smiteth, he will cast down his bur­den in the mid­way, and what­so­ever he is in hand with, he will leave it as it is, though many times it is marred afore he come again; he may not lose his meat, what dan­ger so­ever the work is in. At noon he must have his sleep­ing time, then his bever in the af­ter­noon, which spendeth a great part of the day; and when his hour cometh at night, at the first stroke of the clock he casteth down his tools, leaveth his work, in what need or case so­ever the work standeth.

-James Pilkington, Bishop of

Durham, ca. 1570

One of cap­i­tal­is­m’s most durable myths is that it has re­duced hu­man toil. This myth is typ­i­cally de­fended by a com­par­i­son of the mod­ern forty-hour week with its sev­enty- or eighty-hour coun­ter­part in the nine­teenth cen­tury. The im­plicit — but rarely ar­tic­u­lated — as­sump­tion is that the eighty-hour stan­dard has pre­vailed for cen­turies. The com­par­i­son con­jures up the dreary life of me­dieval peas­ants, toil­ing steadily from dawn to dusk. We are asked to imag­ine the jour­ney­man ar­ti­san in a cold, damp gar­ret, ris­ing even be­fore the sun, la­bor­ing by can­dle­light late into the night.

These im­ages are back­ward pro­jec­tions of mod­ern work pat­terns. And they are false. Before cap­i­tal­ism, most peo­ple did not work very long hours at all. The tempo of life was slow, even leisurely; the pace of work re­laxed. Our an­ces­tors may not have been rich, but they had an abun­dance of leisure. When cap­i­tal­ism raised their in­comes, it also took away their time. Indeed, there is good rea­son to be­lieve that work­ing hours in the mid-nine­teenth cen­tury con­sti­tute the most prodi­gious work ef­fort in the en­tire his­tory of hu­mankind.

Therefore, we must take a longer view and look back not just one hundred years, but three or four, even six or seven hundred. Consider a typical working day in the medieval period. It stretched from dawn to dusk (sixteen hours in summer and eight in winter), but, as the Bishop Pilkington has noted, work was intermittent - called to a halt for breakfast, lunch, the customary afternoon nap, and dinner. Depending on time and place, there were also midmorning and midafternoon refreshment breaks. These rest periods were the traditional rights of laborers, which they enjoyed even during peak harvest times. During slack periods, which accounted for a large part of the year, adherence to regular working hours was not usual. According to Oxford Professor James E. Thorold Rogers[1], the medieval workday was not more than eight hours. The worker participating in the eight-hour movements of the late nineteenth century was "simply striving to recover what his ancestor worked by four or five centuries ago."

An important piece of evidence on the working day is that it was very unusual for servile laborers to be required to work a whole day for a lord. "One day's work was considered half a day, and if a serf worked an entire day, this was counted as two days-works."[2] Detailed accounts of artisans' workdays are available. Knoop and Jones' figures for the fourteenth century work out to a yearly average of 9 hours (exclusive of meals and breaktimes)[3]. Brown, Colvin and Taylor's figures for masons suggest an average workday of 8.6 hours[4].

The contrast between capitalist and precapitalist work patterns is most striking in respect to the working year. The medieval calendar was filled with holidays. Official — that is, church — holidays included not only long "vacations" at Christmas, Easter, and midsummer but also numerous saints' and rest days. These were spent both in sober churchgoing and in feasting, drinking and merrymaking. In addition to official celebrations, there were often weeks' worth of ales — to mark important life events (bride ales or wake ales) as well as less momentous occasions (scot ale, lamb ale, and hock ale). All told, holiday leisure time in medieval England took up probably about one-third of the year. And the English were apparently working harder than their neighbors. The ancien régime in France is reported to have guaranteed fifty-two Sundays, ninety rest days, and thirty-eight holidays. In Spain, travelers noted that holidays totaled five months per year.[5]

The peasant's free time extended beyond officially sanctioned holidays. There is considerable evidence of what economists call the backward-bending supply curve of labor — the idea that when wages rise, workers supply less labor. During one period of unusually high wages (the late fourteenth century), many laborers refused to work "by the year or the half year or by any of the usual terms but only by the day." And they worked only as many days as were necessary to earn their customary income — which in this case amounted to about 120 days a year, for a probable total of only 1,440 hours annually (this estimate assumes a 12-hour day because the days worked were probably during spring, summer and fall). A thirteenth-century estimate finds that whole peasant families did not put in more than 150 days per year on their land. Manorial records from fourteenth-century England indicate an extremely short working year — 175 days — for servile laborers. Later evidence for farmer-miners, a group with control over their worktime, indicates they worked only 180 days a year.

[1] James E. Thorold Rogers, Six Centuries of Work and Wages (London: Allen and Unwin, 1949), 542-43.

[3] Douglas Knoop and G. P. Jones, The Medieval Mason (New York: Barnes and Noble, 1967), 105.

[4] R. Allen Brown, H.M. Colvin, and A.J. Taylor, The History of the King’s Works, vol. I, the Middle Ages (London: Her Majesty’s Stationary Office, 1963).

[5] Edith Rodgers, Discussion of Holidays in the Later Middle Ages (New York: Columbia University Press, 1940), 10-11. See also C.R. Cheney, Rules for the ob­ser­vance of feast-days in me­dieval England”, Bulletin of the Institute of Historical Research 34, 90, 117-29 (1961).

- Adult male peasant, U.K.: 1,620 hours

Calculated from Gregory Clark’s es­ti­mate of 150 days per fam­ily, as­sumes 12 hours per day, 135 days per year for adult male (“Impatience, Poverty, and Open Field Agriculture”, mimeo, 1986)

- Casual laborer, U.K.: 1,440 hours

Calculated from Nora Ritchie's estimate of 120 days per year. Assumes 12-hour day. ("Labour conditions in Essex in the reign of Richard II", in E.M. Carus-Wilson, ed., Essays in Economic History, vol. II, London: Edward Arnold, 1962).

- English worker: 2,309 hours

Juliet Schor's estimate of average medieval laborer working two-thirds of the year at 9.5 hours per day

- Farmer-miner, adult male, U.K.: 1,980 hours

Calculated from Ian Blanchard's estimate of 180 days per year. Assumes 11-hour day ("Labour productivity and work psychology in the English mining industry, 1400-1600", Economic History Review 31, 23 (1978)).

- Average worker, U.K.: 3,105-3,588 hours

Based on 69-hour week; hours from W.S. Woytinsky, "Hours of labor," in Encyclopedia of the Social Sciences, vol. III (New York: Macmillan, 1935). Low estimate assumes 45-week year, high one assumes 52-week year

- Average worker, U.S.: 3,150-3,640 hours

Based on 70-hour week; hours from Joseph Zeisel, "The workweek in American industry, 1850-1956", Monthly Labor Review 81, 23-29 (1958). Low estimate assumes 45-week year, high one assumes 52-week year

- Average worker, U.S.: hours

From The Overworked American: The Unexpected Decline of Leisure, by Juliet B. Schor, Table 2.4

- Manufacturing work­ers, U.K.: hours

Calculated from Bureau of Labor Statistics data, Office of Productivity and Technology

...

Read the original on groups.csail.mit.edu »

10 260 shares, 13 trendiness, 1665 words and 16 minutes reading time

No, WeWork Isn’t a Tech Company. Here’s Why That Matters

When WeWork, the coworking space now known as The We Company, released its S-1 filing to go public, it spurred numerous concerns about the company's large valuation ($47 billion at last count). It also renewed questions about WeWork's claims of being a tech company: What makes a modern tech company? Does WeWork meet those qualifications? The authors argue that WeWork does not meet any of the five qualifications that enable a modern tech company to achieve exponential growth as well as winner-take-all profits. Analysts should be wary of thinking every startup, however disruptive, is a tech company or is worthy of a tech valuation, because the "tech" label isn't what determines shareholder value. What does are profits, return on investments, and dividend-paying potential.


Last week, WeWork, the coworking space now known as The We Company, released its S-1 filing to go public. That spurred numerous concerns about the company's large valuation ($47 billion at last count), given its hefty losses ($1.6 billion on revenues of $1.8 billion) and despite its rapid growth (86% year-over-year revenue growth). It also renewed questions about WeWork's claims of being a tech company (the word "technology" appears 110 times in its prospectus) and about whether it's worth a tech-type high valuation. Pundits have long argued that it is not a tech company, but a modern-day real estate company — purchasing long-term leases from landlords and renting them out as short-term leases to tenants. Many have also argued that WeWork does not deserve the large EBITDA-based (earnings before interest, taxes, depreciation, and amortization) valuation multiple that is often ascribed to tech companies.

These con­cerns raise ques­tions like: What makes a mod­ern tech com­pany? Why do these com­pa­nies achieve such lofty val­u­a­tions? Does WeWork meet those qual­i­fi­ca­tions? And are the con­cerns over WeWork’s val­u­a­tion war­ranted? Here’s what, in our view, a mod­ern tech com­pany is — and why WeWork is­n’t one.

In our opin­ion, a suc­cess­ful mod­ern tech com­pany can trans­form whole in­dus­tries, achieve ex­pan­sion of scale and scope at break­neck speeds, and make enor­mous prof­its, with­out re­quir­ing sig­nif­i­cant cap­i­tal in­vest­ments. It typ­i­cally has most, if not all, of these five fea­tures:

Low vari­able costs. Google, Airbnb, Yelp, Uber, Twitter, and Facebook have scal­able vir­tual mod­els that can be ex­po­nen­tially mag­ni­fied overnight with lit­tle ad­di­tional costs. So, an ad­di­tional dol­lar of rev­enues comes with­out com­men­su­rate ex­penses. How much does it cost to make an­other copy of Windows 10 or ser­vice an­other Google or Facebook cus­tomer? Relatively lit­tle. Facebook’s gross mar­gins, for in­stance, run as high as 80%–85%.

This con­cept does not even re­motely ap­ply to WeWork. It is an of­fice rental com­pany, of­fer­ing free in­ter­net, beer, snacks, cof­fee, and work­ing space to pay­ing mem­bers. Even if it be­comes the largest and most suc­cess­ful player in its busi­ness, it will have sig­nif­i­cant op­er­at­ing ex­penses and, there­fore, wafer-thin profit mar­gins. (Think about it: Rent, util­i­ties, main­te­nance, in­sur­ance, se­cu­rity, re­fresh­ments — these all cost money!)

Low cap­i­tal in­vest­ments. Even though a mod­ern tech com­pany may in­vest in server farms, it will still of­ten re­main as­set light be­cause of its low re­quire­ments for land, build­ings, fac­to­ries, and ware­houses. For ex­am­ple, Facebook has just $25 bil­lion of phys­i­cal as­sets and a $525 bil­lion val­u­a­tion.

While WeWork rents real es­tate, it must de­velop and fur­nish it to the high­est-qual­ity lev­els to dis­tin­guish it­self from other of­fice providers. And rented premises are now con­sid­ered to be cap­i­tal as­sets. These facts im­ply two things. First, WeWork has much higher cap­i­tal re­quire­ments than a typ­i­cal tech com­pany for the same rev­enues. So, WeWork would never be able to grow non­lin­early based on its in­ter­nally gen­er­ated cash. It would keep ap­proach­ing cap­i­tal mar­kets and lenders to fund its am­bi­tious growth plans. More im­por­tant, the com­pa­ny’s abil­ity to gen­er­ate free cash flows (the amount left af­ter sub­tract­ing cap­i­tal in­vest­ment from its prof­its) is ques­tion­able. Contrast this with Facebook, whose rev­enue growth gen­er­ates huge prof­its. And given its low rein­vest­ment re­quire­ments to sus­tain growth, most of those prof­its are free cash flows, which are po­ten­tially payable to in­vestors as div­i­dends.

Second, ex­penses re­lated to de­pre­ci­a­tion and wear-and-tear of as­sets would be a much higher ex­pense for WeWork than for an as­set-light tech com­pany. That wear and tear would re­quire com­men­su­rate funds for re­place­ment. After all, cus­tomers at­tracted to trendy of­fices ex­pect that fused neon lights, worn-out car­pets, and bro­ken chairs and print­ers will be reg­u­larly re­placed. As Warren Buffett says, the tooth fairy does­n’t pay for those re­place­ment ex­penses. Hence, while an above-the-line per­for­mance met­ric, such as EBITDA, might work in the val­u­a­tion of an as­set-light tech com­pany, it is a mean­ing­less con­cept for WeWork. Other met­rics pro­posed by WeWork, such as com­mu­nity-ad­justed EBITDA, which ig­nore even the most ba­sic costs for pro­vid­ing ser­vices, such as lease rentals, are even more ridicu­lous.

A lot of cus­tomer data and cus­tomer in­ti­macy. Mod­ern tech com­pa­nies — think Uber, Amazon, Apple, Google, Yelp, Tesla, and Facebook — col­lect, store, or­ga­nize, and an­a­lyze years of user data. This data is not only vir­tual gold for those com­pa­nies, as it en­ables tar­geted ads and the sale of tai­lor-made prod­ucts; it also in­creases users’ switch­ing costs as peo­ple use the ser­vice and get cus­tomized so­lu­tions in re­turn. It is un­clear whether WeWork would col­lect this kind of data and how it would use that data to de­velop cus­tomer-in­ti­mate so­lu­tions. Too much mon­i­tor­ing and in­tru­sion in an of­fice lo­ca­tion could vi­o­late pri­vacy laws.

Network ef­fects. For most mod­ern tech com­pa­nies, the big­ger the net­work, the more valu­able the com­pany, but on an ex­po­nen­tial scale. Each new cus­tomer join­ing Facebook, even if re­motely lo­cated, cre­ates value for an ex­ist­ing cus­tomer, be­cause it ex­tends the ex­ist­ing cus­tomer’s net­work po­ten­tial. Any new cus­tomer join­ing Uber or Amazon im­proves the value propo­si­tion for an ex­ist­ing user by im­prov­ing the feed­back qual­ity, lo­gis­tics op­ti­miza­tion, and num­ber of sup­pli­ers ad­dress­ing the mar­ket. But it’s hard to see how some­one join­ing WeWork in, say, Indonesia cre­ates value for an ex­ist­ing mem­ber in Texas. The mem­ber does­n’t need WeWork’s net­work to col­lab­o­rate glob­ally, be­cause there are big­ger and bet­ter plat­forms (such as LinkedIn) to serve that pur­pose.

Ecosystems that boost ex­pan­sion with lit­tle cost. A mod­ern tech com­pany lever­ages its re­la­tion­ships with cus­tomers, and what it knows about their tastes and pref­er­ences, to de­liver more ser­vices. This is achieved at lit­tle cost by lever­ag­ing ecosys­tem part­ners’ as­sets. Consider Apple’s use of the iPhone and Amazon’s use of Echo de­vices to cross-sell apps, mu­sic, video, and pay­ment ser­vices. The com­pa­nies that con­trol these plat­forms take a cut from each dol­lar flow­ing through the sys­tem. WeWork might be able to en­ter other real es­tate ar­eas, such as apart­ments or schools, but that will re­quire mak­ing mas­sive in­vest­ments and ad­dress­ing new cus­tomers. Contrast that with Uber, which can ex­tend its of­fer­ings to Uber Eats with min­i­mal in­vest­ments.

In sum, WeWork does not meet any of the qual­i­fi­ca­tions that en­able a mod­ern tech com­pany to achieve ex­po­nen­tial growth as well as win­ner-take-all prof­its. In con­trast, even as­set- and in­ven­tory-heavy com­pa­nies like Amazon, Tesla, and Apple meet three of the five cri­te­ria.

WeWork seems to be a disruptive real estate company, with the goal of "changing how people work, live, and grow." Some may be impressed by WeWork's tremendous growth. But this growth has required and will continue to require massive costs and investments. More important, this growth will not exponentially grow profits as it might for a digital firm, for which any revenue growth after breaking even adds to profits and dividend-paying potential.

But we do think that some of the other crit­i­cisms be­ing laid against the com­pany and its val­u­a­tion are ad­dress­able. And do­ing so re­quires ad­just­ing the price in­vestors are will­ing to pay for WeWork shares.

Concern about as­set-li­a­bil­ity du­ra­tion mis­match. WeWork ob­tains real es­tate us­ing long-term con­tracts (which have a long du­ra­tion of li­a­bil­ity) and rents prop­er­ties through short-term con­tracts (which have short du­ra­tion of rev­enue-gen­er­at­ing as­sets). This fea­ture is typ­i­cal of many real es­tate–based busi­nesses, such as ho­tels. Duration mis­match is a dou­ble-edged sword. It can pro­duce large prof­its in good times, be­cause the firms will have fixed their leas­ing costs a long time ago and can now com­mand higher mem­ber­ship fees. But it could drag WeWork into bank­ruptcy in bad times, be­cause its fixed li­a­bil­i­ties, which cur­rently stand at $34 bil­lion, would re­main payable, come rain or shine. Investors can as­sign an ap­pro­pri­ate dis­count to val­u­a­tion to ad­just for higher risks.

Concern about pub­lic of­fer­ing of shares that carry lower vot­ing rights than pro­mot­ers’ shares. Offering mul­ti­ple classes of shares is a com­mon fea­ture of many mod­ern pub­lic cor­po­ra­tions. To ease con­cern over pur­chas­ing shares that don’t have vot­ing rights, share­hold­ers can as­sign an ap­pro­pri­ate dis­count in ex­change for vot­ing rights, as they do with Facebook, Alphabet, and Spotify.

Concern about in­sider deal­ings, such as pro­mot­ers be­ing renters of the com­pa­ny’s real es­tate. While we op­pose any man­ager-owned ven­dors be­com­ing pre­ferred sup­pli­ers to a pub­lic cor­po­ra­tion, WeWork has dis­closed its in­sider deal­ings. Investors can there­fore as­sign a suit­able dis­count for po­ten­tial value loss due to in­sider deal­ings.

In summary, we are reluctant to put WeWork in the same category of tech companies as Apple, Microsoft, Facebook, and Alphabet. It doesn't have the features that make those companies cash-generating machines, so it doesn't warrant a tech-type valuation. We hope that analysts and managers will use the more holistic framework given here to determine what is truly a tech company. Analysts should be wary of thinking every startup, however disruptive, is a tech company or is worthy of a tech valuation, because the "tech" label isn't what determines shareholder value. What does are profits, return on investments, and dividend-paying potential.

...

Read the original on hbr.org »
