
35% Faster Than The Filesystem


SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Furthermore, a sin­gle SQLite data­base hold­ing 10-kilobyte blobs uses about 20% less disk space than stor­ing the blobs in in­di­vid­ual files.

The per­for­mance dif­fer­ence arises (we be­lieve) be­cause when work­ing from an SQLite data­base, the open() and close() sys­tem calls are in­voked only once, whereas open() and close() are in­voked once for each blob when us­ing blobs stored in in­di­vid­ual files. It ap­pears that the over­head of call­ing open() and close() is greater than the over­head of us­ing the data­base. The size re­duc­tion arises from the fact that in­di­vid­ual files are padded out to the next mul­ti­ple of the filesys­tem block size, whereas the blobs are packed more tightly into an SQLite data­base.
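The padding arithmetic behind that ~20% figure is easy to check. Here is a small sketch (not from the article) computing how much a 10,000-byte blob grows when rounded up to a typical 4096-byte filesystem block:

```python
import math

def padded_size(nbytes: int, block: int = 4096) -> int:
    """Disk bytes consumed when a file is padded out to the block size."""
    return math.ceil(nbytes / block) * block

blob = 10_000                        # average blob size in the experiment
print(padded_size(blob))             # 12288: three 4 KiB blocks for 10,000 bytes
print(f"{padded_size(blob) / blob - 1:.0%}")  # 23% padding overhead at this size
```

An SQLite database has its own per-page and per-record overhead, which is why the measured saving is about 20% rather than the full padding overhead.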

The mea­sure­ments in this ar­ti­cle were made dur­ing the week of 2017-06-05 us­ing a ver­sion of SQLite in be­tween 3.19.2 and 3.20.0. You may ex­pect fu­ture ver­sions of SQLite to per­form even bet­ter.

¹The 35% fig­ure above is ap­prox­i­mate. Actual tim­ings vary de­pend­ing on hard­ware, op­er­at­ing sys­tem, and the de­tails of the ex­per­i­ment, and due to ran­dom per­for­mance fluc­tu­a­tions on real-world hard­ware. See the text be­low for more de­tail. Try the ex­per­i­ments your­self. Report sig­nif­i­cant de­vi­a­tions on the SQLite fo­rum.

The 35% fig­ure is based on run­ning tests on every ma­chine that the au­thor has eas­ily at hand. Some re­view­ers of this ar­ti­cle re­port that SQLite has higher la­tency than di­rect I/O on their sys­tems. We do not yet un­der­stand the dif­fer­ence. We also see in­di­ca­tions that SQLite does not per­form as well as di­rect I/O when ex­per­i­ments are run us­ing a cold filesys­tem cache.

So let your take-away be this: read/​write la­tency for SQLite is com­pet­i­tive with read/​write la­tency of in­di­vid­ual files on disk. Often SQLite is faster. Sometimes SQLite is al­most as fast. Either way, this ar­ti­cle dis­proves the com­mon as­sump­tion that a re­la­tional data­base must be slower than di­rect filesys­tem I/O.

A 2022 study (alternative link on GitHub) found that SQLite is roughly twice as fast at real-world workloads compared to Btrfs and Ext4 on Linux.

Jim Gray and others studied the read performance of BLOBs versus file I/O for Microsoft SQL Server and found that reading BLOBs out of the database was faster for BLOB sizes below a threshold that fell between 250KiB and 1MiB. (Paper.) In that study, the database still stores the filename of the content even if the content is held in a separate file. So the database is consulted for every BLOB, even if it is only to extract the filename. In this article, the key for the BLOB is the filename, so no preliminary database access is required. Because the database is never used at all when reading content from individual files in this article, the threshold at which direct file I/O becomes faster is smaller than it is in Gray's paper.

The Internal Versus External BLOBs ar­ti­cle on this web­site is an ear­lier in­ves­ti­ga­tion (circa 2011) that uses the same ap­proach as the Jim Gray pa­per — stor­ing the blob file­names as en­tries in the data­base — but for SQLite in­stead of SQL Server.

How These Measurements Are Made

I/O performance is measured using the kvtest.c program from the SQLite source tree. To compile this test program, first gather the kvtest.c source file into a directory with the SQLite amalgamation source files "sqlite3.c" and "sqlite3.h". Then on unix, run a command like the following:
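The command listing itself did not survive extraction. A typical unix build of kvtest against the amalgamation looks roughly like this (exact flags vary by platform; -ldl and -lpthread are needed on most Linux systems):

```shell
gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c -o kvtest -ldl -lpthread
```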

Or on Windows with MSVC:
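The Windows listing is likewise missing; with MSVC the equivalent build is approximately:

```shell
cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c
```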

Instructions for com­pil­ing for Android are shown be­low.

Use the resulting "kvtest" program to generate a test database with 100,000 random uncompressible blobs, each with a random size between 8,000 and 12,000 bytes, using a command like this:
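The command was lost in extraction; per kvtest's interface it is approximately the following, where --count, --size, and --variance give the blob count, average size, and size spread (flag names are an approximation; confirm against kvtest's built-in help):

```shell
./kvtest init test1.db --count 100k --size 10k --variance 2k
```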

If de­sired, you can ver­ify the new data­base by run­ning this com­mand:
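The verification command is missing from the extracted text; it is approximately:

```shell
./kvtest stat test1.db
```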

Next, make copies of all the blobs into in­di­vid­ual files in a di­rec­tory us­ing a com­mand like this:
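The export command did not survive extraction; it is approximately:

```shell
./kvtest export test1.db test1.dir
```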

At this point, you can measure the amount of disk space used by the test1.db database and the space used by the test1.dir directory and all of its content. On a standard Ubuntu Linux desktop, the database file will be 1,024,512,000 bytes in size and the test1.dir directory will use 1,228,800,000 bytes of space (according to "du -k"), about 20% more than the database.

The "test1.dir" directory created above puts all the blobs into a single folder. It was conjectured that some operating systems would perform poorly when a single directory contains 100,000 objects. To test this, the kvtest program can also store the blobs in a hierarchy of folders with no more than 100 files and/or subdirectories per folder. The alternative on-disk representation of the blobs can be created using the --tree command-line option to the "export" command, like this:
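The command itself was lost in extraction; it is approximately:

```shell
./kvtest export test1.db test1.tree --tree
```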

The test1.dir directory will contain 100,000 files with names like "000000", "000001", "000002", and so forth, but the test1.tree directory will contain the same files in subdirectories like "00/00/00", "00/00/01", and so on. The test1.dir and test1.tree directories take up approximately the same amount of space, though test1.tree is very slightly larger due to the extra directory entries.

All of the experiments that follow operate the same with either "test1.dir" or "test1.tree". Very little performance difference is measured between the two cases, regardless of operating system.

Measure the per­for­mance for read­ing blobs from the data­base and from in­di­vid­ual files us­ing these com­mands:
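The command listings are missing from the extracted text; they are approximately as follows, one run against the database and one against each directory of files:

```shell
./kvtest run test1.db --count 100k --blob-api
./kvtest run test1.dir --count 100k
./kvtest run test1.tree --count 100k
```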

Depending on your hard­ware and op­er­at­ing sys­tem, you should see that reads from the test1.db data­base file are about 35% faster than reads from in­di­vid­ual files in the test1.dir or test1.tree fold­ers. Results can vary sig­nif­i­cantly from one run to the next due to caching, so it is ad­vis­able to run tests mul­ti­ple times and take an av­er­age or a worst case or a best case, de­pend­ing on your re­quire­ments.

The --blob-api option on the database read test causes kvtest to use the sqlite3_blob_read() feature of SQLite to load the content of the blobs, rather than running pure SQL statements. This helps SQLite run a little faster on read tests. You can omit that option to compare the performance of SQLite running SQL statements. In that case, SQLite still out-performs direct reads, though by not as much as when using sqlite3_blob_read(). The --blob-api option is ignored for tests that read from individual disk files.

Measure write performance by adding the --update option. This causes each blob to be overwritten in place with another random blob of exactly the same size.
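The write-test invocations were lost in extraction; they are approximately:

```shell
./kvtest run test1.db --count 100k --update
./kvtest run test1.dir --count 100k --update
```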

The writing test above is not completely fair, since SQLite is doing power-safe transactions whereas the direct-to-disk writing is not. To put the tests on a more equal footing, either add the --nosync option to the SQLite writes to disable calling fsync() or FlushFileBuffers() to force content to disk, or use the --fsync option for the direct-to-disk tests to force them to invoke fsync() or FlushFileBuffers() when updating disk files.

By default, kvtest runs the database I/O measurements all within a single transaction. Use the --multitrans option to run each blob read or write in a separate transaction. The --multitrans option makes SQLite much slower, and uncompetitive with direct disk I/O. This option proves, yet again, that to get the most performance out of SQLite, you should group as much database interaction as possible within a single transaction.

There are many other test­ing op­tions, which can be seen by run­ning the com­mand:
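The command is missing from the extracted text; it is approximately:

```shell
./kvtest help
```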

The chart below shows data collected using kvtest.c on five different systems:

All machines use SSDs except Win7, which has a hard drive. The test database is 100K blobs with sizes uniformly distributed between 8K and 12K, for a total of about 1 gigabyte of content. The database page size is 4KiB. The -DSQLITE_DIRECT_OVERFLOW_READ compile-time option was used for all of these tests. Tests were run multiple times. The first run was used to warm up the cache and its timings were discarded.

The chart below shows the average time to read a blob directly from the filesystem versus the time needed to read the same blob from the SQLite database. The actual timings vary considerably from one system to another (the Ubuntu desktop is much faster than the Galaxy S3 phone, for example). This chart shows the ratio of the time needed to read blobs from a file divided by the time needed to read them from the database. The left-most column in the chart is the normalized time to read from the database, for reference.

In this chart, an SQL state­ment (“SELECT v FROM kv WHERE k=?1”) is pre­pared once. Then for each blob, the blob key value is bound to the ?1 pa­ra­me­ter and the state­ment is eval­u­ated to ex­tract the blob con­tent.
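In Python terms (a sketch, not the C code kvtest actually uses; the kv(k, v) schema is assumed from the SELECT statement quoted above), the prepare-once, bind-per-key pattern looks like this:

```python
import sqlite3

# Build a miniature version of the key/value table used by the benchmark.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB)")
con.executemany("INSERT INTO kv(k, v) VALUES(?, ?)",
                [(i, bytes([i]) * 100) for i in range(10)])

sql = "SELECT v FROM kv WHERE k = ?1"   # prepared once, reused per key
for key in (3, 7):
    (blob,) = con.execute(sql, (key,)).fetchone()
    assert len(blob) == 100             # each stored blob is 100 bytes here
```

Python's sqlite3 module caches the prepared statement across identical execute() calls, so each iteration only re-binds the ?1 parameter, mirroring the C-level pattern the article describes.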

The chart shows that on Windows 10, content can be read from the SQLite database about 5 times faster than it can be read directly from disk. On Android, SQLite is only about 35% faster than reading from disk.

Chart 1: SQLite read la­tency rel­a­tive to di­rect filesys­tem reads.

100K blobs, avg 10KB each, ran­dom or­der us­ing SQL

The performance can be improved slightly by bypassing the SQL layer and reading the blob content directly using the sqlite3_blob_read() interface, as shown in the next chart:

Further performance improvements can be made by using the memory-mapped I/O feature of SQLite. In the next chart, the entire 1GB database file is memory mapped and blobs are read (in random order) using the sqlite3_blob_read() interface. With these optimizations, SQLite is twice as fast as direct reads on Android or MacOS-X and over 10 times faster than direct reads on Windows.
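Memory-mapped I/O can be enabled from SQL with a pragma. A small sketch (the article maps the whole 1 GB database; the 256 MiB limit below is an arbitrary illustrative value):

```python
import os, sqlite3, tempfile

# Sketch: turning on SQLite's memory-mapped I/O via PRAGMA mmap_size.
path = os.path.join(tempfile.mkdtemp(), "mmap_demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB)")
(limit,) = con.execute("PRAGMA mmap_size=268435456").fetchone()
print(limit)  # the limit now in effect (0 if this build has mmap disabled)
```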

The third chart shows that read­ing blob con­tent out of SQLite can be twice as fast as read­ing from in­di­vid­ual files on disk for Mac and Android, and an amaz­ing ten times faster for Windows.

Writes are slower. On all sys­tems, us­ing both di­rect I/O and SQLite, write per­for­mance is be­tween 5 and 15 times slower than reads.

Write performance measurements were made by replacing (overwriting) an entire blob with a different blob. All of the blobs in these experiments are random and incompressible. Because writes are so much slower than reads, only 10,000 of the 100,000 blobs in the database are replaced. The blobs to be replaced are selected at random and are in no particular order.

The di­rect-to-disk writes are ac­com­plished us­ing fopen()/​fwrite()/​fclose(). By de­fault, and in all the re­sults shown be­low, the OS filesys­tem buffers are never flushed to per­sis­tent stor­age us­ing fsync() or FlushFileBuffers(). In other words, there is no at­tempt to make the di­rect-to-disk writes trans­ac­tional or power-safe. We found that in­vok­ing fsync() or FlushFileBuffers() on each file writ­ten causes di­rect-to-disk stor­age to be about 10 times or more slower than writes to SQLite.

The next chart compares SQLite database updates in WAL mode against raw direct-to-disk overwrites of separate files on disk. The PRAGMA synchronous setting is NORMAL. All database writes are in a single transaction. The timer for the database writes is stopped after the transaction commits, but before a checkpoint is run. Note that the SQLite writes, unlike the direct-to-disk writes, are transactional and power-safe, though because the synchronous setting is NORMAL instead of FULL, the transactions are not durable.

Chart 4: SQLite write la­tency rel­a­tive to di­rect filesys­tem writes.

10K blobs, avg size 10KB, ran­dom or­der,

WAL mode with syn­chro­nous NORMAL,

ex­clu­sive of check­point time
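The database configuration described above can be sketched in a few lines (Python here rather than kvtest's C; the 1,000-row insert stands in for the benchmark's 10,000 overwrites):

```python
import os, sqlite3, tempfile

# Sketch of the Chart 4 setup: WAL journaling, synchronous=NORMAL,
# and all writes grouped into a single transaction.
path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")
con = sqlite3.connect(path)
(mode,) = con.execute("PRAGMA journal_mode=WAL").fetchone()
con.execute("PRAGMA synchronous=NORMAL")
con.execute("CREATE TABLE kv(k INTEGER PRIMARY KEY, v BLOB)")
with con:  # one transaction around all the writes
    con.executemany("INSERT INTO kv(k, v) VALUES(?, ?)",
                    [(i, b"x" * 100) for i in range(1000)])
print(mode)  # "wal"
```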

The Android performance numbers for the write experiments are omitted because the performance tests on the Galaxy S3 are so variable. Two consecutive runs of the exact same experiment would give wildly different times. And, to be fair, the performance of SQLite on Android is slightly slower than writing directly to disk.

The next chart shows the per­for­mance of SQLite ver­sus di­rect-to-disk when trans­ac­tions are dis­abled (PRAGMA jour­nal_­mode=OFF) and PRAGMA syn­chro­nous is set to OFF. These set­tings put SQLite on an equal foot­ing with di­rect-to-disk writes, which is to say they make the data prone to cor­rup­tion due to sys­tem crashes and power fail­ures.
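Those "equal footing" settings translate into two pragmas, sketched below (trading away crash safety, just like plain fwrite() to separate files):

```python
import os, sqlite3, tempfile

# Sketch: disable the rollback journal and all fsync() calls.
path = os.path.join(tempfile.mkdtemp(), "nosync_demo.db")
con = sqlite3.connect(path)
(mode,) = con.execute("PRAGMA journal_mode=OFF").fetchone()
con.execute("PRAGMA synchronous=OFF")
print(mode)  # "off"
```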

In all of the write tests, it is important to disable anti-virus software prior to running the direct-to-disk performance tests. We found that anti-virus software slows down direct-to-disk writes by an order of magnitude, whereas it impacts SQLite writes very little. This is probably because direct-to-disk writes change thousands of separate files, all of which must be checked by the anti-virus software, whereas SQLite writes change only the single database file.

The -DSQLITE_DIRECT_OVERFLOW_READ com­pile-time op­tion causes SQLite to by­pass its page cache when read­ing con­tent from over­flow pages. This helps data­base reads of 10K blobs run a lit­tle faster, but not all that much faster. SQLite still holds a speed ad­van­tage over di­rect filesys­tem reads with­out the SQLITE_DIRECT_OVERFLOW_READ com­pile-time op­tion.

Other compile-time options, such as using -O3 instead of -Os, using -DSQLITE_THREADSAFE=0, and/or some of the other recommended compile-time options, might help SQLite to run even faster relative to direct filesystem reads.

The size of the blobs in the test data af­fects per­for­mance. The filesys­tem will gen­er­ally be faster for larger blobs, since the over­head of open() and close() is amor­tized over more bytes of I/O, whereas the data­base will be more ef­fi­cient in both speed and space as the av­er­age blob size de­creases.

SQLite is com­pet­i­tive with, and usu­ally faster than, blobs stored in sep­a­rate files on disk, for both read­ing and writ­ing.

SQLite is much faster than di­rect writes to disk on Windows when anti-virus pro­tec­tion is turned on. Since anti-virus soft­ware is and should be on by de­fault in Windows, that means that SQLite is gen­er­ally much faster than di­rect disk writes on Windows.

Reading is about an or­der of mag­ni­tude faster than writ­ing, for all sys­tems and for both SQLite and di­rect-to-disk I/O.

I/O per­for­mance varies widely de­pend­ing on op­er­at­ing sys­tem and hard­ware. Make your own mea­sure­ments be­fore draw­ing con­clu­sions.

Some other SQL data­base en­gines ad­vise de­vel­op­ers to store blobs in sep­a­rate files and then store the file­name in the data­base. In that case, where the data­base must first be con­sulted to find the file­name be­fore open­ing and read­ing the file, sim­ply stor­ing the en­tire blob in the data­base gives much faster read and write per­for­mance with SQLite. See the Internal Versus External BLOBs ar­ti­cle for more in­for­ma­tion.

The kvtest program is compiled and run on Android as follows. First install the Android SDK and NDK. Then prepare a script named "android-gcc" that looks approximately like this:
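The script listing did not survive extraction. A hypothetical sketch of such a wrapper, assuming a recent NDK with the LLVM toolchain (the NDK path, version, host OS, API level, and target triple below are all placeholders you must adjust for your setup):

```shell
#!/bin/sh
# Hypothetical "android-gcc" wrapper -- every path and version here is a
# placeholder; substitute your own NDK location, API level, and device ABI.
NDK=$HOME/Android/Sdk/ndk/26.1.10909125
TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/linux-x86_64
exec "$TOOLCHAIN/bin/armv7a-linux-androideabi24-clang" "$@"
```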

Make that script ex­e­cutable and put it on your $PATH. Then com­pile the kvtest pro­gram as fol­lows:
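The compile invocation itself was lost in extraction; it mirrors the unix build, substituting the cross-compiler wrapper (an approximation):

```shell
android-gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c -o kvtest-android -ldl
```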

Next, move the re­sult­ing kvtest-an­droid ex­e­cutable to the Android de­vice:
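The push command is missing from the extracted text; the usual destination is a directory where the binary is allowed to execute:

```shell
adb push kvtest-android /data/local/tmp
```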

Finally, use "adb shell" to get a shell prompt on the Android device, cd into the /data/local/tmp directory, and begin running the tests as with any other unix host.

This page last mod­i­fied on 2023-12-05 14:43:20 UTC


Read the original on sqlite.org »


Europe Is in Danger of Regulating Its Tech Market Out of Existence

In June, Apple an­nounced a new prod­uct called Apple Intelligence. It’s be­ing sold as a new suite of fea­tures for the iPhone, iPad, and Mac that will use ar­ti­fi­cial in­tel­li­gence to help you write and edit emails, cre­ate new pic­tures and emo­jis, and gen­er­ally ac­com­plish all kinds of tasks. There’s just one prob­lem if you’re a European user ea­ger to get your hands on it: Apple won’t be re­leas­ing it in Europe.

The company said in a statement that an entire suite of new products and features, including Apple Intelligence, SharePlay screen sharing, and iPhone screen mirroring, would not be released in European Union countries because of the regulatory requirements imposed by the EU's Digital Markets Act (DMA). European Commission Executive Vice President Margrethe Vestager called the decision a "stunning declaration" of anti-competitive behavior.

Vestager’s state­ment is ridicu­lous on its face: A tech gi­ant choos­ing not to re­lease a prod­uct in­vites more com­pe­ti­tion, not less, and more im­por­tantly, this is ex­actly what you’d ex­pect to hap­pen given Europe’s reg­u­la­tory stance.

The economist Albert Hirschman once described the two options in an unfavorable environment as "voice" and "exit." The most common option is voice: attempt to negotiate, repair the situation, and communicate toward better conditions. The more drastic option is exit: choosing to leave the unfavorable environment entirely. That's more common for people or political movements, but it's growing increasingly relevant to technology in Europe.

Apple's decision isn't the first time that poorly designed regulations have pushed tech companies to block features or services in specific countries. Last year, Facebook removed all news content in Canada in response to the country's Online News Act, which resulted in smaller news outlets losing business. In 2014, Google News withdrew from Spain over a "link tax," causing lower traffic for Spanish news sites, and returned only when the law was changed. Numerous technology firms have left China due to the power the Chinese Communist Party exerts over foreign corporations.

Adult sites are blocking users in a variety of U.S. states over age verification laws. Meta delayed the EU rollout of its Twitter (now X) competitor Threads over regulatory concerns, though it did eventually launch there. The firm, in a move that mirrors the Apple Intelligence decision, has also declined to release its cutting-edge Llama AI models in the EU, citing "regulatory uncertainty." Technology companies have traditionally invested large amounts of money in voice strategies, lobbying officials and trying to improve poorly written laws. But they are increasingly aware of their ability to exit, especially in the European context. And Europe's regulatory approach risks creating a balkanized "splinternet," where international tech giants may choose to withdraw from the European continent.

If that seems far-fetched, consider other recent cases. Europe recently charged Meta with breaching EU regulations over its "pay or consent" plan. Meta's business is built around personalized ads, which are worth far more than non-personalized ads. EU regulators required that Meta provide an option that did not involve tracking user data, so Meta created a paid model that would allow users to pay a fee for an ad-free service.

This was already a significant concession; personalized ads are so valuable that one analyst estimated paid users would bring in 60 percent less revenue. But EU regulators are now insisting this model also breaches the rules, saying that Meta fails to provide a less personalized but equivalent version of its social networks. They're demanding that Meta provide its full services free of charge, without personalized ads and without a monthly fee. In a very real sense, the EU has ruled that Meta's core business model is illegal. Non-personalized ads cannot economically sustain Meta's services, but that is the only solution EU regulators want to accept.

Or con­sider the re­cent charges the EU levied against X. Under Elon Musk’s own­er­ship, any­one can now pur­chase a blue check with a paid sub­scrip­tion, whereas blue checks were pre­vi­ously re­served for no­table fig­ures. EU reg­u­la­tors sin­gled out the new sys­tem for blue checks as a de­cep­tive busi­ness prac­tice that vi­o­lates the bloc’s Digital Services Act.

These charges are absurd. For one, the change in the blue check system was widely advertised and dominated headlines for months, as well as dominating discussion on the site itself. The idea that users have been deceived by one of the loudest and most discussed product changes in the site's history is silly. And beyond that, the EU's position is essentially "X cannot change the meaning of the blue check feature; it is permanently bound to the EU's interpretation of what a blue check should mean." This goes far beyond competition or privacy concerns; this is the EU straightforwardly making product decisions on behalf of a company.

A fi­nal ex­am­ple comes from France, where reg­u­la­tors are prepar­ing to charge Nvidia with anti-com­pet­i­tive prac­tices re­lated to its CUDA soft­ware. CUDA is a free soft­ware sys­tem de­vel­oped by Nvidia to run on its chips that al­lows other pro­grams to more ef­fi­ciently uti­lize GPUs in cal­cu­la­tions. It’s one of the main rea­sons Nvidia has been so suc­cess­ful—the soft­ware makes its chips more pow­er­ful, and no com­peti­tor has de­vel­oped com­pa­ra­ble tech­nol­ogy. It’s ex­actly the kind of in­no­v­a­tive re­search that should be re­warded, but French reg­u­la­tors seem to view Nvidia’s decades-long in­vest­ment in CUDA as a crime.

These examples all share a few key features. They're all actions aimed at successful foreign tech companies, which is not surprising, since the EU's rules all but ensure there are no comparably successful European companies. They're all instances of regulatory overreach, where the EU is trying to dictate product decisions or rule entire business strategies illegal. And crucially, the sizes of the possible fines in play are so large that they may end up scaring companies off the continent.

EU pol­icy al­lows for fines of up to 10 per­cent of global rev­enue. Analyst Ben Thompson re­ports that Meta only gets 10 per­cent of its rev­enue from the EU and Apple only 7 per­cent. Nvidia does not pro­vide ex­act re­gional num­bers, but it’s likely that the EU pro­vides less than 10 per­cent of its rev­enue as well. And this is rev­enue, not profit. A sin­gle fine of that mag­ni­tude would be more profit than these com­pa­nies make in the EU in sev­eral years and de­stroy the eco­nomic ra­tio­nale for op­er­at­ing there. With global-sized pun­ish­ments for inane lo­cal is­sues, Europe is much closer than it re­al­izes to sim­ply dri­ving tech com­pa­nies away.

Europe’s reg­u­la­tors may in­sist that if com­pa­nies sim­ply fol­lowed the rules, they’d be able to make their prof­its with­out the threat of fines. This is patently un­true in the case of Meta, where the EU has ruled out every prac­ti­cal busi­ness strat­egy for fund­ing its op­er­a­tions. But it’s also im­pos­si­ble writ large be­cause the EU of­ten does­n’t write clear rules in ad­vance. Instead, the DMA re­quires busi­nesses to meet ab­stract goals, and reg­u­la­tors de­cide af­ter­ward whether the com­pany is in com­pli­ance or not. The bur­den does not ex­ist on the EU to write con­crete rules with spe­cific re­quire­ments but on the com­pa­nies to read the reg­u­la­tory tea leaves and de­ter­mine what steps to take. It’s an ar­bi­trary and poorly de­signed sys­tem, and com­pa­nies can hardly be blamed for look­ing to the exit.

Ultimately, Europe needs to fig­ure out what it wants from the world’s tech­nol­ogy in­dus­try. At times, it seems as if Europe has given up on try­ing to in­no­vate or suc­ceed in the tech sec­tor. The con­ti­nent takes more pride in be­ing a leader in reg­u­la­tion than a leader in in­no­va­tion, and its tech in­dus­try is a round­ing er­ror com­pared with that in the United States or China.

What few suc­cess sto­ries it has, such as France’s Mistral, risk be­ing stran­gled by reg­u­la­tory ac­tions. How would Mistral, a lead­ing AI firm, sur­vive if Nvidia ex­its the French mar­ket due to reg­u­la­tory con­cerns? There is no sub­sti­tute for Nvidia’s cut­ting-edge chips.

Europeans could end up liv­ing in an on­line back­wa­ter with out-of-date phones, cut off from the rest of the world’s search en­gines and so­cial me­dia sites, un­able to even ac­cess high-per­for­mance com­puter chips.

As a sov­er­eign body, the EU is within its rights to leg­is­late tech as ar­bi­trar­ily and harshly as it would like. But politi­cians such as Vestager don’t get to then act shocked and out­raged when tech com­pa­nies choose to leave. Right now, most tech com­pa­nies are still at­tempt­ing to work within the sys­tem and make Europe’s reg­u­la­tions more ra­tio­nal. But if voice fails over and over, exit is all that’s left. And in Europe, it’s an in­creas­ingly ra­tio­nal choice.


Read the original on foreignpolicy.com »


The New Internet

Avery Pennarun is the CEO and co-founder of Tailscale. A ver­sion of this post was orig­i­nally pre­sented at a com­pany all-hands.

We don’t talk a lot in pub­lic about the big vi­sion for Tailscale, why we’re re­ally here. Usually I pre­fer to fo­cus on what ex­ists right now, and what we’re go­ing to do in the next few months. The fu­ture can be dis­tract­ing.

But in­creas­ingly, I’ve found com­pa­nies are start­ing to buy Tailscale not just for what it does now, but for the big things they ex­pect it’ll do in the fu­ture. They’re right! Let’s look at the biggest of big pic­tures for a change.

But first, let’s go back to where we started.

David Crawshaw's first post that laid out what we were doing, long long ago in the late twenty-teens, was called "Remembering the LAN," about his experience doing networking back in the 1990s.

I have bad news: if you re­mem­ber do­ing LANs back in the 1990s, you are prob­a­bly old. Quite a few of us here at Tailscale re­mem­ber do­ing LANs in the 1990s. That’s an age gap com­pared to a lot of other star­tups. That age gap makes Tailscale un­usual.

Anything un­usual about a startup can be an ad­van­tage or a dis­ad­van­tage, de­pend­ing what you do with it.

Here's another word for "old," but with different connotations.

I’m a per­son that likes look­ing on the bright side. There are dis­ad­van­tages to be­ing old, like I maybe can’t do a 40-hour cod­ing binge like I used to when I wrote my first VPN, called Tunnel Vision, in 1997. But there are ad­van­tages, like maybe we have enough ex­pe­ri­ence to do things right the first time, in fewer hours. Sometimes. If we’re lucky.

And maybe, you know, if you’re old enough, you’ve seen the tech cy­cle go round a few times and you’re start­ing to see a few pat­terns.

That was us, me and the Davids, when we started Tailscale. What we saw was, a lot of things have got­ten bet­ter since the 1990s. Computers are lit­er­ally mil­lions of times faster. 100x as many peo­ple can be pro­gram­mers now be­cause they aren’t stuck with just C++ and as­sem­bly lan­guage, and many, many, many more peo­ple now have some kind of com­puter. Plus app stores, pay­ment sys­tems, graph­ics. All good stuff.

But, also things have got­ten worse. A lot of day-to-day things that used to be easy for de­vel­op­ers, are now hard. That was un­ex­pected. I did­n’t ex­pect that. I ex­pected I’d be out of a job by now be­cause pro­gram­ming would be so easy.

Instead, the tech in­dus­try has evolved into an ab­solute mess. And it’s get­ting worse in­stead of bet­ter! Our tower of com­plex­ity is now so tall that we se­ri­ously con­sider slather­ing LLMs on top to write the in­com­pre­hen­si­ble code in the in­com­pre­hen­si­ble frame­works so we don’t have to.

And you know, we old peo­ple are the ones who have the con­text to see that.

It’s all fix­able. It does­n’t have to be this way.

Before I can tell you a vi­sion for the fu­ture I have to tell you what I think went wrong.

Programmers to­day are im­pa­tient for suc­cess. They start plan­ning for a bil­lion users be­fore they write their first line of code. In fact, nowa­days, we train them to do this with­out even know­ing they’re do­ing it. Everything they’ve ever been taught re­volves around scal­ing.

We’ve been falling into this trap all the way back to when com­puter sci­en­tists started teach­ing big-O no­ta­tion. In big-O no­ta­tion, if you use it wrong, a hash table is sup­pos­edly faster than an ar­ray, for vir­tu­ally any­thing you want to do. But in re­al­ity, that’s not al­ways true. When you have a bil­lion en­tries, maybe a hash table is faster. But when you have 10 en­tries, it al­most never is.

People have a hard time with this idea. They keep pick­ing the al­go­rithms and ar­chi­tec­tures that can scale up, even when if you don’t scale up, a dif­fer­ent thing would be thou­sands of times faster, and also eas­ier to build and run.
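A quick harness (my sketch, not the author's) for testing the claim on your own machine: at 10 entries, a linear scan of an array is in the same league as a hash lookup, whatever big-O says. Absolute numbers vary by runtime and hardware.

```python
import timeit

# Ten key/value entries, stored two ways.
entries = [(k, k * 2) for k in range(10)]
table = dict(entries)

def scan():          # O(n) lookup in a tiny list
    for k, v in entries:
        if k == 7:
            return v

def probe():         # O(1) lookup in a dict
    return table[7]

t_scan = timeit.timeit(scan, number=100_000)
t_probe = timeit.timeit(probe, number=100_000)
print(f"list scan: {t_scan:.4f}s   dict probe: {t_probe:.4f}s")
```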

Even I can barely be­lieve I just said thou­sands of times eas­ier and I was­n’t ex­ag­ger­at­ing.

I read a post re­cently where some­one bragged about us­ing ku­ber­netes to scale all the way up to 500,000 page views per month. But that’s 0.2 re­quests per sec­ond. I could serve that from my phone, on bat­tery power, and it would spend most of its time asleep.

In mod­ern com­put­ing, we tol­er­ate long builds, and then docker builds, and up­load­ing to con­tainer stores, and multi-minute de­ploy times be­fore the pro­gram runs, and even longer times be­fore the log out­put gets up­loaded to some­where you can see it, all be­cause we’ve been tricked into this idea that every­thing has to scale. People get ex­cited about de­ploy­ing to the lat­est up­start con­tainer host­ing ser­vice be­cause it only takes tens of sec­onds to roll out, in­stead of min­utes. But on my slow com­puter in the 1990s, I could run a perl or python pro­gram that started in mil­lisec­onds and served way more than 0.2 re­quests per sec­ond, and printed logs to stderr right away so I could edit-run-de­bug over and over again, mul­ti­ple times per minute.

How did we get here?

We got here be­cause some­times, some­one re­ally does need to write a pro­gram that has to scale to thou­sands or mil­lions of back­ends, so it needs all that… stuff. And wish­ful think­ing makes peo­ple imag­ine even the lowli­est dash­board could be that pop­u­lar one day.

The truth is, most things don’t scale, and never need to. We made Tailscale for those things, so you can spend your time scaling the things that really need it. The long tail of jobs that don’t scale is 90% of what every developer spends their time on. Even developers at companies that make stuff that scales to billions of users spend most of their time on stuff that doesn’t, like dashboards and meme generators.

As an in­dus­try, we’ve spent all our time mak­ing the hard things pos­si­ble, and none of our time mak­ing the easy things easy.

Programmers are all stuck in the mud. Just lis­ten to any pro­fes­sional de­vel­oper, and ask what per­cent­age of their time is spent ac­tu­ally solv­ing the prob­lem they set out to work on, and how much is spent on junky over­head.

It’s true here too. Our de­vel­oper ex­pe­ri­ence at Tailscale is bet­ter than av­er­age. But even we have largely the same ex­pe­ri­ence. Modern soft­ware de­vel­op­ment is mostly junky over­head.

In fact, we did­n’t found Tailscale to be a net­work­ing com­pany. Networking did­n’t come into it much at all at first.

What really happened was, me and the Davids got together and we said, look. The problem is developers keep scaling things they don’t need to scale, and their lives suck as a result. (For most programmers you can imagine the “wiping your tears with a handful of dollar bills” meme here.) We need to fix that. But how?

We looked at a lot of op­tions, and talked to a lot of peo­ple, and there was an un­der­ly­ing cause for all the prob­lems. The Internet. Things used to be sim­ple. Remember the LAN? But then we con­nected our LANs to the Internet, and there’s been more and more fire­walls and at­tack­ers every­where, and things have slowly been de­grad­ing ever since.

When we explore the world of over-complexity, most of it has what we might call “no essential complexity.” That is, the problems can be solved without complexity, but for some reason the solutions we use are complicated anyway. For example, logging systems: they just stream text from one place to another, but somehow it takes 5 minutes to show up. Or orchestration systems: they’re programs whose only job is to run other programs, which Unix kernels have done just fine, within milliseconds, for decades. People layer on piles of goop. But the goop can be removed.

You can’t build mod­ern soft­ware with­out net­work­ing. But the Internet makes every­thing hard. Is it be­cause net­work­ing has es­sen­tial com­plex­ity?

Well, maybe. But maybe it’s only complex when you build it on top of the wrong assumptions, which result in the wrong problems, which you then have to paper over. That’s the Old Internet.

Instead of adding more lay­ers at the very top of the OSI stack to try to hide the prob­lems, Tailscale is build­ing a new OSI layer 3 — a New Internet — on top of new as­sump­tions that avoid the prob­lems in the first place.

If we fix the Internet, a whole chain of domi­noes can come falling down, and we reach the next stage of tech­nol­ogy evo­lu­tion.

If you want to know the bottleneck in any particular economic system, look for who gets to charge rent. In the tech world, that’s AWS. Sure, Apple’s there selling popular laptops, but you could buy a different laptop or a different phone. And Microsoft was the gatekeeper for everything, once, but you don’t have Windows lock-in anymore, unless you choose to. All those “the web is the new operating system” people of the early 2000s finally won, we just forgot to celebrate.

But the lib­er­a­tion did­n’t last long. If you de­ploy soft­ware, you prob­a­bly pay rent to AWS.

Why is that? Compute, right? AWS pro­vides scal­able com­put­ing re­sources.

Well, you’d think so. But lots of peo­ple sell com­put­ing re­sources way cheaper. Even a mid-range Macbook can do 10x or 100x more trans­ac­tions per sec­ond on its SSD than a sup­pos­edly fast cloud lo­cal disk, be­cause cloud providers sell that disk to 10 or 100 peo­ple at once while charg­ing you full price. Why would you pay ex­or­bi­tant fees in­stead of host­ing your mis­sion-crit­i­cal web­site on your su­per fast Macbook?

We all know why:

Location, lo­ca­tion, lo­ca­tion. You pay ex­or­bi­tant rents to cloud providers for their com­put­ing power be­cause your own com­puter is­n’t in the right place to be a de­cent server.

It’s be­hind a fire­wall and a NAT and a dy­namic IP ad­dress and prob­a­bly an asym­met­ric net­work link that drops out just of­ten enough to make you ner­vous.

You could fix the net­work link. You could re­con­fig­ure the fire­wall, and port for­ward through the NAT, I guess, and if you’re lucky you could pay your ISP an ex­or­bi­tant rate for a sta­tic IP, and maybe get a re­dun­dant Internet link, and I know some of my cowork­ers ac­tu­ally did do all that stuff on a rack in their garage. But it’s all a lot of work, and re­quires ex­per­tise, and it’s far away from build­ing the stu­pid dash­board or blog or cat video web­site you wanted to build in the first place. It’s so much eas­ier to just pay a host­ing provider who has all the IP ad­dresses and net­work band­width money can buy.

And then, if you’re go­ing to pay some­one, and you’re a se­ri­ous com­pany, you’d bet­ter buy it from some­one se­ri­ous, be­cause now you have to host your stuff on their equip­ment which means they have ac­cess to… every­thing, so you need to trust them not to mis­use that ac­cess.

You know what, no­body ever got fired for buy­ing AWS.

That’s an IBM anal­ogy. We used to say, no­body ever got fired for buy­ing IBM. I doubt that’s true any­more. Why not?

IBM main­frames still ex­ist, and they prob­a­bly al­ways will, but IBM used to be able to charge rent on every as­pect of busi­ness com­put­ing, and now they can’t. They started los­ing in­flu­ence when Microsoft ar­rived, steal­ing fire from the gods of cen­tral­ized com­put­ing and bring­ing it back to in­di­vid­u­als us­ing com­par­a­tively tiny un­der­pow­ered PCs on every desk, in every home, run­ning Microsoft soft­ware.

I credit Microsoft with build­ing the first wide­spread dis­trib­uted com­put­ing sys­tems, even though all the early net­works were some vari­ant of sneak­er­net.

I think we can agree that we’re now in a post-Mi­crosoft, web-first world. Neat. Is this world a cen­tral­ized one like IBM, or a dis­trib­uted one like Microsoft?

[When I did this as a talk, I took a poll: it was about 50/50]

So, bad news. The pen­du­lum has swung back the other way. IBM was cen­tral­ized, then Microsoft was dis­trib­uted, and now the cloud+phone world is cen­tral­ized again.

We’ve built a gi­ant cen­tral­ized com­puter sys­tem, with a few megaproviders in the mid­dle, and a bunch of dumb ter­mi­nals on our desks and in our pock­ets. The dumb ter­mi­nals, even our smart watches, are all su­per­com­put­ers by the stan­dards of 20 years ago, if we used them that way. But they’re not much bet­ter than a VT100. Turn off AWS, and they’re all bricks.

It’s easy to fool our­selves into think­ing the over­all sys­tem is dis­trib­uted. Yes, we build fancy dis­trib­uted con­sen­sus sys­tems and our servers have mul­ti­ple in­stances. But all that runs cen­trally on cloud providers.

This isn’t new. IBM was doing multiprocessor computing and virtual machines back in the 1960s. It’s the same thing over again now, just with 50 years of Moore’s Law on top. We still have a big monopoly that gets to charge everyone rent because they’re the gatekeeper over the only thing that really matters.

Everyone’s at­ti­tude is still stuck in the 1990s, when op­er­at­ing sys­tems mat­tered. That’s how Microsoft stole the fire from IBM and ruled the world, be­cause writ­ing portable soft­ware was so hard that if you wanted to… in­ter­con­nect… one pro­gram to an­other, if you wanted things to be com­pat­i­ble at all, you had to run them on the same com­puter, which meant you had to stan­dard­ize the op­er­at­ing sys­tem, and that op­er­at­ing sys­tem was DOS, and then Windows.

The web un­did that mo­nop­oly. Now javascript mat­ters more than all the op­er­at­ing sys­tems put to­gether, and there’s a new el­e­ment that con­trols whether two pro­grams can talk to each other: HTTPS. If you can HTTPS from one thing to an­other, you can in­ter­con­nect. If you can’t, for­get it.

And HTTPS is fun­da­men­tally a cen­tral­ized sys­tem. It has a client, and a server. A dumb ter­mi­nal, and a thing that does the work. The server has a sta­tic IP ad­dress, a DNS name, a TLS cer­tifi­cate, and an open port. A client has none of those things. A server can keep do­ing what­ever it wants if all the clients go away, but if the servers go away, a client does noth­ing.

We did­n’t get here on pur­pose, mostly. It was just path de­pen­dence. We had se­cu­rity prob­lems and an IPv4 ad­dress short­age, so we added fire­walls and NATs, so con­nec­tions be­came one way from client ma­chines to server ma­chines, and so there was no point putting cer­tifi­cates on clients, and nowa­days there are 10 dif­fer­ent rea­sons a client can’t be a server, and every­one is used to it, so we de­sign every­thing around it. Dumb ter­mi­nals and cen­tral­ized servers.

Once that hap­pened, of course some com­pany popped up to own the cen­ter of the hub-and-spoke net­work. AWS does that cen­ter bet­ter than every­one else, fair and square. Someone had to. They won.

Okay, fast for­ward. We’ve spent the last 5 years mak­ing Tailscale the so­lu­tion to that prob­lem. Every de­vice gets a cert. Every de­vice gets an IP ad­dress and a DNS name and end-to-end en­cryp­tion and an iden­tity, and safely by­passes fire­walls. Every de­vice can be a peer. And we do it all with­out adding any la­tency or over­head.

That’s the New Internet. We built it! It’s the fu­ture, it’s just un­evenly dis­trib­uted, so far. For peo­ple with Tailscale, we’ve al­ready sliced out 10 lay­ers of non­sense. That’s why de­vel­op­ers re­act so vis­cer­ally once they get it. Tailscale makes the Internet work how you thought the Internet worked, be­fore you learned how the Internet works.

I like to use Taildrop as an ex­am­ple of what that makes pos­si­ble. Taildrop is a lit­tle fea­ture we spent a few months on back when we were tiny. We should spend more time pol­ish­ing to make it even eas­ier to use. But at its core, it’s a demo app. As long as you have Tailscale al­ready, Taildrop is just one HTTP PUT op­er­a­tion. The sender makes an HTTP re­quest to the re­ceiver, says here’s a file named X”, and sends the file. That’s it. It’s the most ob­vi­ous thing in the world. Why would you do it any other way?
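As a flavor of how little is left once connectivity is solved, here is a self-contained toy version of that one-PUT idea in Python’s standard library. This is not Taildrop’s actual code; it assumes the two peers can already reach each other directly (the property Tailscale provides), with loopback standing in for the receiver’s address.

```python
# Toy "send a file with one HTTP PUT" sketch, in the spirit of the
# Taildrop description above (not its real implementation).
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = {}  # filename -> bytes, stands in for writing to disk

class Receiver(BaseHTTPRequestHandler):
    def do_PUT(self):
        name = self.path.lstrip("/")            # "here's a file named X"
        size = int(self.headers["Content-Length"])
        received[name] = self.rfile.read(size)  # ...and here are its bytes
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):               # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Receiver)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The sender: one PUT request, nothing else.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("PUT", "/notes.txt", body=b"hello from the sender")
assert conn.getresponse().status == 200
server.shutdown()
print(received["notes.txt"])  # → b'hello from the sender'
```

The sender names the file in the URL path and streams the bytes; identity, encryption, and reachability are the network’s job, not the app’s.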

Well, be­fore Tailscale, you did­n’t have a choice. The re­ceiver is an­other client de­vice, not a server. So it was be­hind a fire­wall, with no open ports and no iden­tity. Your only op­tion was to up­load the file to the cloud and then down­load it again, even if the sender and re­ceiver are side by side on the same wifi. But that means you pay cloud fees for net­work egress, and stor­age, and the CPU time for run­ning what­ever server pro­gram is man­ag­ing all that stuff. And if you up­load the file and no­body down­loads it, you need a rule for when to delete it from stor­age. And also you pay fees just in case to keep the server on­line, even when you’re not us­ing it at all. Also, cloud em­ploy­ees can the­o­ret­i­cally ac­cess the file un­less you en­crypt it. But you can’t en­crypt it with­out ex­chang­ing en­cryp­tion keys some­how be­tween sender and re­cip­i­ent. And how does the re­ceiver even know a file is there wait­ing to be re­ceived in the first place? Do we need a push no­ti­fi­ca­tion sys­tem? For every client plat­form? And so on. Layers, and lay­ers, and lay­ers of gunk.

And all that gunk means rent to cloud providers. Transferring files — one of the first things peo­ple did on the Internet, for no ex­tra charge, via FTP — now has to cost money, be­cause some­body has got to pay that rent.

With Taildrop, it does­n’t cost money. Not be­cause we’re gen­er­ously drain­ing our bank ac­counts to make file trans­fers free. It’s be­cause the cost over­head is gone al­to­gether, be­cause it’s not built on the same de­volved Internet every­one else has been us­ing.

Taildrop is just an ex­am­ple, a triv­ial one, but it’s an ex­is­tence proof for a whole class of pro­grams that can be 10x eas­ier just be­cause Tailscale ex­ists.

The chain of domi­noes starts with con­nec­tiv­ity. Lack of con­nec­tiv­ity is why we get cen­tral­iza­tion, and cen­tral­iza­tion is why we pay rent for every tiny lit­tle pro­gram we want to run and why every­thing is slow and te­dious and com­pli­cated and hard to de­bug like an IBM batch job. And we’re about to start those domi­noes falling.

The glimpse at these pos­si­bil­i­ties is why our users get ex­cited about Tailscale, more than they’ve ever been ex­cited about some VPN or proxy, be­cause there’s some­thing un­der­neath our kind of VPN that you can’t get any­where else. We’re re­mov­ing lay­ers, and lay­ers, and lay­ers of com­plex­ity, and mak­ing it eas­ier to work on what you wanted to work on in the first place. Not every­body sees it yet, but they will. And when they do, they’re go­ing to be able to in­vent things we could never imag­ine in the old cen­tral­ized world, just like the Windows era of dis­trib­uted com­put­ing made things pos­si­ble that were un­think­able on a main­frame.

But there’s one catch. If we’re go­ing to un­tan­gle the hair­ball of con­nec­tiv­ity, that con­nec­tiv­ity has to ap­ply to…

There’s go­ing to be a new world of haves and have-nots. Where in 1970 you had or did­n’t have a main­frame, and in 1995 you had or did­n’t have the Internet, and to­day you have or don’t have a TLS cert, to­mor­row you’ll have or not have Tailscale. And if you don’t, you won’t be able to run apps that only work in a post-Tailscale world.

And if not enough peo­ple have Tailscale, no­body will build those apps. That’s called a chicken-and-egg prob­lem.

This is why our com­pany strat­egy sounds so odd at first glance. It’s why we spend so much ef­fort giv­ing Tailscale away for free, but also so much ef­fort get­ting peo­ple to bring it to work, and so much ef­fort do­ing tan­gen­tial en­ter­prise fea­tures so ex­ec­u­tives can eas­ily roll it out to whole Fortune 500 com­pa­nies.

The Internet is for every­one. You know, there were in­ter­net­works (lowercase) be­fore the Internet (capitalized). They all lost, be­cause the Internet was the most di­verse and in­clu­sive of all. To the peo­ple build­ing the Internet, noth­ing mat­tered but get­ting every­one con­nected. Adoption was slow at first, then fast, then re­ally fast, and to­day, if I buy a wrist­watch and it does­n’t have an Internet link, it’s bro­ken.

We won’t have built a New Internet if nerds at home can’t play with it. Or nerds at uni­ver­si­ties. Or em­ploy­ees at en­ter­prises. Or, you know, even­tu­ally every per­son every­where.

There re­main a lot of steps be­tween here and there. But, let’s save those de­tails for an­other time. Meanwhile, how are we do­ing?

Well, about 1 in 20,000 peo­ple in the world uses the New Internet (that’s Tailscale). We’re not go­ing to stop un­til it’s all of them.

I’m old enough to re­mem­ber when peo­ple made fun of Microsoft for their thing about putting a com­puter on every desk. Or when TCP/IP was an op­tional add-on you had to buy from a third party.

You know, all that was less than 30 years ago. I’m old, but come to think of it, I’m not that old. The tech world changes fast. It can change for the bet­ter. We’re just get­ting started.

...

Read the original on tailscale.com »

4 242 shares, 15 trendiness

Courts close the loophole letting the feds search your phone at the border

The Fourth Amendment still ap­plies at the bor­der, de­spite the feds’ in­sis­tence that it does­n’t.

For years, courts have ruled that the gov­ern­ment has the right to con­duct rou­tine, war­rant­less searches for con­tra­band at the bor­der. Customs and Border Protection (CBP) has taken ad­van­tage of that loop­hole in the Fourth Amendment’s pro­tec­tion against un­rea­son­able searches and seizures to force trav­el­ers to hand over data from their phones and lap­tops.

But on Wednesday, Judge Nina Morrison in the Eastern District of New York ruled that cellphone searches are a “nonroutine” search, more akin to a strip search than scanning a suitcase or passing a traveler through a metal detector.

“Although the interests of stopping contraband are undoubtedly served when the government searches the luggage or pockets of a person crossing the border carrying objects that can only be introduced to this country by being physically moved across its borders, the extent to which those interests are served when the government searches data stored on a person’s cell phone is far less clear,” the judge declared.

Morrison noted that “reviewing the information in a person’s cell phone is the best approximation government officials have for mindreading,” so searching through cellphone data has an even heavier privacy impact than rummaging through physical possessions. Therefore, the court ruled, a cellphone search at the border requires both probable cause and a warrant. Morrison did not distinguish between scanning a phone’s contents with special software and manually flipping through it.

And in a victory for journalists, the judge specifically acknowledged the First Amendment implications of cellphone searches too. She cited reporting by The Intercept and VICE about CBP searching journalists’ cellphones based on these journalists’ “ongoing coverage of politically sensitive issues” and warned that those phone searches could put confidential sources at risk.

Wednesday’s ruling adds to a stream of cases restricting the feds’ ability to search travelers’ electronics. The 4th and 9th Circuits, which cover the mid-Atlantic and Western states, have ruled that border police need at least “reasonable suspicion” of a crime to search cellphones. Last year, a judge in the Southern District of New York also ruled that “the government may not copy and search an American citizen’s cell phone at the border without a warrant absent exigent circumstances.”

Wednesday’s ruling involves defending the rights of an unsympathetic character. U.S. citizen Kurbonali Sultanov allegedly downloaded a sketchy Russian porn trove, including several images of child sex abuse, which landed him on a government watch list. When Sultanov was on the way back from visiting his family in Uzbekistan, agents from the Department of Homeland Security pulled him aside at the airport and searched his phone, finding the images.

Morrison suppressed the evidence from the phone search but not Sultanov’s “spontaneous” statement admitting to downloading the videos. And her order would not have prevented the police from getting Sultanov’s phone the old-fashioned way. Sultanov had allegedly downloaded the porn while in the United States, and his name popped up on the watch list two months before his return flight. And, in fact, the feds did obtain a court order to search Sultanov’s spare phone.

The Southern District of New York ruling last year also involved an unsympathetic character. Jatiek Smith, a member of the Bloods gang, was being investigated for a “violent and extortionate takeover” of New York’s fire mitigation industry. When Smith flew home from a vacation in Jamaica, the FBI took advantage of the opportunity to search Smith’s phone at the border.

A judge sup­pressed the ev­i­dence from the phone search, but Smith was con­victed any­way. In both cases, the feds could have got­ten a war­rant for the sus­pects’ phones; they saw the bor­der loop­hole as a way to skip that step.

In fact, CBP Officer Marves Pichardo admitted that these searches are often warrantless fishing expeditions. CBP searches U.S. citizens’ phones if they’re coming from “countries that have political difficulties at this point in time and that we’re currently looking at for intelligence and stuff like that,” Pichardo testified during an evidence suppression hearing. He asserted that CBP agents can look at “pretty much anything that’s stored on the phone” and that passengers are usually “very compliant.”

Because of the powers the government was claiming, civil libertarians intervened in the Sultanov case. The Knight First Amendment Institute at Columbia University and the Reporters Committee for Freedom of the Press filed an amicus brief in October 2023 arguing that warrantless phone searches are “a grave threat to the Fourth Amendment right to privacy as well as the First Amendment freedoms of the press, speech, and association.” Morrison heavily cited that brief in her ruling.

“As the court recognized, letting border agents freely rifle through journalists’ work product and communications whenever they cross the border would pose an intolerable risk to press freedom,” Grayson Clary, staff attorney at the Reporters Committee for Freedom of the Press, said in a statement sent to reporters. “This thorough opinion provides powerful guidance for other courts grappling with this issue, and makes clear that the Constitution would require a warrant before searching a reporter’s electronic devices.”

...

Read the original on reason.com »

5 229 shares, 6 trendiness

Stripe acquires Lemon Squeezy • Lemon Squeezy

Woohoo! We’re ex­cited and hum­bled to an­nounce that @stripe has ac­quired @lmsqueezy.

In 2020, when the world gave us lemons, we de­cided to make lemon­ade. We imag­ined a world where sell­ing dig­i­tal prod­ucts would be as sim­ple as open­ing a lemon­ade stand. We dreamed of a plat­form that would take the pain out of sell­ing glob­ally.

Tax headaches, fraud pre­ven­tion, han­dling charge­backs, li­cense key man­age­ment, and file de­liv­ery, among other things, are com­pli­cated.

We be­lieved it should be sim­ple.

We be­lieved it should be easy-peasy.

As founders, we’ve spent a decade sell­ing dig­i­tal prod­ucts, and so we cre­ated a so­lu­tion that met our own needs. But what started as an idea to solve the day-to-day prob­lems of sell­ing dig­i­tal prod­ucts evolved into some­thing much big­ger. Nine months af­ter our pub­lic launch in 2021, we sur­passed $1M in ARR and never looked back.

We worked tire­lessly through grow­ing pains while also cel­e­brat­ing ma­jor mile­stones along the way. Each step re­in­forced that we were onto some­thing re­mark­able.

Along the way, we re­ceived many ac­qui­si­tion of­fers and (Series A) term sheets from in­vestors. But de­spite the al­lure of these op­por­tu­ni­ties, we knew that what we had built was truly spe­cial and needed the right part­ner to take it to the next level.

We’re proud to say that we’ve found that part­ner in Stripe and have gone from idea to ac­qui­si­tion in un­der three years.

Stripe con­tin­ues to set the bar in the pay­ments in­dus­try with its world-class de­vel­oper ex­pe­ri­ence, API stan­dards, and ded­i­ca­tion to beauty and craft. It’s no se­cret that we (like many) have al­ways ad­mired Stripe.

When we be­gan dis­cus­sions about a po­ten­tial ac­qui­si­tion, it was im­me­di­ately ap­par­ent that our val­ues and mis­sion were per­fectly aligned.

Lemon Squeezy and Stripe share a deep love for our cus­tomers and a com­mit­ment to mak­ing sell­ing ef­fort­less.

Now imag­ine com­bin­ing every­thing you love about Lemon Squeezy and Stripe — we be­lieve it’s a match made in heaven.

Lemon Squeezy is now packed with 1,000% more juice.

Lemon Squeezy has been pro­cess­ing pay­ments on Stripe since our in­cep­tion. This ac­qui­si­tion marks the cul­mi­na­tion of years of ef­fort and cel­e­brates our close part­ner­ship with Stripe and our shared sense of pur­pose.

Going for­ward, our mis­sion re­mains the same: make sell­ing dig­i­tal prod­ucts easy-peasy.

With Stripe’s help, we’ll continue to improve the merchant of record offering, bolster billing support, build an even more intuitive customer experience, and more.

We’re in­cred­i­bly ex­cited about the pos­si­bil­i­ties that lie ahead with the Lemon Squeezy and Stripe teams join­ing forces. The fu­ture is bright.

Rest as­sured, we’ll con­tinue de­liv­er­ing the same fan­tas­tic prod­uct and re­li­a­bil­ity you’ve come to trust. We’ll be in touch as we work through this process with any up­dates as they come along. We’re ex­cited about find­ing the best ways to com­bine Lemon Squeezy and Stripe.

At Lemon Squeezy, you (our won­der­ful cus­tomers) are at the heart of every­thing we do. We pride our­selves on cre­at­ing in­tu­itive, cus­tomer-fo­cused prod­ucts backed by top-notch cus­tomer ser­vice.

We re­main as com­mit­ted as ever.

Over the years, our com­mu­nity has grown ex­po­nen­tially. This growth is a tes­ta­ment to the trust and sup­port you’ve shown us, and we could­n’t be more grate­ful.

We owe a huge thank you to our team, com­mu­nity, and sup­port­ers. Thousands of com­pa­nies con­tinue to choose to sell glob­ally through Lemon Squeezy, and we’ll never take that for granted.

Thank you for be­ing part of our jour­ney. We look for­ward to all the fan­tas­tic things we will achieve to­gether with Stripe.

...

Read the original on www.lemonsqueezy.com »

6 227 shares, 11 trendiness

How a 30 Year Old Idea Allows for New Tricks

When I recently interviewed Mike Clark, he told me, “…you’ll see the actual foundational lift play out in the future on Zen 6, even though it was really Zen 5 that set the table for that.” And at that same Zen 5 architecture event, AMD’s Chief Technology Officer Mark Papermaster said, “Zen 5 is a ground-up redesign of the Zen architecture,” which has brought numerous and impactful changes to the design of the core.

The most sub­stan­tial of these changes may well be the brand-new 2-Ahead Branch Predictor Unit, an ar­chi­tec­tural en­hance­ment with roots in pa­pers from three decades ago. But be­fore div­ing into this both old yet new idea, let’s briefly re­visit what branch pre­dic­tors do and why they’re so crit­i­cal in mod­ern mi­cro­proces­sor cores.

Ever since com­put­ers be­gan op­er­at­ing on pro­grams stored in pro­gram­ma­ble, ran­domly ac­ces­si­ble mem­ory, ar­chi­tec­tures have been split into a front end that fetches in­struc­tions and a back end re­spon­si­ble for per­form­ing those op­er­a­tions. A front end must also sup­port ar­bi­trar­ily mov­ing the point of cur­rent pro­gram ex­e­cu­tion to al­low ba­sic func­tion­al­ity like con­di­tional eval­u­a­tion, loop­ing, and sub­rou­tines.

If a processor could simply perform the entire task of fetching an instruction, executing it, and selecting the next instruction location in unison, there would be little else to discuss here. However, incessant demands for performance have dictated that processors perform more operations in the same unit time with the same amount of circuitry, taking us from 5 kHz with ENIAC to the 5+ GHz of some contemporary CPUs like Zen 5, and this has necessitated pipelined logic. A processor must actually maintain, in parallel, the incrementally completed partial states of operations that are logically sequential.

Keeping this pipeline filled is im­me­di­ately chal­lenged by the ex­is­tence of con­di­tional jump­ing within a pro­gram. How can the front end know what in­struc­tions to be­gin fetch­ing, de­cod­ing, and dis­patch­ing when a jump’s con­di­tion might be a sub­stan­tial num­ber of clock cy­cles away from fin­ish­ing eval­u­a­tion? Even un­con­di­tional jumps with a sta­t­i­cally known tar­get ad­dress pre­sent a prob­lem when fetch­ing and de­cod­ing an in­struc­tion needs more than a sin­gle pipeline stage.

The two possible responses to this problem are to either simply wait when the need is detected or to make a best-effort guess at what to do next and be able to unwind discovered mistakes. Unwinding bad guesses must be done by flushing the pipeline of work contingent on the bad guess and restarting at the last known good point. A stall taken on a branch condition is effectively unmitigable and proportional in size to the number of stages between the instruction fetch and the completion of the branch condition evaluation in the pipeline. Given this, and the competitive pressure not to waste throughput, processors have little choice but to attempt guessing program instruction sequences as accurately as possible.
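The cost of guessing wrong is easy to put numbers on. The sketch below uses hypothetical but plausible figures (one branch per six instructions, a 15-cycle flush, an ideal CPI of 0.25) to show how a few percent of mispredictions eats a large share of a wide core’s throughput.

```python
# Back-of-the-envelope mispredict cost. All numbers are illustrative:
# the flush penalty is roughly the number of pipeline stages between
# fetch and branch resolution, as described above.
branch_freq = 1 / 6    # roughly one branch per six instructions
flush_penalty = 15     # cycles lost per mispredict (hypothetical depth)
base_cpi = 0.25        # ideal cycles-per-instruction of a wide core

for accuracy in (0.90, 0.99):
    stall = branch_freq * (1 - accuracy) * flush_penalty
    print(f"{accuracy:.0%} accuracy -> effective CPI {base_cpi + stall:.3f}")
# 90% accuracy -> effective CPI 0.500 (half of all cycles are flushes)
# 99% accuracy -> effective CPI 0.275
```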

Imagine for a mo­ment that you are a de­liv­ery dri­ver with­out a map or GPS who must lis­ten to on-the-fly nav­i­ga­tion from col­leagues in the back of the truck. Now fur­ther imag­ine that your win­dows are com­pletely blacked out and that your bud­dies only tell you when you were sup­posed to turn 45 sec­onds past the in­ter­sec­tion you could­n’t even see. You can start to em­pathize and be­gin to un­der­stand the strug­gles of the in­struc­tion fetcher in a pipelined proces­sor. The art of branch pre­dic­tion is the uni­verse of strate­gies that are avail­able to re­duce the rate that this woe­fully af­flicted dri­ver has to stop and back up.

Naive strate­gies like al­ways tak­ing short back­wards jumps (turning on to a cir­cu­lar drive) can and his­tor­i­cally did pro­vide sub­stan­tial ben­e­fit over al­ways fetch­ing the next largest in­struc­tion mem­ory ad­dress (just keep dri­ving straight). However, if some small amount of state is al­lowed to be main­tained, much bet­ter re­sults in real pro­grams can be achieved. If the blinded truck anal­ogy has­n’t worn too thin yet, imag­ine the dri­ver keep­ing a small set of notes of re­cent turns taken or skipped and hand-drawn scrib­bles of how roads dri­ven in the last few min­utes were arranged and what in­ter­sec­tions were passed. These are equiv­a­lent to things like branch his­tory and ad­dress records, and struc­tures in the 10s of kilo­bytes have yielded branch pre­dic­tion per­cent­ages in the up­per 90s. This ar­ti­cle will not at­tempt to cover the enor­mous space of re­search and com­mer­cial so­lu­tions here, but un­der­stand­ing at least the be­gin­nings of the mo­ti­va­tions here is valu­able.
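To make “a small amount of state” concrete, here is the textbook two-bit saturating-counter predictor from the academic literature (a teaching example, not what any modern core actually ships): a tiny table indexed by branch address, where each entry must be wrong twice in a row before it changes its mind.

```python
# Classic 2-bit saturating-counter branch predictor (textbook scheme).
class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.table = [2] * entries      # counters 0..3; >= 2 means "taken"
        self.mask = entries - 1

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop branch taken 9 times then not taken once, repeated: the
# counter locks on and misses only the loop-exit iteration.
p = TwoBitPredictor()
hits = total = 0
for _ in range(100):
    for i in range(10):
        taken = i < 9
        hits += (p.predict(0x400ABC) == taken)
        p.update(0x400ABC, taken)
        total += 1
print(f"accuracy: {hits/total:.0%}")  # → accuracy: 90%
```

Real predictors layer history registers, tagged tables, and more on top of this idea, but the principle is the same: a few bits of remembered behavior per branch turn most guesses into correct ones.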

The 2-Ahead Branch Predictor is a proposal that dates back to the early 90s. Even back then, the challenge of scaling out architectural widths to 8 or more was being discussed, and a 2-Ahead Branch Predictor was one of the methods academia put forth to continue squeezing more and more performance out of a single core.

But as commercial vendors moved from single-core CPUs to multi-core CPUs, the size of each individual core became a bigger and bigger factor in CPU core design, so academia started focusing on more area-efficient methods of increasing performance, the biggest development being the TAGE predictor. Because the TAGE predictor is much more area efficient than older branch prediction methods, academia then concentrated on improving TAGE predictors.

But with newer logic nodes allowing more and more transistors in a similar area, and with the move from dual- and quad-core CPUs to CPUs with hundreds of out-of-order cores, the industry has started to focus more and more on single-core performance rather than just scaling out to ever more cores. So while some of these ideas are quite old, older than I am, in fact, they are starting to resurface as companies try to figure out ways to increase the performance of a single core.

It is worth ad­dress­ing an as­pect of x86 that al­lows it to ben­e­fit dis­pro­por­tion­ately more from 2-ahead branch pre­dic­tion than some other ISAs might. Architectures with fixed-length in­struc­tions, like 64-bit Arm, can triv­ially de­code ar­bi­trary sub­sets of an in­struc­tion cache line in par­al­lel by sim­ply repli­cat­ing de­coder logic and slic­ing up the in­put data along guar­an­teed in­struc­tion byte bound­aries. On the far op­po­site end of the spec­trum sits x86, which re­quires pars­ing in­struc­tion bytes lin­early to de­ter­mine where each sub­se­quent in­struc­tion bound­ary lies. Pipelining (usually par­tially de­cod­ing length-de­ter­min­ing pre­fixes first) makes a par­al­leliza­tion of some de­gree tractable, if not cheap, which re­sulted in 4-wide de­cod­ing be­ing com­mon­place in per­for­mance-ori­ented x86 cores for nu­mer­ous years.

While in­creas­ing logic den­sity with newer fab nodes has even­tu­ally made so­lu­tions like Golden Cove’s 6-wide de­cod­ing com­mer­cially vi­able, the area and power costs of mono­lithic par­al­lel x86 de­cod­ing are most def­i­nitely su­per-lin­ear with width, and there is not any­thing re­sem­bling an easy path for­ward with con­tin­ued ex­pan­sions here. It is per­haps mer­ci­ful for Intel and AMD that typ­i­cal ap­pli­ca­tion in­te­ger code has a sub­stan­tial branch den­sity, on the or­der of one every five to six in­struc­tions, which di­min­ishes the mo­ti­va­tion to pur­sue par­al­lelized de­coders much wider than that.

The escape valve that x86 front ends need more than anything is one for the inherently non-parallelizable portion of decoding: determining the instruction boundaries. If only there were some way to easily skip ahead in the decoding and be magically guaranteed you landed exactly on an instruction boundary…
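To see why boundary determination resists parallelization, consider a toy model (not real x86 encoding) where each instruction's first byte gives its own length. Each boundary depends on having decoded the previous instruction, so the scan is inherently serial, unlike a fixed-length ISA where all boundaries are known up front:

```python
def find_boundaries(code):
    # Toy encoding: the first byte of each instruction is its total
    # length in bytes (assumed >= 1). Each boundary depends on the
    # previous one, so this loop cannot be trivially parallelized.
    boundaries = []
    pc = 0
    while pc < len(code):
        boundaries.append(pc)
        pc += code[pc]
    return boundaries
```

A 2-ahead predictor is, in effect, a side channel that hands the front end future values of `pc` without waiting for this serial walk.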

Start with the paper titled “Multiple-block ahead branch predictors” by Seznec et al., which lays out the reasoning and the implementation needed to make a 2-Ahead Branch Predictor.

Looking into the paper, you’ll see that implementing a front end that can deal with multiple taken branches per cycle is not as simple as it sounds. To be able to use a 2-Ahead Branch Predictor to its fullest without exploding area requirements, Seznec et al. recommended dual-porting the instruction fetch.

When we look at Zen 5, dual-porting the instruction fetch and the op cache is exactly what AMD has done. AMD now has two 32-byte-per-cycle fetch pipes from the 32KB L1 instruction cache, each feeding its own 4-wide decode cluster. The Op Cache is now a dual-ported 6-wide design that can feed up to 12 ops to the Op Queue.

Now, Seznec et al. also recommend dual-porting the Branch Target Buffer (BTB). A dual-ported L1 BTB could explain the massive 16K entries that the L1 BTB has access to. As for the L2 BTB, it’s not quite as big as the L1 BTB at only 8K entries, but AMD is using it in a manner similar to a victim cache: entries that get evicted out of the L1 BTB end up in the L2 BTB.

With all these changes, Zen 5 can now deal with 2 taken branches per cy­cle across a non-con­tigu­ous block of in­struc­tions.

This should reduce the hit to fetch bandwidth when Zen 5 hits a taken branch, as well as allow AMD to predict past the 2 taken branches.

Zen 5 can look farther forward in the instruction stream, beyond the 2nd taken branch, and as a result can have 3 prediction windows where all 3 are useful in producing instructions for decoding. The way this works is that a 5-bit length field is attached to the 2nd prediction window, which prevents oversubscription of the decode or op cache resources. This 5-bit length field, while smaller than a pointer, still gives you the start of the 3rd prediction window. One benefit is that if the 3rd window crosses a cache line boundary, the prediction lookup index doesn’t need to store extra state for the next cycle. A drawback is that if the 3rd prediction window lands in the same cache line as the 1st or 2nd, that partial 3rd window isn’t as effective as a full 3rd prediction window.

Now, when Zen 5 has two threads active, the decode clusters and the accompanying fetch pipes are statically partitioned. This means that to act like a dual-fetch core, Zen 5 has to fetch out of both the L1 instruction cache and the Op Cache. This may be why AMD dual-ported the op cache: to better ensure that the dual fetch pipeline stays fed.

In the end, this new 2-Ahead Branch Predictor is a major shift for the Zen family of CPU architectures, and it brings branch prediction capabilities that will likely stand future developments of the Zen core in good stead as AMD refines and improves this predictor.

If you like our ar­ti­cles and jour­nal­ism, and you want to sup­port us in our en­deav­ors, then con­sider head­ing over to our Pa­treon or our Pay­Pal if you want to toss a few bucks our way. If you would like to talk with the Chips and Cheese staff and the peo­ple be­hind the scenes, then con­sider join­ing our Dis­cord.

If you want to learn more about how multiple-fetch processors work, then I would highly recommend the papers below, as they helped with my understanding of how this whole system works:

...

Read the original on chipsandcheese.com »

7 218 shares, 11 trendiness

Why does the chromaticity diagram look like that?


I’ve always wanted to understand color theory, so I started reading about the XYZ color space, which looked like it was the mother of all color spaces. I had no idea what that meant, but it was created in 1931, so studying 93-year-old research seemed like a good place to start.

When read­ing about the XYZ color space, this cursed im­age keeps pop­ping up:

I say “cursed” because I have no idea what that means. What the heck is that shape??

I could­n’t find any rea­son­ably clear an­swer to my ques­tion. It’s ob­vi­ously not a for­mula like x = func(y). Why is it that shape, and where did the col­ors come from? Obviously the edges are wave­lengths which have a spe­cific color, but how did the im­age above com­pute every pixel?

I be­came ob­sessed with this ques­tion. Below is the path I took to try to an­swer it.

I’ll spoil the an­swer but it might not make sense un­til you read this ar­ti­cle: the shape comes from how our eyes per­ceive red, green, and blue rel­a­tive to each other. Skip to the last sec­tion if you want to see some di­rect ex­am­ples.

The fill colors inside the shape are another story, but a simple explanation is that there is some math to calculate the mixture of colors, and we can draw the above by sampling millions of points in the space and rendering them onto the 2d image.

The first place to start is color matching functions. These functions determine how strongly specific wavelengths (colors) must contribute so that our eyes perceive a target wavelength (color). We have 3 color matching functions for red, green, and blue (at wavelengths 700, 546, and 435 nm respectively), and these functions specify how to mix RGB so that we visually see a spectral color.

More sim­ply put: imag­ine that you have red, green, and blue light sources. What is the in­ten­sity of each one so that the re­sult­ing light matches a spe­cific color on the spec­trum?

Note that these are spec­tral col­ors: mono­chro­matic light with a sin­gle wave­length. Think of col­ors on the rain­bow. Many col­ors are not spec­tral, and are a mix of many spec­tral col­ors.

The CIE 1931 color space de­fines these RGB color match­ing func­tions. The red, green, and blue lines rep­re­sent the in­ten­sity of each RGB light source:

Note: this plot uses the table from the original study. This raw data must not be widely used anymore, because I couldn’t find it anywhere; I had to extract it myself from an appendix in the original report.

Given a wavelength on the X axis, you can see how to “mix” the RGB wavelengths to produce the target color.

How did they come up with these? They sci­en­tif­i­cally stud­ied how our eyes mix RGB col­ors by sit­ting peo­ple down in a room with mul­ti­ple light sources. One light source was the tar­get color, and the other side had red, green, and blue light sources. People had to ad­just the strength of the RGB sources un­til it matched the tar­get color. They lit­er­ally had peo­ple man­u­ally ad­just lights and recorded the val­ues! There’s a great ar­ti­cle that ex­plains the ex­per­i­ments in more de­tail.

There’s a big prob­lem with the above func­tions. Can you see it? What do you think a neg­a­tive red light source means?

It’s non­sense! That means with this model, given pure RGB lights, there are cer­tain spec­tral col­ors that are im­pos­si­ble to recre­ate. However, this data is still in­cred­i­bly use­ful and we can trans­form it into some­thing mean­ing­ful.

Introducing the XYZ color match­ing func­tions. The XYZ color space is sim­ply the RGB color space, but mul­ti­plied with a ma­trix to trans­form it a bit. The im­por­tant part is this is a lin­ear trans­form: it’s lit­er­ally the same thing, just re­shaped a lit­tle.

I found a raw table for the XYZ color match­ing func­tions here and this is what it looks like. The CIE 1931 XYZ color match­ing func­tions:

Wikipedia defines the XYZ-to-RGB matrix transform as this:

matrix = [
     2.364613, -0.89654,  -0.468073,
    -0.515166,  1.426408,  0.088758,
     0.005203, -0.014408,  1.009204
]

[R, G, B] = matrix * [X, Y, Z]

We can take the XYZ table and trans­form it with the above ma­trix, and do­ing so pro­duces this graph. Look fa­mil­iar? This is ex­actly what the RGB graph above looks like (plotted di­rectly from the data table)!

Wikipedia also documents an analytical approximation of this data, which means we can use mathematical functions to generate the data instead of using tables. Press “view source” to see the algorithm:
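One such analytic fit is a sum of piecewise Gaussians. The sketch below uses the multi-lobe coefficients commonly reproduced alongside the Wikipedia article; treat the exact numbers as approximate rather than authoritative:

```python
import math

def g(x, mu, s1, s2):
    # Piecewise Gaussian: different widths left and right of the peak.
    s = s1 if x < mu else s2
    t = (x - mu) / s
    return math.exp(-0.5 * t * t)

def xbar(l):
    # Approximate CIE 1931 x-bar(lambda), lambda in nm.
    return (1.056 * g(l, 599.8, 37.9, 31.0)
            + 0.362 * g(l, 442.0, 16.0, 26.7)
            - 0.065 * g(l, 501.1, 20.4, 26.2))

def ybar(l):
    # Approximate y-bar(lambda); peaks near 555 nm, the eye's
    # peak luminance sensitivity.
    return 0.821 * g(l, 568.8, 46.9, 40.5) + 0.286 * g(l, 530.9, 16.3, 31.1)

def zbar(l):
    # Approximate z-bar(lambda); concentrated in the blue end.
    return 1.217 * g(l, 437.0, 11.8, 36.0) + 0.681 * g(l, 459.0, 26.0, 13.8)
```

Sampling these three functions over 400-720 nm reproduces the XYZ curves plotted above closely enough for rendering experiments.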

Ok, so we have these color match­ing func­tions. When dis­play­ing these col­ors with RGB lights though, we can’t even show all of the spec­tral col­ors. Transforming it into XYZ space, where every­thing is pos­i­tive, fixes the num­bers but what’s the point if we still can’t phys­i­cally show them?

The XYZ space de­scribes all col­ors, even col­ors that are im­pos­si­ble to dis­play. It’s be­come a stan­dard space to en­code col­ors in a de­vice-in­de­pen­dent way, and it’s up to a spe­cific de­vice to in­ter­pret them into a space that it can phys­i­cally pro­duce. This is nice be­cause we have a stan­dard way to en­code color in­for­ma­tion with­out re­strict­ing the pos­si­bil­i­ties of the fu­ture — as de­vices be­come bet­ter at dis­play­ing more and more col­ors, they can au­to­mat­i­cally start dis­play­ing them with­out re­quir­ing any in­fra­struc­ture changes.

Now let’s get back to that cursed shape. That’s actually a chromaticity diagram, which is “an objective specification of the quality of a color regardless of its luminance”.

We can derive the chromaticity for a color by taking its XYZ values and dividing each by the total:

const x = X / (X + Y + Z)

const y = Y / (X + Y + Z)

const z = Z / (X + Y + Z) = 1 - x - y

We don’t ac­tu­ally need z be­cause we can de­rive it given x and y. Hence we have the xy chro­matic­ity di­a­gram”. Remember how I said it’s a 3d curve pro­jected onto a 2d space? We’ve done that by just drop­ping z.

If we want to go back to XYZ from xy, we need the Y value. This is called the xyY color space and is an­other way to en­code col­ors.
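Both directions can be sketched as a pair of helpers; going back up to XYZ just inverts the normalization using the supplied Y:

```python
def xyz_to_xy(X, Y, Z):
    # Project XYZ down to chromaticity by normalizing; z is implied.
    s = X + Y + Z
    return X / s, Y / s

def xyy_to_xyz(x, y, Y):
    # Reconstruct XYZ from chromaticity (x, y) plus luminance Y.
    X = x * Y / y
    Z = (1 - x - y) * Y / y
    return X, Y, Z
```

The round trip loses nothing as long as you keep Y, which is exactly what the xyY color space does.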

Alright, let’s try this out. Let’s take the RGB table we ren­dered above, and plot the chro­matic­ity. We do this by us­ing the above func­tions, and plot­ting the x and y points (the col­ors are a ba­sic es­ti­ma­tion):

Hey! Look at that! That looks fa­mil­iar. Why is it so slanted though? If you look at the x axis, it ac­tu­ally goes into neg­a­tive! That’s be­cause the RGB data is rep­re­sent­ing im­pos­si­ble col­ors.

Let’s use an RGB-to-XYZ matrix to transform it into XYZ space (the opposite of what we did before, where we transformed XYZ into RGB). If we render the same data but transformed, it looks like this:

Now that’s look­ing re­ally fa­mil­iar!

Just to dou­ble-check, let’s ren­der the chro­matic­ity of the XYZ table data. Note that we have more gran­u­lar data here, so there are more points, but it matches:

Ok, so what about colors? How do we fill the middle part with all the colors? Note: this is where I really start to get out of my league, but here’s my best attempt.

What if we it­er­ate over every sin­gle pixel in the can­vas and try to plot a color for it? The ques­tion is given x and y, how do we get a color?

Here are some steps:

We scale each x and y point in the can­vas to a value be­tween 0 and 1

Remember above I said we need the Y value to trans­form back into XYZ space? Turns out that the XYZ space in­ten­tion­ally made Y map to the lu­mi­nance value of a color, so that means we can… make it up?

What if we just try to use a lu­mi­nance value of 1?

That lets us gen­er­ate XYZ val­ues, which we then trans­late into sRGB space (don’t worry about the s there, it’s just RGB space with some gamma cor­rec­tion)

One immediate problem you hit is that this produces many invalid colors. We also want to experiment with different values of Y. The demo below has controls to customize its behavior: change Y from 0 to 1, and hide colors with elements below 0 or above 255.

That’s neat! We’re get­ting some­where, and are ob­vi­ously con­strained by the RGB space. By de­fault, it clips col­ors with neg­a­tive val­ues and that pro­duces this tri­an­gle. Feels like the dots are start­ing to con­nect: the above im­age is clearly show­ing con­nec­tions be­tween XYZ/RGB and lim­i­ta­tions of rep­re­sentable col­ors.

Even more interesting is if you turn on “clip colors max”. You only see a small slice of color, and you need to move the Y slider to morph the shape to “fill” the triangle. Almost like we’re moving through 3d space.

For each point, there must be a dif­fer­ent Y value that is the most op­ti­mal rep­re­sen­ta­tion of that color. For ex­am­ple, blues are rich when Y is low, but greens are only rich when Y is higher.

I’m still con­fused how to fill that space within the chro­matic­ity di­a­gram, so let’s take a break.

Let’s create a spectrum. Take the original color matching functions. Since they tell us the XYZ values needed to create a spectral color, shouldn’t we be able to iterate over the wavelengths of visible colors (400-720 nm), get the XYZ values for each one, convert them to RGB, and render a spectrum?

This looks pretty bad, but why? I found a nice article about rendering spectra, which seems like another deep hole. My problem doesn’t require that kind of accuracy, though; the above isn’t even remotely close.

Turns out I needed to convert XYZ to sRGB, because that’s what the rgb() color function assumes when rendering to canvas. The main difference is gamma correction, which is another topic.
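For reference, here is a sketch of the standard linear-to-sRGB transfer function that performs that gamma step, applied per channel to linear values in [0, 1]:

```python
def linear_to_srgb(c):
    # Standard sRGB transfer function: a linear segment near black,
    # then a 1/2.4 power curve scaled and offset to join smoothly.
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055
```

Skipping this step leaves everything too dark in the midtones, which is exactly the washed-out result described above.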

We’ve learned that sRGB can only ren­der a sub­set of all col­ors, and turns out there are other color spaces we can use to tell browsers to ren­der more col­ors. The p3 wide gamut color space is larger than sRGB, and many browsers and dis­plays sup­port it now, so let’s test it.

You specify this color space by using the color function in CSS, for example: color(display-p3 r g b). I ran into the same problem where the colors were all wrong, which was surprising because everything I read implied it was linear. Turns out the p3 color space in browsers has the same gamma correction as sRGB, so I needed to include that to get it to work:

If you are see­ing this on a wide gamut com­pat­i­ble browser and dis­play, you will see more in­tense col­ors. I love that this is a thing, and the idea that so many users are us­ing apps that could be more richly dis­played if they sup­ported p3.

I started hav­ing an ex­is­ten­tial cri­sis around this point. What are my eyes ac­tu­ally see­ing? How do dis­plays… ac­tu­ally work? Looking at the wide gamut spec­trum above, what hap­pens if I take a screen­shot of it in ma­cOS and send it to a user us­ing a dis­play that does­n’t sup­port p3?

To test this I started a zoom chat with a friend and shared my screen and showed them the wide gamut spec­trum and asked if they could see a dif­fer­ence (the top and bot­tom should look dif­fer­ent). Turns out they could! I have no idea if ma­cOS, zoom, or some­thing else is trans­lat­ing it into sRGB (thus downgrading” the col­ors) or ac­tu­ally trans­mit­ting p3. (Also, PNG sup­ports p3, but what do mon­i­tors that don’t sup­port it do?)

The sheer com­plex­ity of ab­strac­tions be­tween my eyes and pix­els is over­whelm­ing. There are so many lay­ers which han­dle read­ing and writ­ing the in­di­vid­ual pix­els on my screen, and mak­ing it all work across zoom chats, screen­shots, and every­thing is mak­ing my mind melt.

A little question: why does printing use the CMY color system with the primaries of cyan, magenta, and yellow, while digital displays build pixels with the primaries of red, green, and blue? If cyan, magenta, and yellow allow a wider range of colors via mixing, why is RGB better digitally? Answer: because RGB is an additive color system and CMY is a subtractive color system. Materials absorb light, while digital displays emit light.

We’re not giv­ing up on fig­ur­ing out the col­ors of the chro­matic­ity di­a­gram yet.

I found this in­cred­i­ble ar­ti­cle about how to pop­u­late chro­matic­ity di­a­grams. I still have no idea if this is how the orig­i­nal ones were gen­er­ated. After all, the col­ors shown are just an ap­prox­i­ma­tion (your screen can’t ac­tu­ally dis­play the true col­ors near the edges), so maybe there’s some other kind of for­mula.

So that I can get back to my daily life and be present with my family, I’m accepting that this is how those images are generated. Let’s try to do it ourselves.

There’s no way to go from an x, y point in the canvas to a color. There’s no formula that tells us if it’s a valid point in the space or how to approximate a color for it.

We need to do the opposite: start with a value in the XYZ color space, compute an approximate color, and plot it at the right point by converting it into xy space. But how do we even find valid XYZ values? Not all points inside that space (between 0 and 1 on all three axes) are valid. To do that we have to take another step back.

I got this technique from the incredible article linked above. What we’re trying to do is render all colors in existence. Obviously we can’t actually do that, so we need an approximation. Here’s the approach we’ll take:

First, we need to gen­er­ate an ar­bi­trary color. The only way to do this is to gen­er­ate a spec­tral line shape. Basically it’s a line across all wave­lengths (the X axis) that de­fines how much each wave­length con­tributes to the color.

To get the xy coordinate on the canvas, we need the XYZ values for the color. To do that, we multiply the XYZ color matching functions with the spectral line, and then take the integral of each result to get the final XYZ values.

We do the same for the RGB color. We multiply the RGB color matching functions with the spectral line and take the integral of each one for the final RGB color. (We’ll talk about the colors more later.)
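Numerically, those integrals are just sums over sampled wavelengths. A toy sketch, using placeholder arrays rather than real CIE data:

```python
def integrate_xyz(spd, cmf_x, cmf_y, cmf_z, dl):
    # spd: spectral power at each sampled wavelength
    # cmf_*: color matching function sampled at the same wavelengths
    # dl: wavelength step in nm
    X = sum(p * v for p, v in zip(spd, cmf_x)) * dl
    Y = sum(p * v for p, v in zip(spd, cmf_y)) * dl
    Z = sum(p * v for p, v in zip(spd, cmf_z)) * dl
    return X, Y, Z

def chromaticity(X, Y, Z):
    # Project onto the xy chromaticity plane.
    s = X + Y + Z
    return X / s, Y / s
```

Swapping in the RGB color matching functions for cmf_x/y/z gives the displayed color for the same spectral line.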

I don’t know if that made any sense, but here’s a demo which might help. The graph in the bot­tom left is the spec­tral line we are gen­er­at­ing. This rep­re­sents a spe­cific color, which is shown in the top left. Finally, on the right we plot the color on the chro­matic­ity di­a­gram by sum­ming up the area of the spec­tral line mul­ti­plied by the XYZ color match­ing func­tions.

We generated the spectral line graph with two simple sine curves, each with a specific width and offset. You can change the offset of each curve with the sliders below. Moving those curves generates a different spectral line (and thus a different color), which plots different points on the diagram.

By ad­just­ing the slid­ers, you are ba­si­cally paint­ing the chro­matic­ity di­a­gram!

You can see how all of this works by pressing “view source” to see the code.

Obviously this is a very poor rep­re­sen­ta­tion of the chro­matic­ity di­a­gram. It’s dif­fi­cult to cover the whole area; ad­just­ing the off­set of the curves only al­lows you to walk through a sub­set of the en­tire space. We would need to change how we are gen­er­at­ing spec­tral lines to fully walk through the space.

Here’s a demo which at­tempts to au­to­mate this. It’s us­ing the same code as above, ex­cept it’s chang­ing both off­set and width of the curves and walk­ing through the space bet­ter:

I cre­ated an iso­lated code­pen if you want to play with this your­self. If you let this run for a while, you’ll end up with a shape like this:

We’re still not walk­ing through the full space, but it’s not bad! It at least… vaguely re­sem­bles the orig­i­nal di­a­gram?

Our col­or­ing is­n’t quite right. It’s miss­ing the white spot in the mid­dle and it’s too dark in cer­tain places. Let me ex­plain a lit­tle more how we gen­er­ated these col­ors.

After all, didn’t we generate RGB colors? If so, why weren’t they clipped and showing a triangle like before? Or at least we should see more “maxing out” of colors near the edges.

My first at­tempts at the above did show this. Here’s a pic­ture where I only took the in­te­gral to find the XYZ val­ues, and then took those val­ues and used XYZ_to_sRGB to trans­form them into RGB col­ors:

We do get more of the bright white spot in the mid­dle, but the col­ors are far too sat­u­rated. It’s clear that many of these col­ors are ac­tu­ally in­valid (they are not in be­tween 0 and 255).

Another technique I learned from the incredible article is to avoid using the XYZ points to find the color, and instead do the same integration over the RGB color matching functions. So we take our spectral line, multiply it by each of the RGB functions, and then take the sum of each result to find the individual RGB values.

Even though this still pro­duces in­valid col­ors, in­tu­itively I can see how it more di­rectly maps onto the RGB space and pro­vides a bet­ter in­ter­po­la­tion.

That’s about as far as I got. I wish I had a bet­ter an­swer for how to gen­er­ate the col­ors here, and maybe you know? If so, give me a shout! I’m sat­is­fied with how far I got, and I bet the fi­nal an­swer uses slightly dif­fer­ent color match­ing func­tions or some­thing, but it does­n’t feel far off.

If you have ideas to im­prove this, please do so in this demo! I’d love to see any im­prove­ments.

I want to drive home that my above im­ple­men­ta­tion is still gen­er­at­ing in­valid col­ors. For ex­am­ple, if I add clip­ping and avoid ren­der­ing any col­ors with el­e­ments out­side of the 0-255 range, I get the fa­mil­iar sRGB tri­an­gle:

It turns out that even though col­ors out­side the tri­an­gle aren’t ren­der­ing ac­cu­rately, we’re still able to rep­re­sent a change of color be­cause only 1 or 2 of the RGB chan­nels have maxed out. If green maxes out, changes in the red and blue chan­nels will still show up.

But re­ally, why that spe­cific shape? I know it de­rives from how we per­ceive red, green, and blue rel­a­tive to each other. Let’s look at the XYZ color match­ing func­tions again:

The shape is derived from these curves. To render chromaticity, you walk through each wavelength above and calculate each XYZ value’s percentage of the total. So there’s a direct relationship.

Let’s drive this home by gen­er­at­ing our own ran­dom color match­ing func­tions. We gen­er­ate them with some sim­ple sine waves (view source to see the code):

Now let’s ren­der the chro­matic­ity ac­cord­ing to our non­sen­si­cal color match­ing func­tions:

The shape is very dif­fer­ent! So that’s it: the shape is due to the XYZ color match­ing func­tions, which were de­rived from ex­per­i­ments that stud­ied how our eyes per­ceive red, green, and blue light. That’s why the chro­matic­ity di­a­gram rep­re­sents some­thing mean­ing­ful: it’s how our eyes per­ceive color.

Looking for old ar­ti­cles? See archive.jlong­ster.com

...

Read the original on jlongster.com »

8 210 shares, 1 trendiness

Effortless networking for your next great connection.

...

Read the original on www.moreoverlap.com »

9 203 shares, 13 trendiness

Reverse-engineering my speakers' API to get reasonable volume control

I got some fancy new speak­ers last week.

They’re pow­ered speak­ers and they have stream­ing ser­vice in­te­gra­tions built in, un­like the 35-year-old pas­sive speak­ers I’m up­grad­ing from. Overall they’re great! But they’re so loud that it’s dif­fi­cult to make small vol­ume ad­just­ments within the range of safe vol­ume lev­els for my apart­ment.

To solve that, I’m building a custom volume knob for them that will give me more precise control within the range I like to listen in.

The speak­ers sound great, but they’re way louder than I need. I typ­i­cally use about 10% of the vol­ume range they’re ca­pa­ble of.

That makes it difficult to set the volume levels I prefer using the methods most convenient for me, which are either the regular volume controls on my phone or computer if I’m using AirPlay, or the volume slider in Spotify if I’m using Spotify Connect. Those methods either give me a tiny slider that I can only use 10% of, or about 15 steps where the jump from step 3 to step 4 takes the speakers from “a bit too quiet” to “definitely bothering the neighbors” levels.

The amp that I used to use was over­pow­ered for my room too, but that was­n’t an is­sue for me be­cause those vol­ume con­trol meth­ods at­ten­u­ated the out­put of a mu­sic streamer that I had con­nected to the amp. With that sys­tem, I could set the am­pli­fier’s ana­log vol­ume knob such that the max vol­ume out of the streamer cor­re­sponded to my ac­tual max­i­mum pre­ferred lis­ten­ing vol­ume, giv­ing me ac­cess to the full range of Spotify or AirPlay’s vol­ume con­trols.

Some pow­ered speak­ers solve this is­sue by pro­vid­ing con­trol over the max vol­ume, ei­ther by a phys­i­cal knob or by a soft­ware set­ting, but un­for­tu­nately these JBLs do not.

While think­ing about this prob­lem, I re­mem­bered that some other net­work-con­nected au­dio de­vices I’ve en­coun­tered ex­pose un­doc­u­mented web in­ter­faces. I was cu­ri­ous if these speak­ers did, so I found their lo­cal IP ad­dress via my router and nav­i­gated to that IP in my browser.

Lo and be­hold, they do have one!

Sadly, the vol­ume slider there was still not as con­ve­nient as I’d like.

After ex­plor­ing that web in­ter­face for a few min­utes with my browser’s net­work dev tools, I found that the speak­ers ex­pose a pretty straight­for­ward HTTP API, in­clud­ing GET /api/getData and POST /api/setData, which al­low me to read and write the cur­rent vol­ume level, among other things.
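As an illustration of driving such an API from a script, here is a hedged sketch using only the endpoint names above. The payload shape, a settings path plus a typed value, is an assumption borrowed from the similar KEF API rather than documented JBL behavior, and the IP address is hypothetical:

```python
import json
import urllib.request

SPEAKER = "http://192.168.1.50"  # hypothetical local IP of the speakers

def build_set_volume(volume):
    # Assumed payload shape (modeled on the KEF-style API):
    # a settings path plus a typed value.
    return {
        "path": "player:volume",
        "role": "value",
        "value": {"type": "i32_", "i32_": volume},
    }

def set_volume(volume):
    # POST the payload to the setData endpoint found in dev tools.
    req = urllib.request.Request(
        SPEAKER + "/api/setData",
        data=json.dumps(build_set_volume(volume)).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

A matching GET to /api/getData with the same path would read the current volume back.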

I tried to find some documentation of this API online, but the closest I could find was the source code for a Homebridge plugin for KEF speakers. It seems like KEF’s and JBL’s network-connected speakers share some code, which isn’t too surprising given that they’re both owned by Harman (which is apparently owned by Samsung as of 2017!).

After that, I found that the speak­ers’ web in­ter­face has a page that al­lows me to down­load sys­tem logs, which turned out to in­clude a copy of the part of the filesys­tem that stores the cur­rent set­tings!

That helped me track down two spe­cific con­fig­u­ra­tion paths that looked promis­ing: player/​at­ten­u­a­tion and hostlink/​maxVol­ume.

Sadly, nei­ther of those turned out to be what I was look­ing for.

player/​at­ten­u­a­tion turned out to be an­other in­ter­face to the main vol­ume, more-or-less an alias of player:vol­ume.

hostlink/maxVolume sounded like it could be exactly what I was hoping to find. Unfortunately, changing it doesn’t seem to affect anything that I’ve noticed, and the API response implies that it has to do with Arcam (yet another Samsung/Harman subsidiary):

If I could­n’t set a max vol­ume in­side the speaker, I could at least build my­self a cus­tom slider that only cov­ers the range of vol­umes I’m in­ter­ested in lis­ten­ing at.

To do that, I put to­gether a lit­tle web page with noth­ing but a full-width slider for set­ting the vol­ume, and I fi­nally have a way to choose rea­son­able lev­els!

I tried to do that in a sin­gle HTML file, but ran into CORS is­sues when send­ing re­quests to the speak­ers, so I put to­gether a tiny server us­ing Bun. With that, I was able to keep it down to a sin­gle TypeScript file with no de­pen­den­cies other than Bun it­self:

The web server here is pretty small. It just serves the page with the slider and for­wards re­quests to the speak­ers.

I’m using that tagged template just to get nicer syntax highlighting in my editor for the embedded HTML.

This works al­right for now, but what I re­ally want is a phys­i­cal vol­ume knob that I can place wher­ever it’s con­ve­nient in my apart­ment.

In the next post in this se­ries, I’ll talk about build­ing that, prob­a­bly us­ing some­thing like an ESP32 board with a ro­tary en­coder, a nice en­clo­sure and a nice feel­ing knob, maybe with some kind of hap­tic feed­back for the stepped vol­ume changes?

I haven’t ac­tu­ally worked with those com­po­nents be­fore, and it’s been a while since I last worked on a hard­ware elec­tron­ics pro­ject, but I’m ex­cited to!

...

Read the original on jamesbvaughan.com »

10 201 shares, 6 trendiness

Scaling One Million Checkboxes to 650,000,000 checks

On June 26th 2024 I launched a web­site called One Million Checkboxes (OMCB). It had one mil­lion global check­boxes on it - check­ing a box checked it for every­one on the site, im­me­di­ately.

I built the site in 2 days. I thought I’d get a few hun­dred users, max. That is not what hap­pened.

Instead, within hours of launch­ing, tens of thou­sands of users checked mil­lions of boxes. They piled in from Hacker News, /r/InternetIsBeautiful, Mastodon and Twitter. A few days later OMCB ap­peared in the Washington Post and the New York Times.

Here’s what ac­tiv­ity looked like on the first day (I launched at 11:30 AM EST).

I don’t have logs for checked boxes from the first few hours be­cause I orig­i­nally only kept the lat­est 1 mil­lion logs for a given day(!)

I was­n’t pre­pared for this level of ac­tiv­ity. The site crashed a lot. But by day 2 I started to sta­bi­lize things and peo­ple checked over 50 mil­lion boxes. We passed 650 mil­lion be­fore I sun­set the site 2 weeks later.

Let’s talk about how I kept the site (mostly) on­line!

Here’s the gist of the orig­i­nal ar­chi­tec­ture:

Our check­box state is just one mil­lion bits (125KB). A bit is 1” if the cor­re­spond­ing check­box is checked and 0” oth­er­wise.

Clients store the bits in a bit­set (an ar­ray of bytes that makes it easy to store, ac­cess, and flip raw bits) and ref­er­ence that bit­set when ren­der­ing check­boxes. Clients tell the server when they check a box; the server flips the rel­e­vant bit and broad­casts that fact to all con­nected clients.
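A minimal sketch of that bitset logic (in Python for brevity; the actual client is JavaScript): one million checkboxes fit in 125,000 bytes, and reading or flipping a bit is simple shifting and masking:

```python
class BitSet:
    def __init__(self, n):
        # One byte holds 8 checkbox bits; round up to cover n bits.
        self.bits = bytearray((n + 7) // 8)

    def get(self, i):
        # Select the byte (i >> 3) and the bit within it (i & 7).
        return (self.bits[i >> 3] >> (i & 7)) & 1

    def flip(self, i):
        # XOR toggles the bit: checked <-> unchecked.
        self.bits[i >> 3] ^= 1 << (i & 7)
```

The server keeps the same million bits in Redis; both sides agree on the index-to-bit mapping.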

To avoid throw­ing a mil­lion el­e­ments into the DOM, clients only ren­der the check­boxes in view (plus a small buffer) us­ing re­act-win­dow.

I could have done this with a single process, but I wanted an architecture that I could scale (and an excuse to use Redis for the first time in years). So the actual server setup looked like this:

Clients hit ng­inx for sta­tic con­tent, and then make a GET for the bit­set state and a web­socket con­nec­tion (for up­dates); ng­inx (acting as a re­verse proxy) for­wards those re­quests to one of two Flask servers (run via gu­ni­corn).

State is stored in Redis, which has good prim­i­tives for flip­ping in­di­vid­ual bits. Clients tell Flask when they check a box; Flask up­dates the bits in Redis and writes an event to a pub­sub (message queue). Both Flask servers read from that pub­sub and no­tify con­nected clients when check­boxes are checked/​unchecked.

We need the pubsub because we've got two Flask instances; a Flask instance can't just broadcast "box 2 was checked" to its own clients.
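The two Redis primitives doing the heavy lifting here - single-bit writes and pub/sub fan-out - are simple enough to model in a few lines. This in-memory stand-in sketches their behavior (the real site used actual Redis; note that Redis's SETBIT numbers bits most-significant-first within each byte):

```python
class MiniRedis:
    """Tiny in-memory stand-in for SETBIT and PUBLISH/SUBSCRIBE."""

    def __init__(self):
        self.keys = {}
        self.subscribers = {}

    def setbit(self, key, offset, value):
        """Set one bit and return its previous value, like Redis SETBIT."""
        buf = self.keys.setdefault(key, bytearray(125_000))
        byte, bit = offset // 8, 7 - (offset % 8)  # Redis is MSB-first
        old = (buf[byte] >> bit) & 1
        if value:
            buf[byte] |= 1 << bit
        else:
            buf[byte] &= ~(1 << bit)
        return old

    def publish(self, channel, message):
        """Deliver a message to every subscriber on a channel."""
        for callback in self.subscribers.get(channel, []):
            callback(message)

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)
```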

Finally, the Flask servers do sim­ple rate-lim­it­ing (on re­quests per ses­sion and new ses­sions per IP - fool­ishly stored in Redis!) and reg­u­larly send full state snap­shots to con­nected clients (in case a client missed an up­date be­cause, say, the tab was back­grounded).
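A fixed-window limiter along those lines is only a few lines of code. This sketch uses a plain dict where production kept the counters in Redis; the limit and window numbers are made up:

```python
import time

def allow(store, key, limit=60, window=10, now=None):
    """Fixed-window rate limit: at most `limit` hits per `window` seconds.

    `store` is a dict here; in production the counters lived in Redis,
    which is why storing them there per-IP added load under pressure.
    """
    now = time.time() if now is None else now
    bucket = (key, int(now // window))  # one counter per key per window
    store[bucket] = store.get(bucket, 0) + 1
    return store[bucket] <= limit
```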

This code is­n’t great! It’s not even async. I haven’t shipped pro­duc­tion Python in like 8 years! But I was fine with that. I did­n’t think the pro­ject would be pop­u­lar. This was good enough.

I changed a lot of OMCB but the ba­sic ar­chi­tec­ture - ng­inx re­verse proxy, API work­ers, Redis for state and mes­sage queues - re­mained.

Before I talk about what changed, let’s look at the prin­ci­ples I had in mind while scal­ing.

I needed to be able to math out an up­per bound on my costs. I aimed to let things break when they broke my ex­pec­ta­tions in­stead of go­ing server­less and scal­ing into bank­ruptcy.

I as­sumed the site’s pop­u­lar­ity was fleet­ing. I took on tech­ni­cal debt and aimed for ok so­lu­tions that I could hack out in hours over great so­lu­tions that would take me days or weeks.

I’m used to run­ning my own servers. I like to log into boxes and run com­mands. I tried to only add de­pen­den­cies that I could run and de­bug on my own.

I op­ti­mized for fun, not money. Scaling the site my way was fun. So was say­ing no to ad­ver­tis­ers.

The magic of the site was jump­ing any­where and see­ing im­me­di­ate changes. So I did­n’t want to scale by, for ex­am­ple, send­ing clients a view of only the check­boxes they were look­ing at.

Within 30 min­utes of launch, ac­tiv­ity looked like this:

The site was still up, but I knew it would­n’t tol­er­ate the load for much longer.

The most ob­vi­ous im­prove­ment was more servers. Fortunately this was easy - ng­inx could eas­ily re­verse-proxy to Flask in­stances on an­other VM, and my state was al­ready in Redis. I started spin­ning up more boxes.

I spun up the second server around 12:30 PM. Load immediately hit 100%.

I orig­i­nally as­sumed an­other server or two would be suf­fi­cient. Instead traf­fic grew as I scaled. I hit #1 on Hacker News; ac­tiv­ity on my tweet sky­rock­eted. I looked for big­ger op­ti­miza­tions.

My Flask servers were strug­gling. Redis was run­ning out of con­nec­tions (did you no­tice I was­n’t us­ing a con­nec­tion pool?). My best idea was to batch up­dates - I hacked some­thing in that looked like this:
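(The original snippet isn't reproduced here; the following is a hedged Python sketch of update batching along those lines - names and the flush interval are illustrative:)

```python
import threading

class UpdateBatcher:
    """Collect individual checkbox flips and send them to clients in one
    message every `interval` seconds, instead of one broadcast per click."""

    def __init__(self, broadcast, interval=0.5):
        self.broadcast = broadcast  # e.g. a socketio emit function
        self.interval = interval
        self.lock = threading.Lock()
        self.pending = {}  # idx -> latest checked state wins

    def add(self, idx, checked):
        with self.lock:
            self.pending[idx] = checked

    def flush(self):
        with self.lock:
            batch, self.pending = self.pending, {}
        if batch:
            self.broadcast({"updates": sorted(batch.items())})

    def run_forever(self):
        """Background loop: flush pending updates on a fixed cadence."""
        while True:
            threading.Event().wait(self.interval)
            self.flush()
```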

I did­n’t bother with back­wards com­pat­i­bil­ity. I fig­ured folks were used to the site break­ing and would just re­fresh.

I also added a con­nec­tion pool. This def­i­nitely did not play nicely with gu­ni­corn and Flask, but it did seem to re­duce the num­ber of con­nec­tions to Redis.

I also beefed up my Redis box - easy to do since I was us­ing Digital Ocean’s man­aged Redis - from a tiny (1 shared CPU; 2 GB RAM) in­stance to a box with 4 ded­i­cated CPUs and 32 GB of RAM (I did this af­ter Redis mys­te­ri­ously went down). The re­siz­ing took about 30 min­utes; the server came back up.

And then things got trick­ier.

At around 4:30 PM I ac­cepted it: I had plans. I had spent June at a camp at ITP - a school at NYU. And the night of the 26th was our fi­nal show. I had signed up to dis­play a face-con­trolled Pacman game and in­vited some friends - I had to go!

I brought an iPad and put OMCB on it. I spun up servers while my friend Uri and my girl­friend Emma kindly stepped in to ex­plain what I was do­ing to strangers when they came by my booth.

I had no au­toma­tion for spin­ning up servers (oops) so my nam­ing con­ven­tions evolved as I worked.

My servers. I ended up with 8 worker VMs

I got home from the show around mid­night. I was tired. But there was still more work to do, like:

* Reducing the num­ber of Flask processes on each box (I orig­i­nally had more work­ers than the num­ber of cores on a box; this did­n’t work well)

* Increasing the batch size of my up­dates - I found that dou­bling the batch size sub­stan­tially re­duced load. I tried dou­bling it again. This ap­peared to help even more. I don’t know how to pick a prin­ci­pled num­ber here.

I pushed the up­dates. I was feel­ing good! And then I got a text from my friend Greg Technology.

I re­al­ized I had­n’t thought hard enough about band­width. Digital Ocean’s band­width pric­ing is pretty sane ($0.01/GB af­ter a pretty gen­er­ous per-server com­pound­ing free al­lowance). I had a TB of free band­width from past work and (pre-launch) did­n’t think OMCB would make a dent.

I did back-of-the-envelope math. I send state snapshots (1 million bits, i.e. 125 KB) every 30 seconds. With 1,000 clients that's already 250 MB a minute! Or 15 GB an hour. And we're probably gonna have more clients than that. And we haven't even started to think about updates.
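Spelled out:

```python
snapshot_bytes = 1_000_000 // 8   # one full snapshot: 125,000 bytes (125 KB)
clients = 1_000
snapshots_per_min = 60 // 30      # each client gets one snapshot per 30 s

bytes_per_min = clients * snapshot_bytes * snapshots_per_min
print(bytes_per_min)              # 250,000,000 bytes/min, so ~15 GB/hour
```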

It was 2 AM. I was very tired. I did some bad math - maybe I con­fused GB/hour with GB/minute? - and freaked out. I thought I was al­ready on the hook for thou­sands of dol­lars!

So I did a cou­ple of things:

* Frantically texted Greg, who helped me re­al­ize that my math was way off.

* Ran ip -s link show dev eth0 on my ng­inx box to see how many bytes I had sent, con­firm­ing that my math was way off.

* Started think­ing about how to re­duce band­width - and how to cap my costs.

I im­me­di­ately re­duced the fre­quency of my state snap­shots, and then (with some help from Greg) pared down the size of the in­cre­men­tal up­dates I sent to clients.

I moved from stuff­ing a bunch of dicts into a list to send­ing two ar­rays of in­dices with true and false im­plied. This was five times shorter than my orig­i­nal im­ple­men­ta­tion!
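Here's a sketch of the two wire formats side by side (field names are illustrative, not the actual protocol):

```python
import json

flips = [(17, True), (123, False), (9_999, True)]

# Before: one dict per flip
v1 = json.dumps([{"index": i, "checked": c} for i, c in flips])

# After: two arrays of indices, with true/false implied by which array
# an index lands in
v2 = json.dumps({
    "true":  [i for i, c in flips if c],
    "false": [i for i, c in flips if not c],
})

print(len(v1), len(v2))  # the array form is far shorter per update
```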

And then I used lin­ux’s tc util­ity to slam a hard cap on the amount of data I could send per sec­ond. tc is fa­mously hard to use, so I wrote my con­fig­u­ra­tion script with Claude’s help.
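The script itself boils down to something like this (a sketch using tc's token bucket filter; the burst and latency values are guesses, not the actual configuration):

```shell
# Cap outbound traffic on eth0 at 250 Mbit/s with a token bucket filter.
# "replace" swaps out any existing root qdisc on the interface.
tc qdisc replace dev eth0 root tbf rate 250mbit burst 256kb latency 400ms
```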

This just lim­its traf­fic flow­ing over eth0 (my pub­lic in­ter­face) to 250Mbit a sec­ond. That’s a lot of band­width - ~2GB/min, or just un­der 3 TB a day. But it let me rea­son about my costs, and at $0.01/GB I knew I would­n’t go bank­rupt overnight.

At around 3:30 AM I got in bed.

My server was pegged at my 250 Mb/s limit for much of the night. I orig­i­nally thought I was lucky to add lim­its when I did; I now re­al­ize some­one prob­a­bly saw my tweet about re­duc­ing band­width and tried to give me a huge bill.

Blue is traf­fic from my work­ers to ng­inx, pur­ple is ng­inx out to the world. The tim­ing is sus­pi­cious

I woke up a few hours later. The site was down. I had­n’t been val­i­dat­ing in­put prop­erly.

The site did­n’t pre­vent folks from check­ing boxes above 1 mil­lion. Someone had checked boxes in the hun­dred mil­lion range! This let them push the count of checked boxes to 1 mil­lion, trick­ing the site into think­ing things were over.

Redis had also added mil­lions of 0s (between bit one mil­lion and bit one hun­dred mil­lion), which 100x’d the data I was send­ing to clients.

This was em­bar­rass­ing - I’m new to build­ing for the web but like…I know you should val­i­date your in­puts! But it was a quick fix. I stopped ng­inx, copied the first mil­lion bits of my old bit­set to a new trun­cated bit­set (I wanted to keep the old one for de­bug­ging), taught my code to ref­er­ence the new bit­set, and added proper val­i­da­tion.

Not too bad! I brought the site back up.

The site was slow. The num­ber of checked boxes per hour quickly ex­ceeded the day 1 peak.

The biggest prob­lem was the ini­tial page load. This made sense - we had to hit Redis, which was un­der a lot of load (and we were mak­ing too many con­nec­tions to it due to bugs in my con­nec­tion pool­ing).

I was tired and did­n’t feel equipped to de­bug my con­nec­tion pool is­sues. So I em­braced the short term and spun up a Redis replica to take load off the pri­mary and spread my con­nec­tions out.

But there was a prob­lem - af­ter spin­ning up the replica, I could­n’t find its pri­vate IP!

I got my Redis instance's private IP by prepending "private-" to its DNS entry

To connect to my primary, I used a DNS record - there were records for its public and private IPs. Digital Ocean told me to prepend "replica-" to those records to get my replica's IP. This worked for the public one, but didn't exist for the private DNS record! And I really wanted the private IP.

I thought send­ing traf­fic to a pub­lic IP would risk tra­vers­ing the pub­lic in­ter­net, which would mean be­ing billed for way more band­width.

Since I couldn't figure out how to find the replica's private IP in an official way (I'm sure you can! Tell me how!), I took a different approach and started making connections to private IPs close to the IPs of my Redis primary and my other servers. This worked on the third or fourth try.

Then I hard­coded that IP as my replica IP!

My Flask processes kept crash­ing, re­quir­ing me to babysit the site. The crashes seemed to be from run­ning out of Redis con­nec­tions. I’m winc­ing as I type this now, but I still did­n’t want to de­bug what was go­ing on there - it was late and the prob­lem was fuzzy.

So I wrote a script that looked at the num­ber of run­ning Flask processes and bounced my sys­temd unit if too many were down.
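A sketch of what that babysitting script can look like - the process pattern, worker count, and unit name here are guesses, not the real ones:

```python
import subprocess

EXPECTED_WORKERS = 8  # gunicorn workers per box (illustrative)

def live_worker_count(pattern="gunicorn"):
    """Count running processes matching `pattern` via pgrep."""
    out = subprocess.run(["pgrep", "-cf", pattern],
                         capture_output=True, text=True)
    return int(out.stdout.strip() or 0)

def needs_restart(live, expected=EXPECTED_WORKERS):
    """Bounce the service once more than half the workers have died."""
    return live < expected // 2

# Cron would run something along these lines every minute:
#   if needs_restart(live_worker_count()):
#       subprocess.run(["systemctl", "restart", "omcb.service"])
```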

I threw that into the crontab on my boxes and up­dated my ng­inx con­fig to briefly take servers out of ro­ta­tion if they were down (I should have done this sooner!). This ap­peared to work pretty well. The site sta­bi­lized.

At around 12:30 AM I posted some stats on Twitter and got ready to go to bed. And then a user re­ported an is­sue:

To keep client check­box state syn­chro­nized, I did two things:

* Sent clients in­cre­men­tal up­dates when check­boxes were checked or unchecked

* Sent clients oc­ca­sional full-state snap­shots in case they missed an up­date

These up­dates did­n’t have time­stamps. A client could re­ceive a new full-state snap­shot and then ap­ply an old in­cre­men­tal up­date - re­sult­ing in them hav­ing a to­tally wrong view of the world un­til the next full-state snap­shot.

I was em­bar­rassed by this - I’ve writ­ten a whole lot of state ma­chine code and know bet­ter. It was al­most 1 AM and I had barely slept the night be­fore; it was a strug­gle to write code that I (ironically) thought I could write in my sleep. But I:

* Timestamped each up­date writ­ten to my Redis pub­sub

* Added the max time­stamp of each in­cre­men­tal up­date in the batches I sent to clients

* Taught clients to drop up­date batches if their time­stamp was be­hind the time­stamp of the last full-state snap­shot

This is­n’t per­fect (clients can ap­ply a batch of mostly-stale up­dates as long as one up­date is new) but it’s sub­stan­tially bet­ter.
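The three steps above boil down to a client-side check like this (a sketch; the field names are illustrative):

```python
def apply_batch(bitset, batch, last_snapshot_ts):
    """Apply an incremental update batch, unless it predates the most
    recent full-state snapshot - in that case, drop it as stale."""
    if batch["max_ts"] <= last_snapshot_ts:
        return False  # the snapshot already reflects these updates
    for idx in batch["true"]:
        bitset[idx] = 1
    for idx in batch["false"]:
        bitset[idx] = 0
    return True
```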

me to claude, 1 AM

I ran my changes by Claude be­fore ship­ping to prod. Claude’s sug­ges­tions weren’t ac­tu­ally su­per help­ful, but talk­ing through why they were wrong gave me more con­fi­dence.

I woke up the next morn­ing and the site was still up! Hackily restart­ing your servers is great. This was great tim­ing - the site was at­tract­ing more main­stream me­dia at­ten­tion (I woke up to an email from the Washington Post).

I moved my at­ten­tion from keep­ing the site up to think­ing about how to wind it down. I was still con­fi­dent folks would­n’t be in­ter­ested in the site for­ever, and I wanted to pro­vide a real end­ing be­fore every­one moved on.

I came up with a plan - I’d make checked boxes freeze if they weren’t unchecked quickly. I was­n’t sure that my cur­rent setup could han­dle this - it might re­sult in a spike of ac­tiv­ity plus I’d be ask­ing my servers to do more work.

So (after tak­ing a break for a day) I got brunch with my friend Eliot - a su­per tal­ented per­for­mance en­gi­neer - and asked if he was down to give me a hand. He was, and from around 2 PM to 2 AM on Sunday we dis­cussed im­ple­men­ta­tions of my sun­set­ting plan and then rewrote the whole back­end in go!

The go rewrite was straightforward; we ported without many changes. Lots of our sticking points were things like "finding a go socketio library that supports the latest version of the protocol."

Things were actually so much faster that we ended up needing to add better rate-limiting; we scaled so well that bots were able to push absurd amounts of traffic through the site.

The site was DDOS’d on Sunday night, but ad­dress­ing this was pretty sim­ple - I just threw the site be­hind CloudFlare and up­dated my ng­inx con­figs a bit.

The site was rock-solid af­ter the go rewrite. I spent the next week do­ing in­ter­views, en­joy­ing the at­ten­tion, and try­ing to re­lax.

And then I got to work on sun­set­ting. Checked boxes would freeze if they weren’t unchecked quickly, which would even­tu­ally leave the site to­tally frozen. The ar­chi­tec­ture here ended up be­ing pretty sim­ple - mostly some more state in Redis:

I added a hashtable that tracked the last time each box was checked (this would be too much state to pass to clients, but was fine to keep in Redis), along with a "time to freeze" value. When someone tried to uncheck a box, we'd first check whether now - last_checked > time_to_freeze - if so, we wouldn't uncheck the box and would instead update frozen_bitset to note that the relevant checkbox is now frozen.

I distributed frozen_bitset state to clients the same way that I distributed which boxes were checked, and taught clients to disable a checkbox if it was in the frozen bitset. And I added a job to periodically search for bits that should be frozen (but weren't yet because nobody had tried to uncheck them) and freeze those.

Redis made it soooo easy to avoid race con­di­tions with this im­ple­men­ta­tion - I put all the rel­e­vant logic into a Lua script, mean­ing that it all ran atom­i­cally! Redis is great.
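In plain Python, the decision that Lua script made atomically looks roughly like this (a sketch; names are illustrative):

```python
def try_uncheck(idx, now, last_checked, time_to_freeze, checked, frozen):
    """Uncheck box `idx`, unless it was checked too long ago - then it
    freezes checked instead. `last_checked` maps idx -> unix timestamp;
    `checked` and `frozen` are sets of box indices. In production this
    ran as a single Redis Lua script, which is what made it atomic."""
    if now - last_checked.get(idx, 0) > time_to_freeze:
        frozen.add(idx)       # too late: the box freezes in its checked state
        return False
    checked.discard(idx)      # still fresh: uncheck as normal
    return True
```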

I rolled out the sunsetting changes 2 weeks and 1 day after I launched OMCB. Box 491915 was checked at 4:35 PM Eastern on July 11th, closing out the site.

What did I learn? Well, a lot. This was the second time that I'd put a server with a "real" backend on the public internet, and the last one barely counted. Learning in a high-intensity but low-stakes environment is great.

Building the site in two days with lit­tle re­gard for scale was a good choice. It’s so hard to know what will do well on the in­ter­net - no­body I ex­plained the site to seemed that ex­cited about it - and I doubt I would have launched at all if I spent weeks think­ing about scale. Having a bunch of eyes on the site en­er­gized me to keep it up and helped me fo­cus on what mat­tered.

...

Read the original on eieio.games »
