10 interesting stories served every morning and every evening.




1 682 shares, 26 trendiness

Zigbook – Learn the Zig Programming Language

Learning Zig is not just about adding a language to your resume. It is about fundamentally changing how you think about software. Ready to transform how you think about software?

...

Read the original on www.zigbook.net »

2 643 shares, 23 trendiness

Coinbase Data Breach Timeline Doesn't Add Up

Coinbase Data Breach Timeline Doesn't Add Up: I Have Recordings & Emails Proving Attacks Started Months Before Their Discovery

How I was tar­geted by a so­phis­ti­cated phish­ing at­tack in January 2025—four months be­fore Coinbase pub­licly dis­closed they had been breached.

← Back to Blog

The Call That Changed Everything

On January 7, 2025, at 5:02 PM, I re­ceived an email with a sub­ject line that im­me­di­ately caught my at­ten­tion:

"Order N54HJG3V: Withdrawal of 2.93 ETH initiated. A representative will be in touch shortly before we mark the payment completed"

Minutes later, my phone rang. The caller ID showed 1-805-885-0141. An American-sounding woman who iden­ti­fied her­self as a Coinbase fraud pre­ven­tion rep­re­sen­ta­tive said some­one had ini­ti­ated a large trans­fer from my ac­count and she was call­ing to con­firm.

What hap­pened next was chill­ing: She knew my so­cial se­cu­rity num­ber. She knew my Bitcoin bal­ance down to the dec­i­mal point. She knew per­sonal de­tails that should have been im­pos­si­ble for a scam­mer to pos­sess.

This was­n’t just an­other phish­ing at­tempt. This was some­thing far more so­phis­ti­cated.

January 7, 2025: I was at­tacked by scam­mers with de­tailed per­sonal in­for­ma­tion

January 7, 2025 (same day): Brett Farmer, Head of Trust & Safety, responded: "This report is super robust and gives us a lot to look into. We are investigating this scammer now."

January 13, 2025: I asked Coinbase: "How did the attacker know the balance of my bitcoin holdings?" (No response)

January 17, 2025: I fol­lowed up again, ask­ing for a re­ply (No re­sponse)

January 22, 2025: Still no an­swer to my crit­i­cal ques­tion (No re­sponse)

January 29, 2025: I asked again: "Could I please get a response?" (No response)

May 11, 2025: Coinbase says they be­came aware of the breach (when at­tack­ers de­manded $20M ran­som)

For four months, I had con­crete ev­i­dence that at­tack­ers pos­sessed de­tailed Coinbase cus­tomer data. For four months, I re­peat­edly asked Coinbase to ex­plain how this was pos­si­ble. And for four months, my ques­tions went unan­swered.

Coinbase never replied to a single follow-up email after Brett Farmer's initial response. Despite his promise that they were "investigating this scammer," the most important question—how the attacker obtained my private account data—was met with complete silence.

What Coinbase Disclosed in May 2025

In May 2025, Coinbase fi­nally dis­closed what had hap­pened: cy­ber­crim­i­nals had bribed over­seas cus­tomer sup­port con­trac­tors—par­tic­u­larly em­ploy­ees at TaskUs in India—to steal sen­si­tive cus­tomer data.

The stolen data included the last four digits of Social Security numbers.

Coinbase es­ti­mated the fi­nan­cial im­pact at $180-400 mil­lion, af­fect­ing less than 1% of their cus­tomer base. Over 200 TaskUs em­ploy­ees were ul­ti­mately ter­mi­nated.

But here’s the cru­cial ques­tion: If at­tack­ers were ac­tively us­ing stolen data to tar­get cus­tomers in January, when did the ac­tual breach oc­cur? And when did Coinbase first be­come aware that some­thing was wrong?

What I Sent Coinbase on January 7, 2025

On January 7, 2025, im­me­di­ately af­ter rec­og­niz­ing the at­tack, I sent a com­pre­hen­sive se­cu­rity re­port to Coinbase’s se­cu­rity team. This was­n’t a vague com­plaint—it was a de­tailed tech­ni­cal analy­sis that should have raised im­me­di­ate red flags about a data breach.

Full email head­ers: Complete tech­ni­cal head­ers show­ing the email was routed through Amazon SES (a32-86.smtp-out.amazonses.com), not Coinbase’s own mail servers, de­spite ap­pear­ing to come from com­merce@coin­base.com

DKIM sig­na­ture analy­sis: Documentation that while the email passed DKIM val­i­da­tion for both coin­base.com and ama­zonses.com, the ac­tual send­ing in­fra­struc­ture was sus­pi­cious

The phishing email content: The complete HTML email with its fake "suspicious activity" warning and fraudulent transaction details (Order N54HJG3V, 2.93 ETH withdrawal)

Phone num­ber used: 1-805-885-0141 (later con­firmed to be a Google Voice num­ber)

Voice record­ing: An au­dio record­ing of my sec­ond call with the scam­mer, cap­tur­ing the en­tire con­ver­sa­tion where she demon­strated knowl­edge of my per­sonal in­for­ma­tion

Specific data the at­tacker pos­sessed: A de­tailed list of what the scam­mer knew, in­clud­ing:

The amount of the fabricated "suspicious transfer"

Attack methodology: Description of their social engineering tactics, including the attempt to get me to move funds to a "cold wallet" by downloading Coinbase Wallet

Red flags I iden­ti­fied: The in­abil­ity of the caller to au­then­ti­cate her­self, the Google Voice call­back num­ber, the lack of any no­ti­fi­ca­tions in my ac­tual Coinbase ac­count

Post-call SMS flood­ing: Documentation that im­me­di­ately af­ter the call, I re­ceived hun­dreds of spam text mes­sages for ran­dom ser­vice signups—a po­ten­tial at­tempt to hide le­git­i­mate 2FA codes or se­cu­rity alerts in the noise

This was­n’t a typ­i­cal phish­ing re­port. I specif­i­cally high­lighted that the at­tacker had ac­cess to non-pub­lic ac­count in­for­ma­tion that should have been im­pos­si­ble to ob­tain with­out ei­ther a de­vice com­pro­mise on my end or a data breach at Coinbase.

Brett Farmer, Coinbase's Head of Trust & Safety, responded the same day, calling it a "super robust" report. But when I followed up with the key question—"How did the attacker know the balance of my bitcoin holdings?"—the conversation ended. That critical question was never answered.

Here’s what made the email and call feel con­vinc­ing in the mo­ment—and what ul­ti­mately stopped me from go­ing through with what the caller wanted.

The phish­ing email I re­ceived looked com­pletely le­git­i­mate at first glance:

DKIM sig­na­tures: Passed val­i­da­tion for both coin­base.com and ama­zonses.com

Convincing narrative: "We detected suspicious activity on your account. A fraud prevention representative will be in touch shortly…"

The email even in­cluded what ap­peared to be a ver­i­fi­ca­tion code (96841) and as­signed me a spe­cific case agent (“Sarah Schueler”). This level of de­tail gave it tremen­dous cred­i­bil­ity.

But when I ex­am­ined the email head­ers more care­fully, I found some­thing sus­pi­cious. Here are the key header fields from the phish­ing email:

From: Coinbase Commerce

To: [REDACTED]@[REDACTED]

Subject: Order N54HJG3V: Withdrawal of 2.93 ETH ini­ti­ated

Date: Tue, 7 Jan 2025 17:02:14 +0000

Message-ID:

Return-Path:

Received: from a32-86.smtp-out.ama­zonses.com (a32-86.smtp-out.amazonses.com. [54.240.32.86])

by mx.google.com with ESMTPS id [REDACTED]

for

The email was sent through Amazon SES (Simple Email Service), not Coinbase’s own mail servers. Here’s what made this sus­pi­cious:

Return-Path mis­match: The re­turn path used @amazonses.com, not @coinbase.com

Message-ID from Amazon: The Message-ID clearly shows @email.amazonses.com as the ori­gin

Dual DKIM signatures: While both amazonses.com and coinbase.com DKIM checks passed, this is exactly how phishing works—attackers can configure Amazon SES to send "from" coinbase.com

SPF pass for wrong do­main: SPF val­i­dated that Amazon SES was au­tho­rized to send for ama­zonses.com, not that it was au­tho­rized to send as coin­base.com

While Coinbase might legitimately use Amazon SES for some emails, a security-critical fraud alert should come through more controlled channels with stronger sender verification. The dual DKIM setup is a classic technique: attackers register with Amazon SES, configure it to "send as" the target domain (which Amazon allows), and rely on recipients not checking the actual sending infrastructure.
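To make that sender check concrete, here is a minimal TypeScript sketch (my own illustration, not part of the original report) that flags the kind of From/Return-Path domain mismatch described above. The header names are standard; the parsing is deliberately naive:

```typescript
// Naive sketch: flag a From / Return-Path domain mismatch in raw email headers.
function extractDomain(headerValue: string): string | null {
  const match = headerValue.match(/@([A-Za-z0-9.-]+)/);
  return match ? match[1].toLowerCase() : null;
}

function looksSpoofed(rawHeaders: string): boolean {
  const lines = rawHeaders.split(/\r?\n/);
  const header = (name: string): string =>
    lines.find((l) => l.toLowerCase().startsWith(name.toLowerCase() + ":")) ?? "";

  const fromDomain = extractDomain(header("From"));
  const returnPathDomain = extractDomain(header("Return-Path"));
  if (!fromDomain || !returnPathDomain) return true;

  // Relaxed alignment: the bounce domain should equal the From domain or be a
  // subdomain of it. A coinbase.com From with an amazonses.com Return-Path fails.
  return !(
    returnPathDomain === fromDomain ||
    returnPathDomain.endsWith("." + fromDomain)
  );
}
```

Real DMARC evaluation is more involved (SPF and DKIM results, organizational domains), but even this crude check would flag an email shaped like the one above.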

The Phone Call: Even More Convincing (And Recorded)

When the woman called me, she was pro­fes­sional, in­tel­li­gent, and sounded ex­actly like a le­git­i­mate cus­tomer ser­vice rep­re­sen­ta­tive. She had all the per­sonal in­for­ma­tion I’d al­ready men­tioned, plus more de­tails that seemed im­pos­si­ble for a scam­mer to pos­sess.

During our con­ver­sa­tion, I asked her to au­then­ti­cate her­self. Here’s where things got in­ter­est­ing.

How I Detected It Was a Scam

Despite the so­phis­ti­ca­tion of the at­tack, sev­eral red flags even­tu­ally con­vinced me this was fraud­u­lent:

I asked the caller to prove she was from Coinbase. She of­fered to read me my per­sonal in­for­ma­tion—but I al­ready knew she had that in­for­ma­tion. That’s not au­then­ti­ca­tion; that’s just prov­ing she has stolen data.

When I sug­gested she send me an email from a ver­i­fied Coinbase ad­dress that I could re­ply to, she claimed she did­n’t have ac­cess to per­sonal email ad­dresses and could only use generic sup­port chan­nels. A fraud pre­ven­tion spe­cial­ist with­out the abil­ity to send ver­i­fied emails? That did­n’t add up.

When I asked if I could call her back, she said I couldn't reach her because she was "in the fraud department." After the call, I tried calling the number back: it was a Google Voice number.

Legitimate fi­nan­cial in­sti­tu­tions al­ways pro­vide call­back num­bers that route to their main sys­tems. A Google Voice num­ber is a mas­sive red flag.

When I challenged the authenticity of the email sender, the caller insisted that Amazon was just Coinbase's "service provider" and that the DKIM signatures proved legitimacy. But when pressed, she couldn't explain away the anomalies in a satisfactory way.

The con­ver­sa­tion on the recorded call shows my grow­ing skep­ti­cism:

Me: "I don't think there is enough information provided for me to authenticate you."

Caller: "I'm not sure what you would like me to do…"

Me: "There are just too many red flags."

Red Flag #4: No Notifications in My Account

After the call, I logged into my ac­tual Coinbase ac­count. There were:

No no­ti­fi­ca­tions about the al­leged trans­fer

No ac­tual trans­ac­tion match­ing what the caller de­scribed

If this were real, there would have been no­ti­fi­ca­tions every­where.

Red Flag #5: The Pressure to Use Coinbase Wallet

The caller wanted me to move my cryp­tocur­rency to a cold wal­let” and started walk­ing me through down­load­ing Coinbase Wallet. This is a clas­sic so­cial en­gi­neer­ing tac­tic—get the vic­tim to move funds to an ad­dress con­trolled by the at­tacker.

I did­n’t fol­low through, so I never dis­cov­ered ex­actly how they planned to steal the funds, but the in­tent was clear.

Here’s where things got even more con­cern­ing. Immediately af­ter I ended the call with the scam­mer, my phone was bom­barded with hun­dreds of text mes­sages—ran­dom ser­vice signups, ver­i­fi­ca­tion codes, newslet­ters, every­thing imag­in­able.

At first, I thought this was just a vin­dic­tive FU from the scam­mer. But the tim­ing and vol­ume sug­gest some­thing more cal­cu­lated: SMS flood­ing is a known tech­nique to hide le­git­i­mate se­cu­rity alerts in noise.

The at­tack works like this:

While you’re over­whelmed, they at­tempt ac­count takeovers on var­i­ous ser­vices

Real 2FA codes and se­cu­rity alerts get buried in the flood

You miss the crit­i­cal warn­ings be­cause they’re hid­den among hun­dreds of spam mes­sages

This could have been an at­tempt to:

Hide real alerts from Coinbase or other ser­vices

Overwhelm me while they at­tempted unau­tho­rized ac­cess to var­i­ous ac­counts

Create con­fu­sion and dis­tract from their next moves

This was­n’t just a phish­ing call—it was a co­or­di­nated, multi-vec­tor at­tack.

Coinbase’s han­dling of this breach raises se­ri­ous ques­tions:

Customer sup­port agents at third-party con­trac­tors had ac­cess to ex­tremely sen­si­tive data: Social Security num­bers, ac­count bal­ances, trans­ac­tion his­to­ries, and per­sonal doc­u­ments. Why were over­seas con­trac­tors given such priv­i­leged ac­cess?

The eco­nom­ics of the bribery scheme tell the story: at­tack­ers likely paid rel­a­tively small amounts to con­trac­tors earn­ing mod­est wages to ac­cess data they then used to at­tempt thefts worth po­ten­tially mil­lions.

Coinbase claims they "discovered" the breach on May 11, 2025, when attackers attempted to extort $20 million. But my case—and likely many others—proves the breach was being actively exploited months earlier.

What mon­i­tor­ing sys­tems failed to de­tect that cus­tomer data was be­ing used in so­phis­ti­cated phish­ing at­tacks? I re­ported this in January with spe­cific de­tails about how the at­tacker knew my in­for­ma­tion. That re­port should have trig­gered alarm bells.

Despite Brett Farmer's initial acknowledgment that my report was "super robust" and warranted investigation, my follow-up emails asking how attackers obtained my account data went completely unanswered.

My ques­tion was spe­cific and tech­ni­cal. It went to the heart of what should have been a mas­sive red flag: How did at­tack­ers have ac­cess to non-pub­lic ac­count data?

Had Coinbase in­ves­ti­gated this se­ri­ously in January, they might have dis­cov­ered the in­sider threat months ear­lier and pre­vented ad­di­tional vic­tims.

Coinbase says they "became aware" of the breach on May 11, 2025. But my January attack proves the breach was active at least four months earlier. This raises uncomfortable questions:

...

Read the original on jonathanclark.com »

3 533 shares, 105 trendiness

I Built a Synth for My Daughter

TLDR: I built a portable step-se­quencer syn­the­sizer for my daugh­ter’s third birth­day. It has four slid­ers that con­trol four notes in a loop­ing se­quence. Slide up = higher pitch, slide down = lower.

It’s a child-friendly, tac­tile mu­sic toy. Here’s the pink edi­tion in ac­tion:

My daugh­ter re­ceived a Montessori ac­tiv­ity board full of switches and LEDs for her first birth­day. Watching her twist knobs and flip the switches re­minded me of the con­trol sur­face of a synth, and I won­dered if I could build a mu­si­cal ver­sion - some­thing sim­ple, tac­tile, and cre­ative that did­n’t re­quire hold­ing down but­tons to keep the sound go­ing. A year later I fi­nally de­cided to build it. I had no prior hard­ware ex­pe­ri­ence, so this be­came an ex­cuse to learn about mi­cro­con­trollers, CAD, PCB de­sign, and 3D print­ing.

I started the pro­ject with a 15 year old Arduino Inventors Kit and only a vague idea about how to use it. The first goal was sim­ple: build a ba­sic MIDI con­troller on a bread­board. If I could get some po­ten­tiome­ter read­ings, map them to 12 dis­crete val­ues - one for each note in an oc­tave - and emit MIDI mes­sages, I would have taken a small step in the right di­rec­tion. Adding an on­board synth mod­ule and de­sign­ing a pretty box to put it in could wait un­til later.

Reading the po­ten­tiome­ter in­puts and turn­ing them into the MIDI mes­sages us­ing the Arduino MIDI li­brary was easy enough. To hear the out­put, I wrote a small Python script that in­ter­cepted the MIDI mes­sages and for­warded them to my Mac’s de­fault MIDI de­vice, which Logic Pro could pick up. That let me play” the bread­board through soft­ware in­stru­ments.

Once I had the hang of wiring up potentiometers and rotary encoders, the next step was to move the audio synthesis from Logic to my breadboard. For this I used a little $12.95 SAM2695 synthesiser module with an integrated amplifier and speaker. Its inner workings remain a mystery to me, but it does what I need it to, and I was happy to reduce the time it took to get a functioning prototype into my daughter's hands. I also moved to an Elegoo Nano here due to its low cost and increased number of analog pins.

Next, I added a small OLED screen to provide some visual feedback and character, using the handy u8g2 graphics library. This was trickier than I expected: the Nano has so little RAM that I couldn't buffer a full frame. I had to update the screen in small patches, and large updates were slow enough that they occasionally interfered with encoder reads and caused laggy notes at faster tempos. I've still got some work to do to iron out blocking screen updates, but for now I pushed through and accepted a bit of lag. I added a little dancing panda that I adapted from one I found in a pixel art tutorial which I can no longer find - if you're the original creator, please let me know so I can credit you!

For developing on the go, I discovered the Wokwi microcontroller simulator. It let me build a virtual schematic and test code without lugging around my fragile prototype. They have a free online simulator and a paid VS Code plugin that lets you create your diagrams in the IDE.

Once I had a func­tional cir­cuit it was time to move on to de­sign­ing an en­clo­sure and as­sem­bling a com­plete ver­sion of the syn­the­siser that my daugh­ter could play with.

After wiring up the bread­board, the next hur­dle was fig­ur­ing out how to build a proper en­clo­sure. I looked for off-the-shelf cases, but noth­ing matched the size I needed, and every­thing seemed to come in ei­ther black or beige. So I de­cided to learn some ba­sic CAD and 3D-print the en­clo­sure on a friend’s Bambu Labs A1 Mini.

I down­loaded Fusion 360 and started fol­low­ing tu­to­ri­als. With only an hour or two to spare in the evenings, progress was slow at first. I’d never used any CAD soft­ware be­fore, so I was con­stantly switch­ing be­tween learn­ing the soft­ware and try­ing to make ac­tual progress on the de­sign. For other be­gin­ners, I highly rec­om­mend Product Design Online’s Learn Fusion 360 in 30 Days and this ex­cel­lent video by wermy.

After a few weeks of trial-and-er­ror, I fi­nally had a de­sign I could print:

Thank you Tom for print­ing these! A year’s sup­ply of fil­a­ment com­ing your way.

Moving the circuit to a proper PCB felt daunting, so for the first version I hand-wired everything on a solderable breadboard. The good: hanging out and drinking some wine with my friend, who kindly offered to help with the soldering. The bad: when the time finally came to close the two halves of the enclosure, stuffing the rat's nest of wires inside ended up putting pressure on a bunch of the delicate soldered joints and breaking them. My daughter could play around a bit with it - enough for me to convince myself that she'd genuinely enjoy using it - but it was fragile. I also wanted to make a few units for friends, which meant I needed something more robust and faster to assemble. Time to design a PCB.

Romain, I def­i­nitely owe you a bot­tle or two…

Once again I was back on YouTube and fum­bling my way through an un­fa­mil­iar work­flow, though I stuck with Fusion 360 which has its own elec­tron­ics de­sign suite. For my first at­tempt I de­cided that I’d fo­cus on sur­face-mount­ing the var­i­ous com­po­nents and save in­te­grat­ing the mi­cro­con­troller into the board for a fu­ture pro­ject. A large chunk of the time here was spent read­ing datasheets, sourc­ing parts and im­port­ing their foot­prints/​mod­els into Fusion 360. Once I had learned the ba­sics, I was able to route the cir­cuit on a 2-layer board. One of the nice things about Fusion is that you get a full 3D model of the as­sem­bled PCB, which makes de­sign­ing the en­clo­sure much eas­ier.

When I was fin­ished, I ex­ported the PCB de­sign file and up­loaded it to JLCPCB. Five boards (the min­i­mum or­der) cost £35.41 in­clud­ing ship­ping, and they ar­rived five days later. It blows my mind that this is pos­si­ble.

For my first version I had decided to use 4 AA batteries and the Arduino's built-in voltage converter to provide a steady 5 volts. Something I overlooked, however, is that the Arduino's VIN pin, which provides a regulated 5V to the board, requires 7-12V input, while my batteries would provide, at best, 6V when new. The board seemed to work OK at this voltage, but it would be vulnerable to random resets as the voltage started to sag, and battery life would be short.

For the next iteration I decided to get rid of one of the batteries and introduce an Adafruit MiniBoost to provide a regulated 5V supply to the Arduino from the combined 4.5V of the three AA batteries. This allowed me to reduce the weight a little and give the synth a stable supply of power for a longer duration.

Finally, I updated the enclosure so that I could securely attach the PCB and added a neat little battery compartment. I also added a small bezel to raise the height of the OLED display.

It’s been just over a week since my daugh­ter un­wrapped her new synth. It now lives on the shelf with her other toys, and so far it gets reg­u­lar use and is hold­ing up well. One of my goals was to make some­thing fun to fid­dle with at a su­per­fi­cial level, but with enough depth to stay in­ter­est­ing as she gets older. The first part seems true, and I’ll see how the sec­ond plays out over the com­ing months. There are still a few kinks to iron out, such as the lag when up­dat­ing the screen. I’m also plan­ning to up­grade the Elegoo Nano to an ESP32, which should sim­plify the firmware and open up more op­tions for fun dis­play graph­ics.

After watch­ing a few chil­dren and adults (musical and non-mu­si­cal) play with it, I think there might be the germ of a real prod­uct here. With a bet­ter synth en­gine, au­dio out­puts, and a way to chain mul­ti­ple units to­gether, it could be a play­ful in­tro­duc­tion to elec­tronic mu­sic for older kids - maybe even adults. However, adding fea­tures is one thing, but ac­tu­ally bring­ing a prod­uct to mar­ket is an­other. The chal­lenges aren’t just tech­ni­cal: they’re reg­u­la­tory and fi­nan­cial. Safety cer­ti­fi­ca­tion (UKCA/CE, and FCC in the US) can cost £5-10K or more. Manufacturing is an­other hur­dle. A 3D-printed en­clo­sure is fine for a pro­to­type, but a real prod­uct likely needs in­jec­tion-molded parts, which re­quire ex­pen­sive tool­ing. Even a small pro­duc­tion run would need more up­front cap­i­tal than I can sen­si­bly in­vest right now.

For the mo­ment I’m treat­ing it as a learn­ing pro­ject, but the re­sponse so far has been en­cour­ag­ing. A more pol­ished open-source ver­sion for mak­ers, or pos­si­bly a small Kickstarter cam­paign, might be vi­able next steps. If any­one read­ing this has ex­pe­ri­ence bring­ing small-run hard­ware to mar­ket, I’d love to hear from you.

...

Read the original on bitsnpieces.dev »

4 288 shares, 12 trendiness

I finally understand Cloudflare Zero Trust tunnels


A while ago, af­ter frus­tra­tion with Tailscale in en­vi­ron­ments where it could­n’t prop­erly pen­e­trate NAT/firewall and get a p2p con­nec­tion, I de­cided to in­vest some time into learn­ing some­thing new: Cloudflare Zero Trust + Warp.

There are so many new con­cepts, but af­ter way too long, I can fi­nally say that I un­der­stand Cloudflare Zero Trust Warp now. I am a full-on Cloudflare Zero Trust with Warp con­vert, and while I still have Tailscale run­ning in par­al­lel, al­most every­thing I do now is go­ing through Zero Trust tun­nels.

This post is an ex­pla­na­tion of the ba­sic con­cepts, be­cause I’m sure oth­ers will have sim­i­lar is­sues wrap­ping their head around it.

Why would you even sink so much time into learn­ing this? What does it give you?

Argo tun­nels through Zero Trust al­low you to do a bunch of re­ally cool things:

* Connect private networks together - can be home networks, can be Kubernetes clusters; you can create tunnels to and from every infra
* Expose private services to the public, on public hostnames, no matter where they are running. You could even put your router running at 192.168.1.1 on the internet, accessible to everyone, no Warp client required
* Create fully private networks with private IPs (10.x.x.x) that only resolve when Warp is connected, to services you specify
* Quickly expose a public route to any service running locally or on any server, for quick development, testing webhooks or giving coworkers a quick preview
* Create a fully private network running at home that's only available when you're connected to the Warp VPN client, or only to you, reachable anywhere
* No worries about NAT; everything goes through the Cloudflare network, no direct p2p connection required
* Add very granular access policies on who can access what - what login method does the user need, which email addresses are allowed. Allow bots and server-to-server exceptions with service access tokens. Does the user need to have Warp running? Does he need to be enrolled in Zero Trust? Does he need some special permission flag?
* Authenticate to SSH servers through Zero Trust access policies without the need for SSH keys. Just connect Warp, type ssh host, and you're logged in
* Close public SSH ports completely to only allow login through Warp
* Get the benefits of Cloudflare VPN edge routing on top (similar to 1.1.1.1 Warp+)

To get this out of the way:

* Tailscale: peer-to-peer, uses NAT and firewall penetration methods to establish p2p connections. If not possible, it goes through central relay servers. Absolute best speed and latency if a connection is established.
* Cloudflare: All traffic (with the exception of warp-to-warp routing, which is p2p) goes through Cloudflare's edge network. So even SSH-ing into your local router will hop through Cloudflare servers. This adds latency, but no issues with NAT at all.

Cloudflare has 2 tools avail­able: Warp Client and Cloudflared. They in­ter­act with each other and have sim­i­lar­i­ties in some ar­eas but are not the same.

Warp client: the tool that connects you to the Cloudflare network. This is the thing that you configure to add clients into your Zero Trust network, and it enforces policies.

Usually this runs on clients, but can also run on servers.

Warp client also sup­ports warp-to-warp rout­ing which is a true p2p con­nec­tion sim­i­lar to Tailscale.

Cloudflared: the thing that creates a tunnel and adds it to the Zero Trust network.

Most com­monly you run this on servers to ex­pose tun­nels into your net­work, but you can also run it on clients.

On the client side you can use cloud­flared ac­cess to es­tab­lish a con­nec­tion with other things in your Zero Trust net­work.

Can also cre­ate one-time-use tun­nels that aren’t con­nected to the Zero Trust net­work. Good for test­ing.

This took me the longest to un­der­stand. Zero Trust al­lows you to con­fig­ure Tunnels, Routes and Targets; here’s how they in­ter­play.

The most im­por­tant part of your setup. Tunnels are de­ployed through cloud­flared and are sim­ply an exit for traf­fic. Think of it as a lit­eral tun­nel that has its end some­where.

Tunnels are de­ployed to in­fra­struc­ture in the tar­get net­work. So if you have a home net­work with 192.168.1.1/24, you want to de­ploy cloud­flared on any ma­chine that’s al­ways on and within that net­work. It can be your router, or your Raspi, it does­n’t mat­ter.

For server-hosted ser­vices, you can have a tun­nel on your main dev server, on a server, or on a pod in your Kubernetes clus­ter.

Now you have an open­ing into these net­works through Warp/Argo tun­nels.

You can either configure tunnels through the Zero Trust UI by "adopting" them, or configure them in the /etc/cloudflared/config.yml config on the machine itself. Personal preference, I usually configure them on the machine itself.

The con­fig spec­i­fies where a re­quest should get routed to when it ar­rives at the tun­nel. So the tun­nel knows what to do with it.

In this con­fig we tell cloud­flared to route traf­fic ar­riv­ing at this tun­nel for host­name git­lab.wid­get­corp.tech to lo­cal­host:80, and git­lab-ssh to the lo­cal SSH server.

The con­fig alone does­n’t do any­thing. It just ex­poses a tun­nel, and that’s it. What we need now are routes and tar­gets.

Exposing a private network to the public with tunnels quickly

Quick addition, as this is a super common use case. If you want to just expose something in your home network to the internet, you can add a config entry that routes the public hostname (here homeassistant.mydomain.com) to the internal address (here 192.168.1.3).

Then go into the Cloudflare DNS settings and map the domain homeassistant.mydomain.com to the tunnel.

Now all traf­fic go­ing to this do­main will go through the cloud­flared tun­nel, which is con­fig­ured to route home­as­sis­tant.my­do­main.com to 192.168.1.3. No Warp client needed, Argo tun­nel does every­thing for us.

Note: If you adopted the tun­nels and don’t use con­fig.yaml, you can au­to­mat­i­cally cre­ate match­ing DNS records in the Cloudflare UI and don’t need to do this man­u­ally.

A route de­fines where to di­rect traf­fic to.

Let’s say your home­as­sis­tant runs on 192.168.1.3 at home and you want to reach it from out­side. Just above we de­ployed a cloud­flared tun­nel on our router at 192.168.1.3, and added a con­fig point­ing the do­main to the Argo tun­nel, so home­as­sis­tant.my­do­main.com is al­ready avail­able to the pub­lic. However, 192.168.1.3 is­n’t, as it’s a pri­vate net­work IP.

* A route like 192.168.1.1/24 pointing at your tunnel, to route ALL traffic to the full IP range through that tunnel (so even 192.168.1.245 will go through your tunnel)
* Or a more specific route like 192.168.1.3/32 pointing at your tunnel, to ONLY route traffic to 192.168.1.3 through that tunnel.

When con­fig­ured, once your user con­nects their Warp client that’s set up with your Zero Trust net­work, the Warp client will see re­quests to 192.168.1.3 and route it through the Cloudflare net­work to reach your spe­cific tun­nel. Like a lit­tle po­lice helper di­rect­ing cars where to go.

If the Warp client is not con­nected, 192.168.1.3 will just re­solve in your cur­rent lo­cal net­work. If con­nected, it will re­solve to the tun­nel.

The routed IP does­n’t need to ex­ist! So you could, for ex­am­ple, route a ran­dom IP you like (e.g., 10.128.1.1) to your tun­nel, the tun­nel then for­wards it based on your routes, for ex­am­ple to 192.168.1.1. This is ex­tremely pow­er­ful be­cause it al­lows you to build your own fully vir­tual net­work.

That’s all it does, what hap­pens af­ter­wards is up to the tun­nel con­fig that we cre­ated above. The tun­nel de­cides where to point the in­com­ing re­quest to, whether that’s lo­cal­host or some­where else.

To sum­ma­rize, the route tells the Warp client where to route traf­fic to.

Now we have 2 things work­ing:

* homeassistant.mydomain.com - goes through a Cloudflare DNS record pointing at an Argo tunnel, which then forwards to 192.168.1.3. This works without Warp connected as it's on the DNS level, public to everyone.
* 192.168.1.3 - The Warp client sees the request and routes it through the Argo tunnel, which then forwards it to 192.168.1.3 within that network. This needs Warp connected to work, and is only visible to people in your Zero Trust org.

This one took me a while.

Targets are needed to de­fine a piece of in­fra­struc­ture that you want to pro­tect through Zero Trust. They are like a pointer point­ing to some­thing in your net­work. This goes hand-in-hand with routes, but is­n’t al­ways needed.

Let’s say you have 192.168.1.3 (homeassistant) ex­posed through a Cloudflare tun­nel. By de­fault, any­one in your net­work that is part of your Zero Trust org and has Warp client in­stalled can now ac­cess your home­as­sis­tant at 192.168.1.3.

We can change that with tar­gets. For ex­am­ple, defin­ing a tar­get with host­name = home­as­sis­tant.my­do­main.com to the route 192.168.1.3/32 al­lows us to add ac­cess poli­cies to it. We can also put an en­tire net­work into the tar­get by spec­i­fy­ing 192.168.1.3/24 to con­trol ac­cess. This also works with vir­tual IPs like 10.128.1.1!

Targets alone won't do anything, they just point to the service or network. "Hey, here is homeassistant", or "hey, here is my home network".

Access Policies: Protecting Who Can Access What

Continuing the ex­am­ple from above:

* we have a tunnel running on our home network that routes homeassistant.mydomain.com to 192.168.1.3
* we set up public DNS records to point homeassistant.mydomain.com to the Argo tunnel in Cloudflare
* we created a route 192.168.1.3 to go through the same tunnel
* we also created a target pointing to 192.168.1.3

When users ac­cess ei­ther 192.168.1.3 or home­as­sis­tant.my­do­main.com, the Warp client will route the re­quest through the tun­nel, which then for­wards the re­quest to 192.168.1.3. Homeassistant loads and every­thing is fine.

But do we want that?

With ac­cess poli­cies, we can leave things in the pub­lic but pro­tect them with Cloudflare Zero Trust ac­cess. So while 192.168.1.3 is only avail­able if Warp is con­nected (so rout­ing to it works), we can add se­cu­rity to our pub­lic home­as­sis­tant.my­do­main.com.

Go to Access -> Applications -> Add an Application -> Self-hosted.

Here we can de­fine what should be pro­tected, and how.

Going with our pre­vi­ous ex­am­ple, we can add a pub­lic host­name home­as­sis­tant.my­do­main.com or an IP like 192.168.1.3 (or both), then at­tach poli­cies of who should be able to ac­cess it.

You can specify Include ("OR") and Require ("AND") selectors; a small sketch of how they combine follows below.

* Require rules must always be met, on top of Include rules, to grant access
* Any of the Include rules must match to grant access
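In other words, access is granted only when at least one Include rule matches and every Require rule matches. Here is a rough TypeScript sketch of that logic (my illustration only; the request fields and rule predicates are hypothetical, not Cloudflare's API):

```typescript
// Sketch of how Include ("OR") and Require ("AND") selectors combine.
type AccessRequest = { email: string; loginMethod: string };
type Rule = (request: AccessRequest) => boolean;

function policyMatches(
  request: AccessRequest,
  includeRules: Rule[],
  requireRules: Rule[],
): boolean {
  const anyIncludeMatches = includeRules.some((rule) => rule(request));
  // every() on an empty array is true, i.e. no Require rules means no extra constraint.
  const allRequireMatch = requireRules.every((rule) => rule(request));
  return anyIncludeMatches && allRequireMatch;
}

// Example: include a specific email, but require GitHub as the login method.
const allowed = policyMatches(
  { email: "me@example.com", loginMethod: "github" },
  [(r) => r.email === "me@example.com"],
  [(r) => r.loginMethod === "github"],
);
```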

Then there are Actions:

* Allow - when the policy matches, allow access
* Deny - when the policy matches, deny access, aka blocking something
* Bypass - when the policy matches, bypass Zero Trust completely. No more checking.
* Service Auth - when the policy matches, allow authentication to the service with a service token header (good for server-to-server, or bots). Check Access -> Service Auth to create these tokens.

Allow public access to everyone logging into your network

The most com­mon use case: home­as­sis­tant.my­do­main.com is pub­lic. We want to keep it pub­lic, but add an ex­tra layer of se­cu­rity.

Add an in­clude pol­icy, pick any of the email se­lec­tors, add the email of the user you want to al­low ac­cess to. Now only peo­ple au­then­ti­cated with your Zero Trust org with the spec­i­fied emails can ac­cess your home­as­sis­tant, with­out need­ing to have Warp run­ning.

We can harden this by adding re­quire rules: Add a Login Method se­lec­tor rule, pick a spe­cific lo­gin method like GitHub. Now only peo­ple with spe­cific emails that have au­then­ti­cated through GitHub can ac­cess your home­as­sis­tant, with­out need­ing to have Warp run­ning.

Another pol­icy I like hav­ing is to skip the lo­gin screen en­tirely when con­nected through Warp. If a user is al­ready en­rolled into my Zero Trust org and has the Warp client pro­vi­sioned, then there’s no need to ask them to au­then­ti­cate again.

We can add a sep­a­rate pol­icy (don’t edit the one we just cre­ated above), pick the Gateway se­lec­tor and set it to Allow or Bypass.

Don't use "Warp" - the Warp selector will match anyone that has Warp running, including the consumer 1.1.1.1 app. Gateway, on the other hand, matches only if someone is connecting through your Gateway, be that DNS or a provisioned Warp client.

* Warp through Zero Trust is running on a machine: No login screen
* No Warp running (public access): Prompt for login screen, but only allow specific emails that authenticated through GitHub

This setup makes it very con­ve­nient to reach home­as­sis­tant, no mat­ter if con­nected through Warp or not.

Deploying the Warp client and enrolling into Zero Trust

Are you still with me?

Our net­work is ba­si­cally done. We have a lo­gin-pro­tected home­as­sis­tant.my­do­main.com that routes through our tun­nel into our pri­vate net­work and ter­mi­nates at 192.168.1.3, and we have a di­rect route to 192.168.1.3 that only works when con­nected with Warp.

We also have lo­gin poli­cies to make sure only spe­cific users (logged in with GitHub and cer­tain email ad­dresses) can ac­cess home­as­sis­tant.

So how do we de­ploy the dang Warp client?

Actually the same: We cre­ate some poli­cies.

In Enrollment Permissions, we specify the same policies for who can enroll. For example, "[email protected]" when authenticated through GitHub is allowed to enroll. In the Login Methods we can specify what login methods are available when someone tries to enroll into our Zero Trust org.

Toggle WARP au­then­ti­ca­tion iden­tity set­tings to make the Gateway se­lec­tor avail­able in poli­cies, ef­fec­tively al­low­ing the con­fig­ured WARP client to be used as a lo­gin method.

Careful here, once some­one is en­rolled, they are ba­si­cally in your Zero Trust net­work through Warp. Make sure you harden this.

Then, in Profile set­tings, we de­fine how the WARP client be­haves. These are things like pro­to­col: MASQUE or WireGuard, ser­vice mode, what IPs and do­mains to ex­clude from WARP rout­ing (e.g., the lo­cal net­work should never go through WARP), set­ting it to ex­clude or in­clude mode and so on.

* Install CA to system certificate store - installs the Cloudflare CA certificate automatically when enrolled.
* Override local interface IP - assigns a unique CGNAT private IP to the client. This is needed for warp-to-warp routing.
* Device Posture - what checks the WARP client should perform for the org. E.g., check the OS version, some OS files on disk, etc. I have this set to WARP and Gateway because I want the client to provide information on whether the user is connected through WARP and Gateway, for skipping certain login pages.

Once done, just open the Warp client (https://developers.cloudflare.com/warp-client/), and log in to your network. This should open the login pages you specified in the Device Enrollment screen, and check all the enrollment policies you specified.

Once passed, con­grat­u­la­tions, your WARP client is now con­nected to your Zero Trust net­work. The client will then go ahead and start rout­ing 192.168.1.3 through your tun­nels, as spec­i­fied in your tun­nel and route set­tings.

If you fol­lowed this guide, here is what we built:

* Login methods to connect the Warp client to your Zero Trust org through GitHub and specific email addresses
* A tunnel within your private network that forwards any request coming in with host homeassistant.mydomain.com to 192.168.1.3
* A route that forwards all traffic for 192.168.1.3 to the tunnel in your private network, which will terminate it at 192.168.1.3, and which will only work when connected through Warp to route the request
* A DNS name homeassistant.mydomain.com that points to the Argo tunnel, and will allow everyone (even if not connected through Warp) to access homeassistant, which runs at 192.168.1.3
* Access policies that will:
  * Ask users that are not connected to Zero Trust through Warp to log in with GitHub and a specific email, so everyone can access it if they can log in
  * A policy that skips the login screen completely and just shows homeassistant if the user connects through the Zero Trust Warp client (enrolled into our org)

You don’t need the pub­lic do­main and you don’t need the route to 192.168.1.3. These are 2 dif­fer­ent op­tions that you can use to ex­pose home­as­sis­tant when you’re not at home. One is us­ing a pub­lic do­main name every­one can see, one is ex­plic­itly re­quir­ing con­nect­ing through en­rolled Warp.

What I did­n’t cover in this post:

* Creating and assigning fully private IPs that only exist within your Zero Trust network
* SSH authentication through Zero Trust access policies (that's what we need Targets for)
* The other application types besides Self-Hosted

I’m happy to ex­pand on it if there’s in­ter­est. Let me know on X or Bluesky.

...

Read the original on david.coffee »

5 278 shares, 10 trendiness

The fate of “small” open source

By far the most pop­u­lar npm pack­age I’ve ever writ­ten is blob-util, which is ~10 years old and still gets 5+ mil­lion weekly down­loads.

It’s a small col­lec­tion of util­i­ties for work­ing with Blobs in JavaScript. I wrote it be­cause I found that PouchDB users were end­lessly con­fused about how to work with Blobs and how to con­vert them to strings, ArrayBuffers, etc.

Given that some 80% of de­vel­op­ers are now us­ing AI in their reg­u­lar work, blob-util is al­most cer­tainly the kind of thing that most de­vel­op­ers would just hap­pily have an LLM gen­er­ate for them. Sure, you could use blob-util, but then you’d be tak­ing on an ex­tra de­pen­dency, with un­known per­for­mance, main­te­nance, and sup­ply-chain risks.

And sure enough, Claude will hap­pily spit out what­ever Blob util­i­ties you need when prompted:

> Write me a util­ity func­tion in TypeScript to con­vert a Blob to an ArrayBuffer. It should re­turn a Promise.

function blobToArrayBuffer(blob: Blob): Promise<ArrayBuffer>

Claude’s ver­sion is pretty close to the blob-util ver­sion (unsurprising, since it was prob­a­bly trained on it!). Although it’s much more ver­bose, un­nec­es­sar­ily check­ing if readAsAr­ray­Buffer ac­tu­ally gives you an ArrayBuffer (although this does make TypeScript happy). To be fair, it also im­proves on my im­ple­men­ta­tion by di­rectly re­ject­ing with an er­ror rather than the more awk­ward on­error event.

Note: for anyone wondering, yes Claude did suggest the new Blob.arrayBuffer() method, but it also generated the above "for older environments."
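For reference, here is a minimal sketch of what such a FileReader-based helper generally looks like (my reconstruction of the shape described above, not Claude's verbatim output and not the blob-util source):

```typescript
// FileReader-based fallback for environments without Blob.arrayBuffer().
function blobToArrayBuffer(blob: Blob): Promise<ArrayBuffer> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      // readAsArrayBuffer always produces an ArrayBuffer; the check mostly
      // exists to narrow the string | ArrayBuffer | null type for TypeScript.
      if (reader.result instanceof ArrayBuffer) {
        resolve(reader.result);
      } else {
        reject(new Error("Expected an ArrayBuffer result"));
      }
    };
    reader.onerror = () => reject(reader.error ?? new Error("Failed to read Blob"));
    reader.readAsArrayBuffer(blob);
  });
}
```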

I sup­pose some peo­ple would see this as progress: fewer de­pen­den­cies, more ro­bust code (even if it’s a bit more ver­bose), quicker turn­around time than the old search npm, find a pack­age, read the docs, in­stall it” ap­proach.

I don’t have any ex­ces­sive pride in this li­brary, and I don’t par­tic­u­larly care if the down­load num­bers go up or down. But I do think some­thing is lost with the AI ap­proach. When I wrote blob-util, I took a teacher’s men­tal­ity: the README has a cutesy and whim­si­cal tu­to­r­ial fea­tur­ing Kirby, in all his blobby glory. (I had a thing for putting Nintendo char­ac­ters in all my stuff at the time.)

The goal was­n’t just to give you a util­ity to solve your prob­lem (although it does that) — the goal was also to teach peo­ple how to use JavaScript ef­fec­tively, so that you’d have an un­der­stand­ing of how to solve other prob­lems in the fu­ture.

I don’t know which di­rec­tion we’re go­ing in with AI (well, ~80% of us; to the re­main­ing hold­outs, I salute you and wish you god­speed!), but I do think it’s a fu­ture where we prize in­stant an­swers over teach­ing and un­der­stand­ing. There’s less rea­son to use some­thing like blob-util, which means there’s less rea­son to write it in the first place, and there­fore less rea­son to ed­u­cate peo­ple about the prob­lem space.

Even now there’s a move­ment to­ward putting doc­u­men­ta­tion in an llms.txt file, so you can just point an agent at it and save your brain cells the ef­fort of de­ci­pher­ing English prose. (Is this even doc­u­men­ta­tion any­more? What is doc­u­men­ta­tion?)

I still be­lieve in open source, and I’m still do­ing it (in fits and starts). But one thing has be­come clear to me: the era of small, low-value li­braries like blob-util is over. They were al­ready on their way out thanks to Node.js and the browser tak­ing on more and more of their func­tion­al­ity (see node:glob, struc­tured­Clone, etc.), but LLMs are the fi­nal nail in the cof­fin.

This does mean that there’s less op­por­tu­nity to use these li­braries as a spring­board for user ed­u­ca­tion (Underscore.js also had this phi­los­o­phy), but maybe that’s okay. If there’s no need to find a li­brary to, say, group the items in an ar­ray, then maybe learn­ing about the me­chan­ics of such li­braries is un­nec­es­sary. Many soft­ware de­vel­op­ers will ar­gue that ask­ing a can­di­date to re­verse a bi­nary tree is point­less, since it never comes up in the day-to-day job, so maybe the same can be said for util­ity li­braries.

I'm still trying to figure out what kinds of open source are worth writing in this new era (hint: ones that an LLM can't just spit out on command), and where education is the most lacking. My current thinking is that the most value is in bigger projects, more inventive projects, or in more niche topics not covered in an LLM's training data. For example, I look back on my work on fuite and various memory-leak-hunting blog posts, and I'm pretty satisfied that an LLM couldn't reproduce this, because it requires novel research and creative techniques. (Although who knows: maybe someday an agent will be able to just bang its head against Chrome heap snapshots until it finds the leak. I'll believe it when I see it.)

There’s been a lot of hand-wring­ing lately about where open source fits in in a world of LLMs, but I still see peo­ple push­ing the bound­aries. For ex­am­ple, a lot of naysay­ers think there’s no point in writ­ing a new JavaScript frame­work, since LLMs are so heav­ily trained on React, but then there goes the in­de­fati­ga­ble Dominic Gannaway writ­ing Ripple.js, yet an­other JavaScript frame­work (and with some new ideas, to boot!). This is the kind of thing I like to see: hu­mans laugh­ing in the face of the ma­chine, go­ing on with their hu­man thing.

So if there’s a con­clu­sion to this me­an­der­ing blog post (excuse my squishy hu­man brain; I did­n’t use an LLM to write this), it’s just that: yes, LLMs have made some kinds of open source ob­so­lete, but there’s still plenty of open source left to write. I’m ex­cited to see what kinds of novel and un­ex­pected things you all come up with.

...

Read the original on nolanlawson.com »

6 241 shares, 18 trendiness

Building a Simple Search Engine That Actually Works

Look, I know what you’re think­ing. Why not just use Elasticsearch?” or What about Algolia?” Those are valid op­tions, but they come with com­plex­ity. You need to learn their APIs, man­age their in­fra­struc­ture, and deal with their quirks.

Sometimes you just want some­thing that:

* Is easy to un­der­stand and de­bug

That’s what I built. A search en­gine that uses your ex­ist­ing data­base, re­spects your cur­rent ar­chi­tec­ture, and gives you full con­trol over how it works.

The con­cept is sim­ple: to­k­enize every­thing, store it, then match to­kens when search­ing.

Indexing: When you add or up­date con­tent, we split it into to­kens (words, pre­fixes, n-grams) and store them with weights

Searching: When some­one searches, we to­k­enize their query the same way, find match­ing to­kens, and score the re­sults

Scoring: We use the stored weights to cal­cu­late rel­e­vance scores

The magic is in the to­k­eniza­tion and weight­ing. Let me show you what I mean.

We need two sim­ple ta­bles: in­dex_­to­kens and in­dex_en­tries.

This table stores all unique to­kens with their to­k­enizer weights. Each to­ken name can have mul­ti­ple records with dif­fer­ent weights—one per to­k­enizer.

// in­dex_­to­kens table struc­ture

id | name | weight

1 | parser | 20 // From WordTokenizer

2 | parser | 5 // From PrefixTokenizer

3 | parser | 1 // From NGramsTokenizer

4 | parser | 10 // From SingularTokenizer

Why store separate tokens per weight? Different tokenizers produce the same token with different weights. For example, "parser" from WordTokenizer has weight 20, but "parser" from PrefixTokenizer has weight 5. We need separate records to properly score matches.

The unique con­straint is on (name, weight), so the same to­ken name can ex­ist mul­ti­ple times with dif­fer­ent weights.

This table links to­kens to doc­u­ments with field-spe­cific weights.

// in­dex_en­tries table struc­ture

id | to­ken_id | doc­u­men­t_­type | field­_id | doc­u­men­t_id | weight

1 | 1 | 1 | 1 | 42 | 2000

2 | 2 | 1 | 1 | 42 | 500

The weight here is the final calculated weight: field_weight × tokenizer_weight × ceil(sqrt(token_length)). This encodes everything we need for scoring. We will talk about scoring later in the post.

Why this struc­ture? Simple, ef­fi­cient, and lever­ages what data­bases do best.

What is tokenization? It's breaking text into searchable pieces. The word "parser" becomes tokens like ["parser"], ["par", "pars", "parse", "parser"], or ["par", "ars", "rse", "ser"] depending on which tokenizer we use.

Why mul­ti­ple to­k­eniz­ers? Different strate­gies for dif­fer­ent match­ing needs. One to­k­enizer for ex­act matches, an­other for par­tial matches, an­other for ty­pos.

interface TokenizerInterface

public function tokenize(string $text): array; // Returns array of Token objects

public function getWeight(): int; // Returns tokenizer weight

This one is straight­for­ward—it splits text into in­di­vid­ual words. parser” be­comes just [“parser”]. Simple, but pow­er­ful for ex­act matches.

First, we nor­mal­ize the text. Lowercase every­thing, re­move spe­cial char­ac­ters, nor­mal­ize white­space:

class WordTokenizer implements TokenizerInterface

public function tokenize(string $text): array

// Normalize: lowercase, remove special chars
$text = mb_strtolower(trim($text));
$text = preg_replace('/[^a-z0-9]/', ' ', $text);
$text = preg_replace('/\s+/', ' ', $text);

Next, we split into words and fil­ter out short ones:

// Split into words, filter short ones
$words = explode(' ', $text);
$words = array_filter($words, fn($w) => mb_strlen($w) >= 2);

Why fil­ter short words? Single-character words are usu­ally too com­mon to be use­ful. a”, I”, x” don’t help with search.

// Return as Token objects with weight
return array_map(
    fn($word) => new Token($word, $this->weight),
    array_unique($words)
);

This generates word prefixes. "parser" becomes ["par", "pars", "parse", "parser"] (with min length 4). This helps with partial matches and autocomplete-like behavior.

First, we ex­tract words (same nor­mal­iza­tion as WordTokenizer):

class PrefixTokenizer implements TokenizerInterface

public function __construct(
    private int $minPrefixLength = 4,
    private int $weight = 5
) {}

public function tokenize(string $text): array

// Normalize same as WordTokenizer
$words = $this->extractWords($text);

Then, for each word, we gen­er­ate pre­fixes from the min­i­mum length to the full word:

$tokens = [];

foreach ($words as $word) {
    $wordLength = mb_strlen($word);

    // Generate prefixes from min length to full word
    for ($i = $this->minPrefixLength; $i <= $wordLength; $i++) {
        $tokens[mb_substr($word, 0, $i)] = true;
    }
}

Why use an as­so­cia­tive ar­ray? It en­sures unique­ness. If parser” ap­pears twice in the text, we only want one parser” to­ken.

Finally, we con­vert the keys to Token ob­jects:

return array_map(
    fn($prefix) => new Token($prefix, $this->weight),
    array_keys($tokens)
);

Why min length? Avoid too many tiny to­kens. Prefixes shorter than 4 char­ac­ters are usu­ally too com­mon to be use­ful.

This creates character sequences of a fixed length (I use 3). "parser" becomes ["par", "ars", "rse", "ser"]. This catches typos and partial word matches.

class NGramsTokenizer implements TokenizerInterface

public function __construct(
    private int $ngramLength = 3,
    private int $weight = 1
) {}

public function tokenize(string $text): array

$words = $this->extractWords($text);

Then, for each word, we slide a win­dow of fixed length across it:

$tokens = [];

foreach ($words as $word) {
    $wordLength = mb_strlen($word);

    // Sliding window of fixed length
    for ($i = 0; $i <= $wordLength - $this->ngramLength; $i++) {
        $tokens[mb_substr($word, $i, $this->ngramLength)] = true;
    }
}

The sliding window: for "parser" with length 3, we get "par", "ars", "rse", and "ser".

Why this works? Even if someone types "parsr" (typo), we still get "par" and "ars" tokens, which match the correctly spelled "parser".

return array_map(
    fn($ngram) => new Token($ngram, $this->weight),
    array_keys($tokens)
);

Why 3? Balance be­tween cov­er­age and noise. Too short and you get too many matches, too long and you miss ty­pos.

All tokenizers do the same normalization: lowercase the text, strip special characters, and normalize whitespace.

This en­sures con­sis­tent match­ing re­gard­less of in­put for­mat.

We have three levels of weights working together:

Field weights: how important each field is (a title can count for more than a body, for example)

Tokenizer weights: Word vs prefix vs n-gram (stored in index_tokens)

Token length: longer tokens are more specific, factored in via ceil(sqrt(token_length))

When in­dex­ing, we cal­cu­late the fi­nal weight like this:

$finalWeight = $fieldWeight * $tokenizerWeight * ceil(sqrt($tokenLength));

Why use ceil(sqrt())? Longer tokens are more specific, but we don't want weights to blow up with very long tokens. "parser" is more specific than "par", but a 100-character token shouldn't have 100x the weight. The square root function gives us diminishing returns—longer tokens still score higher, but not linearly. We use ceil() to round up to the nearest integer, keeping weights as whole numbers. For example, with a hypothetical title-field weight of 10 and the WordTokenizer weight of 20, the token "parser" (length 6) would be stored with weight 10 × 20 × ceil(sqrt(6)) = 10 × 20 × 3 = 600.

You can ad­just weights for your use case:

* Increase field weights for ti­tles if ti­tles are most im­por­tant

* Increase to­k­enizer weights for ex­act matches if you want to pri­or­i­tize ex­act matches

* Adjust the to­ken length func­tion (ceil(sqrt), log, or lin­ear) if you want longer to­kens to mat­ter more or less

You can see ex­actly how weights are cal­cu­lated and ad­just them as needed.

...

Read the original on karboosx.net »

7 231 shares, 8 trendiness

'Is curing patients a sustainable business model?'

Goldman Sachs analysts attempted to address a touchy subject for biotech companies, especially those involved in the pioneering "gene therapy" treatment: cures could be bad for business in the long run.

"Is curing patients a sustainable business model?" analysts ask in an April 10 report entitled "The Genome Revolution."

"The potential to deliver 'one shot cures' is one of the most attractive aspects of gene therapy, genetically-engineered cell therapy and gene editing. However, such treatments offer a very different outlook with regard to recurring revenue versus chronic therapies," analyst Salveen Richter wrote in the note to clients Tuesday. "While this proposition carries tremendous value for patients and society, it could represent a challenge for genome medicine developers looking for sustained cash flow."

Richter cited Gilead Sciences' treatments for hepatitis C, which achieved cure rates of more than 90 percent. The company's U.S. sales for these hepatitis C treatments peaked at $12.5 billion in 2015, but have been falling ever since. Goldman estimates the U.S. sales for these treatments will be less than $4 billion this year, according to a table in the report.

"GILD is a case in point, where the success of its hepatitis C franchise has gradually exhausted the available pool of treatable patients," the analyst wrote. "In the case of infectious diseases such as hepatitis C, curing existing patients also decreases the number of carriers able to transmit the virus to new patients, thus the incident pool also declines … Where an incident pool remains stable (eg, in cancer) the potential for a cure poses less risk to the sustainability of a franchise."

The analyst didn't immediately respond to a request for comment.

The report suggested three potential solutions for biotech firms:

"Solution 1: Address large markets: Hemophilia is a $9-10bn WW market (hemophilia A, B), growing at ~6-7% annually."

"Solution 2: Address disorders with high incidence: Spinal muscular atrophy (SMA) affects the cells (neurons) in the spinal cord, impacting the ability to walk, eat, or breathe."

"Solution 3: Constant innovation and portfolio expansion: There are hundreds of inherited retinal diseases (genetic forms of blindness) … Pace of innovation will also play a role as future programs can offset the declining revenue trajectory of prior assets."

...

Read the original on www.cnbc.com »

8 219 shares, 9 trendiness

What if you don't need MCP at all?

After months of agen­tic cod­ing frenzy, Twitter is still ablaze with dis­cus­sions about MCP servers. I pre­vi­ously did some very light bench­mark­ing to see if Bash tools or MCP servers are bet­ter suited for a spe­cific task. The TL;DR: both can be ef­fi­cient if you take care.

Unfortunately, many of the most pop­u­lar MCP servers are in­ef­fi­cient for a spe­cific task. They need to cover all bases, which means they pro­vide large num­bers of tools with lengthy de­scrip­tions, con­sum­ing sig­nif­i­cant con­text.

It’s also hard to ex­tend an ex­ist­ing MCP server. You could check out the source and mod­ify it, but then you’d have to un­der­stand the code­base, to­gether with your agent.

MCP servers also aren’t com­pos­able. Results re­turned by an MCP server have to go through the agen­t’s con­text to be per­sisted to disk or com­bined with other re­sults.

I’m a sim­ple boy, so I like sim­ple things. Agents can run Bash and write code well. Bash and code are com­pos­able. So what’s sim­pler than hav­ing your agent just in­voke CLI tools and write code? This is noth­ing new. We’ve all been do­ing this since the be­gin­ning. I’d just like to con­vince you that in many sit­u­a­tions, you don’t need or even want an MCP server.

Let me il­lus­trate this with a com­mon MCP server use case: browser dev tools.

My use cases are work­ing on web fron­tends to­gether with my agent, or abus­ing my agent to be­come a scrapey lit­tle hacker boy so I can scrape all the data in the world. For these two use cases, I only need a min­i­mal set of tools:

* Start the browser, op­tion­ally with my de­fault pro­file so I’m logged in

* Navigate to a URL, ei­ther in the ac­tive tab or a new tab

* Evaluate JavaScript in the active tab

* Take a screenshot of the viewport

And if my use case re­quires ad­di­tional spe­cial tool­ing, I want to quickly have my agent gen­er­ate that for me and slot it in with the other tools.

People will rec­om­mend Playwright MCP or Chrome DevTools MCP for the use cases I il­lus­trated above. Both are fine, but they need to cover all the bases. Playwright MCP has 21 tools us­ing 13.7k to­kens (6.8% of Claude’s con­text). Chrome DevTools MCP has 26 tools us­ing 18.0k to­kens (9.0%). That many tools will con­fuse your agent, es­pe­cially when com­bined with other MCP servers and built-in tools.

Using those tools also means you suf­fer from the com­pos­abil­ity is­sue: any out­put has to go through your agen­t’s con­text. You can kind of fix this by us­ing sub-agents, but then you rope in all the is­sues that sub-agents come with.

Here’s my min­i­mal set of tools, il­lus­trated via the README.md:

# Browser Tools

Minimal CDP tools for collaborative site exploration.

## Start Chrome

```bash
./start.js            # Fresh profile
./start.js --profile  # Copy your profile (cookies, logins)
```

Start Chrome on `:9222` with remote debugging.

## Navigate

```bash
./nav.js https://example.com
./nav.js https://example.com --new
```

Navigate current tab or open new tab.

## Evaluate JavaScript

```bash
./eval.js 'document.title'
./eval.js 'document.querySelectorAll("a").length'
```

Execute JavaScript in active tab (async context).

## Screenshot

```bash
./screenshot.js
```

Screenshot current viewport, returns temp file path.

This is all I feed to my agent. It’s a hand­ful of tools that cover all the bases for my use case. Each tool is a sim­ple Node.js script that uses Puppeteer Core. By read­ing that README, the agent knows the avail­able tools, when to use them, and how to use them via Bash.

When I start a ses­sion where the agent needs to in­ter­act with a browser, I just tell it to read that file in full and that’s all it needs to be ef­fec­tive. Let’s walk through their im­ple­men­ta­tions to see how lit­tle code this ac­tu­ally is.

The agent needs to be able to start a new browser ses­sion. For scrap­ing tasks, I of­ten want to use my ac­tual Chrome pro­file so I’m logged in every­where. This script ei­ther rsyncs my Chrome pro­file to a tem­po­rary folder (Chrome does­n’t al­low de­bug­ging on the de­fault pro­file), or starts fresh:

#!/usr/bin/env node
import { spawn, execSync } from "node:child_process";
import puppeteer from "puppeteer-core";

const useProfile = process.argv[2] === "--profile";

if (process.argv[2] && process.argv[2] !== "--profile") {
  console.log("Usage: start.ts [--profile]");
  console.log("\nOptions:");
  console.log("  --profile  Copy your default Chrome profile (cookies, logins)");
  console.log("\nExamples:");
  console.log("  start.ts            # Start with fresh profile");
  console.log("  start.ts --profile  # Start with your Chrome profile");
  process.exit(1);
}

// Kill existing Chrome
try {
  execSync("killall 'Google Chrome'", { stdio: "ignore" });
} catch {}

// Wait a bit for processes to fully die
await new Promise((r) => setTimeout(r, 1000));

// Setup profile directory
execSync("mkdir -p ~/.cache/scraping", { stdio: "ignore" });

if (useProfile) {
  // Sync profile with rsync (much faster on subsequent runs)
  execSync(
    'rsync -a --delete "/Users/badlogic/Library/Application Support/Google/Chrome/" ~/.cache/scraping/',
    { stdio: "pipe" },
  );
}

// Start Chrome in background (detached so Node can exit)
spawn(
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  ["--remote-debugging-port=9222", `--user-data-dir=${process.env["HOME"]}/.cache/scraping`],
  { detached: true, stdio: "ignore" },
).unref();

// Wait for Chrome to be ready by attempting to connect
let connected = false;
for (let i = 0; i < 30; i++) {
  try {
    const browser = await puppeteer.connect({
      browserURL: "http://localhost:9222",
      defaultViewport: null,
    });
    await browser.disconnect();
    connected = true;
    break;
  } catch {
    await new Promise((r) => setTimeout(r, 500));
  }
}

if (!connected) {
  console.error("✗ Failed to connect to Chrome");
  process.exit(1);
}

console.log(`✓ Chrome started on :9222${useProfile ? " with your profile" : ""}`);

All the agent needs to know is to use Bash to run the start.js script, either with --profile or without.

Once the browser is run­ning, the agent needs to nav­i­gate to URLs, ei­ther in a new tab or the ac­tive tab. That’s ex­actly what the nav­i­gate tool pro­vides:

#!/usr/bin/env node
import puppeteer from "puppeteer-core";

const url = process.argv[2];
const newTab = process.argv[3] === "--new";

if (!url) {
  console.log("Usage: nav.js
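
For illustration, a minimal sketch of how such a nav script can be completed with Puppeteer Core, reusing the connect-on-:9222 pattern from start.js (the tab-selection details here are assumptions, not the original code):

#!/usr/bin/env node
import puppeteer from "puppeteer-core";

const url = process.argv[2];
const newTab = process.argv[3] === "--new";

if (!url) {
  console.log("Usage: nav.js <url> [--new]");
  process.exit(1);
}

// Attach to the Chrome instance started by start.js
const browser = await puppeteer.connect({
  browserURL: "http://localhost:9222",
  defaultViewport: null,
});

// Use the most recently opened tab, or open a new one with --new
const pages = await browser.pages();
const page = newTab || pages.length === 0 ? await browser.newPage() : pages[pages.length - 1];

await page.bringToFront();
await page.goto(url, { waitUntil: "domcontentloaded" });
console.log(`✓ ${newTab ? "Opened new tab at" : "Navigated to"} ${url}`);

await browser.disconnect();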

The agent needs to ex­e­cute JavaScript to read and mod­ify the DOM of the ac­tive tab. The JavaScript it writes runs in the page con­text, so it does­n’t have to fuck around with Puppeteer it­self. All it needs to know is how to write code us­ing the DOM API, and it sure knows how to do that:

#!/usr/bin/env node
import puppeteer from "puppeteer-core";

const code = process.argv.slice(2).join(" ");

if (!code) {
  console.log("Usage: eval.js 'code'");
  console.log("\nExamples:");
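
Again for illustration, a minimal sketch of how the rest of such an eval script can look; the async wrapper and the way the active tab is picked are assumptions rather than the original implementation:

#!/usr/bin/env node
import puppeteer from "puppeteer-core";

const code = process.argv.slice(2).join(" ");

if (!code) {
  console.log("Usage: eval.js 'code'");
  console.log("\nExamples:");
  console.log("  eval.js 'document.title'");
  console.log('  eval.js \'document.querySelectorAll("a").length\'');
  process.exit(1);
}

// Attach to the running Chrome and grab the most recent tab
const browser = await puppeteer.connect({
  browserURL: "http://localhost:9222",
  defaultViewport: null,
});
const pages = await browser.pages();
const page = pages[pages.length - 1];

// Run the snippet in the page context, wrapped so `await` works inside it
const result = await page.evaluate(`(async () => (${code}))()`);
console.log(JSON.stringify(result, null, 2));

await browser.disconnect();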

...

Read the original on mariozechner.at »

9 211 shares, 17 trendiness

Where Do the Children Play?

I don’t re­mem­ber the first time I held a ma­chete, be­cause I’ve never held one. Most mem­bers of the BaYaka — a group of no­madic hunter-gath­er­ers found in the Congolese rain­forests — prob­a­bly don’t re­mem­ber ei­ther, but for dif­fer­ent rea­sons. Early mem­ory has its lim­its. Among the BaYaka, pick­ing up a ma­chete is de­vel­op­men­tally akin to lan­guage, walk­ing, and chew­ing solid food.

So goes a BaYaka child­hood. Children wan­der the forests in packs. They climb saplings and bathe in the rivers. They con­duct day-long fish­ing trips: their par­ents glance to­ward them as they or­ga­nize them­selves, then let the kids go on their way.

A few months ago, an an­thro­pol­o­gist named Gül Deniz Salalı doc­u­mented these dy­nam­ics in a bril­liant doc­u­men­tary (linked be­low), which I can­not rec­om­mend highly enough. To our eyes, their lives are ut­terly strange. We should­n’t for­get how lucky we are to live in a time where we can see such won­ders from the com­fort of a chair.

But the BaYaka child­hood is­n’t a nov­elty. As I’ll dis­cuss shortly, it is prob­a­bly the norm for our species. And that means some­thing has gone ter­ri­bly wrong in the West.

Consider some sta­tis­tics on the American child­hood, drawn from chil­dren aged 8-12:

* 45% have not walked in a dif­fer­ent aisle than their par­ents at a store;

* 56% have not talked with a neigh­bor with­out their par­ents;

* 61% have not made plans with friends with­out adults help­ing them;

* 62% have not walked/​biked some­where (a store, park, school) with­out an adult;

* 63% have not built a struc­ture out­side (for ex­am­ple, a fort or tree­house);

* 71% have not used a sharp knife;

Meanwhile, 31% of 8-12 year olds have spo­ken with large lan­guage mod­els. 23% have talked to strangers on­line, while only 44% have phys­i­cally spo­ken to a neigh­bor with­out their par­ents. 50% have seen pornog­ra­phy by the time they turn 13.

In phys­i­cal space, Western chil­dren are al­most com­i­cally shel­tered. But in dig­i­tal space, they’re en­tirely be­yond our com­mand; and in­creas­ingly, that’s where chil­dren spend most of their time. You don’t need me to tell you about the dire con­se­quences of that shift.

Why do our chil­dren spend more time in Fortnite than forests? Usually, we blame the change on tech com­pa­nies. They make their plat­forms as ad­dict­ing as pos­si­ble, and the youth sim­ply can’t re­sist — once a tod­dler locks eyes with an iPad, game over.

I want to sug­gest an al­ter­na­tive: dig­i­tal space is the only place left where chil­dren can grow up with­out us. For most of our evo­lu­tion­ary his­tory, child­hood was­n’t an adult af­fair. Independent worlds and peer cul­tures were the crux of de­vel­op­ment, as they still are among the BaYaka; kids spent their time to­gether, largely be­yond the pry­ing eyes of grown-ups.

But in the West, the grown-ups have paved over the forests and creeks where chil­dren would have once hid­den. They have ex­posed the se­cret places. So the chil­dren seek out a world of their own, as they have for mil­len­nia, if not longer. They find a prover­bial for­est to wan­der. They don’t know what we know: this for­est has eyes and teeth.

In most hu­man so­ci­eties, chil­dren have spent much of their time ex­plor­ing and play­ing within in­de­pen­dent peer cul­tures. This term re­flects two im­por­tant fea­tures of hu­man child­hood. First, the groups con­sist en­tirely of other chil­dren. Second, they are func­tion­ally and cul­tur­ally dis­tinct from adult so­ci­ety; they ex­ist along­side but apart from the world of adults.

The evidence for this pattern is rich and widespread. During his research among the Trobriand Islanders, for instance, Bronislaw Malinowski described the “small republic” of children that “acts very much as its own members determine, standing often in a sort of collective opposition to its elders.”

Margaret Mead re­ported sim­i­lar pat­terns in Samoa. One group of young girls formed a co­hort which played con­tin­u­ally to­gether and main­tained a fairly co­her­ent hos­til­ity to­wards out­siders.” Of their ac­tiv­i­ties, she writes:

On moon­light nights they scoured the vil­lages al­ter­nately at­tack­ing or flee­ing from the gangs of small boys, peek­ing through drawn shut­ters, catch­ing land crabs, am­bush­ing wan­der­ing lovers, or sneak­ing up to watch a birth or a mis­car­riage in some dis­tant house .  . . They were ver­i­ta­ble groups of lit­tle out­laws es­cap­ing from the ex­ac­tions of rou­tine tasks. (Coming of Age in Samoa, p. 62).

These peer cul­tures don’t like to be seen — many ethno­g­ra­phers have no­ticed how play­groups pre­fer to seg­re­gate them­selves from adults. Among the Mbuti, an­other Central African for­ag­ing group, Colin Turnbull ob­served chil­dren spend­ing most of their time in a bopi, a play­ground set away from the main camp:

The water was fairly shallow there, and all day long the children splashed and wallowed about to their heart’s content … Infants watched with envy as the older children swung wildly about, climbing high up on the vine strands and performing all sorts of acrobatics. (The Forest People, p.128).

Even in in­dus­trial so­ci­eties, this love of soli­tude shows up. In the 1950s, Iona and Peter Opie doc­u­mented the be­hav­iors of post-war British chil­dren. They no­ticed that the chil­dren liked to roam through bomb sites, where they would build fires and play hide-and-seek.

It is pos­si­ble to see these in­ter­ests play­ing out in an­cient so­ci­eties, too. A pa­per by Ella Assaf and col­leagues doc­u­mented hand­prints, foot­prints and paint­ings left in Paleolithic caves, all prob­a­bly pro­duced by chil­dren. Through in­volve­ment in rit­ual ac­tiv­i­ties, they pro­pose, Upper Paleolithic chil­dren were able to ac­tively shape their own re­al­ity as in­di­vid­u­als, as well as the re­al­ity of their com­mu­nity and its well-be­ing.”

Why do in­de­pen­dent peer cul­tures emerge so re­li­ably in our species? Anthropologists tend to view child­hood as a pe­riod dur­ing which we over im­i­tate and copy our way to cul­tural com­pe­tency. But if that’s true, it seems hard to ac­count for the fact that kids re­ally, re­ally want to get away from adults. They would rather spend time with each other. If child­hood is all about ef­fi­cient cul­tural trans­mis­sion, chil­dren should be hang­ing on our every word.

As of yet, I don’t think there’s a good ex­pla­na­tion. Maybe in­de­pen­dent peer cul­tures ex­pose chil­dren to a wider va­ri­ety of in­for­ma­tion. Maybe they pro­vide safe spaces for the mim­icry of adult ac­tiv­i­ties. Maybe adults just aren’t very much fun. As Antoine de Saint-Exupéry writes in The Little Prince: All grown-ups were chil­dren once, but only a few of them re­mem­ber it.”

All of these ac­counts are prob­a­bly a lit­tle bit true. The im­por­tant point is that kids want to spend time to­gether, in their own space, away from the tire­some grown-ups.

Which is rel­a­tively easy, if you’re liv­ing in a gi­ant for­est with par­ents who don’t pay much at­ten­tion to your ac­tiv­i­ties. But our chil­dren now find them­selves in a strange sit­u­a­tion — they have nowhere to hide, and even if they did, we might not let them go there in the first place.

Over the past few decades, child­hood mo­bil­ity in the West has dropped pre­cip­i­tously. You might think that the change has some­thing to do with the emer­gence of the Internet. But lon­gi­tu­di­nal data sug­gests oth­er­wise. Check out these sta­tis­tics from a decades-long sur­vey on the mo­bil­ity of English chil­dren:

At least in England, in­de­pen­dence is­n’t an Internet prob­lem. Drop-offs in English child­hood mo­bil­ity have been on­go­ing since the 1970s. In 1971, 80% of seven and eight-year-old chil­dren went to school un­ac­com­pa­nied by an adult, and 55% of chil­dren un­der ten-years-old were al­lowed to travel alone to places other than school within walk­ing dis­tance. By 1990, those num­bers had dropped to 9% and 30%, re­spec­tively.

We see sim­i­lar pat­terns in other places. Here’s data from Sweden, for in­stance.

In the United States, sim­i­larly, there was a drop-off from 42% (1969) to 16% (2001) in the num­ber of chil­dren who walked or biked to school alone.

So it does­n’t seem like the col­lapse of in­de­pen­dent mo­bil­ity is a phone is­sue. The truth is much more com­pli­cated.

One el­e­ment is parental at­ti­tudes: ac­cord­ing to re­sponses from a sur­vey by Play England, many par­ents fear stranger dan­ger” or judge­ment from neigh­bors if they let their kids play un­su­per­vised out­doors.

Adult em­ploy­ment pat­terns and lifestyle changes have also been slowly trend­ing to­ward car-de­pen­dency, which means that kids of­ten end up liv­ing far away from their friends. If chil­dren want peo­ple to play with, the most ef­fi­cient so­lu­tion is for their par­ents to drive them to an or­ga­nized sport or other struc­tured ac­tiv­ity.

In the Play England sur­vey, though, par­ents were most afraid that their kids would get hit by a car. Sadly, this is­n’t an un­rea­son­able fear. All the forests are cov­ered in con­crete. What would we make of a city-bound par­ent who let their tod­dler roam the streets with­out an adult nearby?

Unsurprisingly, these changes have been very bad for chil­dren. In 2013, UNICEF tracked child­hood well-be­ing against in­de­pen­dent mo­bil­ity, and their re­sults re­veal a stark cor­re­la­tion.

A mix of shift­ing parental at­ti­tudes, car-de­pen­dency, and ur­ban­iza­tion have led to an un­prece­dented sit­u­a­tion in the his­tory of hu­man child­hood. Children don’t have peer cul­tures, be­cause they have nowhere to play and no one to play with. Even if those fac­tors were ame­lio­rated, we might not let them out any­way. Too bad for them — they can’t hide from us any­more.

Or can they?

The kids won’t get off their damn phones. Children aged 6-14 av­er­age nearly three hours per day of screen time, not count­ing school­work. 50% of teenagers av­er­age over four hours of screen time daily. And so on — you know the sta­tis­tics.

Amidst all the horror stories, though, people often ignore the fact that the kids don’t like it either. For example: most teens, especially girls, say they spend too much time on their smartphones. 72% of 8 to 12-year-olds say they would rather spend “most of their time together doing things in-person, without screens.” The kids are not alright, but they aren’t dumb — they understand that something is wrong with the technological world we’ve handed them.

So why don’t they just stop?

Usually, we blame the cor­po­ra­tions. And there are lots of fin­gers to point. It is now well-known, I think, that many plat­forms de­sign their ap­pli­ca­tions like slot ma­chines to max­i­mize our at­ten­tion, and that they pur­pose­fully aim for the more mal­leable minds of chil­dren.

But data on the va­ri­eties of screen us­age sug­gests an­other ex­pla­na­tion. Social me­dia and gam­ing are now ar­guably the two most com­mon dig­i­tal ac­tiv­i­ties for chil­dren: no­tably, and un­like tele­vi­sion, both al­low chil­dren to en­gage with dis­tinct vir­tual com­mu­ni­ties that adults don’t no­tice or un­der­stand.

Tellingly, kids re­ally want those com­mu­ni­ties to ex­ist. 45% of American 8 to 12-year-olds say they would pre­fer to participate in an ac­tiv­ity with their friends in per­son that’s not or­ga­nized by adults.” 61% wish they had more time to play with friends in per­son with­out adults.” And again, most of them wish they could spend less time on screens. It seems like what they want is to wan­der to­gether in a for­est.

But they can’t. So they boot up Fortnite or TikTok in­stead.

By retreating to digital space, children have found an open frontier that lies beyond the interest or comprehension of their parents. We don’t know how to play Roblox, or what “6-7” means. These worlds are so impenetrable that The Guardian writes explanatory articles to keep us at least slightly up-to-date.

But the kids know, and they know that the other kids know, and they know that we don’t know shit. Children have al­ways looked for a world with­out us. With the ad­vent of the Internet, they have found one ly­ing in the rub­ble.

Of course, the prob­lem is that so­cial me­dia plat­forms and video games are not Congolese rain­forests — though at least there are no leop­ards. These dig­i­tal spaces make our chil­dren de­pressed and anx­ious and in­se­cure. They ex­pose them to pornog­ra­phy. They drive them into fright­en­ing po­lit­i­cal rab­bit holes. The chil­dren need some­where to play, but this can’t pos­si­bly be the right place for it.

And yet, it’s hard to pic­ture an al­ter­na­tive. Parents should worry less about stranger dan­ger: fine. But every­thing is still cov­ered in con­crete, and every­one is still mov­ing to cities. Do we want to tear down the sky­scrap­ers and un­pave the roads? Should we de­mand a for­est for our chil­dren?

The truth is that we’re not go­ing back. My chil­dren will not grow up like the BaYaka, and there is noth­ing I can do about it.

But that does­n’t change the fact that kids need their in­de­pen­dent peer cul­tures. If we can’t pro­vide phys­i­cal spaces for them to form, then we must ac­cept that they will of­ten form in dig­i­tal spaces in­stead. So if we’re un­happy with the dig­i­tal spaces on of­fer — if we think there are too many fig­u­ra­tive leop­ards in those forests — then we should make some­thing bet­ter.

What does something bet­ter’ look like? Well, prob­a­bly a plat­form that pre­serves the as­pects of dig­i­tal space that kids find free­ing, but with­out the as­pects that make those spaces dan­ger­ous and ad­dic­tive.

Games like Roblox pre­sent some in­ter­est­ing ideas. Children ab­solutely love Roblox; ac­cord­ing to the game’s par­ent cor­po­ra­tion, their monthly player base in­cludes half of all American chil­dren un­der the age of 16. They love it be­cause it’s mul­ti­player, ex­ploratory, and ex­tremely open-ended. Independent cul­tures and sys­tems of gov­er­nance emerge in the course of game­play. They’re suf­fi­ciently com­plex that some of my col­leagues are now us­ing Roblox as a play­ground for study­ing col­lec­tive be­hav­ior in hu­mans.

But there are adults in Roblox, hid­den be­hind avatars. Their pres­ence en­sures that kids will end up see­ing dis­turb­ing con­tent that they don’t have the tools to un­der­stand. Roblox is also highly Vegasified, fea­tur­ing all the usual ex­ploita­tive tac­tics of loot boxes, sea­son up­grades and cos­metic passes. So it is nowhere near the ideal; all things con­sid­ered, chil­dren would prob­a­bly be bet­ter off if we banned them from play­ing.

Still, I can pic­ture some­thing like Roblox that might also sus­tain peer cul­tures. When I re­mem­ber the hap­pi­est parts of my child­hood, every­thing cen­ters on se­cret places: pil­low forts, hid­den cor­ners in parks, soc­cer fields af­ter dark. But I also re­mem­ber Minecraft. My friends and I would set up a Skype call, start a world and spend hours ex­plor­ing and build­ing.

When we spent time to­gether in phys­i­cal space, we were nearly al­ways be­ing su­per­vised. Not so in Minecraft. I spent thou­sands of hours mas­ter­ing the rules of that pro­ce­du­rally gen­er­ated world; my par­ents did­n’t even know what procedurally gen­er­at­ed’ meant. So, dystopian as it may sound, Minecraft servers were the clos­est thing I had to sprawl­ing Congolese rain­forests.

Perhaps that is­n’t so bad. I wish the chil­dren of to­day had a for­est. But they don’t. They’re mak­ing do with what his­tory has handed them.

We can com­plain about their screen time, lament the anx­ious gen­er­a­tion, scoff at how unnatural’ this brave new world has be­come. Simultaneously, though, we should do our best to un­der­stand why kids are be­hav­ing this way. There’s no point in whin­ing about the im­pulses en­dowed to them by sev­eral hun­dred thou­sand years of evo­lu­tion. Don’t hate the player; hate the game. And if you re­ally hate the game, make a bet­ter one.

...

Read the original on unpublishablepapers.substack.com »

10 198 shares, 8 trendiness

20th Anniversary Edition

The Pragmatic Programmer: From Journeyman to Master by Dave Thomas and Andrew Hunt was given to me as a gift af­ter an in­tern­ship. The book gave me in­valu­able ad­vice as I started out in my ca­reer as a pro­fes­sional soft­ware en­gi­neer. Re-reading it a decade later, I thought the gen­eral ad­vice still held up well, but it made ref­er­ences to tech­nolo­gies such as CORBA that are no longer used and felt dated as a re­sult. The au­thors agreed and wrote a 20th an­niver­sary edi­tion that was up­dated for mod­ern de­vel­op­ers. A third of the book is brand-new ma­te­r­ial, cov­er­ing sub­jects such as se­cu­rity and con­cur­rency. The rest of the book has been ex­ten­sively rewrit­ten based on the au­thors’ ex­pe­ri­ence putting these prin­ci­ples into prac­tice. We dis­cussed the 20th an­niver­sary edi­tion in my book club at work.

The book is meant for those just start­ing out in the world of pro­fes­sional soft­ware en­gi­neer­ing. Many of the tips, such as Tip 28: Always Use Version Control will seem ob­vi­ous to ex­pe­ri­enced hands. However, it can also be a guide for se­nior de­vel­op­ers men­tor­ing ju­nior de­vel­op­ers, putting ac­tion­able ad­vice into words. The book is also valu­able to those who lack a for­mal CS ed­u­ca­tion; it ex­plains things like big-O no­ta­tion and where to learn more about these sub­jects. I think that any soft­ware en­gi­neer will get one or two things out of this book, though it’s most valu­able for be­gin­ners.

One of the things I ap­pre­ci­ate about the book is that they talk about ap­ply­ing the prin­ci­ples not only to soft­ware en­gi­neer­ing but to writ­ing the book as well. The book was orig­i­nally writ­ten in troff and later con­verted to LaTeX. For ex­am­ple, to il­lus­trate Tip 29: Write Code That Writes Code they wrote a pro­gram to con­vert troff markup to LaTeX. In the 20th an­niver­sary edi­tion, they talk about their ef­forts to use par­al­lelism to speed up the book build process and how it led to sur­pris­ing bugs.

Perhaps the best thing about the book is that the au­thors sum­ma­rize their points into short tips high­lighted through­out the book. The au­thors help­fully at­tach these tips to a card at­tached to the phys­i­cal book. This makes it easy to re­mem­ber the prin­ci­ples es­poused in the book and to re­fer to them later. I think this is a fea­ture that more books should in­clude, es­pe­cially man­age­r­ial or tech­ni­cal books.

The first chap­ter is less about cod­ing and more about the gen­eral prin­ci­ples a prag­matic pro­gram­mer fol­lows. Most of all, it’s about tak­ing re­spon­si­bil­ity for your work. The first tip of the chap­ter is Tip 3: You Have Agency: if you don’t like some­thing, you can be a cat­a­lyst for change. Or you can change or­ga­ni­za­tions if change is­n’t hap­pen­ing. The most im­por­tant tip of the chap­ter to me is Tip 4: Provide Options, Don’t Make Lame Excuses. In this sec­tion, they dis­cuss tak­ing re­spon­si­bil­ity for the com­mit­ments you make and hav­ing a con­tin­gency plan for things out­side your con­trol. If you don’t meet the com­mit­ment, pro­vide so­lu­tions to fix the prob­lems. Don’t tell your boss, The cat ate my source code.”

Software rots over time with­out ef­forts to fix it. The au­thors talk about bro­ken win­dows polic­ing, the the­ory that mi­nor prob­lems such as a sin­gle bro­ken win­dow give peo­ple the psy­cho­log­i­cal safety to com­mit larger crimes. Regardless of whether bro­ken win­dows polic­ing is ac­tu­ally true, the metaphor ap­plies to soft­ware. This leads to Tip 5: Don’t Live with Broken Windows: If you see a bro­ken win­dow in your soft­ware, make an ef­fort to fix it, even if it’s only a mi­nor ef­fort to board it up. This may seem im­prac­ti­cal if your pro­ject al­ready has a lot of bro­ken win­dows, but this tip helps you avoid cre­at­ing such an en­vi­ron­ment in the first place. In my ex­pe­ri­ence, it works: when we set up a new pro­ject at work, we made a com­mit­ment to use git com­mit hooks to en­force cod­ing stan­dards. This made each of us more re­luc­tant to com­pro­mise on soft­ware to be­gin with, and all of the code was a good ex­am­ple to copy from.

A pragmatic programmer is always learning, and learns things outside their specialty; they are a jack of all trades. Even if they are a specialist in their current role, they invest regularly in a broad knowledge portfolio. In addition to software skills, people skills are important as well. The section “Communicate!” shows how to effectively communicate your ideas, such as how to present, what to say, and how to pick the right time. In the words of Tip 11: English is Just Another Programming Language. If you don’t have an answer to an email immediately, respond with an acknowledgment and say that you’ll get back to them later - nobody wants to be talking to a void. Don’t be afraid to reach out for help if you need it; that’s what your colleagues are there for, after all. And don’t neglect documentation! Make it an integral part of the development process, not an afterthought.

Finally, the prin­ci­ples in this book are not iron-clad: you must con­sider the trade­offs be­tween dif­fer­ent val­ues and make the right de­ci­sion for your pro­ject. Your soft­ware does not need to be per­fect. When work­ing on soft­ware, in­volve your users in de­cid­ing what qual­ity is­sues are ac­cept­able in re­turn for get­ting it out faster. After all, if you wait a year to ship the per­fect ver­sion, their re­quire­ments will change any­ways. As Tip 8 says: Make Quality a Requirements Issue.

The next chapter centers on the authors’ core design value, ETC: good design is Easier To Change than bad design. They use it as a litmus test for most design decisions:

Why is decoupling good? Because by isolating concerns we make each easier to change. ETC.

Why is the sin­gle re­spon­si­bil­ity prin­ci­ple use­ful? Because a change in re­quire­ments is mir­rored by a change in just one mod­ule. ETC.

Why is nam­ing im­por­tant? Because good names make code eas­ier to read, and you have to read it to change it. ETC!

However, the au­thors also stress that ETC is a value, not a rule. For ex­am­ple, ETC may not be ap­pro­pri­ate for writ­ing code that has high per­for­mance re­quire­ments; mak­ing the code com­plex to achieve the per­for­mance re­quire­ments is an ac­cept­able trade­off.

They then turn to an­other im­por­tant acronym for im­ple­ment­ing ETC in Tip 15: DRY—Don’t Repeat Yourself. DRY makes things eas­ier to change by hav­ing one place to change any­thing. Worse, if you for­get to make a change, you’ll have con­tra­dic­tory in­for­ma­tion in your pro­gram that could crash it or silently cor­rupt data.

What kind of du­pli­ca­tion is there?

Code Duplication: For ex­am­ple, hav­ing a case state­ment du­pli­cated across sev­eral dif­fer­ent places rather than in a sin­gle func­tion.

Documentation Duplication: Some peo­ple be­lieve that every func­tion needs a com­ment. If you do this, you will also have to up­date the com­ments each time the func­tion changes. Ask what your com­ment adds to the code be­fore writ­ing it!

Data Duplication: Caching an ex­pen­sive re­sult and for­get­ting to up­date the cache when the source data changes.

Representational Duplication: When you work with an external API, the client and server must adhere to the same format in order to work; if one changes, the other side will break. Having a common specification, such as OpenAPI, allows you to integrate more reliably with the service.

Interdeveloper du­pli­ca­tion: When two de­vel­op­ers do the same work. This can be mit­i­gated by Tip 16: Make It Easy to Reuse. If it’s hard to use your code, other de­vel­op­ers will be tempted to du­pli­cate it.

A closely related principle to DRY is Orthogonality. Two components of a software system are orthogonal if changes in one do not affect the other. Systems should be designed as a set of cooperating independent modules, each of which has a single, well-defined purpose. Modules communicate between themselves using well-defined interfaces and don’t rely on shared global data or the implementation details of another module. Unless you change a component’s external interfaces, changing it should not cause changes in the rest of the system. Orthogonal systems are easier to test, because more testing can be done at the module level in unit tests rather than in end-to-end integration tests that exercise the whole system.

Often, when start­ing a soft­ware pro­ject, there are a lot of un­knowns. The user has an idea of what they want, but there’s some am­bi­gu­ity in the re­quire­ments. You don’t know if the li­brary and frame­works you pick will work nicely to­gether. The so­lu­tion here is Tip 20: Use Tracer Bullets to Find the Target. In a ma­chine gun, tracer bul­lets are bul­lets that glow in the air, en­abling the user to see if they’re hit­ting the tar­get at night. Tracer Bullet Development pro­vides that kind of im­me­di­ate feed­back. Look for a sin­gle fea­ture that can be built quickly us­ing the ar­chi­tec­tural ap­proach you’ve cho­sen, and put that in front of the users. You may miss; users may say that’s not quite what they wanted. But that’s the point of tracer code: it al­lows you to ad­just your aim with a skele­ton pro­ject that’s eas­ier to change than a fi­nal ap­pli­ca­tion. Users will be de­lighted to see some­thing work­ing early, and you’ll have an in­te­gra­tion plat­form to build the rest of the ap­pli­ca­tion on.

Tracer code is different from prototypes. To the authors, prototypes are disposable code used to learn about a problem domain, never meant to be used in production. Prototypes don’t even have to be code: a UI can be mocked up in an interface builder, or an architecture mapped out with post-it notes. In the words of Tip 21: Prototype to Learn. In contrast, tracer bullet code is meant to be part of the final application.

The fi­nal tip of this chap­ter I bring up is Tip 18: There Are No Final Decisions. Decisions should be re­versible; if you rely on MySQL to­day, you may find your­self need­ing to switch to Postgres six months from now. If you’ve prop­erly ab­stracted the data­base logic, mak­ing this change should be easy. Marketing may de­cide that your web app should be a mo­bile app in the fu­ture; if your ar­chi­tec­ture is built well, this ex­tra de­mand should not be a bur­den. This is one tip I dis­agree with: I think it can eas­ily be taken too far. If you pro­vide too much re­versibil­ity, you’ll end up with over-ab­stracted code with con­fig­u­ra­tion op­tions that are never used. I think it’s more rea­son­able to think about what de­ci­sions can rea­son­ably change and make them flex­i­ble; if you spend all your time try­ing to cover for every pos­si­bil­ity, you’ll never get around to ac­tu­ally cod­ing the re­quired func­tion­al­ity.

This chapter focuses on how to make the most out of your tools, what tools to invest in, and how to approach debugging. The first bit of advice: Tip 25: Keep Knowledge in Plain Text. By plain text, they mean keeping knowledge such as configuration or data in a simultaneously human-readable and computer-readable format. Plain text insures you against obsolescence; you can always write something to parse it later, while reverse-engineering a binary format is significantly harder. In addition, almost any other tool in existence can process plain text in some way, so you’ll have an extensive suite of other tools to use. As an extension of the power of plain text, they also suggest you master a command shell such as bash. Shells provide a family of tools that are composable with each other, and can be combined as much as your imagination allows. A GUI, in contrast, limits you to the actions the programmers of the GUI thought of in advance. Finally, you should learn a text-processing language such as awk or perl to get the most out of text - the authors used perl (first edition) and ruby (20th anniversary edition) to automatically highlight the source code in the book, for example.

The next topic the au­thors turn to is de­bug­ging. Debugging is the main task a soft­ware en­gi­neer does through­out their day, so it’s es­sen­tial you get good at it. Defects show up in a va­ri­ety of ways, from mis­un­der­stood re­quire­ments to cod­ing er­rors. Some cul­tures try to find some­one to blame for a de­fect; the au­thors think you should avoid that with Tip 29: Fix the Problem, Not the Blame.

They give the fol­low­ing tips on de­bug­ging your code:

Tip 30: Don’t Panic: It’s easy to panic when you’re on a tight dead­line or a client is an­gry at you. However, take a deep breath and think about the prob­lem at hand. The cause of the bug may be sev­eral lay­ers re­moved from what you’re see­ing, so try to fo­cus on root causes rather than fix­ing the symp­toms.

The Impossible has Happened: If you think to your­self that’s not pos­si­ble” - you’re wrong. It’s clearly pos­si­ble, and it’s star­ing you in the face.

Reproduce It!: Find a min­i­mal case that trig­gers the bug, whether that be a cer­tain in­put data set, or pat­tern of ac­tions. Once you can re­li­ably cause the bug, you can trace it through your code.

Tip 32: Read the Damn Error Message: Enough said.

The Operating System is Fine: It’s pos­si­ble that you found a bug in the Linux ker­nel or post­gres, but these are ex­ten­sively bat­tle-tested ap­pli­ca­tions. It’s much more likely that the prob­lem is in your code.

The Binary Chop: Cut things in half un­til you find the prob­lem. This mas­sively de­creases the search space you have to work in. If you have a long stack trace and are try­ing to find which func­tion man­gled the value, log the value halfway through. If the value is fine, log the value halfway through the next half, or if it’s man­gled, halfway through the pre­vi­ous half, and so on. If a re­lease in­tro­duces a re­gres­sion, find a ver­sion that’s fine, and bi­nary chop through the com­mits to find the com­mit that in­tro­duced the bug.

Use a Debugger and/​or Logging Statements: Debuggers al­low you to step through the code and in­spect the val­ues of vari­ables, find­ing the ex­act point where things go wrong. In en­vi­ron­ments where a de­bug­ger is not avail­able, log­ging state­ments can show you how a vari­able changes in time, or just how far the pro­gram got be­fore crash­ing.

Rubber Ducking: Explain the bug to a col­league, or talk out loud to a rub­ber duck. You don’t have to get a re­sponse, by ver­bal­iz­ing your as­sump­tions you may gain sud­den in­sight into the prob­lem.

Once you’ve solved the bug, how­ever there’s still one more step: you should write a test to catch that bug in the fu­ture.

Tip 36: You Can’t Write Perfect Software starts off the chapter. While we’d like to write perfect software, there will always be bugs, poor design decisions, and missing documentation. The theme of this chapter is how to design with this fact in mind.

The first idea they pro­pose is Design By Contract. Similar to le­gal con­tracts, it ex­plains a func­tion or mod­ule’s rights and re­spon­si­bil­i­ties. A con­tract has three parts: It has Preconditions: things that must be true when it is called, such as what qual­i­fies as valid in­puts. Postconditions are what will be true when it is done, such as a sort rou­tine re­turn­ing a sorted ar­ray. Finally, Invariants are things that are al­ways true from the caller’s per­spec­tive - they may change while the rou­tine is run­ning, but will hold at the be­gin­ning and the end of the call. For ex­am­ple, in a sort rou­tine, the in­vari­ant is that the list to be sorted will con­tain the same num­ber of items when it started as when it fin­ished. If the con­tract is vi­o­lated, the con­tract will spec­ify what to do, such as crash or throw an ex­cep­tion.

Some lan­guages, such as Clojure have built-in se­man­tics for de­sign by con­tract, with ex­plicit pre- and post- con­di­tions. However, if your lan­guage does­n’t sup­port con­tracts, you can im­ple­ment them with Tip 39: Use Assertions to Prevent the Impossible. You can as­sert that the con­di­tions of your con­tract are true, and han­dle the cases where the con­tract is vi­o­lated. If you don’t know what to do when a con­tract is vi­o­lated, the au­thors rec­om­mend Tip 38: Crash Early. It’s bet­ter that you crash rather than write in­cor­rect data to the data­base. After all, dead pro­grams tell no lies. Of course, crash­ing im­me­di­ately may not be ap­pro­pri­ate - if you have re­sources open make sure to close them be­fore ex­it­ing.
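
As a rough illustration of contract-style assertions (a JavaScript sketch, not an example from the book), using the sort routine described above:

import assert from "node:assert";

// Contract for a sort routine: non-empty numeric input; sorted output with the same length.
function sortNumbers(values) {
  assert(Array.isArray(values) && values.length > 0, "precondition: non-empty array");
  assert(values.every((v) => typeof v === "number"), "precondition: numbers only");

  const sorted = [...values].sort((a, b) => a - b);

  assert(sorted.length === values.length, "invariant: same number of items as at the start");
  assert(sorted.every((v, i) => i === 0 || sorted[i - 1] <= v), "postcondition: ascending order");
  return sorted;
}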

The fi­nal para­noid tip is Tip 43: Avoid Fortune-Telling. Pragmatic pro­gram­mers only make de­ci­sions that they can get im­me­di­ate feed­back on. The more pre­dic­tions you make about the fu­ture, the more likely you’ll get some of the pre­dic­tions wrong and make the wrong de­ci­sion based on them.

You might find your­self slip­ping into for­tune telling when you have to:

In a previous chapter, the authors wrote about making decisions reversible and easier to change. This chapter tells you how to implement that in your code. The key here is to make your code flexible rather than rigid - good code bends to circumstances rather than breaks. Part of this is decoupling code. Two pieces of code are considered coupled when they share something in common. This may be something as simple as a shared global variable, or something more complex like an inheritance chain.

The au­thors ar­gue against what they term Train Wrecks - long chains of method calls, such as this ex­am­ple they give:
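
Roughly (a JavaScript paraphrase of the book's example, not a verbatim quote):

// A "train wreck": one routine reaches through customer → orders → order → totals
function applyDiscount(customer, orderId, discount) {
  const totals = customer.orders.find(orderId).getTotals();
  totals.grandTotal = totals.grandTotal - discount;
  totals.discount = discount;
}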

This code traverses many different levels of abstraction - you have to know that a customer object exposes orders, that orders have a find method, and that the order returned by find has a getTotals method. If any of these levels of abstraction changes, your code might break. And requirements may change: what if the business decides to implement a maximum discount amount of 40%? Certainly, this could be applied in the applyDiscount routine, but anything could modify the grandTotal and discount fields - the rule could be violated if other modules that modify the totals object don’t get the memo.

The au­thors sug­gest refac­tor­ing the code so that there is no or­ders ob­ject, just a find method and an ap­ply­Dis­count method for the or­der ob­ject that im­ple­ments the 40% rule:
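
Again as a paraphrase rather than a quote, the refactored version keeps that knowledge inside the order:

// The caller only asks the customer for the order; the order owns its own discount rule
function applyDiscount(customer, orderId, discount) {
  customer.findOrder(orderId).applyDiscount(discount); // applyDiscount enforces the 40% cap internally
}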

The au­thors sug­gest hav­ing only one . when you ac­cess some­thing if that some­thing is likely to change, such as any­thing in your ap­pli­ca­tion, or a fast mov­ing ex­ter­nal API. This in­cludes us­ing in­ter­me­di­ate vari­ables be­tween ac­cesses, such as this code:
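
Paraphrased, the point is that splitting the chain into temporaries does not remove the coupling:

// Each line has "only one dot", but the caller still depends on every level of abstraction
const orders = customer.orders;
const order = orders.find(orderId);
const totals = order.getTotals();
totals.grandTotal = totals.grandTotal - discount;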

However, the rule does not ap­ply to things that are un­likely to change, such as core lan­guage APIs. So this code is ok:
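
For instance (an illustrative JavaScript equivalent, not the book's snippet):

// Fine: only core collection APIs are chained, and those are unlikely to change
const adultNames = people
  .filter((person) => person.age >= 18)
  .map((person) => person.name)
  .join(", ");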

Another source of cou­pling is glob­ally ac­ces­si­ble data. Global data makes it hard to rea­son about the state of a pro­gram, since any other mod­ule might be able to change it. Global data in­cludes de­sign pat­terns such as sin­gle­tons, and ex­ter­nal re­sources such as data­bases. Given how ex­ten­sive global re­sources are, how can one avoid them? If global data is un­avoid­able, the key is to man­age them through a well-de­fined API that you con­trol, rather than al­low­ing any­thing to read and write global data. In the words of Tip 48: If It’s Important Enough to Be Global, Wrap It in an API.

Poor use of in­her­i­tance is a third source of cou­pling. Inheritance is used for two rea­sons: code reuse and type mod­el­ing. Inheritance does­n’t work for code reuse; Not only is the code of a child class cou­pled to any an­ces­tor of the class, so is any code that uses the class. Things may un­ex­pect­edly break when an an­ces­tor changes an API, even if you are us­ing a sub­class.

Nor does in­her­i­tance work for mod­el­ing types. Class hi­er­ar­chies quickly be­come tan­gled, wall cov­er­ing mon­strosi­ties. Another prob­lem is mul­ti­ple in­her­i­tance. A Car may be a type of Vehicle, but it may be an Asset or InsuredItem. Multiple in­her­i­tance is re­quired to model this, and many OO lan­guages don’t sup­port mul­ti­ple in­her­i­tance. Instead of pay­ing the in­her­i­tance tax, the au­thors sug­gest us­ing:

Interfaces or Protocols are classes that contain no code but instead declare behaviors. A class that implements an interface promises to define those behaviors. For example, a Car might implement Drivable, which has methods such as accelerate and brake. Interfaces can be used as types, and any class that implements the interface will be compatible with that type. This is a much easier way to provide polymorphism than inheritance.

Another al­ter­na­tive to in­her­i­tance is del­e­ga­tion. If you want to in­clude be­hav­ior from class Foo add a mem­ber of type Foo to your class rather than in­herit from Foo. You can then use Foo’s API wrapped in code you con­trol. Delegation is a has-a re­la­tion­ship rather than a is-a re­la­tion­ship.

The problem with interfaces and delegation is that they require writing lots of boilerplate code. For example, it’s likely that most of your classes that implement Drivable will have the same logic for brake, but each class will have to write its own implementation of brake. This leads to repeated code across your codebase, violating the DRY principle. To resolve this, the authors turn to Mixins - sets of functions that can be “mixed into” a class. This allows you to add common functionality without using inheritance. I wonder how mixins are implemented in a language like Java, which doesn’t have an obvious version of that feature. It’s also not clear to me how mixins are different from inheritance; aren’t they just a form of multiple inheritance?
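
In JavaScript, at least, a mixin can be as simple as a bag of functions copied onto a class’s prototype (an illustrative sketch, not an example from the book):

// Mix shared behavior into a class without inheritance
const Drivable = {
  accelerate() { this.speed = (this.speed ?? 0) + 10; },
  brake() { this.speed = Math.max(0, (this.speed ?? 0) - 10); },
};

class Car {}
Object.assign(Car.prototype, Drivable); // "mix in" the behavior

const car = new Car();
car.accelerate();
console.log(car.speed); // 10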

Tip 55: Parameterize Your App Using External Configuration: Code may have values that change while the application is running, such as credentials for third-party services. Rather than directly including the values in your code, you should externalize them and put them in a configuration bucket. Keeping credentials in source code is a security risk - hackers scan public git repositories for common security credentials, such as AWS keys. It’s common to store them in a flat file or database tables, and read them when the application initializes. However, in our world of highly-available applications that’s not as appropriate. Instead the authors propose configuration-as-a-service, where configuration is stored behind a service API. This allows multiple applications to share configuration information, use access control to control who can see and edit configuration, and provide a UI to easily edit config information. Using the configuration service, applications can subscribe to a configuration item and get notifications when it changes. This allows applications to update config data on their side without restarting.

This chap­ter deals with Parallelism, where two pieces of code run at the same time, and Concurrency, where things act as if they run at the same time. In the real world, things are asyn­chro­nous - the user is sup­ply­ing in­put, net­work re­sources are called, and the screen is be­ing re­drawn all at the same time. Applications that run every­thing se­ri­ally feel slug­gish.

In Tip 56: Analyze Workflow to Improve Concurrency the au­thors ad­vo­cate that you break tem­po­ral cou­pling where pos­si­ble. Temporal Coupling is when your code de­pends on event A hap­pen­ing be­fore event B. You should look at your work­flow to see what can be ex­e­cuted con­cur­rently. Look for ac­tiv­i­ties that take a lot of time that would al­low for some­thing else to be done in the mean­time. If your ap­pli­ca­tion makes mul­ti­ple in­de­pen­dent API calls to a re­mote ser­vice, ex­e­cute them on sep­a­rate threads rather than se­ri­ally, then gather up the re­sults of each call. If your work­flow al­lows a way to split the work into mul­ti­ple in­de­pen­dent units, take ad­van­tage of those mul­ti­ple cores and ex­e­cute them in par­al­lel.

Of course, parallelism has its pitfalls as well. For example, imagine reading an integer, incrementing it, and writing it back. If two processes read that integer at the same time, they will each increment the value to n+1, when you want it to be n+2. The update needs to be atomic; each process needs to do this sequentially without the other process interfering. This can be done through synchronized methods, semaphores, or other forms of resource locking. However, these have their own dangers as well, such as deadlocking, where two processes each get a lock on one of two needed resources, but not the other. Each waits forever for the other to release its lock. The authors think you should avoid shared state rather than try to manage it yourself wherever possible; Tip 57: Shared State Is Incorrect State.
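
A small JavaScript sketch of the lost update the authors describe (illustrative only):

// Two concurrent read-increment-write updates both read the same starting value
let counter = 0;

async function increment() {
  const value = counter;                       // read
  await new Promise((r) => setTimeout(r, 10)); // something else runs in the meantime
  counter = value + 1;                         // write back a stale result
}

await Promise.all([increment(), increment()]);
console.log(counter); // 1, not the 2 you wanted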

The au­thors ran into this is­sue when writ­ing the 20th an­niver­sary edi­tion: they up­dated the build process for the book to uti­lize par­al­lelism. However, the build would ran­domly fail. The au­thors tracked this down to chang­ing the di­rec­tory tem­porar­ily. In the orig­i­nal, a sub­task would change di­rec­tory, then go back to the orig­i­nal di­rec­tory. However, this no longer worked when new threads started, ex­pect­ing to be in the root di­rec­tory. Depending on the tim­ing, this could break the build. This prompted them to write Tip 58: Random Failures Are Often Concurrency Issues.

Chapter 7: While You Are Coding

This chap­ter is more of a grab-bag. It cov­ers sub­jects such as psy­chol­ogy, big-O no­ta­tion, refac­tor­ing, se­cu­rity, and test­ing.

In Tip 61: Listen to Your Inner Lizard the au­thors talk about lis­ten­ing to your in­stincts (your lizard-brain). If you find your­self hav­ing a hard time writ­ing code, your brain is try­ing to tell you some­thing. Perhaps the struc­ture or de­sign is wrong, or you don’t fully un­der­stand the re­quire­ments. If you find your­self in this sit­u­a­tion, take a step back and think about what you are do­ing. Maybe go for a walk, or sleep on it. You might find that the so­lu­tion is star­ing you in the face when you come back.

Perhaps you need to refac­tor the code in­stead of writ­ing more. Refactoring is a con­tin­u­ous process, es­poused in Tip 65: Refactor Early, Refactor Often. If any­thing strikes you as wrong in your code, such as DRY vi­o­la­tions, out­dated knowl­edge or non-or­thog­o­nal de­sign, don’t hes­i­tate to fix it. When you are refac­tor­ing, make sure you have a good suite of unit tests be­fore­hand to test if your changes break any­thing. Run the tests fre­quently to check if you’ve bro­ken any­thing.

Speaking of tests, the authors start with a bold assertion: Tip 67: Testing Is Not About Finding Bugs. Instead, tests function as the First User of Your Code - a source of immediate feedback that forces you to think about what counts as a correct solution. In addition, tightly coupled code tends to be hard to test, so testing helps you make good design decisions. The authors emphatically do not think you should adopt full-on Test Driven Development - it’s too easy to become a slave to writing tests. They cite the example of a TDD advocate who started a sudoku solver using TDD and spent so much time writing the tests that they failed to write the solver itself!

In a side­bar, Dave Thomas ex­plains that he stopped writ­ing tests for a few months, and said not a lot” hap­pened. The qual­ity did­n’t drop, nor did he in­tro­duce bugs into the code. His code was still testable, it just was­n’t tested.

Andy says I should­n’t in­clude this side­bar. He wor­ries it will tempt in­ex­pe­ri­enced de­vel­op­ers not to test. Here’s my com­pro­mise: Should you write tests? Yes. But af­ter you’ve been do­ing it for 30 years, feel free to ex­per­i­ment a lit­tle to see where the ben­e­fit lies for you.

This chap­ter fo­cuses on how to start your pro­ject on the right foot. The first sub­ject the au­thors tackle is re­quire­ments gath­er­ing: The Requirements Pit. While we talk about gath­er­ing re­quire­ments as if they are on the ground, wait­ing to be picked up, re­quire­ments are non-ob­vi­ous be­cause of Tip 75: No One Knows Exactly What They Want. They think of re­quire­ments gath­er­ing as a kind of ther­apy, where you take an ini­tial re­quire­ment and ask ques­tions about the de­tails to nail down ex­actly what they need. The au­thors show an ex­am­ple of a sim­ple re­quire­ment: Shipping should be free on all or­ders cost­ing $50 or more”. Does that in­clude the ship­ping cost it­self? Tax? If you’re sell­ing ebooks as well, should they be in­cluded? The job of the pro­gram­mer is Tip 76: Programmers Help People Understand What They Want. You should find any edge cases the client may not have con­sid­ered and make sure they’re doc­u­mented. This does­n’t mean cre­at­ing long spec­i­fi­ca­tions the client won’t read. Instead, the au­thors think re­quire­ments should be able to fit on an in­dex card. This helps pre­vent fea­ture creep; if the client un­der­stands how adding one more in­dex card will im­pact the sched­ule, they’ll con­sider the trade­offs and pri­or­i­tize the re­quire­ments they need the most.

You are given con­straints in your re­quire­ments as well. Your job as a soft­ware en­gi­neer is to eval­u­ate if those con­straints are things you ac­tu­ally have to live with or if you can re­lax them. In the words of Tip 81: Don’t Think Outside the Box—Find the Box, the con­straints are the edges of the box. What you ini­tially thought of as a con­straint may ac­tu­ally be an as­sump­tion you held.

Another tip the au­thors ad­vo­cate for is Tip 78: Work with a User to Think Like a User. If you’re build­ing an in­ven­tory sys­tem, work in the ware­house for a few days to get an idea of their processes and how your sys­tem will be used. If you don’t un­der­stand how it will be used, you could cre­ate some­thing that meets all of the re­quire­ments but is to­tally use­less. They cite an ex­am­ple of a dig­i­tal sound mix­ing board that could do any­thing to sound that was pos­si­ble, yet no­body wanted to use it. Rather than take ad­van­tage of record­ing en­gi­neers’ ex­pe­ri­ence with tac­tile slid­ers and knobs, they built an in­ter­face that was un­fa­mil­iar to them. Each fea­ture was buried be­hind menus and given un­in­tu­itive names.  It did what was re­quired, but did­n’t do it how it was re­quired.

The au­thors also con­sider in this chap­ter what it means to be Agile. Many teams and com­pa­nies are ea­ger for an off-the-shelf so­lu­tion: call it Agile-in-a-Box. But no process can make you Agile; Use this process and you’ll be ag­ile” ig­nores a key part of the Agile man­i­festo: Individuals and in­ter­ac­tions over processes and tools. To the au­thors Agile can be boiled down to the fol­low­ing:

Work out where you are.

Make the small­est mean­ing­ful step to­wards where you want to be.

Evaluate where you end up, and fix any­thing you broke.

Do this for every level of what you do, from process to code, and you’ll have adopted the Agile spirit.

Can the lessons of The Pragmatic Programmer be ap­plied to teams too? The au­thors say yes. This chap­ter fo­cuses on how to ap­ply the lessons of the pre­vi­ous chap­ters to the team level. Many of the lessons are the same as those men­tioned pre­vi­ously, so I won’t go into them again.

The authors advise Tip 87: Do What Works, Not What’s Fashionable. Just because Google or Facebook adopts process X doesn’t mean it’s right for your team. How do you know if something works? Try it. Pilot an idea with a small team, and see what works about it and what doesn’t. The goal isn’t to “do Scrum” or “be Agile”, but to deliver working software continuously. When you adopt a new idea, you should do it with improving continuous deployment of software in mind. If you’re measuring your deployments in months, try to get it down to weeks instead. Once you get it down to weeks, try to deliver in one-week iterations.

Related to continuously delivering software is Tip 96: Delight Users, Don’t Just Deliver Code. Delivering working software in a timely manner is not enough to delight your users; that is merely meeting expectations. The authors suggest you ask your users a question:

How will you know that we’ve all been suc­cess­ful a month (or a year, or what­ever) af­ter this pro­ject is done?

The an­swer may not be re­lated to the re­quire­ments, and may sur­prise you. For ex­am­ple, a rec­om­men­da­tions en­gine might be val­ued on dri­ving cus­tomer re­ten­tion. But once you know what the se­cret to suc­cess is, you should aim not just to hit the goal but to ex­ceed it.

Finally, take pride in your work. The fi­nal tip of the book is Tip 97: Sign Your Work.

I was only able to cover a por­tion of this re­mark­able book in this re­view. I highly rec­om­mend this book to any soft­ware en­gi­neer, es­pe­cially to those just start­ing out in the field. It makes a great grad­u­a­tion gift to some­one just fin­ish­ing their CS de­gree.


...

Read the original on www.ahalbert.com »
