10 interesting stories served every morning and every evening.




1. 642 shares, 45 trendiness

Ghostty Docs

Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.

Install Ghostty and run!

Zero configuration required to get up and running.

Ready-to-run binaries for macOS. Packages or build from source for Linux.

...

Read the original on ghostty.org »

2. 530 shares, 24 trendiness

Switch to Claude without starting over

Bring your preferences and context from other AI providers to Claude. With one copy-paste, Claude updates its memory and picks up right where you left off. Memory is available on all paid plans.

Import what matters in under a minute

You've spent months teaching another AI how you work. That context shouldn't disappear because you want to try something new. Claude can import what matters, so your first conversation feels like your hundredth.

Copy and paste the provided prompt into a chat with any AI provider. It's written specifically to help you get all of your context in one chat. Copy and paste the results into Claude's memory settings. That's it! Claude will update its memory and you're good to go.

Memory that understands how you work

Claude learns your preferences across conversations, keeps project context separate so nothing bleeds together, and lets you see and edit everything it remembers.

Your AI should know you from day one

Start your Pro plan, import your memory when you're ready, and see for yourself.

...

Read the original on claude.com »

3. 487 shares, 31 trendiness

Ad-Supported AI Chat Demo — See Every Ad Type in Action

A satirical (but real!) demo of what AI chat could look like in an ad-supported future. Chat with an AI while experiencing every monetization pattern imaginable — banners, interstitials, sponsored responses, freemium gates, and more.

The demo's fictional ads set the tone:

- "Join 2 million professionals who think faster, focus better, and accomplish more. AI-powered goal tracking, habit building, and memory enhancement. First 30 days FREE!"
- "Think 10x Faster with AI. First Month FREE! 🧠"
- "Did you know? Writing down 3 goals each morning increases productivity by 42%! BrainBoost Pro tracks your daily goals with AI reminders."
- "Your AI assistant, proudly powered by the finest advertising money can buy 💸"
- "⚠️ Warning: This AI may spontaneously recommend products at any time"
- "🏷️ This conversation is proudly powered by BrainBoost Pro™ • Ad-supported free tier • Remove ads"
- "Stressed by all these ads? 10 minutes of AI-guided meditation changes everything."
- "AI-curated meal prep kits delivered weekly. $30 off your first box!"
- "🎨 Today's chat theme sponsored by BrainBoost Pro • Colors, fonts, and vibes curated by our advertising team"

This tool is a satirical but fully functional demonstration of what AI chat assistants could look like if they were monetized through advertising — similar to how free apps, websites, and streaming services fund themselves today. As AI chat becomes mainstream, companies face a fundamental question: how do you make it free for users while covering the significant compute costs? Advertising is one obvious answer — and this demo shows every major ad pattern that could be applied to a chat interface.

We built this as an educational tool to help marketers, product managers, and developers understand the landscape of AI monetization, and to give users a glimpse of the future they might want to avoid (or embrace, depending on your perspective).

This demo covers the full spectrum of advertising patterns that could appear in an AI chat product.

This tool is educational and useful for a wide range of professionals thinking about the future of AI products.

Are the ads in this demo real? No — all brands and ads are completely fictional and created for this demo. BrainBoost Pro, QuickLearn Academy, ZenFocus, TaskMaster AI, ReadyMeal, and all other brands are made up. No actual advertising revenue is being generated.

Does this show what AI chat will actually look like? It shows one possible future. Some ad-supported AI products already exist and use several of these patterns. Others are speculative. The goal is to make these possibilities concrete and tangible so people can have informed conversations about what kind of AI future they want.

Is the AI actually working or is everything scripted? The AI is real — your messages are processed by a live language model and you get genuine responses. The ads are the scripted part. Some AI responses will include sponsored product mentions as part of the demonstration.

What happens to my chat data? Like all our free tools, conversations are logged to improve the service. We do not sell this data to advertisers — this is a demo, not an actual ad network.

How does the freemium gate work? After 5 free messages, you can either "watch an ad" (a simulated 5-second countdown) to unlock 5 more messages, or you can upgrade to our actual ad-free service. This mirrors how real freemium products work.
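As a minimal sketch of the gate logic that last answer describes, assuming only what the FAQ states (5 free messages, a simulated 5-second ad that unlocks 5 more); the function and names are hypothetical, not the demo's actual code:

```python
import time

FREE_MESSAGES = 5
AD_SECONDS = 5
AD_UNLOCK = 5

def remaining_messages(messages_used: int) -> int:
    """Return how many messages the user may still send, simulating
    the ad-watch gate once the free allowance runs out."""
    if messages_used < FREE_MESSAGES:
        return FREE_MESSAGES - messages_used
    print(f"Out of free messages. Watching a {AD_SECONDS}-second ad...")
    time.sleep(AD_SECONDS)  # the simulated countdown from the FAQ
    return AD_UNLOCK

print(remaining_messages(3))  # 2 free messages left
print(remaining_messages(5))  # triggers the simulated ad, unlocks 5 more
```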

All of our tools are genuinely free — no ads, no paywalls, no sponsored responses. Just AI that works.

Build Your Own AI Chatbot — No Ads Required

Now that you've seen what ad-supported AI looks like, imagine giving your customers a clean, focused AI experience with zero interruptions. With 99helpers, you can deploy an AI chatbot trained on your content in minutes. No credit card required • Setup in minutes • No ads, ever

...

Read the original on 99helpers.com »

4. 458 shares, 60 trendiness

how to talk to anyone – and why you should

It started with two incidents on the same day. In a fairly empty train carriage, a stranger in her 70s approached me: "Do you mind if I sit here? Or did you want to be alone with your thoughts?" I weighed it up for a split second, conscious that I was, in effect, agreeing to a conversation: "No, of course I don't mind. Sit down."

She turned out to be an agreeable, kind woman who had had a difficult day. I didn't have to say much: "I'm sorry to hear that." "That's tough for you." She occasionally asked me questions about myself, which I dodged politely. I could tell she was only asking so the conversation would not be so one-sided. Some moments are for listening, not sharing. I sensed, without needing to know explicitly, that she was probably returning to an empty house and wanted to process the day out loud. I didn't feel uncomfortable, as I knew I could duck out at any moment by saying I needed to get back to my phone messages. But instead we talked — or, rather, I listened — for most of the 50-minute journey. I registered that it was an unusual occurrence, this connection, but thought little more of it. A small part of me was glad this kind of thing still happens.

That evening, I ate at a restaurant with my family. As the waitress brought the bill, we chatted and I learned that she was from Seoul. She was shy and softly spoken. We talked gently about Korean food and what she missed about home. Once again, I thought little of this exchange.

As we walked home, my 15-year-old son asked: "Is it OK to talk to people in that way?" "What way?" He was asking about the boundaries when it comes to talking to someone about their home country.

This was a very good question. How do you know, generally, what the terms are of a conversation with a stranger? I realised that there is a sort of unwritten code you learn as you get older, which enables you to assess whether a conversation is a good idea or not. I thought about the woman who had approached me earlier. How did she know it was OK to talk to me? In the end, I replied to my son: "You don't always know if it's OK. Sometimes you have to take the risk and find out."

Then it struck me. A lot of people have given up taking a chance on other people: that they might want to listen, that they might want to talk. But they have also given up taking a chance on themselves: that they might be able to navigate a conversation with someone new, cope with knockbacks and steer a path through any misunderstandings.

The disappearance of these kinds of interactions from day-to-day life — in pubs, restaurants, shops, queues, on public transport — is striking. I have been talking to people tangentially about this for the past 10 years, ever since I started researching my book, How to Own the Room, which came out in 2018 and went on to become a podcast. This project was supposed to be about public speaking and confidence. But I realised from people's reactions to the topic — especially younger people — that their deepest anxiety lies elsewhere, in something much more banal and inexpressible. Forget "public speaking". What a lot of people don't like at all any more is "speaking to anyone in public".

Many reasons are cited: state-of-the-art don't-talk-to-me headphones, mobile phones and social media generally, the rise of working from home, the introduction of touchscreens in takeaway restaurants so you barely interact with a human, the death of third spaces, the pandemic. In the end, the biggest excuse becomes "social norm reinforcement". This is the idea that if no one talks to you, you don't talk to anyone either. A casual conversation in a waiting room where no one else is having a casual conversation suddenly sounds not very casual at all.

On an individual level, some people perfectly understandably cite neurodivergence, introversion, inability to tolerate eye contact or an intense loathing for small talk (especially about the weather) as reasons to avoid these conversations. It's certainly true that this time six years ago — at the height of lockdown — it would have been rude and unsafe to start a chat, let alone sit next to someone on a train. But now? It can feel as if everyone is still adhering to the 2-metre rule, employing the "tech shield" or even "phantom phone use" (pretending that you need to be on your phone when you don't).

This goes deeper than adolescent angst or personal preference. And possibly deeper than our overreliance on phones. We are losing a basic human skill. The ability to speak to others and understand them is being compromised.

Dr Jared Cooney Horvath, a teacher turned cognitive neuroscientist who focuses on speech, has warned that gen Z is the first generation in history to underperform the previous generation on cognitive measures. And Dr Rangan Chatterjee, a bestselling author and father of two teenagers, said in an interview this month: "I think we're raising a generation of children who have low self-worth, who don't know how to conduct conversations."

It's not only affecting young people. The psychologist Esther Perel calls it a "global relational recession". She writes: "The point is not depth. The point is practice, the gentle strengthening of our social muscles." On her YouTube channel she recently introduced the topic of "Talking to Strangers in 2026".

Something that used to come naturally is now a subject of longing and fascination, as if it were a rare anthropological phenomenon. Videos are springing up on social media, cataloguing encounters with the "unknown other": earnest, well-meaning, wholesome videos, under the categories "social anxiety", "extrovert" and "talking to strangers". Many have the unstated theme of "out and about in the big city". Some are personal experiments, often extremely ill-advised ones. Can you challenge yourself to tell a joke to an entire train carriage? What happens if you go up to an older woman and tell her she looks beautiful? The (usually young) person doing the filming is often trying to improve themself in some way or attempting to be "braver" or "less socially anxious". The camera acts as their accountability partner. The people they're talking to are relegated to the role of "task to be ticked off the list". Either that or there's a push towards a Hallmark card effect: "Look, other people are not as horrible as you thought." (Cue swell of trending motivational audio.)

The trouble with these social media experiments, of course, is that they are performative and individualistic. There's an element of commodification: the encounter must be ripe for digital packaging. Often it's not clear if the filming is consensual. The connections are one-way and border on the exploitative or manipulative. They are designed for individual personal growth or free, self-directed therapy ("this made me more confident") and for clicks and voyeurism ("check out this person's reaction"). The effect is to make "talking to absolutely anyone" seem even more alienating, fake and narcissistic. This has spawned a secondary genre of parody videos such as the comedian Al Nash's "A cup of tea with a stranger — an amazing conversation!" In this clip, an irritating interviewer passes tea to a stranger on a park bench under the guise of "helping you with your loneliness", only for the encounter to turn awkward when the stranger accidentally drops the cup and smashes it.

It's only natural to fear rejection, humiliation, giving offence or overstepping a boundary when we initiate a conversation — or even when we respond to someone else's attempt. But according to a study by the University of Virginia ("Talking with strangers is surprisingly informative"), we overstate these fears in our minds: "People tend to underestimate how much they'll enjoy the conversation, feel connected to their conversation partner and be liked by their conversation partner."

The key is to lower the stakes. Make it less of a big deal. Don't focus on what could go wrong. Also, don't focus on how amazing this could be. You are just saying, "It's cold today, isn't it?" You are not asking someone to join you on a quest for world peace. Similarly, if an approach is made towards you and you don't want to respond, just be confident and clear either with your gestures (look down, don't make eye contact) or with speech: "I can't talk right now."

In her work on kindness, the University of Sussex psychologist Gillian Sandstrom calls these conversational gambits "small, humanising acts". It's important to emphasise the "small" aspect. Sometimes I think people are overwhelmed by the "bigness" in their mind of the fear of interaction, and how disproportionate that seems next to the "smallness" of the pathetic reality. Don't read too much into passing moments. Trust yourself to read social cues and work out how you stand in relation to them. Know yourself and your own personality. Not everyone wants to talk and not everyone wants to be talked to. And that's OK. It can depend on the day and on your mood. Give yourself get-out-of-jail-free cards in these conversations. If someone doesn't respond, assume they didn't hear you or they're having a bad day. If someone talks to you and you feel uncomfortable or you're having a bad day, it is not your job to be kind or nice. If their attempt was well meant, they'll get over it. We don't need to avoid each other. But we also don't have to be on niceness autopilot all the time.

In any case, our worst fears about these interactions are rarely realised. Last year, the team of Stanford psychologist Prof Jamil Zaki, the author of Hope for Cynics: The Surprising Science of Human Goodness, put up posters around campus with messages about approachability and warmth. They found that what students most needed was permission — the reminder to "take a chance". They concluded: "Too often, we're sure that conversation and connection will exhaust us, or that we can't count on others." In our minds, we paint people (and ourselves) as profoundly disappointing. They — and we — are rarely that bad. And even if they are, it will make a good story to tell later to the people who are not strangers to you.

Is it going to change your life if you talk to someone in a shop about the prospect of rain? Probably not. But in light of the current state of the world, even the slightest possibility of brightening someone's day is valuable. It's certainly worth the punt. Perhaps the way they respond matters less than the fact that you retained your humanity enough to try something, to risk, to connect.

Small talk may not profoundly alter your life. But its absence will profoundly alter human life as we know it. We live in a world of intense and often unnecessary division. Small talk is a tiny, free and very possibly priceless reminder of our shared humanity. If we intentionally give up talking to strangers, if we purposely decide to give in to the phone shield, the consequences will be horrible. Arguably, we are already on the verge of doing this. Let's back up and start a conversation before it's too late.

...

Read the original on www.theguardian.com »

5. 414 shares, 22 trendiness

Decision Trees

Let's pretend we're farmers with a new plot of land. Given only the Diameter and Height of a tree trunk, we must determine if it's an Apple, Cherry, or Oak tree. To do this, we'll use a Decision Tree. Almost every tree with a Diameter ≥ 0.45 is an Oak tree! Thus, we can probably assume that any other trees we find in that region will also be one.

This first decision node will act as our root node. We'll draw a vertical line at this Diameter and classify everything above it as Oak (our first leaf node), and continue to partition our remaining data on the left. We continue along, hoping to split our plot of land in the most favorable manner. We see that creating a new decision node at Height ≤ 4.88 leads to a nice section of Cherry trees, so we partition our data there.

Our Decision Tree updates accordingly, adding a new leaf node for Cherry.

And Some More

After this second split we're left with an area containing many Apple and some Cherry trees. No problem: a vertical division can be drawn to separate the Apple trees a bit better.

Once again, our Decision Tree updates accordingly.

And Yet Some More

The remaining region just needs a further horizontal division and boom - our job is done! We've obtained an optimal set of nested decisions.

That said, some regions still enclose a few misclassified points. Should we continue splitting, partitioning into smaller sections?

Hmm... If we do, the resulting regions would start becoming increasingly complex, and our tree would become unreasonably deep. Such a Decision Tree would learn too much from the noise of the training examples and not enough generalizable rules.

Does this sound familiar? It is the well-known tradeoff that we explored in our explainer on The Bias-Variance Tradeoff! In this case, going too deep results in a tree that overfits our data, so we'll stop here.

We're done! We can simply pass any new data point's Height and Diameter values through the newly created Decision Tree to classify them as either an Apple, Cherry, or Oak tree!

Decision Trees are supervised machine learning algorithms used for both regression and classification problems. They're popular for their ease of interpretation and large range of applications. A Decision Tree consists of a series of sequential decisions, or decision nodes, on some dataset's features, and makes predictions at leaf nodes. The resulting flow-like structure is navigated via conditional control statements, or if-then rules, which split each decision node into two or more subnodes. Leaf nodes, also known as terminal nodes, represent prediction outputs for the model. To train a Decision Tree from data means to figure out the order in which the decisions should be assembled from the root to the leaves. New data may then be passed from the top down until reaching a leaf node, representing a prediction for that data point.
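As a concrete sketch of that structure (illustrative Python, not the explainer's own code), a decision node routes a sample left or right until it reaches a leaf:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None     # e.g. "diameter"; None on leaves
    cutoff: Optional[float] = None    # decision threshold
    left: Optional["Node"] = None     # subtree where feature <= cutoff
    right: Optional["Node"] = None    # subtree where feature > cutoff
    prediction: Optional[str] = None  # set only on leaf nodes

def predict(node: Node, sample: dict) -> str:
    """Pass a data point from the top down until reaching a leaf."""
    while node.prediction is None:
        node = node.left if sample[node.feature] <= node.cutoff else node.right
    return node.prediction

# The root split from the walkthrough above: Diameter <= 0.45 goes left.
root = Node(feature="diameter", cutoff=0.45,
            left=Node(prediction="cherry"),  # really a subtree that splits further
            right=Node(prediction="oak"))
print(predict(root, {"diameter": 0.6, "height": 5.0}))  # oak
```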

We just saw how a Decision Tree operates at a high level: from the top down, it creates a series of sequential rules that split the data into well-separated regions for classification. But given the large number of potential options, how exactly does the algorithm determine where to partition the data? Before we learn how that works, we need to understand Entropy.

Entropy measures the amount of information of some variable or event. We'll make use of it to identify regions consisting of a large number of similar (pure) or dissimilar (impure) elements. Given a certain set of events that occur with probabilities p₁, p₂, …, pₙ, the total entropy H can be written as the negative sum of the weighted probabilities:

H = −∑ᵢ pᵢ log₂(pᵢ)

The quantity H has a number of interesting properties. H = 0 only if all but one of the pᵢ are zero, this one having the value of 1; thus the entropy vanishes only when there is no uncertainty in the outcome, meaning that the sample is completely unsurprising. H is maximum when all the pᵢ are equal: this is the most uncertain, or "impure", situation, and any change towards the equalization of the probabilities increases H. The entropy can therefore be used to quantify the impurity of a collection of labeled data points: a node containing multiple classes is impure whereas a node including only one class is pure. In the explainer's interactive figure, you can compute the entropy of a collection of labeled data points belonging to two classes, which is typical for binary classification problems; pure samples have zero entropy whereas impure ones have larger entropy values. This is what entropy is doing for us: measuring how pure (or impure) a set of samples is. We'll use it in the algorithm to train Decision Trees by defining the Information Gain.
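Here is a minimal Python sketch of that calculation (illustrative, not the site's code); it reproduces the pure-versus-impure behavior just described:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels, in bits:
    H = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["oak"] * 10) == 0)               # True: a pure sample
print(entropy(["apple"] * 5 + ["cherry"] * 5))  # 1.0: maximally impure
```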

With the intuition gained from the above animation, we can now describe the logic to train Decision Trees. As the name implies, information gain measures the amount of information that we gain. It does so using entropy. The idea is to subtract from the entropy of our data before the split the entropy of each possible partition thereafter. We then select the split that yields the largest reduction in entropy, or equivalently, the largest increase in information.

The core algorithm to calculate information gain is called ID3. It's a recursive procedure that starts from the root node of the tree and iterates top-down on all non-leaf branches in a greedy manner, calculating at each depth the difference in entropy before and after each candidate split:

IG = H(parent) − ∑ᵢ (nᵢ / n) · H(childᵢ)

where n is the number of samples in the parent node and nᵢ is the number of samples that end up in child i.

To be specific, the algorithm's steps are as follows:

1. Calculate the entropy associated with every feature of the data set.
2. Partition the data set into subsets using different features and cutoff values. For each, compute the information gain as the difference in entropy before and after the split using the formula above. For the total entropy of all children nodes after the split, use the weighted average, taking into account how many of the samples end up on each child branch.
3. Identify the partition that leads to the maximum information gain. Create a decision node on that feature and split value.
4. When no further splits can be done on a subset, create a leaf node and label it with the most common class of the data points within it if doing classification, or with the average value if doing regression.
5. Recurse on all subsets. Recursion stops if after a split all elements in a child node are of the same type. Additional stopping conditions may be imposed, such as requiring a minimum number of samples per leaf to continue splitting, or finishing when the trained tree has reached a given maximum depth.

Of course, reading the steps of an algorithm isn't always the most intuitive thing. To make things easier to understand, let's revisit how information gain was used to determine the first decision node in our tree. Recall our first decision node split on Diameter ≤ 0.45. How did we choose this condition? It was the result of maximizing information gain.

Each of the possible splits of the data on its two features (Diameter and Height) and cutoff values yields a different value of the information gain.

The line chart displays the different split values for the Diameter feature. Move the decision boundary yourself to see how the data points in the top chart are assigned to the left or right children nodes accordingly. On the bottom you can see the corresponding entropy values of both children nodes as well as the total information gain.

The ID3 algorithm will select the split point with the largest information gain, shown as the peak of the black line in the bottom chart: a gain of 0.574 at Diameter = 0.45.
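To make the search concrete, here is a brute-force sketch of the split selection just described, reusing the entropy function from the earlier sketch. The toy data is made up for illustration, so its numbers will not match the chart's 0.574:

```python
def information_gain(labels, left, right):
    """Entropy before a split minus the weighted entropy after it."""
    n = len(labels)
    after = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - after

def best_split(rows, labels, features):
    """Try every feature/cutoff pair; keep the one with maximum gain."""
    best = (None, None, 0.0)  # (feature, cutoff, gain)
    for feature in features:
        for cutoff in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= cutoff]
            right = [y for row, y in zip(rows, labels) if row[feature] > cutoff]
            if not left or not right:
                continue
            gain = information_gain(labels, left, right)
            if gain > best[2]:
                best = (feature, cutoff, gain)
    return best

trees = [{"diameter": 0.3, "height": 6.0}, {"diameter": 0.5, "height": 7.0},
         {"diameter": 0.6, "height": 5.5}, {"diameter": 0.2, "height": 4.0}]
species = ["apple", "oak", "oak", "cherry"]
print(best_split(trees, species, ["diameter", "height"]))  # ('diameter', 0.3, 1.0)
```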
An alternative to the entropy for the construction of Decision Trees is the Gini impurity. This quantity is also a measure of information and can be seen as a variation of Shannon's entropy. Decision Trees trained using entropy or Gini impurity are comparable, and only in a few cases do results differ considerably. In the case of imbalanced data sets, entropy might be more prudent. Yet Gini might train faster as it does not make use of logarithms.
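For comparison, a matching sketch of the Gini impurity (again illustrative):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i^2). Like entropy it is zero for a
    pure sample and maximal for an even class mix, but needs no logs."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["oak"] * 10))                    # 0.0: pure
print(gini_impurity(["apple"] * 5 + ["cherry"] * 5))  # 0.5: maximally impure
```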

Another Look At Our Decision Tree

Let's recap what we've learned so far. First, we saw how a Decision Tree classifies data by repeatedly partitioning the feature space into regions according to some conditional series of rules. Second, we learned about entropy, a popular metric used to measure the purity (or lack thereof) of a given sample of data. Third, we learned how Decision Trees use entropy in information gain and the ID3 algorithm to determine the exact conditional series of rules to select. Taken together, the three sections detail the typical Decision Tree algorithm.

To reinforce concepts, let's look at our Decision Tree from a slightly different perspective.

The tree below maps exactly to the tree we showed in the How to Build a Decision Tree section above. However, instead of showing the partitioned feature space alongside our tree's structure, let's look at the partitioned data points and their corresponding entropy at each node itself:

From the top down, our sample of data points to classify shrinks as it gets partitioned to different decision and leaf nodes. In this manner, we could trace the full path taken by a training data point if we so desired. Note also that not every leaf node is pure: as discussed previously (and in the next section), we don't want the structure of our Decision Trees to be too deep, as such a model likely won't generalize well to unseen data.

Without question, Decision Trees have a lot of things going for them. They're simple models that are easy to interpret. They're fast to train and require minimal data preprocessing. And they handle outliers with ease. Yet they suffer from a major limitation, and that is their instability compared with other predictors. They can be extremely sensitive to small perturbations in the data: a minor change in the training examples can result in a drastic change in the structure of the Decision Tree. Check for yourself how small random Gaussian perturbations on just 5% of the training examples create a set of completely different Decision Trees:
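A small experiment in the same spirit can be run with scikit-learn (assumed available); the synthetic data below stands in for the article's dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))           # fake (diameter, height) pairs
y = (X[:, 0] > 0.45).astype(int)               # "oak" vs "not oak"
y[rng.choice(len(X), 10, replace=False)] ^= 1  # a little label noise

baseline = DecisionTreeClassifier(random_state=0).fit(X, y)

# Perturb just 5% of the training examples with small Gaussian noise.
X_noisy = X.copy()
idx = rng.choice(len(X), size=len(X) // 20, replace=False)
X_noisy[idx] += rng.normal(scale=0.05, size=(len(idx), 2))
perturbed = DecisionTreeClassifier(random_state=0).fit(X_noisy, y)

# The printed structures typically differ well beyond the perturbed rows.
print(export_text(baseline, feature_names=["diameter", "height"]))
print(export_text(perturbed, feature_names=["diameter", "height"]))
```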

Why Is This A Problem?

In their vanilla form, Decision Trees are unstable.

If left unchecked, the ID3 algorithm to train Decision Trees will work endlessly to minimize entropy. It will continue splitting the data until all leaf nodes are completely pure - that is, consisting of only one class. Such a process may yield very deep and complex Decision Trees. In addition, we just saw that Decision Trees are subject to high variance when exposed to small perturbations of the training data.

Both issues are undesirable, as they lead to predictors that fail to clearly distinguish between persistent and random patterns in the data, a problem known as overfitting. This is problematic because it means that our model won't perform well when exposed to new data. There are ways to prevent excessive growth of Decision Trees by pruning them, for instance constraining their maximum depth, limiting the number of leaves that can be created, or setting a minimum number of items per leaf so that leaves with too few items are not allowed.
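For reference, those pruning constraints map directly onto common hyperparameters. The names below are scikit-learn's, shown as one concrete instance rather than the only way to prune:

```python
from sklearn.tree import DecisionTreeClassifier

pruned = DecisionTreeClassifier(
    max_depth=4,         # constrain the maximum depth of the tree
    max_leaf_nodes=10,   # limit the number of leaves that can be created
    min_samples_leaf=5,  # disallow leaves with too few items in them
)
# Calling pruned.fit(X, y) now stops splitting once any limit is reached.
```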

As for the issue of high variance? Well, unfortunately it's an intrinsic characteristic when training a single Decision Tree.

...

Read the original on mlu-explain.github.io »

6. 380 shares, 22 trendiness

AI Made Writing Code Easier. It Made Engineering Harder.

Yes, writing code is easier than ever.

AI assistants autocomplete your functions. Agents scaffold entire features. You can describe what you want in plain English and watch working code appear in seconds. The barrier to producing code has never been lower.

And yet, the day-to-day life of software engineers has gotten more complex, more demanding, and more exhausting than it was two years ago.

This is not a contradiction. It is the reality of what happens when an industry adopts a powerful new tool without pausing to consider the second-order effects on the people using it.

If you are a software engineer reading this and feeling like your job quietly became harder while everyone around you celebrates how easy everything is now, you are not imagining things. The job changed. The expectations changed. And nobody sent a memo.

There is a phenomenon happening right now that most engineers feel but struggle to articulate. The expected output of a software engineer in 2026 is dramatically higher than it was in 2023. Not because anyone held a meeting and announced new targets. Not because your manager sat you down and explained the new rules. The baseline just moved.

It moved because AI tools made certain tasks faster. And when tasks become faster, the assumption follows immediately: you should be doing more. Not in the future. Now.

A February 2026 study published in Harvard Business Review tracked 200 employees at a U.S. tech company over eight months. The researchers found something that will sound familiar to anyone living through this shift. Workers did not use AI to finish earlier and go home. They used it to do more. They took on broader tasks, worked at a faster pace, and extended their hours, often without anyone asking them to. The researchers described a self-reinforcing cycle: AI accelerated certain tasks, which raised expectations for speed. Higher speed made workers more reliant on AI. Increased reliance widened the scope of what workers attempted. And a wider scope further expanded the quantity and density of work.

The numbers tell the rest of the story. Eighty-three percent of workers in the study said AI increased their workload. Burnout was reported by 62 percent of associates and 61 percent of entry-level workers. Among C-suite leaders? Just 38 percent. The people doing the actual work are carrying the intensity. The people setting the expectations are not feeling it the same way.

This gap matters enormously. If leadership believes AI is making everything easier while engineers are drowning in a new kind of complexity, the result is a slow erosion of trust, morale, and eventually talent.

A separate survey of over 600 engineering professionals found that nearly two-thirds of engineers experience burnout despite their organizations using AI in development. Forty-three percent said leadership was out of touch with team challenges. Over a third reported that productivity had actually decreased over the past year, even as their companies invested more in AI tooling.

The baseline moved. The expectations rose. And for many engineers, no one acknowledged that the job they signed up for had fundamentally changed.

Here is something that gets lost in all the excitement about AI productivity: most software engineers became engineers because they love writing code.

Not managing code. Not reviewing code. Not supervising systems that produce code. Writing it. The act of thinking through a problem, designing a solution, and expressing it precisely in a language that makes a machine do exactly what you intended. That is what drew most of us to this profession. It is a creative act, a form of craftsmanship, and for many engineers, the most satisfying part of their day.

Now they are being told to stop.

Not explicitly, of course. Nobody walks into a standup and says "stop writing code." But the message is there, subtle and persistent. Use AI to write it faster. Let the agent handle the implementation. Focus on higher-level tasks. Your value is not in the code you write anymore, it is in how well you direct the systems that write it for you.

For early adopters, this feels exciting. It feels like evolution. For a significant portion of working engineers, it feels like being told that the thing they spent years mastering, the skill that defines their professional identity, is suddenly less important.

One engineer captured this shift perfectly in a widely shared essay, describing how AI transformed the engineering role from builder to reviewer. Every day felt like being a judge on an assembly line that never stops. You just keep stamping those pull requests. The production volume went up. The sense of craftsmanship went down.

This is not a minor adjustment. It is a fundamental shift in professional identity. Engineers who built their careers around deep technical skill are being asked to redefine what they do and who they are, essentially overnight, without any transition period, training, or acknowledgment that something significant was lost in the process.

Having led engineering teams for over two decades, I have seen technology shifts before. New frameworks, new languages, new methodologies. Engineers adapt. They always have. But this is different because it is not asking engineers to learn a new way of doing what they do. It is asking them to stop doing the thing that made them engineers in the first place and become something else entirely.

That is not an upgrade. That is a career identity crisis. And pretending it is not happening does not make it go away.

While engineers are being asked to write less code, they are simultaneously being asked to do more of everything else.

More product thinking. More architectural decision-making. More code review. More context switching. More planning. More testing oversight. More deployment awareness. More risk assessment.

The scope of what it means to be a "software engineer" expanded dramatically in the last two years, and it happened without a pause to catch up.

This is partly a direct consequence of AI acceleration. When code gets produced faster, the bottleneck shifts. It moves from implementation to everything surrounding implementation: requirements clarity, architecture decisions, integration testing, deployment strategy, monitoring, and maintenance. These were always part of the engineering lifecycle, but they were distributed across roles. Product managers handled requirements. QA handled testing. DevOps handled deployment. Senior architects handled system design.

Now, with AI collapsing the implementation phase, organizations are quietly redistributing those responsibilities to the engineers themselves. The Harvard Business Review study documented this exact pattern. Product managers began writing code. Engineers took on product work. Researchers started doing engineering tasks. Roles that once had clear boundaries blurred as workers used AI to handle jobs that previously sat outside their remit.

The industry is openly talking about this as a positive development. Engineers should be "T-shaped" or "full-stack" in a broader sense. Nearly 45 percent of engineering roles now expect proficiency across multiple domains. AI tools augment generalists more effectively, making it easier for one person to handle multiple components of a system.

On paper, this sounds empowering. In practice, it means that a mid-level backend engineer is now expected to understand product strategy, review AI-generated frontend code they did not write, think about deployment infrastructure, consider security implications of code they cannot fully trace, and maintain a big-picture architectural awareness that used to be someone else's job.

That is not empowerment. That is scope creep without a corresponding increase in compensation, authority, or time.

From my experience building and scaling teams in fintech and high-traffic platforms, I can tell you that role expansion without clear boundaries always leads to the same outcome: people try to do everything, nothing gets done with the depth it requires, and burnout follows. The engineers who survive are the ones who learn to say no, to prioritize ruthlessly, and to push back when the scope of their role quietly doubles without anyone acknowledging it.

There is an irony at the center of the AI-assisted engineering workflow that nobody wants to talk about: reviewing AI-generated code is often harder than writing the code yourself.

When you write code, you carry the context of every decision in your head. You know why you chose this data structure, why you handled this edge case, why you structured the module this way. The code is an expression of your thinking, and reviewing it later is straightforward because the reasoning is already stored in your memory.

When AI writes code, you inherit the output without the reasoning. You see the code, but you do not see the decisions. You do not know what tradeoffs were made, what assumptions were baked in, what edge cases were considered or ignored. You are reviewing someone else's work, except that someone is not a colleague you can ask questions. It is a statistical model that produces plausible-looking code without any understanding of your system's specific constraints.

A survey by Harness found that 67 percent of developers reported spending more time debugging AI-generated code, and 68 percent spent more time reviewing it than they did with human-written code. This is not a failure of the tools. It is a structural property of the workflow. Code review without shared context is inherently more demanding than reviewing code you participated in creating.

Yet the expectation from management is that AI should be making everything faster. So engineers find themselves in a bind: they are producing more code than ever, but the quality assurance burden has increased, the context-per-line-of-code has decreased, and the cognitive load of maintaining a system they only partially built is growing with every sprint.

This is the supervision paradox. The faster AI generates code, the more human attention is required to ensure that code actually works in the context of a real system with real users and real business constraints. The production bottleneck did not disappear. It moved from writing to understanding, and understanding is harder to speed up.

What makes all of this especially difficult is the self-reinforcing nature of the cycle.

AI makes certain tasks faster. Faster tasks create the perception of more available capacity. More perceived capacity leads to more work being assigned. More work leads to more AI reliance. More AI reliance leads to more code that needs review, more context that needs to be maintained, more systems that need to be understood, and more cognitive load on engineers who are already stretched thin.

The Harvard Business Review researchers described this as "workload creep." Workers did not consciously decide to work harder. The expansion happened naturally, almost invisibly. Each individual step felt reasonable. In aggregate, it produced an unsustainable pace.

Before AI, there was a natural ceiling on how much you could produce in a day. That ceiling was set by thinking speed, typing speed, and the time it takes to look things up. It was frustrating sometimes, but it was also a governor. A natural speed limit that prevented you from outrunning your own ability to maintain quality.

AI removed the governor. Now the only limit is your cognitive endurance. And most people do not know their cognitive limits until they have already blown past them.

This is where many engineers find themselves right now. Shipping more code than any quarter in their career. Feeling more drained than any quarter in their career. The two facts are not unrelated.

The trap is that it looks like productivity from the outside. Metrics go up. Velocity charts look great. More features shipped. More pull requests merged. But underneath the numbers, quality is quietly eroding, technical debt is accumulating faster than it can be addressed, and the people doing the work are running on fumes.

If the picture is difficult for experienced engineers, it is even harder for those starting their careers.

Junior engineers have traditionally learned by doing the simpler, more task-oriented work. Fixing small bugs. Writing straightforward features. Implementing well-defined tickets. This hands-on work built the foundational understanding that eventually allowed them to take on more complex challenges.

AI is rapidly consuming that training ground. If an agent can handle the routine API hookup, the boilerplate module, the straightforward CRUD endpoint, what is left for a junior engineer to learn from? The expectation is shifting toward needing to contribute at a higher level almost from day one, without the gradual ramp-up that previous generations of engineers relied on.

Entry-level hiring at the 15 largest tech firms fell 25 percent from 2023 to 2024. The HackerRank 2025 Developer Skills Report confirmed that expectations are rising faster than productivity gains, and that early-career hiring remains sluggish compared to senior-level roles. Companies are prioritizing experienced talent, but the pipeline that produces experienced talent is being quietly dismantled.

This is a problem that extends beyond individual career concerns. If junior engineers do not get the opportunity to build foundational skills through hands-on work, the industry will eventually face a shortage of senior engineers who truly understand the systems they oversee. You cannot supervise what you never learned to build.

As I have written before, code is for humans to read. If the next generation of engineers never develops the fluency to read, understand, and reason about code at a deep level, no amount of AI tooling will compensate for that gap.

If you lead engineering teams, the most important thing you can do right now is acknowledge that this transition is genuinely difficult. Not theoretically. Not abstractly. For the actual people on your team.

The career they signed up for changed fast. The skills they were hired for are being repositioned. The expectations they are working under shifted without a clear announcement. Acknowledging this reality is not a sign of weakness. It is a prerequisite for maintaining a team that trusts you.

Start with empathy, but do not stop there.

Give your team real training. Not a lunch-and-learn about prompt engineering. Real investment in the skills that the new engineering landscape actually requires: system design, architectural thinking, product reasoning, security awareness, and the ability to critically evaluate code they did not write. These are not trivial skills. They take time to develop, and your team needs structured support to build them.

Give them space to experiment without the pressure of immediate productivity gains. The engineers who will thrive in this environment are the ones who have room to figure out how AI fits into their workflow without being penalized for the learning curve. Every experienced technologist I know who has successfully integrated AI tools went through an adjustment period where they were less productive before they became more productive. That adjustment period is normal, and it needs to be protected.

Set explicit boundaries around role scope. If you are asking engineers to take on product thinking, planning, and risk assessment in addition to their technical work, name it. Define it. Compensate for it. Do not let it happen silently and then wonder why your team is burned out.

Rethink your metrics. If your engineering success metrics are still centered on velocity, tickets closed, and lines of code, you are measuring the wrong things in an AI-assisted world. System stability, code quality, decision quality, customer outcomes, and team health are better indicators of whether your engineering organization is actually producing value or just producing volume.

Protect the junior pipeline. If you have stopped hiring junior engineers because AI can handle entry-level tasks, you are solving a short-term efficiency problem by creating a long-term talent crisis. The senior engineers you rely on today were junior engineers who learned by doing the work that AI is now consuming. That path still matters.

And finally, keep challenging your team. I have never met a good engineer who did not love a good challenge. The engineers on your team are not fragile. They are capable, intelligent people who signed up for hard problems. They can handle this transition. Just make sure they are set up to meet it.

If you are an engineer navigating this shift, here is what I would tell you based on two decades of watching technology cycles reshape this profession.

First, do not abandon your fundamentals. The pressure to become an "AI-first" engineer is real, but the engineers who will be most valuable in five years are the ones who deeply understand the systems they work on. AI is a tool. Understanding architecture, debugging complex systems, reasoning about performance and security: these skills are not becoming less important. They are becoming more important because someone needs to be the adult in the room when AI-generated code breaks in production at 2 AM.

Second, learn to set boundaries with the acceleration trap. Just because you can produce more does not mean you should. Sustainable pace matters. The engineers who burn out trying to match the theoretical maximum output AI makes possible are not the ones who build lasting careers. The ones who learn to work with AI deliberately, choosing when to use it and when to think independently, are the ones who will still be thriving in this profession a decade from now.

Third, embrace the parts of the expanded role that genuinely interest you. If the engineering role now includes more product thinking, more architectural decision-making, more cross-functional communication, treat that as an opportunity rather than an imposition. These are skills that senior engineers and technical leaders need. You are being given access to a broader set of capabilities earlier in your career than any previous generation of engineers. That is not a burden. It is a head start.

Fourth, talk about what you are experiencing. The isolation of feeling like you are the only one struggling with this transition is one of the most damaging aspects of the current moment. You are not the only one. The data confirms it. Two-thirds of engineers report burnout. The expectation gap between leadership and engineering teams is well documented. Talking openly about these challenges, with your team, with your manager, with your broader network, is not complaining. It is professional honesty.

And fifth, remember that this profession has survived every prediction of its demise. COBOL was supposed to eliminate programmers. Expert systems were supposed to replace them. Fourth-generation languages, CASE tools, visual programming, no-code platforms, outsourcing. Every decade brings a new technology that promises to make software engineers obsolete, and every decade the demand for skilled engineers grows. AI will not be different. The tools change. The fundamentals endure.

AI made writing code easier and made being an engineer harder. Both of these things are true at the same time, and pretending that only the first one matters is how organizations lose their best people.

The engineers who are struggling right now are not struggling because they are bad at their jobs. They are struggling because their jobs changed underneath them while the industry celebrated the part that got easier and ignored the parts that got harder.

Expectations rose without announcement. Roles expanded without boundaries. Output demands increased without corresponding increases in support, training, or acknowledgment. And the engineers who raised concerns were told, implicitly or explicitly, that they just needed to adapt faster.

That is not how you build a sustainable engineering culture. That is how you build a burnout machine.

The industry needs to name this paradox honestly. AI is an incredible tool. It is also placing enormous new demands on the people using it. Both things can be true. Both things need to be addressed.

The organizations that get this right, that invest in their people alongside their tools, that acknowledge the human cost of rapid technological change while still pushing forward, those are the organizations that will attract and retain the best engineering talent in the years ahead.

The ones that do not will discover something that every technology cycle eventually teaches: tools do not build products. People do. And people have limits that no amount of AI can automate away.

If this resonated with you, I would love to hear your perspective. What has changed most about your engineering role in the last year? Drop me a message or connect with me on LinkedIn. I write regularly about the intersection of AI, software engineering, and leadership at ivanturkovic.com. Follow along if you want honest, experience-driven perspectives on how technology is actually changing this profession.

...

Read the original on www.ivanturkovic.com »

7. 290 shares, 25 trendiness

MCP is dead. Long live the CLI

I'm going to make a bold claim: MCP is already dying. We may not fully realize it yet, but the signs are there. OpenClaw doesn't support it. Pi doesn't support it. And for good reason.

When Anthropic announced the Model Context Protocol, the industry collectively lost its mind. Every company scrambled to ship MCP servers as proof they were "AI first." Massive resources poured into new endpoints, new wire formats, new authorization schemes, all so LLMs could talk to services they could already talk to.

I'll admit, I never fully understood the need for it. You know what LLMs are really good at? Figuring things out on their own. Give them a CLI and some docs and they're off to the races.

I tried to avoid writing this for a long time, but I'm convinced MCP provides no real-world benefit, and that we'd be better off without it. Let me explain.

LLMs are really good at using command-line tools. They've been trained on millions of man pages, Stack Overflow answers, and GitHub repos full of shell scripts. When I tell Claude to use gh pr view 123, it just works.

MCP promised a cleaner interface, but in practice I found myself writing the same documentation anyway: what each tool does, what parameters it accepts, and more importantly, when to use it. The LLM didn't need a new protocol.

When Claude does something unexpected with Jira, I can run the same jira issue view command and see exactly what it saw. Same input, same output, no mystery.

With MCP, the tool only exists inside the LLM conversation. Something goes wrong and now I'm spelunking through JSON transport logs instead of just running the command myself. Debugging shouldn't require a protocol decoder.

This is where the gap gets wide. CLIs compose. I can pipe through jq, chain with grep, redirect to files. This isn't just convenient; it's often the only practical approach.

With MCP, your options are dumping the entire plan into the context window (expensive, often impossible) or building custom filtering into the MCP server itself. Either way, you're doing more work for a worse result. The CLI approach uses tools that already exist, are well-documented, and that both humans and agents understand.
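As one illustration of that composition (assuming the GitHub CLI is installed and authenticated; "octocat" is a placeholder login), an agent or a human can fetch JSON from a CLI and filter it locally rather than pushing everything through a model's context window:

```python
import json
import subprocess

# Fetch open PRs as JSON via the GitHub CLI, then filter locally,
# the same way a shell user would pipe the output through jq.
result = subprocess.run(
    ["gh", "pr", "list", "--json", "number,title,author"],
    capture_output=True, text=True, check=True,
)
prs = json.loads(result.stdout)
mine = [pr for pr in prs if pr["author"]["login"] == "octocat"]
for pr in mine:
    print(f"#{pr['number']}: {pr['title']}")
```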

MCP is unnecessarily opinionated about auth. Why should a protocol for giving an LLM tools to use need to concern itself with authentication?

CLI tools don't care. aws uses profiles and SSO. gh uses gh auth login. kubectl uses kubeconfig. These are battle-tested auth flows that work the same whether I'm at the keyboard or Claude is driving. When auth breaks, I fix it the way I always would: aws sso login, gh auth refresh. No MCP-specific troubleshooting required.

Local MCP servers are processes. They need to start up, stay running, and not silently hang. In Claude Code, they're spawned as child processes, which works until it doesn't.

CLI tools are just binaries on disk. No background processes, no state to manage, no initialization dance. They're there when you need them and invisible when you don't.

Beyond the design philosophy, MCP has real day-to-day friction:

Initialization is flaky. I've lost count of the times I've restarted Claude Code because an MCP server didn't come up. Sometimes it works on retry, sometimes I'm clearing state and starting over.

Re-auth never ends. Using multiple MCP tools? Have fun authenticating each one. CLIs with SSO or long-lived credentials just don't have this problem. Auth once and you're done.

Permissions are all-or-nothing. Claude Code lets you allowlist MCP tools by name, but that's it. You can't scope to read-only operations or restrict parameters. With CLIs, I can allowlist gh pr view but require approval for gh pr merge. That granularity matters.

I’m not say­ing MCP is com­pletely use­less. If a tool gen­uinely has no CLI equiv­a­lent, MCP might be the right call. I still use plenty in my day-to-day, when it’s the only op­tion avail­able.

I might even argue there’s some value in having a standardized interface, and that there are probably use cases where it makes more sense than a CLI.

But for the vast ma­jor­ity of work, the CLI is sim­pler, faster to de­bug, and more re­li­able.

The best tools are the ones that work for both hu­mans and ma­chines. CLIs have had decades of de­sign it­er­a­tion. They’re com­pos­able, de­bug­gable, and they pig­gy­back on auth sys­tems that al­ready ex­ist.

MCP tried to build a bet­ter ab­strac­tion. Turns out we al­ready had a pretty good one.

If you’re a com­pany in­vest­ing in an MCP server but you don’t have an of­fi­cial CLI, stop and re­think what you’re do­ing. Ship a good API, then ship a good CLI. The agents will fig­ure it out.

...

Read the original on ejholmes.github.io »

8 259 shares, 21 trendiness

New iron nanomaterial wipes out cancer cells without harming healthy tissue

Researchers at Oregon State University have cre­ated a new nano­ma­te­r­ial de­signed to de­stroy can­cer cells from the in­side. The ma­te­r­ial ac­ti­vates two sep­a­rate chem­i­cal re­ac­tions once in­side a tu­mor cell, over­whelm­ing it with ox­ida­tive stress while leav­ing sur­round­ing healthy tis­sue un­harmed.


The work, led by Oleh Taratula, Olena Taratula, and Chao Wang from the OSU College of Pharmacy, was pub­lished in Advanced Functional Materials.

The dis­cov­ery strength­ens the grow­ing field of chemo­dy­namic ther­apy or CDT. This emerg­ing can­cer treat­ment strat­egy takes ad­van­tage of the unique chem­i­cal con­di­tions found in­side tu­mors. Compared with nor­mal tis­sue, can­cer cells tend to be more acidic and con­tain higher lev­els of hy­dro­gen per­ox­ide.

Traditional CDT uses these tu­mor con­di­tions to spark the for­ma­tion of hy­droxyl rad­i­cals, highly re­ac­tive mol­e­cules made of oxy­gen and hy­dro­gen that con­tain an un­paired elec­tron. These re­ac­tive oxy­gen species dam­age cells through ox­i­da­tion, strip­ping elec­trons from es­sen­tial com­po­nents such as lipids, pro­teins, and DNA.

More re­cent CDT ap­proaches have also suc­ceeded in gen­er­at­ing sin­glet oxy­gen in­side tu­mors. Singlet oxy­gen is an­other re­ac­tive oxy­gen species, named for its sin­gle elec­tron spin state rather than the three spin states seen in the more sta­ble oxy­gen mol­e­cules pre­sent in the air.

However, ex­ist­ing CDT agents are lim­ited,” Oleh Taratula said. They ef­fi­ciently gen­er­ate ei­ther rad­i­cal hy­drox­yls or sin­glet oxy­gen but not both, and they of­ten lack suf­fi­cient cat­alytic ac­tiv­ity to sus­tain ro­bust re­ac­tive oxy­gen species pro­duc­tion. Consequently, pre­clin­i­cal stud­ies of­ten only show par­tial tu­mor re­gres­sion and not a durable ther­a­peu­tic ben­e­fit.”

To ad­dress these short­com­ings, the team de­vel­oped a new CDT nanoa­gent built from an iron-based metal-or­ganic frame­work or MOF. This struc­ture is ca­pa­ble of pro­duc­ing both hy­droxyl rad­i­cals and sin­glet oxy­gen, in­creas­ing its can­cer-fight­ing po­ten­tial. The MOF demon­strated strong tox­i­c­ity across mul­ti­ple can­cer cell lines while caus­ing min­i­mal harm to non­cancer­ous cells.

When we sys­tem­i­cally ad­min­is­tered our nanoa­gent in mice bear­ing hu­man breast can­cer cells, it ef­fi­ciently ac­cu­mu­lated in tu­mors, ro­bustly gen­er­ated re­ac­tive oxy­gen species and com­pletely erad­i­cated the can­cer with­out ad­verse ef­fects,” Olena Taratula said. We saw to­tal tu­mor re­gres­sion and long-term pre­ven­tion of re­cur­rence, all with­out see­ing any sys­temic tox­i­c­ity.”

In these pre­clin­i­cal ex­per­i­ments, tu­mors dis­ap­peared en­tirely and did not re­turn, and the an­i­mals showed no signs of harm­ful side ef­fects.

Before mov­ing into hu­man tri­als, the re­searchers plan to test the treat­ment in ad­di­tional can­cer types, in­clud­ing ag­gres­sive pan­cre­atic can­cer, to de­ter­mine whether the ap­proach can be ef­fec­tive across a wide range of tu­mors.

Other con­trib­u­tors to the study in­cluded Oregon State re­searchers Kongbrailatpam Shitaljit Sharma, Yoon Tae Goo, Vladislav Grigoriev, Constanze Raitmayr, Ana Paula Mesquita Souza, and Manali Parag Phawde. Funding was pro­vided by the National Cancer Institute of the National Institutes of Health and the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

...

Read the original on www.sciencedaily.com »

9 234 shares, 12 trendiness

Introduction to Modern AI

* Lectures: MW[F] 9:30–10:50 Tepper 1403 (note: Friday lec­tures will only be used for re­view ses­sions or makeup lec­tures when needed)

A minimal free version of this course will be offered online, simultaneously with the CMU offering, starting on 1/26 (with a two-week delay from the CMU course). This means that course materials (lecture videos, assignments on mugrade, etc.) will be available to the online course after the dates indicated in the schedule below. By this, we mean that anyone will be able to watch lecture videos for the course, and submit (autograded) assignments (though not quizzes or midterms/final). Enroll here to receive emails on lectures and homeworks once they are available. Note that information here about TAs, office hours, grading, prerequisites, etc., is for the CMU version, not the online offering.

This course provides an introduction to how modern AI systems work. By “modern AI,” we specifically mean the machine learning methods and large language models (LLMs) behind systems like ChatGPT, Gemini, and Claude.

Despite their seem­ingly amaz­ing gen­er­al­ity, the ba­sic tech­niques that un­der­lie these AI mod­els are sur­pris­ingly sim­ple: a min­i­mal LLM im­ple­men­ta­tion lever­ages a fairly small set of ma­chine learn­ing meth­ods and ar­chi­tec­tures, and can be writ­ten in a few hun­dred lines of code.

This course will guide you through the ba­sic meth­ods that will let you im­ple­ment a ba­sic AI chat­bot. You will learn the ba­sics of su­per­vised ma­chine learn­ing, large lan­guage mod­els, and post-train­ing. By the end of the course you will be able to write the code that runs an open source LLM from scratch, as well as train these mod­els based upon a cor­pus of data. The ma­te­r­ial we cover will in­clude:

* Post-training

The top­ics above are a gen­eral fram­ing of what the course will cover. However, as this course is be­ing of­fered for the first time in Spring 2026, some el­e­ments are likely to change over the first of­fer­ing.

* Programming: 15-112 or 15-122. You must be pro­fi­cient in ba­sic Python pro­gram­ming, in­clud­ing ob­ject ori­ented meth­ods.

* Math: 21-111 or 21-120. The course will use ba­sic meth­ods from dif­fer­en­tial cal­cu­lus, in­clud­ing com­put­ing de­riv­a­tives. Some fa­mil­iar­ity with lin­ear al­ge­bra and prob­a­bil­ity is also ben­e­fi­cial, but these top­ics will be cov­ered to the ex­tent needed for the course.

A major component of the course will be the development of a minimal AI chatbot through a series of programming assignments. Homeworks are submitted using the mugrade system (tutorial video). Some assignments build on previous ones, though for the in-class CMU version we’ll distribute solutions to help you work through any errors that may have cropped up in previous assignments (for the online version, we’d suggest talking to others who were able to complete the assignment). In addition to the (main) programming aspect, some homeworks may contain a shorter written portion that works out some of the mathematical details behind the approach.

All home­works are re­leased as Colab note­books, at the links be­low. We are also re­leas­ing Marimo note­book ver­sions. The mu­grade ver­sion of the on­line as­sign­ment will be avail­able two weeks af­ter the re­lease dates for the CMU course.

Each home­work will be ac­com­pa­nied by an in-class (15 minute) quiz that as­sesses ba­sic ques­tions based upon the as­sign­ment. This will in­clude repli­cat­ing (at a high level) some of the code you wrote for the as­sign­ment, or an­swer­ing con­cep­tual ques­tions about the as­sign­ment. All quizzes are closed book and closed notes.

In addition to the homework quizzes, there will be three in-person exams: two midterms and a final (during the finals period). The midterms will focus only on material covered during that section of the course, while the final will be cumulative (but with an emphasis on the last third of the course). The midterms and final are closed book and closed notes.

Lecture sched­ule is ten­ta­tive and will be up­dated over the course of se­mes­ter. All ma­te­ri­als will be avail­able to the on­line course two weeks af­ter the dates here.

Students are per­mit­ted to use AI as­sis­tants for all home­work and pro­gram­ming as­sign­ments (especially as a ref­er­ence for un­der­stand­ing any top­ics that seem con­fus­ing), but we strongly en­cour­age you to com­plete your fi­nal sub­mit­ted ver­sion of your as­sign­ment with­out AI. You can­not use any such as­sis­tants, or any ex­ter­nal ma­te­ri­als, dur­ing in-class eval­u­a­tions (both the home­work quizzes and the midterms and fi­nal).

The ra­tio­nale be­hind this pol­icy is a sim­ple one: AI can be ex­tremely help­ful as a learn­ing tool (and to be clear, as an ac­tual im­ple­men­ta­tion tool), but over-re­liance on these sys­tems can cur­rently be a detri­ment to learn­ing in many cases. You ab­solutely need to learn how to code and do other tasks us­ing AI tools, but turn­ing in AI-generated so­lu­tions for the rel­a­tively short as­sign­ments we give you can (at least in our cur­rent ex­pe­ri­ence) ul­ti­mately lead to sub­stan­tially less un­der­stand­ing of the ma­te­r­ial. The choice is yours on as­sign­ments, but we be­lieve that you will ul­ti­mately per­form much bet­ter on the in-class quizzes and ex­ams if you do work through your fi­nal sub­mit­ted home­work so­lu­tions your­self.

...

Read the original on modernaicourse.org »

10 218 shares, 28 trendiness

MicroGPT explained interactively

Andrej Karpathy wrote a 200-line Python script that trains and runs a GPT from scratch, with no li­braries or de­pen­den­cies, just pure Python. The script con­tains the al­go­rithm that pow­ers LLMs like ChatGPT.

Let’s walk through it piece by piece and watch each part work. Andrej did a walk­through on his blog, but here I take a more vi­sual ap­proach, tai­lored for be­gin­ners.

The model trains on 32,000 hu­man names, one per line: emma, olivia, ava, is­abella, sophia… Each name is a doc­u­ment. The mod­el’s job is to learn the sta­tis­ti­cal pat­terns in these names and gen­er­ate plau­si­ble new ones that sound like they could be real.

By the end of training, the model produces names like “kamon”, “karai”, “anna”, and “anton”. The model has learned which characters tend to follow which, which sounds are common at the start vs. the end, and how long a typical name runs. From ChatGPT’s perspective, your conversation is just a document. When you type a prompt, the model’s response is a statistical document completion.

Neural net­works work with num­bers, not char­ac­ters. So we need a way to con­vert text into a se­quence of in­te­gers and back. The sim­plest pos­si­ble to­k­enizer as­signs one in­te­ger to each unique char­ac­ter in the dataset. The 26 low­er­case let­ters get ids 0 through 25, and we add one spe­cial to­ken called BOS (Beginning of Sequence) with id 26 that marks where a name starts and ends.

Type a name be­low and watch it get to­k­enized. Each char­ac­ter maps to its in­te­ger id, and BOS to­kens wrap both ends:

The integer values themselves have no meaning. Token 4 isn’t “more” than token 2. Each token is just a distinct symbol, like assigning a different color to each letter. Production tokenizers like tiktoken (used by GPT-4) work on chunks of characters for efficiency, giving a vocabulary of ~100,000 tokens, but the principle is the same.
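To make this concrete, here is a minimal sketch of such a character tokenizer in plain Python, following the scheme described above (ids 0–25 for the letters, 26 for BOS); the exact microgpt code may differ in detail:

```python
# Minimal character tokenizer: ids 0-25 for 'a'-'z', 26 for BOS.
BOS = 26
stoi = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
itos = {i: ch for ch, i in stoi.items()}

def encode(name):
    # Wrap the name in BOS tokens to mark where it starts and ends.
    return [BOS] + [stoi[c] for c in name] + [BOS]

def decode(tokens):
    return "".join(itos[t] for t in tokens if t != BOS)

print(encode("emma"))          # [26, 4, 12, 12, 0, 26]
print(decode(encode("emma")))  # emma
```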

Here’s the core task: given the to­kens we’ve seen so far, pre­dict what comes next. We slide through the se­quence one po­si­tion at a time. At po­si­tion 0, the model sees only BOS and must pre­dict the first let­ter. At po­si­tion 1, it sees BOS and the first let­ter and must pre­dict the sec­ond let­ter. And so on.

Step through the se­quence be­low and watch the con­text grow while the tar­get shifts for­ward:

Each step pro­duces one train­ing ex­am­ple: the con­text on the left is the in­put, the green to­ken on the right is what the model should pre­dict. For the name emma”, that’s five in­put-tar­get pairs. This slid­ing win­dow is how all lan­guage mod­els train, in­clud­ing ChatGPT.
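A sketch of that sliding window, using the token ids from the tokenizer sketch above (“emma” wrapped in BOS):

```python
def training_examples(tokens):
    # One example per position: the context so far -> the next token.
    for i in range(1, len(tokens)):
        yield tokens[:i], tokens[i]

emma = [26, 4, 12, 12, 0, 26]  # BOS e m m a BOS
for context, target in training_examples(emma):
    print(context, "->", target)
# [26] -> 4, [26, 4] -> 12, ... five input-target pairs in total
```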

At each position, the model outputs 27 raw numbers, one per possible next token. These numbers (called logits) can be anything: positive, negative, large, small. We need to convert them into probabilities that are positive and sum to 1. Softmax does this by exponentiating each score and dividing by the total.

Adjust the log­its be­low and watch the prob­a­bil­ity dis­tri­b­u­tion change. Notice how one large logit dom­i­nates, and the ex­po­nen­tial am­pli­fies dif­fer­ences.

Here’s the ac­tual soft­max code from mi­crogpt. Step through it to see the in­ter­me­di­ate val­ues at each line:
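The interactive listing doesn’t carry over to this text digest, so here is a minimal sketch consistent with the description (including the max-subtraction trick explained next), not the verbatim microgpt code:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability,
    # then normalize so the outputs are positive and sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # ~[0.66, 0.24, 0.10]
```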

The sub­trac­tion of the max value be­fore ex­po­nen­ti­at­ing does­n’t change the re­sult math­e­mat­i­cally (dividing nu­mer­a­tor and de­nom­i­na­tor by the same con­stant can­cels out) but pre­vents over­flow. Without it, exp(100) would pro­duce in­fin­ity.

How wrong was the prediction? We need a single number that captures “the model thought the correct answer was unlikely.” If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is loss = −log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss.

Drag the slider to ad­just the prob­a­bil­ity of the cor­rect to­ken and watch the loss change:

The curve has two properties that make it useful. First, it’s zero when the model is perfectly confident in the right answer (p = 1). Second, it goes to infinity as the model assigns near-zero probability to the truth (p → 0), which punishes confident wrong answers severely. Training minimizes this number.
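In code, the loss is one line; a sketch reproducing the numbers above:

```python
import math

def cross_entropy(probs, target):
    # -log of the probability assigned to the correct token.
    return -math.log(probs[target])

print(cross_entropy([0.9, 0.1], 0))    # ~0.105: confident and right, low loss
print(cross_entropy([0.01, 0.99], 0))  # ~4.6: confident and wrong, high loss
```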

To improve, the model needs to answer: “for each of my 4,192 parameters, if I nudge it up by a tiny amount, does the loss go up or down, and by how much?” Backpropagation computes this by walking the computation backward, applying the chain rule at each step.

Every mathematical operation (add, multiply, exp, log) is a node in a graph. Each node remembers its inputs and knows its local derivative. The backward pass starts at the loss (where the gradient is trivially 1.0) and multiplies local derivatives along every path back to the inputs.

Step through the forward pass, then the backward pass for a small example where L = a × b + b, with a = 3 and b = 2:

Now step through the ac­tual Value class code. Watch how each op­er­a­tion records its chil­dren and lo­cal gra­di­ents, then how back­ward() walks the graph in re­verse, ac­cu­mu­lat­ing gra­di­ents:
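The listing itself isn’t reproduced in this digest; below is a minimal sketch in the same spirit (supporting just add and multiply), not the verbatim microgpt code:

```python
class Value:
    # A scalar autograd node: remembers its inputs and a closure that
    # applies the local derivative (chain rule) during the backward pass.
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply local derivatives in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0  # dL/dL = 1
        for v in reversed(topo):
            v._backward()

a, b = Value(3.0), Value(2.0)
L = a * b + b
L.backward()
print(L.data, a.grad, b.grad)  # 8.0 2.0 4.0
```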

Notice that b has a gradient of 4.0, not 3.0. That’s because b is used in two places: once in the multiplication (a × b) and once in the addition (+ b). The gradients from both paths sum up: 3.0 + 1.0 = 4.0. This is the multivariable chain rule in action. If a value contributes to the loss through multiple paths, the total derivative is the sum of contributions from each path. This is the same algorithm that PyTorch’s loss.backward() runs, operating on scalars instead of tensors. Same algorithm, just smaller and slower.

We know how to mea­sure er­ror and how to trace that er­ror back to every pa­ra­me­ter. Now let’s build the model it­self, start­ing with how it rep­re­sents to­kens.

A raw token id like 4 is just an index. The model can’t do math with a bare integer. So each token looks up a learned vector (a list of 16 numbers) from an embedding table. Think of it as each token having a 16-dimensional “personality” that the model can adjust during training.

Position mat­ters too. The let­ter a” at po­si­tion 0 plays a dif­fer­ent role than a” at po­si­tion 4. So there’s a sec­ond em­bed­ding table in­dexed by po­si­tion. The to­ken em­bed­ding and po­si­tion em­bed­ding are added to­gether to form the in­put to the rest of the net­work.

Click a to­ken be­low to see its em­bed­ding vec­tors and how they com­bine:

The em­bed­ding val­ues start as small ran­dom num­bers and get tuned dur­ing train­ing. After train­ing, to­kens that be­have sim­i­larly (like vow­els) tend to end up with sim­i­lar em­bed­ding vec­tors. The model learns these rep­re­sen­ta­tions from scratch, with no prior knowl­edge of what a vowel is.
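A sketch of the two lookup tables, with the dimensions the article describes (27 tokens, 16-dimensional embeddings); the maximum position count and initialization scale here are assumptions:

```python
import random

vocab_size, max_positions, n_embd = 27, 16, 16  # max_positions is assumed
random.seed(0)
# One learned 16-number vector per token id and per position.
tok_emb = [[random.gauss(0, 0.1) for _ in range(n_embd)] for _ in range(vocab_size)]
pos_emb = [[random.gauss(0, 0.1) for _ in range(n_embd)] for _ in range(max_positions)]

def embed(token_id, position):
    # Input to the network = token embedding + position embedding.
    return [t + p for t, p in zip(tok_emb[token_id], pos_emb[position])]

print(embed(0, 0)[:4])  # first few dims of 'a' at position 0
```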

How to­kens talk to each other

This is how attention heads work. At each position, the model needs to gather information from previous positions. It does this through attention: each token produces three vectors from its embedding.

A Query (“what am I looking for?”), a Key (“what do I contain?”), and a Value (“what information do I offer if selected?”). The query at the current position is compared against all keys from previous positions via dot products. A high dot product means high relevance. Softmax converts these scores into attention weights, and the weighted sum of values is the output.

Explore the at­ten­tion weights be­low. Each cell shows how much one po­si­tion at­tends to an­other. Switch be­tween the four at­ten­tion heads to see dif­fer­ent pat­terns:

The gray region in the upper-right is the causal mask. Position 2 can’t attend to position 4 because position 4 hasn’t happened yet. This is what makes the model autoregressive: each position only sees the past.

Different heads learn different patterns. One head might attend strongly to the most recent token. Another might focus on the BOS token (to remember “we’re generating a name”). A third might look for vowels. The four heads run in parallel, each operating on a 4-dimensional slice of the 16-dimensional embedding, and their outputs are concatenated and projected back to 16 dimensions.
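A single-head sketch of this machinery, where the causal mask falls out of only scoring positions up to the current one; the real model runs four of these in parallel on slices of the embedding:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(queries, keys, values):
    # For each position t: score its query against keys 0..t (the causal
    # mask), softmax into attention weights, return the weighted sum of values.
    d = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        outputs.append([sum(w * values[s][i] for s, w in enumerate(weights))
                        for i in range(len(values[0]))])
    return outputs

# Two positions, 2-dim head: position 1 mixes in information from position 0.
print(attend([[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 2.0], [3.0, 4.0]]))
```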

The model pipes each token through: embed, normalize, attend, add residual, normalize, MLP, add residual, project to output logits. The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16. If attention is how tokens communicate, the MLP is where each position thinks independently.
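A sketch of that MLP with the stated dimensions (16 → 64 → 16); the weight shapes are assumptions and biases are omitted for brevity:

```python
import random

random.seed(0)
n_embd, n_hidden = 16, 64
w1 = [[random.gauss(0, 0.1) for _ in range(n_embd)] for _ in range(n_hidden)]
w2 = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_embd)]

def mlp(x):
    # Project up to 64 dims, zero out negatives (ReLU), project back to 16.
    h = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w1]
    return [sum(hi * wi for hi, wi in zip(h, row)) for row in w2]

print(len(mlp([0.1] * n_embd)))  # 16
```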

Step through the pipeline for one to­ken and watch data flow through each stage:

Here’s the ac­tual gpt() func­tion from mi­crogpt. Step through to see the code ex­e­cut­ing line by line, with the in­ter­me­di­ate vec­tor at each stage:

The residual connections (the “Add” steps) are load-bearing. Without them, gradients would shrink to near-zero by the time they reach the early layers, and training would stall. The residual connection gives gradients a shortcut, which is why deep networks can train at all.

RMSNorm (root-mean-square normalization) rescales each vector to have unit root-mean-square. This prevents activations from growing or shrinking as they pass through the network, which stabilizes training. GPT-2 used LayerNorm; RMSNorm is simpler and works just as well.
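RMSNorm itself is only a few lines; a sketch matching the description (the epsilon placement is an assumption):

```python
import math

def rmsnorm(x, eps=1e-5):
    # Rescale the vector so its root-mean-square is 1.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

print(rmsnorm([1.0, 2.0, 3.0, 4.0]))  # output has root-mean-square ~1
```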

The train­ing loop re­peats 1,000 times: pick a name, to­k­enize it, run the model for­ward over every po­si­tion, com­pute the cross-en­tropy loss at each po­si­tion, av­er­age the losses, back­prop­a­gate to get gra­di­ents for every pa­ra­me­ter, and up­date the pa­ra­me­ters to make the loss a bit lower.

The optimizer is Adam, which is smarter than naive gradient descent. It maintains a running average of each parameter’s recent gradients (momentum) and a running average of the squared gradients (adaptive learning rates). Parameters that have been getting consistent gradients take larger steps. Parameters that have been oscillating take smaller ones.
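A sketch of one Adam update for a single scalar parameter, using the standard textbook defaults (the exact hyperparameters in microgpt are not shown here):

```python
import math

def adam_step(p, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # m: running average of gradients (momentum).
    # v: running average of squared gradients (adaptive scaling).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)  # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

p, m, v = 0.5, 0.0, 0.0
for t in range(1, 4):
    p, m, v = adam_step(p, grad=0.2, m=m, v=v, t=t)
print(p)  # parameter nudged opposite the gradient
```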

Watch the loss decrease over 1,000 training steps. The model starts at ~3.3 (random guessing among 27 tokens: −log(1/27) ≈ 3.3) and settles around 2.37. The generated names evolve from gibberish to plausible:

Step through the code for one com­plete train­ing it­er­a­tion. Watch it pick a name, run the for­ward pass at each po­si­tion, com­pute the loss, run back­ward, and up­date the pa­ra­me­ters:

Once training is done, inference is straightforward. Start with BOS, run the forward pass, get 27 probabilities, randomly sample one token, feed it back in, and repeat until the model outputs BOS again (meaning “I’m done”) or we hit the maximum length.

Temperature con­trols how we sam­ple. Before soft­max, we di­vide the log­its by the tem­per­a­ture. A tem­per­a­ture of 1.0 sam­ples di­rectly from the learned dis­tri­b­u­tion. Lower tem­per­a­tures sharpen the dis­tri­b­u­tion (the model picks its top choices more of­ten). Higher tem­per­a­tures flat­ten it (more di­verse but po­ten­tially less co­her­ent out­put).

Adjust the tem­per­a­ture and watch the prob­a­bil­ity dis­tri­b­u­tion change:
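A self-contained sketch of temperature sampling as described: divide the logits by the temperature, softmax, then sample from the resulting distribution:

```python
import math, random

def sample(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0.5))  # top token wins more often
print(sample(logits, temperature=2.0))  # more diverse picks
```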

Step through the in­fer­ence loop to see a name be­ing gen­er­ated char­ac­ter by char­ac­ter. At each step, the model runs for­ward, pro­duces prob­a­bil­i­ties, and sam­ples the next to­ken:

A tem­per­a­ture ap­proach­ing 0 would al­ways pick the high­est-prob­a­bil­ity to­ken (greedy de­cod­ing). This pro­duces the most average” out­put. A tem­per­a­ture of 1.0 matches what the model ac­tu­ally learned. Values above 1.0 in­ject ex­tra ran­dom­ness, which can pro­duce cre­ative out­puts but also non­sense. The sweet spot for names is around 0.5.
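Putting it together, a schematic sketch of the inference loop; model_logits is a hypothetical stand-in for the trained forward pass (tokens so far → 27 logits), and sample() is the sketch above:

```python
BOS, MAX_LEN = 26, 16  # BOS id from the tokenizer sketch; max length assumed

def generate(model_logits, temperature=0.5):
    # model_logits: hypothetical stand-in for the trained model's forward
    # pass, mapping the token sequence so far to 27 next-token logits.
    tokens = [BOS]
    while len(tokens) < MAX_LEN:
        next_token = sample(model_logits(tokens), temperature)
        if next_token == BOS:  # model says "I'm done"
            break
        tokens.append(next_token)
    return tokens[1:]  # generated character ids, without the leading BOS
```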

Everything else is ef­fi­ciency

This 200-line script contains the complete algorithm. Between this and ChatGPT, little changes conceptually. The differences are things like: trillions of tokens instead of 32,000 names. Subword tokenization (100K vocabulary) instead of characters. Tensors on GPUs instead of scalar Value objects in Python. Hundreds of billions of parameters instead of 4,192. Hundreds of layers instead of one. Training across thousands of GPUs for months.

But the loop is the same. Tokenize, em­bed, at­tend, com­pute, pre­dict the next to­ken, mea­sure sur­prise, walk the gra­di­ents back­ward, nudge the pa­ra­me­ters. Repeat.

...

Read the original on growingswe.com »
