10 interesting stories served every morning and every evening.




1 643 shares, 27 trendiness

Using LLMs at Oxide / RFD / Oxide

While LLMs are adept at read­ing and can be ter­rific at edit­ing, their writ­ing is much more mixed. At best, writ­ing from LLMs is hack­neyed and cliché-rid­den; at worst, it brims with tells that re­veal that the prose is in fact au­to­mat­i­cally gen­er­ated.

What's so bad about this? First, to those who can recognize an LLM's reveals (an expanding demographic!), it's just embarrassing — it's as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one's writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can't be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).

Finally, LLM-generated prose un­der­mines a so­cial con­tract of sorts: ab­sent LLMs, it is pre­sumed that of the reader and the writer, it is the writer that has un­der­taken the greater in­tel­lec­tual ex­er­tion. (That is, it is more work to write than to read!) For the reader, this is im­por­tant: should they strug­gle with an idea, they can rea­son­ably as­sume that the writer them­selves un­der­stands it — and it is the least a reader can do to la­bor to make sense of it.

If, however, prose is LLM-generated, this social contract is ripped up: a reader cannot assume that the writer understands their ideas because they might not have so much as read the product of the LLM that they tasked to write it. If one is lucky, these are LLM hallucinations: obviously wrong and quickly discarded. If one is unlucky, however, it will be a kind of LLM-induced cognitive dissonance: a puzzle in which pieces don't fit because there is in fact no puzzle at all. This can leave a reader frustrated: why should they spend more time reading prose than the writer spent writing it?

This can be nav­i­gated, of course, but it is truly per­ilous: our writ­ing is an im­por­tant ves­sel for build­ing trust — and that trust can be quickly eroded if we are not speak­ing with our own voice. For us at Oxide, there is a more me­chan­i­cal rea­son to be jaun­diced about us­ing LLMs to write: be­cause our hir­ing process very much se­lects for writ­ers, we know that every­one at Oxide can write — and we have the lux­ury of de­mand­ing of our­selves the kind of writ­ing that we know that we are all ca­pa­ble of.

So our guide­line is to gen­er­ally not use LLMs to write, but this should­n’t be thought of as an ab­solute — and it does­n’t mean that an LLM can’t be used as part of the writ­ing process. Just please: con­sider your re­spon­si­bil­ity to your­self, to your own ideas — and to the reader.

...

Read the original on rfd.shared.oxide.computer »

2 553 shares, 21 trendiness

- YouTube

...

Read the original on www.youtube.com »

3 501 shares, 45 trendiness

Schleswig-Holstein relies on Open Source and saves millions

The state administration of Schleswig-Holstein is making a remarkable U-turn in its IT strategy, committing fully to open source. After the migration from proprietary Microsoft software to free solutions was initially accompanied by problems and criticism, Digitalization Minister Dirk Schrödter (CDU) can now report a significant success: according to his ministry, the state will save over 15 million euros in license costs for Windows, Microsoft Office & Co. next year alone. Similar savings are expected in the following years.

This is offset by one-time investments of nine million euros in 2026, the Ministry of Digitalization explained to the Kieler Nachrichten: money that goes toward converting workplaces and further developing free-software solutions over the next 12 months. Given the annual savings, this outlay will pay for itself in less than a year. In the past, the state transferred millions to the US company Microsoft, primarily for the use of office software and other programs.

The department sees the departure from this "vendor lock-in" — the dependence on a single large provider — as a clear signal for greater independence and sustainable digitalization. The financial incentive now underscores that digital sovereignty can be not only a political buzzword but also an economic gain.

The numbers speak for themselves: outside the tax administration, almost 80 percent of workplaces in the state administration have already been switched to the open-source office software LibreOffice. Schrödter thus confirms a course that reduces technical and economic dependence on individual manufacturers. The consequence of the conversion was already evident recently, as Schrödter emphasized in an interview with c't. Regarding the status of Microsoft license cancellations, he said: "We are at almost 80, without the tax administration." For tax matters, "the state finance ministers have given themselves a clear timetable for the switch." Recently, the Christian Democrat also emphasized, according to the Südtiroler Wirtschaftszeitung, that the state has entered a marathon, not a sprint.

The re­main­ing 20 per­cent of work­places are cur­rently still de­pen­dent on Microsoft pro­grams such as Word or Excel, as there is a tech­ni­cal de­pen­dency on these pro­grams in cer­tain spe­cial­ized ap­pli­ca­tions. According to Schrödter, how­ever, the suc­ces­sive con­ver­sion of these re­main­ing com­put­ers is the stated goal.

Despite the savings and the almost completed migration in large parts of the administration, the opposition continues to criticize the quality of the conversion. SPD state parliament member Kianusch Stender pointed out to the Kieler Nachrichten: "It may be that on paper 80 percent of workplaces have been converted. But far fewer than 80 percent of employees can now work with them properly. Errors in the migration are still present." The initial difficulties in introducing the open-source programs have apparently led to ongoing frustration among some employees in certain areas.

The Green state parliament member Jan Kürschner also admitted in an interview with heise online that such a comprehensive conversion would not be without friction. But he emphasized the long-term nature of the project and the necessity of fundamentally rethinking administrative processes: "With the change, there is an opportunity to truly rethink the administration and free ourselves from old burdens. That is the great added value." If only a one-to-one conversion is made, "it might certainly stumble at one point or another." But those who truly optimize administrative processes will likely find in the end: "Open source is the better way."

The chal­lenge now is to re­solve the ini­tial mi­gra­tion prob­lems and ac­cep­tance dif­fi­cul­ties and to fur­ther de­velop the open-source so­lu­tions so that they fully meet the re­quire­ments of a mod­ern state ad­min­is­tra­tion. The sav­ings achieved give Schleswig-Holstein more fi­nan­cial lee­way for this.

...

Read the original on www.heise.de »

4 435 shares, 47 trendiness

GPTZero uncovers 50+ Hallucinations in ICLR 2026

Peer review is under siege. By speeding up the writing process, LLMs and other AI tools are overwhelming scholarly journals, conferences, and the peer review pipeline with hallucinated papers ("AI slop").

These aren't just issues for low-ranking journals with high acceptance rates. The GPTZero team used our Hallucination Check tool to scan 300 papers under review by the prestigious International Conference on Learning Representations (ICLR). We discovered that 50 submissions included at least one obvious hallucination that had not been previously reported.

Worryingly, each of these sub­mis­sions has al­ready been re­viewed by 3-5 peer ex­perts, most of whom missed the fake ci­ta­tion(s). This fail­ure sug­gests that some of these pa­pers might have been ac­cepted by ICLR with­out any in­ter­ven­tion. Some had av­er­age rat­ings of 8/10, mean­ing they would al­most cer­tainly have been pub­lished.

In the table below, we've included a specific human-verified hallucination our tool flagged in each paper. According to ICLR's editorial policy, even a single clear hallucination is an ethics violation that could lead to the paper's rejection. Given that we've only scanned 300 out of 20,000 submissions, we estimate that we will find hundreds of hallucinated papers in the coming days.

...

Read the original on gptzero.me »

5 353 shares, 35 trendiness

Helping AI have long-term memory

We strive to create an environment conducive to many different types of research across many different time scales and levels of risk.

We introduce the Titans architecture and the MIRAS framework, which allow AI models to work much faster and handle massive contexts by updating their core memory while they are actively running.

The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back at earlier inputs to prioritize relevant input data. However, computational cost increases drastically with sequence length, which limits the ability to scale Transformer-based models to extremely long contexts, such as those required for full-document understanding or genomic analysis. The research community explored various approaches for solutions, such as efficient linear recurrent neural networks (RNNs) and state space models (SSMs) like Mamba-2. These models offer fast, linear scaling by compressing context into a fixed-size state. However, this fixed-size compression cannot adequately capture the rich information in very long sequences.

In two new papers, Titans and MIRAS, we introduce an architecture and theoretical blueprint that combine the speed of RNNs with the accuracy of transformers. Titans is the specific architecture (the tool), and MIRAS is the theoretical framework (the blueprint) for generalizing these approaches. Together, they advance the concept of test-time memorization, the ability of an AI model to maintain long-term memory by incorporating more powerful "surprise" metrics (i.e., unexpected pieces of information) while the model is running and without dedicated offline retraining.

The MIRAS framework, as demonstrated by Titans, introduces a meaningful shift toward real-time adaptation. Instead of compressing information into a static state, this architecture actively learns and updates its own parameters as data streams in. This crucial mechanism enables the model to incorporate new, specific details into its core knowledge instantly.

Titans: Learning new con­text on the fly

An effective learning system requires distinct yet interconnected memory modules, mirroring the human brain's separation of short-term and long-term memory.

While attention mechanisms excel at precise, short-term memory, Titans introduces a novel neural long-term memory module that, unlike the fixed-size vector or matrix memory in traditional RNNs, acts as a deep neural network (specifically, a multi-layer perceptron). This memory module provides significantly higher expressive power, allowing the model to summarize large volumes of information without losing important context. The model isn't simply taking notes; it's understanding and synthesizing the entire story.

Crucially, Titans doesn't just passively store data. It actively learns how to recognize and retain important relationships and conceptual themes that connect tokens across the entire input. A key aspect of this ability is what we call the "surprise metric". In human psychology, we know we quickly and easily forget routine, expected events but remember things that break the pattern — unexpected, surprising, or highly emotional events.

Overview of the Titans (MAC) architecture. It uses a long-term memory to compress past data, then incorporates the summary into the context and passes it to attention. Attention can then decide whether it needs to attend to the summary of the past or not.

In the context of Titans, the "surprise metric" is the model detecting a large difference between what it currently remembers and what the new input is telling it.

Low surprise: If the new word is "cat" and the model's memory state already expects an animal word, the gradient (surprise) is low. It can safely skip memorizing the word "cat" in its permanent long-term state.

High surprise: If the model's memory state is summarizing a serious financial report, and the new input is a picture of a banana peel (the unexpected event), the gradient (surprise) will be very high. This signals that the new input is important or anomalous, and it must be prioritized for permanent storage in the long-term memory module.

The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information, keeping the overall process fast and efficient.

Titans refines this mechanism by incorporating two critical elements:

Momentum: The model considers both "momentary surprise" (the current input) and "past surprise" (the recent context flow). This ensures relevant subsequent information is also captured, even if those tokens are not individually surprising.

Forgetting (weight decay): To manage the finite capacity of the memory when dealing with extremely long sequences, Titans employs an adaptive weight decay mechanism. This acts as a forgetting gate, allowing the model to discard information that is no longer needed.
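To make these two elements concrete, here is a minimal sketch in Python (not code from either paper) of a Titans-style test-time update, under simplifying assumptions: the memory is a plain matrix mapping keys to values rather than the deep MLP described above, the objective is squared error, and the learning rate, momentum, and forgetting gates are fixed constants rather than learned, input-dependent values.

import numpy as np

def titans_memory_step(M, S, key, value, lr=0.1, momentum=0.9, forget=0.01):
    # One online update for one token. M is the associative memory, S is the
    # momentum-smoothed "surprise" state; both are (d, d) matrices in this sketch.
    pred = M @ key                 # what the memory currently expects for this key
    err = pred - value             # prediction error; its magnitude is the surprise
    grad = np.outer(err, key)      # gradient of 0.5*||M @ key - value||^2 w.r.t. M
    S = momentum * S - lr * grad   # blend momentary surprise with past surprise
    M = (1.0 - forget) * M + S     # forgetting gate decays old content, then writes
    return M, S

# Stream (key, value) pairs through the memory at inference time, with no retraining:
d = 8
M, S = np.zeros((d, d)), np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(1000):
    k, v = rng.normal(size=d), rng.normal(size=d)
    M, S = titans_memory_step(M, S, k, v)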

Every major breakthrough in sequence modeling — from modern transformers to the new, lightning-fast linear RNNs — is essentially the same thing under the hood: a highly complex associative memory module. Accordingly, what makes MIRAS both unique and practical is the way it views AI modeling. Instead of seeing diverse architectures, it sees different methods of solving the same problem: efficiently combining new information with old memories without letting the essential concepts be forgotten. MIRAS frames each model in terms of four design choices:

Memory architecture: The structure that stores information (e.g., a vector, matrix, or a deep multi-layer perceptron, like in Titans).

Attentional bias: The internal learning objective the model optimizes that determines what it prioritizes.

Retention gate: The memory regularizer. MIRAS reinterprets "forgetting mechanisms" as specific forms of regularization that balance new learning against retaining past knowledge.

Memory algorithm: The optimization algorithm used to update the memory.
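Written down schematically, the four choices amount to a small specification. The sketch below is purely illustrative, with field names of our own invention rather than from the papers, but it shows how MIRAS treats existing models as points in a single design space; the characterization of Titans in the comments follows the description given elsewhere in this post.

from dataclasses import dataclass
from typing import Callable

@dataclass
class MirasSpec:
    memory_architecture: str    # e.g. "vector", "matrix", or "mlp" (Titans uses an MLP)
    attentional_bias: Callable  # internal objective, e.g. squared error or Huber loss
    retention_gate: Callable    # regularizer balancing new learning against old memory
    memory_algorithm: str       # online update rule, e.g. "gradient descent + momentum"

# Under this framing, Titans corresponds roughly to an MLP memory, a squared-error
# attentional bias, weight-decay retention, and momentum gradient descent as the update:
titans_like = MirasSpec(
    memory_architecture="mlp",
    attentional_bias=lambda pred, target: 0.5 * (pred - target) ** 2,
    retention_gate=lambda old, new, decay=0.01: (1 - decay) * old + new,
    memory_algorithm="gradient descent + momentum",
)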

The MIRAS framework overview. In the MIRAS framework, we aim to learn an associative memory, mapping between keys and values. For each token, the memory module internally optimizes its inner attentional bias while using its retention gate to make sure that it does not deviate from its past state. The optimization is done with a gradient-based optimizer.

Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power. MIRAS transcends this limitation by providing a generative framework to explore a richer design space informed by the literature in optimization and statistics. This allows for the creation of novel architectures with non-Euclidean objectives and regularization.

Using MIRAS, we created three specific attention-free models:

YAAD: We designed this MIRAS variant to be less sensitive to major errors or "outliers" (like a single typo in a large document). It uses a gentler math penalty (Huber loss) for mistakes, so it doesn't overreact to one-off issues. This makes the model more robust when the input data is messy or inconsistent.

MONETA: This model explores the use of more complex and strict mathematical penalties (called generalized norms). It investigates whether using these more disciplined rules for both what the model attends to and what it forgets can lead to a more powerful and stable long-term memory system overall.

MEMORA: This model focuses on achieving the best possible memory stability by forcing its memory to act like a strict probability map. By using this constraint, it ensures that every time the memory state is updated, the changes are controlled and balanced. This guarantees a clean, stable process for integrating new information.
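As a small illustration of the YAAD idea, the comparison below (an example of our own, not taken from the paper) shows why a Huber-style penalty reacts less violently to a single outlier than plain squared error: past a threshold delta it grows linearly rather than quadratically, so one bad token cannot dominate the memory update.

import numpy as np

def squared_error(err):
    return 0.5 * err ** 2

def huber(err, delta=1.0):
    small = np.abs(err) <= delta
    return np.where(small, 0.5 * err ** 2, delta * (np.abs(err) - 0.5 * delta))

errors = np.array([0.1, 0.5, 1.0, 10.0])   # the last entry plays the role of an outlier
print(squared_error(errors))  # [ 0.005  0.125  0.5   50.0 ] -> the outlier dominates
print(huber(errors))          # [ 0.005  0.125  0.5    9.5 ] -> its influence is capped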

We rigorously compared Titans along with the MIRAS variants (YAAD, MONETA, MEMORA) against leading architectures, including Transformer++, Mamba-2, and Gated DeltaNet. We further validated versatility by testing Titans on genomic modeling (DNA) and time-series forecasting, proving the architecture generalizes effectively beyond text. Across both standard language modeling datasets (C4, WikiText) and zero-shot reasoning tasks (HellaSwag, PIQA), our models consistently demonstrated higher accuracy and lower perplexity (a measure of how surprised an LLM is when looking at a piece of text).

Ablation stud­ies clearly show that the depth of the mem­ory ar­chi­tec­ture is cru­cial. When com­par­ing long-term mem­ory mod­ules of the same size but dif­fer­ent depths, mod­ules with deeper mem­o­ries con­sis­tently achieve lower per­plex­ity in lan­guage mod­el­ing. Furthermore, they ex­hibit bet­ter scal­ing prop­er­ties, main­tain­ing per­for­mance as the se­quence length in­creases sig­nif­i­cantly.

The ef­fect of mem­ory depth on the per­plex­ity across 360M and 760M pa­ra­me­ter scales.

In lan­guage mod­el­ing and com­mon­sense rea­son­ing tasks, Titans ar­chi­tec­tures out­per­form state-of-the-art lin­ear re­cur­rent mod­els (such as Mamba-2 and Gated DeltaNet) and Transformer++ base­lines of com­pa­ra­ble sizes. The novel MIRAS vari­ants (MONETA, YAAD, MEMORA) also achieve im­proved per­for­mance com­pared to these base­lines, val­i­dat­ing the ben­e­fit of ex­plor­ing ro­bust, non-MSE op­ti­miza­tion mech­a­nisms. Importantly, these mod­els main­tain ef­fi­cient, par­al­leliz­able train­ing and fast lin­ear in­fer­ence speeds.

The most sig­nif­i­cant ad­van­tage of these new ar­chi­tec­tures is their abil­ity to han­dle ex­tremely long con­texts. This is high­lighted in the BABILong bench­mark, a task re­quir­ing rea­son­ing across facts dis­trib­uted in ex­tremely long doc­u­ments. In this chal­leng­ing set­ting, Titans out­per­forms all base­lines, in­clud­ing ex­tremely large mod­els like GPT-4, de­spite hav­ing many fewer pa­ra­me­ters. Titans fur­ther demon­strates the ca­pa­bil­ity to scale ef­fec­tively to con­text win­dow sizes larger than 2 mil­lion to­kens.

The in­tro­duc­tion of Titans and the MIRAS frame­work marks a sig­nif­i­cant ad­vance­ment in se­quence mod­el­ing. By em­ploy­ing deep neural net­works as mem­ory mod­ules that learn to mem­o­rize as data is com­ing in, these ap­proaches over­come the lim­i­ta­tions of fixed-size re­cur­rent states. Furthermore, MIRAS pro­vides a pow­er­ful the­o­ret­i­cal uni­fi­ca­tion, re­veal­ing the con­nec­tion be­tween on­line op­ti­miza­tion, as­so­cia­tive mem­ory, and ar­chi­tec­tural de­sign. By mov­ing be­yond the stan­dard Euclidean par­a­digm, this re­search opens the door to a new gen­er­a­tion of se­quence mod­els that com­bine the ef­fi­ciency of RNNs with the ex­pres­sive power needed for the era of long-con­text AI.

[Figure: a diagram of a neural architecture with three layers: Contextual Memory (learning), Core (in-context learning), and Persistent Memory (fixed weights).]

[Figure: line graph showing Titans (MAC)-FT maintains improved accuracy over increasing sequence lengths compared to GPT-4, Mamba-FT, and other models.]

[Figure: two line charts showing that LMM and MM models maintain lower perplexity than Mamba as sequence length increases across 360M and 760M parameter scales.]

...

Read the original on research.google »

6 329 shares, 15 trendiness

Second IC

In 2018 I made the first lith­o­graph­i­cally fab­ri­cated in­te­grated cir­cuits in my garage fab. I was a se­nior in high school when I made the Z1 am­pli­fier, and now I’m a se­nior in col­lege so there are some long over­due im­prove­ments to the am­a­teur sil­i­con process.

The Z1 had 6 transistors and was a great test chip to develop all the processes and equipment. The Z2 has 100 transistors on a 10µm polysilicon gate process — the same technology as Intel's first processor. My chip is a simple 10×10 array of transistors to test, characterize, and tweak the process, but this is a huge step closer to more advanced DIY computer chips. The Intel 4004 has 2,300 transistors and I've now made 1,200 on the same piece of silicon.

Previously, I made chips with a metal gate process. The alu­minum gate has a large work func­tion dif­fer­ence with the sil­i­con chan­nel be­neath it which re­sults in a high thresh­old volt­age (>10V). I used these metal gate tran­sis­tors in a few fun pro­jects like a gui­tar dis­tor­tion pedal and a ring os­cil­la­tor LED blinker but both of these re­quired one or two 9V bat­ter­ies to run the cir­cuit due to high Vth. By switch­ing to a poly­sil­i­con gate process, I get a ton of per­for­mance ben­e­fits (self aligned gate means lower over­lap ca­pac­i­tances) in­clud­ing a much lower Vth which makes these chips com­pat­i­ble with 2.5V and 3.3V logic lev­els. The new FETs have ex­cel­lent char­ac­ter­is­tics:

NMOS Electrical Properties:

Vth = 1.1 V

Vgs MAX = 8 V

Cgs =

I was par­tic­u­larly sur­prised by the su­per low leak­age cur­rent. This value goes up about 100x in am­bi­ent room light­ing.

Now we know that it’s pos­si­ble to make re­ally good tran­sis­tors with im­pure chem­i­cals, no clean­room, and home­made equip­ment. Of course, yield and process re­peata­bil­ity are di­min­ished. I’ll do more test­ing to col­lect data on the sta­tis­tics and vari­abil­ity of FET prop­er­ties but it’s look­ing good!

The chip is small, about one quar­ter the die area of my pre­vi­ous ICs (2.4mm^2) which makes it hard to probe. There’s a sim­ple 10×10 ar­ray of N-channel FETs on each chip which will give me a lot of char­ac­ter­i­za­tion data. Since it’s such a sim­ple de­sign, I was able to lay it out us­ing Photoshop. Columns of 10 tran­sis­tors share a com­mon gate con­nec­tion and each row is strung to­gether in se­ries with ad­ja­cent tran­sis­tors shar­ing a source/​drain ter­mi­nal. It’s sim­i­lar to NAND flash but I only did this to keep the metal pads large enough so I can rea­son­ably probe them, if every FET had 3 pads for it­self they would be too small.

It’s hard to con­vey the ex­cite­ment of see­ing a good FET curve dis­played on the curve tracer af­ter dip­ping a shard of rock into chem­i­cals all day.

A single 10µm NMOS transistor can be seen below, with slight misalignment in the metal layer (part of the left contact is uncovered). The red outline is polycrystalline silicon; blue is the source/drain.

So far I've made an opamp (Z1) and a memory-like array (Z2). More interesting circuits are definitely possible even with this low transistor density. The process needs some tweaking but now that I'm able to consistently make good quality transistors I should be able to design more complex digital and analog circuits. Testing each chip is very tedious so I am trying to automate the process and I'll post more data then. I've made 15 chips (1,500 transistors) and know there's at least one completely functional chip and at least two "mostly functional", meaning ~80% of the transistors work instead of 100%. No proper yield data yet. The most common defect is a drain or source shorted to the bulk silicon channel, not a leaky or shorted gate like on my Z1 process.

I said be­fore that the gate used to be made out of alu­minum and now it’s sil­i­con which makes the chips work a lot bet­ter. Silicon comes in three va­ri­eties that we care about: amor­phous, poly­crys­talline, and monocrys­talline. From left to right, these be­come more elec­tri­cally con­duc­tive but also much harder to de­posit. In fact, monocrys­talline Si can’t be de­posited, you can only grow it in con­tact with an­other mono-Si layer as a seed (epitaxy). Since the gate must be de­posited on top of an in­su­lat­ing di­elec­tric, poly is the best we can do. We can heav­ily dope the poly­sil­i­con any­way to make it more con­duc­tive.

A typ­i­cal self-aligned poly­sil­i­con gate process re­quires silane, a toxic and ex­plo­sive gas, to de­posit poly­crys­talline sil­i­con lay­ers. It may also be pos­si­ble by sput­ter­ing or evap­o­rat­ing amor­phous sil­i­con and an­neal­ing with a laser. A ma­jor theme of this DIY sil­i­con process is to cir­cum­vent ex­pen­sive, dif­fi­cult, or dan­ger­ous steps. So, I came up with a mod­i­fied process flow. It’s a vari­a­tion on the stan­dard self-aligned meth­ods to al­low dop­ing via high tem­per­a­ture dif­fu­sion rather than ion im­plan­ta­tion. The ef­fect is that I’m able to buy a sil­i­con wafer with the poly­sil­i­con al­ready de­posited on it from the fac­tory and pat­tern it to make tran­sis­tors in­stead of putting my own poly­sil­i­con down halfway through the process. This is a nice short term workaround but it would be best to de­sign a poly­sil­i­con de­po­si­tion process us­ing the laser an­neal method men­tioned above.

Wafers are available with all kinds of materials deposited on them already, so I just had to find one with a thin layer of SiO2 (gate oxide, ~10nm) followed by a thicker polysilicon (300nm). I found a lot of twenty-five 200mm (EPI, prime, [1-0-0], p-type) wafers on eBay for $45 which is essentially a lifetime supply, so email me if you want one. The gate oxide is the most fragile layer and requires the most care during fabrication. Since I bought the wafer with a nice high quality oxide on it already that was capped off and kept clean by the thick polysilicon layer, I was able to eliminate all the aggressive cleaning chemicals (sulfuric acid, etc) from the process and still make great transistors. Minimal process chemicals and tools are listed below.

Chemicals used in home poly-gate process:

-Water

-Alcohol

-Acetone

-Phosphoric acid

-Photoresist

-Developer (2% KOH)

-N type dopant (filmtronics P509)

-HF (1%) or CF4/CHF3 RIE

-HNO3 for poly etch or SF6 RIE

Equipment used in home poly-gate process:

-Hotplate

-Tube fur­nace

-Lithography ap­pa­ra­tus

-Microscope

-Vacuum cham­ber to de­posit metal

Z2 "gate first" process (similar to the standard self-aligned process but without a field oxide):

I snapped one of the test chips in half (functional Z2 but with bad layer align­ment and thin metal, about 300nm) and put it in my SEM for a cross sec­tion:

Find the dust par­ti­cle in the red cir­cle be­low, use that to get ori­ented in the com­ing cross sec­tion views.

Because I bought the wafer al­ready with gate ox­ide and poly­sil­i­con on it, I can’t grow a field ox­ide. These thick ox­ide lay­ers are typ­i­cally used to mask dopants and re­quire a long high tem­per­a­ture step which would ox­i­dize all of my poly and there would be none re­main­ing. So, my mod­i­fied process uses an ad­di­tional mask­ing step (the gate” mask is typ­i­cally not found in a self-aligned process) that al­lows me to use the poly­sil­i­con it­self as a dopant mask and hard-baked pho­tore­sist as the field di­elec­tric. This al­ter­na­tive pro­cess­ing re­sults in the stepped struc­ture you can see in the or­ange re­gion on the NMOS cross sec­tion above. This process sub­tlety is men­tioned here, read this twit­ter thread.

This process is­n’t ideal and I want to make some changes so it’s CMOS com­pat­i­ble but it sim­pli­fies fab­ri­ca­tion and makes it pos­si­ble with a min­i­mal set of tools. The 1µm di­elec­tric layer (orange) would ide­ally be CVD SiO2 (it’s pos­si­ble to build a TEOS ox­ide re­ac­tor at home) but I used a pho­tore­sist in­stead. Most pho­tore­sists can be baked around 250°C to form a hard per­ma­nent di­elec­tric layer that is an easy al­ter­na­tive to CVD or PECVD ox­ide. A spin-on-glass/​sol-gel could also be used here. SiO2 etch­ing is done with a buffered HF so­lu­tion made from rust stain re­mover or RIE.

Thanks for fol­low­ing my work and feel free to con­tact me with your thoughts!

...

Read the original on sam.zeloof.xyz »

7 240 shares, 10 trendiness

Trains cancelled over fake bridge collapse image

Trains were halted after a suspected AI-generated picture that seemed to show major damage to a bridge appeared on social media following an earthquake. The tremor, which struck on Wednesday night, was felt across Lancashire and the southern Lake District. Network Rail said it was made aware of the image, which appeared to show major damage to Carlisle Bridge in Lancaster, at 00:30 GMT and stopped rail services across the bridge while safety inspections were carried out. A BBC journalist ran the image through an AI chatbot which identified key spots that may have been manipulated.

Network Rail said the railway line was fully reopened at around 02:00 GMT and it has urged people to "think about the serious impact it could have" before creating or sharing hoax images. "The disruption caused by the creation and sharing of hoax images and videos like this creates a completely unnecessary delay to passengers at a cost to the taxpayer," a spokesperson said. "It adds to the high workload of our frontline teams, who work extremely hard to keep the railway running smoothly," the spokesperson said. "The safety of rail passengers and staff is our number one priority and we will always take any safety concerns seriously." The British Transport Police said it was "made aware" of the situation but there was no ongoing investigation into the incident. Network Rail said 32 services including passenger and freight trains were delayed because of the hoax. A spokesperson for the rail provider said a mix of passenger and freight trains would have been impacted. They said some of them would have been directly stopped or slowed while it checked the lines, but a lot of the trains were delayed as a result of earlier services still being in their path. The spokesperson said many of them would have been local but because of the length of the West Coast Main Line some trains were delayed as far north as Scotland.

Railway expert Tony Miles said that due to the timing of the incident, very few passengers will have been impacted by the hoax, as the services passing through at that time were primarily freight and sleeper trains. "They generally go slow so as not to disturb the passengers trying to sleep - this means they have a bit of leeway to go faster and make up time if they encounter a delay," he said. "It's more the fact that Network Rail will have had to mobilise a team to go and check the bridge, which could impact their work for days." He urged people to consider the impact hoaxes like this could have on real people. "If they actually did delay a train it could have impacted someone who had to get to a medical appointment, or a flight or a funeral. It may seem like a game, but anyone who's thinking of doing this should consider how it will impact real people."

...

Read the original on www.bbc.com »

8 239 shares, 38 trendiness

I failed to recreate the 1996 Space Jam Website with Claude

Link to the Hacker News post. Thanks everybody for all the engagement!

Can Claude Recreate the 1996 Space Jam Website? No. Or at least not with my prompt­ing skills. Note: please help, be­cause I’d like to pre­serve this web­site for­ever and there’s no other way to do it be­sides get­ting Claude to recre­ate it from a screen­shot. Believe me, I’m an en­gi­neer­ing man­ager with a com­puter sci­ence de­gree. Please please please help 😞

Final note: I use he” to re­fer to Claude, which Josh finds ridicu­lous.

For those who don’t know, Warner Bros keeps this anachro­nis­tic web­site on­line that was re­leased in 1996 to ac­com­pany the Space Jam movie.

It's a classic example of early web era design. Simple, colorful, and sparks joy. We're going to find out if we can get Claude to recreate it using only a screenshot and all of the assets the website uses.

To track Claude’s in­ner mono­logue and ac­tual API calls, I set up a man-in-the-mid­dle proxy to cap­ture the full con­ver­sa­tion be­tween Claude Code and Anthropic’s API. This logs every­thing: user prompts, Claude’s re­sponses, tool in­vo­ca­tions (Read, Write, Bash com­mands), etc. Each at­tempt gen­er­ates a traf­fic.log file with the raw API traf­fic, which I then parse for eas­ier analy­sis.
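The post doesn't say which proxy tool was used, so the following is only a guess at what such a setup could look like, sketched as a mitmproxy addon: it appends every request and response body sent to Anthropic's API host to a traffic.log file for later parsing. The host filter and file name are assumptions.

# Hypothetical logging addon; run with: mitmdump -s log_traffic.py
from mitmproxy import http

LOG_PATH = "traffic.log"

def _append(text: str) -> None:
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(text + "\n")

class TrafficLogger:
    def request(self, flow: http.HTTPFlow) -> None:
        if "anthropic.com" in flow.request.pretty_host:
            _append(f">>> {flow.request.method} {flow.request.pretty_url}")
            _append(flow.request.get_text() or "")

    def response(self, flow: http.HTTPFlow) -> None:
        if "anthropic.com" in flow.request.pretty_host:
            _append(f"<<< {flow.response.status_code}")
            _append(flow.response.get_text() or "")

addons = [TrafficLogger()]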

Edit: I used Opus 4.1 for this investigation. Thanks to anorwell for pointing out I forgot to add the model.

The Space Jam web­site is sim­ple: a sin­gle HTML page, , and a tiling starfield GIF back­ground. The en­tire page uses ab­solute po­si­tion­ing with pixel spe­cific left/​top val­ues. The to­tal pay­load is un­der 200KB.

Correction: The orig­i­nal site is built us­ing ta­bles. Thanks to wilsmex and sqir­cles for call­ing that out!

Given that Claude has all of the assets + screenshots of the website, I assume this should be relatively boring. He'll nail it, and we'll move on to something much more interesting. A mildly cute example of agentic HTML generation…

I am giving you:

1. A full screenshot of the Space Jam 1996 landing page.
2. A directory of raw image assets extracted from the original site.

Your job is to recre­ate the land­ing page as faith­fully as pos­si­ble, match­ing the screen­shot ex­actly.

What he pro­duces is ac­tu­ally not that bad. But it’s not right. From a dis­tance, the lay­out kind of re­sem­bled the orig­i­nal: plan­ets arranged in an el­lipse around the logo, lit­tle yel­low la­bels where the but­tons go. But, the or­bital pat­tern was off, al­most di­a­mond shaped and sym­met­ri­cal.

Claude, how­ever, was thrilled with him­self.

Further, he brags that he had:

Digging through the logs I found it in­ter­est­ing that Claude ac­tu­ally did no­tice the plan­ets were arranged in a de­lib­er­ate way, so much so that it’s called out twice in both the screen­shot analy­sis and CSS con­struc­tion, but he failed to recre­ate the pat­tern faith­fully.

Okay, fine. Maybe he needed a nudge to get the or­bit right. So for my next at­tempt, I try to push him to fo­cus on un­der­stand­ing the or­bital pat­tern and I ask him to ex­plain his rea­son­ing be­fore gen­er­at­ing his HTML. I was hop­ing to un­der­stand the delta be­tween what is there and what he thought he was see­ing. In my prompt, I out­line a set of sec­tions for him to con­sider. Each one of these sec­tions also in­cludes a num­ber of sub-ques­tions, which I won’t in­clude here for the sake of brevity. This made things sig­nif­i­cantly worse.

Please follow this structure exactly in your reasoning explanations:
1. Perception Analysis
2. Spatial Interpretation
3. Reconstruction Plan

Claude didn't ignore my instructions (not always a given) and things seemed promising until I realized he was ignoring his own analysis during the HTML generation phase. He would say things like "the orbit radius appears to be 220 pixels" and then place the planets directly next to the logo. His self critique was surprisingly accurate. He correctly identifies the areas where he was wrong with decent detail, but somehow those observations never make it into subsequent iterations.

In my next attempt I interrogate Claude with a set of onion-peeling questions: "Can you tell me the EXACT pixel coordinate where the 'PLANET B-BALL' text starts?"

"No, I cannot measure exact pixel coordinates. I can only make visual estimations."

I asked him a few more ques­tions:

Can you extract exact pixel coordinates? "No."

Can you measure exact distances? "No."

Confidence you can get within 5 pixels? "15 out of 100."

Oh. This explains a lot. But it raises a bigger question to me: "Why can't he measure?" It's a screenshot. The pixels are right there. Claude clearly understood the structure, but he couldn't recreate it with any precision. Also, I'm not even sure I trust Claude. Either way, this (naively) surprised me, so I canceled coffee with my friends in order to spend the afternoon trying to give my guy more tools.

Before I start I execute one more attempt and ask him: "Would you bet $1000 on your HTML matching this screenshot exactly?"

Maybe he just needs a lit­tle help.

In one of Claude's responses from Part 1, he tells me that he would be more effective if he had "access to exact pixel measurements," so I build a few tools to make it impossible for Claude to mis-measure anything:

Grid over­lays and a script to gen­er­ate grid over­lays on screen­shots

color-diff com­par­i­son (this ig­nores the back­ground which was giv­ing Claude false pos­i­tives be­cause of how much black there was)

Tool to take screen­shots of his in­dex.html file to com­pare it­er­a­tively with the orig­i­nal
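I don't have the author's actual scripts, but a grid-overlay generator along these lines (a rough sketch using Pillow, with made-up file names and defaults) captures the idea: draw labeled gridlines every N pixels on top of the reference screenshot so the model can, in principle, read coordinates straight off the image.

# Hypothetical grid-overlay script, e.g.: python3 grid_overlay.py reference.png 50
import sys
from PIL import Image, ImageDraw

def overlay_grid(in_path: str, spacing: int, out_path: str) -> None:
    img = Image.open(in_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for x in range(0, w, spacing):
        draw.line([(x, 0), (x, h)], fill=(255, 0, 0), width=1)
        draw.text((x + 2, 2), str(x), fill=(255, 255, 0))   # label the column
    for y in range(0, h, spacing):
        draw.line([(0, y), (w, y)], fill=(255, 0, 0), width=1)
        draw.text((2, y + 2), str(y), fill=(255, 255, 0))   # label the row
    img.save(out_path)

if __name__ == "__main__":
    path, spacing = sys.argv[1], int(sys.argv[2])
    overlay_grid(path, spacing, f"grid_{spacing}px.png")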

Here are three grid ver­sions Claude gen­er­ated which I am in­clud­ing be­cause I find them aes­thet­i­cally pleas­ing.

I put to­gether a new prompt: same screen­shot, same as­sets folder. I even in­cluded some grid screen­shots so Claude would­n’t have to re­mem­ber to do it him­self. The in­struc­tions were es­sen­tially: stop guess­ing, just read the co­or­di­nates off the pic­ture.

Claude’s new at­tempt still was­n’t cor­rect. The or­bit was bet­ter: closer to the orig­i­nal but some­how com­pressed and smoosh­ing (a tech­ni­cal word) into the Space Jam logo. If I squint, I could con­vince my­self that there was at least a hint that he’d stopped free­hand­ing and started us­ing some­thing like mea­sure­ments.

When I dug into the logs, it ap­peared that Claude ac­tu­ally did use the grids. He pulled out these num­bers:

and so on down the list

In one it­er­a­tion, Claude built him­self a helper: com­pare.html a lit­tle side by side viewer so he could look at his screen­shot and the ref­er­ence to­gether. It did­n’t help him at all, but my God was he con­vinced it did.

The ac­tual pro­gres­sion tells a dif­fer­ent story. Going through the it­er­a­tions:

Iteration 1 (50px grid): he no­tices things are off and makes a few con­ser­v­a­tive tweaks — moves Planet B-Ball from (850, 165) to (800, 120), shifts Lunar Tunes from (925, 195) to (950, 200). These are 15 - 50 pixel changes, tiny nudges.

Iteration 2 (25px grid): he de­cides he needs more pre­cise po­si­tion­ing” and shifts the en­tire or­bit in­ward by ~20 pix­els. Planets go from roughly a 250px ra­dius to ~230px. He is now con­fi­dently con­verg­ing on the wrong an­swer.

Iteration 3 (5px grid): he shuf­fles around a lot of deck chairs in the name of mi­cro ad­just­ments. 5 - 10 pixel tweaks: Planet B-Ball from (800, 120) to (805, 125), that kind of thing.

Iteration 4: more fine-tuning based on "exact grid measurements." Site Map drifts from (755, 460) to (750, 455). The numbers look careful; the layout does not lol.

"Now the positioning should be much more accurate!"

Across all five it­er­a­tions, he’s moved plan­ets maybe 50 - 75 pix­els to­tal when they needed to move 150 - 200 pix­els out­ward. The or­bital ra­dius never ex­pands be­yond ~250px when it should be closer to 350 -400px. The plan­ets stay trapped in this ever com­press­ing or­bit while Claude dances around telling a story about steady con­ver­gence:

He is ab­solutely not get­ting closer.

Before giv­ing up on tools en­tirely, I had one more idea. I cre­ated a tool that would split the screen­shot into six re­gions. Claude could then com­pare his out­put to the ref­er­ence re­gion by re­gion.

The prompt was get­ting des­per­ate at this point lol:

## INITIAL ANALYSIS - DO THIS FIRST
Before creating index.html, study the reference in detail using zoom inspection:
python3 split.py reference.png

This cre­ates 6 files show­ing every de­tail

Claude ran the tool and the logs look promis­ing.

Examining region 1 (top-left): I can see Jam Central and Press Box Shuttle clearly at 2x zoom
Examining region 2 (top-right): Planet B-Ball and Lunar Tunes are visible with precise spacing
Examining region 4 (middle-right): Jump Station appears approximately 280px from center

Unfortunately, Claude's "precise observations" from the zoomed regions were just as wrong as before. He'd look at a planet and confidently declare it was at position (750, 320) when it was actually at (850, 380). The split did not appear to help him measure or get a more accurate picture of planet spacing.

What makes this phase ~~depressing~~ interesting is that the tools, despite invalidating his result, seem to lock in the wrong answer. Once he's picked an internal picture of the layout ("the orbit radius is about 230px"), the grids and the compare viewer don't correct it. They just help him make more confident micro moves around his invented orbit. Based on these attempts, it seems that the issue compounds when Claude receives his own screenshots as feedback.

My very rough read of Anthropic's "Language Models (Mostly) Know What They Know" is that models can become overconfident when evaluating their own outputs, in part because they cannot distinguish the tokens they generated from tokens provided by someone else / an external source. So, when Claude is asked to judge or revise content that originated from itself, it treats that material as if it were "ground truth."

This kind of fits what I'm seeing in the logs. Once Claude's version existed, every grid overlay, every comparison step, every "precise" adjustment was anchored to his layout, not the real one. At the end of all this, I'm left with the irritating fact that, like many engineers, he's wrong and he thinks he's right.

What this teaches me is that Claude is ac­tu­ally kind of a liar, or at least Claude is con­fused. However, for the drama, I’ll as­sume Claude is a liar.

At this point I had tried grids, comparisons, step-by-step corrections, letting Claude narrate his thought process, and every combination of tools I could bolt onto the interaction. None of it seemed to help nor explain why his single digit precision updates were disembodied from the actual layout.

Before getting to the final experiment, here's the mental model I was forming about Claude's vision. The vision encoder converts each 16 x 16 block of the image into a single token. So instead of geometry, he sees semantics: "near," "above," "roughly circular." When he says "approximately 220px radius," he's not measuring anything. He's describing the idea of a radius. He excels at semantic understanding ("this is a planet," "these form a circle") but lacks the tools for working with visual media. It explains why his perception is good. He always knows a planet is a planet but the execution is never precise.

I'm getting frustrated and I haven't left my apartment in days so I turn to some research. GPTing around, I found "An Image is Worth 16x16 Words". I have no idea if Claude uses this exact architecture or anything close to it, but the intuition seemed right. The paper (after I made ChatGPT explain it to me) explains that the image is chopped into fixed patches, each patch gets compressed into a single embedding, and whatever details lived inside those pixels vanish.

Assuming this ap­plies, a lot of the fail­ures sud­denly make sense. Most plan­ets on the Space Jam screen­shot are maybe 40 - 50 pix­els wide. That’s two or three patches. A three patch planet is ba­si­cally a blob to him. Claude knows it’s a planet, but not much else. The or­bit ra­dius only spans a cou­ple dozen patches to­tal. Tiny changes in dis­tance barely show up in the patch em­bed­dings.

But this raised a new and final idea. If the 40px planets turn into fuzzy tokens, what if I make them bigger? What if I give Claude a 2x zoomed screenshot? Would each planet span 10 - 15 patches instead of two or three? Maybe this gives him a crisper understanding of the spatial relationships and a better chance at success.

I deleted most of the prompt and tools and just gave Claude this 2x’d screen­shot

CRITICAL: remember that the zoomed image is zoomed in to 200%. When you're creating your version, maintain proper proportions, meaning that your version should keep the same relative spacing as if it were just 100%, not 200%.

but he does not lis­ten

My best explanation for all of this is that Claude was working with a very coarse version of the screenshot. Considering the 16 x 16 patch thing from earlier, it sort of helps me understand what might be happening: he could describe the layout, but the fine grained stuff wasn't in his representation. And that weird tension I kept seeing, where he could describe the layout correctly but couldn't reproduce it, also looks different under that lens. His explanations were always based on the concepts he got from the image ("this planet is above this one," "the cluster is to the left"), but the actual HTML had to be grounded in geometry he didn't have. So the narration sounded right while the code drifted off.

After these zoom at­tempts, I did­n’t have any new moves left. I was be­ing evicted. The bank re­po’d my car. So I wrapped it there.

Look, I still need this Space Jam web­site recre­ated. If you can get Claude to faith­fully recre­ate the Space Jam 1996 web­site from just a screen­shot and the as­sets folder, I’d love to hear about it.

Based on my fail­ures, here are some ap­proaches I did­n’t try:

Break the screen into quad­rants, get each quad­rant right in­de­pen­dently, then merge. Maybe Claude can han­dle spa­tial pre­ci­sion bet­ter in smaller chunks.

Maybe there's some magic prompt engineering that unlocks spatial reasoning. "You are a CSS grid with perfect absolute positioning knowledge…" (I'm skeptical but worth trying).

Providing Claude with a zoom tool and an un­der­stand­ing of how to use the screen­shots might be an ef­fec­tive path.

For now, this task stands un­de­feated. A mon­u­ment to 1996 web de­sign and a hum­bling re­minder that some­times the sim­plest tasks are the hard­est. That or­bital pat­tern of plan­ets, thrown to­gether by some Warner Brothers web­mas­ter 28 years ago, has be­come an in­ad­ver­tent bench­mark for Claude.

Until then, the Space Jam web­site re­mains proof that not every­thing old is ob­so­lete. Some things are just ir­re­pro­ducibly per­fect.

...

Read the original on j0nah.com »

9 215 shares, 8 trendiness

Bikeshedding, or why I want to build a laptop

I’m sure I’m not the only one who feels Apple’s qual­ity is de­grad­ing. I spend 10 hours a day on my lap­top and would spend any amount of money within rea­son for a bet­ter one. However, every­thing comes with trade­offs.

My dream lap­top is sim­ple, a MacBook with Linux, sup­ported by a com­pany that is user aligned.

The first idea is sim­ple, put Linux on a MacBook.

Asahi Linux is a good idea, how­ever, it won’t ever be good. Apple is putting more and more stuff into closed source mi­cro­con­trollers that have no doc­u­men­ta­tion. Like jail­break­ing, it may start off strong when peo­ple are ex­cited, but sup­port for the next gen­er­a­tion and that last bit of pol­ish won’t ever get there.

While it got some im­pres­sive stuff like psy­choa­coustic bass (works on other ma­chines too, I in­stalled this on my ZBook), it lacks DP Alt Mode, mean­ing you can’t plug in a USB-C mon­i­tor. I don’t fault the Asahi peo­ple, Apple uses cus­tom un­doc­u­mented hard­ware to man­age the USB ports, and re­vers­ing muxes seems bor­ing.

Additionally, like on al­most all Linux lap­tops, the power man­age­ment is bad. And even worse, there’s 0 doc­u­men­ta­tion from Apple on how to fix it, so de­spite it be­ing su­per good on ma­cOS, it’s one of the more an­noy­ing lap­tops to try to fix on Linux. At least if you have a lap­top with AMD or Intel there’s some docs on power states.

So with Apple out, we have to look for al­ter­na­tives. I like so much about Framework as a com­pany, straight­for­ward, open source ethos, but they aren’t build­ing the prod­uct I want.

I don’t care one bit about upgrad­abil­ity or cus­tomiz­abil­ity. After a year or two, I’m happy to throw it out and buy a new one. It’s not like upgrad­abil­ity is a bad thing, but it usu­ally comes with trade­offs to weight and power draw, and I’d rather it all be in one solid pack­age glued to­gether. And I don’t like cus­tomiz­abil­ity be­cause I like when all the test­ing and pol­ish work is put into one con­fig­u­ra­tion.

Perhaps the Framework 16 will im­press me; I should­n’t judge un­til I use it. But I see things like a re­quest for a touch­pad sin­gle unit so there’s not some ran­dom pieces of plas­tic dig­ging into my wrist just in case I want to move my touch­pad left or right. And I read some com­plaints about the rigid­ity, how can it be rigid if the mod­ules are at­tached with mag­nets? Engineering is all about trade-offs, and the trade-off I’d pre­fer is 0 upgrad­abil­ity or cus­tomiz­abil­ity in ex­change for less weight and more pol­ish.

The Framework 16 also has a Strix Point in­stead of a Strix Halo, and I hear the power draw is­n’t too much bet­ter on Point. Coming from an M3 Max, the Strix Halo is just barely ac­cept­able per­for­mance wise, I also own an Intel Core 7 155H and AMD Hawk Point. Those are not what I con­sider okay in a lap­top.

I’m typ­ing this blog on a HP ZBook Ultra G1a 14. Question to HP, who names this crap? Why do these com­pa­nies in­sist on hav­ing the most con­fus­ing prod­uct line­ups and names.

Are ZBooks good or do I want an OmniBook or ProBook? Within ZBook, is Ultra or Fury bet­ter? Do I want a G1a or a G1i? Oh you sell ZBook Firefly G11, I liked that TV show, is that one good?

Wait wait wait OMEN MAX 16z-ak000 has a lot of cap­i­tal let­ters, that one must be the best, right? But there’s also an HP EliteBook, Elite sounds like the best, do I still want a ZBook?

These are all real prod­ucts on HPs lap­top page.

Consumer electronics naming is very simple. Make a good product with a simple name. "iPhone", "comma", "Z Fold". Then every year or two, add one to the number of that product. If it's a small refresh, you can add a letter after the number. "2 3 3X 4" "4 4s 5 5s 6 …" "2 3 4 5 6 7"

Why is this so hard for com­pa­nies like HP?

If I made a laptop, it would come in one configuration. Call it the hackbook.

Highest end Strix Halo part, which is the best mobile(ish) chip you can get outside Apple. 16 core Zen 5 CPU, 40 core RDNA 3.5 GPU. 64GB of LPDDR5X RAM @ 256 GB/s. A stunning 16 inch OLED screen that's the full size of the laptop. A 100 Wh battery, the maximum size legal on planes. Great sound with out of the box tuned psychoacoustic bass. Aluminium unibody with just one bit of laser etched branding where the Apple is, no other writing on the laptop. A classy keyboard without weird logos and random lights. An awesome touchpad; the ZBook touchpad is actually fine, it's not just Apple with good ones anymore.

Crazy fast boot times, amaz­ing power man­age­ment. Linux can be tuned so well if you care, and this tun­ing will be in­stalled on every one we sell. We sell one con­fig­u­ra­tion to all the best de­vel­op­ers in the world who want to not use a MacBook any­more. Apple will not un­der­stand what they had un­til they lose it, the only rea­son any­thing works on Mac at all is be­cause there’s 100,000 amaz­ing de­vel­op­ers who use these ma­chines every day; they put some work into mak­ing their house nice.

And when it’s time to up­grade in one or two years, we’ll have the hack­book two ready for you. The num­ber goes up by one, and you know which one to buy. For some rea­son peo­ple say I get dis­tracted, but comma has been around for ten years fol­low­ing this play­book; we now have a comma four for you. If I built one lap­top, I’d keep build­ing a lap­top for 10 years. With Apple’s de­cline and our rise, the hack­book four will be the first one that’s clearly bet­ter than a MacBook.

I’m writ­ing this blog post in hopes I don’t ac­tu­ally have to do this. I’m not re­ally go­ing to, there’s so many other things to do. This is just whin­ing and bikeshed­ding. Can some­body please build a good MacBook re­place­ment and make it a Schelling point every­one will switch to so I don’t have to think about this any­more?

...

Read the original on geohot.github.io »

10 203 shares, 10 trendiness

Discovering the indieweb with calm tech

Blog HomeBlog Quest and StreetPass help you dis­cover the in­de­pen­dent web

When so­cial me­dia first en­tered my life, it came with a promise of con­nec­tion. Facebook con­nected col­lege-aged adults in a way that was pre­vi­ously im­pos­si­ble, help­ing to shape our dig­i­tal gen­er­a­tion. Social me­dia was our su­per-power and we wielded it to great ef­fect.

Yet social media today is a noisy, needy mental health hazard. They push distracting notifications, constantly begging us to "like and subscribe", and trying to trap us in endless scrolling. They have become sirens that lure us onto their ad-infested shores with their saccharine promise of dopamine.

How can we de­feat these mon­sters that have in­vaded deep into our world, while still stay­ing con­nected?

A cou­ple weeks ago I stum­bled into a great browser ex­ten­sion, StreetPass for Mastodon. The cre­ator, tvler, built it to help peo­ple find each other on Mastodon. StreetPass au­todis­cov­ers Mastodon ver­i­fi­ca­tion links as you browse the web, build­ing a col­lec­tion of Mastodon ac­counts from the blogs and per­sonal web­sites you’ve en­coun­tered.

StreetPass is a beautiful example of calm technology. When StreetPass finds Mastodon profiles it doesn't draw your attention with a notification; it quietly adds the profile to a list, knowing you'll check in when you're ready.

StreetPass rec­og­nizes that there’s no need for an im­me­di­ate call to ac­tion. Instead it al­lows the user to fo­cus on their brows­ing, en­rich­ing their ex­pe­ri­ence in the back­ground. The user en­gages with StreetPass when they are ready, and on their own terms.

StreetPass is open source and avail­able for Firefox, Chrome, and Safari.

Inspired by StreetPass, I ap­plied this tech­nique to RSS feed dis­cov­ery.

Blog Quest is a web browser ex­ten­sion that helps you dis­cover and sub­scribe to blogs. Blog Quest checks each page for auto-dis­cov­er­able RSS and Atom feeds (using rel=“al­ter­nate” links) and qui­etly col­lects them in the back­ground. When you’re ready to ex­plore the col­lected feeds, open the ex­ten­sion’s drop-down win­dow.
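The extension itself is browser JavaScript, but the discovery step it performs is simple enough to sketch in a few lines of Python (illustrative only, not the extension's actual code): fetch a page, look for link elements whose rel is "alternate" with a feed MIME type, and resolve each href against the page URL.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    # Collects the href of every link tag with rel="alternate" and a feed MIME type.
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower()
        mime = (a.get("type") or "").lower()
        if rel == "alternate" and mime in FEED_TYPES:
            self.feeds.append(urljoin(self.base_url, a.get("href") or ""))

def discover_feeds(url: str) -> list:
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = FeedLinkParser(url)
    parser.feed(html)
    return parser.feeds

# Example: discover_feeds("https://alexsci.com/blog/") would return that blog's feed URLs.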

The ex­ten­sion in­te­grates with sev­eral feed read­ers, mak­ing sub­scrip­tion man­age­ment nearly ef­fort­less.

Blog Quest is avail­able for both Firefox and Chrome. The pro­ject is open source and I en­cour­age you to build your own vari­ants.

I re­ject the dead Internet the­ory: I see a vi­brant Internet full of hu­mans shar­ing their ex­pe­ri­ences and seek­ing con­nec­tion. Degradation of the en­gage­ment-dri­ven web is well un­der­way, ac­cel­er­ated by AI slop. But the in­de­pen­dent web works on a dif­fer­ent in­cen­tive struc­ture and is re­sis­tant to this ef­fect. Humans in­her­ently cre­ate, con­nect, and share: we al­ways have and we al­ways will. If you choose soft­ware that works in your in­ter­est you’ll find that it’s pos­si­ble to make mean­ing­ful on­line con­nec­tions with­out men­tal haz­ard.

Check out StreetPass and Blog Quest to dis­cover a de­cen­tral­ized, in­de­pen­dent Internet that puts you in con­trol.

Hello! I'm Robert Alexander, a DevSecOps consultant available for contract work. This blog features some of my work and thoughts on software, the cloud, and security. You can subscribe to my posts with your favorite RSS client and follow me on Mastodon.

Statements are my own and do not represent the positions or opinions of my employer.

...

Read the original on alexsci.com »


10HN is also available as an iOS App

If you visit 10HN only rarely, check out the best articles from the past week.

If you like 10HN please leave feedback and share

Visit pancik.com for more.