10 interesting stories served every morning and every evening.




1 1,835 shares, 73 trendiness

F-Droid - Free and Open Source Android App Repository

During our talks with F-Droid users at FOSDEM26 we were baffled to learn that most were relieved that Google had canceled its plans to lock down Android.

Why baffled? Because no such thing actually happened; the plans announced last August are still scheduled to take place. We are watching a battle of PR campaigns, where whoever has the last post out remains in the media memory as the truth, and having journalists just copy/paste Google posts serves no one.

But “Google said…” Said what? That there’s a magical “advanced flow”? Did you see it? Did anyone experience it? When is it scheduled to be released? Was it part of Android 16 QPR2 in December? Of 16 QPR3 Beta 2.1 last week? Of Android 17 Beta 1? No? That’s the issue… As time marches on, people are left with the impression that everything was done and fixed, and that Google wasn’t “evil” after all, this time, yay!

While we all have bad memories of “banners” as the dreaded ad-delivery medium of the Internet, after FOSDEM we decided that we had to raise the issue again and keep everyone who cares about Android as an open platform informed that we are running out of time before Google becomes the gatekeeper of all users’ devices.

Hence, starting today, the website and our clients, with the updates of F-Droid and F-Droid Basic, feature a banner that reminds everyone how little time we have and how to voice their concerns to whatever local authority is able to understand the dangers of the path Android is being led down.

We are not alone in our fight: IzzyOnDroid added a banner too, more F-Droid clients will add the warning banner soon, and other app downloaders, like Obtainium, already have an in-app warning dialogue.

Regarding the F-Droid Basic rewrite, development continues with a new release, 2.0-alpha3.

Note that if you are already using F-Droid Basic version 1.23.x, you won’t receive this update automatically. You need to navigate to the app inside F-Droid and toggle “Allow beta updates” in the top-right three-dot menu.

In apps news, we’re slowly getting back on track with post-Debian-upgrade fixes (if your app still uses Java 17, is there a chance you can upgrade to 21?) and post-FOSDEM delays. Every app is important to us, yet actions like the Google one above waste time we could have put to better use on GitLab.

Buses was updated to 1.10 after a two-year hiatus.

Conversations and Quicksy were updated to 2.19.10+free, improving on cleaning up after banned users, with a better QR workflow and better tablet-rotation support. These are nice, but another change piques our interest: “Play Store flavor: Stop using Google library and interface directly with Google Play Service via IPC”. Sounds interesting for your app too? Is this a path to having one single version for both F-Droid and Play that is fully FLOSS? We don’t know yet, but we salute any trick that removes another proprietary dependency from the code. If curious, feel free to take a look at the commit.

Dolphin Emulator was updated to 2512. We missed one version in between, so the changelogs are huge; luckily, the devs publish highly detailed posts about updates. So we’ll start with “Release 2509” (about 40 minutes to read), side-track with “Starlight Spotlight: A Hospital Wii in a New Light” (about 50 minutes), continue to the current release in “Release 2512” (40 more minutes), and finish with “Rise of the Triforce”, delving into history for more than an hour.

Image Toolbox was updated to 3.6.1, adding many fixes and… some AI tools. Were you expecting such helpers? Will you use them?

Luanti was updated to 5.15.1, adding some welcome fixes. If your game world started flickering after the last update, make sure to update.

Nextcloud apps are getting an update almost every week: Nextcloud was updated to 33.0.0, Nextcloud Cookbook to 0.27.0, Nextcloud Dev to 20260219, Nextcloud Notes to 33.0.0, and Nextcloud Talk to 23.0.0.

But are you following the server side too? Nextcloud Hub 26 Winter was just released, adding a plethora of features. If you want to read about them, see the 30-minute post here or watch the hour-long video presentation from the team here.

ProtonVPN - Secure and Free VPN was updated to 5.15.70.0, adding more control over auto-connects, countries, and cities. Also, all connections are now handled by the WireGuard and Stealth protocols, as the older OpenVPN was removed, making the app almost 40% smaller.

Offi was updated to 14.0 with a bit of code polish. Unfortunately for Android 7 users, the app now needs Android 8 or later.

QUIK SMS was updated to 4.3.4 with many fixes. Vishal praised the duplicate remover and the default auto de-duplication function, and found that the bug that made deleted messages reappear is fixed.

SimpleEmail was updated to 1.5.4 after a two-year pause. It’s just a fixes release, updating translations and making the app compatible with Android 12 and later versions.

* NeoDB You: A native Android app for NeoDB designed with Material 3/You

Thank you for reading this week’s TWIF 🙂

Please subscribe to the RSS feed in your favourite RSS application to be notified of new TWIFs when they come out.

You are welcome to join the TWIF forum thread. If you have any news from the community, post it there; maybe it will be featured next week 😉

To help support F-Droid, please check out the donation page and contribute what you can.

...

Read the original on f-droid.org »

2 1,447 shares, 48 trendiness

Trump raises global tariffs to 15%, day after Supreme Court ruling

US President Donald Trump has announced that the US will raise global tariffs to 15%.

This is an increase from the 10% rate announced on Friday, when the president invoked a never-before-used law known as Section 122 after the Supreme Court struck down his previous tariffs with a 6-3 majority.

The law, which falls under the 1974 Trade Act, gives Trump the power to put in place tariffs up to a maximum of 15% for 150 days, at which point Congress must step in.

Trump has called the Supreme Court’s decision “ridiculous” and “extraordinarily anti-American”.

Some lawmakers are questioning the president’s decision to continue the levies, with Democratic congressman Ted Lieu saying Trump is taking out his anger towards the top court on Americans. “These temporary tariffs will be challenged in court and Democrats will kill them when they expire,” he writes on X.

American allies have also weighed in on the changes, with German Chancellor Friedrich Merz warning about the uncertainty they bring to the global economy. Meanwhile, the UK says it expects to retain its “privileged” trading position with the US.

We’re wrapping up our live coverage for now, but you can read more in our news article.

...

Read the original on www.bbc.com »

3 1,379 shares, 62 trendiness

How I built Timeframe, our family e-paper dashboard

TL;DR: Over the past decade, I’ve worked to build the perfect family dashboard system for our home, called Timeframe. Combining calendar, weather, and smart home data, it’s become an important part of our daily lives.

When Caitlin and I got married a decade ago, we set an intention to have a healthy relationship with technology in our home. We kept our bedroom free of any screens, charging our devices elsewhere overnight. But we missed our calendar and weather apps.

So I set out to build a solution to our problem. First, I constructed a Magic Mirror using an off-the-shelf medicine cabinet and an LCD display with its frame removed. It showed the calendar and weather data we needed:

But it was hard to read the text, especially during the day, as we get significant natural light in Colorado. At night, it glowed like any backlit display, sticking out sorely in our living space.

I then spent about a year experimenting with various jailbroken Kindle devices, eventually landing on a design with calendar and weather data on a pair of screens. The Kindles took a few seconds to refresh and flash the screen to reset the ink pixels, so they only updated every half hour. I designed wood enclosures and laser-cut them at the local library makerspace:

Software-wise, I built a Ruby on Rails app for fetching the necessary data from Google Calendar and Dark Sky. The Kindles woke up on a schedule, loading a URL in the app that rendered a PNG using IMGKit. The prototype proved e-paper was the right solution: it was unobtrusive regardless of lighting:

The Kindles were a hack, requiring constant tinkering to keep them working. It was time for a more reliable solution. I tried an OLED screen to see if the lack of a global backlight would be less distracting, but it wasn’t much better than the Magic Mirror:

So it was back to e-paper. I found a system of displays from Visionect, which came in 6”/10”/13”/32” sizes and could update every ten minutes for 2-3 months on a single charge:

The 32” screen used an outdated lower-contrast panel, and its resolution was too low to render text smoothly. The smaller sizes used a contrasty, high-PPI panel. I ended up using a combination of them around the house: a 6” in the mudroom for the weather, a 13” (with its built-in magnetic backing) in the kitchen attached to the side of the fridge, and a 10” in the bedroom.

The Visionect displays required running custom closed-source software, either as a SaaS or locally with Docker. I opted for a local installation on the Raspberry Pi already running the Rails backend. I had my best results pushing images to the Visionect displays every five minutes in a recurring background job. It used IMGKit to generate a PNG and send it to the Visionect API, logic I extracted into visionect-ruby. This setup proved to be incredibly reliable, without a single failure for months at a time.
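The shape of that push job can be sketched with Ruby’s standard library alone; note that the endpoint path, port, and headers below are illustrative assumptions, since the real calls are wrapped by the visionect-ruby gem:

```ruby
require "net/http"
require "uri"

# Hypothetical sketch of the recurring five-minute push; the Visionect
# backend's actual endpoint and auth scheme differ and live behind the
# visionect-ruby gem.
def build_push_request(device_uuid, png_bytes, host: "localhost", port: 8081)
  uri = URI("http://#{host}:#{port}/api/device/#{device_uuid}/image")
  request = Net::HTTP::Put.new(uri)        # PUT the rendered frame
  request["Content-Type"] = "image/png"
  request.body = png_bytes                 # raw PNG produced by IMGKit
  request
end

# A background job would then send it, e.g.:
#   Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }
req = build_push_request("device-uuid-here", "\x89PNG".b)
```

Keeping request construction separate from sending makes the loop easy to retry and log when a display briefly drops off the network.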

Visiting friends often asked how they could have a similar system in their home. Three years after the initial prototype, I did my first market test with a potential customer. At their request, I experimented with different formats, including a month view on the 13” screen:

Unfortunately, the customer didn’t see enough value to justify the $1000 price tag (in 2019!) for the 13” device, let alone anything I’d charge for a subscription service. At around the same time, Visionect started charging a $7/mo per-device fee to run their backend software on premises with Docker, after years of it being free to use. I’d have needed to charge $10/month, if not more, for a single screen!

In late 2021, the Marshall Fire destroyed our home along with ~1,000 others. Our homeowner’s insurance gave us two years to rebuild, so we set off to redesign our home from the ground up.

Around the same time, Boox released the 25.3” Mira Pro, the first high-resolution option for large e-paper screens. Best of all, it could update in realtime! Unlike the Visionect devices, it was just a display with an HDMI port and needed to be plugged into power. A quick prototype powered by an old Mac Mini made it immediately obvious that it was a huge step forward in capability. The larger screen allowed for significantly more information to be displayed:

But the most compelling innovation was having the screen update in realtime. I added a clock, the current song playing on our Sonos system (using jishi/node-sonos-http-api), and the next-hour precipitation forecast from Dark Sky:

The working prototype was enough to convince me to build a place for it in the new house. We designed a “phone nook” on our main floor with an art light for the display:

We also ran power to two more locations for 13” Visionect displays, one in our bedroom and one by the door to our garage:

The real-time requirements of the Mira Pro immediately surfaced performance and complexity issues in the backend, prompting an almost complete rewrite.

While the Visionect system worked just fine with multiple-second response times, switching to long-polling every two seconds put a ceiling on how slow response times could be. To start, I moved away from generating images. The Visionect folks added the ability to render a URL directly in the backend, freeing up resources to serve the long-polling requests.

Most significantly, I started migrating towards Home Assistant (HA) as the primary data source. HA already had integrations for Google Calendar, Dark Sky (now Apple Weather), and Sonos, enabling me to remove over half of the code in the Timeframe codebase! I ended up landing a PR to Home Assistant to allow for the calendar behavior I needed, and will probably need to write a couple more before HA can be the sole data source.

With less data-fetching logic, I was able to remove both the database and Redis from the Rails application, a massive reduction in complexity. I now run the background tasks with Rufus Scheduler and save data-fetching results with the Rails file store cache backend.

In addition to data retrieval, I’ve also worked to move as much of the application logic as possible into Home Assistant. I now automatically display the status of any sensor that begins with sensor.timeframe, using a simple ICON,Label CSV format.
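A minimal Ruby sketch of that convention (the method and sensor names here are mine, not from the actual Timeframe code): each sensor.timeframe* state is split on its first comma into an icon name and a label, and blank states are skipped.

```ruby
# Illustrative sketch of the ICON,Label convention described above.
# Each matching sensor state is "icon,Label"; empty states mean "healthy"
# and produce no status entry.
def parse_status(states)
  states.filter_map do |entity_id, state|
    next unless entity_id.start_with?("sensor.timeframe")
    next if state.nil? || state.strip.empty?

    icon, label = state.split(",", 2)   # split only on the first comma
    { icon: icon, label: label }
  end
end

statuses = parse_status(
  "sensor.timeframe_dishwasher" => "utensils,Run the dishwasher!",
  "sensor.timeframe_laundry"    => "",
  "sensor.other_thing"          => "ignored,Not a Timeframe sensor"
)
```

Splitting on only the first comma lets the label itself contain commas without extra escaping.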

For example, the other day I wanted a reminder to start or schedule our dishwasher after 8pm if it wasn’t set to run. It took me about a minute to write a template sensor using the power level from the outlet:

{% if states('sensor.kitchen_dishwasher_switched_outlet_power')|float < 2 and now().hour > 19 %}
utensils,Run the dishwasher!
{% endif %}

In the month since adding the helper, it reminded me twice when I’d otherwise have forgotten. And I didn’t have to commit or deploy any code!

Since moving into our new home, we’ve come to rely on the real-time functionality much more significantly. Effectively, we’ve turned the top-left corner of the displays into a status indicator for the house. For example, it shows what doors are open/unlocked:

Or whether the laundry is done:

It has a powerful function: if the status on the display is blank, the house is in a “healthy” state and does not need any attention. This approach of only showing what information is relevant in a given moment flies right in the face of how most smart homes approach communicating their status:

The single status indicator removes the need to scan an entire screen. This change in approach is possible because of one key difference: we have separated the control of our devices from the display of their status.

I continue to receive significant interest in the project and remain focused on bringing it to market. A few key issues remain:

While I have made significant progress in handling runtime errors gracefully, I have plenty to learn about creating embedded systems that do not need maintenance.

There are still several data sources I fetch directly outside of Home Assistant. Once HA is the sole source of data, I’ll be able to have Timeframe be a Home Assistant App, making it significantly easier to distribute.

The current hardware setup is not ready for adoption by the average consumer. The 25” Boox display is excellent but costs about $2000! It also doesn’t include the hardware needed to drive the display. There are a couple of potential options to consider, such as Android-powered devices from Boox and Philips or low-cost options from TRMNL.

Building Timeframe continues to be a passion of mine. While my day job has me building software for over a hundred million people, it’s refreshing to work on a project that improves my family’s daily life.

...

Read the original on hawksley.org »

4 1,367 shares, 53 trendiness

Facebook is absolutely cooked

And I don’t just mean that nobody uses it anymore. Like, I knew everyone under 50 had moved on, but I didn’t realize the extent of the slop conveyor belt that’s replaced us.

I logged on for the first time in ~8 years to see if there was a group for my neighborhood (there wasn’t). Out of curiosity, I thought I’d scroll a bit down the main feed.

The first post was the latest xkcd (a page I follow). The next ten posts were not by friends or pages I follow. They were basically all thirst traps of young women, mostly AI-generated, with generic captions. Here’s a sampler — mildly NSFW, but I did leave out a couple of the lewder ones:

Yikes. Again, I don’t follow any of these pages. This is all just what Facebook is pushing on me.

I know Twitter/X has worse problems with spam bots in the replies, but this is the News Feed! It’s the main page of the site! It’s the product that defined modern social media!

It wasn’t all like that, though. There was also an AI video of a policeman confiscating a little boy’s bike, only to bring him a brand new one:

And there were some sloppy memes and jokes, mostly about relationships, like this (admittedly not AI) video sketch where a woman decides to intentionally start a fight with her boyfriend because she’s on her period:

Maybe that isn’t literally about sex, but I’d classify it as the same sort of lizard-brain-rot engagement bait as those selfies.

Several commenters have vouched that Yoleendadong makes funny, high-quality content and shouldn’t be lumped in with AI slop. I’m just saying I think there’s a reason this particular video of hers popped up, and it’s probably the kind of engagement created by the premise.

Meta even gives us some helpful ideas for sexist questions we can ask their AI about the video:

Yep, that’s another “yikes” from me. To be fair, though, sometimes that suggested-questions feature is pretty useful! Like with this post, for example:

Why is she wearing pink heels? What is her personality? Great questions, Meta.

I said these were “mostly” AI-generated. The truth is, with how good the models are getting these days, it’s hard to tell, and I think a couple of them might be real people.

Still, some of these are pretty obviously AI. Here’s one with a bunch of alien text and mangled logos on the scoreboard in the background:

Hmm, I wonder if anyone has noticed this is AI? Let’s check out the comments and see if anyone’s pointed that ou—

…never mind. (I dunno, maybe those are all bots too.)

So: is this just something wacky with my algorithm?

I mean… maybe? That’s part of the whole thing with these algorithmic feeds; it’s hard to know if anyone else is seeing what I’m seeing.

On the one hand, I doubt most (straight) women’s feeds would look like this. But on the other hand, I hadn’t logged in in nearly a decade! I hate to think what the feed looks like for some lonely old guy who’s been scrolling the lightly-clothed AI gooniverse for hours every day.

Did everyone but me know it was like this? I’d seen screencaps of stuff like the Jesus-statue-made-out-of-broccoli slop a year or two ago, but I thought that only happened to grandmas. I hadn’t heard it was this bad.

I wonder if this evolution was less noticeable for people who are logging in every day. Or maybe it only gets this bad when there aren’t any posts from your actual friends?

In any case, I stopped exploring after I saw a couple more of those AI-generated pictures but with girls that looked about 14, which made me sick to my stomach. So long, Facebook, see you never, until one day I inexplicably need to use your platform to get updates from my kid’s school.

...

Read the original on pilk.website »

5 1,270 shares, 46 trendiness

Introducing Sonnet 4.6

Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta. For those on our Free and Pro plans, Claude Sonnet 4.6 is now the default model in claude.ai and Claude Cowork. Pricing remains the same as Sonnet 4.5, starting at $3/$15 per million tokens.

Sonnet 4.6 brings much-improved coding skills to more of our users. Improvements in consistency, instruction following, and more have made developers with early access prefer Sonnet 4.6 to its predecessor by a wide margin. They often even prefer it to our smartest model from November 2025, Claude Opus 4.5. Performance that would have previously required reaching for an Opus-class model—including on real-world, economically valuable office tasks—is now available with Sonnet 4.6. The model also shows a major improvement in computer use skills compared to prior Sonnet models.

As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.”

Almost every organization has software it can’t easily automate: specialized systems and tools built before modern interfaces like APIs existed. To have AI use such software, users would previously have had to build bespoke connectors. But a model that can use a computer the way a person does changes that equation.

In October 2024, we were the first to introduce a general-purpose computer-using model. At the time, we wrote that it was still experimental—“at times cumbersome and error-prone,” but we expected rapid improvement. OSWorld, the standard benchmark for AI computer use, shows how far our models have come. It presents hundreds of tasks across real software (Chrome, LibreOffice, VS Code, and more) running on a simulated computer. There are no special APIs or purpose-built connectors; the model sees the computer and interacts with it in much the same way a person would: clicking a (virtual) mouse and typing on a (virtual) keyboard.

Across sixteen months, our Sonnet models have made steady gains on OSWorld. The improvements can also be seen beyond benchmarks: early Sonnet 4.6 users are seeing human-level capability in tasks like navigating a complex spreadsheet or filling out a multi-step web form, before pulling it all together across multiple browser tabs. The model certainly still lags behind the most skilled humans at using computers. But the rate of progress is remarkable nonetheless. It means that computer use is much more useful for a range of work tasks—and that substantially more capable models are within reach.

Scores prior to Claude Sonnet 4.5 were measured on the original OSWorld; scores from Sonnet 4.5 onward use OSWorld-Verified. OSWorld-Verified (released July 2025) is an in-place upgrade of the original OSWorld benchmark, with updates to task quality, evaluation grading, and infrastructure.

At the same time, computer use poses risks: malicious actors can attempt to hijack the model by hiding instructions on websites in what’s known as a prompt injection attack. We’ve been working to improve our models’ resistance to prompt injections—our safety evaluations show that Sonnet 4.6 is a major improvement compared to its predecessor, Sonnet 4.5, and performs similarly to Opus 4.6. You can find out more about how to mitigate prompt injections and other safety concerns in our API docs.

Beyond computer use, Claude Sonnet 4.6 has improved on benchmarks across the board. It approaches Opus-level intelligence at a price point that makes it more practical for far more tasks. You can find a full discussion of Sonnet 4.6’s capabilities and its safety-related behaviors in our system card; a summary and comparison to other recent models is below.

In Claude Code, our early testing found that users preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. Users reported that it more effectively read the context before modifying code and consolidated shared logic rather than duplicating it. This made it less frustrating to use over long sessions than earlier models. Users even preferred Sonnet 4.6 to Opus 4.5, our frontier model from November, 59% of the time. They rated Sonnet 4.6 as significantly less prone to overengineering and “laziness,” and meaningfully better at instruction following. They reported fewer false claims of success, fewer hallucinations, and more consistent follow-through on multi-step tasks.

Sonnet 4.6’s 1M token context window is enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. More importantly, Sonnet 4.6 reasons effectively across all that context. This can make it much better at long-horizon planning. We saw this particularly clearly in the Vending-Bench Arena evaluation, which tests how well a model can run a (simulated) business over time—and which includes an element of competition, with different AI models facing off against each other to make the biggest profits. Sonnet 4.6 developed an interesting new strategy: it invested heavily in capacity for the first ten simulated months, spending significantly more than its competitors, and then pivoted sharply to focus on profitability in the final stretch. The timing of this pivot helped it finish well ahead of the competition.

Sonnet 4.6 outperforms Sonnet 4.5 on Vending-Bench Arena by investing in capacity early, then pivoting to profitability in the final stretch.

Early customers also reported broad improvements, with frontend code and financial analysis standing out. Customers independently described visual outputs from Sonnet 4.6 as notably more polished, with better layouts, animations, and design sensibility than those from previous models. Customers also needed fewer rounds of iteration to reach production-quality results. Claude Sonnet 4.6 matches Opus 4.6 performance on OfficeQA, which measures how well a model can read enterprise documents (charts, PDFs, tables), pull the right facts, and reason from those facts. It’s a meaningful upgrade for document comprehension workloads.

The performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary—it’s hard to overstate how fast Claude models have been evolving in recent months. Sonnet 4.6 outperforms on our orchestration evals, handles our most complex agentic workloads, and keeps improving the higher you push the effort settings.

Claude Sonnet 4.6 is a notable improvement over Sonnet 4.5 across the board, including long-horizon tasks and more difficult problems. Out of the gate, Claude Sonnet 4.6 is already excelling at complex code fixes, especially when searching across large codebases is essential. For teams running agentic coding at scale, we’re seeing strong resolution rates and the kind of consistency developers need.

Claude Sonnet 4.6 has meaningfully closed the gap with Opus on bug detection, letting us run more reviewers in parallel, catch a wider variety of bugs, and do it all without increasing cost. For the first time, Sonnet brings frontier-level reasoning in a smaller and more cost-effective form factor. It provides a viable alternative if you are a heavy Opus user.

Claude Sonnet 4.6 meaningfully improves the answer retrieval behind our core product—we saw a significant jump in answer match rate compared to Sonnet 4.5 in our Financial Services Benchmark, with better recall on the specific workflows our customers depend on.

Box evaluated how Claude Sonnet 4.6 performs when tested on deep reasoning and complex agentic tasks across real enterprise documents. It demonstrated significant improvements, outperforming Claude Sonnet 4.5 in heavy reasoning Q&A by 15 percentage points.

Claude Sonnet 4.6 hit 94% on our insurance benchmark, making it the highest-performing model we’ve tested for computer use. This kind of accuracy is mission-critical to workflows like submission intake and first notice of loss.

Claude Sonnet 4.6 delivers frontier-level results on complex app builds and bug-fixing. It’s becoming our go-to for the kind of deep codebase work that used to require more expensive models.

Claude Sonnet 4.6 produced the best iOS code we’ve tested for Rakuten AI. Better spec compliance, better architecture, and it reached for modern tooling we didn’t ask for, all in one shot. The results genuinely surprised us.

Sonnet 4.6 is a significant leap forward on reasoning through difficult tasks. We find it especially strong on branched and multi-step tasks like contract routing, conditional template selection, and CRM coordination—exactly where our customers need strong model sense and reliability.

We’ve been impressed by how accurately Claude Sonnet 4.6 handles complex computer use. It’s a clear improvement over anything else we’ve tested in our evals.

Claude Sonnet 4.6 has perfect design taste when building frontend pages and data reports, and it requires far less hand-holding to get there than anything we’ve tested before.

Claude Sonnet 4.6 was exceptionally responsive to direction, delivering precise figures and structured comparisons when asked, while also generating genuinely useful ideas on trial strategy and exhibit preparation.

On the Claude Developer Platform, Sonnet 4.6 supports both adaptive thinking and extended thinking, as well as context compaction in beta, which automatically summarizes older context as conversations approach limits, increasing effective context length. On our API, Claude’s web search and fetch tools now automatically write and execute code to filter and process search results, keeping only relevant content in context—improving both response quality and token efficiency. Additionally, code execution, memory, programmatic tool calling, tool search, and tool use examples are now generally available.

Sonnet 4.6 offers strong performance at any thinking effort, even with extended thinking off. As part of your migration from Sonnet 4.5, we recommend exploring across the spectrum to find the ideal balance of speed and reliable performance, depending on what you’re building. We find that Opus 4.6 remains the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring, coordinating multiple agents in a workflow, and problems where getting it just right is paramount.

For Claude in Excel users, our add-in now supports MCP connectors, letting Claude work with the other tools you use day-to-day, like S&P Global, LSEG, Daloopa, PitchBook, Moody’s, and FactSet. You can ask Claude to pull in context from outside your spreadsheet without ever leaving Excel. If you’ve already set up MCP connectors in Claude.ai, those same connections will work in Excel automatically. This is available on Pro, Max, Team, and Enterprise plans.

How to use Claude Sonnet 4.6

Claude Sonnet 4.6 is available now on all Claude plans, Claude Cowork, Claude Code, our API, and all major cloud platforms. We’ve also upgraded our free tier to Sonnet 4.6 by default—it now includes file creation, connectors, skills, and compaction. If you’re a developer, you can get started quickly by using claude-sonnet-4-6 via the Claude API.

...

Read the original on www.anthropic.com »

6 1,116 shares, 44 trendiness

break free from Google and Apple [ENG 🇬🇧]

🇬🇧->🇵🇱 Przejdź do polskiej wersji tego wpisu / Go to the Polish version of this post

Just a year ago, I was re­ally deep into the Apple ecosys­tem. It seemed like there was no turn­ing back from the or­chard for me. Phone, lap­top, watch, tablet, video and mu­sic stream­ing, cloud stor­age, and even a key tracker. All from one man­u­fac­turer. Plus shared fam­ily photo al­bums, cal­en­dars, and even shop­ping lists.

However, at some point, I discovered Plenti, a company that rents a really wide range of different devices at quite reasonable prices. On a whim, I typed "samsung fold" into the search engine on their website, and it turned out that the Samsung Galaxy Z Fold 6 could be rented for just 250-300 PLN per month. That was quite an interesting option, as I was insanely curious about what it's like to live with a foldable phone that, after unfolding, becomes the equivalent of a tablet. Plus, I would never dare to buy this type of device, because firstly, their price is astronomical, and secondly, I have serious doubts about the longevity of the folding screen. I checked the rental conditions from Plenti and nothing raised my suspicions. Renting seemed like a really cool option, so I decided to get the Fold 6 for half a year. That's how I broke out of the orchard and slightly reopened the doors of my heart to solutions without the apple logo. I even wrote a post about the whole process - I betrayed #TeamApple for a broken phone. What I'm getting at is that this is how Android returned to my living room, and I think I started liking it anew.

My ad­ven­ture with Samsung ended af­ter the planned 6 months. The Galaxy Z Fold 6 is a good phone, and the abil­ity to un­fold it to the size of a tablet is an amaz­ing fea­ture. However, what both­ered me about it was:

pay­ing 300 PLN (~80 USD) for rent is a good short-term so­lu­tion to get some­thing to test, but not in the long run.

All the points above made me give up on extending the rental and start wondering what to do next. Interestingly, I liked Android enough that I didn't necessarily want to go back to iOS. Around this time, an article hit my RSS reader: Creators of the most secure version of Android fear France. Travel ban for the whole team (I think it was this one, but I'm not entirely sure; it doesn't really matter). It talked about how France wants to get its hands on the GrapheneOS system and thus carry out a very serious attack on the privacy of its users. I thought then, "Hey! A European country wants to force a backdoor into the system, because it is too well secured to surveil its users. Either this is artificially blowing the topic out of proportion, or there is actually something special about this system!". At that moment, a somewhat forgotten nerd gene ignited in me. I decided to abandon not only iOS, but also mainstream Android, and try a completely alternative system.

GrapheneOS is a cus­tom, open-source op­er­at­ing sys­tem de­signed with the idea of pro­vid­ing users with the high­est level of pri­vacy and se­cu­rity. It is based on the Android Open Source Project (AOSP), but dif­fers sig­nif­i­cantly from stan­dard soft­ware ver­sions found in smart­phones. Its cre­ators com­pletely elim­i­nated in­te­gra­tion with Google ser­vices at the sys­tem level, which avoids track­ing and data col­lec­tion by cor­po­ra­tions, while of­fer­ing a mod­ern and sta­ble work­ing en­vi­ron­ment.

The sys­tem is dis­tin­guished by ad­vanced hardening” of the ker­nel and key com­po­nents, which min­i­mizes vul­ner­a­bil­ity to hack­ing at­tacks and ex­ploits. A unique fea­ture of GrapheneOS is the abil­ity to run Google Play Services in an iso­lated en­vi­ron­ment (sandbox), al­low­ing the user to use pop­u­lar ap­pli­ca­tions with­out grant­ing them broad sys­tem per­mis­sions. Currently, the pro­ject fo­cuses on sup­port­ing Google Pixel se­ries phones, uti­liz­ing their ded­i­cated Titan M se­cu­rity chips for full data pro­tec­tion.

When I used to read about GrapheneOS, the list of compatible devices included items from several different manufacturers. Now it's only Google Pixel devices. This doesn't mean you can't run this system on a Samsung, for example, but the creators simply don't guarantee it will work properly, and you have to deal with potentially porting the version yourself. It's quite ironic that a system freed from Google services runs best on Google's own devices. If anyone wants to read more about why Pixels are the best for GrapheneOS, I recommend checking out the following keywords - Verified Boot, Titan M, IOMMU, MTE.

At the stage of choos­ing a de­vice to test GrapheneOS on, I was­n’t yet sure if such a so­lu­tion would work for me at all and if I’d last with it in the long run. So it would be un­rea­son­able to lay out a sig­nif­i­cant amount of money. Because of this, prob­a­bly the only sen­si­ble choice was the Google Pixel 9a. This was a few months ago, when not enough time had passed since the pre­miere of the 10 se­ries mod­els for them to make it onto the fully sup­ported de­vices list. At that time, the Pixel 9a was the fresh­est de­vice on the list (offering up to 7 YEARS of sup­port!) and on top of that, it was very at­trac­tively priced, as I bought it for around 1600 PLN (~450 USD).

In ret­ro­spect, I still con­sider it a good choice and def­i­nitely rec­om­mend this path to any­one who is cur­rently at the stage of de­cid­ing on what hard­ware to start their GrapheneOS ad­ven­ture. The only thing that both­ers me a bit about the Pixel 9a is the qual­ity of the pho­tos it takes. I switched to it hav­ing pre­vi­ously had the iPhone 15 Pro and Samsung Galaxy Z Fold 6, which are ex­cel­lent in this re­gard, so it’s no won­der I’m a bit spoiled, be­cause I was sim­ply used to a com­pletely dif­fer­ent level of cam­eras. Now I also know that GrapheneOS will stay with me for longer, so it’s pos­si­ble that know­ing then what I know now, I would have opted for some more ex­pen­sive gear. However, this is­n’t im­por­tant to me now, be­cause for the time be­ing I don’t plan to switch to an­other de­vice, and by the time that changes, the mar­ket sit­u­a­tion and the list of avail­able op­tions will cer­tainly have changed too. Besides, I’m pos­i­tively sur­prised by the bat­tery life and over­all per­for­mance of this phone.

A suit­able smart­phone - in my case, it’s a Google Pixel 9a.

A ca­ble to con­nect the phone to a com­puter; it can’t be just any ca­ble, but one that is used not only for charg­ing but also for data trans­mis­sion. It’s best to just use the ca­ble that came with the phone.

A computer with a Chromium-based browser (e.g., Google Chrome, Brave, Microsoft Edge, Vivaldi). Unfortunately, I must recommend Windows 10/11 here, because then you don't have to mess around with any drivers; it's the simplest option.

If it’s new, we take it out of the box and turn it on. If it was pre­vi­ously used, we re­store it to fac­tory set­tings (Settings -> System -> Reset op­tions -> Erase all data (factory re­set) -> Erase all data). I think it’s stat­ing the ob­vi­ous, but I’ll write it any­way - a fac­tory re­set re­sults in the dele­tion of all user data from the de­vice, so if you have any­thing im­por­tant on it, you need to back it up.

We must go through the ba­sic setup un­til we see the home screen. We do the ab­solute min­i­mum. Here is a break­down of the steps:

we don’t con­nect to Wi-Fi, so we skip this step too

we don’t need to do any­thing with the war­ranty terms, so just the Next but­ton

there is no need to waste time set­ting up bio­met­rics, so we po­litely de­cline and skip fin­ger­print and face scan

First of all, we need to make sure that our phone’s soft­ware is up­dated to the lat­est avail­able ver­sion. For this pur­pose, we go to Settings -> System -> System up­date. If nec­es­sary, we up­date.

Next, we go to Settings -> About phone -> find the Build number field and tap it 7 times until we see the message "You are now a developer". In the meantime, the phone will ask for the PIN we set during the phone setup.

We go back and now en­ter Settings -> System -> Developer op­tions -> turn on the OEM un­lock­ing op­tion. The phone will ask for the PIN again. After en­ter­ing it, we still have to con­firm that we def­i­nitely want to re­move the lock.

When the screen goes com­pletely dark, we si­mul­ta­ne­ously press and hold the power and vol­ume down but­tons un­til the text-based Fastboot Mode in­ter­face ap­pears. If the phone starts up nor­mally, it means we per­formed one of the ear­lier steps in­cor­rectly.

We go to the com­puter and open the browser (based on the Chromium en­gine) to the ad­dress https://​graphe­neos.org/​in­stall/​web.

A win­dow with a list of de­vices to choose from will pop up in the browser. There should ba­si­cally be only one item on it, and that should be our Pixel. We se­lect it and press the Connect but­ton.

Changes will occur on the phone's display. A message will appear asking to confirm that we actually want to unlock the bootloader. To do this, we must press one of the volume buttons so that "Unlock the bootloader" appears instead of "Do not unlock the bootloader". At this point, we can confirm by pressing the power button.

On the GrapheneOS web­site, we scroll down to the Obtaining fac­tory im­ages sec­tion and press the Download re­lease but­ton. If the phone is still con­nected to the com­puter, the web­site will de­cide on its own which sys­tem im­age to down­load.

We wait for the down­load to fin­ish. It is ob­vi­ous that the time needed for this de­pends di­rectly on the speed of the in­ter­net con­nec­tion.

Locking the boot­loader is cru­cial be­cause it en­ables the full op­er­a­tion of the Verified Boot fea­ture. It also pre­vents the use of fast­boot mode to flash, for­mat, or wipe par­ti­tions. Verified Boot de­tects any mod­i­fi­ca­tions to the OS par­ti­tions and blocks the read­ing of any al­tered or cor­rupted data. If changes are de­tected, the sys­tem uses er­ror cor­rec­tion data to at­tempt to re­cover the orig­i­nal data, which is then ver­i­fied again — thanks to this mech­a­nism, the sys­tem is re­silient to ac­ci­den­tal (non-malicious) file cor­rup­tion.

Being in Fastboot Mode, when we see the Start mes­sage, we press the power but­ton, which will cause the sys­tem to start nor­mally. If we don’t see Start at the height of the power but­ton, we have to press the vol­ume but­tons and find this op­tion.

This is a stan­dard pro­ce­dure, so we will only go through it briefly:

I rec­om­mend turn­ing off the lo­ca­tion ser­vice, be­cause it’s bet­ter to con­fig­ure it calmly later by grant­ing per­mis­sions only to apps that re­ally need it

securing the phone with a fingerprint; I'm personally an advocate of this solution, so I recommend using it. GrapheneOS does not (yet) support face unlock, so a fingerprint and a standard password are the only methods we have to choose from (I reject pattern unlock out of hand; as a form of screen lock it cannot in good conscience even be called security)

I as­sume that if you are read­ing this post, you are a graphene fresh­man and you have no backup to re­store, so we just skip this step

We land back in Fastboot Mode. I as­sume the phone was con­nected to the com­puter the whole time (if not, re­con­nect it). We re­turn to the browser on the com­puter. We find the Locking the boot­loader sec­tion and press the Lock boot­loader but­ton.

Again, con­fir­ma­tion of this op­er­a­tion on the phone is re­quired. It looks anal­o­gous to un­lock­ing, ex­cept this time, us­ing the vol­ume but­tons, we have to make the Lock the boot­loader op­tion ac­tive and con­firm it with the power but­ton.

Just like when removing the lock, we go to Settings -> About phone -> find the Build number field and tap it 7 times until we see the message "You are now a developer". In the meantime, the phone will ask for the PIN we set during the phone setup.

We go back and now en­ter Settings -> System -> Developer op­tions -> turn off the OEM un­lock­ing op­tion. The phone will ask us to restart to change this set­ting, but for now we can­cel this re­quest, be­cause we still want to com­pletely turn off Developer op­tions, which is done by uncheck­ing the box next to the first op­tion at the very top, Use de­vel­oper op­tions.

Now the real fun begins. You'll hear/read as many opinions on what you should and shouldn't do regarding GrapheneOS hardening as there are people. Some are conservative, while others approach the topic a bit more liberally. In my opinion, there is no one right path, and everyone should dig around, test things out, and decide what suits them and fits their security profile. You'll quickly find out that GrapheneOS is really one big compromise between convenience and privacy. While this same rule applies to everything belonging to the digital world, it's only here that you'll truly notice it, because GrapheneOS will show you how many things you can control that you can't using conventional Android.

I don't intend to use this post to promote some "one and only" method of using GrapheneOS. I'll simply present how I use this system. This way, I'll show the basics to people fresh to the topic; maybe I'll manage to suggest an interesting trick to those who have been users for a while; and maybe some expert will show up who, after reading my ramblings, will suggest something interesting or point out what I'm doing wrong or could do better. I'm sure that's the case, since my adventure with GrapheneOS has only been going on for about 3 months.

I warn you right away that I'm not sure if I'll be able to maintain a logical train of thought, as I'll probably jump around topics a bit. The subject of GrapheneOS is vast, and in today's post I'll only manage to touch upon it slightly.

One of the first things I did af­ter boot­ing up the freshly in­stalled sys­tem was to cre­ate a sec­ond user pro­file. This is done in Settings -> System -> Multiple users. The idea is for this fea­ture to al­low two (or more) peo­ple to use one phone, each hav­ing a sep­a­rate pro­file with their own set­tings, apps, etc. Who in their right mind does that? While I can imag­ine shar­ing a home tablet, shar­ing a phone com­pletely eludes me. It there­fore seems like a dead fea­ture, but noth­ing could be fur­ther from the truth.

For me, it works like this: on the Owner user, be­cause that’s the name of the main ac­count cre­ated au­to­mat­i­cally with the sys­tem, I in­stalled the Google Play Store along with Google Play ser­vices and GmsCompatConfig. This is done through the App Store ap­pli­ca­tion, which is a com­po­nent of the GrapheneOS sys­tem. Please don’t con­fuse this with Apple’s app store, even though the name is the same. From the Play Store I only in­stalled the fol­low­ing ap­pli­ca­tions:

And that’s it. As you can see, this pro­file serves me only for apps that ab­solutely re­quire in­te­gra­tion with Google ser­vices. In prac­tice, I switch to it only when I want to pay con­tact­lessly in a store, which I ac­tu­ally do rarely lately, be­cause if there’s an op­tion, I pay us­ing BLIK codes. Right af­ter switch­ing from Samsung there were more apps on this pro­file, but one by one I suc­ces­sively gave up on those that made me de­pen­dent on the big G.

It’s on the sec­ond pro­file, which let’s as­sume I called Tommy, that I keep my en­tire dig­i­tal life. What does this give me? For in­stance, the main pro­file can­not be eas­ily deleted, but the ad­di­tional one can. Let’s imag­ine a sit­u­a­tion where I need to quickly wipe my phone, but in a way that its ba­sic func­tions still work, i.e., with­out a full fac­tory re­set. An ex­am­ple could be, say, ar­riv­ing in the USA and un­der­go­ing im­mi­gra­tion con­trol. They want ac­cess to my phone, so I delete the Tommy user, switch to the Owner user, and hand them the phone. It makes calls, sends SMS mes­sages, even has a bank­ing app, so the­o­ret­i­cally it should­n’t arouse sus­pi­cion. However, it lacks all my con­tacts, a browser with my vis­ited pages his­tory, a pass­word man­ager, and mes­sen­gers with chat his­to­ries. This is rather a dras­tic sce­nario, but not re­ally that im­prob­a­ble, as ac­tions like search­ing a phone upon ar­rival in the States are some­thing that hap­pens on a daily ba­sis. Besides, the ba­sic rule of se­cu­rity is not to use an ac­count with ad­min­is­tra­tor priv­i­leges on a daily ba­sis.

On GrapheneOS, Obtainium is my pri­mary ag­gre­ga­tor for ob­tain­ing .apk in­stal­la­tion files and au­tomat­ing app up­dates. It’s like the Google Play Store, but pri­vacy-re­spect­ing and for open-source ap­pli­ca­tions. It would be a sin to use GrapheneOS and not at least try to switch to open-source apps. Below I pre­sent a list of apps that I use. Additionally, I’m toss­ing in links to the source code repos­i­to­ries of each of them.

To un­der­stand how Obtainium works and how to use it, I rec­om­mend check­ing out this video guide.

I have a few apps that are not open-source, but I still need them. In this case, I don’t down­load them from the Google Play Store, but ex­actly from the Aurora Store, which I men­tioned above.

Aurora Store is an open-source client of the Google Play store (I guess you could call it a fron­tend) that al­lows down­load­ing ap­pli­ca­tions from Google servers with­out need­ing Google ser­vices (GMS) on the phone.

* Privacy - you don’t need to log in with a Google ac­count to down­load free apps (you can use built-in anony­mous ac­counts).

The thing with these anonymous accounts is that sometimes they work and sometimes they don't, because of rate limits. A single person using a normal account would never hit them, but when a thousand people download apps through one shared account at once, it starts to look suspicious and the limits are exceeded quite quickly. Using Aurora Store violates the Google Play Store terms of service, so if we use our own Google account instead, it might be temporarily blocked or permanently banned. One option here is to create a "burner" account just for this, but that takes away some of our privacy, because Google can still profile us based on what we downloaded. Anonymous accounts provide almost complete anonymity, because then we are just a drop in the ocean.

When it comes to security, yes, in theory we download .apk files from a verified source, but only on the condition that the Aurora Store creators don't pull a man-in-the-middle attack on us. Whether you trust the creators of this app is a decision that's up to you.

Below I pre­sent a list of ap­pli­ca­tions that I down­loaded from the Aurora Store, checked, and can con­firm that they work with­out GMS (Google Mobile Services).

* My mu­nic­i­pal­i­ty’s app - be­cause I need to know when they’ll col­lect my trash :)

* OpenVPN - I use it as a tun­nel to my home net­work

* Perplexity - I switched to Gemini, but I con­firm it works

* Synology Photos - my home photo gallery on a NAS

* Pocket Casts - pod­casts, I plan to mi­grate to AntennaPod

* TickTick - to-do lists, it’s hard for me to find a sen­si­ble al­ter­na­tive that is mul­ti­plat­form and has all the fea­tures I need

Has any­one ever won­dered if all apps on a phone need Internet ac­cess? Indeed, in the vast ma­jor­ity of cases, a mo­bile app with­out net­work ac­cess is use­less, but you can’t gen­er­al­ize like that, be­cause for ex­am­ple, the pre­vi­ously men­tioned FUTO Voice Input uses a lo­cal LLM to con­vert speech to text, which works of­fline on the de­vice. Why would such an app need Internet ac­cess then? For noth­ing, so it should­n’t have such per­mis­sion. Now let’s take apps like FairScan (document scan­ning), Catima (loyalty card ag­gre­ga­tor), Collabora Office (office suite), or Librera (ebook reader). They too do not need Internet ac­cess!

The sit­u­a­tion looks even more bizarre when you look at which apps ac­tu­ally need ac­cess to all of our de­vice’s sen­sors. If we think about it calmly, we’ll con­clude that in this spe­cific case it’s com­pletely the op­po­site of the pre­vi­ous one, mean­ing prac­ti­cally no app needs this in­for­ma­tion. And I re­mind you that by de­fault on Android with Google ser­vices, all apps have such per­mis­sions.

To man­age a given ap­pli­ca­tion’s per­mis­sions, just tap and hold on its icon, se­lect App info from the pop-up menu, and find the Permissions tab. A list cat­e­go­rized by things like - Allowed, Ask every time, and Not al­lowed will ap­pear. I rec­om­mend re­view­ing this list for each app sep­a­rately right af­ter in­stalling it. This is the foun­da­tion of GrapheneOS hard­en­ing.

A col­lec­tive menu where you can view spe­cific per­mis­sions and which apps have them granted is avail­able in Settings -> Security & pri­vacy -> Privacy -> Permission man­ager. Another in­ter­est­ing place is the Privacy dash­board avail­able in the same lo­ca­tion. It’s a tool that shows not only app per­mis­sions, but also how of­ten a given app reaches for the per­mis­sions granted to it.

In GrapheneOS we don’t only have user pro­files, but each user can also have some­thing called a Private space. I en­coun­tered some­thing sim­i­lar on Samsung, where it was called Secure Folder, so I as­sume this might just be an Android fea­ture im­ple­mented dif­fer­ently by each man­u­fac­turer.

Private space is turned on in Settings -> Security & pri­vacy -> Private space. It acts like a sort of sep­a­rated sand­box that is part of the en­vi­ron­ment you use, but at the same time is iso­lated from it. For me, it’s a place that gives me quick ac­cess to apps that nev­er­the­less re­quire Google ser­vices. You might ask - why then do I keep the mBank and T-Mobile apps on the Owner user if I could keep them here? Well, for rea­sons un­known to me, I’m un­able to con­fig­ure my pri­vate space so that pay­ing with con­tact­less BLIK via NFC works cor­rectly in it. The same goes for Magenta Moments from T-Mobile, which don’t work cor­rectly de­spite GMS be­ing in­stalled in the pri­vate space.

* Google Drive - I use it as a cloud to share files with clients

* mObywatel - at first I kept this app in the main profile, downloaded from the Aurora Store, and everything somewhat worked, but every now and then the app would freeze completely and stop responding. I think it might be related to the fact that it sends some Google services-related requests in the background and doesn't respond until such a request times out; I have this on my list to investigate

* Play Store - I have to down­load all these apps from some­where, do­ing it via Aurora Store in the pri­vate space would­n’t make sense since I have the whole Google ser­vices pack­age in­stalled here any­way

* XTB - an­other in­vest­ing app… works with­out GMS, but like I said, I keep all fi­nan­cial ones in one place

Oof… I did it again, sorry. I'm just counting the characters and it comes out to just under 35,000… I'll probably break that barrier with these next few sentences. Long again, then, but once again pure substance, so I don't think anyone has reason to complain. As I mentioned earlier, I've only touched upon the topic of GrapheneOS, which is extensive, and that's a good thing, because it's a great system, and the biggest respect goes to the people behind this project. It's thanks to them that we even have the option of at least partially freeing ourselves from Google (Android) and Apple (iOS). Therefore, I warmly invite you to the final chapter of this post.

Finally, I would like to en­cour­age you to sup­port the GrapheneOS pro­ject. The de­vel­op­ers be­hind it are do­ing a re­ally great job and in my opin­ion de­serve to have some money thrown at them. Information on where and how this can be done can be found here.

...

Read the original on blog.tomaszdunia.pl »

7 978 shares, 37 trendiness

15+ years later, Microsoft morged my diagram

A few days ago, peo­ple started tag­ging me on Bluesky and Hacker News about a di­a­gram on Microsoft’s Learn por­tal. It looked… fa­mil­iar.

In 2010, I wrote A successful Git branching model and created a diagram to go with it. I designed that diagram in Apple Keynote, at the time obsessing over the colors, the curves, and the layout until it clearly communicated how branches relate to each other over time. I also published the source file so others could build on it. That diagram has since spread everywhere: in books, talks, blog posts, team wikis, and YouTube videos. I never minded. That was the whole point: sharing knowledge and letting the internet take it by storm!

What I did not ex­pect was for Microsoft, a tril­lion-dol­lar com­pany, some 15+ years later, to ap­par­ently run it through an AI im­age gen­er­a­tor and pub­lish the re­sult on their of­fi­cial Learn por­tal, with­out any credit or link back to the orig­i­nal.

The AI rip-off was not just ugly. It was careless, blatantly amateurish, and lacking any ambition, to put it gently. Unworthy of Microsoft. The carefully crafted visual language and layout of the original, the branch colors, the lane design, the dot and bubble alignment that made the original so readable—all of it had been muddled into a laughable form. Proper AI slop.

Arrows missing or pointing in the wrong direction, and the obvious "continvoucly morged" text, quickly gave it away as a cheap AI artifact.

It had the rough shape of my diagram, though. Enough, actually, that people recognized the original in it, started calling Microsoft out on it, and began reaching out to me. That so many people were upset about this was really nice, honestly. That, and "continvoucly morged" was a very fun meme—thank you, internet! 😄

Oh god yes, Microsoft con­tin­voucly morged my di­a­gram there for sure 😬— Vincent Driessen (@nvie.com) 2026-02-16T20:55:54.762Z

Other than that, I find this whole thing mostly very saddening. Not because some company used my diagram. As I said, it's been everywhere for 15 years and I've always been fine with that. What's dispiriting is the (lack of) process and care: take someone's carefully crafted work, run it through a machine to wash off the fingerprints, and ship it as your own. This isn't a case of being inspired by something and building on it. It's the opposite of that. It's taking something that worked and making it worse. Is there even a goal here beyond "generating content"?

What's slightly worrying me is that this time around, the diagram was both well-known enough and obviously AI-slop-y enough that it was easy to spot as plagiarism. But we all know there will be more and more content like this that isn't so well-known, or that will be mutated or disguised in more advanced ways, so that the plagiarism is no longer recognizable as such.

I don’t need much here. A sim­ple link back and at­tri­bu­tion to the orig­i­nal ar­ti­cle would be a good start. I would also be in­ter­ested in un­der­stand­ing how this Learn page at Microsoft came to be, what the goals were here, and what the process has been that led to the cre­ation of this ugly as­set, and how there seem­ingly has not been any form of proof-read­ing for a doc­u­ment used as a learn­ing re­source by many de­vel­op­ers.

...

Read the original on nvie.com »

8 921 shares, 30 trendiness

A smarter model for your most complex tasks


Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we're releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro. We are shipping 3.1 Pro across our consumer and developer products to bring this progress in intelligence to your everyday applications:

* For developers, in preview via the Gemini API in Google AI Studio, Gemini CLI, our agentic development platform Google Antigravity, and Android Studio

* For enterprises, in Vertex AI and Gemini Enterprise

* For consumers, via the Gemini app and NotebookLM

Building on the Gemini 3 series, 3.1 Pro represents a step forward in core reasoning. 3.1 Pro is a smarter, more capable baseline for complex problem-solving. This is reflected in our progress on rigorous benchmarks. On ARC-AGI-2, a benchmark that evaluates a model's ability to solve entirely new logic patterns, 3.1 Pro achieved a verified score of 77.1%. This is more than double the reasoning performance of 3 Pro.

3.1 Pro is de­signed for tasks where a sim­ple an­swer is­n’t enough, tak­ing ad­vanced rea­son­ing and mak­ing it use­ful for your hard­est chal­lenges. This im­proved in­tel­li­gence can help in prac­ti­cal ap­pli­ca­tions — whether you’re look­ing for a clear, vi­sual ex­pla­na­tion of a com­plex topic, a way to syn­the­size data into a sin­gle view, or bring­ing a cre­ative pro­ject to life.

Code-based an­i­ma­tion: 3.1 Pro can gen­er­ate web­site-ready, an­i­mated SVGs di­rectly from a text prompt. Because these are built in pure code rather than pix­els, they re­main crisp at any scale and main­tain in­cred­i­bly small file sizes com­pared to tra­di­tional video.
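To make the code-based animation point concrete, here is a small hand-written sketch (my own illustration, not model output) of the kind of asset described: a pulsing circle animated entirely in SVG markup, so it stays crisp at any scale and weighs a few hundred bytes. The helper name and styling are invented for the example.

```python
# Sketch: an animated SVG built as text rather than pixels.
# Uses SMIL's <animate> element to pulse the circle's radius forever.

def pulsing_circle_svg(size: int = 200) -> str:
    """Return a self-contained SVG string with a SMIL-animated radius."""
    c = size // 2          # center coordinate
    r_min = size // 8      # resting radius
    r_max = size // 4      # peak radius
    return f"""<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">
  <circle cx="{c}" cy="{c}" r="{r_min}" fill="#4285f4">
    <animate attributeName="r" values="{r_min};{r_max};{r_min}"
             dur="2s" repeatCount="indefinite"/>
  </circle>
</svg>"""

# Write the asset to disk; any browser can render it directly.
with open("pulse.svg", "w") as f:
    f.write(pulsing_circle_svg())
```

Because the animation is declarative markup, there is no frame data at all: the file is just the geometry plus the animation rule, which is why the post contrasts it with traditional video.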

Complex sys­tem syn­the­sis: 3.1 Pro uti­lizes ad­vanced rea­son­ing to bridge the gap be­tween com­plex APIs and user-friendly de­sign. In this ex­am­ple, the model built a live aero­space dash­board, suc­cess­fully con­fig­ur­ing a pub­lic teleme­try stream to vi­su­al­ize the International Space Station’s or­bit.

Interactive de­sign: 3.1 Pro codes a com­plex 3D star­ling mur­mu­ra­tion. It does­n’t just gen­er­ate the vi­sual code; it builds an im­mer­sive ex­pe­ri­ence where users can ma­nip­u­late the flock with hand-track­ing and lis­ten to a gen­er­a­tive score that shifts based on the birds’ move­ment. For re­searchers and de­sign­ers, this pro­vides a pow­er­ful way to pro­to­type sen­sory-rich in­ter­faces.

Creative cod­ing: 3.1 Pro can trans­late lit­er­ary themes into func­tional code. When prompted to build a mod­ern per­sonal port­fo­lio for Emily Brontë’s Wuthering Heights,” the model did­n’t just sum­ma­rize the text. It rea­soned through the nov­el’s at­mos­pheric tone to de­sign a sleek, con­tem­po­rary in­ter­face, cre­at­ing a web­site that cap­tures the essence of the pro­tag­o­nist.

Since releasing Gemini 3 Pro in November, your feedback and the pace of progress have driven these rapid improvements. We are releasing 3.1 Pro in preview today to validate these updates and continue to make further advancements in areas such as ambitious agentic workflows before we make it generally available soon.

Starting today, Gemini 3.1 Pro in the Gemini app is rolling out with higher limits for users with the Google AI Pro and Ultra plans. 3.1 Pro is also now available on NotebookLM exclusively for Pro and Ultra users. And developers and enterprises can access 3.1 Pro now in preview in the Gemini API via AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI and Android Studio.

We can't wait to see what you build and discover with it.

...

Read the original on blog.google »

9 895 shares, 34 trendiness

Google Cloud console

Your page may be load­ing slowly be­cause you’re build­ing op­ti­mized sources. If you in­tended on us­ing un­com­piled sources, please click this link.

Google Cloud Console has failed to load JavaScript sources from www.gsta­tic.com.

Possible reasons are:

- www.gstatic.com or its IP addresses are blocked by your network administrator
- Google has temporarily blocked your account or network due to excessive automated requests

Please contact your network administrator for further assistance.

...

Read the original on console.cloud.google.com »

10 871 shares, 36 trendiness

Boris Tane

I've been using Claude Code as my primary development tool for approximately nine months, and the workflow I've settled into is radically different from what most people do with AI coding tools. Most developers type a prompt, sometimes use plan mode, fix the errors, repeat. The more terminally online are stitching together ralph loops, MCPs, gas towns (remember those?), etc. The results in both cases are a mess that completely falls apart for anything non-trivial.

The work­flow I’m go­ing to de­scribe has one core prin­ci­ple: never let Claude write code un­til you’ve re­viewed and ap­proved a writ­ten plan. This sep­a­ra­tion of plan­ning and ex­e­cu­tion is the sin­gle most im­por­tant thing I do. It pre­vents wasted ef­fort, keeps me in con­trol of ar­chi­tec­ture de­ci­sions, and pro­duces sig­nif­i­cantly bet­ter re­sults with min­i­mal to­ken us­age than jump­ing straight to code.

flowchart LR
    R[Research] --> P[Plan]
    P --> A[Annotate]
    A -->|repeat 1-6x| A
    A --> T[Todo List]
    T --> I[Implement]
    I --> F[Feedback & Iterate]

Every mean­ing­ful task starts with a deep-read di­rec­tive. I ask Claude to thor­oughly un­der­stand the rel­e­vant part of the code­base be­fore do­ing any­thing else. And I al­ways re­quire the find­ings to be writ­ten into a per­sis­tent mark­down file, never just a ver­bal sum­mary in the chat.

read this folder in depth, un­der­stand how it works deeply, what it does and all its speci­fici­ties. when that’s done, write a de­tailed re­port of your learn­ings and find­ings in re­search.md

study the no­ti­fi­ca­tion sys­tem in great de­tails, un­der­stand the in­tri­ca­cies of it and write a de­tailed re­search.md doc­u­ment with every­thing there is to know about how no­ti­fi­ca­tions work

go through the task sched­ul­ing flow, un­der­stand it deeply and look for po­ten­tial bugs. there def­i­nitely are bugs in the sys­tem as it some­times runs tasks that should have been can­celled. keep re­search­ing the flow un­til you find all the bugs, don’t stop un­til all the bugs are found. when you’re done, write a de­tailed re­port of your find­ings in re­search.md

Notice the lan­guage: deeply”, in great de­tails”, intricacies”, go through every­thing”. This is­n’t fluff. Without these words, Claude will skim. It’ll read a file, see what a func­tion does at the sig­na­ture level, and move on. You need to sig­nal that sur­face-level read­ing is not ac­cept­able.

The writ­ten ar­ti­fact (research.md) is crit­i­cal. It’s not about mak­ing Claude do home­work. It’s my re­view sur­face. I can read it, ver­ify Claude ac­tu­ally un­der­stood the sys­tem, and cor­rect mis­un­der­stand­ings be­fore any plan­ning hap­pens. If the re­search is wrong, the plan will be wrong, and the im­ple­men­ta­tion will be wrong. Garbage in, garbage out.

This is the most expensive failure mode with AI-assisted coding, and it's not wrong syntax or bad logic. It's implementations that work in isolation but break the surrounding system. A function that ignores an existing caching layer. A migration that doesn't account for the ORM's conventions. An API endpoint that duplicates logic that already exists elsewhere. The research phase prevents all of this.

Once I’ve re­viewed the re­search, I ask for a de­tailed im­ple­men­ta­tion plan in a sep­a­rate mark­down file.

I want to build a new fea­ture that ex­tends the sys­tem to per­form . write a de­tailed plan.md doc­u­ment out­lin­ing how to im­ple­ment this. in­clude code snip­pets

the list end­point should sup­port cur­sor-based pag­i­na­tion in­stead of off­set. write a de­tailed plan.md for how to achieve this. read source files be­fore sug­gest­ing changes, base the plan on the ac­tual code­base
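For context, the change that prompt asks for can be sketched in a few lines of TypeScript. This is an illustrative toy, not code from the post: with cursor-based pagination the client passes back the last id it saw instead of a numeric offset.

```typescript
type Item = { id: string; name: string };

// Toy in-memory version of a cursor-paginated list endpoint.
// Items are assumed sorted by id; the cursor is the last id the client
// received, or null on the first request.
function listItems(all: Item[], cursor: string | null, limit: number) {
  const start = cursor ? all.findIndex((i) => i.id === cursor) + 1 : 0;
  const page = all.slice(start, start + limit);
  // Only hand back a cursor when a full page was returned, i.e. more may exist.
  const nextCursor = page.length === limit ? page[page.length - 1].id : null;
  return { items: page, nextCursor };
}
```

Unlike offset pagination, rows inserted or deleted while the client is paging don't shift the window, which is typically why the switch is worth making.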

The gen­er­ated plan al­ways in­cludes a de­tailed ex­pla­na­tion of the ap­proach, code snip­pets show­ing the ac­tual changes, file paths that will be mod­i­fied, and con­sid­er­a­tions and trade-offs.

I use my own .md plan files rather than Claude Code’s built-in plan mode. The built-in plan mode sucks. My mark­down file gives me full con­trol. I can edit it in my ed­i­tor, add in­line notes, and it per­sists as a real ar­ti­fact in the pro­ject.

One trick I use con­stantly: for well-con­tained fea­tures where I’ve seen a good im­ple­men­ta­tion in an open source repo, I’ll share that code as a ref­er­ence along­side the plan re­quest. If I want to add sortable IDs, I paste the ID gen­er­a­tion code from a pro­ject that does it well and say this is how they do sortable IDs, write a plan.md ex­plain­ing how we can adopt a sim­i­lar ap­proach.” Claude works dra­mat­i­cally bet­ter when it has a con­crete ref­er­ence im­ple­men­ta­tion to work from rather than de­sign­ing from scratch.

But the plan doc­u­ment it­self is­n’t the in­ter­est­ing part. The in­ter­est­ing part is what hap­pens next.

This is the most dis­tinc­tive part of my work­flow, and the part where I add the most value.

flowchart TD
    W[Claude writes plan.md] --> R[I review in my editor]
    R --> N[I add inline notes]
    N --> S[Send Claude back to the document]
    S --> U[Claude updates plan]
    U --> D{Satisfied?}
    D -->|No| R
    D -->|Yes| T[Request todo list]

After Claude writes the plan, I open it in my ed­i­tor and add in­line notes di­rectly into the doc­u­ment. These notes cor­rect as­sump­tions, re­ject ap­proaches, add con­straints, or pro­vide do­main knowl­edge that Claude does­n’t have.

The notes vary wildly in length. Sometimes a note is two words: not op­tional” next to a pa­ra­me­ter Claude marked as op­tional. Other times it’s a para­graph ex­plain­ing a busi­ness con­straint or past­ing a code snip­pet show­ing the data shape I ex­pect.

"use drizzle:generate for migrations, not raw SQL" — domain knowledge Claude doesn't have

"no — this should be a PATCH, not a PUT" — correcting a wrong assumption

remove this sec­tion en­tirely, we don’t need caching here” — re­ject­ing a pro­posed ap­proach

the queue con­sumer al­ready han­dles re­tries, so this retry logic is re­dun­dant. re­move it and just let it fail” — ex­plain­ing why some­thing should change

this is wrong, the vis­i­bil­ity field needs to be on the list it­self, not on in­di­vid­ual items. when a list is pub­lic, all items are pub­lic. re­struc­ture the schema sec­tion ac­cord­ingly” — redi­rect­ing an en­tire sec­tion of the plan

Then I send Claude back to the doc­u­ment:

I added a few notes to the doc­u­ment, ad­dress all the notes and up­date the doc­u­ment ac­cord­ingly. don’t im­ple­ment yet

This cy­cle re­peats 1 to 6 times. The ex­plicit don’t im­ple­ment yet” guard is es­sen­tial. Without it, Claude will jump to code the mo­ment it thinks the plan is good enough. It’s not good enough un­til I say it is.

Why This Works So Well

The mark­down file acts as shared mu­ta­ble state be­tween me and Claude. I can think at my own pace, an­no­tate pre­cisely where some­thing is wrong, and re-en­gage with­out los­ing con­text. I’m not try­ing to ex­plain every­thing in a chat mes­sage. I’m point­ing at the ex­act spot in the doc­u­ment where the is­sue is and writ­ing my cor­rec­tion right there.

This is fun­da­men­tally dif­fer­ent from try­ing to steer im­ple­men­ta­tion through chat mes­sages. The plan is a struc­tured, com­plete spec­i­fi­ca­tion I can re­view holis­ti­cally. A chat con­ver­sa­tion is some­thing I’d have to scroll through to re­con­struct de­ci­sions. The plan wins every time.

Three rounds of I added notes, up­date the plan” can trans­form a generic im­ple­men­ta­tion plan into one that fits per­fectly into the ex­ist­ing sys­tem. Claude is ex­cel­lent at un­der­stand­ing code, propos­ing so­lu­tions, and writ­ing im­ple­men­ta­tions. But it does­n’t know my prod­uct pri­or­i­ties, my users’ pain points, or the en­gi­neer­ing trade-offs I’m will­ing to make. The an­no­ta­tion cy­cle is how I in­ject that judge­ment.

add a de­tailed todo list to the plan, with all the phases and in­di­vid­ual tasks nec­es­sary to com­plete the plan - don’t im­ple­ment yet

This cre­ates a check­list that serves as a progress tracker dur­ing im­ple­men­ta­tion. Claude marks items as com­pleted as it goes, so I can glance at the plan at any point and see ex­actly where things stand. Especially valu­able in ses­sions that run for hours.
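Put together, a plan document with its todo list might look something like this. This is an illustrative skeleton only; the post doesn't prescribe a format, and the feature and file names are made up:

```markdown
# Plan: cursor-based pagination for the list endpoint

## Approach
Replace offset/limit with an opaque cursor derived from the last row's id...

## Files to change
- src/routes/list.ts: accept `cursor` instead of `offset`
- src/db/queries.ts: add a keyset query helper

## Todo
- [ ] Phase 1: schema and query changes
  - [x] add keyset query helper
  - [ ] update the list endpoint handler
- [ ] Phase 2: client changes
  - [ ] thread `nextCursor` through the list view
```

The checkboxes are what Claude flips to completed as it works through the implementation.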

When the plan is ready, I is­sue the im­ple­men­ta­tion com­mand. I’ve re­fined this into a stan­dard prompt I reuse across ses­sions:

im­ple­ment it all. when you’re done with a task or phase, mark it as com­pleted in the plan doc­u­ment. do not stop un­til all tasks and phases are com­pleted. do not add un­nec­es­sary com­ments or js­docs, do not use any or un­known types. con­tin­u­ously run type­check to make sure you’re not in­tro­duc­ing new is­sues.

This sin­gle prompt en­codes every­thing that mat­ters:

implement it all”: do every­thing in the plan, don’t cherry-pick

mark it as com­pleted in the plan doc­u­ment”: the plan is the source of truth for progress

do not stop un­til all tasks and phases are com­pleted”: don’t pause for con­fir­ma­tion mid-flow

do not add un­nec­es­sary com­ments or js­docs”: keep the code clean

do not use any or un­known types”: main­tain strict typ­ing

continuously run type­check”: catch prob­lems early, not at the end

I use this ex­act phras­ing (with mi­nor vari­a­tions) in vir­tu­ally every im­ple­men­ta­tion ses­sion. By the time I say implement it all,” every de­ci­sion has been made and val­i­dated. The im­ple­men­ta­tion be­comes me­chan­i­cal, not cre­ative. This is de­lib­er­ate. I want im­ple­men­ta­tion to be bor­ing. The cre­ative work hap­pened in the an­no­ta­tion cy­cles. Once the plan is right, ex­e­cu­tion should be straight­for­ward.

Without the plan­ning phase, what typ­i­cally hap­pens is Claude makes a rea­son­able-but-wrong as­sump­tion early on, builds on top of it for 15 min­utes, and then I have to un­wind a chain of changes. The don’t im­ple­ment yet” guard elim­i­nates this en­tirely.

Once Claude is ex­e­cut­ing the plan, my role shifts from ar­chi­tect to su­per­vi­sor. My prompts be­come dra­mat­i­cally shorter.

flowchart LR
    I[Claude implements] --> R[I review / test]
    R --> C{Correct?}
    C -->|No| F[Terse correction]
    F --> I
    C -->|Yes| N{More tasks?}
    N -->|Yes| I
    N -->|No| D[Done]

Where a plan­ning note might be a para­graph, an im­ple­men­ta­tion cor­rec­tion is of­ten a sin­gle sen­tence:

You built the set­tings page in the main app when it should be in the ad­min app, move it.”

Claude has the full con­text of the plan and the on­go­ing ses­sion, so terse cor­rec­tions are enough.

Frontend work is the most it­er­a­tive part. I test in the browser and fire off rapid cor­rec­tions:

For vi­sual is­sues, I some­times at­tach screen­shots. A screen­shot of a mis­aligned table com­mu­ni­cates the prob­lem faster than de­scrib­ing it.

this table should look ex­actly like the users table, same header, same pag­i­na­tion, same row den­sity.”

This is far more pre­cise than de­scrib­ing a de­sign from scratch. Most fea­tures in a ma­ture code­base are vari­a­tions on ex­ist­ing pat­terns. A new set­tings page should look like the ex­ist­ing set­tings pages. Pointing to the ref­er­ence com­mu­ni­cates all the im­plicit re­quire­ments with­out spelling them out. Claude would typ­i­cally read the ref­er­ence file(s) be­fore mak­ing the cor­rec­tion.

When something goes in the wrong direction, I don't try to patch it. I revert and re-scope by discarding the git changes:

I re­verted every­thing. Now all I want is to make the list view more min­i­mal — noth­ing else.”

Narrowing scope af­ter a re­vert al­most al­ways pro­duces bet­ter re­sults than try­ing to in­cre­men­tally fix a bad ap­proach.
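Discarding the changes is plain git. The post doesn't name exact commands, but `git restore` plus `git clean` is one common way to get back to the last commit, demonstrated here in a throwaway repo:

```shell
# Commit a baseline, simulate an unwanted rewrite, then discard it.
# (git restore/clean is one common way; the post doesn't specify commands.)
tmp=$(mktemp -d) && cd "$tmp"
git init -q
echo "good version" > app.ts
git add app.ts
git -c user.email=demo@example.com -c user.name=demo commit -qm "baseline"

echo "bad rewrite" > app.ts   # the changes to throw away
git restore .                 # discard unstaged edits to tracked files
git clean -fdq                # remove any untracked files left behind
cat app.ts                    # back to the committed baseline
```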

Even though I del­e­gate ex­e­cu­tion to Claude, I never give it to­tal au­ton­omy over what gets built. I do the vast ma­jor­ity of the ac­tive steer­ing in the plan.md doc­u­ments.

This mat­ters be­cause Claude will some­times pro­pose so­lu­tions that are tech­ni­cally cor­rect but wrong for the pro­ject. Maybe the ap­proach is over-en­gi­neered, or it changes a pub­lic API sig­na­ture that other parts of the sys­tem de­pend on, or it picks a more com­plex op­tion when a sim­pler one would do. I have con­text about the broader sys­tem, the prod­uct di­rec­tion, and the en­gi­neer­ing cul­ture that Claude does­n’t.

flowchart TD
    P[Claude proposes changes] --> E[I evaluate each item]
    E --> A[Accept as-is]
    E --> M[Modify approach]
    E --> S[Skip / remove]
    E --> O[Override technical choice]
    A & M & S & O --> R[Refined implementation scope]

Cherry-picking from pro­pos­als: When Claude iden­ti­fies mul­ti­ple is­sues, I go through them one by one: for the first one, just use Promise.all, don’t make it overly com­pli­cated; for the third one, ex­tract it into a sep­a­rate func­tion for read­abil­ity; ig­nore the fourth and fifth ones, they’re not worth the com­plex­ity.” I’m mak­ing item-level de­ci­sions based on my knowl­edge of what mat­ters right now.

Trimming scope: When the plan in­cludes nice-to-haves, I ac­tively cut them. remove the down­load fea­ture from the plan, I don’t want to im­ple­ment this now.” This pre­vents scope creep.

Protecting ex­ist­ing in­ter­faces: I set hard con­straints when I know some­thing should­n’t change: the sig­na­tures of these three func­tions should not change, the caller should adapt, not the li­brary.”

Overriding tech­ni­cal choices: Sometimes I have a spe­cific pref­er­ence Claude would­n’t know about: use this model in­stead of that one” or use this li­brary’s built-in method in­stead of writ­ing a cus­tom one.” Fast, di­rect over­rides.
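The Promise.all call-out in the cherry-picking example is representative of these item-level decisions: two independent awaits in sequence get flattened into one concurrent call. A minimal sketch, with hypothetical functions standing in for independent queries:

```typescript
// Hypothetical stand-ins for two independent async calls.
async function fetchUser(id: string) {
  return { id, name: `user-${id}` };
}
async function fetchOrders(id: string) {
  return [{ userId: id, total: 42 }];
}

// Instead of `await fetchUser(...)` followed by `await fetchOrders(...)`,
// start both at once and wait for the pair to settle.
async function loadDashboard(userId: string) {
  const [user, orders] = await Promise.all([fetchUser(userId), fetchOrders(userId)]);
  return { user, orders };
}
```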

Claude han­dles the me­chan­i­cal ex­e­cu­tion, while I make the judge­ment calls. The plan cap­tures the big de­ci­sions up­front, and se­lec­tive guid­ance han­dles the smaller ones that emerge dur­ing im­ple­men­ta­tion.

I run re­search, plan­ning, and im­ple­men­ta­tion in a sin­gle long ses­sion rather than split­ting them across sep­a­rate ses­sions. A sin­gle ses­sion might start with deep-read­ing a folder, go through three rounds of plan an­no­ta­tion, then run the full im­ple­men­ta­tion, all in one con­tin­u­ous con­ver­sa­tion.

I am not seeing the performance degradation everyone talks about once the context window passes 50% usage. Actually, by the time I say "implement it all," Claude has spent the entire session building understanding: reading files during research, refining its mental model during annotation cycles, absorbing my domain-knowledge corrections.

When the con­text win­dow fills up, Claude’s auto-com­paction main­tains enough con­text to keep go­ing. And the plan doc­u­ment, the per­sis­tent ar­ti­fact, sur­vives com­paction in full fi­delity. I can point Claude to it at any point in time.

The Workflow in One Sentence

Read deeply, write a plan, an­no­tate the plan un­til it’s right, then let Claude ex­e­cute the whole thing with­out stop­ping, check­ing types along the way.

That’s it. No magic prompts, no elab­o­rate sys­tem in­struc­tions, no clever hacks. Just a dis­ci­plined pipeline that sep­a­rates think­ing from typ­ing. The re­search pre­vents Claude from mak­ing ig­no­rant changes. The plan pre­vents it from mak­ing wrong changes. The an­no­ta­tion cy­cle in­jects my judge­ment. And the im­ple­men­ta­tion com­mand lets it run with­out in­ter­rup­tion once every de­ci­sion has been made.

Try my workflow; you'll wonder how you ever shipped anything with coding agents without an annotated plan document sitting between you and the code.


...

Read the original on boristane.com »
