10 interesting stories served every morning and every evening.




1 510 shares, 54 trendiness

Your Logs Are Lying To You

INFO HttpServer started successfully binding=0.0.0.0:3000 pid=28471 env=production version=2.4.1 node_env=production cluster_mode=enabled workers=4

debug PostgreSQL connection pool initialized host=db.internal:5432 database=main pool_size=20 ssl_mode=require idle_timeout=10000ms max_lifetime=1800000ms

INFO Incoming request method=GET path=/api/v1/users/me ip=192.168.1.42 user_agent="Mozilla/5.0" request_id=req_8f7a2b3c trace_id=abc123def456

debug JWT token validation started issuer=auth.company.com audience=api.company.com exp=1703044800 iat=1703041200 sub=user_abc123 scope="read write"

WARN Slow database query detected duration_ms=847 query="SELECT u.*, o.name FROM users u JOIN orgs o ON u.org_id = o.id WHERE u.org_id = $1 AND u.deleted_at IS NULL" rows_returned=2847

debug Redis cache lookup failed key=users:org_12345:list:v2 ttl_seconds=3600 fallback_strategy=database cache_cluster=redis-prod-01 latency_ms=2

info Request completed successfully status=200 duration_ms=1247 bytes_sent=48291 request_id=req_8f7a2b3c cache_hit=false db_queries=3 external_calls=1

ERROR Database connection pool exhausted active_connections=20 waiting_requests=147 timeout_ms=30000 service=postgres suggestion="Consider increasing pool_size or optimizing queries"

warn Retrying failed HTTP request attempt=1 max_attempts=3 backoff_ms=100 error_code=ETIMEDOUT target_service=payment-gateway endpoint=/v1/charges circuit_state=closed

INFO Circuit breaker state transition service=payment-api previous_state=closed current_state=open failure_count=5 failure_threshold=5 reset_timeout_ms=30000

debug Background job executed successfully job_id=job_9x8w7v6u type=weekly_email_digest duration_ms=2341 emails_sent=1847 failures=3 queue=default priority=low

ERROR Memory pressure critical heap_used_bytes=1932735283 heap_limit_bytes=2147483648 gc_pause_ms=847 gc_type=major rss_bytes=2415919104 external_bytes=8847291

WARN Rate limit threshold approaching user_id=user_abc123 current_requests=890 limit=1000 window_seconds=60 remaining=110 reset_at=2024-12-20T03:15:00Z

info WebSocket connection established client_id=ws_7f8g9h2j protocol=wss rooms=["team_updates","notifications","presence"] user_id=user_abc123 ip=192.168.1.42

debug Kafka message consumed successfully topic=user-events partition=3 offset=1847291 key=user_abc123 consumer_group=api-consumers lag=12 processing_time_ms=45

INFO Health check passed service=api-gateway uptime_seconds=847291 active_connections=142 memory_usage_percent=73 cpu_usage_percent=45 status=healthy version=2.4.1

debug S3 upload completed bucket=company-uploads key=avatars/user_abc123/profile.jpg size_bytes=245891 content_type=image/jpeg duration_ms=892 region=us-east-1

warn Deprecated API version detected endpoint=/api/v1/legacy/users version=v1 recommended_version=v3 deprecation_date=2025-01-15 client_id=mobile-app-ios

And here’s how to make it better.

Your logs are lying to you. Not maliciously. They’re just not equipped to tell the truth.

You’ve probably spent hours grep-ing through logs trying to understand why a user couldn’t check out, why that webhook failed, or why your p99 latency spiked at 3am. You found nothing useful. Just timestamps and vague messages that mock you with their uselessness.

This isn’t your fault. Logging, as it’s commonly practiced, is fundamentally broken. And no, slapping OpenTelemetry on your codebase won’t magically fix it.

Let me show you what’s wrong, and more importantly, how to fix it.

Logs were designed for a different era. An era of monoliths, single servers, and problems you could reproduce locally. Today, a single user request might touch 15 services, 3 databases, 2 caches, and a message queue. Your logs are still acting like it’s 2005.

Here’s what a typical logging setup looks like:

That’s 13 log lines for a single successful request. Now multiply that by 10,000 concurrent users. You’ve got 130,000 log lines per second. Most of them saying absolutely nothing useful.

But here’s the real problem: when something goes wrong, these logs won’t help you. They’re missing the one thing you need: context.

When a user reports “I can’t complete my purchase,” your first instinct is to search your logs. You type their email, or maybe their user ID, and hit enter.

String search treats logs as bags of characters. It has no understanding of structure, no concept of relationships, no way to correlate events across services.

When you search for “user-123”, you might find it logged 47 different ways across your codebase:

And those are just the logs that include the user ID. What about the downstream service that only logged the order ID? Now you need a second search. And a third. You’re playing detective with one hand tied behind your back.

The fundamental problem: logs are optimized for writing, not for querying.

Developers write console.log("Payment failed") because it’s easy in the moment. Nobody thinks about the poor soul who’ll be searching for this at 2am during an outage.

Before I show you the fix, let me define some terms. These get thrown around a lot, often incorrectly.

Structured Logging: Logs emitted as key-value pairs (usually JSON) instead of plain strings. {"event": "payment_failed", "user_id": 123} instead of “Payment failed for user 123”. Structured logging is necessary but not sufficient.

Cardinality: The number of unique values a field can have. user_id has high cardinality (millions of unique values). http_method has low cardinality (GET, POST, PUT, DELETE, etc.). High cardinality fields are what make logs actually useful for debugging.

Dimensionality: The number of fields in your log event. A log with 5 fields has low dimensionality. A log with 50 fields has high dimensionality. More dimensions = more questions you can answer.

Wide Event: A single, context-rich log event emitted per request per service. Instead of 13 log lines for one request, you emit 1 line with 50+ fields containing everything you might need to debug.

Canonical Log Line: Another term for wide event, popularized by Stripe. One log line per request that serves as the authoritative record of what happened.
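
To make the pattern concrete, here is a minimal sketch in Python. It is an illustration of the idea, not code from the article or from Stripe: each layer of the request handler appends fields to one per-request dictionary, and exactly one structured line is emitted at the end. The field names and values are hypothetical.

```python
import json
import time
import uuid


def handle_request(req):
    # One dict per request; each layer appends fields instead of emitting its own log line.
    event = {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),
        "http.method": req["method"],
        "http.path": req["path"],
    }
    try:
        # Hypothetical work; real handlers would record what they actually did.
        event["user.id"] = "user_abc123"
        event["db.query_count"] = 3
        event["db.duration_ms"] = 847
        event["cache.hit"] = False
        event["http.status"] = 200
        return {"ok": True}
    except Exception as exc:
        event["http.status"] = 500
        event["error.type"] = type(exc).__name__
        raise
    finally:
        # The single canonical log line for this request: wide, structured, high-cardinality.
        print(json.dumps(event))


handle_request({"method": "GET", "path": "/api/v1/users/me"})
```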

I see this take constantly: “Just use OpenTelemetry and your observability problems are solved.”

No. OpenTelemetry is a protocol and a set of SDKs. It standardizes how telemetry data (logs, traces, metrics) is collected and exported. This is genuinely useful: it means you’re not locked into a specific vendor’s format.

...

Read the original on loggingsucks.com »

2 472 shares, 55 trendiness

Mullvad VPN (@mullvadnet@mastodon.online)

...

Read the original on mastodon.online »

3 431 shares, 1 trendiness

Privacy is Marketing. Anonymity is Architecture.

Every company says they “care about your privacy.” It’s in every privacy policy, every marketing page, every investor deck. But if I can reset your password via email, I know who you are. If I log your IP, I know where you are. If I require phone verification, I have leverage over you.

That’s not privacy. That’s performance art.

In 2025, “privacy” has become the most abused word in tech. It’s slapped on products that require government IDs, services that log everything, and platforms that couldn’t protect user data if they tried.

Real anonymity isn’t a marketing claim. It’s an architectural decision that makes it impossible to compromise users, even if you wanted to. Even if someone put a gun to your head. Even if a three-letter agency showed up with a warrant.

Let me show you the difference.

Here’s how the average “privacy-focused” service actually works:

User Journey:

1. Enter email address

2. Verify email (now we have your email)

3. Create password (now we can reset it via email)

4. Add phone for “security” (now we have your phone)

5. Confirm identity for “fraud prevention” (now we have your ID)

6. Enable 2FA (more identity vectors)

Privacy Policy:

“We care deeply about your privacy and only collect

necessary information to provide our services…”

Translation:

We have everything. We log everything.

We just promise to be careful with it.

The problem isn’t malice. Most services genuinely try to protect user data. But protection implies possession. And possession is the vulnerability.

You can’t leak what you don’t have. You can’t be forced to hand over what doesn’t exist.

In 2023, Swedish police raided Mullvad VPN’s offices with a search warrant. They wanted user data. Customer information. Connection logs. Anything.

Not because Mullvad refused to cooperate. Not because they hid the data. But because there was no data to give. Mullvad’s entire identity system is a randomly generated account number. No email. No name. No records.

Mullvad’s entire authentication system: 16 random digits. That’s it. That’s the whole identity.

When the police realized this, they couldn’t even argue. The architecture made compliance impossible. Not difficult. Impossible.

That’s what real anonymity looks like.

How We Built The Same Thing

When we designed Servury, we asked ourselves: what’s the minimum information needed to run a cloud hosting platform?

Turns out, not much:

// What we DON’T collect:

- Email address (no recovery, no marketing, no leaks)

- Name (we don’t care who you are)

- IP addresses (not logged, not stored, not tracked)

- Payment information (handled by processors, not us)

- Usage patterns (no analytics, no telemetry, nothing)

- Device fingerprints (your browser, your business)

- Geographic data (beyond what’s needed for server selection)

// What we DO store:

- 32-character credential (random alphanumeric string)

- Account balance (need to know if you can deploy)

- Active services (servers and proxies you’re running)

That’s it. Three data points.

No “forgot password” link. No email verification. No phone number for “account security.” Because every one of those features requires storing identity, and identity is the attack surface.
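
Here is a minimal sketch of what that storage model implies, in Python. It illustrates the idea rather than Servury’s actual code: the article only specifies a 32-character random alphanumeric credential, so the exact alphabet and the choice to store a hash of it are assumptions.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # alphanumeric, as described above


def new_credential() -> str:
    # 32 random alphanumeric characters: the entire "identity".
    return "".join(secrets.choice(ALPHABET) for _ in range(32))


def fingerprint(credential: str) -> str:
    # Assumption: store only a hash server-side, next to balance and active services.
    return hashlib.sha256(credential.encode()).hexdigest()


cred = new_credential()
print(cred)               # shown to the user once; lose it and the account is unrecoverable
print(fingerprint(cred))  # the only identity-adjacent value that needs to exist in the database
```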

The Trade-Off Nobody Talks About

Here’s the part where other “privacy” companies quietly change the subject: lose your credential, you’re done.

No recovery process. No support ticket that can restore access. No “verify your identity” workflow. If that 32-character string disappears, so does your account.

And you know what? That’s exactly the point.

Traditional service: “We can help you recover your account by verifying your identity”

Translation: We know who you are, and we can prove it.

Servury: “We literally cannot help you recover your account”

Translation: We have no idea who you are, and that’s by design.

The inconvenience of memorizing (or securely storing) a random string is the cost of anonymity. Anyone who tells you that you can have both perfect anonymity AND easy account recovery is lying or doesn’t understand the threat model.

What This Actually Means In Practice

“Hi Servury support, I lost access to my account. Can you help me recover it?”

“I’m sorry, but we have no way to verify account ownership. If you don’t have your credential, the account is inaccessible to everyone, including us.”

“But I can prove it’s me! Here’s my payment receipt, my IP address, the exact time I signed up—”

“We don’t store any of that information. There’s nothing to match against.”

Is this frustrating for users who lose their credentials? Absolutely. Is it a feature? Absolutely.

Because on the flip side:

Hackers can’t phish or reset your credentials via email

We can’t accidentally leak your personal information (because we don’t have it)

No government can force us to reveal who you are (because we genuinely don’t know)

Email addresses are the original sin of modern internet identity. They seem harmless. Universal. Convenient. And they completely destroy anonymity.

Why email kills anonymity:

1. Email IS identity

- Tied to phone numbers

- Tied to payment methods

- Tied to other services

- Recovery mechanisms expose you

2. Email IS trackable

- Read receipts

- Link tracking

- Metadata analysis

- Cross-service correlation

3. Email IS persistent

- Exists beyond single service

- Archived forever

- Subpoenaed retroactively

- Leaked in breaches

4. Email IS social engineering

- Phishing vector

- Password reset vulnerability

- Support ticket exploitation

- Impersonation risk

The moment you require an email address, you’re not building for anonymity. You’re building for accountability. And sometimes that’s fine! Banks should know who you are. Government services should verify identity. But cloud infrastructure? VPNs? Proxy services?

We shouldn’t need to know a damn thing about you.

Crypto Payments: Not Just For Criminals

We accept cryptocurrency not because we’re trying to hide from authorities. We accept it because traditional payment systems are surveillance infrastructure.

Every credit card transaction creates a permanent record linking your identity to your purchase. Your bank knows. The payment processor knows. The merchant knows. And they all store it. Forever.

Cryptocurrency breaks that chain. Not perfectly—blockchain analysis is a thing—but enough to decouple payment from persistent identity. Especially when combined with no-email registration.

And for those who need traditional payments? We support Stripe. Because pragmatism matters. But we don’t pretend that credit card payments are anonymous. We’re honest about the trade-offs.

Let’s be crystal clear about what we’re NOT claiming:

Anonymity ≠ Impunity

If you use our servers for illegal activity, law enforcement can still investigate. They just can’t start with “who owns this account” because we can’t answer that question.

Anonymity ≠ Security

Your credential is just a random string. If you save it in plaintext on your desktop, that’s on you. Anonymity from us doesn’t mean anonymity from your own bad opsec.

Anonymity ≠ Invisibility

Your server has an IP address. Your proxy connections are visible. We’re not magic. We just don’t tie those technical identifiers back to your personal identity.

Anonymity ≠ Zero Trust Required

You still have to trust that we’re actually doing what we say. Open source code, transparency reports, and independent audits help, but perfect trustlessness is impossible in hosted infrastructure.

...

Read the original on servury.com »

4 369 shares, 45 trendiness

Ibrahim Diallo

Microsoft won’t let you dismiss the upgrade notification

So support for Windows 10 has ended. Yes, millions of users are still on it. One of my main laptops runs Windows 10. I can’t update to Windows 11 because of the hardware requirements. It’s not that I don’t have enough RAM, storage, or CPU power. The hardware limitation is specifically TPM 2.0.

What is TPM 2.0, you say? It stands for Trusted Platform Module. It’s basically a security chip on the motherboard that enables some security features. It’s good and all, but Windows says my laptop doesn’t support it. Great! Now leave me alone.

Well, every time I turn on my computer, I get a reminder that I need to update to Windows 11. OK, at this point a Windows machine only belongs to you in name. Microsoft can run arbitrary code on it. They already ran the code to decide that my computer doesn’t support Windows 11. So why do they keep bothering me?

Fine, I’m frustrated. That’s why I’m complaining. I’ve accepted the fact that my powerful, yet 10-year-old laptop won’t get the latest update. But if Microsoft’s own systems have determined my hardware is incompatible, why are they harassing me? I’ll just have to dismiss this notification and call it a day.

But wait a minute. How do I dismiss it?

I cannot dismiss it. I can only be reminded later or… I have to learn more. If I click “remind me later,” I’m basically telling Microsoft that I consent to being shown the same message again whenever they feel like it. If I click “learn more”? I’m taken to the Windows Store, where I’m shown ads for different laptops I can buy instead. Apparently, I’m also probably giving them consent to show me this ad the next time I log in.

It’s one thing to be at the forefront of enshittification, but Microsoft is now actively hostile to its users. I’ve written about this passive-aggressive illusion of choice before. They are basically asking “Do you want to buy a new laptop?” And the options they are presenting are “Yes” and “OK.”

This isn’t a bug. This is intentional design. Microsoft has deliberately removed the ability to decline.

Listen. You said my device doesn’t support Windows 11. You’re right. Now leave me alone. I have another device running Windows 11. It’s festered with ads, and you’re trying everything in your power to get me to create a Microsoft account.

I paid for that computer. I also paid for a pro version of the OS. I don’t want OneDrive. I don’t want to sign up with my Microsoft account. Whether I use my computer online or offline is none of your business. In fact, if you want me to create an account on your servers, you are first required to register your OS on my own website. The terms and conditions are simple. Every time you perform any network access, you have to send a copy of the payload and response back to my server. Either that, or you’re in breach of my terms.

By the way, the application showing this notification is called “Reusable UX Interaction Manager” sometimes. Other times it appears as “Campaign Manager.”

...

Read the original on idiallo.com »

5 366 shares, 19 trendiness

Ruby Programming Language

Want to learn more or try Ruby?

Try Ruby

Why do programmers around the world love Ruby? What makes it fun?

Rich gems support all kinds of development.

Mature tooling ready to use.

Ruby has a vast collection of libraries called gems, supporting everything from web development to data processing.

With mature frameworks like Rails and comprehensive toolchains, you can combine excellent existing resources to build high-quality applications quickly without reinventing the wheel.

When I released Ruby to the world, I never imagined such a rich ecosystem would grow from it.

Over 200,000 gems, Ruby on Rails, RSpec, Bundler—it was the community that created and nurtured all of these.

My wish “to make programmers happy” has been realized in ways I could never have achieved alone.

Easy to write, easy to read.

Natural syntax like spoken language.

Ruby has a simple and intuitive syntax that reads like natural language.

By eliminating complex symbols and verbose constructs, Ruby’s design philosophy allows you to express what you want directly.

With minimal boilerplate and high readability, it’s friendly to beginners and maintainable for experienced developers.

Ruby is just the most beautiful programming language I have ever seen.

And I pay a fair amount of attention to new programming languages that are coming up, new environments, new frameworks, and I’ve still yet to see anything that meets or beats Ruby in the pureness of its design.

Do more with less code.

Intuitive syntax accelerates development.

Ruby’s expressive syntax allows you to write complex logic concisely.

By leveraging powerful features like metaprogramming and blocks, you can reduce repetition and focus on solving core problems.

With rich testing frameworks, you can maintain quality while achieving rapid development cycles.

Ruby turns ideas into code fast.

Its simplicity keeps me focused; its expressiveness lets me write the way I think.

It feels like the language gets out of the way, leaving just me and the problem.

With great tools and libraries, ideas quickly become running, elegant code.

The Ruby community embraces the culture of “Matz is nice and so we are nice (MINASWAN),” welcoming everyone from beginners to experts. Conferences and meetups around the world foster knowledge sharing and connections.

It’s a warm, sustainable community where people help each other and grow together.

The Ruby community is filled with talent and creativity, developers attracted to Ruby’s elegant syntax who program for the joy of it.

It’s a vibrant, welcoming community willing to share this love of programming with everyone.

This spirit of warmth and collaboration is hands down Ruby’s greatest asset.

People who engage with Ruby beyond being just users are called Rubyists.

Rubyists who love Ruby are all nice #rubyfriends. Community activities are thriving and fun.

The universal motto is MINASWAN — Matz is nice and so we are nice

Learn more about the community

We are pleased to announce the release of Ruby 4.0.0-preview3.

Ruby 4.0 introduces Ruby::Box and ZJIT, and adds many improvements.

We are pleased to announce the release of Ruby 4.0.0-preview2. Ruby 4.0 updates its Unicode version to 17.0.0, and so on.

CVE-2025-24294: Possible Denial of Service in resolv gem

...

Read the original on www.ruby-lang.org »

6 350 shares, 34 trendiness

HackerNews Readings

...

Read the original on hackernews-readings-613604506318.us-west1.run.app »

7 305 shares, 25 trendiness

WalletWallet — Create Apple Passes for Free

A simple utility to convert physical barcodes into digital passes for Apple Wallet®. Entirely free and runs directly from your browser.

...

Read the original on walletwallet.alen.ro »

8 241 shares, 43 trendiness

[Revised] You Don’t Need to Spend $100/mo on Claude Code

[Revised] You Don’t Need to Spend $100/mo on Claude Code: Your Guide to Local Coding Models

What you need to know about local model tooling and the steps for setting one up yourself.

[Edit 1] This article has been edited after initial release for clarity. Both the tl;dr and the end section have added information.

[Edit 2] This hypothesis was actually wrong and thank you to everyone who commented! Here’s a full explanation of where I went wrong. I want to address this mistake as I realize it might have a meaningful impact on someone’s financial position. I’m not editing the actual article except where absolutely necessary so it doesn’t look like I’m covering up the mistake—I want to address it. Instead, I’ve included the important information below.

There is one takeaway this article provides that definitely holds true: local models are far more capable than they’re given credit for, even for coding.

It also explains the process of setting up a local coding model and technical information about doing so, which is helpful for anyone wanting to set up a local coding model. I would still recommend doing so. But do I want someone reading this to immediately drop their coding subscription and buy a maxed out MacBook Pro? No, and for that reason I need to correct my hypothesis from ‘Yes, with caveats’ to ‘No’.

This article was not an empirical assessment, but should have been to make these claims. Here’s where I went wrong:

While local models can likely complete ~90% of the software development tasks that something like Claude Code can, the last 10% is the most important. When it comes to your job, that last 10% is worth paying more for to get that last bit of performance.

I realized I looked at this more from the angle of a hobbyist paying for these coding tools. Someone doing little side projects—not someone in a production setting. I did this because I see a lot of people signing up for $100/mo or $200/mo coding subscriptions for personal projects when they likely don’t need to. I would not recommend running local models as a company instead of giving employees access to a tool like Claude Code.

While larger local models are very capable, as soon as you run other development tools (Docker, etc.) that also eat into your RAM, your model needs to be much smaller and becomes a lot less capable. I didn’t factor this in in my experiment.

So, really, the takeaway should be that these are incredible supplemental models to frontier models when coding and could potentially save you on your subscription by dropping it down a tier, but practically they’re not worth the effort in situations that might affect your livelihood.

Exactly a month ago, I made a hypothesis: Instead of paying $100/mo+ for an AI coding subscription, my money would be better spent upgrading my hardware so I can run local coding models at a fraction of the price (and have better hardware too!).

So, to create by far the most expensive article I’ve ever written, I put my money where my mouth is and bought a MacBook Pro with 128 GB of RAM to get to work.

My idea was simple: Over the life of the MacBook I’d recoup the costs of it by not paying for an AI coding subscription.

After weeks of experimenting and setting up local AI models and coding tools, I’ve come to the conclusion that my hypothesis was correct, with nuance, not correct [see edit 2 above], which I’ll get into later in this article.

In this article, we cover:

Why local models matter and the benefits they provide.

How to view memory usage and make estimates for which models can run on your machine and the RAM demands for coding applications.

A walk through setting up your own local coding model and tool, step by step.

Don’t worry if you don’t have a high-RAM machine! You can still follow this guide. I’ve included some models to try out with a lower memory allotment. I think you’ll be surprised at how performant even the smallest of models is. In fact, there hasn’t really been a time during this experiment that I’ve been disappointed with model performance.

If you’re only here for the local coding tool setup, skip to the section at the bottom. I’ve even included a link to my modelfiles in that section to make setup even easier for you. Otherwise, let’s get into what you need to know.

Local coding models are very capable. Using the right model and the right tooling feels only half a generation behind the frontier cloud tools. I would say that for about 90% of developer work local models are more than sufficient. Even small 7B parameter models can be very capable. [Edited to add in this next part] Local models won’t compete with frontier models at the peak of performance, but can complete many coding tasks just as well for a fraction of the cost. They’re worth running to bring costs down on plenty of tasks but potentially not worth using if there’s a free tier available that performs better.

Tools matter a lot. This is where I experienced the most disappointment. I tried many different tools with many different models and spent a lot of time tinkering. I ran into situations where the models wouldn’t call tools properly or their thinking traces wouldn’t close. Both of these rendered the tool essentially useless. Currently, tooling seems very finicky, and if there’s anything developers need to be successful, it’s good tools.

There’s a lot to consider when you’re actually working within hardware constraints. We take the tooling set up for us in the cloud for granted. When setting up local models, I had to think a lot about trade-offs in performance versus memory usage, how different tools compared and affected performance, nuances in types of models, how to quantize, and other user-facing factors such as time-to-first-token and tokens per second.

Google threw a wrench into my hypothesis. The local setup is almost a no-brainer when compared to a $100/mo+ subscription. Compared to free or nearly-free tooling (such as Gemini CLI, Jules, or Antigravity) there isn’t quite as strong of a monetary justification to spend more on hardware. There are benefits to local models outside of code, though, and I discuss those below. If the tl;dr was helpful, don’t forget to subscribe to get more in your inbox.

You might wonder why local models are worth investing in at all. The obvious answer is cost. By using your own hardware, you don’t need to pay a subscription fee to a cloud provider for your tool.

There are also a few less obvious and underrated reasons that make local models useful.

First: Reliability. Each week there seems to be complaints about performance regression within AI coding tools. Many speculate companies are pulling tricks to save resources that hurt model performance. With cloud providers, you’re at the mercy of the provider for when this happens. With local models, this only happens when you cause it to.

Second: Local models can apply to far more applications. Just the other day I was having a discussion with my dad about AI tooling he could use to streamline his work. His job requires studying a lot of data—a perfect application for an LLM-based tool—but his company blocks tools like Gemini and ChatGPT because a lot of this analysis is done on intellectual property. Unfortunately, he isn’t provided a suitable alternative to use.

With a local model, he wouldn’t have to worry about these IP issues. He could run his analyses without data ever leaving his machine. Of course, any tool calling would also need to ensure data never leaves the machine, but local models get around one of the largest hurdles for useful enterprise AI adoption. Running models on a local machine opens up an entire world of privacy- and security-centric AI applications that are expensive for cloud providers to provide.

Finally: Availability. Local models are available to you as long as your machine is. This means no worrying about your provider being down or rate limiting you due to high traffic. It also means using AI coding tools on planes or in other situations where internet access is locked down (think highly secure networks).

While local models do provide significant cost savings, the flexibility and reliability they provide can be even more valuable.

To get going with local models you must understand the memory needed to run them on your machine. Obviously, if you have more memory you’ll be able to run better models, but understanding the nuances of that memory management will help you pick out the right model for your use case.

Local AI has two parts that eat up your memory: the model itself and the model’s context window.

The actual model has billions of parameters and all those parameters need to fit into your memory at once. Excellent local coding models start at around 30 billion (30B, for short) parameters in size. By default, these models use 16 bits to represent parameters. At 16 bits with 30B parameters, a model will take 60 GB of space in RAM (16 bits = 2 bytes per parameter, 30 billion parameters = 60 billion bytes, which equals about 60 GB).

The second (and potentially larger) memory-consuming part of local AI is the model’s context window. This is the model inputs and outputs that are stored so the model can reference them in future requests. This gives the model memory.

When coding with AI, we prefer this window to be as large as it can be because we need to fit our codebase (or pieces of it) within our context window. This means we target a context window of 64,000 tokens or larger. All of these tokens will also be stored in RAM.

The important thing to understand about context windows is that the memory requirement per token for a model depends on the size of that model.
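
Concretely, both contributions are easy to estimate. The sketch below is my reconstruction of the usual back-of-the-envelope formulas, not the author’s exact math; it assumes plain multi-head attention (no grouped-query or hybrid-attention savings) and uses the illustrative 30B and 80B architectures from the comparison that follows.

```python
# Rough memory estimates for the two example configurations discussed in the text.

def weights_gb(params_billion: float, bits: int) -> float:
    # Parameter memory: number of parameters * bytes per parameter.
    return params_billion * 1e9 * (bits / 8) / 1e9


def kv_mb_per_token(layers: int, hidden_dim: int, bits: int) -> float:
    # Upper-bound KV cache per token: 2 (K and V) * layers * hidden_dim * bytes per value.
    return 2 * layers * hidden_dim * (bits / 8) / 1e6


small = kv_mb_per_token(64, 5120, 16)   # ~1.3 MB/token for the 30B example
large = kv_mb_per_token(80, 8192, 16)   # ~2.6 MB/token for the 80B example

print(f"30B weights: {weights_gb(30, 16):.0f} GB at 16-bit, {weights_gb(30, 8):.0f} GB at 8-bit")
print(f"80B weights: {weights_gb(80, 16):.0f} GB at 16-bit, {weights_gb(80, 8):.0f} GB at 8-bit")
print(f"KV cache per token: {small:.2f} MB vs {large:.2f} MB ({large / small:.1f}x)")
```
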
Models with more parameters tend to have large architectures (more hidden layers and larger dimensions to those layers). Larger architectures mean the model must store more information for each token within its key-value cache (context window) because it stores information for each token for each layer.

This means choosing an 80B parameter model over a 30B parameter model requires more memory for the model itself and also more memory for the same size context window. For example, a 30B parameter model might have a hidden dimension of 5120 with 64 layers while an 80B model has a hidden dimension of 8192 with 80 layers. Doing some back-of-the-napkin math shows us that the larger model requires approximately 2x more RAM to maintain the same context window as the 30B parameter model (see the formula sketched above).

Luckily, there are tricks to better manage memory. First, there are architectural changes that can be made to make model inference more efficient so it requires less memory. The model we set up at the end of this article uses Hybrid Attention, which enables a much smaller KV cache, letting us fit our model and context window in less memory. I won’t get into more detail in this article, but you can read more about that model and how it works here.

The second trick is quantizing the values you’re working with. Quantization means converting a continuous set of values into a smaller amount of distinct values. In our case, that means taking a set of numbers represented by a certain number of bits (16, for example) and reducing it to a set of numbers represented by fewer bits (8, for example). To put it simply, in our case we’re converting the numbers representing our model to a smaller bit representation to save memory while keeping the value representations within the model relatively equal.

You can quantize both your model weights and the values stored in your context window. When you quantize your model weights, you remove “intelligence” from the model because it’s less precise in its representation of innate information. I’ve also found the performance hit when going from 16 to 8 bits within the model to be much less than 8 to 4.

We can also quantize the values in our context window to reduce its memory requirement. This means we’re less precisely representing the model’s memory. Generally speaking, KV cache (context window) quantization is considered more destructive to model performance than weight quantization because it causes the model to forget details in long reasoning traces. Thus, you should test quantizing the KV cache to ensure it doesn’t degrade model performance for your specific task.

In reality, like the rest of machine learning, optimizing local model performance is an experimentation process, and real-world machine learning requires understanding the practical limitations and capabilities of models when applied to specific applications.

Here are a few more factors to understand when setting up a local coding model on your hardware:

Instruct models are post-trained to be well-suited for chat-based interactions. They’re given chat pairings in their training to be optimized for excellent back-and-forth chat output. Non-instruct models are still trained LLMs, but focus on next-token prediction instead of chatting with a user.
For our case, when using a chat-based coding tool (CLI or chat agent in your IDE) we need to use an instruct model. If you’re setting up an autocomplete model, you’ll want to find a model specifically post-trained for it (such as Qwen2.5-Coder-Base or DeepSeek-Coder-V2).

You need a tool to serve your local LLM for your coding tool to send it requests. On a MacBook, there are two primary options: MLX and Ollama.

Ollama is the industry standard and works on non-Mac hardware. It’s a great serving setup on top of llama.cpp that makes model serving almost plug-and-play. Users can download model weights from Ollama easily and can configure modelfiles with custom parameters for serving. Ollama can also serve a model once and make it available to multiple tools.

MLX is a Mac-specific framework for machine learning that is optimized specifically for Mac hardware. It also retrieves models for the user from a community collection. I’ve found Ollama to be very reliable in its model catalog, while MLX’s catalog is community sourced and can sometimes be missing specific models. Models are sourced from the community, so a user can convert a model to MLX format themselves. MLX requires a bit more setup on the user’s end, but serves models faster because it doesn’t have a layer providing the niceties of Ollama on top of it.

Either of these is great, but I chose MLX to maximize what I can get with my RAM; Ollama is probably the more beginner-friendly tool here.

Time-to-first-token and tokens per second

In real-world LLM applications it’s important that the model is able to serve its first token for a request in a reasonable amount of time and continue serving tokens at a speed that enables the user to use the model for its given purpose. If we have a high-performance model running locally, but it only serves a few tokens per second, it wouldn’t be useful for coding.

This is something taken for granted with cloud-hosted models that is a real consideration when working locally on constrained hardware. Another reason I chose MLX as my serving platform is because it served tokens up to 20% faster than Ollama. In reality, Ollama served tokens fast enough, so I don’t think using MLX is necessary specifically for this reason for the models I tried.

There are many ways to optimize local models and save RAM. It’s difficult to know which optimization method works best and the impact each has on a model, especially when using them in tandem with other methods.

The right optimization method also depends on the application. In my experience, I find it best to prioritize larger models with more aggressive model quantization over smaller models with more precise model weights. Since our application is coding, I would also prioritize a less-quantized KV cache and using a smaller model to ensure reasoning works properly while not sacrificing the size of our context window.

There are many tools to code with local models and I suggest trying until you find one you like. Some top recommendations are OpenCode, Aider, Qwen Code, Roo Code, and Continue. Make sure to use a tool compatible with OpenAI’s API standard. While this should be most tools, this ensures a consistent model/tool connection. This makes it easier to switch between tools and models as needed.

I’ll spare you the trial and error I experienced getting this set up.
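
As a concrete illustration of that OpenAI-compatible connection, here is a minimal sketch of a script talking to a locally served model. The base URL, port, and model name match the setup steps further down; the placeholder API key is only there because the client library requires one, and local servers typically ignore it.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local MLX (or Ollama) server instead of the cloud.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit",  # whichever model you serve
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)

print(response.choices[0].message.content)
```
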
The one thing I learned is that tool­ing mat­ters a lot. Not all cod­ing tools are cre­ated equal and not all of the mod­els in­ter­act with tools equally. I ex­pe­ri­enced many times where tool call­ing or even run­ning a tool at all was bro­ken. I also had to tin­ker quite a bit with many of them to get them to work.If you’re a PC en­thu­si­ast, an apt com­par­i­son to set­ting up lo­cal cod­ing tools ver­sus us­ing the cloud of­fer­ings avail­able is the dif­fer­ence be­tween set­ting up a MacBook ver­sus a Linux Laptop. With the Linux lap­top, you might get well through the dis­tro in­stal­la­tion only to find that the dri­vers for your track­pad aren’t yet sup­ported. Sometimes it felt like that with lo­cal mod­els and hook­ing them to cod­ing tools.For my tool, I ended up go­ing with Qwen Code. It was pretty plug-and-play as it’s a fork of Gemini CLI. It sup­ports the OpenAI com­pat­i­bil­ity stan­dard so I can eas­ily sub in dif­fer­ent mod­els and af­fords me all of the niceties built into Gemini CLI that I’m fa­mil­iar with us­ing. I also know it’ll be sup­ported be­cause both the Qwen team and Google DeepMind are be­hind the tool. The tool is also open source so any­one can sup­port it as needed.For mod­els, I fo­cused on GPT-OSS and Qwen3 mod­els since they were around the size I was look­ing for and had great re­views for cod­ing. I ended up de­cid­ing to use Qwen3-Coder mod­els be­cause I found it per­formed best and be­cause GPT-OSS fre­quently gave me I can­not ful­fill this re­quest” re­sponses when I asked it to build fea­tures.I de­cided to serve my lo­cal mod­els on MLX, but if you’re us­ing a non-Mac de­vice give Ollama a shot. A MacBook is an ex­cel­lent ma­chine for serv­ing lo­cal mod­els be­cause of its uni­fied mem­ory ar­chi­tec­ture. This means the RAM can be al­lot­ted to the CPU or GPU as needed. MacBooks can also be con­fig­ured with a ton of RAM. For serv­ing lo­cal cod­ing mod­els, more is al­ways bet­ter.I’ve shared my mod­elfiles repo for you to ref­er­ence and use as needed. I’ve got a script set up that au­to­mates much of the be­low process. Feel free to fork it and cre­ate your own mod­elfiles or star it to come back later.In­stall MLX or down­load Ollama (the rest of this guide will con­tinue with MLX but de­tails for serv­ing on Ollama can be found here).In­crease the VRAM lim­i­ta­tion on your MacBook. ma­cOS will au­to­mat­i­cally limit VRAM to 75% of the to­tal RAM. We want to use more than that. Run sudo sysctl iogpu.wired_lim­it_mb=110000 in your ter­mi­nal to set this up (adjust the mb set­ting ac­cord­ing to the RAM on your MacBook). This needs to be set each time you restart your MacBook.Serve the model as an OpenAI com­pat­i­ble API us­ing python -m mlx_lm.server –model mlx-com­mu­nity/​Qwen3-Next-80B-A3B-In­struct-8bit. This com­mand both runs the server and down­loads the model for you if you haven’t yet. This par­tic­u­lar model is what I’m us­ing with 128GB of RAM. If you have less RAM, check out smaller mod­els such as mlx-com­mu­nity/​Qwen3-4B-In­struct-2507-4bit (8 GB RAM), mlx-com­mu­nity/​Qwen2.5-14B-In­struct-4bit (16 GB RAM), mlx-com­mu­nity/​Qwen3-Coder-30B-A3B-In­struct-4bit (32 GB RAM), or mlx-com­mu­nity/​Qwen3-Next-80B-A3B-In­struct-4bit (64-96 GB RAM).Download Qwen Code. You might need to in­stall Node Package Manager for this. 
I rec­om­mend us­ing Node Version Manager (nvm) for man­ag­ing your npm ver­sion.Set up your tool to ac­cess an OpenAI com­pat­i­ble API by en­ter­ing the fol­low­ing set­tings:Base URL: http://​lo­cal­host:8080/​v1 (should be the de­fault MLX serves your model at)Model Name: mlx-com­mu­nity/​Qwen3-Next-80B-A3B-In­struct-8bit (or whichever model you chose).Voila! Your cod­ing model tool should be work­ing with your lo­cal cod­ing model.I rec­om­mend open­ing Activity Monitor on your Mac to mon­i­tor mem­ory us­age. I’ve had cases where I thought a model should fit within my mem­ory al­lot­ment but it did­n’t and I ended up us­ing a lot of swap mem­ory. When this hap­pens your model will run very slowly.One tip I have for us­ing lo­cal cod­ing mod­els: Focus on man­ag­ing your con­text. This is a great skill even with cloud-based mod­els. People tend to YOLO their chats and fill their con­text win­dow, but I’ve found greater per­for­mance by en­sur­ing that just what my model needs is sit­ting in my con­text win­dow. This is even more im­por­tant with lo­cal mod­els that may need an ex­tra boost in per­for­mance and are lim­ited in their con­text.My orig­i­nal hy­poth­e­sis was: Instead of pay­ing $100/mo+ for an AI cod­ing sub­scrip­tion, my money would be bet­ter spent up­grad­ing my hard­ware so I can run lo­cal cod­ing mod­els at a frac­tion of the price.I would ar­gue thatno [see edit 2 above], it is cor­rect. If we crunch the num­bers, a MacBook with 128 GB is $4700 plus tax. If I spend $100/mo for 5 years, a cod­ing sub­scrip­tion would cost $6000 in that same amount of time. Not only do I save money, but I also get a much more ca­pa­ble ma­chine for any­thing else I want to do with it.[This para­graph was added in af­ter ini­tial re­lease of this ar­ti­cle] It’s im­por­tant to note that lo­cal mod­els will not reach the peak per­for­mance of fron­tier mod­els; how­ever, they will likely be able to do most tasks just as well. The value of us­ing a lo­cal model does­n’t come from raw per­for­mance, but from sup­ple­ment­ing the cost of higher per­for­mance mod­els. A lo­cal model could very well let you drop your sub­scrip­tion tier for a fron­tier cod­ing tool or uti­lize a free tier as needed for bet­ter per­for­mance and run the rest of your tasks for free.It’s also im­por­tant to note that lo­cal mod­els are only go­ing to get bet­ter and smaller. This is the worst your lo­cal cod­ing model will per­form. I also would­n’t be sur­prised if cloud-based AI cod­ing tools get more ex­pen­sive. If you fig­ure you’re us­ing greater than the $100/mo tier right now or that the $100/mo tier will cost $200/mo in the fu­ture, the pur­chase is a no-brainer. It’s just dif­fi­cult to stom­ach the up­front cost.From a per­for­mance stand­point, I would say the max­i­mum model run­ning on my 128 GB RAM MacBook right now feels about half a gen­er­a­tion be­hind the fron­tier cod­ing tools. That’s ex­cel­lent, but some­thing to keep in mind as that half a gen­er­a­tion might mat­ter to you.One wrench thrown into my ex­per­i­ment is how much free quota Google hands out with their dif­fer­ent AI cod­ing tools. It’s easy to pur­chase ex­pen­sive hard­ware when it saves you money in the long run. It’s much more dif­fi­cult when the al­ter­na­tive is free.Ini­tially, I con­sid­ered my lo­cal cod­ing setup to be a great pair to Google’s free tier. It def­i­nitely per­forms bet­ter than Gemini 2.5 Flash and makes a great com­pan­ion to Gemini 3 Pro. 
Gemini 3 Pro can solve the more complex tasks, with the local model doing everything else. This not only saves quota on 3 Pro but also provides a very capable fallback for when the quota is hit.

However, this is complicated a bit now that Gemini 3 Flash was announced just a few days ago. It posts benchmark numbers far stronger than Gemini 2.5 Flash (and even 2.5 Pro!), and I've been very impressed with its performance. If that's the free tier Google offers, it makes local coding models less fiscally reasonable. The jury is still out on how well Gemini 3 Flash will hold up and how its quota will be structured, but we'll have to see if local models can keep up.

I'm very curious to hear what you think! Tell me about your local coding setup or ask any questions below.

...

Read the original on www.aiforswes.com »

9 226 shares, 23 trendiness

You're Not Burnt Out. You're Existentially Starving.

“Those who have a ‘Why’ to live, can bear with almost any ‘How’.”

Your life is go­ing pretty darn well by any ob­jec­tive met­ric.

Nice place to live. More than enough stuff. Family and friends who love you.

But you’re tired, burnt out, and more.

It feels like you’re stuck in the or­di­nary when all you want to do is chase great­ness.

Viktor Frankl calls this feeling the “existential vacuum” in his famous book Man’s Search for Meaning. Frankl was a psychiatrist who survived the Holocaust, and in this book he explains that the inmates who survived with him found and focused on a higher purpose in life, like caring for other inmates and promising to stay alive to reconnect with loved ones outside the camps. But these survivors also struggled in their new lives after the war, desperately searching for meaning when every decision was no longer life or death.

Frankl re­al­ized that this ex­is­ten­tial anx­i­ety is not a nui­sance to elim­i­nate, but ac­tu­ally an im­por­tant sig­nal point­ing us to­wards our need for mean­ing. Similarly, while Friedrich Nietzsche would ar­gue that life in­her­ently lacks mean­ing, he’d also im­plore us to zoom out and find our high­est pur­pose now:

“This is the most effective way: to let the youthful soul look back on life with the question, ‘What have you up to now truly loved, what has drawn your soul upward, mastered it and blessed it too?’ … for your true being lies not deeply hidden within you, but an infinite height above you, or at least above that which you commonly take to be yourself.”

Nihilists get both Nietzsche and YOLO wrong. Neither means that you give up. Instead, both mean that your efforts are everything.

So when you get those Sunday Scaries, the ex­is­ten­tial anx­i­ety that your time is end­ing and the rest of your life is spent work­ing for some­one else, the an­swer is­n’t es­capism.

Instead, vi­su­al­ize your ideal self, the truest child­hood dream of who you wanted to be when you grew up. What would that per­son be do­ing now? Go do that thing!

When fac­ing the ex­is­ten­tial vac­uum, there’s only one way out — up, to­wards your high­est pur­pose.

On a 0-10 scale, how happy did you feel when you started work­ing this Monday?

You got the great job. You built the startup. You took the va­ca­tions. But that’s not what you re­ally needed. You kept com­ing back Monday af­ter Monday re­al­iz­ing you were do­ing the same job again.

So you tried to im­prove your­self. You op­ti­mized your morn­ing rou­tine. You per­fected your pro­duc­tiv­ity sys­tem. You bought a sleep mask and mouth tape. Yet you’re still drag­ging your­self out of bed each Monday morn­ing tired and un­mo­ti­vated.

We’re op­ti­miz­ing for less suf­fer­ing in­stead of more mean­ing. We’ve con­fused com­fort with ful­fill­ment. And we’re get­ting re­ally, re­ally good at it. Millennials are the first gen­er­a­tion in his­tory to ex­pect our jobs to pro­vide a higher mean­ing be­yond sur­vival. That’s a good thing. It means that the es­sen­tials of life are nearly uni­ver­sally avail­able now.

But, as I write in my book Positive Politics:

“The last two hundred years of progress pulled most of the world’s population over the poverty line. The next hundred years is about lifting everyone above the abundance line… Positive Politics seeks to democratize this abundance.”

Those of us who have al­ready achieved abun­dance in our own lives now have two re­spon­si­bil­i­ties:

Spread that abun­dance to as many other peo­ple as pos­si­ble.

Find some­thing more mean­ing­ful to do than chase more stuff.

The ex­is­ten­tial vac­uum is a wide­spread phe­nom­e­non of the twen­ti­eth cen­tury

When I was a kid, I knew ex­actly what I wanted to do — the most im­por­tant job in the world. And I was­n’t afraid to tell you ei­ther. At five years old, I would talk your ear off about train­ing to be goalie for the St. Louis Blues. By seven, it was as­tro­naut for NASA. By eleven, it was President of the United States. Then mid­dle school hit, I got made fun of more than a few times, and that voice went silent.

After three star­tups, three non­prof­its, and es­pe­cially three kids knocked the im­poster syn­drome out of me, I spent a lot of time train­ing my in­ner voice to get loud again. And what I heard re­in­forced what I knew all along — that my high­est pur­pose is way above where I com­monly take my­self now.

Imposter syndrome can be a good thing. That external voice saying “this is not you” may actually be telling you the truth. I got into the testing lab industry to save our family business. Fifteen years and three startups later, I had become “the lab expert” to the world. But I cringed at that label. First, there was no room to grow. I had already done it. I didn’t want to be eighty and still running labs. Second, and most importantly, I knew that my skills could be used for much more than money.

I’d love to say I trans­formed overnight, but re­ally it took 5+ years from 2020 to 2025 for me to fully em­body my new iden­tity. You can see it in my writ­ing, which be­came much more am­bi­tious in 2020, when I re­launched this site and started blog­ging con­sis­tently. That led to my World’s Biggest Problems pro­ject, which con­vinced me that Positive Politics is the #1 so­lu­tion we need now!

There are two key com­po­nents to my high­est mis­sion now:

Be a model for the pur­suit of great­ness.

That means con­sis­tently chas­ing my high­est pur­pose — helping am­bi­tious op­ti­mists get into pol­i­tics! After nearly a decade of do­ing this be­hind the scenes as a po­lit­i­cal vol­un­teer and ad­vi­sor, 2025 was the first year where I went full-time in pol­i­tics. Leading MCFN and pub­lish­ing Positive Politics at the same time was a ton of work. But noth­ing en­er­gizes me more than fight­ing two of the biggest bat­tles in the world now — an­ti­cor­rup­tion and Positive Politics!

I love pol­i­tics be­cause it’s full of meta so­lu­tions — solutions that cre­ate more so­lu­tions. My Positive Politics Accelerator is a clas­sic ex­am­ple — recruiting and train­ing more am­bi­tious op­ti­mists into pol­i­tics will lead to them mak­ing pos­i­tive po­lit­i­cal change at all lev­els of gov­ern­ment. But I’ve also tack­led chal­lenges like in­de­pen­dent test­ing with star­tups and led a non­profit to drive in­ves­tiga­tive jour­nal­ism.

There are so many paths to pos­i­tive im­pact, in­clud­ing pol­i­tics, star­tups, non­prof­its, med­i­cine, law, ed­u­ca­tion, sci­ence, en­gi­neer­ing, jour­nal­ism, art, faith, par­ent­ing, men­tor­ship, and more! Choose the path that both best fits you now and is pointed to­wards your long-term high­est pur­pose.

I woke up to­day so ex­cited to get to work think­ing it was Monday morn­ing al­ready. Instead of jump­ing right into it, I spent all morn­ing mak­ing break­fast and play­ing with my kids, then wrote this post. When I’m writ­ing about some­thing per­sonal, 1,000+ words can eas­ily flow for me in an af­ter­noon. This part will be done just in time to go to a nerf bat­tle birth­day party with my boys and their friends.

Both the hustle and anti-hustle cultures get it wrong. Working long hours isn’t inherently good or bad. If I really had to count how much I’m “on” vs. doing whatever I want, it’s easily 100+ hours per week. But that includes everything from investigative journalism and operations work for MCFN to social media and speaking events for Positive Politics, reading and writing for my site, and 40+ hours every week with my kids.

I want to help more ambitious optimists chase their highest potential! Whether the best solution is in startups, politics, nonprofits, science, crypto, or some new technology that’s yet to be invented, I’m happy to point you where I think you’ll be most powerful. I’ve thought, written, and worked on many of these ideas in my 15+ year career.

Now with 10+ years of writ­ing, I’ve fo­cused on pub­licly in­spir­ing more peo­ple to take on these chal­lenges too. We should be flex­i­ble on how we solve the prob­lems but firm in our re­solve to con­sis­tently or­ga­nize peo­ple and launch so­lu­tions.

As Steve Jobs said, “Life can be much broader once you discover one simple fact, and that is everything around you that you call ‘life’ was made up by people that were no smarter than you… You can change it, you can mold it… the most important thing… is to shake off this erroneous notion that life is there and you’re just going to live in it, versus embrace it, change it, improve it, make your mark upon it… Once you learn that, you’ll never be the same again.”

Remember how it felt as a young child to openly tell the world about your dream job? Find the work that makes you feel this way and jump on what­ever rung of that ca­reer lad­der you can start now. The pay may be a lit­tle lower, but the ex­is­ten­tial pay­off will be ex­po­nen­tially higher for the rest of your life.

You don’t have to go all-in right away! In fact, af­ter a long diet of low ex­is­ten­tial work, it’s prob­a­bly best to ease into pub­lic work. You can even vol­un­teer one hour or less per week for a po­lit­i­cal cam­paign or non­profit to get started. Pick the small­est first step, and do it. Not in January, now. Do it be­fore the end of the year. And see how dif­fer­ent you feel when 2026 starts!

And you don’t have to choose politics like me! Do you have the next great ambitious, optimistic science fiction novel in your head? That book could spark movies and movements that positively change millions of lives! Choose the path that will inspire and energize you for decades!

What mat­ters most is you go straight to­wards your high­est po­ten­tial right now. Pause once a month to make sure you’re still on the right track. Stop once a year to triple-check you’re on the right track. But never get off this path to­wards your high­est po­ten­tial. Anything else will starve you ex­is­ten­tially.

When you truly chase your highest potential, everything you thought was burnout will melt away. Because you weren’t suffering from too much work; you were suffering from too little truly important work. Like a boy who thought he was full until dessert arrived, you’ll suddenly find your hunger return!

If you’re sick of pol­i­tics as usual and ready to change the sys­tem, join Positive Politics!

...

Read the original on neilthanedar.com »

10 222 shares, 8 trendiness

Measuring AI Ability to Complete Long Tasks

Summary: We pro­pose mea­sur­ing AI per­for­mance in terms of the length of tasks AI agents can com­plete. We show that this met­ric has been con­sis­tently ex­po­nen­tially in­creas­ing over the past 6 years, with a dou­bling time of around 7 months. Extrapolating this trend pre­dicts that, in un­der a decade, we will see AI agents that can in­de­pen­dently com­plete a large frac­tion of soft­ware tasks that cur­rently take hu­mans days or weeks.

We think that fore­cast­ing the ca­pa­bil­i­ties of fu­ture AI sys­tems is im­por­tant for un­der­stand­ing and prepar­ing for the im­pact of pow­er­ful AI. But pre­dict­ing ca­pa­bil­ity trends is hard, and even un­der­stand­ing the abil­i­ties of to­day’s mod­els can be con­fus­ing.

Current fron­tier AIs are vastly bet­ter than hu­mans at text pre­dic­tion and knowl­edge tasks. They out­per­form ex­perts on most exam-style prob­lems for a frac­tion of the cost. With some task-spe­cific adap­ta­tion, they can also serve as use­ful tools in many ap­pli­ca­tions. And yet the best AI agents are not cur­rently able to carry out sub­stan­tive pro­jects by them­selves or di­rectly sub­sti­tute for hu­man la­bor. They are un­able to re­li­ably han­dle even rel­a­tively low-skill, com­puter-based work like re­mote ex­ec­u­tive as­sis­tance. It is clear that ca­pa­bil­i­ties are in­creas­ing very rapidly in some sense, but it is un­clear how this cor­re­sponds to real-world im­pact.

We find that mea­sur­ing the length of tasks that mod­els can com­plete is a help­ful lens for un­der­stand­ing cur­rent AI ca­pa­bil­i­ties. This makes sense: AI agents of­ten seem to strug­gle with string­ing to­gether longer se­quences of ac­tions more than they lack skills or knowl­edge needed to solve sin­gle steps.

On a diverse set of multi-step software and reasoning tasks, we record the time needed to complete the task for humans with appropriate expertise. We find that the time taken by human experts is strongly predictive of model success on a given task: current models have an almost 100% success rate on tasks taking humans less than 4 minutes, but succeed less than 10% of the time on tasks taking more than around 4 hours.

For each model, we can fit a logistic curve to predict model success probability from human task length. After fixing a success probability, we can then convert each model's predicted success curve into a time duration by looking at the task length at which the predicted success curve intersects that probability. For example, the original post includes fitted success curves for several models, as well as the task lengths at which we predict a 50% success rate.
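To make the procedure concrete, here is a minimal sketch of that kind of fit on made-up data; METR's actual analysis code is open sourced, and the numbers below are purely illustrative.

    # Sketch of the fitting procedure described above, on made-up data.
    # Success probability is modeled as a logistic function of log2(task length).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    human_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])  # illustrative
    model_success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])           # illustrative

    X = np.log2(human_minutes).reshape(-1, 1)
    fit = LogisticRegression().fit(X, model_success)

    # The 50% "time horizon" is where the fitted curve crosses p = 0.5,
    # i.e. where w * log2(t) + b = 0.
    w, b = fit.coef_[0][0], fit.intercept_[0]
    print(f"50% time horizon: {2 ** (-b / w):.1f} human-minutes")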

We think these re­sults help re­solve the ap­par­ent con­tra­dic­tion be­tween su­per­hu­man per­for­mance on many bench­marks and the com­mon em­pir­i­cal ob­ser­va­tions that mod­els do not seem to be ro­bustly help­ful in au­tomat­ing parts of peo­ple’s day-to-day work: the best cur­rent mod­els—such as Claude 3.7 Sonnet—are ca­pa­ble of some tasks that take even ex­pert hu­mans hours, but can only re­li­ably com­plete tasks of up to a few min­utes long.

That be­ing said, by look­ing at his­tor­i­cal data, we see that the length of tasks that state-of-the-art mod­els can com­plete (with 50% prob­a­bil­ity) has in­creased dra­mat­i­cally over the last 6 years.

If we plot this on a log­a­rith­mic scale, we can see that the length of tasks mod­els can com­plete is well pre­dicted by an ex­po­nen­tial trend, with a dou­bling time of around 7 months.

Our es­ti­mate of the length of tasks that an agent can com­plete de­pends on method­olog­i­cal choices like the tasks used and the hu­mans whose per­for­mance is mea­sured. However, we’re fairly con­fi­dent that the over­all trend is roughly cor­rect, at around 1-4 dou­blings per year. If the mea­sured trend from the past 6 years con­tin­ues for 2-4 more years, gen­er­al­ist au­tonomous agents will be ca­pa­ble of per­form­ing a wide range of week-long tasks.
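As a rough back-of-the-envelope check of that extrapolation (the 1-hour starting horizon below is a hypothetical round number for illustration, not a figure from the paper):

    # Back-of-the-envelope extrapolation under the ~7-month doubling time.
    import math

    current_horizon_hours = 1.0   # hypothetical current 50% horizon
    week_of_work_hours = 40.0     # roughly one working week
    doubling_time_months = 7

    doublings = math.log2(week_of_work_hours / current_horizon_hours)
    months = doublings * doubling_time_months
    print(f"{doublings:.1f} doublings -> about {months / 12:.1f} years to week-long tasks")

With those assumptions the answer comes out to roughly three years, consistent with the 2-4 year range above.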

The steep­ness of the trend means that our fore­casts about when dif­fer­ent ca­pa­bil­i­ties will ar­rive are rel­a­tively ro­bust even to large er­rors in mea­sure­ment or in the com­par­isons be­tween mod­els and hu­mans. For ex­am­ple, if the ab­solute mea­sure­ments are off by a fac­tor of 10x, that only changes the ar­rival time by around 2 years.
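The same doubling arithmetic shows why a large measurement error shifts the forecast by only a couple of years:

    # Why a 10x error in absolute measurements shifts forecasts by only ~2 years,
    # assuming the ~7-month doubling time holds.
    import math

    error_factor = 10
    doubling_time_months = 7

    shift_months = math.log2(error_factor) * doubling_time_months
    print(f"A {error_factor}x error shifts arrival estimates by about {shift_months / 12:.1f} years")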

We dis­cuss the lim­i­ta­tions of our re­sults, and de­tail var­i­ous ro­bust­ness checks and sen­si­tiv­ity analy­ses in the full pa­per. Briefly, we show that sim­i­lar trends hold (albeit more nois­ily) on:

Various subsets of our tasks that might represent different distributions (very short software tasks vs. the diverse HCAST vs. RE-Bench, and subsets filtered by length or qualitative assessments of “messiness”).

A sep­a­rate dataset based on real tasks (SWE-Bench Verified), with in­de­pen­dently col­lected hu­man time data based on es­ti­mates rather than base­lines. This shows an even faster dou­bling time, of un­der 3 months.

We also show in the pa­per that our re­sults do not ap­pear to be es­pe­cially sen­si­tive to which tasks or mod­els we in­clude, nor to any other method­olog­i­cal choices or sources of noise that we in­ves­ti­gated:

However, there re­mains the pos­si­bil­ity of sub­stan­tial model er­ror. For ex­am­ple, there are rea­sons to think that re­cent trends in AI are more pre­dic­tive of fu­ture per­for­mance than pre-2024 trends. As shown above, when we fit a sim­i­lar trend to just the 2024 and 2025 data, this short­ens the es­ti­mate of when AI can com­plete month-long tasks with 50% re­li­a­bil­ity by about 2.5 years.

We be­lieve this work has im­por­tant im­pli­ca­tions for AI bench­marks, fore­casts, and risk man­age­ment.

First, our work demon­strates an ap­proach to mak­ing bench­marks more use­ful for fore­cast­ing: mea­sur­ing AI per­for­mance in terms of the length of tasks the sys­tem can com­plete (as mea­sured by how long the tasks take hu­mans). This al­lows us to mea­sure how mod­els have im­proved over a wide range of ca­pa­bil­ity lev­els and di­verse do­mains. At the same time, the di­rect re­la­tion­ship to real-world out­comes per­mits a mean­ing­ful in­ter­pre­ta­tion of ab­solute per­for­mance, not just rel­a­tive per­for­mance.

Second, we find a fairly ro­bust ex­po­nen­tial trend over years of AI progress on a met­ric which mat­ters for real-world im­pact. If the trend of the past 6 years con­tin­ues to the end of this decade, fron­tier AI sys­tems will be ca­pa­ble of au­tonomously car­ry­ing out month-long pro­jects. This would come with enor­mous stakes, both in terms of po­ten­tial ben­e­fits and po­ten­tial risks.

We’re very ex­cited to see oth­ers build on this work and push the un­der­ly­ing ideas for­ward, just as this re­search builds on prior work on eval­u­at­ing AI agents. As such, we have open sourced our in­fra­struc­ture, data and analy­sis code. As men­tioned above, this di­rec­tion could be highly rel­e­vant to the de­sign of fu­ture eval­u­a­tions, so repli­ca­tions or ex­ten­sions would be highly in­for­ma­tive for fore­cast­ing the real-world im­pacts of AI.

In ad­di­tion, METR is hir­ing! This pro­ject in­volved most staff at METR in some way, and we’re cur­rently work­ing on sev­eral other pro­jects we find sim­i­larly ex­cit­ing. If you or some­one that you know would be a good fit for this kind of work, please see the listed roles.

...

Read the original on metr.org »
