10 interesting stories served every morning and every evening.
...
Read the original on hackerbook.dosaygo.com »
Donations are a key part of what keeps F-Droid independent and reliable and our latest hardware update is a direct result of your support. Thanks to donations from our incredible community, F-Droid has replaced one of its most critical pieces of infrastructure, our core server hardware. It was overdue for a refresh, and now we are happy to give you an update on the new server and how it impacts the project.
This upgrade touches a core part of the infrastructure that builds and publishes apps for the main F-Droid repository. If the server is slow, everything downstream gets slower too. If it is healthy, the entire ecosystem benefits.
This server replacement took a bit longer than we would have liked. The biggest reason is that sourcing reliable parts right now is genuinely hard. Ongoing global trade tensions have made supply chains unpredictable, and that hit the specific components we needed. We had to wait for quotes, review them, replan, and wait again when quoted lead times turned out to be unexpectedly long, before we finally received hardware that met our requirements.
Even with the delays, the priority never changed. We were looking for the right server setup for F-Droid, built to last for the long haul.
Another important part of this story is where the server lives and how it is managed. F-Droid is not hosted in just any data center where commodity hardware is managed by some unknown staff. We worked out a special arrangement so that this server is physically held by a long time contributor with a proven track record of securely hosting services. We can control it remotely, we know exactly where it is, and we know who has access. That level of transparency and trust is not common in infrastructure, but it is central to how we think about resilience and stewardship.
This was not the easiest path, and it required careful coordination and negotiation. But we are glad we did it this way. It fits our values and our threat model, and it keeps the project grounded in real people rather than anonymous systems.
The previous server was 12-year-old hardware and had been running for about five years. In infrastructure terms, that is a lifetime. It served F-Droid well, but it was reaching the point where speed and maintenance overhead were becoming a daily burden.
The new system is already showing a huge improvement. Stats from the build cycles of the last two months suggest it can handle the full build and publish runs much faster than before. For example, between January and September this year we published updates once every 3 or 4 days; that dropped to once every 2 days in October, then to every day in November, and it is approaching twice a day in December. (You can see this in the frequency of index publishing after October 18, 2025 in our f-droid.org transparency log.) That extra capacity gives us more breathing room and helps shorten the gap between when apps are updated and when those updates reach users. We can now build all the auto-updated apps in one cycle in the (UTC) morning, and all the newly included, fixed, and manually updated apps, gathered through the day, in the evening cycle.
We are being careful here, because real world infrastructure always comes with surprises. But the performance gains are real, and they are exciting.
This upgrade exists because community support, pooled over time, was turned into real infrastructure that benefits everyone who relies on F-Droid.
A faster server does not just make our lives easier. It helps developers get timely builds. It reduces maintenance risk. It strengthens the health of the entire repository.
So thank you. Every donation, whether large or small, is part of how this project stays reliable, independent, and aligned with free software values.
...
Read the original on f-droid.org »
Weather has always significantly influenced my life. When I was a young athlete, knowing the forecast in advance would have allowed me to better plan my training sessions. As I grew older, I could choose whether to go to school on my motorcycle or, for safety reasons, have my grandfather drive me. And it was him, my grandfather, who was my go-to meteorologist. He followed all weather patterns and forecasts, a remnant of his childhood in the countryside and his life on the move. It’s to him that I dedicate FediMeteo.
The idea for FediMeteo started almost by chance while I was checking the holiday weather forecast to plan an outing. Suddenly, I thought how nice it would be to receive regular weather updates for my city directly in my timeline. After reflecting for a few minutes, I registered a domain and started planning.
The choice of operating system was almost automatic. The idea was to separate instances by country, and FreeBSD jails are one of the most useful tools for this purpose.
I initially thought the project would generate little interest. I was wrong. After all, weather affects many of our lives, directly or indirectly. So I decided to structure everything in this way:
* I would use a test VPS to see how things would go. It started as a small VM at a German provider with 4 shared cores, 4 GB of RAM, 120 GB of SSD disk space, and a 1 Gbit/s internet connection; it is now a 4-euro-per-month VPS in Milano, Italy, with 4 shared cores, 8 GB of RAM, and 75 GB of disk space.
* I would separate various countries into different instances, for both management and security reasons, as well as to have the possibility of relocating just some of them if needed.
* Weather data would come from a reliable and open-source friendly source. I narrowed it down to two options: wttr.in and Open-Meteo, two solutions I know and that have always given me reliable results.
* I would pay close attention to accessibility: forecasts would be in local languages, consultable via text browsers, with emojis to give an idea even to those who don’t speak local languages, and everything would be accessible without JavaScript or other requirements. One’s mother tongue is always more “familiar” than a second language, even if you’re fluent.
* I would manage everything according to Unix philosophy: small pieces working together. The more years pass, the more I understand how valuable this approach is.
* The software chosen to manage the instances is snac. Snac embodies my philosophy of minimal and effective software, perfect for this purpose. It provides clear web pages for those who want to consult via the web, “speaks” the ActivityPub protocol perfectly, produces RSS feeds for each user (i.e., city), has extremely low RAM and CPU consumption, compiles in seconds, and is stable. The developer is an extremely helpful and positive person, and in my opinion, this carries equal weight as everything else.
* I would do it for myself. If there was no interest, I would have kept it running anyway, without expanding it. So no anxiety or fear of failure.
I started setting up the first “pieces” during the days around Christmas 2024. The scheme was clear: each jail would handle everything internally. A Python script would download data, city by city, and produce markdown. The city coordinates would be calculated via the geopy library and passed to wttr.in and Open-Meteo. No data would be stored locally. This approach makes it possible to process all cities the same way: just pass the city and country to the script, and the markdown is served. At that point, snac comes into play: without the need for external utilities, the “snac note” command allows posting from stdin by specifying the instance directory and the user to post from. No API calls to make, no API keys or permissions to manage.
To simplify things, I first structured the jail for Italy. I made a list of the main cities, normalizing them. For example, La Spezia became la_spezia. Forlì, with an accent, became forli - this for maximum compatibility since each city would be a snac user. I then created a script that takes this list and creates snac users via “snac adduser.” At that point, after creating all the users, the script would modify the JSON of each user to convert the city name to uppercase, insert the bio (a standard text), activate the “bot” flag, and set the avatar, which was the same for all users at the time. This script is also able to add a new city: just run the script with the (normalized) name of the city, and it will add it - also adding it to the “cities.txt” file, so it will be updated in the next weather update cycle.
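To illustrate the idea, here is a minimal sketch of that normalization-and-creation step; the normalization approach, the snac instance path, and the exact "snac adduser" invocation are assumptions of mine, not the author's actual script.

```python
import subprocess
import unicodedata

def normalize(city: str) -> str:
    # "Forlì" -> "forli", "La Spezia" -> "la_spezia"
    ascii_name = unicodedata.normalize("NFKD", city).encode("ascii", "ignore").decode()
    return ascii_name.lower().replace(" ", "_")

# Assumed instance directory and CLI arguments, for illustration only.
for city in ["La Spezia", "Forlì"]:
    subprocess.run(["snac", "adduser", "/var/snac/italy", normalize(city)], check=True)
```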
I then created the heart of the service: a Python application (initially only in Italian, then multilingual, separating the operational part from the text) that receives, via the command line, the name of a city and a country code (corresponding to the file with texts in the local language). The script determines the coordinates and then, using API calls, requests the current weather conditions, the forecast for the next 12 hours, and the next 7 days. I conducted experiments with both wttr.in and Open-Meteo, and both gave good results. However, I settled on Open-Meteo because, for my uses, it has always provided very reliable results. The application outputs Markdown directly, since snac supports it, at least partially.
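To make the flow concrete, here is a minimal sketch of that fetch-and-format step, using geopy's Nominatim geocoder and the public Open-Meteo forecast endpoint; the parameters, emoji, and Markdown layout are illustrative and not FediMeteo's actual code.

```python
import requests
from geopy.geocoders import Nominatim

def forecast_markdown(city: str, country: str) -> str:
    # Resolve the city name to coordinates (FediMeteo later caches these).
    loc = Nominatim(user_agent="fedimeteo-sketch").geocode(f"{city}, {country}")
    params = {
        "latitude": loc.latitude,
        "longitude": loc.longitude,
        "current_weather": "true",
        "daily": "temperature_2m_max,temperature_2m_min",
        "forecast_days": 7,
        "timezone": "auto",
    }
    data = requests.get("https://api.open-meteo.com/v1/forecast", params=params, timeout=30).json()
    cur = data["current_weather"]
    lines = [f"**{city.upper()}** 🌡️ now: {cur['temperature']}°C, wind {cur['windspeed']} km/h", ""]
    daily = data["daily"]
    for day, tmin, tmax in zip(daily["time"], daily["temperature_2m_min"], daily["temperature_2m_max"]):
        lines.append(f"- {day}: {tmin}°C / {tmax}°C")
    return "\n".join(lines)

print(forecast_markdown("Forlì", "Italy"))
```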
The cities.txt file is also crucial for updates. I created a script - post.sh, in pure sh - that loops through all the cities and, for each one, launches the FediMeteo application and publishes its output using snac directly from the command line. Once the job is finished, it makes a call to my instance of Uptime-Kuma, which keeps an eye on the situation. In case of failure, the monitoring will alert me that there have been no recent updates, and I can check.
At this point, the system cron takes care of launching post.sh every 6 hours. The requests are serialized, so the cities will update one at a time, and the posts will be sent to followers.
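For illustration, here is roughly what one update cycle looks like, rewritten as a Python sketch rather than the author's plain-sh post.sh; the paths, the fedimeteo.py script name, the exact snac note arguments, and the Uptime-Kuma push URL are assumptions.

```python
import subprocess
import urllib.request

BASEDIR = "/var/snac/italy"                               # assumed snac instance directory
PUSH_URL = "https://uptime.example.org/api/push/abc123"   # assumed Uptime-Kuma push monitor

with open("cities.txt") as fh:
    cities = [line.strip() for line in fh if line.strip()]

for city in cities:
    # Generate the Markdown forecast for this city (script name assumed).
    md = subprocess.run(
        ["python3", "fedimeteo.py", city, "it"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Post it from stdin as the city's snac user.
    subprocess.run(["snac", "note", BASEDIR, city], input=md, text=True, check=True)

# Ping Uptime-Kuma so a missed cycle raises an alert.
urllib.request.urlopen(PUSH_URL, timeout=10)
```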
After listing all Italian provincial capitals, I started testing everything. It worked perfectly. Of course, I had to make some adjustments at all levels. For example, one of the problems encountered was that snac did not set the language of the posts, so some users could have missed them. The developer was very quick and, as soon as I described the problem, immediately modified the program so that posts could keep the system language, set as an environment variable in the sh script.
After two days, I decided to start adding other countries and announce the project. The announcement was unexpectedly well received: there were many boosts, and people started asking me to add their cities or countries. I tried to do what I could, within the limits of my physical condition, as in those days I had the flu, which kept me at home with a fever for several days. I started adding many countries in the heart of Europe, translating the main indications into local languages but keeping the emojis so that everything would be understandable even to those who don’t speak the local language. There were some small problems reported by users. One of them: not all weather conditions had been translated, so some occasionally appeared in Italian, along with a few errors. In bilingual countries, I tried to include all local languages. Sometimes, unfortunately, I made mistakes as I encountered dynamics unknown to me or difficult to interpret. For example, in Ireland, forecasts were published in Irish, but it was pointed out to me that not everyone speaks it, so I switched to publishing in English.
The turning point was when FediFollows (@FediFollows@social.growyourown.services - who also manages the site Fedi Directory) started publishing the list of countries and cities, highlighting the project. Many people became aware of FediMeteo and started following the various accounts, the various cities. And from here came requests to add new countries and some new information, such as wind speed. Moreover, I was asked (rightly, to avoid flooding timelines) to publish posts as unlisted - this way, followers would see the posts, but they wouldn’t fill local timelines. Snac didn’t support this, but again, the snac dev came to my rescue in a few hours.
But with new countries came new challenges. For example, in my original implementation, all units of measurement were metric/decimal/Celsius - and this doesn’t adapt well to realities like the USA. Moreover, focusing on Europe, almost all countries were located in a single timezone, while for larger countries (such as Australia, the USA, Canada, etc.) this is totally different. So I started developing a more complete and global version and, in the meantime, added almost all of Europe. The new version would have to be backward compatible, take into account timezone differences for each city and different measurements (e.g., degrees C and F), and - the initially more difficult part - be able to separate cities with the same name based on their states or provinces. I had already seen a similar problem with the implementation of support for Germany, so it had to be addressed properly.
The original goal was to have a VPS for each continent, but I soon realized that, thanks to the quality of snac’s code and FreeBSD’s efficient management, even keeping countries in separate jails the load didn’t increase much. So I decided to challenge myself and the limits of the economical 4-euro-per-month VPS: that is, to add as much as possible and see where the limits were. Limits that, to date, I have not yet reached. I would also soon have exhausted the available API calls for Open-Meteo’s free accounts, so I contacted the team and explained everything. I was positively surprised to read that they appreciated the project and provided me with a dedicated API key.
As my free time allowed, I managed to complete the richer and more complete version of my Python program. I’m not a professional dev - I’m more oriented towards systems - so the code is probably quite poor in the eyes of an expert developer. But, in the end, it just needs to take an input and give me an output. It’s not a daemon, and it’s not a service that responds on the network; snac takes care of that.
So I decided to start with a very important launch: the USA and Canada. A non-trivial part was identifying the main cities in order to cover, state by state, all the territory. In the end, I identified more than 1200 cities - a number that, by itself, exceeded the sum of all other countries (at that time). And the program is now able to take an input with a separator (two underscores: __) between city and state. In this way, it’s possible to distinguish cities that share a name with their state: new_york__new_york is an example I like to give, but there are many.
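As a tiny illustration of that naming convention (my sketch, not the actual parser):

```python
# Split the normalized "city__state" identifier used for US cities.
name = "new_york__new_york"
city, _, state = name.partition("__")
print(city.replace("_", " "), "/", state.replace("_", " "))  # new york / new york
```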
The launch of the USA was interesting: despite the many previous requests, the reception was initially quite lukewarm, to my extreme surprise. The number of followers in Canada, within a few hours, far exceeded that of the USA. Meanwhile, the country with the most followers (more than 1000 within a few days) was Germany, followed by the UK - which I had expected to be first.
The VPS held up well. Except for the moments right after FediFollows announcements (after fixing some FreeBSD tuning, the service slowed slightly but didn’t crash), the load remained extremely low. So I continued to expand: Japan, Australia, New Zealand, etc.
At the time of the last update of this article (30 December 2025), the supported countries are 38: Argentina, Australia, Austria, Belgium, Brazil, Bulgaria, Canada, Croatia, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, India, Ireland, Italy, Japan, Latvia, Lithuania, Malta, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Taiwan, the United Kingdom, and the United States of America (with more regions coming soon!).
Direct followers in the Fediverse are around 7,707 and growing daily, excluding those who follow hashtags or cities via RSS, whose number I can’t estimate. However, a quick look at the logs suggests there are many more.
The cities currently covered are 2937 - growing based on new countries and requests.
There have been some problems. The most serious, and my fault, was an API key leak: I had left some debug code active and, the first time Open-Meteo had problems, the error message also included the API call - including the API key. Some users reported it to me (others just mocked), and I fixed the code and immediately reported everything to the Open-Meteo team, who kindly gave me a new API key and deactivated the old one.
A further problem was related to geopy. It makes a call to Nominatim to determine coordinates. One of the times Nominatim didn’t respond, my program wasn’t able to determine the position and failed with an error. I solved this by introducing coordinate caching: the first time the program encounters a city, it requests and saves the coordinates; if they are already present, they are used without making a new request via geopy. This is both lighter on their servers and faster and safer for us.
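A minimal sketch of that caching idea, assuming a plain JSON file as the cache; the real cache location and format are the author's own.

```python
import json
import os
from geopy.geocoders import Nominatim

CACHE_FILE = "coords_cache.json"  # assumed cache location

def get_coords(city: str, country: str):
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as fh:
            cache = json.load(fh)
    key = f"{city},{country}"
    if key not in cache:
        # Only ask Nominatim the first time we see this city.
        loc = Nominatim(user_agent="fedimeteo-sketch").geocode(key)
        cache[key] = [loc.latitude, loc.longitude]
        with open(CACHE_FILE, "w") as fh:
            json.dump(cache, fh)
    return cache[key]
```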
And the VPS? It has no problems and is surprisingly fast and effective. FreeBSD 14.3-RELEASE, with BastilleBSD to manage the jails. Currently there are 39 jails: one for haproxy, the FediMeteo website (so nginx), and the snac instance for FediMeteo announcements and support; the other 38 for the individual country instances. Each of them, therefore, has its own autonomous ZFS dataset. Every 15 minutes, there is a local snapshot of all datasets. Every hour, the homepage is regenerated: a small script calculates the number of followers (counting, instance by instance, the followers of the individual cities, since I only publish aggregate numbers, to avoid possible triangulations and privacy leaks of users). Every hour, moreover, an external backup is made via zfs-autobackup (onto a dataset encrypted at rest), and once a day a further backup is made in my datacenter, on disks encrypted with geli. The occupied RAM is 501 MB (yes, exactly: 501 MB), which rises slightly when updates are in progress. Updates normally occur every 6 hours. I have tried, as much as possible, to space them out to avoid overloads in timelines (or on the server itself). Only for the USA did I add a sleep of 5 seconds between one city and the next, to give snac the opportunity to better organize the sending of messages. It probably wouldn’t be necessary with the current numbers, but better safe than sorry. This way, the USA is processed in about 2 and a half hours, while the other jails (thus countries) can work autonomously and send their updates.
The average load of the VPS (taking as reference both the last 24 hours and the last two weeks) is about 25%, rising to 70-75% when updates run for the larger instances (such as the USA) or when the project is announced by FediFollows. Otherwise, it averages less than 10%. So the VPS still has a huge margin, and new instances, with new nations, will still fit inside it.
This article, although in some parts very conversational, aims to demonstrate how it’s possible to build solid, valid, and efficient solutions without resorting to expensive and complex services. Moreover, it demonstrates how it’s possible to have your own online presence without putting your data in the hands of third parties and without necessarily resorting to complex stacks. Sometimes, less is more.
The success of this project demonstrates, once again, that my grandfather was right: weather forecasts interest everyone. He worried about my health and, thanks to his concerns, we spent time together. In the same way, I see many followers and friends talking to me or among themselves about the weather, their experiences, what happens. Again, in my life, weather forecasts have helped sociality and socialization.
...
Read the original on it-notes.dragas.net »
Libsodium is now 13 years old!
I started that project to pursue Dan Bernstein’s desire to make cryptography simple to use. That meant exposing a limited set of high-level functions and parameters, providing a simple API, and writing documentation for users, not cryptographers. Libsodium’s goal was to expose APIs to perform operations, not low-level functions. Users shouldn’t even have to know or care about what algorithms are used internally. This is how I’ve always viewed libsodium.
Never breaking the APIs is also something I’m obsessed with. APIs may not be great, and if I could start over from scratch, I would have made them very different, but as a developer, the best APIs are not the most beautifully designed ones, but the ones that you don’t have to worry about because they don’t change and upgrades don’t require any changes in your application either. Libsodium started from the NaCl API, and still adheres to it.
These APIs exposed high-level functions, but also some lower-level functions that high-level functions wrap or depend on. Over the years, people started using these low-level functions directly. Libsodium started to be used as a toolkit of algorithms and low-level primitives.
That made me sad, especially since it is clearly documented that only APIs from builds with --enable-minimal are guaranteed to be tested and stable. But after all, it makes sense. When building custom protocols, having a single portable library with a consistent interface for different functions is far better than importing multiple dependencies, each with their own APIs and sometimes incompatibilities between them.
That’s a lot of code to maintain. It includes features and target platforms I don’t use but try to support for the community. I also maintain a large number of other open source projects.
Still, the security track record of libsodium is pretty good, with zero CVEs in 13 years even though it has gotten a lot of scrutiny.
However, while recently experimenting with adding support for batch signatures, I noticed inconsistent results with code originally written in Zig. The culprit was a check that was present in a function in Zig, but that I forgot to add in libsodium.
The function crypto_core_ed25519_is_valid_point(), a low-level function used to check if a given elliptic curve point is valid, was supposed to reject points that aren’t in the main cryptographic group, but some points were slipping through.
Edwards25519 is like a special mathematical playground where cryptographic operations happen.
It is used internally for Ed25519 signatures, and includes multiple subgroups of different sizes (order):
* Order L: the “main subgroup” (L = ~2^252 points) where all operations are expected to happen
* Order 2L, 4L, 8L: very large, but not prime order subgroups
The validation function was designed to reject points not in the main subgroup. It properly rejected points in the small-order subgroups, but not points in the mixed-order subgroups.
To check if a point is in the main subgroup (the one of order L), the function multiplies it by L. If the order is L, multiplying any point by L gives the identity point (the mathematical equivalent of zero). So, the code does the multiplication and checks that we ended up with the identity point.
Points are represented by coordinates. In the internal representation used here, there are three coordinates: X, Y, and Z. The identity point is represented internally with coordinates where X = 0 and Y = Z. Z can be anything depending on previous operations; it doesn’t have to be 1.
The old code only checked X = 0. It forgot to verify Y = Z. This meant some invalid points (where X = 0 but Y ≠ Z after the multiplication) were incorrectly accepted as valid.
Concretely: take any main-subgroup point Q (for example, the output of crypto_core_ed25519_random) and add the order-2 point (0, -1), or equivalently negate both coordinates. Every such Q + (0, -1) would have passed validation before the fix, even though it’s not in the main subgroup.
The fix is trivial: it adds the missing check, so the function now properly verifies both conditions - X must be zero and Y must equal Z.
You may be affected if you:
* Use a point release <= 1.0.20 or a version of libsodium released before December 30, 2025.
* Use crypto_core_ed25519_is_valid_point() to validate points from untrusted sources
* Implement custom cryptography using arithmetic over the Edwards25519 curve
But don’t panic. Most users are not affected.
None of the high-level APIs (crypto_sign_*) are affected; they don’t even use or need that function. Scalar multiplication using crypto_scalarmult_ed25519 won’t leak anything even if the public key is not on the main subgroup. And public keys created with the regular crypto_sign_keypair and crypto_sign_seed_keypair functions are guaranteed to be on the correct subgroup.
Support for the Ristretto255 group was added to libsodium in 2019 specifically to solve cofactor-related issues. With Ristretto255, if a point decodes, it’s safe. No further validation is required.
If you implement custom cryptographic schemes doing arithmetic over a finite field group, using Ristretto255 is recommended. It’s easier to use, and as a bonus, low-level operations will run faster than over Edwards25519.
If you can’t update libsodium and need an application-level workaround, the original post provides a replacement validation function.
This issue was fixed immediately after discovery. All stable packages released after December 30, 2025 include the fix.
A new point release is also going to be tagged.
If libsodium is useful to you, please keep in mind that it is maintained by one person, for free, in time I could spend with my family or on other projects. The best way to help the project would be to consider sponsoring it, which helps me dedicate more time to improving it and making it great for everyone, for many more years to come.
...
Read the original on 00f.net »
MegaLag’s December 2024 video introduced 18 million viewers to serious questions about Honey, the widely-used browser shopping plug-in—in particular, whether Honey abides by the rules set by affiliate networks and merchants, and whether Honey takes commissions that should flow to other affiliates. I wrote in January that I thought Honey was out of line. In particular, I pointed out the contracts that limit when and how Honey may present affiliate links, and I applied those contracts to the behavior MegaLag documented. Honey was plainly breaking the rules.
As it turns out, Honey’s misconduct is considerably worse than MegaLag, I, or others knew. When Honey is concerned that a user may be a tester—a “network quality” employee, a merchant’s affiliate manager, an affiliate, or an enthusiast—Honey designs its software to honor stand-down in full. But when Honey feels confident that it’s being used by an ordinary user, Honey defies stand-down rules. Multiple methods support these conclusions: I extracted source code from Honey’s browser plugin and studied it at length, I ran Honey through a packet sniffer to collect its config files, and I cross-checked all of this with actual app behavior. Details below. MegaLag tested too, and has a new video with his updated assessment.
Behaving better when it thinks it’s being tested, Honey follows in Volkswagen’s “Dieselgate” footsteps. Like Volkswagen, the cover-up is arguably worse than the underlying conduct. Facing the allegations MegaLag presented last year, Honey could try to defend presenting its affiliate links willy-nilly—argue users want this, claim to be saving users money, suggest that network rules don’t apply or don’t mean what they say. But these new allegations are more difficult to defend. Designing its software to perform differently when under test, Honey reveals knowing what the rules require and knowing they’d be in trouble if caught. Hiding from testers reveals that Honey wanted to present affiliate links as widely as possible, despite the rules, so long as it doesn’t get caught. It’s not a good look. Affiliates, merchants, and networks should be furious.
The basic bargain of affiliate marketing is that a publisher presents a link to a user, who clicks, browses, and buys. If the user makes a purchase, commission flows to the publisher whose link was last clicked.
Shopping plugins and other client-side software undermine the basic bargain of affiliate marketing. If a publisher puts software on a user’s computer, that software can monitor where the user browses, present its affiliate link, and always (appear to) be “last”—even if it had minimal role in influencing the customer’s purchase decision.
Affiliate networks and merchants established rules to restore and preserve the bargain between what we might call “web affiliates” versus software affiliates. One, a user has to actually click a software affiliate’s link; decades ago, auto-clicks were common, but that’s long-since banned (yet nonetheless routine from “adware”-style browser plugins— example). Two, software must “stand down”—must not even show its link to users—when some prior web affiliate P has already referred a user to a given merchant. This reflects a balancing of interests: P wants a reasonable opportunity for the user to make a purchase, so P can get paid. If a shopping plugin could always present its offer, the shopping plugin would claim the commission that P had fairly earned. Meanwhile P wouldn’t get sufficient payment for its effort—and might switch to promoting some other merchant with rules P sees as more favorable. Merchants and networks need to maintain a balance in order to attract and retain web affiliates, which are understood to send traffic that’s substantially incremental (customers who wouldn’t have purchased anyway), whereas shopping plugins often take credit for nonincremental purchases. So if a merchant is unsure, it has good reason to err on the side of web affiliates.
All of this was known and understood literally decades ago. Stand-down rules were first established in 2002. Since then, they’ve been increasingly routine, and overall have become clearer and better enforced. Crucially, merchants and networks include stand-down rules in their contracts, making this not just a principle and a norm, but a binding contractual obligation.
How can Honey tell when a user may be a tester? Honey’s code and config files show that it uses four criteria:
* New accounts. If an account is less than 30 days old, Honey concludes the user might be a tester, so it disables its prohibited behavior.
* Low earnings-to-date. In general, under Honey’s current rules, if an account has less than 65,000 points of Honey earning, Honey concludes the user might be a tester, so it disables its prohibited behavior. Since 1,000 points can be redeemed for $10 of gift cards, this threshold requires having earned $650 worth of points. That sounds like a high requirement, and it is. But it’s actually relatively new: As of June 2022, there was no points requirement for most merchants, and for merchants in Rakuten Advertising, the requirement was just 501 points (about $5 of points). (Details below.)
* Honey periodically checks a server-side blacklist. The server can condition its decision on any factor known to the server, including the user’s Honey ID and cookie, or IP address inside a geofence or on a ban list. Suppose the user has submitted prior complaints about Honey, as professional testers frequently do. Honey can blacklist the user ID, cookie, and IP or IP range. Then any further requests from that user, cookie, or IP will be treated as high-risk, and Honey disables its prohibited behavior.
* Affiliate industry cookies. Honey checks whether a user has cookies indicating having logged into key affiliate industry tools, including the CJ, Rakuten Advertising, and Awin dashboards. If the user has such a cookie, the user is particularly likely to be a tester, so Honey disables its prohibited behavior.
If even one of these factors indicates a user is high-risk, Honey honors stand-down. But if all four pass, then Honey ignores stand-down rules and presents its affiliate links regardless of a prior web publisher’s role and regardless of stand-down rules. This isn’t a probabilistic or uncertain dishonoring of stand-down (as plaintiffs posited in litigation against Honey). Rather, Honey’s actions are deterministic: If a high-risk factor hits, Honey will completely and in every instance honor stand-down; and if no such factor hits, then Honey will completely and in every instance dishonor stand-down (meaning, present its link despite networks’ rules).
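Putting the four criteria together, the decision reduces to something like the following Python paraphrase; this is my reading of the config and deobfuscated code, not Honey’s actual source.

```python
def dishonors_standdown(account_age_days, points, blacklisted, has_affiliate_console_cookie):
    # Any high-risk signal means the user might be a tester: honor stand-down.
    if account_age_days < 30:            # new account
        return False
    if points < 65000:                   # low earnings-to-date (base threshold)
        return False
    if blacklisted:                      # server-side blacklist hit
        return False
    if has_affiliate_console_cookie:     # logged into CJ / Rakuten / Awin dashboards
        return False
    return True                          # ordinary user: present the link despite stand-down
```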
These criteria indicate Honey’s attempt to obstruct and frankly frustrate testers. In my experience from two decades of testing affiliate misconduct, it is routine for a tester to install a new shopping plugin on a new PC, create a new account, and check for immediate wrongdoing. By always standing down on new accounts (those less than 30 days old), Honey defeats this standard first-pass test.
The requirement of achieving sufficiently many points similarly stymies many testers. Over the years, a handful of my clients granted a budget for test purchases. But most did not. If I wanted to perform test purchases, I would have needed to spend my own money. Meanwhile, there’s no proper reason why a shopping app should stand down differently for users with points versus new users, so if testers assumed good faith by Honey, they’d have no reason to build up a points history. This criterion certainly prevented testers from uncovering what Honey was doing.
The interaction of account age, points history, and blacklist further stymies testing. Once a tester reports a specific observation of misconduct, Honey can blacklist the tester’s account—which will still do most things normally, but will begin to stand down even though unbanned accounts don’t stand down in the same circumstances. A tester who suspected this would have to create a new account, wait 30 days, and build up a sizable points balance to see what Honey is actually doing for other users. That’s a genuine barrier, impeding even a tester who correctly proposes this hypothesis.
Honey’s detection of affiliate industry cookies is particularly revealing and, in my view, devastatingly incriminating. Maybe Honey could concoct a story about why it stands down less for established accounts and for high-points-earners. It would be a tall tale, but maybe Honey could convince some people. In contrast, it is inconceivable why a shopping plugin should care whether a user previously logged into an affiliate dashboard. There’s only one reason to check: to hide from testers. But that reveals Honey’s intent and Honey’s knowledge that its conduct is prohibited.
Multiple forms of evidence support my finding of Honey detecting testers. First, consider hands-on testing. With a standard test account with few or no points, Honey honored stand-down. See video 1. But when I tricked the Honey plugin into thinking I had tens of thousands of points (details below about how I did this), Honey popped up despite stand-down rules. See video 2. I repeated this test over multiple days, as to multiple merchants. The finding was the same every time. The only thing I changed between the “video 1” tests and “video 2” tests was the number of points supposedly associated with my account.
To demonstrate Honey checking for affiliate industry cookies, I added a step to my test scenario. With Honey tricked into thinking I had ample points, same as video 2, I began a test run by logging into a CJ portal used by affiliates. In all other respects, my test run was the same as video 2. Seeing the CJ portal cookie, Honey stood down. See video 3.
Some might ask whether the findings in the prior section could be coincidence. Maybe Honey just happened to open in some scenarios and not others. Maybe I’m ascribing intentionality to acts that are just coincidence. Let me offer two responses to this hypothesis. First, my findings are repeatable, countering any claim of coincidence. Second, separate from hands-on testing, three separate types of technical analysis—config files, telemetry, and source code—all confirm the accuracy of the prior section.
Honey retrieves its configuration settings from JSON files on a Honey server. Honey’s core stand-down configuration is in standdown-rules.json, while the selective stand-down—declining to stand down according to the criteria described above—is in the separate config file ssd.json. Here are the contents of ssd.json as of October 22, 2025, with // comments added by me:
{
  "ssd": {
    "base": {
      "gca": 1, //enable affiliate console cookie check
      "bl": 1, //enable blacklist check
      "uP": 65000, //min points to disable standdown
      "adb": 26298469858850
    },
    "affiliates": ["https://www.cj.com", "https://www.linkshare", "https://www.rakuten.com", "https://ui.awin.com", "https://www.swagbucks.com"], //affiliate console cookie domains to check
    "LS": { //override points threshold for LinkShare merchants
      "uP": 5001
    },
    "PAYPAL": {
      "uL": 1,
      "uP": 5000001,
      "adb": 26298469858850
    }
  },
  "ex": { //ssd exceptions
    "7555272277853494990": { //TJ Maxx
      "uP": 5001
    },
    "7394089402903213168": { //booking.com
      "uL": 1,
      "adb": 120000,
      "uP": 1001
    },
    "243862338372998182": { //kayosports
      "uL": 0,
      "uP": 100000
    },
    "314435911263430900": {
      "adb": 26298469858850
    },
    "315283433846717691": {
      "adb": 26298469858850
    },
    "GA": ["CONTID", "s_vi", "_ga", "networkGroup", "_gid"] //which cookies to check on affiliate console cookie domains
  }
}
On its own, the ssd config file is not a model of clarity. But the source code (discussed below) reveals the meaning of the abbreviations in ssd. uP refers to user points—the minimum number of points a user must have in order for Honey to dishonor stand-down. Note the current base (default) requirement of at least 65,000 uP user points, though the subsequent LS section sets a lower threshold of just 5001 for merchants on the Rakuten Advertising (LinkShare) network. bl set to 1 instructs the Honey plugin to stand down if the server-side blacklist so instructs.
Meanwhile, the affiliates and ex GA data structures establish the affiliate industry cookie checks mentioned above. The affiliates entry lists the domains where cookies are to be checked. The GA entry under ex lists which cookie is to be checked for each domain. Though these are presented as two one-dimensional lists, Honey’s code actually checks them in conjunction — it checks the first-listed affiliate network domain for the first-listed cookie, then the second, and so forth. One might ask why Honey stored the domain names and cookie names in two separate one-dimensional lists, rather than in a two-dimensional list, name-value pairs, or similar. The obvious answer is that Honey’s approach kept the domain names more distant from the cookies on those domains, making its actions that much harder for testers to notice even if they got as far as this config file.
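Spelled out, the pairwise check amounts to the following mapping (my paraphrase of the config values, not Honey's code):

```python
affiliates = ["https://www.cj.com", "https://www.linkshare", "https://www.rakuten.com",
              "https://ui.awin.com", "https://www.swagbucks.com"]
ga_cookies = ["CONTID", "s_vi", "_ga", "networkGroup", "_gid"]

# The i-th domain is checked for the i-th cookie name.
print(list(zip(affiliates, ga_cookies)))
# [('https://www.cj.com', 'CONTID'), ('https://www.linkshare', 's_vi'), ...]
```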
The rest of ex sets exceptions to the standard (“base”) ssd. This lists five specific ecommerce sites (each referenced with an ID number previously assigned by Honey) with adjusted ssd settings. For Booking.com and Kayosports, the ssd exceptions set even higher points requirements to cancel standdown (120,000 and 100,000 points, respectively), which I interpret as a response to complaints from those sites.
Honey’s telemetry is delightfully verbose and, frankly, easy to understand, including English explanations of what data is being collected and why. Perhaps Google demanded improvements as part of approving Honey’s submission to Chrome Web Store. (Google enforces what it calls “strict guidelines” for collecting user data. Rule 12: data collection must be “necessary for a user-facing feature.” The English explanations are most consistent with seeking to show Google that Honey’s data collection is proper and arguably necessary.) Meanwhile, Honey submitted much the same code to Apple as an iPhone app, and Apple is known to be quite strict in its app review. Whatever the reason, Honey telemetry reveals some important aspects of what it is doing and why.
When a user with few points gets a stand-down, Honey reports that in telemetry with the JSON data structure “method”:”suspend”. Meanwhile, the nearby JSON variable state gives the specific ssd requirement that the user didn’t satisfy—in my video 1: “state”:”uP:5001” reporting that, in this test run, my Honey app had less than 5001 points, and the ssd logic therefore decided to stand down. See video 1 at 0:37-0:41, or screenshots below for convenience. (My network tracing tool converted the telemetry from plaintext to a JSON tree for readability.)
When I gave myself more points (video 2), state instead reported ssd—indicating that all ssd criteria were satisfied, and Honey presented its offer and did not stand down. See video 2 at 0:32.
Finally, when I browsed an affiliate network console and allowed its cookie to be placed on my PC, Honey telemetry reported “state”:“gca”. Like video 1, the state value reports that ssd criteria were not satisfied, in this case because the gca (affiliate dashboard cookie) requirement was triggered, causing ssd to decide to stand down. See video 3 at 1:04-1:14.
In each instance, the telemetry matched identifiers from the config file (ssd, uP, gca). And as I changed from one test run to another, the telemetry transmissions tracked my understanding of Honey’s operation. Readers can check this in my videos: After Honey does or doesn’t stand down, I opened Fiddler to show what Honey reported in telemetry, in each instance in one continuous video take.
As a browser extension, Honey provides client-side code in JavaScript. Google’s Code Readability Requirements allow minification—removing whitespace, shortening variable and function names. Honey’s code is substantial—after deminification, more than 1.5 million lines. But a diligent analyst can still find what’s relevant. In fact, the relevant parts are clustered together, and easily found via searches for obvious strings such as “ssd”.
In a surprising twist, Honey in one instance released something approaching original code to Apple as an iPhone app. In particular, Honey included sourceMappingURL metadata that allows an analyst to recover original function names and variable names. (Instructions.) That release was from a moment in time, and Honey subsequently made revisions. But where that code is substantially the same as the code currently in use, I present the unobfuscated version for readers’ convenience. Here’s how it works:
return e.next = 7, fetch("".concat("https://s.joinhoney.com", "/ck/alive"));
If the killswitch returns “alive”, Honey sets the bl value to 0:
c = S().then((function(e) {
e && "alive" === e.is && (o.bl = 0)
The ssd logic later checks this variable bl, among others, to decide whether to cancel standdown.
The core ssd logic is in a long function called R(), which runs an infinite loop with a switch statement to proceed through a series of numbered cases.
function(e) {
for (;;) switch (e.prev = e.next) {
Focusing on the sections relevant to the behavior described above: Honey makes sure the user’s email address doesn’t include the string “test”, and checks whether the user is on the killswitch blacklist.
if (r.email && r.email.match("test") && (o.bl = 0), !r.isLoggedIn || t) {
e.next = 7;
break
Honey computes the age of the user’s account by subtracting the account creation date (r.created) from the current time:
case 8:
o.uL = r.isLoggedIn ? 1 : 0, o.uA = Date.now() - r.created;
Honey checks for the most recent time a resource was blocked by an ad blocker:
case 20:
return p = e.sent, l && a.A.getAdbTab(l) ? o.adb = a.A.getAdbTab(l) : a.A.getState().resourceLastBlockedAt > 0 ? o.adb = a.A.getState().resourceLastBlockedAt : o.adb = 0
Honey checks whether any of the affiliate domains listed in the ssd affiliates data structure has the console cookie named in the GA data structure.
m = p.ex && p.ex.GA || []
g = i().map(p.ssd && p.ssd.affiliates, (function(e) {
return f += 1, u.A.get({
name: m[f], //cookie name from GA array
url: e //domain to be checked
}).then((function(e) {
e && (o.gca = 0) //if cookie found, set gca to 0
Then the comparison function P() compares each retrieved or calculated value to the threshold from ssd.json. The fundamental logic is that if any retrieved or calculated value (received in variable e below) is less than the threshold t from ssd, the ssd logic will honor standdown. In contrast, if all four values exceed their thresholds, ssd will cancel the standdown. If this function elects to honor standdown, the return value gives the name of the rule (a) and the threshold (s) that caused the decision. If this function elects to dishonor standdown, it returns “ssd” (which is the function’s default if not overridden by the logic that follows). This yields the state= values I showed in telemetry and presented in screenshots and videos above.
function P(e, t) {
var r = "ssd";
return Object.entries(t).forEach((function(t) {
var n, o, i = (o = 2, _(n = t) || b(n, o) || y(n, o) || g()),
a = i[0], // field name (e.g., uP, gca, adb)
s = i[1]; // threshold value from ssd.json
"adb" === a && (s = s > Date.now() ? s : Date.now() - s), // special handling for adb timestamps
void 0 !== e[a] && e[a] < s && (r = "".concat(a, ":").concat(s))
})), r
Reviewing both config files and code, I was intrigued to see eBay called out for greater protections than others. Where Honey stands down for other merchants and networks for 3,600 seconds (one hour), eBay gets 86,400 seconds (24 hours).
{
  "regex": "^https?\\:\\/\\/rover\\.ebay((?![\\?\\&]pub=5575133559).)*$",
  "provider": "LS",
  "overrideBl": true,
  "ttl": 86400
}
...
Read the original on vptdigital.com »
Hey everyone, I have been staying at a hotel for a while. It’s one of those modern ones with smart TVs and other connected goodies. I got curious and opened Wireshark, as any tinkerer would do.
I was very surprised to see a huge amount of UDP traffic on port 2046. I looked it up but the results were far from useful. This wasn’t a standard port, so I would have to figure it out manually.
At first, I suspected that the data might be a television stream for the TVs, but the packet length seemed too small, even for a single video frame.
This article is also available in French.
The UDP packets weren’t sent to my IP and I wasn’t doing ARP spoofing, so these packets were sent to everyone. Upon closer inspection, I found out that these were Multicast packets. This basically means that the packets are sent once and received by multiple devices simultaneously. Another thing I noticed was the fact that all of those packets were the same length (634 bytes).
I decided to write a Python script to save and analyze this data. First of all, here’s the code I used to receive Multicast packets. In the following code, 234.0.0.2 is the IP I got from Wireshark.
import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('', 2046))

mreq = struct.pack("4sl", socket.inet_aton("234.0.0.2"), socket.INADDR_ANY)
s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data = s.recv(2048)
    print(data)
On top of this, I also used binascii to convert the data to hex in order to make reading the bytes easier. After watching thousands of these packets scroll through the console, I noticed that the first ~15 bytes were the same. These bytes probably indicate the protocol and the packet/command ID, but I only ever saw the same one, so I couldn’t investigate them.
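That hex view is just a one-liner on top of the receive loop; a minimal sketch (reusing the multicast socket from the script above):

```python
import binascii

data = s.recv(2048)                     # s: the multicast socket set up earlier
print(binascii.hexlify(data).decode())  # fixed header bytes followed by the payload
```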
It also took me an embarrassingly long time to see the string LAME3.91UUUUUUU at the end of the packets. I suspected this was MPEG compressed audio data, but saving one packet as test.mp3 failed to play with mplayer, and the file utility only identified it as test.mp3: data. There was obviously data in this packet, and file should know MPEG Audio data when it sees it, so I decided to write another Python script to save the packet data with offsets. This way it would save the file test1 skipping 1 byte from the packet, test2 skipping 2 bytes, and so on. Here’s the code I used and the result.
data = s.recv(2048)
for i in range(25):
    open("test{}".format(i), "wb+").write(data[i:])
After this, I ran file test* and voilà! Now we know we have to skip 8 bytes to get to the MPEG Audio data.
$ file test*
test0: data
test1: UNIF v-16624417 format NES ROM image
test10: UNIF v-763093498 format NES ROM image
test11: UNIF v-1093499874 format NES ROM image
test12: data
test13: TTComp archive, binary, 4K dictionary
test14: data
test15: data
test16: UNIF v-1939734368 format NES ROM image
test17: UNIF v-1198759424 format NES ROM image
test18: UNIF v-256340894 format NES ROM image
test19: UNIF v-839862132 format NES ROM image
test2: UNIF v-67173804 format NES ROM image
test20: data
test21: data
test22: data
test23: DOS executable (COM, 0x8C-variant)
test24: COM executable for DOS
test3: UNIF v-1325662462 format NES ROM image
test4: data
test5: data
test6: data
test7: data
test8: MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, JntStereo
test9: UNIF v-2078407168 format NES ROM image
import sys

while True:
    data = s.recv(2048)
    sys.stdout.buffer.write(data[8:])
Now all we need to do is continuously read packets, skip the first 8 bytes, write them to a file and it should play perfectly.
But what was this audio? Was this a sneakily placed bug that listened to me? Was it something related to the smart TVs in my room? Something related to the hotel systems? Only one way to find out.
$ python3 listen_2046.py > test.mp3
* wait a little to get a recording *
^C
$ mplayer test.mp3
MPlayer (C) 2000-2016 MPlayer Team
224 audio & 451 video codecs
Playing test.mp3.
libavformat version 57.25.100 (external)
Audio only file format detected.
Starting playback…
A: 3.9 (03.8) of 13.0 (13.0) 0.7%
What the hell? I can’t believe I spent time for this. It’s just elevator music. It is played in the hotel corridors around the elevators. Oh well, at least I can listen to it from my room now.
Lol, how do nerds listen to elevator music? Nice article.
I know that it’s been almost a decade since then but did you save the elevator music? Kinda curious about it honestly.
This is fascinating. Someone in the thread asked if the elevator controls would be on that port and I can report there is no way. That audio is going to be from a 3rd party service like Muzak. Nobody wants to manage the royalty system except for specialized companies. But do we have any idea who? Mood Media? ActiveAire? Myndstream?
You can use `binwalk` to search for file signatures (not quite the same ones as libmagic, but similar ones) at any offset in the file. As an upside, it also detects things like copyright strings and other helpful stuff.
Nice! Question: was the TV connected to an interface on your laptop as a proxy? Else how did you know about the multicast packets?
What made you think to try an offset? Is that a common way these files are formatted?
RESPONDING TO: or interjections of mysterious, echoed “voices from the past” at various intervals…Is there somebody there?…What are you doing?…This can’t be real…the sound of a watermelon being smashed…reverbed laughter…a quiet child’s voice… CREATE A TOOL TO SPOOF THE MULTICAST. BE SPOOKED COMPLAIN DEMAND A REFUND!
as it was UDP protocol, that would be interesting if you did the source spoofing for some fun
now the immediate question i get is can you transmit your own packets and change what music plays?
What a great article. Love the writing style and the digestible deep dive into reversing.
Sir, this is really inspirational.
Lol all for the elevator music. Was thinking smart TV was snooping on everyone
Well done and very entertaining. A good read and learn.
as most things in life what seems mysterious it’s usually pretty disappointing. Love this article!
You sir, are an inspiration!
Music on hold / elevator music is usually sent as multicast hence why you and the elevator speaker is getting it.
Would have been even funnier if one of the songs was Rick Astley’s “Never Gonna Give You Up”.
How long all this took?
I’m respectfully adding a vote for you posting that sweet elevator music file.
I wonder if you could broadcast your own music on the same port?
What made you think to save the data with offsets? Is that representative of the total time of the elevator music?
Time to start streaming some black metal into the elevators.
So glad there is evidence that I’m not the only one who would have gone down this rabbit hole!
Gold! Thanks for the write up, and the ride.
Appreciated this article wanted to say Thank you for sharing.
Next step: Send out some mp3′s with the same format to the same multicast address. Kick that hotel party up a notch.
…or interjections of mysterious, echoed “voices from the past” at various intervals…Is there somebody there?…What are you doing?…This can’t be real…the sound of a watermelon being smashed…reverbed laughter…a quiet child’s voice…
Could be lots of fun
Looks like you missed out on a lot of free NES ROMs there!
For those asking for a link for the elevator music: https://www.youtube.com/watch?v=xNjyG8S4_kI
Nice. Who would have thought? I wouldn’t have expected that you could reconstitute the mp3 by stripping N bytes and concatenating the rest. Maybe someday I’ll find a use for that. Thanks.
reptoid clone of john wayne and elvis on 2023-02-23 17:25:41
now who woulda thought that they’d do it that way haha
now spoof it ;)
please now post the elevator’s music you dumped
...
Read the original on www.gkbrk.com »
Last week, I updated our pricing limits. One JSON file. The backend started enforcing the new caps, the frontend displayed them correctly, the marketing site showed them on the pricing page, and our docs reflected the change—all from a single commit.
No sync issues. No “wait, which repo has the current pricing?” No deploy coordination across three teams. Just one change, everywhere, instantly.
At Kasava, our entire platform lives in a single repository. Not just the code—everything:
kasava/ # 5,470+ TypeScript files
├── frontend/ # Next.js 16 + React 19 application
│ └── src/
│ ├── app/ # 25+ route directories
│ └── components/ # 45+ component directories
├── backend/ # Cloudflare Workers API
│ └── src/
│ ├── services/ # 55+ business logic services
│ └── workflows/ # Mastra AI workflows
├── website/ # Marketing site (kasava.ai)
├── docs/ # Public documentation (Mintlify)
├── docs-internal/ # 12+ architecture docs & specs
├── marketing/
│ ├── blogs/ # Blog pipeline (drafts → review → published)
│ ├── investor-deck/ # Next.js site showing investment proposal
│ └── email/ # MJML templates for Loops.so campaigns
├── external/
│ ├── chrome-extension/ # WXT + React bug capture tool
│ ├── google-docs-addon/ # @helper AI assistant (Apps Script)
│ └── google-cloud-functions/
│ ├── tree-sitter-service/ # AST parsing for 10+ languages
│ └── mobbin-research-service/
├── scripts/ # Deployment & integration testing
├── infra-tester/ # Integration test harness
└── github-simulator/ # Mock GitHub API for local dev
This isn’t about abstract philosophies on design patterns for ‘how we should work.’ It’s about velocity in an era where products change fast and context matters.
AI is all about context. And this monorepo is our company—not just the product.
When our AI tools help us write documentation, they have immediate access to the actual code being documented. When we update our marketing website, the AI can verify claims against the real implementation. When we write blog posts like this one, the AI can fact-check every code example, every number, every architectural claim against the source of truth.
* Documentation updates faster because the AI sees code changes and suggests doc updates in the same context
* Website updates faster because pricing, features, and capabilities are pulled from the same config files that power the app
* Blog posts ship faster because the AI can run self-referential checks—validating that our “5,470+ TypeScript files” claim is accurate by actually counting them
* Nothing goes out of sync because there’s only one source of truth, and AI has access to all of it
When you ask Claude to “update the pricing page to reflect the new limits,” it can:
* Check the frontend that displays them
* Flag any blog posts that might mention outdated numbers
All in one conversation. All in one repository.
This is what “AI-native development” actually means: structuring your work so AI can be maximally helpful, not fighting against fragmentation.
Everything-as-code means everything ships the same way: git push. Want to update the website pricing page? git push. New blog post ready to go live? git push. Fix a typo in the docs? git push. Deploy a backend feature? git push.
No separate CMSs to log into. No WordPress admin panels. No waiting for marketing tools to sync. No “can someone with Contentful access update this?” The same Git workflow that ships code also ships content, documentation, and marketing. Everyone on the team can ship anything, and it all goes through the same review process, the same CI/CD, the same audit trail.
This uniformity removes friction and removes excuses. Shipping becomes muscle memory.
When a backend API changes, the frontend type definitions update in the same commit. When we add a new feature, the documentation can ship alongside it. No version mismatches. No “which version of the API does this frontend need?”
AI can see and validate the entire change in context.
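As a rough illustration (the file paths and type names here are invented for this post, not Kasava's actual code), sharing a single type definition across the stack is what makes those same-commit updates possible:

// backend/src/routes/api/bug-reports.ts (illustrative)
export interface BugReportSummary {
  id: string;
  title: string;
  severity: "low" | "medium" | "high";
}

// frontend/src/lib/api.ts (illustrative)
import type { BugReportSummary } from "../../../backend/src/routes/api/bug-reports";

export async function fetchBugReports(): Promise<BugReportSummary[]> {
  // Renaming a field in the interface above breaks this call at compile
  // time, so the fix has to land in the same commit.
  const res = await fetch("/api/bug-reports");
  return (await res.json()) as BugReportSummary[];
}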
When we ask Claude to add a feature, it doesn’t just write backend code. It sees the frontend that will consume it, the docs that need updating, and the marketing site that might reference it. All in one view. All in one conversation.
Real example from our codebase—adding Asana integration:
commit: "feat: add Asana integration"
├── backend/src/services/AsanaService.ts
├── backend/src/routes/api/integrations/asana.ts
├── frontend/src/components/integrations/asana/
├── frontend/src/app/integrations/asana/
├── docs/integrations/asana.mdx
└── website/src/app/integrations/page.tsx
One PR. One review. One merge. Everything ships together.
We have a single billing-plans.json that defines all plan limits and features:
// frontend/src/config/billing-plans.json (also copied to website/src/config/)
{
  "plans": {
    "free": { "limits": { "repositories": 1, "aiChatMessagesPerDay": 10 } },
    "starter": {
      "limits": { "repositories": 10, "aiChatMessagesPerDay": 100 }
    },
    "professional": {
      "limits": { "repositories": 50, "aiChatMessagesPerDay": 1000 }
    }
  }
}
The backend enforces these limits. The frontend displays them in settings. The marketing website shows them on the pricing page. When we change a limit, one JSON update propagates everywhere—no “the website says 50 repos but the app shows 25” bugs.
And AI validates all of it. When we update billing-plans.json, we can ask Claude to verify that the backend, frontend, and website are all consistent. It reads all three implementations and confirms they match—or tells us what needs fixing.
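A minimal sketch of that consumption, with helper names invented for illustration, shows why nothing drifts:

// Illustrative only (assumes resolveJsonModule in tsconfig): backend and
// frontend import the same file, so a limit change propagates to both.
import plans from "./config/billing-plans.json";

type PlanId = keyof typeof plans.plans;

// Backend-style enforcement of the shared limit...
export function canAddRepository(plan: PlanId, current: number): boolean {
  return current < plans.plans[plan].limits.repositories;
}

// ...and frontend-style display of the same number.
export function repositoryLimitLabel(plan: PlanId): string {
  return `Up to ${plans.plans[plan].limits.repositories} repositories`;
}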
Renaming a function? Your IDE finds all usages across frontend, backend, docs examples, and blog code snippets. One find-and-replace. One commit.
* Search: Find anything with one grep
frontend/ # Customer-facing Next.js app
├── src/
│ ├── app/ # Next.js 15 App Router
│ │ ├── analytics/ # Semantic commit analysis
│ │ ├── bug-reports/ # AI-powered bug tracking
│ │ ├── chat/ # AI assistant interface
│ │ ├── code-search/ # Semantic code search
│ │ ├── dashboard/ # Main dashboard
│ │ ├── google-docs-assistant/
│ │ ├── integrations/ # GitHub, Linear, Jira, Asana
│ │ ├── prd/ # PRD management
│ │ └── … # 25+ route directories total
│ ├── components/ # 45+ component directories
│ │ ├── ai-elements/ # AI-specific UI
│ │ ├── bug-reports/ # Bug tracking UI
│ │ ├── dashboard/ # Dashboard widgets
│ │ ├── google-docs/ # Google Docs integration
│ │ ├── onboarding/ # User onboarding flow
│ │ └── ui/ # shadcn/ui base components
│ ├── mastra/ # Frontend Mastra integration
│ └── lib/ # SDK, utilities, hooks
backend/ # Cloudflare Workers API
├── src/
│ ├── routes/ # Hono API endpoints
│ ├── services/ # 55+ business logic services
│ ├── workflows/ # Mastra AI workflows
│ │ ├── steps/ # Reusable workflow steps
│ │ └── RepositoryIndexingWorkflow.ts
│ ├── db/ # Drizzle ORM schema
│ ├── durable-objects/ # Stateful edge computing
│ ├── workers/ # Queue consumers
│ └── mastra/ # AI agents and tools
These two talk to each other constantly. Having them in the same repo means:
...
Read the original on www.kasava.dev »
Many developers want to start a side project but aren’t sure what to build. The internet is full of ideas that are basic and dull.
Here’s our list of 73 project ideas to inspire you. We have chosen projects that teach a lot and are fun to build.
Build a BitTorrent client that can download files using the BitTorrent protocol. You can start with single-file torrents. This is a great way to learn how P2P networking works.
Read the official BitTorrent specification here.
Build a program that solves Wordle. This can be a great lesson on information theory and entropy. You’ll also get hands-on experience at optimizing computations.
This YouTube video will get you started.
Implement Optimal Transport from scratch to morph one face into another while preserving identity and structure. You’ll apply linear programming to a real problem.
Here are some OT resources and a paper which proposes a solution.
Create a spreadsheet with support for cell references, simple formulas, and live updates. You’ll learn about dependency graphs, parsing, and reactive UI design.
The founder of the GRID spreadsheet engine shares some insights here.
Build a lightweight container runtime from scratch without Docker. You’ll learn about kernel namespaces, chroot, process isolation, and more.
Read this to understand how containers work.
Build a system that uses Euclid’s postulates to derive geometric proofs and visualize the steps. You’ll learn symbolic representation, rule systems, logic engines, and proof theory.
Google uses a crawler to navigate web pages and save their contents. By building one, you’ll learn how web search works. It’s also great practice for system design.
You can make your own list of sites and create a search engine on a topic of your interest.
Build a DNS server that listens for queries, parses packets, resolves domains, and caches results. Learn more about low-level networking, UDP, TCP, and the internet.
Start with how DNS works and dive into the DNS packet format.
Build a game where players connect two actors through a chain of shared credits with other actors, then reveal the optimal path at the end. You’ll learn how to deal with massive graphs.
Explore how to create fast graphs, and then try Landmark Labelling for supreme performance.
Implement the RAFT protocol from scratch to support distributed computing. Learn consensus, failure recovery, and how to build fault-tolerant distributed systems.
Visit this page for the RAFT paper and other resources.
Design a program from scratch that creates satisfying crosswords with adjustable difficulty. You’ll learn procedural generation, constraint propagation, and difficulty modeling.
For example, you can implement the Wave Function Collapse algorithm explained here.
Bitcask is an efficient embedded key-value store designed to handle production-grade traffic. Building this will improve your understanding of databases and efficient storage.
You can implement this short paper.
Apps like Shazam extract unique features from audio. This fingerprint is then used to match and identify sounds. You’ll need to learn hash-based lookups and a bit of signal processing.
Here’s a detailed post with everything you need to know.
Recreate the industry-changing game using SDL, and add some story elements, NPC interactions, and levels. It’ll be a perfect intro to game development.
This video will set you up.
Implement an algorithm from scratch to compare two text files or programs. This will involve dynamic programming and application of graph traversal.
Here’s the classic paper behind Myers’ diff, used in Git for years.
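If you want a feel for the core idea before reading the paper, here is a rough TypeScript sketch of a line diff via longest common subsequence (not Myers' algorithm itself, which reaches the same answer with a smarter search):

// Rough sketch: line diff via a longest-common-subsequence DP table.
function diff(a: string[], b: string[]): string[] {
  const n = a.length, m = b.length;
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table, emitting kept ("  "), deleted ("- "), and added ("+ ") lines.
  const out: string[] = [];
  let i = 0, j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { out.push("  " + a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { out.push("- " + a[i]); i++; }
    else { out.push("+ " + b[j]); j++; }
  }
  while (i < n) out.push("- " + a[i++]);
  while (j < m) out.push("+ " + b[j++]);
  return out;
}

console.log(diff(["a", "b", "c"], ["a", "c", "d"]).join("\n"));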
Generate UML class diagrams from source code with support for relationships like inheritance. You’ll visualize object-oriented code and learn how to parse with ASTs.
Write your own encoder/decoder for the BMP image format and build a tiny viewer for it. You’ll learn binary parsing, image encoding, and how to work with pixel buffers and headers.
The Wikipedia article is a good place to start.
Build a FUSE filesystem for Linux from scratch, with indexing, file metadata, and caching. You’ll have to optimize data structures for storage and performance.
This article talks about the concepts used in filesystems.
Implement qubits and quantum gates from scratch. Use them to simulate a circuit for a quantum algorithm like Bernstein-Vazirani or Simon’s algorithm.
Read this short paper for the essentials without any fluff.
Write a video player that decodes H.264/H.265 using ffmpeg, and supports casting local files to smart devices. Learn packet buffering, discovery protocols, and stream encoding.
Get started with this article.
Build a Redis clone from scratch that supports basic commands, RDB persistence, replica sync, streams, and transactions. You’ll get to deep dive into systems programming.
You can use the official Redis docs as a guide.
Build a client-side video editor that runs in the browser without uploading files to a server. Learn how to work with WASM, and why people love using it for high performance tasks.
Visit the official WebAssembly site to get started.
This is a rite of passage. You’ll get hands-on experience with encryption, token expiration, refresh flows, and how to manage user sessions securely.
Implement username and password auth. Then manage sessions with JWT or session IDs.
You’ve used it in search boxes and other places where you type text. Implement a solution that suggests the right words as the user types, then optimize heavily for speed.
This YouTube video gives an idea of the implementation process.
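As a starting point, a bare-bones TypeScript sketch using a prefix trie (with none of the speed tricks yet) might look like this:

// Bare-bones prefix trie for word suggestions; real systems add ranking,
// fuzzy matching, and heavy caching on top of something like this.
class TrieNode {
  children = new Map<string, TrieNode>();
  isWord = false;
}

class Autocomplete {
  private root = new TrieNode();

  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.isWord = true;
  }

  suggest(prefix: string, limit = 5): string[] {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return [];
      node = next;
    }
    // Depth-first walk under the prefix node, collecting complete words.
    const out: string[] = [];
    const walk = (n: TrieNode, acc: string): void => {
      if (out.length >= limit) return;
      if (n.isWord) out.push(prefix + acc);
      for (const [ch, child] of n.children) walk(child, acc + ch);
    };
    walk(node, "");
    return out;
  }
}

const ac = new Autocomplete();
["card", "care", "cart", "dog"].forEach((w) => ac.insert(w));
console.log(ac.suggest("car")); // ["card", "care", "cart"]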
Build a simple SQL engine that reads .db files, uses indexes and executes queries. It’s a deep dive into how real-world databases are built and run efficiently.
You need to understand B-trees and how SQLite stores data on disk.
Remove background sounds from audio files. You’ll learn signal processing and denoising techniques used in GPS, mouse input, sensors, object tracking, etc.
You can use a technique like Kalman Filtering to do this.
Design a file sharing app with sync, cloud storage, and basic p2p features that can scale to some extent. You’ll get practice in cloud architecture and backend design.
This article dives into the system design.
Build a map engine to index roads, terrain (rivers, mountains), places (shops, landmarks), and areas (cities, states). Learn spatial indexing, range queries, and zoom-level abstractions.
Start by implementing an R-tree from scratch by following the original paper.
Use Natural Earth and GeoFabrik datasets to populate your map engine.
Recreate a city’s road network, simulate traffic using real open data, and design an improved version. Tackle an NP-hard optimization problem with real constraints.
Nature has long since solved some of the problems we call hard. Implement SMA or ACO here.
Develop a decentralized collaborative text editor. Similar to Google Docs, but without any central server. Use CRDTs to manage concurrent edits and ensure eventual consistency.
Use ropes, gap buffers, or piece tables to build a fast text buffer optimized for efficient editing.
Read this article on designing data structures for such apps.
Evolve working models of machinery using only primitive mechanical parts and constraints. You’ll learn about genetic algorithms, fitness functions, and physics simulation.
You can design bridges, cars, clocks, calculators, catapults, and more. NASA used GAs to design an antenna for their space mission.
This YouTube video shows how interesting evolutionary design can get.
Create a server from scratch that supports HTTP requests, static files, routing, and reverse proxying. Learn socket programming and how web servers work.
This page will get you started.
Estimate a depth (disparity) map from a stereo image pair using Markov Random Fields. You’ll learn about computer vision, graphical models, and inference techniques.
Start with the Middlebury Dataset and this article on belief propagation for stereo matching.
Build a minimal Git with core features like init, commit, diff, log, and branching. Learn how version control works using content-addressable storage, hashes, and trees.
Check out Write yourself a Git for an overview of git internals.
Build a Unix debugger with stepping, breakpoints, and memory inspection. You’ll learn low-level systems programming and process control.
This article discusses the internal structure of GDB.
Build a deep learning framework from scratch with a tensor class, autograd, basic layers, and optimizers. Grasp the internals of backpropagation and gradient descent.
Start by building a simple 3-layer feedforward NN (multilayer perceptron) with your framework.
Andrej Karpathy explains the basic concepts in this YouTube Video.
Build a Chess app from scratch, where users can play against each other or your own UCI engine. This project offers a blend of algorithms, UI, game logic, and AI.
You can go one step further and make the engine play itself to improve like AlphaZero and Leela.
You can start with the rules and the chess programming wiki.
Build a fast search engine from scratch for the Wikipedia dump with typo tolerance, fuzzy queries, and semantic ranking. You’ll learn indexing, tokenization, and ranking algorithms.
This article offers a good introduction to the basics of information retrieval.
Build a caching system to avoid redundant fetches for static assets. You’ll learn web caching, log analysis, and how to use probabilistic data structures in a real setting.
You can use this dataset containing two months’ worth of HTTP requests to the NASA server.
This article introduces some of the key concepts.
Build a short-video app with infinite scroll, social graphs of friends and subs, and a tailored feed. You’ll learn efficient preloading, knowledge graphs, and behavioral signals.
Implement NTP from scratch to build a background service that syncs system time with time servers. You’ll learn daemon design and the internals of network time sync.
Implement HyperLogLog from scratch to provide analytics on the number of users engaging with hashtags in real time. You’ll learn some key concepts around big data systems.
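A simplified TypeScript sketch of the estimator (one 32-bit FNV-1a hash, no bias-correction tables, so treat it as a starting point rather than a production counter):

// Simplified HyperLogLog: estimate distinct items in O(m) memory.
class HyperLogLog {
  private registers: Uint8Array;
  constructor(private readonly b: number = 12) {   // m = 2^b registers
    this.registers = new Uint8Array(1 << b);
  }

  private static hash32(s: string): number {       // FNV-1a, 32-bit
    let h = 0x811c9dc5;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return h >>> 0;
  }

  add(item: string): void {
    const h = HyperLogLog.hash32(item);
    const idx = h >>> (32 - this.b);               // first b bits pick a register
    const rest = (h << this.b) >>> 0;              // remaining bits
    const rank = rest === 0 ? 32 - this.b + 1 : Math.clz32(rest) + 1;
    if (rank > this.registers[idx]) this.registers[idx] = rank;
  }

  count(): number {
    const m = this.registers.length;
    let sum = 0, zeros = 0;
    for (const r of this.registers) {
      sum += 2 ** -r;
      if (r === 0) zeros++;
    }
    const alpha = 0.7213 / (1 + 1.079 / m);
    const estimate = (alpha * m * m) / sum;
    // Small-range correction: fall back to linear counting.
    if (estimate <= 2.5 * m && zeros > 0) return Math.round(m * Math.log(m / zeros));
    return Math.round(estimate);
  }
}

const hll = new HyperLogLog();
for (let i = 0; i < 100000; i++) hll.add("user-" + i);
console.log(hll.count()); // an estimate near 100000 (accuracy depends on hash quality)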
Write a query planner that rewrites SQL queries for better performance. You’ll learn cost estimation, join reordering, and index selection.
Implement an encrypted voting system for anonymity. Use zero-knowledge proofs to verify results.
For example, this paper attempts to define such a protocol.
Build a mesh VPN where nodes relay traffic without central servers. You’ll learn NAT traversal, encrypted tunneling, and decentralized routing.
Build a file archiver that compresses, bundles, and encrypts your files. Implement compression and encryption algorithms from scratch. Benchmark your performance against zip.
You can refer to the official .zip specification.
Build a basic ray tracer to render 3D scenes with spheres, planes, and lights. This will be great practice in writing clean abstractions and optimizing performance-heavy code.
You can refer to the Ray Tracing in One Weekend ebook.
Create your own language. It is best to start with an interpreted language that does not need a compiler. Design your own grammar, parser, and evaluation engine.
Crafting Interpreters is by far the best resource you can refer to.
Recreate WhatsApp with chats, groups, history, encryption, notifications, and receipts. You’ll get practice at building a production-grade app with an API, data store, and security.
You can draw inspiration from this system design approach.
Build a service to provide routes for a fleet of vehicles with limited capacity to deliver Amazon packages. You’ll learn to optimize routing under constraints.
...
Read the original on codecrafters.io »
Hey folks, I got a lot of feedback from various meetings on the proposed LLVM AI contribution policy, and I made some significant changes based on that feedback. The current draft proposal focuses on the idea of requiring a human in the loop who understands their contribution well enough to answer questions about it during review. The idea here is that contributors are not allowed to offload the work of validating LLM tool output to maintainers. I’ve mostly removed the Fedora policy in an effort to move from the vague notion of “owning the contribution” to a more explicit “contributors have to review their contributions and be prepared to answer questions about them”. Contributors should never find themselves in the position of saying “I don’t know, an LLM did it”. I felt the change here was significant, and deserved a new thread.
From an informal show of hands at the round table at the US LLVM developer meeting, most contributors (or at least the subset with the resources and interest in attending this round table in person) are interested in using LLM assistance to increase their productivity, and I really do want to enable them to do so, while also making sure we give maintainers a useful policy tool for pushing back against unwanted contributions.
I’ve updated the PR, and I’ve pasted the markdown below as well, but you can also view it on GitHub.
LLVM’s policy is that contributors can use whatever tools they would like to
craft their contributions, but there must be a human in the loop.
Contributors must read and review all LLM-generated code or text before they
ask other project members to review it. The contributor is always the author
and is fully accountable for their contributions. Contributors should be
sufficiently confident that the contribution is high enough quality that asking
for a review is a good use of scarce maintainer time, and they should be able
to answer questions about their work during review.
We expect that new contributors will be less confident in their contributions,
and our guidance to them is to start with small contributions that they can
fully understand to build confidence. We aspire to be a welcoming community
that helps new contributors grow their expertise, but learning involves taking
small steps, getting feedback, and iterating. Passing maintainer feedback to an
LLM doesn’t help anyone grow, and does not sustain our community.
Contributors are expected to be transparent and label contributions that
contain substantial amounts of tool-generated content. Our policy on
labelling is intended to facilitate reviews, and not to track which parts of
LLVM are generated. Contributors should note tool usage in their pull request
description, commit message, or wherever authorship is normally indicated for
the work. For instance, use a commit message trailer like Assisted-by: . This transparency helps the community develop best practices
and understand the role of these new tools.
An important implication of this policy is that it bans agents that take action
in our digital spaces without human approval, such as the GitHub @claude
agent. Similarly, automated review tools that
publish comments without human review are not allowed. However, an opt-in
review tool that keeps a human in the loop is acceptable under this policy.
As another example, using an LLM to generate documentation, which a contributor
manually reviews for correctness, edits, and then posts as a PR, is an approved
use of tools under this policy.
This policy includes, but is not limited to, the following kinds of
contributions:
Code, usually in the form of a pull request
The reason for our “human-in-the-loop” contribution policy is that processing
patches, PRs, RFCs, and comments to LLVM is not free — it takes a lot of
maintainer time and energy to review those contributions! Sending the
unreviewed output of an LLM to open source project maintainers extracts work
from them in the form of design and code review, so we call this kind of
contribution an “extractive contribution”.
Our golden rule is that a contribution should be worth more to the project
than the time it takes to review it. These ideas are captured by this quote
from the book Working in Public by Nadia Eghbal:
“When attention is being appropriated, producers need to weigh the costs and
benefits of the transaction. To assess whether the appropriation of attention
is net-positive, it’s useful to distinguish between extractive and
non-extractive contributions. Extractive contributions are those where the
marginal cost of reviewing and merging that contribution is greater than the
marginal benefit to the project’s producers. In the case of a code
contribution, it might be a pull request that’s too complex or unwieldy to
review, given the potential upside.” — Nadia Eghbal
Prior to the advent of LLMs, open source project maintainers would often review
any and all changes sent to the project simply because posting a change for
review was a sign of interest from a potential long-term contributor. While new
tools enable more development, they shift effort from the implementor to the
reviewer, and our policy exists to ensure that we value and do not squander
maintainer time.
Reviewing changes from new contributors is part of growing the next generation
of contributors and sustaining the project. We want the LLVM project to be
welcoming and open to aspiring compiler engineers who are willing to invest
time and effort to learn and grow, because growing our contributor base and
recruiting new maintainers helps sustain the project over the long term. Being
open to contributions and liberally granting commit access
is a big part of how LLVM has grown and successfully been adopted all across
the industry. We therefore automatically post a greeting comment to pull
requests from new contributors and encourage maintainers to spend their time to
help new contributors learn.
If a maintainer judges that a contribution is extractive (i.e. it doesn’t
comply with this policy), they should copy-paste the following response to
request changes, add the extractive label if applicable, and refrain from
further engagement:
This PR appears to be extractive, and requires additional justification for
why it is valuable enough to the project for us to review it. Please see
our developer policy on AI-generated contributions:
http://llvm.org/docs/AIToolPolicy.html
Other reviewers should use the label to prioritize their review time.
The best ways to make a change less extractive and more valuable are to reduce
its size or complexity or to increase its usefulness to the community. These
factors are impossible to weigh objectively, and our project policy leaves this
determination up to the maintainers of the project, i.e. those who are doing
the work of sustaining the project.
If a contributor responds but doesn’t make their change meaningfully less
extractive, maintainers should escalate to the relevant moderation or admin
team for the space (GitHub, Discourse, Discord, etc) to lock the conversation.
Artificial intelligence systems raise many questions around copyright that have
yet to be answered. Our policy on AI tools is similar to our copyright policy:
Contributors are responsible for ensuring that they have the right to
contribute code under the terms of our license, typically meaning that either
they, their employer, or their collaborators hold the copyright. Using AI tools
to regenerate copyrighted material does not remove the copyright, and
contributors are responsible for ensuring that such material does not appear in
their contributions. Contributions found to violate this policy will be removed
just like any other offending contribution.
Here are some examples of contributions that demonstrate how to apply
the principles of this policy:
This PR contains a proof from Alive2, which is a strong signal of
value and correctness.
This generated documentation was reviewed for correctness by a
human before being posted.
...
Read the original on discourse.llvm.org »
Benchmarks: lower is better. Build with zig build -Doptimize=ReleaseFast for best performance.
zig build # Build library and CLI
zig build test # Run tests
const std = @import("std");
const zpdf = @import("zpdf");

pub fn main() !void {
    // Set up an allocator for the document parser.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const doc = try zpdf.Document.open(allocator, "file.pdf");
    defer doc.close();

    // Stream extracted text for every page to stdout.
    var buf: [4096]u8 = undefined;
    var writer = std.fs.File.stdout().writer(&buf);
    defer writer.interface.flush() catch {};

    for (0..doc.pages.items.len) |page_num| {
        try doc.extractText(page_num, &writer.interface);
    }
}
zpdf extract document.pdf # Extract all pages to stdout
zpdf extract -p 1-10 document.pdf # Extract pages 1-10
zpdf extract -o out.txt document.pdf # Output to file
zpdf extract --reading-order doc.pdf # Use visual reading order (experimental)
zpdf info document.pdf # Show document info
zpdf bench document.pdf # Run benchmark
import zpdf

with zpdf.Document("file.pdf") as doc:
    print(doc.page_count)

    # Single page
    text = doc.extract_page(0)

    # All pages (parallel by default)
    all_text = doc.extract_all()

    # Reading order extraction (experimental)
    ordered_text = doc.extract_all(reading_order=True)

    # Page info
    info = doc.get_page_info(0)
    print(f"{info.width}x{info.height}")
zig build -Doptimize=ReleaseFast
PYTHONPATH=python python3 examples/basic.py
src/
├── root.zig # Document API and core types
├── capi.zig # C ABI exports for FFI
├── parser.zig # PDF object parser
├── xref.zig # XRef table/stream parsing
├── pagetree.zig # Page tree resolution
├── decompress.zig # Stream decompression filters
├── encoding.zig # Font encoding and CMap parsing
├── interpreter.zig # Content stream interpreter
├── simd.zig # SIMD string operations
└── main.zig # CLI
python/zpdf/ # Python bindings (cffi)
examples/ # Usage examples
*ToUnicode/CID: Works when CMap is embedded directly.
**pdfium requires multi-process for parallelism (forked before thread support).
...
Read the original on github.com »