10 interesting stories served every morning and every evening.
Open-Source-Software bildet heute das Fundament großer Teile der digitalen Infrastruktur — in Verwaltung, Wirtschaft, Forschung und im täglichen Leben. Selbst im aktuellen Koalitionsvertrag der Bundesregierung wird Open-Source-Software als elementarer Baustein zur Erreichung digitaler Souveränität genannt.
Dennoch wird die Arbeit, die tausende Freiwillige dafür leisten, in Deutschland steuer- und förderrechtlich nicht als Ehrenamt anerkannt. Dieses Ungleichgewicht zwischen gesellschaftlicher Bedeutung und rechtlichem Status gilt es zu korrigieren.
Als aktiver Contributor in Open-Source-Projekten fordere ich daher, Open-Source-Arbeit als gemeinwohlorientiertes Ehrenamt anzuerkennen — gleichrangig mit Vereinsarbeit, Jugendarbeit oder Rettungsdiensten.
...
Read the original on www.openpetition.de »
Imgur decided to block UK users. Honestly? I don’t really care that much. I haven’t actively browsed the site in years. But it used to be everywhere. Back when Reddit embedded everything on Imgur, maybe fifteen years ago, it was genuinely useful. Then Reddit built their own image hosting, Discord did the same, and Imgur slowly faded into the background.
Except it never fully disappeared. And since the block, I keep stumbling across Imgur links that just show “unavailable.” It’s mildly infuriating.
Here’s a concrete example. I was playing Minecraft with some work colleagues and wanted to try different shaders. Most shader pages embed preview images hosted on Imgur. So I’d click through shader after shader, and every single preview was just gone. I couldn’t see what any of them looked like without the images.
This kind of thing happens constantly now. Old forum posts, Reddit threads, documentation pages, random project READMEs. Imgur links are still scattered across the internet, and in the UK, they’re all broken.
The obvious solution is to use a VPN. Change your location, problem solved. But I have a few issues with that approach.
First, I just upgraded to 2.5 Gbps internet and I don’t want to route all my traffic through a VPN and take the speed hit. I have this bandwidth for a reason.
Second, even if I installed a VPN on my main machine, what about my phone? My laptop? My desktop? Every device would need the VPN running, and I’d have to remember to connect it before browsing. It’s messy.
I wanted something cleaner: a solution that works for every device on my network, automatically, without any client-side configuration.
I already run a homelab with Traefik as my reverse proxy, Pi-hole for DNS, and everything declaratively configured with NixOS. If you’ve read my previous post on Docker containers with secrets, you’ll recognise the pattern.
The idea was simple: intercept all requests to i.imgur.com at the DNS level, route them through a VPN-connected container, and serve the images back. Every device on my network automatically uses Pi-hole for DNS via DHCP, so this would be completely transparent.
Traefik sees the SNI hostname and routes to GluetunNginx (attached to Gluetun’s network) proxies to the real ImgurImage comes back through the tunnel to the device
Good question. Gluetun isn’t a reverse proxy. It’s a container that provides VPN connectivity to other containers attached to its network namespace. So I needed something inside Gluetun’s network to actually handle the proxying. Nginx was the simplest choice.
The Nginx config is minimal. It just does TCP passthrough with SNI:
This listens on port 443, reads the SNI header to confirm the destination, and passes the connection through to the real i.imgur.com. The TLS handshake happens end-to-end; Nginx never sees the decrypted traffic.
The compose file runs two containers. Gluetun handles the VPN connection, and Nginx attaches to Gluetun’s network:
The key detail is network_mode: “service:gluetun”. This makes Nginx share Gluetun’s network stack, so all its traffic automatically goes through the VPN tunnel.
I’m not going to mention which VPN provider I use. It’s one of the major ones with WireGuard support, but honestly I’m not thrilled with it. Use whatever you have.
The final piece is telling Traefik to route i.imgur.com traffic to the Gluetun container. This uses TCP routing with TLS passthrough:
The passthrough: true is important. It means Traefik doesn’t terminate TLS; it just inspects the SNI header and forwards the connection.
Following the same pattern from my Docker with secrets post, I created a systemd service that runs the compose stack with Agenix-managed secrets:
The VPN credentials are stored encrypted with Agenix, so my entire dotfiles repo stays public while keeping secrets safe.
Now when any device on my network requests an Imgur image, it works. My phone, my laptop, guest devices, everything. No VPN apps to install, no browser extensions, no manual configuration. Pi-hole intercepts the DNS, Traefik routes the connection, and Gluetun tunnels it through a non-UK exit point.
The latency increase is negligible for loading images, and it only affects Imgur traffic. Everything else still goes direct at full speed.
Is this overkill for viewing the occasional Imgur image? Probably. But it’s a clean solution that requires minimal ongoing maintenance, and it scratches the homelab itch. Plus I can finally see what those Minecraft shaders look like.
...
Read the original on blog.tymscar.com »
Sentence Transformers provide local, easy to use embedding models for capturing the semantic meaning of sentences and paragraphs.
The dataset in this HackerNews dataset contains vector emebeddings generated from the
all-MiniLM-L6-v2 model.
An example Python script is provided below to demonstrate how to programmatically generate embedding vectors using sentence_transformers1 Python package. The search embedding vector is then passed as an argument to the [cosineDistance()](/sql-reference/functions/distance-functions#cosineDistance) function in the SELECT` query.from sentence_transformers import SentenceTransformer
import sys
import clickhouse_connect
print(“Initializing…“)
model = SentenceTransformer(‘sentence-transformers/all-MiniLM-L6-v2’)
chclient = clickhouse_connect.get_client() # ClickHouse credentials here
while True:
# Take the search query from user
print(“Enter a search query :“)
input_query = sys.stdin.readline();
texts = [input_query]
# Run the model and obtain search vector
print(“Generating the embedding for ”, input_query);
embeddings = model.encode(texts)
print(“Querying ClickHouse…“)
params = {‘v1’:list(embeddings[0]), ‘v2’:20}
result = chclient.query(“SELECT id, title, text FROM hackernews ORDER BY cosineDistance(vector, %(v1)s) LIMIT %(v2)s”, parameters=params)
print(“Results :“)
for row in result.result_rows:
print(row[0], row[2][:100])
print(“––––-“)
An example of running the above Python script and similarity search results are shown below (only 100 characters from each of the top 20 posts are printed):Initializing…
Enter a search query :
Are OLAP cubes useful
Generating the embedding for “Are OLAP cubes useful”
Querying ClickHouse…
Results :
27742647 smartmic:
slt2021: OLAP Cube is not dead, as long as you use some form of:1. GROUP BY multiple fi ––––- 27744260 georgewfraser:A data mart is a logical organization of data to help humans understand the schema. Wh ––––- 27761434 mwexler:“We model data according to rigorous frameworks like Kimball or Inmon because we must r ––––- 28401230 chotmat: erosenbe0: OLAP database is just a copy, replica, or archive of data with a schema designe ––––- 22198879 Merick:+1 for Apache Kylin, it’s a great project and awesome open source community. If anyone i ––––- 27741776 crazydoggers:I always felt the value of an OLAP cube was uncovering questions you may not know to as ––––- 22189480 shadowsun7: _Codemonkeyism: After maintaining an OLAP cube system for some years, I’m not that ––––- 27742029 smartmic: gengstrand: My first exposure to OLAP was on a team developing a front end to Essbase that ––––- 22364133 irfansharif: simo7: I’m wondering how this technology could work for OLAP cubes. An OLAP cube ––––- 23292746 scoresmoke:When I was developing my pet project for Web analytics (
The example above demonstrated semantic search and document retrieval using ClickHouse.
A very simple but high potential generative AI example application is presented next.
The application performs the following steps:
Accepts a topic as input from the user
Generates an embedding vector for the topic by using the SentenceTransformers with model all-MiniLM-L6-v2
Retrieves highly relevant posts/comments using vector similarity search on the hackernews table
Uses LangChain and OpenAI gpt-3.5-turbo Chat API to summarize the content retrieved in step #3.
The posts/comments retrieved in step #3 are passed as context to the Chat API and are the key link in Generative AI.
An example from running the summarization application is first listed below, followed by the code for the summarization application. Running the application requires an OpenAI API key to be set in the environment variable OPENAI_API_KEY. The OpenAI API key can be obtained after registering at https://platform.openai.com.
This application demonstrates a Generative AI use-case that is applicable to multiple enterprise domains like : customer sentiment analysis, technical support automation, mining user conversations, legal documents, medical records, meeting transcripts, financial statements, etc$ python3 summarize.py
Enter a search topic :
ClickHouse performance experiences
Generating the embedding for ––> ClickHouse performance experiences
Querying ClickHouse to retrieve relevant articles…
Initializing chatgpt-3.5-turbo model…
Summarizing search results retrieved from ClickHouse…
Summary from chatgpt-3.5:
The discussion focuses on comparing ClickHouse with various databases like TimescaleDB, Apache Spark,
AWS Redshift, and QuestDB, highlighting ClickHouse’s cost-efficient high performance and suitability
for analytical applications. Users praise ClickHouse for its simplicity, speed, and resource efficiency
in handling large-scale analytics workloads, although some challenges like DMLs and difficulty in backups
are mentioned. ClickHouse is recognized for its real-time aggregate computation capabilities and solid
engineering, with comparisons made to other databases like Druid and MemSQL. Overall, ClickHouse is seen
as a powerful tool for real-time data processing, analytics, and handling large volumes of data
efficiently, gaining popularity for its impressive performance and cost-effectiveness.
Code for the above application :print(“Initializing…“)
import sys
import json
import time
from sentence_transformers import SentenceTransformer
import clickhouse_connect
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
import textwrap
import tiktoken
def num_tokens_from_string(string: str, encoding_name: str) -> int:
encoding = tiktoken.encoding_for_model(encoding_name)
num_tokens = len(encoding.encode(string))
return num_tokens
model = SentenceTransformer(‘sentence-transformers/all-MiniLM-L6-v2’)
chclient = clickhouse_connect.get_client(compress=False) # ClickHouse credentials here
while True:
# Take the search query from user
print(“Enter a search topic :“)
input_query = sys.stdin.readline();
texts = [input_query]
# Run the model and obtain search or reference vector
print(“Generating the embedding for ––> ”, input_query);
embeddings = model.encode(texts)
print(“Querying ClickHouse…“)
params = {‘v1’:list(embeddings[0]), ‘v2’:100}
result = chclient.query(“SELECT id,title,text FROM hackernews ORDER BY cosineDistance(vector, %(v1)s) LIMIT %(v2)s”, parameters=params)
# Just join all the search results
doc_results = “”
for row in result.result_rows:
doc_results = doc_results + “\n” + row[2]
print(“Initializing chatgpt-3.5-turbo model”)
model_name = “gpt-3.5-turbo”
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
model_name=model_name
texts = text_splitter.split_text(doc_results)
docs = [Document(page_content=t) for t in texts]
llm = ChatOpenAI(temperature=0, model_name=model_name)
prompt_template = “”″
Write a concise summary of the following in not more than 10 sentences:
...
Read the original on clickhouse.com »
EXCLUSIVE: Credit Report Shows Meta Keeping $27 Billion Off Its Books Through Advanced GeometryAnalyst: Tom Bellwether
Contact Information: None available*
*Because of the complex nature of financial alchemy, our analysts live a hermetic lifestyle and avoid relevant news, daylight, and the olfactory senses needed to detect bullshit. Following our review of Beignet Investor LLC (the Issuer), an affiliate of Blue Owl Capital, in connection with its participation in an 80% joint venture with Meta Platforms Inc., we assign a preliminary A+ rating to the Issuer’s proposed $27.30 billion senior secured amortizing notes.This rating reflects our opinion that:All material risks are contractually assigned to Meta, which allows us to classify them as hypothetical and proceed accordingly.Projected cash flows are sufficiently flat and unbothered by reality to support the rating.Residual Value Guarantees (RVGs) exist, which we take as evidence that asset values will behave in accordance with wishes rather than markets.The Outlook is Superficially Stable, defined here as “By outward appearances stable unless, you know, things happen. Then we’ll downgrade after the shit hits the fan.”Blue Owl Capital Inc. (Blue Owl, BBB/Stable), through affiliated funds, has created Beignet Investor LLC (Beignet or Issuer), a project finance-style holding company that will own an 80 percent interest in a joint venture (JVCo) with Meta Platforms Inc. (Meta, AA-/Stable). The entity is named “Beignet,” presumably because “Off-Balance-Sheet Leverage Vehicle No. 5” tested poorly with focus groups.Beignet is issuing $27.30 billion of senior secured amortizing notes due May 2049 under a Rule 144A structure.Note proceeds, together with $2.45 billion of deferred equity from Blue Owl funds and $1.16 billion of interest earned on borrowed money held in Treasuries, will fund Beignet’s $23.03 billion contribution to JVCo for the 2.064 GW hyperscale data center campus in Richland Parish, Louisiana, along with reserve accounts, capitalized interest and other transaction costs that seem small only in comparison to the rest of the sentence.Iris Crossing LLC, an indirect Meta subsidiary, will own the remaining 20 percent of JVCo and fund approximately $5.76 billion of construction costs.We assign a preliminary A+ rating to the notes, one notch below Meta’s issuer credit rating, reflecting the very strong contractual linkage to Meta and the tight technical separation that allows Meta to keep roughly $27 billion of assets and debt off its balance sheet while continuing to provide all material economic support.Arrows, like cats, have a way of coming home, no matter how far you throw them.Meta transferred the Hyperion data center project into JVCo, which is owned 80 percent by Beignet and 20 percent by Iris Crossing LLC, an indirect Meta subsidiary. JVCo, in turn, owns Laidley LLC (Landlord). None of this is unusual except for the part where Meta designs, builds, guarantees, operates, funds the overruns, pays the rent, and does not consolidate it.This project has nine data centers and two support buildings, with about four million sq. ft. and 2.064 GW capacity. The support buildings will store the reams of documentation needed to convince everyone this structure isn’t what it looks like. The total capital plan of $28.79 billion will be funded as follows:And, in a feat of financial hydration, $1.16 billion of interest generated by the same borrowed money while it sits in laddered Treasuries.The structure allows the Issuer to borrow money, earn interest on the borrowed money, and then use that interest to satisfy the equity requirement that would normally require… money.Nothing is created. Nothing is contributed. It’s a loop. Borrow money, earn interest, and use the interest to claim you provided equity. The kind of circle only finance can call a straight line.Together, these flows cover Beignet’s $23.03 billion obligation to JVCo, plus the usual constellation of capitalized interest, reserve accounts, and transaction expenses. In any other context this would raise questions. For us, it raises the credit rating.Meta, through Pelican Leap LLC (Tenant), has entered into eleven triple-net leases—one for each building—with an initial four-year term starting in 2029 and four renewal options that could extend the arrangement to twenty years. The leases rely on the assumption that Meta will continue to need exponentially more compute power and that AI demand will not collapse, reverse, plateau, or become structurally inconvenient.The notes issued by Beignet are secured by Beignet’s equity interest in JVCo and relevant transaction accounts. They are not secured by the underlying physical assets, which remain at the JVCo and Landlord level. This is described as standard practice, which is true in the same way that using eleven entities to rent buildings to yourself has become standard practice.The resulting structure allows Meta to support the project economically while leaving the associated debt somewhere that is technically not on Meta’s balance sheet. The distinction is thin, but apparently wide enough to matter.The preliminary A+ rating reflects our view that this is functionally Meta borrowing $27.30 billion for a campus no one else will touch, packaged in legal formality precise enough to satisfy the letter of consolidation rules and absurd enough to insult the spirit.Credit risk aligns almost one-for-one with Meta’s own profile because:Meta is obligated to fund construction cost overruns beyond 105 percent of the fixed budget, excluding force majeure events, which rating agencies historically treat as theoretical inconveniences rather than recurring features of the physical world.Meta guarantees all lease payments and operating obligations, both during the initial four-year term and across any renewal periods it already intends to exercise, an arrangement whose purpose becomes clearer when one remembers why the campus is being built at all.Meta provides an RVG (residual value guarantee) structured to be sufficient, in most modeled cases, to ensure bondholders are repaid even if Meta recommits to the Metaverse or any future initiative born from its ongoing fascination with expensive detours. We did not model what would happen if data center demand collapses and Meta cannot secure a new tenant. This scenario was excluded for methodological convenience.The minimum rent schedule has been calibrated to produce a debt service coverage ratio of approximately 1.12 through 2049. We consider this a sufficient level of stability usually found only in spreadsheets that freeze when real-world data is used.Taken together, these features tie Beignet’s credit quality to Meta so tightly that you’d have to not be paying attention to miss them. The structure maintains a precarious technical separation that, under current interpretations of accounting guidance, allows Meta to keep roughly $27 billion of assets and debt off its own balance sheet while continuing to provide every meaningful form of economic support.This treatment is considered acceptable because the people who decide what is acceptable have accepted it.JVCo qualifies as a variable interest entity because the equity at risk is ceremonial and the real economic exposure sits entirely with the party insisting it does not control the venture. This remains legal due to the enduring belief that balance sheets are healthier when the risky parts are hidden.Under U.S. GAAP, consolidation is required if Meta is the primary beneficiary, defined as the party that both:Directs the activities that most significantly affect the entity’s performance, andMeta asserts it is not the primary beneficiary.To evaluate that assertion, we note the following uncontested facts:Meta is responsible for designing, overseeing, and operating a 2.064 GW AI campus, an activity that requires technical capabilities Blue Owl does not possess.Meta bears construction cost overruns beyond 105 percent of the fixed budget, as well as specified casualty repair obligations of up to $3.125 billion per event during construction.Meta provides the guarantee for all rent and operating payments under the leases, across the initial term and any renewals.Meta provides the residual value guarantee, ensuring bondholders are repaid if leases are not renewed or are terminated, either through a sale or by paying the guaranteed minimum values directly.Meta contributes funding, directs operations, bears construction risk, guarantees payments, guarantees asset values, determines utilization, controls renewal behavior, and can trigger the sale of the facility.Based on this, or despite this, Meta concludes it does not control JVCo.Our interpretation is fully compliant with U.S. GAAP, which prioritizes the geometry of the legal structure over the inconvenience of economic substance and recognizes control only if the controlling party agrees to be recognized as controlling.Meta has not agreed, and the framework, including this agency, respects that choice.For rating purposes, we therefore accept Meta’s non-consolidation as an accounting outcome while treating Meta, in all practical respects, as fully responsible for the performance of an entity it does not officially control.The lease structure is designed to look like a normal commercial arrangement while functioning as a long-term commitment Meta insists, for accounting reasons, it cannot possibly predict.Tenant will pay fixed rent for the first 19 months of operations, based on a 50 percent assumed utilization rate, after which rent scales with actual power consumption. The leases are triple-net. Meta is responsible for everything: operating costs, maintenance, taxes, insurance, utilities. If a pipe breaks, Meta fixes the pipe. If a hurricane relocates a roof, Meta pays to staple the roof back on.In practical terms, the only scenario in which Beignet bears operating exposure is a scenario in which Meta stops paying its own bills, at which point the lease structure becomes irrelevant because the same lawyers that structured this deal will have already quietly extricated Meta from liability.A minimum rent floor engineered to produce a DSCR of 1.12 in a spreadsheet where 1.12 was likely hard-coded and independent of math.A four-year initial term with four four-year renewal options, theoretically creating a 20-year runway Meta pretends not to see.Meta guarantees all tenant payment obligations across the entire potential lease life, including renewals it strategically refuses to acknowledge as inevitable.No performance-based KPIs. Under this structure, the buildings could underperform, overperform, or catch fire. Meta still pays rent.The RVG requires Meta to ensure that, at every potential lease-termination date, the asset is worth at least the guaranteed minimum value. If markets disagree, Meta pays the difference. Because Meta is rated AA-/Stable, we are instructed to assume that it will do so without hesitation, including in scenarios where demand softens or secondary markets discover that a hyperscale campus in Richland Parish is not the world’s most liquid asset class.The interplay between the lease term and the RVG creates a circular logic we find structurally exquisite.From a credit perspective, this circularity is considered supportive, because the same logic used to avoid consolidating the debt also ensures bondholders are paid. The circularity is not treated as a feature or a flaw. It is treated as accounting.Because Meta is AA-/Stable, we assume it will pay whatever number the Excel model finds through Goal Seek, even in scenarios involving technological obsolescence or an invasion of raccoons.The accounting hinges on a paradox engineered with dull tweezers:Under lease accounting, Meta must record future lease obligations only if renewals are reasonably certain.Under RVG accounting, Meta must record a guarantee liability only if payment is probable.To keep $27 billion off its balance sheet, Meta must therefore assert:Renewals are not reasonably certain, despite designing, funding, building, and exclusively using a 2.064 GW AI campus for which the realistic tenant list begins and ends with Meta.The RVG will probably never be triggered, despite the fact that not renewing would trigger it immediately.This requires a narrow corridor of assumptions in which Meta simultaneously plans to use the facility for two decades and insists that no one can predict four years of corporate intention.From a credit standpoint, we are supportive. The assumptions that render the debt invisible are precisely what make it secure. A harmony best described as collateralized cognitive dissonance.Meta linkage. The economics are wedded to Meta’s credit profile, which we are required to describe as AA-/Stable rather than “the only reason this entire structure doesn’t fold from a stiff breeze.” Meta guarantees the rent, the RVG, and the continued relevance of the facility. The rest is décor auditors would deem “tasteful.”Minimum rent floor. The lease schedule produces a perfectly flat DSCR of 1.12 through 2049. Projects of this size do not produce flat anything, but the model insists otherwise, so we pretend we believe it. Being sticklers for tradition, and having learned nothing from the financial crisis of 2008, we treat the spreadsheet as the final arbiter of truth, even when the inputs describe a world no one lives in.Construction risk transfer. Meta absorbs cost overruns beyond 105 percent of budget and handles casualty repairs during construction. Our methodology interprets “contractually transferred” as “ceased to exist,” so we decline to model the risk of overruns on a $28 billion campus built in a hurricane corridor. This is considered best practice.RVG backstop. The residual value guarantee eliminates tail risk in much the same way a parent cosigning for their teenager’s car loan eliminates tail risk: by ensuring that the person with all the money pays for everything. If the market value collapses, Meta pays the difference. If the facility can’t be sold, Meta pays the whole thing. If the entire campus becomes a raccoon sanctuary, Meta still pays. We classify this as credit protection, a nuanced designation that allows us to recognize the security of the arrangement without recognizing the debt.Absence of performance KPIs. There are no operational KPIs that allow rent abatement. This is helpful because KPIs create volatility, and volatility requires thought, a variable we explicitly exclude from our methodology. By removing KPIs entirely, the structure ensures a level of cash-flow stability that exists only in transactions where the tenant is also the economic owner pretending to be a squatter.Key Risks We Have Chosen To Be Comfortable WithThe rating also reflects several risks that are acknowledged, intellectually troubling, and ultimately tolerated because Meta is large enough that everyone agrees to stop asking questions.Off-balance-sheet dependence. Meta treats JVCo as if it belongs to someone else, which is a generous interpretation of ownership. If consolidation rules ever evolve to reflect economic substance, Meta could be required to add $27 billion of assets and matching debt back onto its own balance sheet. Our methodology treats this as a theoretical inconvenience rather than a credit event, because calling it what it really is would create a conflict with the very companies we rate.Concentration risk. The entire project exists for one tenant with one business model in one industry undergoing technological whiplash. The facility is engineered so specifically for Meta’s AI ambitions that the only plausible alternative tenant is another version of Meta from a parallel timeline. We strongly disagree with the many-worlds interpretation of quantum mechanics. We set this concern aside because at this stage in the transaction, the A+ rating is a structural load-bearing wall, and we are not paid to do demolition.Residual value uncertainty. The RVG depends on modeled guaranteed minimum values that assume buyers will one day desire a vast hyperscale complex in Richland Parish under stress scenarios. If hyperscale supply balloons or the resale market for 2-gigawatt data centers becomes as illiquid as common sense, Meta will owe more money. This increases Meta’s direct obligations, which should concern us, but does not, because Meta is rated AA-/Stable and therefore presumed to withstand any scenario we have chosen not to model.Casualty and force majeure. In extreme scenarios, multiple buildings could be destroyed by a hurricane, which we view as unlikely given that they almost never impact Louisiana. The logic resembles a Rube Goldberg machine built out of indemnities. We classify this as a strength.JV structural subordination. Cash flows must navigate waterfalls, covenants, carve-outs, and the possibility of up to $75 million of JV-level debt. These features introduce structural complexity, which we flag, then promptly ignore, because acknowledging would force us to explain who benefits from the convolution.Despite these risks, we maintain an A+ rating because Meta’s credit quality is strong, the structure is designed to hide risk rather than transfer it, and our role in this ecosystem is to observe these contradictions and proceed as though they were features rather than warnings.The outlook is Superficially Stable. That means we expect the structure to hold together as long as Meta keeps paying for everything and the accounting rules remain generously uninterested in economic reality.We assume, with the confidence of people who have clearly not been punished enough:Meta will preserve an AA-/Stable profile because any other outcome would force everyone involved to admit what this actually is.Construction will stay “broadly on schedule,” a phrase we use to pre-forgive whatever happens as long as Meta covers the overruns, which it must.Lease payments and the minimum rent schedule will continue producing a DSCR that hovers around 1.12 in models designed to ensure that result, and not materially below 1.10 unless something un-modeled happens, which we classify as “outside scope.”The RVG will remain enforceable, which matters more than the resale value of a hyperscale facility in a world where hyperscale facilities may or may not be worth anything.Changes in VIE or lease-accounting guidance will affect where Meta stores the debt, not whether Meta pays it.We could lower the rating if Meta were downgraded, if DSCR sagged below the range we pretend is acceptable, if Meta weakened its guarantees, or if events unfold in ways our assumptions did not account for, as events tend to do. The last category includes anything that would force us to revisit the assumptions we confidently made without testing.We view an upgrade as unlikely. The structure already performs the single miracle it was designed for: keeping $27.3 billion off Meta’s balance sheet in a manner we are professionally obligated to support.CONFIDENTIALITY AND USE: This report is intended solely for institutional investors, entities required by compliance to review documents they will not read, and any regulatory body still pretending to monitor off-balance-sheet arrangements. FSG LLC makes no representation, warranty, or faint gesture toward coherence regarding the accuracy, completeness, or legitimacy of anything contained herein. By reading this document, you irrevocably acknowledge that we did not perform due diligence in any conventional, philosophical, or legally enforceable sense. Our review consisted of rereading Meta’s press release until repetition produced acceptance, aided by a Magic 8-Ball we shook until it agreed.LIMITATION OF RELIANCE: Any resemblance to objective analysis is coincidental and should not be relied upon by anyone with fiduciary obligations, ethical standards, a working memory, or the ability to perform basic subtraction. Forward-looking statements are based on assumptions that will not survive contact with reality, stress testing, most Tuesdays, or a modest change in interest rates. FSG LLC is not liable for losses arising from reliance on this report, misunderstanding this report, fully understanding this report, or the sinking recognition that you should have known better. Past performance is not indicative of future results, except in the specific case of rating agencies repeating the same mistakes at larger scales with increasing confidence.RATING METHODOLOGY: The rating assigned herein may be revised, withdrawn, or denied ever existing if Meta consolidates the debt, Louisiana ceases to exist for tax purposes, or the data center becomes self-aware and moves to Montana to escape the heat. FSG LLC calculated the A+ rating using a proprietary model consisting of discounted cash flows, interpretive dance, and whatever number Meta’s CFO sounded comfortable with on a diligence call we did not in fact attend. Readers who discover material errors in this report are contractually obligated to keep them to themselves and accept that being technically correct is the least valuable form of correct.GENERAL PROVISIONS: By continuing to read, you consent to the proposition that what Meta does not consolidate does not exist, waive your right to say “I told you so” when this unravels, and accept that the term “investment grade” is now a disposition rather than a metric. FSG LLC reserves the right to amend, retract, deny, or disown this report at any time, particularly if Congress shows interest or someone notes that $27 billion off-balance-sheet is on a balance sheet somewhere. If you print this document, you may be required under applicable securities law to recycle it, shred it, or burn it before sunrise, whichever comes first. For questions, complaints, or sneaking suspicions, please do not contact us. We are unavailable indefinitely and have disabled our voicemail.
...
Read the original on stohl.substack.com »
I don’t remember when I first started noticing that people I knew out in the world had lost their sense of erotic privacy, but I do remember the day it struck me as a phenomenon that had escaped my timeline and entered my real, fleshy life. It was last year, when I was having a conversation with a friend of mine, who, for the record, is five years younger than me (I’m 31). I told my friend about an erotic encounter I’d just experienced and very much delighted in, in which I had my hair brushed at the same time by two very beautiful women at the hair salon — one was teaching the other how to do it a certain way. When I finished my story, my friend looked at me, horrified.
“They had no idea you felt something sexual about them,” she said. “What if they found out? Lowkey, I hate to say this but: you took advantage of them.” I was shocked. I tried to explain — and it felt extremely absurd to explain — that this had happened in my body and in my thoughts, which were private to me and which nobody had the right to know about. But they did have the right, my friend argued. She demanded that I apologize to the women for sexualizing them. Offended at having been accused — in my view, in extremely bad faith — of being some kind of peep-show creep, I tried to argue that I’d simply responded in a physical way to an unexpected, direct, and involuntary stimulus. Back and forth, back and forth, we fought like this for a while. In fact, it ended the friendship.
There were other conversations, too, that suggested to me that conceptions of love and sex have changed fundamentally among people I know. Too many of my friends and acquaintances — of varying degrees of “onlineness,” from veteran discourse observers to casual browsers — seem to have internalized the internet’s tendency to reach for the least charitable interpretation of every glancing thought and, as a result, to have pathologized what I would characterize as the normal, internal vagaries of desire.
Hence, there was the friend who justified her predilection for being praised in bed as a “kink” inherited through the “trauma” of her father always harping on her because of her grades. There was the friend who felt entitled to posting screenshots of intimate conversations on Twitter after a messy breakup so that she could get a ruling on “who was the crazy one.” Then there was the friend who bitterly described a man he was dating as a “fuckboy” because he stood him up, claiming that their having enjoyed sex together beforehand was “emotionally manipulative.” When I dug a bit deeper, it turned out the man in question had just gotten out of a seven-year relationship and realized he wasn’t ready to be sexually intimate, and while he was rude to stand my friend up, it shocked me how quick my friend was to categorize his rightfully hurt feelings as something pathological or sinister in the other person, and that he did this in order to preemptively shield himself from being cast as the villain in what was a multi-party experience. This last friend I asked: “Who are you defending yourself against?” To which he answered, to my astonishment: “I don’t know. The world.”
I choose these examples from my personal life because they express sentiments that were once the kind of stuff I encountered only in the messy battlegrounds of Twitter, amid discussions about whether Sabrina Carpenter is being oversexualized, whether kinks are akin to a sexual orientation, whether a woman can truly consent in an age-gap relationship, and whether exposure to sex scenes in movies violates viewer consent. It is quite easy to dismiss these “discourse wars” as a “puritanism” afflicting the young, a reactionary current to be solved with a different, corrective discourse of pro-sex liberation, distributed via those same channels. If only it were so! To me, the reality goes deeper and is bleaker.
The fact is that our most intimate interactions with others are now governed by the expectation of surveillance and punishment from an online public. One can never be sure that this public or someone who could potentially expose us to it isn’t there, always secretly filming, posting, taking notes, ready to pounce the second one does something cringe or problematic (as defined by whom?). To claim that these matters are merely discursive in nature is to ignore the problem. Because love and sex are so intimate and vulnerable, the stakes of punishment are higher, and the fear of it penetrates deeper into the psyche and is harder to rationalize away than, say, fear of pushback from tweeting a divisive political opinion.
I should state at this point that this is not an essay about “cancel culture going too far,” a topic which can now be historicized as little more than a rhetorical cudgel wielded successfully by the right to wrest cultural power back from an ascendant progressive liberalism. This was especially true after the prominence of organized campaigns such as #MeToo. #MeToo was smeared by liberals and conservatives alike (united, as they always are, in misogyny) as being inherently punitive in nature, meant to punish men who’d fallen into a rough patch of bad behavior, or who, perhaps, might not have done anything at all (the falsely accused or the misinterpreted man became the real victim, in this view). #MeToo did make use of the call-out — the story shared in a spreadsheet anonymously or in a signed op-ed — but the call-outs had a purpose: to end a long-standing and long-permitted norm of sexual abuse within institutions. Underlying this was a discursive practice and a form of solidarity building in which people believed that sharing their stories of trauma en masse could bring about structural change. As someone who participated myself, I too believed in this theory and saw it as necessary, cathartic, and political, and far from vigilante justice.
But the pushback against #MeToo reveals a certain peril to storytelling as politics, not only in the retraumatization evident in the practice of revealing one’s most intimate harms before an infinite online audience, which could always include those listening in bad faith. But also, a discursive market opened up in which trauma became a kind of currency of authenticity, resulting in a doubled exploitation. This idea, while not very nice, lingers in the use of harm as an authoritative form of rhetorical defense. The problem here is not what is said, but how it is used. A friction has since emerged between an awareness of weaponization of harm and emotion and the continued need to express oneself as vulnerably as possible in order to come off as sincere. This friction is unresolved.
The organized goals of the #MeToo movement are missing from the new puritanism. I think that the prudish revulsion I’ve seen online and in my own life has as much to do with surveillance as with sex. Punishing strangers for their perceived perversion is a form of compensation for a process that is already completed: the erosion of erotic and emotional privacy through internet-driven surveillance practices, practices we have since turned inward on ourselves. In short, we have become our own panopticons.
On the rightmost side of the spectrum, punitive anti-erotic surveillance is very explicit and very real, especially for women. The Andrew Tates of the world and the practitioners of extreme forms of misogyny have no problem with using internet tools and social media websites for mass shaming and explicit harm. Covert filming of sex acts, AI deep fakes, extortion, and revenge porn are all realities one has to contend with when thinking about hooking up or going to public places such as nightclubs and gay bars. This is blackmail at its most explicit and extreme, meant to further solidify a link between sex and fear.
But that link between sex and fear is operating in more “benign” or common modes of internet practice. There is an online culture that thinks nothing of submitting screenshots, notes, videos, and photos with calls for collective judgement. When it became desirable and permissible to transform our own lives into content, it didn’t take long before a sense of entitlement emerged that extended that transformation to people we know and to strangers. My ex sent me this text, clearly she is the crazy one, right? Look at this dumb/funny/cringe Hinge profile! Look at this note some guy sent me, is this a red flag? Look at this random woman I photographed buying wine, coconut oil, and a long cucumber at the supermarket!
I think these kinds of posts sometimes amount to little more than common bullying, but they are on a continuum with a puritan discourse in which intimate questions, practices, and beliefs about queerness, sexuality, gender presentation, and desire are also subjected to days-long piles-on. In both instances, the instinct to submit online strangers to viral discipline is given a faux-radical sheen. It’s a kind of casual blackmail that warns everyone to conform or be exposed; a way of saying if you don’t cave to my point of view, redefine yourself in my image of what sexuality is or should be, and (most importantly) apologize to me and the public, I will subject you to my large following and there will be hell to pay. Such unproductive and antisocial behavior is justified as a step toward liberation from predation, misogyny, or any number of other harms. But the punitive mindset we’ve developed towards relationships is indicative of an inability to imagine a future of gendered or sexual relations without subjugation. To couch that in the language of harm reduction and trauma delegitimizes both.
There are other ways the politics of surveillance have become a kind of funhouse mirror. It is seen as more and more normal to track one’s partner through Find My iPhone or an AirTag, even though the potential for abuse of this technology is staggering and obvious. There are all kinds of new products, such as a biometric ring that is allegedly able to tell you whether your partner is cheating, that expand this capability into more and more granular settings. That’s all before we get into the endless TikToks about “why I go through my partner’s text messages.” That men use these tactics and tools to control women is a known threat. What is astonishing is the lengths to which some women will go to use these same technologies, claiming that they are necessary to prevent harm — especially that caused by cheating, which is now seen as some kind of lifelong trauma or permanently damnable offense instead of one of the rather quotidian, if very painful, ways we hurt one another. Each of these surveillance practices operates from a feeling of entitlement and control over other people, their bodies, and what they do.
Pundits like to decree sexlessness as a Gen-Z problem, to argue no one is fucking because they are too on their phones. However, it is always too easy to blame the young. It was my generation that failed to instill the social norms necessary to prevent a situation where fear of strangers on the internet has successfully replaced the disciplinary apparatus more commonly held by religious or conservative doctrine. Even when, as in my experience in the salon, I am acting in the privacy of my own body, someone is always there watching, ready to interpret my actions, problematize them so as to share in the same sense of magical thinking, the same insecurities, and to be punished for not being insecure in the same way.
It’s only in retrospect that I’m able to realize the toll that constant, nagging interaction with my devices and the internet has taken on my thinking life and my sex life. I remember very viscerally when I’d just come out of the closet as bisexual in 2016. When I embarked on a journey to find the kind of lover I wanted to be, my only experience with the world of queerness was online through memes, articles, and others’ social media presentation of themselves and of politics. Queer sex was not something that could be discovered through sensation, through physical interaction, but was rather a catalog of specific acts and roles one was already expected to know. I was terrified of making some kind of mistake, of being the wrong kind of bisexual, of misrepresenting myself in an offensive way (could I use the term “soft butch” if I wasn’t a lesbian?), of being exposed somehow as a fraud. When the time came for me to have sex for the first time, what should have been a joyous occasion was instead burdened with a sense of being watched. I could not let the natural processes of erotic discovery take their course, so caught up was I in judging myself from the perspective of strangers to whom I owed nothing.
But it wasn’t just a matter of queerness, either. When I hooked up with men, I could only perceive of sex the same way, not as situational but as a set of prescribed acts and scenes, many of which I wanted to explore. However, this time I interrogated these urges as being sociogenic in nature and somehow harmful to me, when they were, in fact, private, and I did not, in reality, feel harmed. Because I wanted, at one point in my life, to be tied up and gagged, the disempowering nature of such a want necessitated trying to justify it against invisible accusations with some kind of traumatogenic and immutable quality. Maybe it was because I was raped in college. Maybe I was just inherently submissive. One of the great ironies in the history of sex is that pathologization used to be a way of controlling sexual desire. (All are familiar with the many myths that masturbation would turn one blind.) Now it is a way of exempting oneself, of relinquishing control of one’s actions so as to absolve them of scrutiny. My little bondage moment couldn’t be problematic if it couldn’t be helped. It couldn’t be subjected to interrogation if there was something I could point to to say “it’s beyond my control, don’t judge me!” One day, however, I came to an important revelation: The reality was much simpler. It was a passing phase, a desire that originated with a specific man and lost its charm after I moved on from him. There wasn’t some deterministic quality in myself that made me like this. My desire was not fixed in nature. My sexual qualities were transient and not inborn. What aroused me was wonderfully, entirely situational.
A situational eroticism is what is needed now, in our literalist times. It’s exhausting, how everything is so readily defined by types, acts, traumas, kinks, fetishes, pathology, and aesthetics. To me, our predilection for determinism is an expected psychological response to excessive surveillance. A situational eroticism decouples sensation from narrative and typology. It allows us to feel without excuse and to relate our feelings to our immediate embodied moment, grounded in a fundamental sense of personal privacy. While it is admirable to try and understand ourselves and important to protect ourselves from harm and investigate critically the ways in which what we want may put us at risk of that harm — or at risk of doing harm to others — sometimes desires just are, and they are not that way for long. Arousal is a matter of the self, which takes place within the body, a space no one can see into. It is often a mystery, a surprise, a discovery. It can happen at a small scale, say, the frisson of two sets of fingers in one’s hair at once. It is beautiful, unplanned and does not judge itself because it is an inert sensation, unimbued with premeditated meaning. This should liberate rather than frighten us. Maybe what it means doesn’t matter. Maybe we don’t have to justify it even to ourselves.
But in order to facilitate a return to situational eroticism, we need to kill the panopticon in our heads. That means first killing the panopticon we’ve built for others. There is no purpose in vindictive or thoughtless exposure. Not everything needs to be subjected to public opinion, not every anecdote is worth sharing, not every debate needs engagement, especially those debates which have no material basis to them, no ask, no funnel for all that energy. We need to stop confusing vigilantism with justice and posting with politics. That does not mean we stop the work that #MeToo started, but that revenge is a weapon best utilized collectively against the enemies of liberation. We need to protect the vulnerable from exploitative technologies and practices, repeatedly denounce their use, and work towards a world without sexual coercion, digital or otherwise.
On an individual level, we need to abandon or reshape our relationships with our phones and regain a sense of our own personal and mental privacy. It’s a matter of existential, metaphysical importance. Only when this decoupling from ourselves and the mediated performance of ourselves is complete, can we begin the process of returning to our own bodies out there, in the world, with no one watching or reading our thoughts except those we want to. The truth is, we are very afraid not of sex, but of exposure. Only when we are unafraid can we begin to let desire flourish. Only when we return to ourselves can we really know what we want.
Kate Wagner is the architecture critic at The Nation. Her award-winning cultural writing has been featured in magazines ranging from The Baffler to the New Republic.
...
Read the original on lux-magazine.com »
Toulouse, France, 28 November 2025 — Analysis of a recent event involving an A320 Family aircraft has revealed that intense solar radiation may corrupt data critical to the functioning of flight controls.
Airbus has consequently identified a significant number of A320 Family aircraft currently in-service which may be impacted.
Airbus has worked proactively with the aviation authorities to request immediate precautionary action from operators via an Alert Operators Transmission (AOT) in order to implement the available software and/or hardware protection, and ensure the fleet is safe to fly. This AOT will be reflected in an Emergency Airworthiness Directive from the European Union Aviation Safety Agency (EASA).
Airbus acknowledges these recommendations will lead to operational disruptions to passengers and customers. We apologise for the inconvenience caused and will work closely with operators, while keeping safety as our number one and overriding priority.
...
Read the original on www.airbus.com »
Molly is an independent Signal fork for Android with improved features:
Extra theme that follows your device palette
When you are gone for a set period of time
New and better features to come
...
Read the original on molly.im »
Every couple of years somebody notices that large tech companies sometimes produce surprisingly sloppy code. If you haven’t worked at a big company, it might be hard to understand how this happens. Big tech companies pay well enough to attract many competent engineers. They move slowly enough that it looks like they’re able to take their time and do solid work. How does bad code happen?
I think the main reason is that big companies are full of engineers working outside their area of expertise. The average big tech employee stays for only a year or two. In fact, big tech compensation packages are typically designed to put a four-year cap on engineer tenure: after four years, the initial share grant is fully vested, causing engineers to take what can be a 50% pay cut. Companies do extend temporary yearly refreshes, but it obviously incentivizes engineers to go find another job where they don’t have to wonder if they’re going to get the other half of their compensation each year.
If you count internal mobility, it’s even worse. The longest I have ever stayed on a single team or codebase was three years, near the start of my career. I expect to be re-orged at least every year, and often much more frequently.
However, the average tenure of a codebase in a big tech company is a lot longer than that. Many of the services I work on are a decade old or more, and have had many, many different owners over the years. That means many big tech engineers are constantly “figuring it out”. A pretty high percentage of code changes are made by “beginners”: people who have onboarded to the company, the codebase, or even the programming language in the past six months.
To some extent, this problem is mitigated by “old hands”: engineers who happen to have been in the orbit of a particular system for long enough to develop real expertise. These engineers can give deep code reviews and reliably catch obvious problems. But relying on “old hands” has two problems.
First, this process is entirely informal. Big tech companies make surprisingly little effort to develop long-term expertise in individual systems, and once they’ve got it they seem to barely care at all about retaining it. Often the engineers in question are moved to different services, and have to either keep up their “old hand” duties on an effectively volunteer basis, or abandon them and become a relative beginner on a brand new system.
Second, experienced engineers are always overloaded. It is a busy job being one of the few engineers who has deep expertise on a particular service. You don’t have enough time to personally review every software change, or to be actively involved in every decision-making process. Remember that you also have your own work to do: if you spend all your time reviewing changes and being involved in discussions, you’ll likely be punished by the company for not having enough individual output.
Putting all this together, what does the median productive engineer at a big tech company look like? They are usually:
* competent enough to pass the hiring bar and be able to do the work, but either
* working on a codebase or language that is largely new to them, or
* trying to stay on top of a flood of code changes while also juggling their own work.
They are almost certainly working to a deadline, or to a series of overlapping deadlines for different projects. In other words, they are trying to do their best in an environment that is not set up to produce quality code.
That’s how “obviously” bad code happens. For instance, a junior engineer picks up a ticket for an annoying bug in a codebase they’re barely familiar with. They spend a few days figuring it out and come up with a hacky solution. One of the more senior “old hands” (if they’re lucky) glances over it in a spare half-hour, vetoes it, and suggests something slightly better that would at least work. The junior engineer implements that as best they can, tests that it works, it gets briefly reviewed and shipped, and everyone involved immediately moves on to higher-priority work. Five years later somebody notices this and thinks “wow, that’s hacky - how did such bad code get written at such a big software company”?
I have written a lot about the internal tech company dynamics that contribute to this. Most directly, in Seeing like a software company I argue that big tech companies consistently prioritize internal legibility - the ability to see at a glance who’s working on what and to change it at will - over productivity. Big companies know that treating engineers as fungible and moving them around destroys their ability to develop long-term expertise in a single codebase. That’s a deliberate tradeoff. They’re giving up some amount of expertise and software quality in order to gain the ability to rapidly deploy skilled engineers onto whatever the problem-of-the-month is.
I don’t know if this is a good idea or a bad idea. It certainly seems to be working for the big tech companies, particularly now that “how fast can you pivot to something AI-related” is so important. But if you’re doing this, then of course you’re going to produce some genuinely bad code. That’s what happens when you ask engineers to rush out work on systems they’re unfamiliar with.
Individual engineers are entirely powerless to alter this dynamic. This is particularly true in 2025, when the balance of power has tilted away from engineers and towards tech company leadership. The most you can do as an individual engineer is to try and become an “old hand”: to develop expertise in at least one area, and to use it to block the worst changes and steer people towards at least minimally-sensible technical decisions. But even that is often swimming against the current of the organization, and if inexpertly done can cause you to get PIP-ed or worse.
I think a lot of this comes down to the distinction between pure and impure software engineering. To pure engineers - engineers working on self-contained technical projects, like a programming language - the only explanation for bad code is incompetence. But impure engineers operate more like plumbers or electricians. They’re working to deadlines on projects that are relatively new to them, and even if their technical fundamentals are impeccable, there’s always something about the particular setup of this situation that’s awkward or surprising. To impure engineers, bad code is inevitable. As long as the overall system works well enough, the project is a success.
At big tech companies, engineers don’t get to decide if they’re working on pure or impure engineering work. It’s not their codebase! If the company wants to move you from working on database infrastructure to building the new payments system, they’re fully entitled to do that. The fact that you might make some mistakes in an unfamiliar system - or that your old colleagues on the database infra team might suffer without your expertise - is a deliberate tradeoff being made by the company, not the engineer.
It’s fine to point out examples of bad code at big companies. If nothing else, it can be an effective way to get those specific examples fixed, since execs usually jump at the chance to turn bad PR into good PR. But I think it’s a mistake to attribute primary responsibility to the engineers at those companies. If you could wave a magic wand and make every engineer twice as strong, you would still have bad code, because almost nobody can come into a brand new codebase and quickly make changes with zero mistakes. The root cause is that most big company engineers are forced to do most of their work in unfamiliar codebases.
edit: this post got lots of comments on both Hacker News and lobste.rs.
It was surprising to me that many commenters find this point of view unplesasantly nihilistic. I consider myself fairly optimistic about my work. In fact, I meant this post as a rousing defence of big tech software engineers from their critics! Still, I found this response blog post to be an excellent articulation of the “this is too cynical” position, and will likely write a followup post about it soon. If you can’t wait, I wrote a bit on this topic at the start of 2025 in Is it cynical to do what your manager wants?.
Some Hacker News commenters had alternate theories for why bad code happens: lack of motivation, deliberately demoralizing engineers so they won’t unionize, or just purely optimizing for speed. I don’t find these compelling, based on my own experience. Many of my colleagues are highly motivated, and I just don’t believe any tech company is deliberately trying to make its engineers demoralized and unhappy.
A few readers disagreed with me about RSUs providing an incentive to leave, because their companies give stock refreshers. I don’t know about this. I get refreshers too, but if they’re not in the contract, then I don’t think it matters - the company can decide not to give you 50% of your comp at-will by just pausing the refreshers, which is an incentive to move jobs so it’s locked in for four more years.
...
Read the original on www.seangoedecke.com »
When we launched Skald, we wanted it to not only be self-hostable, but also for one to be able to run it without sending any data to third-parties.
With LLMs getting better and better, privacy-sensitive organizations shouldn’t have to choose between being left behind by not accessing frontier models and doing away with their committment to (or legal requirement for) data privacy.
So here’s what we did to support this use case and also some benchmarks comparing performance when using proprietary APIs vs self-hosted open-source tech.
A basic RAG usually has the following core components:
And most times it also has these as well:
What that means is that when you’re looking to build a fully local RAG setup, you’ll need to substitute whatever SaaS providers you’re using for a local option for each of those components.
Here’s a table with some examples of what we might use in a scenario where we can use third-party Cloud services and one where we can’t:
Do note that running something locally does not mean it needs to be open-source, as one could pay for a license to self-host proprietary software. But at Skald our goal was to use fully open-source tech, which is what I’ll be convering here.
The table above is far from covering all available options on both columns, but basically it gives you an indication of what to research into in order to pick a tool that works for you.
As with anything, what works for you will greatly depend on your use case. And you need to be prepared to run a few more services than you’re used to if you’ve just been calling APIs.
For our local stack, we went with the easiest setup for now to get it working (and it does! see writeup on this lower down) but will be running benchmarks on all other options to determine the best possible setup.
This is what we have today:
Vector DB: Postgres + pgvector. We already use Postgres and didn’t want to bundle another service into our stack, but this is controversial and we will be running benchmarks to make a better informed decision here. Note that pgvector will serve a lot of use cases well all the way up to hundreds of thousands of documents, though.
Vector embeddings: Users can configure this in Skald and we use Sentence Transformers (all-MiniLM-L6-v2) as our default (solid all-around performer for speed and retrieval, English-only). I also ran Skald with bge-m3 (larger, multi-language) and share the results later in this post.
LLM: We don’t even bundle a default with Skald and it’s up to the users to run and manage this. I tested our setup with GPT-OSS 20B on EC2 (results shown below).
Reranker: Users can also configure this in Skald, and the default is the Sentence Transformers cross encoder (solid, English-only). I’ve also used bge-reranker-v2-m3 and mmarco-mMiniLMv2-L12-H384-v1 which offer multi-lingual support.
Document parsing: There isn’t much of a question on this one. We’re using Docling. It’s great. We run it via docling-serve.
So the main goal here was first to get something working then ensure it worked well with our platform and could be easily deployed. From here we’ll be running extensive benchmarks and working with our clients to provide a solid setup that both performs well but is also not a nightmare to deploy and manage.
From that perspective, this was a great success.
Deploying a production instance of Skald with this whole stack took me 8 minutes, and that comes bundled with the vector database (well, Postgres), a reranking and embedding service, and Docling.
The only thing I needed to run separately was the LLM, which I did via llama.cpp.
Having gotten this sorted, I imported all the content from the PostHog website [1] and set up a tiny dataset [2] of questions and expected answers inside of Skald, then used our Experiments feature to run the RAG over this dataset.
I explicitly kept the topK values really high (100 for the vector search and 50 for post-reranking), as I was mostly testing for accuracy and wanted to see the performance when questions required e.g. aggregating context over 15+ documents.
So without any more delay, here are the results of my not-very-scientific at all benchmark using the experimentation platform inside of Skald.
This is our default Cloud setup. We use voyage-3-large and rerank-2.5 from Voyage AI as our embedding and reranking models respectively, and we default to Claude Sonnet 3.7 for responses (users can configure the model though).
Our LLM-as-a-Judge gave an average score of 9.45 to the responses, and I basically agree with the assessment. All answers were correct, with one missing a few extra bits of context.
With the control experiment done, I then moved on to a setup where I kept Voyage as the embeddings provider and reranker, and then used GPT-OSS 20B running on a llama.cpp server on a g5.2xlarge EC2 instance as the LLM.
The goal here was to see how well the open-source LLM model itself stacked up against a frontier model accessed via API.
And it did great!
We don’t yet support LLM-as-a-Judge on fully local deployments, so the only score we have here is mine. I scored the answers an average of 9.18 and they were all correct, with two of them just missing a few bits of information or highlighting less relevant information from the context.
Lastly, it was time for the moment of truth: running a fully local setup.
For this I ran two tests:
The most popular open-source models are all-MiniLM-L6-v2 for embeddings and ms-marco-MiniLM-L6-v2 as the reranker, so I used those for my first benchmark.
Here the average score was 7.10. Not bad, but definitely not great. However, when we dig into the results, we can get a better understanding of how this setup fails.
Basically, it got all point queries right, which are questions where the answer is somewhere in the mess of documents, but can be found from one specific place.
Where it failed was:
* Non-english query: The embeddings model and the reranker are English-based, so my question in Portuguese obviously got no answer
* An ambiguous question with very little context (“what’s ch”)
* Aggregating information from multiple documents/chunks e.g. it only found 5 out of PostHog’s 7 funding rounds, and only a subset of the PostHog competitors that offer session replay (as mentioned in the source data)
In my view, this is good news. That means that the default options will go a long way and should give you very good performance if your use case is only doing point queries in English. The other great thing is that these models are also fast.
Now, if you need to handle ambiguity better, or handle questions in other languages, then this setup is simply not for you.
The next test I did used bge-m3 as the embeddings model and mmarco-mMiniLMv2-L12-H384-v1 as the reranker. The embeddings model is supposedly much better than the one used in the previous test and is also multi-lingual. The reranker on the other hand uses the same cross-encoder from the previous test as the base model but also adds multi-lingual support. The more standard option here would have been the much more popular bge-reranker-v2-m3 model, but I found it to be much slower. I intend to tweak my setup and test it again, however.
Anyway, onto the results! I scored it 8.63 on average, which is very good. There were no complete failures, and it handled the question in Portuguese well.
The mistakes it made were:
* This new setup also did not do the best job at aggregating information, missing 2 of PostHog’s funding rounds, and a couple of its session replay competitors
* It also answered a question correctly, but added incorrect additional context after it
So overall it performed quite well. Again what we what saw was the main problem is when the context needed for the response is scattered across multiple documents. There are various techniques to help with this and we’ll be trialing some soon! They haven’t been needed on the Cloud version because better models save you from having to add complexity for minimal performance gains, but as we’re focused on building a really solid setup for local deploys, we’ll be looking into this more and more.
I hope this writeup has provided you with at least some insight and context into building a local RAG, and also the fact that it does work, it can serve a lot of use cases, and that the tendency is for this setup to get better and better as a) models improve b) we get more open-source models across the board, with both being things that we seem to be trending towards.
As for us at Skald, we intend to polish this setup further in order to serve even more use cases really well, as well as intend to soon be publishing more legitimate benchmarks for models in the open-source space, from LLMs to rerankers.
If you’re a company that needs to run AI tooling in air-gapped infrastructure, let’s chat — feel free to email me at yakko [at] useskald [dot] com.
Lastly, if you want to get involved, feel free to chat to us over on our GitHub repo (MIT-licensed) or catch us on Slack.
[1] I used the PostHog website here because the website content is MIT-licensed (yes, wild) and readily-available as markdown on GitHub and having worked there I know a lot of answers off the top of my head making it a great dataset of ~2000 documents that I know well.
[2] The questions and answers dataset I used for the experiments was the following:
...
Read the original on blog.yakkomajuri.com »
The chief prosecutor of the International Criminal Court suddenly couldn’t access his email. According to Microsoft, that’s because of US sanctions against the court’s employees. The Trump administration was not amused by the Court’s arrest warrant against the Israeli Prime Minister, Benjamin Netanyahu.
The main takeaway from this episode is that those looking to protect themselves from Trump’s wrath would be wise not to depend on any companies from his country. According to the Dutch newspaper NRC, the International Criminal Court now uses a German alternative to Microsoft, though it has not officially commented on the switch.
The German alternative, OpenDesk, allows users to send emails, edit text-based documents, create presentations, share files, and make video calls. It is open source, so anyone can view and improve its code.
The same applies to another alternative, also from Germany, called Nextcloud. This office software has been tested by around 75 researchers from five Dutch universities since the beginning of 2025. Maybe other institutions could switch to it as well?
Dependency
Dutch higher education is highly dependent on American tech companies, especially Microsoft. Not only do students and staff use its software extensively, but their IT staff are tied to a wide range of specialised Microsoft software. In addition, Dutch universities store a lot of data in Microsoft’s cloud.
Dutch lecturers have been sounding the alarm about this. Last Wednesday, the knowledge centre for practice-oriented research, DCC-PO, stated that the dominance of parties such as Google and Microsoft threatens the autonomy of Dutch researchers. In their view, universities should adopt more open-source tools and open standards.
In July, the Young Academy also warned that students and staff at Dutch higher education institutions have no idea what tech companies are doing with their data. By outsourcing the management of IT systems, these educational institutions are losing technical knowledge and control. As a result, they are becoming increasingly dependent on big tech, putting academic freedom and independence at risk.
Fickle
Seven Dutch universities and one university college are already on the State of Florida’s sanctions list for severing or freezing ties with Israeli institutions. With a fickle president like Donald Trump, educational institutions could also face “punishment” at any moment.
Can they do without Microsoft, however? Can they work without Office, Outlook, Teams and OneDrive? Not yet, according to UU professors José van Dijck and Albert Meijer. “All research and education would come to an immediate standstill,” they wrote in March in an open letter calling on the Executive Board to do something about digital dependence.
According to the professors, Utrecht University is particularly dependent on Microsoft Office 365. UU staff and students use the programme for email and video calls, writing and sharing documents, creating presentations and data storage, among other tasks. Such dependence makes for “vulnerabilities, especially in light of a rapidly changing geopolitical situation”.
Meijer and Van Dijck believe that “dependence on big tech is fundamentally at odds with public values such as freedom, independence, autonomy and equality”. The professors would like the Executive Board to invest more in “local expertise,” for example, by using its own mail server. They also recommend collaborating with other European universities, especially those in Germany and France, “on an autonomous academic IT infrastructure”.
Breaking free
It is becoming increasingly clear that dependence on big tech entails risks. This also applies to Dutch higher education, according to Wladimir Mufty, from SURF, the IT cooperative of Dutch education and research institutions. “We have already gone through an awareness phase that lasted several years. We have looked at where the dependencies lie, and now it is time to start trying out alternatives.”
Mufty is SURF’s digital sovereignty programme manager. At the end of last year, he sat down with five universities that wanted a single, shared digital environment for their research programme, AlgoSoc. Scientists from Delft, Utrecht, Rotterdam, Tilburg and Amsterdam (UvA) wanted to use the same appointment planner, share files, work together on a single text, and make video calls, without being dependent on a large provider. Mufty suggested the open source software package from Nextcloud.
One of the users, PhD student Jacqueline Kernahan from TU Delft, thinks that Nextcloud could compete with Microsoft, though there are still a few glitches here and there. She is not deterred by those, as she knows how problematic dependence on Microsoft is.
She demonstrates the software in the hall of her faculty in Delft. It looks very ordinary. “The word processor is quite good,” says Kernahan, who is doing her PhD on quality and security controls in digital systems. “I’m an average user, so I don’t need all the options and apps the programme has to offer. But, to be honest, Microsoft is making it increasingly attractive to switch. Now that the company is putting AI in everything, everything is becoming more annoying to use.”
Nevertheless, Mufty believes that not all educational institutions will be able to switch to OpenDesk or Nextcloud overnight. “The Criminal Court now has to act quickly, under pressure, but if a university wanted to move away from Microsoft tomorrow, that would pose a problem.”
Entanglement
Meanwhile, Microsoft is taking on more and more tasks. In addition to office software, it also develops artificial intelligence, builds its own data centres and even lays its own internet cables on the seabed. The company is “vertically integrated”, as specialists call it: everything can be done through one company, from basic technology to the end user.
And that’s not all. Microsoft is also expanding “horizontally” by acquiring companies where content is the primary focus, rather than technology. “That’s a new phase, which I find worrying,” says Mufty. For example, Microsoft bought LinkedIn, with its hundreds of millions of active users who produce enormous amounts of data, and GitHub, where software developers can share and store their work.
SURF is keeping a close eye on these developments. “I would like our education to remain public and be able to pursue public values such as autonomy, independence and academic freedom. IT should be helpful, not controlling,” says Mufty.
He views the collaboration between Microsoft and Sanoma with suspicion. The Finnish publisher, which also serves the Dutch education market through its Malmberg subsidiary, wants to make its teaching materials available via Microsoft Teams. Microsoft would then add its own “learning accelerators”, i.e. artificial intelligence designed to help personalise the learning process. “Things like this sometimes keep me awake at night,” sighs Mufty.
Alternatives
Dutch and European alternatives do exist. For example, research institute TNO is working with SURF and the Netherlands Forensic Institute on its own AI language model. There are also dedicated data centres.
Additionally, SURFConext is making headway with a secure login service. “But that’s not enough. If logging in via Microsoft doesn’t work in the future for whatever reason, we’ll have a big problem. This also applies to applications that are not from Microsoft itself,” explains Mufty.
In his view, we need serious alternatives. When the need arises, one shouldn’t have to start from scratch. Moreover, competition ensures that the market leader cannot charge top dollar.
But which educational institution is willing to sacrifice itself to run those alternatives, with all the teething problems that entails, when Microsoft can deliver everything ready-made? Mufty believes that, especially in the beginning, educational institutions will have to run two systems in parallel, with additional expenditure on support, maintenance, and security. “But in my opinion, no sector is as value-driven as education and research. This is precisely where alternatives should be able to get off the ground.”
Rectors
In 2019, the rectors of fourteen universities jointly published a compelling argument about the digital independence of Dutch higher education. The gist was that we risk losing control to Google and Microsoft.
Little has improved since then, according to Jacquelien Scherpen, Rector of the University of Groningen. “The coronavirus pandemic broke out just a few months after that article was published. We became even more dependent on big tech, because we didn’t have time to look for alternatives.” Microsoft Teams has become indispensable, for example.
Scherpen is the portfolio holder for digital sovereignty within UNL, the umbrella association for Dutch universities. She advocates taking small steps: “If we now choose an alternative product that functions less well, students and staff will start using free programmes, and we will be further away from our goal.”
Moreover, Scherpen says that we need legislation to protect European alternatives from big tech. Suppose a university partners up with a European competitor of Microsoft, and then Microsoft buys that company, what is the university to do then?
That is not a theoretical scenario. She mentions the Dutch software company Solvinity, which is involved with government services such as DigiD and provides secure communication for the Ministry of Justice. An American company now wants to take it over.
Scherpen: “Perhaps we need to become more protectionist, without hindering the free exchange of new insights and innovations. We must ensure that the independence we are fighting for does not slip out of our hands again.”
...
Read the original on dub.uu.nl »
To add this web app to your iOS home screen tap the share button and select "Add to the Home Screen".
10HN is also available as an iOS App
If you visit 10HN only rarely, check out the the best articles from the past week.
If you like 10HN please leave feedback and share
Visit pancik.com for more.