10 interesting stories served every morning and every evening.
Opened YouTube and was greeted with this abomination:
This is on a 32” 1440p display. There are five (5) videos visible, and 1/6 of the page would have been an enormous ad.
For reference, here is YouTube as of January 2019:
There are 30 videos visible and zero ads.
I really, really hope that this A/B test fails.
Unfortunately, using an advanced analytics package I’ve projected that around May 2026 the YouTube homepage will just be one video, and by September there will be no videos at all on the homepage.
Presumably by then we’ll have our mandatory NeuraLinks and the YouTube algorithm will be able to inject real-time ML generated content (and ads) straight into our brains, tuning its output as needed to maximize our dopamine response.
I miss YouTube before they turned the pain dial all the way towards money.
...
Read the original on jayd.ml »
Pupils will be able to use their phones in some circumstances, but they will need to get permission from teachers.
Finnish Parliament voted on Tuesday to approve a law that restricts the use of mobile devices by pupils at primary and secondary schools.
The new rules are expected to come into force after the summer break, in August.
The law does not entirely ban the use of mobile phones at school, and their use will be permitted in certain situations. But generally, the use of phones during class time will be prohibited.
Pupils will need special permission from a teacher to use their phones, for example to assist with their studies or to take care of personal health-related matters.
The new law also gives school staff members the authority to confiscate mobile devices from pupils if they have caused teaching or learning disruptions.
Late last year, Education Minister Anders Adlercreutz (SPP) emphasised that kids’ digital skills will still be supported despite the phone restrictions.
...
Shipments from China to the West Coast of the U.S. will plummet next week as the impact of President Donald Trump’s tariffs leads companies to cut their import orders.
Gene Seroka, executive director of the Port of Los Angeles, said Tuesday on CNBC’s “Squawk Box” that he expects incoming cargo volume to slide by more than a third next week compared with the same period in 2024.
“According to our own port optimizer, which measures the loadings in Asia, we’ll be down just a little bit over 35% next week compared to last year. And it’s a precipitous drop in volume with a number of major American retailers stopping all shipments from China based on the tariffs,” Seroka said.
Shipments from China make up about 45% of the business for the Port of LA, though some transport companies will be looking to pick up goods at other points in Southeast Asia to try to fill up their ships, Seroka said.
“Realistically speaking, until some accord or framework can be reached with China, the volume coming out of there — save a couple of different commodities — will be very light at best,” Seroka said.
Along with the lower volume of goods, Seroka said he expects roughly a quarter of the usual number of arriving ships to the port to be canceled in May.
Trump announced a sharp increase in tariffs on Chinese goods on April 2, which led to escalation on both sides, eventually resulting in both the U.S. and China imposing levies of more than 100% on many goods from each other. U.S. Treasury Secretary Scott Bessent has described the situation as “unsustainable,” but there has been no sign of substantial negotiations between the two countries.
Data on shipments out of China had already started to signal slowing trade volume to the U.S., alarming some economists. Apollo Global Management’s chief economist, Torsten Slok, recently laid out a timeline in which lower imports from China lead to layoffs in the U.S. transportation and retail industries, empty shelves, and a recession this summer.
Seroka said he thinks U.S. retailers have about five to seven weeks before the impact of the curtailed shipments begins to bite, partly because companies stocked up ahead of Trump’s tariff announcements.
“I don’t see a complete emptiness on store shelves or online when we’re buying. But if you’re out looking for a blue shirt, you might find 11 purple ones and one blue in a size that’s not yours. So we’ll start seeing less choice on those shelves simply because we’re not getting the variety of goods coming in here based on the additional costs in place. And for that one blue shirt that’s still left, you’ll see a price hike,” Seroka said.
...
Read the original on www.cnbc.com »
Hi, my name is Ilia. I founded Perfect Wiki — a SaaS product for creating internal company knowledge bases that works directly within Microsoft Teams. We created a simple and convenient tool for storing, editing, and sharing knowledge within companies. It all started with the idea of resolving one specific pain point: the built-in Wiki that Microsoft Teams offered was inconvenient, and there were no worthy alternatives with full integration into the platform.
In this article, I want to share how the idea came about, the mistakes I made, how I found my first customers, and how I gradually grew to a steady income of $250,000 a year over five years. All of this — without investors, a 20-person team, or a “Series A” round.
In May 2020, I lost my job and started thinking about new projects to launch or where to direct my efforts. The pandemic drastically changed the market: the mass transition to remote work boosted interest in online communication tools, and everyone wanted to launch their own video conferencing service. It felt like a gold rush, and I decided to follow the principle: in such times, those who sell shovels win, not those who search for gold.
Zoom became hugely popular during the pandemic. I decided to try making a small app — a translator — and published it on the Zoom Marketplace. But it turned out people were only interested in the Zoom app itself, and the marketplace had almost no traffic.
After that failure, I moved on to Plan B: I tried publishing the translator app on the Microsoft Teams Marketplace. There seemed to be significantly more users there, and apps had lots of ratings and installs. The platform felt “alive.” My intuition didn’t fail me — just a few days after publishing, someone bought a paid subscription. But I soon realized the translator app was very limited, with no room for growth. Microsoft could easily replace it at any time.
That’s when I decided to dive deeper into analyzing what other problems Microsoft Teams users were facing and what kind of service I could offer them. I was confident I’d find a niche because the traffic and activity on the marketplace were high — a ready-made customer base was just in front of me. I just needed to find a product idea that would solve a real problem.
I started reading forums, comments, and online discussions. It turned out the built-in Wiki in Microsoft Teams really annoyed users. It was slow and inconvenient. That’s how the idea came about — I had to create a fast, user-friendly knowledge base built directly into Microsoft Teams. The main goal was to make it simple and intuitive for people who weren’t tech-savvy — just regular PC users.
I created and published the first version of the product in a fairly short time — it took me about three weeks. It already had page creation and editing features, and most importantly, full-text search (a much-requested feature that the built-in Wiki lacked).
I used technologies and tools I was already very well familiar with: Node.js + Express for the backend and React for the frontend.
Just a couple of days after publishing Perfect Wiki on the Microsoft Teams Marketplace, I got my first paying user. My assumptions were confirmed — people were actively looking for an alternative to the built-in Wiki, and they searched for it directly in the Teams marketplace. They found my app using the keyword “wiki.” It was an awesome free acquisition channel. Perfect Wiki was always the top search result because there were no competitors. That’s when I realized I had found a real pain point — and I could make money by solving it.
Today, over 500 companies around the world use Perfect Wiki. Our main markets are the US, Canada, the UK, and Germany.
Over five years, the product has grown significantly. Revenue is now about $250,000 a year. However, it wasn’t always smooth sailing — there were months with no growth, times when everything felt stuck. We had to change plans, improve the product, and look for new ideas.
In 2024, Microsoft even featured us at Microsoft Build as an example of an app that’s top-rated and highly valued among Teams users and one that really works — a big milestone for us.
Many of our clients came to us after trying the Microsoft built-in Wiki. It was clunky, inconvenient, and didn’t do the job well. We focused on simplicity: the essential features only, nothing extra — and everything should function inside Microsoft Teams.
Integration with Microsoft Teams is the key. Unlike other knowledge base platforms, Perfect Wiki doesn’t require switching to a separate site or tab. It’s available right where employees already spend most of their day — in Microsoft Teams. It saves time, doesn’t add any difficulties, and makes working with a knowledge base a natural part of the workflow.
Microsoft tried to address this issue via products like Viva and Loop, but they turned out to be too bulky and confusing. Competitors like Confluence or Notion just aren’t integrated into Teams in a way that’s convenient for users.
Perfect Wiki was built specifically for Microsoft Teams — and that’s been our main advantage from day one.
Currently, the team behind Perfect Wiki is just two people. I handle the development and product, and my colleague manages user support. Despite having a tiny team, we manage to achieve a lot: we launch new features quickly, communicate with customers, test ideas, and maintain stable service.
We outsource some marketing and content tasks, but everything related to the product and code we do ourselves.
Sometimes we bring in new people if we feel it’s time to grow. Right now is one of those moments: if you’re an experienced developer familiar with Node.js + Express + React — send us your CV at hello@perfectwiki.com
It all starts with communication. We have an in-app chat — people regularly send us questions, suggestions, and feedback. We also do demo calls, discuss use-case scenarios, and every quarter we reach out to active, loyal users asking for feature and improvement ideas. This helps us deeply understand user needs.
We don’t implement features just because they seem useful. Every new functionality in Perfect Wiki must be genuinely requested and needed by users. For example, I wasn’t sure whether a “search within a page” was necessary. But after several complaints about documents getting longer, and Ctrl+F not working in Teams — it became clear the feature was needed.
Another example: users suggested a weekly digest with a list of new or updated knowledge base articles. They wanted to stay in the loop about changes.
That’s how we improve the product — not by simple guessing, but in collaboration with our users.
And we actually use Perfect Wiki ourselves — that helps us spot areas for changes and growth. All our internal documentation, tasks, and plans are stored in Perfect Wiki. Even our public Help Center runs on our platform. This way, we test the product in real use and quickly notice what needs fixing or tweaking.
Every time I check out competitors’ sites — those who also build knowledge base or customer support platforms — I notice something odd. Almost all of them use third-party tools like Intercom or Zendesk to support their own customers. That surprises me. If your product is so great — why don’t you use it yourself? For me, that’s a golden rule: your product should be so good you want to use it yourself. If not, that means something’s wrong.
Right now, I earn around $25,000 per month. My monthly expenses are pretty modest:
Everything else is my profit.
The most important rule: don’t be afraid to build niche products for a narrow audience. It’s vital to create something that solves a specific problem really well.
Second lesson I learned: simplicity wins. The simpler and more understandable your product, the easier it is to sell and maintain. When you have a small team and limited resources, simplicity isn’t a luxury — it’s a necessity. It keeps you from drowning in features, endless requests, and tech debt.
Honestly? I didn’t have big ambitions. I just wanted to earn a stable $70–80K a year — about what I earned at my previous job. Everything beyond that has been a pleasant bonus. Perfect Wiki has grown more than I ever expected. All without investments, offices, or a big team. Just because the product was in demand — and we kept making it better, step by step.
Perfect Wiki has already become more than just an add-on to Microsoft Teams. Now it can also be used in Slack, via ChatGPT, or as a chatbot on your website. You can even create a public support portal for your customers — our Help Center is a prime example.
We’re constantly adding new integrations, improving search, and most importantly — always listening to our users. The best is still ahead!
P.S. If you’re curious to follow our product journey, I have a Telegram channel and Twitter.
...
Read the original on habr.com »
...
Read the original on fortune.com »
This code repository is licensed under the Apache 2.0 License.
Currently, most successful RL work, including open-source research, relies on relatively large base models, e.g., 32B models, particularly for enhancing code reasoning capabilities. Moreover, it was widely considered challenging to achieve uniform and simultaneous improvements in both mathematical and code capabilities within a small model. Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.
In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments from MiMo-7B-Base show that our model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.
We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model. We believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.
* We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive, diverse synthetic reasoning data.
* We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
* We incorporate Multiple-Token Prediction (MTP) as an additional training objective, which enhances model performance and accelerates inference (a toy sketch of the idea follows this list).
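As a rough illustration of the idea (a toy PyTorch-style sketch, not MiMo’s actual MTP architecture; the head modules and the alpha weight are invented for the example), an auxiliary head predicts the token two positions ahead and its loss is mixed into the standard next-token loss; at inference, the extra head can supply draft tokens for speculative decoding:

import torch
import torch.nn.functional as F

def loss_with_mtp(hidden, lm_head, mtp_head, tokens, alpha=0.3):
    # hidden: [batch, seq, dim] activations from the backbone transformer.
    # Standard objective: position t predicts token t+1.
    logits1 = lm_head(hidden[:, :-1])
    loss1 = F.cross_entropy(logits1.flatten(0, 1), tokens[:, 1:].flatten())
    # Auxiliary MTP objective: position t also predicts token t+2.
    logits2 = mtp_head(hidden[:, :-2])
    loss2 = F.cross_entropy(logits2.flatten(0, 1), tokens[:, 2:].flatten())
    # alpha (illustrative) trades the auxiliary signal off against the main loss.
    return loss1 + alpha * loss2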
* We curate 130K mathematics and code problems as RL training data, which can be verified by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. We employ only rule-based accuracy rewards to avoid potential reward hacking.
* To mitigate the sparse reward issue for challenging code problems, we introduce a test difficulty driven code reward. By assigning fine-grained scores to test cases of varying difficulty levels, the policy can be optimized more effectively via a dense reward signal (a toy version appears after this list).
* We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.
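A toy version of such a difficulty-weighted reward (the function name and weights are invented for illustration; the technical report defines the actual scheme):

def code_reward(test_results, difficulty_weight):
    """test_results maps test_id -> passed (bool); difficulty_weight maps
    test_id -> weight, with harder tests weighted higher. Returns a dense
    score in [0, 1] rather than an all-or-nothing pass signal."""
    total = sum(difficulty_weight.values())
    if total == 0:
        return 0.0
    earned = sum(w for t, w in difficulty_weight.items() if test_results.get(t))
    return earned / total

# Example: a policy that passes only the easy tests still gets partial credit.
print(code_reward({"easy_1": True, "easy_2": True, "hard_1": False},
                  {"easy_1": 1.0, "easy_2": 1.0, "hard_1": 3.0}))  # 0.4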
* We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving 2.29$\times$ faster training and 1.96$\times$ faster validation (a toy sketch of the overlap idea follows this list).
* We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
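The overlap trick can be shown with a self-contained asyncio toy (all names invented for illustration; MiMo’s actual engine lives inside its vLLM-based RL system). Reward computation is launched as a background task, so generation never waits on verifiers:

import asyncio
import random

async def generate(prompt):
    await asyncio.sleep(0.01)   # stand-in for GPU generation
    return f"{prompt} -> completion"

async def compute_reward(completion):
    await asyncio.sleep(0.05)   # stand-in for a rule-based verifier run
    return random.random()

async def score(completion, results):
    results.append((completion, await compute_reward(completion)))

async def continuous_rollout(prompts, results):
    # Kick rewards off asynchronously so generation never idles on scoring.
    pending = []
    for p in prompts:
        completion = await generate(p)
        pending.append(asyncio.create_task(score(completion, results)))
    await asyncio.gather(*pending)

results = []
asyncio.run(continuous_rollout([f"prompt-{i}" for i in range(8)], results))
print(f"{len(results)} scored rollouts")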
[Recommended] We officially support inference with MiMo-MTP using our fork of vLLM.
from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
Or, you can register a vLLM loader for MiMo without loading MTP parameters.
You can copy the registry/register_mimo_in_vllm.py to your directory and import it with
import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/MiMo"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer(["Today is"], return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))
* We recommend using our fork of vLLM which is developed based on vLLM 0.7.3.
We haven’t verified MiMo with other inference engines and welcome contributions based on the model definition in the Huggingface repo 💻.
@misc{xiaomi2025mimo,
  title={MiMo: Unlocking the Reasoning Potential of Language Model — From Pretraining to Posttraining},
  author={{Xiaomi LLM-Core Team}},
  year={2025},
  primaryClass={cs.CL},
  url={https://github.com/XiaomiMiMo/MiMo},
}
Please contact us at mimo@xiaomi.com or open an issue if you have any questions.
...
Read the original on github.com »
And now I can analyze it with DuckDB. Behold the fraction of total comments and stories referencing key topics over time!
As part of building hn.unlurker.com, I wrote an HN API client. There are already a bunch of other clients, but I wanted to try the latest Go features and linters on a new project. I’m glad I did; it was a lot of fun.
The client can retrieve active items, lists of items, etc. (comments and stories are called “items” in the HN API). Although I only really needed recent items for my project, for completeness I added “scan” which downloads all the items, in order, from zero to the latest or the other way around.
I wondered — could I just download the whole thing? Extrapolating from a few thousand items, it would only be tens of GiB of JSON. I thought I’d give it a try.
hn scan --no-cache --asc -c- -o full.json
I had to CTRL-C a stalled download a few times, but scan is resumable so after a few hours I was done. I had a 20 GiB JSON file of everything that has ever happened on Hacker News, and I can just re-run the command above to “top it off” any time I need the latest. But what could I do with it?
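For a sense of how resumability can work: the official HN Firebase API exposes /v0/maxitem.json and /v0/item/&lt;id&gt;.json, so a sequential downloader only needs the last id already on disk to pick up where it left off. A minimal Python sketch of that loop (the author’s real client is written in Go, and this omits retries and concurrency):

import json
import os
import urllib.request

BASE = "https://hacker-news.firebaseio.com/v0"
OUT = "full.json"  # newline-delimited JSON, one item per line

def fetch(path):
    # Fetch one resource from the official HN Firebase API.
    with urllib.request.urlopen(f"{BASE}/{path}.json") as r:
        return json.load(r)

# Resume: start just past the last item id already written, if any.
start = 1
if os.path.exists(OUT):
    last = None
    with open(OUT) as f:
        for line in f:
            last = line
    if last:
        start = json.loads(last)["id"] + 1

max_item = fetch("maxitem")
with open(OUT, "a") as out:
    for item_id in range(start, max_item + 1):
        item = fetch(f"item/{item_id}")
        if item is not None:  # a few ids resolve to null
            out.write(json.dumps(item) + "\n")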
First I just grepped for things. How many times has the phrase “correct horse battery staple” appeared on the site? Quite a few: 231 times (the last one just today). But grepping stuff is old news, so I thought I’d try out DuckDB.
In the database world, DuckDB is unique: a super-fast embeddable analytics execution engine also available as a command-line tool. I spend most of my day wrangling a different database (there’s the plug my coworkers will be looking for), but I’ve been meaning to try DuckDB and it seemed perfect for this one-off task.
As it turns out, with their new UI for novices like me, it’s a breeze to use. AND LLMs are pretty good at helping craft the SQL queries. I just had to import the data:
CREATE TABLE items AS
SELECT *
FROM read_json_auto('/home/jason/full.json', format='nd', sample_size=-1);
Then query it. Here’s a 12-week moving average of the fraction of total items containing the terms I am interested in:
WITH weekly AS (
  SELECT
    DATE_TRUNC('week', TO_TIMESTAMP(time)) AS week_start,
    COUNT(*) FILTER (WHERE text ILIKE '%python%')::float / NULLIF(COUNT(*), 0)
      AS python_prop,
    COUNT(*) FILTER (WHERE text ILIKE '%javascript%')::float / NULLIF(COUNT(*), 0)
      AS javascript_prop,
    COUNT(*) FILTER (WHERE text ILIKE '%java%')::float / NULLIF(COUNT(*), 0)
      AS java_prop,
    COUNT(*) FILTER (WHERE text ILIKE '%ruby%')::float / NULLIF(COUNT(*), 0)
      AS ruby_prop,
    COUNT(*) FILTER (WHERE text ILIKE '%rust%')::float / NULLIF(COUNT(*), 0)
      AS rust_prop
  FROM items
  GROUP BY week_start
)
SELECT
  week_start,
  AVG(python_prop) OVER (
    ORDER BY week_start
    ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS avg_python_12w,
  AVG(javascript_prop) OVER (
    ORDER BY week_start
    ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS avg_javascript_12w,
  AVG(java_prop) OVER (
    ORDER BY week_start
    ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS avg_java_12w,
  AVG(ruby_prop) OVER (
    ORDER BY week_start
    ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS avg_ruby_12w,
  AVG(rust_prop) OVER (
    ORDER BY week_start
    ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS avg_rust_12w
FROM weekly
ORDER BY week_start;
Overall DuckDB seems really great for analyzing data sets of this size.
Now that I have a local download of all Hacker News content, I can train hundreds of LLM-based bots on it and run them as contributors, slowly and inevitably replacing all human text with the output of a Chinese room oscillator perpetually echoing and recycling the past.
Or alternatively, I think for this project I am done. Someone else will have to take it to the next logical step.
Thanks for reading! Please check out hn.unlurker.com, take a look at my
other articles, or find me on X.
...
Read the original on www.jasonthorsness.com »
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3’s step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.
To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals.
We use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data.
We curate a subset of challenging problems that remain unsolved by the 7B prover model in an end-to-end manner, but for which all decomposed subgoals have been successfully resolved. By composing the proofs of all subgoals, we construct a complete formal proof for the original problem. This proof is then appended to DeepSeek-V3’s chain-of-thought, which outlines the corresponding lemma decomposition, thereby producing a cohesive synthesis of informal reasoning and subsequent formalization.
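Putting those pieces together, the cold-start data construction loop might be sketched like this (Python pseudocode with invented names, following the description above rather than any released code):

def build_cold_start_example(theorem, deepseek_v3, prover_7b, compose_proof):
    # 1. DeepSeek-V3 produces an informal proof sketch (chain-of-thought)
    #    and formalizes its steps into a sequence of Lean 4 subgoals.
    chain_of_thought, subgoals = deepseek_v3.decompose_and_formalize(theorem)

    # 2. The small 7B prover searches for a proof of each subgoal,
    #    keeping the cost of proof search manageable.
    subproofs = [prover_7b.search_proof(goal) for goal in subgoals]
    if any(proof is None for proof in subproofs):
        return None  # keep only problems where every subgoal is resolved

    # 3. Compose the subgoal proofs into a complete formal proof and pair
    #    it with the informal reasoning to form one cold-start example.
    formal_proof = compose_proof(subgoals, subproofs)
    return {"reasoning": chain_of_thought, "formal_proof": formal_proof}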
After fine-tuning the prover model on the synthetic cold-start data, we perform a reinforcement learning stage to further enhance its ability to bridge informal reasoning with formal proof construction. Following the standard training objective for reasoning models, we use binary correct-or-incorrect feedback as the primary form of reward supervision.
The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. The proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset are available for download as a ZIP archive.
We introduce ProverBench, a benchmark dataset comprising 325 problems. Of these, 15 are formalized from number theory and algebra questions featured in the recent AIME competitions (AIME 24 and 25), offering authentic high-school competition-level challenges. The remaining 310 problems are drawn from curated textbook examples and educational tutorials, contributing a diverse and pedagogically grounded collection of formalized mathematical problems. This benchmark is designed to enable more comprehensive evaluation across both high-school competition problems and undergraduate-level mathematics.
We release DeepSeek-Prover-V2 in two model sizes: 7B and 671B parameters. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base. DeepSeek-Prover-V2-7B is built upon DeepSeek-Prover-V1.5-Base and features an extended context length of up to 32K tokens.
You can directly use Huggingface’s Transformers for model inference. DeepSeek-Prover-V2-671B shares the same architecture as DeepSeek-V3. For detailed information and supported features, please refer to the DeepSeek-V3 documentation on Hugging Face.
The following is a basic example of generating a proof for a problem from the miniF2F dataset:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(30)

model_id = "DeepSeek-Prover-V2-7B"  # or DeepSeek-Prover-V2-671B
tokenizer = AutoTokenizer.from_pretrained(model_id)

formal_statement = """
import Mathlib
import Aesop

set_option maxHeartbeats 0

open BigOperators Real Nat Topology Rat

/-- What is the positive difference between $120\%$ of 30 and $130\%$ of 20? Show that it is 10.-/
theorem mathd_algebra_10 : abs ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 := by
  sorry
""".strip()

prompt = """
Complete the following Lean 4 code:

```lean4
{}
```

Before producing the Lean 4 code to formally prove the given theorem, provide a detailed proof plan outlining the main proof steps and strategies.
The plan should highlight key ideas, intermediate lemmas, and proof structures that will guide the construction of the final formal proof.
""".strip()

chat = [
    {"role": "user", "content": prompt.format(formal_statement)},
]

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
inputs = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

import time
start = time.time()
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.batch_decode(outputs))
print(time.time() - start)
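For intuition, the statement itself is elementary: 120% of 30 is 36, 130% of 20 is 26, and |36 - 26| = 10. One plausible hand-written Lean 4 proof using Mathlib (an unverified sketch; the point of the model is to produce such proofs automatically):

theorem mathd_algebra_10 : abs ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 := by
  -- Evaluate the arithmetic inside the absolute value, then drop the abs.
  rw [show ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 by norm_num]
  exact abs_of_pos (by norm_num)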
The use of DeepSeek-Prover-V2 models is subject to the Model License.
If you have any questions, please raise an issue or contact us at service@deepseek.com.
...
Read the original on github.com »
The latest fad amongst tech CEOs is no longer “founder mode”, or taking drugs that they would fire you for taking, or telling everybody to return to the office — it’s demanding that all work be AI-first! This is a great idea if you think nobody at your company is great at what they do. It may otherwise be a suboptimal strategy. Let’s dive in!
Let’s use me as a case study. I’m pretty okay at writing. For example, one time I wrote a fairly technical analysis of Twitter’s platform strategy that inspired Will.I.Am of the Black Eyed Peas to start Twitter beef with me two years later when he read the post and took offense to my referring to him as “nobody’s favorite rapper”.
This is something your GPTs cannot do, I assure you. An average LLM won’t even know that Drake’s favorite MIME type is application/pdf. Chalk one up for the greatness of human creativity.
Shopify’s CEO Tobi Lütke (personal motto: “what if a Canadian was all the worst things about the United States?“) started the “AI-first” trend, with one of those big memos that included, amongst other things, the declaration that “We will add Al usage questions to our performance and peer review questionnaire.” This is unusual — did your boss ever have to send you a memo demanding that you use a smartphone? Was there a performance review requiring you to use Slack? I’m actually old enough that I was at different workplaces when they started using spreadsheets and email and the web, and I can tell you, they absolutely didn’t have to drive adoption by making people fill out paperwork about how they were definitely using the cool new technology. Isn’t that interesting?
Some of the other CEOs talking about the use of AI are a little more reasonable. Duolingo’s CEO Luis von Ahn seems to be trying to be somewhat more moderate in his memo, stating plainly that he doesn’t see AI replacing his employees. (Though that does immediately raise the “who brought that up?” question…) Yet even in this more even-handed take, we still get the insistence that “Al use will be part of what we evaluate in performance reviews”. This is really weird!
The funny thing is, I’m not saying LLMs are without their uses. Let’s use me as a case study again. I’m a lousy coder, these days. I haven’t had time to keep up my skills, and the area I focused on for most of my dev career (front end web development) changes particularly quickly. So I use some of the modern tools to help me get up to speed and get more done in a limited amount of time, because otherwise I’m woefully unproductive in the short windows I have to code in my free time.
To be explicit: I code on the weekends, not professionally. That means I’m not very good at it. I’m certainly nothing like the incredibly talented developers that I’ve had the good fortune to work with over the years. I’m just fluent enough to be able to debug the broken code that LLMs generate, or to catch the bugs that they spew out by default. And I’m sure I don’t even catch all the bugs that pop up, but fortunately, I’m not making any production systems; I’m just building little toy apps and sites for myself.
This is an important illustration: AI is really good for helping you if you’re bad at something, or at least below average. But it’s probably not the right tool if you’re great at something. So why would these CEOs be saying, almost all using the exact same phrasing, that everyone at their companies should be using these tools? Do they think their employees are all bad at their jobs?
Big tech CEOs and VCs really love performing for each other. We know they hang out in group chats like high schoolers, preening and sending each other texts, each trying to make sure they’re all wearing the latest fashions, whether it’s a gold chain or a MAGA hat or just repeating a phrase that they heard from another founder. A key way of showing that they’re part of this cohort is to make sure they’re having a tantrum and acting out against their workers fairly regularly.
The return to office fad was a big part of this effort, often largely motivated by reacting to the show of worker power in the racial justice activism efforts of 2020. Similarly, being AI-first shows that a company is participating in the AI trend in the “right” way, by imposing it on workers, rather than trusting workers to judge what tools are useful for them to do their jobs.
A more normal policy on AI at a company might be something like this:
Our IT department has evaluated a set of LLM tools and determined that these ones meet our requirements for security, performance, data governance, reliability, manageability and integration with our workflows. We’ll be doing a controlled deployment of these tools and you can choose to use them if you think they’ll help you with your work; please share your feedback on whether they are helpful, and what might make them more useful for you over time. Here are the ways these AI tools meet our corporate standards for compliance with intellectual property consent, sustainability and environmental goals, and accessibility.
This would not get you invited to the fascist VC group chat, tho!
How did we get here? What can we do? Maybe it starts by trying to just… be normal about technology.
There’s an orthodoxy in tech tycoon circles that’s increasingly referred to, ironically, as “tech optimism”. I say “ironically”, because there’s nothing optimistic about it. The culture is one of deep insecurity, reacting defensively, or even lashing out aggressively, when faced with any critical conversation about new technology. That tendency is paired with a desperate and facile cheerleading of startups, ignoring the often equally interesting technologies stories that come from academia, or from mature industries, or from noncommercial and open source communities that don’t get tons of media coverage, but quietly push forward innovating without the fame and fortune. By contrast, those of us who actually are optimistic about technology (usually because we either create it, or are in communities with those who do) are just happily moving forward, not worrying when people point out the bugs that we all ought to be fixing together.
We don’t actually have to follow along with the narratives that tech tycoons make up for each other. We choose the tools that we use, based on the utility that they have for us. It’s strange to have to say it, but… there are people picking up and adopting AI tools on their own, because they find them useful. This is true, despite the fact that there is so goddamn much AI hype out there, with snake oil salesmen pushing their bullshit religion of magical thinking machines and overpromising that these AI tools can do tasks that they’re simply not capable of performing. It’s telling that the creators of so many of the AI tools don’t even have enough confidence in their offerings to simply let users choose to adopt them, and are instead forcing them into users’ faces in every possible corner of their apps and websites.
The strangest part is, the AI pushers don’t have to lie about what AI can do! If, as they say, AI tools are going to get better quickly, then let them do so and trust that smart people will pick them up and use them. If you think your workers and colleagues are too stupid to recognize good tools that will help them do their jobs better, then… you are a bad leader and should step down. Because you’ve created a broken culture.
But I don’t think the audience for these memos is really the people who work at these companies. I think the audience is the other CEOs and investors and VCs in the industry, just as it was for the other fads of the last few years. And I expect that AI will indeed be part of how we evaluate performance in the future, but mostly in that the way CEOs communicate to their teams about technologies like AI will be part of how we all evaluate their performance as leaders.
...
Read the original on www.anildash.com »
From the start of 2024 to the present, the Android app marketplace went from hosting about 3.4 million apps worldwide to just around 1.8 million, according to a new analysis by app intelligence provider Appfigures. That’s a decline of about 47%, representing a significant purge of the apps that have been available to Android users globally.
The decline is not part of some larger global trend, the firm also notes. During the same period, Apple’s iOS App Store went from hosting 1.6 million apps to now just around 1.64 million apps, for instance — a slight increase.
In Google’s case, the decline in apps could be a relief for Android device owners who have had to sort through scammy, spammy, and otherwise poor-quality apps to find the best ones to install. The reduction could also help developers who have had to fight for visibility.
Over the years, Google Play’s less stringent requirements for app review have led to the marketplace being overrun with lower-quality apps. While Apple continues to enforce strict app review measures before publication, Google often relies on automated checks combined with malware scans to speed up the app-review process. It tends to have a shorter app-review period as a result of its lighter touch in terms of human review.
In July 2024, Google announced it would raise the minimum quality requirements for apps, which may have impacted the number of available Play Store app listings.
Instead of only banning broken apps that crashed, wouldn’t install, or wouldn’t run properly, the company said it would begin banning apps that demonstrated “limited functionality and content.” That included static apps without app-specific features, such as text-only apps or PDF-file apps. It also included apps that provided little content, like those that only offered a single wallpaper. Additionally, Google banned apps that were designed to do nothing or had no function, which may have been tests or other abandoned developer efforts.
Reached for comment, Google confirmed that its new policies were factors here, which also included an expanded set of verification requirements, required app testing for new personal developer accounts, and expanded human reviews to check for apps that try to deceive or defraud users.
In addition, the company pointed to other 2024 investments in AI for threat detection, stronger privacy policies, improved developer tools, and more. As a result, Google prevented 2.36 million policy-violating apps from being published on its Play Store and banned more than 158,000 developer accounts that had attempted to publish harmful apps, it said.
One factor Google didn’t cite was the new trader status rule enforced by the EU as of this February, which began requiring developers to share their names and addresses in the app’s listing. Those who failed to do so would see their apps removed from EU app stores. (It’s worth pointing out that Apple also began requiring trader status information in February and did not see a decline in available apps as a result.)
Appfigures additionally notes it began seeing a decline in the number of apps on the Google Play Store even before the official start of the purge last summer; it doesn’t yet have an explanation for this change. However, the firm says there have been 10,400 releases on Google Play so far this year, up 7.1% year-over-year as of April.
...
Read the original on techcrunch.com »