

That was a nice detailed explanation. The description of the way the tests degenerated was really worrying. Even some boosters insist the tests need careful human oversight.


Relatedly, I think another part of the problem is the implicit assumption that ‘able to do one narrowly defined/narrowly constrained type of problem within a field’ = ‘expert in a field’.


my point is the Scott-A is massively overvaluing the societal worth of pure mathematicians.
I think pure mathematics is as valuable as the humanities. Unlike many stembros, I think the disconnect is that we vastly undervalue humanities, not that pure mathematics is overvalued. Agreed Scott is probably overvaluing them.


If they advertise themselves as a team of forecasters, but then pick a number that doesn’t line up with their forecasts because one team member has a gut feeling or vibes it should be sooner, then that is just another reason not to trust them and to treat them like the clowns they are. Of course, even that reading is pretty charitable: the real reason they picked 2027 is to balance urgency and hype generation with a bit of cushion for when the prediction doesn’t pan out.


Gotta love the ex post facto of it all.
Within a month after it was out, they were already building up excuses (calling 2027 their modal number, and admitting their timelines had already slipped back a few months). Also, if you read between the lines of various statements they made, they all but admit they picked 2027 for maximum clout/influence. (Lying is okay if it’s to stop the AI apocalypse! Or maybe they were all more the short-term sort of grifters.) Even Eliezer recognized setting a hard and early date would damage the grift for everyone!


I was surprised in a good way to see nearly every single comment call him out. Of course, some of those comments (maybe even the majority) are probably boosters mad that he is skipping the slop emails in his inbox. I guess Paul Graham found an angle of hypocrisy that both AI boosters and realists can unite in mocking. Quite an accomplishment.


Not when what they want contradicts the basic limits of reality and logistics!
Ed Zitron has done a breakdown on building normal-sized data centers vs. the current target size of AI data centers: normal data centers run from 10s of MWs up to, on the bigger end, 100s of MWs. After 2 years, Stargate Abilene has only turned on its first 200-300 MW. So I think even if regulators roll over on using twice the power of the entire state, this project would take 2-3 years just to turn on the first few hundred megawatts, then stall out.


Every author named as writing a paper bears full responsibility for the paper.
This has the nice added bonus that it will likely catch PIs who put their names on their grad students’ papers without actually doing the mentoring they were supposed to. It will also catch professors who coast (or at least inflate their citation index) by getting their names on papers they barely contributed to.
I am quite convinced that, under these arXiv guidelines, every single major PI in the field will be banned within a few years.
Catching a lot of PIs that have allowed and even encouraged slop submission is a good thing in my book.


I was pretty happy about seeing that news about arXiv! So much news has been various organizations giving in to LLM usage like some kind of inevitability, so it was a nice change of pace.


he just posted an entirely unnecessary amount of words
Taking a quick look at it… it’s actually short by Scott’s standards, but still overly long, given that the only point he makes is claiming Lindy’s Law is applicable to predicting AI progress in the absence of other information. Edit: glancing at it again… it’s not that short, I kinda skimmed until I got to Scott’s actual point my first time glancing at it. You can’t blame me for not reading it.
you-can’t-really-knows
Yeah, he straw-mans AI critics/skeptics as trying to make an argument from ignorance, then tries to argue against that strawman using Lindy’s Law (which assumes ignorance and a Pareto distribution). He completely ignores that AI critics are actually making detailed arguments about LLM companies consuming all the good and novel training data, hitting the limits on what compute costs they can afford, running into problems with the long lead time for building datacenters, etc. Which is pretty ironic given his AI 2027 makes a nominal claim to accounting for all that stuff (in actuality it basically all rests on METR’s task horizons, and distorts even that already questionable dataset).
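To make the "assumes ignorance" part concrete, here is a minimal sketch of what a Lindy-style estimate looks like under a Pareto survival curve. The function names and the tail exponent `alpha` are my own illustrative assumptions, not anything taken from Scott's post. The thing to notice is that the only inputs are the current age and an assumed `alpha`; nothing about data, compute, or build times enters at all.

```python
def lindy_median_remaining(age, alpha=1.0):
    """Median remaining lifetime given survival to `age`.

    Assumes a Pareto survival curve S(t) = (t_min / t) ** alpha, so that
    P(T > age + s | T > age) = (age / (age + s)) ** alpha.
    Setting that to 1/2 and solving gives s = age * (2 ** (1 / alpha) - 1).
    """
    return age * (2 ** (1 / alpha) - 1)


def lindy_mean_remaining(age, alpha):
    """Mean remaining lifetime given survival to `age`.

    Only finite for alpha > 1, where E[T | T > age] = alpha * age / (alpha - 1).
    """
    if alpha <= 1:
        raise ValueError("mean remaining lifetime is infinite for alpha <= 1")
    return age / (alpha - 1)


# Hypothetical example: a trend that has already lasted 10 years.
for alpha in (1.0, 1.5, 2.0):
    print(alpha, lindy_median_remaining(10, alpha))
```

With `alpha = 1` the median remaining lifetime equals the current age, which is the usual "it will last about as long again" reading. Change the assumed `alpha` and the answer changes with it, and no amount of actual evidence about the thing being predicted is used either way.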


The plagiarism, massive expenditure of venture capital, and unreliable slop output are all intrinsic to the technology, and they hate to be reminded of that because there isn’t much they can do about it. From a technological standpoint, even locally run community fine-tuned open-weight models still originated from plagiarism and big corporate investments, and still output slop. From a social standpoint, the most they can do is try to claim legitimacy through consensus building, and we are a threat to that.


It is this continuing slippage of standards that makes me appreciate the hard line against any and all genAI that places like awful.systems have. You concede one small usage and the boosters will keep pushing for more.


Even Scott’s fantasy dream scenario for what prediction markets could be like and what questions they could answer feels… …deliberately naive? …like libertarian brainrot? …disconnected from reality?
Ask yourself: what are the big future-prediction questions that important disagreements pivot around? When I try this exercise, I get things like:
Will the AI bubble pop? Will scaling get us all the way to AGI? Will AI be misaligned?
Huge amounts of money are being dumped into a bubble based on hype, so to hope a prediction market would or could make better predictions than the existing business-idiot VCs funding this bubble feels hopelessly naive in a libertarian kind of way. There is already a method of aggregating the wisdom of the crowd, and it is losing out to incredibly lazy hype and PR.
Will Trump turn America into a dictatorship? Make it great again? Somewhere in between?
Again, there is already a mechanism for aggregating the wisdom of the crowds, it’s called an election, and it’s also failed to get an answer predicated on reality or truth, so again, it seems incredibly naive to expect prediction markets to do better!
Will YIMBY policies lower rents? How much?
I mean, the councils and communities making these decisions already ignore or overlook longer-term, broader predictions of economic impact in favor of immediate home-owner value. I don’t see why Scott would expect prediction markets to help decision making go better here.
Overall, it feels like Scott is overlooking the way decision making often already ignores science and experts. Society doesn’t have a problem making decent predictions compared to the problems it has communicating expert opinions to the public effectively and crafting policy aligned with the public interest.


The prediction markets seem to have all the basic problems that sneerclubbers predicted: problems with resolution mechanisms, all sorts of insider trading and gaming the market, people using it for gambling…
Various prediction markets have made various half-assed attempts at solutions, but so far nothing seems to actually work well enough to make prediction markets nearly as useful as rationalists expected.


Some of the change probably involves the discovery of a natural bat coronavirus with a furin cleavage site last October, but I’m surprised by the extent of the decline.
That actually seems like the prediction market sort of did its job in this case? I mean, 27% yes is still too high, but actually changing in response to real evidence is much better than my low low expectations for prediction markets. It seems like he should take his own advice and actually take the prediction market seriously in this case.


Yeah, that was a good article. I think that is one of the fundamental issues with rationalists: they are basically a group formed around neat sci-fi ideas and not actually getting anything done, and their strong libertarian biases prevent them from actually pursuing the strategies that would be most effective for many of their nominal goals.


Their proposed sort of solution (controlled miscalibration) even amounts to forcing the model to generalize less by memorizing more, which used to be the opposite of why you would choose to use this type of topography.
Yeah, it does seem to be running into the basic issue that what boosters want LLMs to be (an all-knowing oracle) is in sharp contrast to what LLMs actually are (something that churns out statistically plausible content).


You’ve described the problem with generalization, yes. Well, you could maybe sort of train it not to generate “all men are cats”, but then that might also prevent it from making the more correct generalization “all cats are mortal” or even completely valid generalizations like combining “all men are mortal” and “Socrates is a man” to get “Socrates is mortal”.
The problem with monofacts is a bit more subtle. Let’s say the fact that “John Smith was born in Seattle in 1982, earned his PhD from Stanford in 2008, and now leads AI research at Tech Corp,” appears only once in the training data set. The model will have seen some of the individual pieces many times and will be able to generate tokens for them in the right way: Seattle as a location in the US, Stanford as a college, 2008 as a date, etc. But the combination describing a fact about John Smith, appearing exactly once, trains the model to try to generate facts that are unique combinations of data. So the model might try to make up a fact like “Jane Doe was born in Omaha in 1984, earned her master’s from Caltech in 2006, and is now CEO of Tech Corp” because it fits the pattern of a unique fact that was in its training data set.
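If it helps, here is a rough sketch of how you could count monofacts in a toy data set. This assumes a "fact" can be treated as a hashable string and uses the singleton fraction (a Good-Turing style estimate) as the monofact rate. The function name and the toy facts are made up for illustration; real training data obviously doesn't come pre-split into facts.

```python
from collections import Counter


def monofact_rate(facts):
    """Fraction of observations that are facts seen exactly once.

    `facts` is a list of hashable items, one entry per occurrence in the
    (toy) training set. Returns singletons / total observations.
    """
    if not facts:
        return 0.0
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)


# Toy training set: one widely repeated fact, two facts that appear once.
training_facts = [
    "Stanford is a university",
    "Stanford is a university",
    "John Smith earned his PhD from Stanford in 2008",
    "John Smith leads AI research at Tech Corp",
]
print(monofact_rate(training_facts))
```

The higher that fraction, the more of the training signal looks like "here is a specific combination you will only ever see once", which is exactly the pattern the made-up Jane Doe fact imitates.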


Anyone familiar with the IPO process have any guesstimates about how long until the complete public S-1 follows? Or odds that it leaks?