I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly lower-quality overhead, taking ~1-4 minutes before the solution/answer appears.
You can view the model performance within Ollama using the command: /set verbose
[0] https://github.com/ollama/ollama
An ARM supercomputer was already #1 in 2020 [0]. The Japanese Fugaku is also notable because it doesn't use GPUs to achieve high performance, but rather wide vector units on the CPU.
I'm sorry, but being #1 or even in the top 500 matters very little, because most of it comes down to how many units are installed in the cluster, including networking, memory, architecture, cooling, etc. I mean, if you didn't reach the top 500, just add more units to the cluster until you do. Don't get me wrong, there is a lot of tech involved, but we are talking about distributed processing rather than the performance and efficiency of a single processor.
I love these puzzles. GNU C supports labels as values for computed goto. This is useful for direct-threaded dispatch. You trade off a branch instruction for an address lookup, but it makes the code more structured.
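As a rough illustration (my own toy bytecode and opcode names, not from any particular interpreter), a direct-threaded dispatch loop with the labels-as-values extension looks something like this in GNU C:

    #include <stdio.h>

    int main(void) {
        enum { OP_INC, OP_DEC, OP_HALT };
        /* one handler address per opcode; && is the GNU labels-as-values operator */
        static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int program[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        int acc = 0, pc = 0;

        goto *dispatch[program[pc++]];      /* jump straight to the first handler */

    op_inc:
        acc++;
        goto *dispatch[program[pc++]];      /* each handler dispatches the next op */
    op_dec:
        acc--;
        goto *dispatch[program[pc++]];
    op_halt:
        printf("acc = %d\n", acc);
        return 0;
    }

Each handler jumps directly to the next one instead of returning to a central switch, which is the "direct threaded" part.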
Sometimes the fact that one implementation includes a feature can actually make it more difficult to standardize, if some of the implementation details are disagreeable (see GNU's nested functions).
I wrote a short paper for HPEC that included some power and performance benchmarks and analysis on the HiFive Unleashed U540 SoC [0]. The SoC isn't open source, as some suggest, although I believe the core is based on the open-source Rocket Chip Generator [1]. It seems the greatest weakness was the slow memory interface. The details of the memory controller and its configuration were proprietary when I tried to find out why it wasn't performing well.
The STREAM benchmark [6] was also compiled and executed, confirming that DRAM performance is limited to less than 1.6 GB/s on this platform. It's unclear if this is a problem with the cache hierarchy, memory controller, or the configuration of the DDR Controller Control Registers.
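(For anyone who wants to sanity-check that number locally, here is a rough triad-style kernel in the spirit of STREAM; this is a simplified sketch, not the official benchmark [6], and the array size and timing are my own choices:)

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* three double arrays, ~384 MB total */

    int main(void) {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];        /* triad: two loads + one store */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        double bytes = 3.0 * N * sizeof(double);
        printf("triad bandwidth: %.2f GB/s\n", bytes / secs / 1e9);
        free(a); free(b); free(c);
        return 0;
    }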
Wow, that's unspeakably terrible. That's about 10-20% of the bandwidth one should get from DDR4-2400 memory (depending on how many channels the memory controller uses).
I hope this is some simple misconfiguration that can be fixed in firmware or in the kernel.
The Freedom platform is open, and so is a lot of the TileLink interconnect. The core is based on Rocket, but they have some internal changes that are not open source yet.
The generated RTL is open but that is of course limited.
You don't have to worry too much about the low Earth orbit (LEO) satellites, as they will deorbit soon enough. My understanding is that there is also ongoing research into methods to deorbit them at end of life.
In the case of a Kessler syndrome event it remains a concern. Collisions will push some debris higher while causing other pieces to reenter early. The result would be a debris cloud that persists for decades, even in LEO.
From first principles, I'm not seeing how a LEO collision pushes a significant amount of debris notably higher, but I'm certainly missing something:
Post-collision, all debris orbits will still pass through the point of collision. Any deflection with a vertical component (up, or down towards the Earth) will send part of that piece's orbit through thicker atmosphere, which will make it deorbit faster. That leaves deflections in the plane spanned by the two orbits ("sideways" and "forwards/backwards"). If those deflections slow the piece of debris down at all, it will also dip into lower atmosphere and deorbit.
Disregarding debris under those effects, the remaining debris has two more things going for it: it will be out of the LEO shell for a large part of its (now elliptical) orbit, and the pieces will be smaller, so they'll slow down more from friction (due to the square-cube law).
Of course, cascading effects could still affect all satellites in LEO (and humanity's access to orbit for years), but it doesn't seem to me like it'd be a "permanent" issue in LEO? What am I not seeing?
When collisions are more frequent than once per orbit, you get effects where more than one collision can lift the same bit of debris. MOST debris is deorbited, but a small fraction is lifted into a higher orbit overall. As these pieces collide with other objects, the total mass of the system might go down (from reentry), but the number of objects and the frequency of collisions go up, continuing the process of some objects getting into higher-energy orbits.
This is not too dissimilar to the process of evaporative cooling in a liquid, or gas escape from an atmosphere.
My gut feeling says that statistically, even just going from one impact to two impacts being likely would require an immense density of satellites, let alone having more collisions than that.
Then there's also the fact that every impact would have a loss of kinetic energy (because it gets converted to heat as the objects deform), which would also make a reduction in orbit likely.
If the debris keeps fragmenting, which might increase the odds of impact, the remaining kinetic energy would be divided among the fragments. The smaller the debris gets, the more drag it should feel too, because of the square-cube law [0]. So that too would only make it more likely to deorbit.
Not how orbits work. A collision can't cause, for example, an object with a circular orbit at 400km (passive reentry regime) to become fragments with a circular orbit at, say, 2000km (non-passive reentry regime.) Like snaily said, all fragments originating from a collision will still pass through the point of collision, which, if it is still in the upper atmosphere, will lead to reentry. Orbital debris is actually very dissimilar to gas escape.
The proportion of fragments that would have their orbits boosted, through multiple collisions, to an orbit higher than the upper atmosphere, is trivial. Nearly every angle of collision between two objects in orbit lowers their periapses. The risk of Kessler Syndrome doesn't come from objects in upper-atmosphere orbits somehow getting boosted out through collision chains, it comes from collisions between objects already in higher orbits not strongly affected by atmospheric drag (>600km).
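To put rough numbers on that (my own vis-viva sketch, with made-up fragment velocities rather than any real debris model), fragments leaving a 400 km collision point end up with perigees at or below 400 km whether they are slowed, boosted, or deflected:

    #include <math.h>
    #include <stdio.h>

    #define MU 3.986004418e14   /* Earth's GM, m^3/s^2 */
    #define RE 6371e3            /* Earth radius, m */
    #define PI 3.14159265358979323846

    /* perigee/apogee of a fragment leaving radius r with speed v at
       flight-path angle gamma (degrees above local horizontal) */
    static void fragment(double r, double v, double gamma_deg) {
        double g = gamma_deg * PI / 180.0;
        double E = v * v / 2.0 - MU / r;       /* specific orbital energy */
        double h = r * v * cos(g);             /* specific angular momentum */
        double a = -MU / (2.0 * E);            /* semi-major axis */
        double q = 1.0 + 2.0 * E * h * h / (MU * MU);
        double e = q > 0.0 ? sqrt(q) : 0.0;    /* guard tiny negative rounding */
        printf("v=%6.0f m/s  gamma=%4.1f deg -> perigee %5.0f km, apogee %5.0f km\n",
               v, gamma_deg, (a * (1.0 - e) - RE) / 1e3, (a * (1.0 + e) - RE) / 1e3);
    }

    int main(void) {
        double r  = RE + 400e3;      /* collision at 400 km altitude */
        double vc = sqrt(MU / r);    /* circular speed, ~7.67 km/s */
        fragment(r, vc,         0.0);  /* untouched: circular at 400 km             */
        fragment(r, vc - 100.0, 0.0);  /* slowed: perigee drops deep into the air   */
        fragment(r, vc + 300.0, 0.0);  /* boosted: apogee rises, perigee stays 400  */
        fragment(r, vc,         2.0);  /* deflected: perigee drops, apogee rises    */
        return 0;
    }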
It can be infinite collisions; it does not matter. Conservation of momentum still applies, and the system of colliding objects only has a finite amount of energy. And it's chaotic rather than engineered, so it's not cumulative: collisions are much more likely to happen at numerous different angles, cancelling out previous collision trajectories.
There are several efforts including VexCL [0] expression templates, OpenACC [1] preprocessor, Kokkos [2] template metaprogramming, SYCL [3], and many other similar projects. Parallel programming for non-trivial parallelism is still...non-trivial.
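To make the directive-based flavour concrete, here is a minimal OpenACC-style saxpy in C (my own toy example, assuming a compiler with OpenACC support such as nvc or gcc with -fopenacc; without support the pragma is simply ignored):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const int n = 1 << 20;
        float a = 2.0f;
        float *x = malloc(n * sizeof *x);
        float *y = malloc(n * sizeof *y);
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* ask the compiler to parallelize the loop and manage data movement */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
        free(x); free(y);
        return 0;
    }

The expression-template and template-metaprogramming approaches (VexCL, Kokkos, SYCL) express the same loop through C++ abstractions instead of pragmas.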
Don't the inevitable methane leaks from natural gas create more of a GHG effect than the CO2 emitted from burning it? It's not just the environmental disasters like in California, but everyday minor leaks.
Methane has a short half-life in the atmosphere (~10 years), and there is already quite a bit of it there. So it's only really important when the amount released is increasing relative to the average released over the last 20 years. In other words, if they cut releases, then within 20 years its contribution to temperature would drop fairly quickly. https://en.wikipedia.org/wiki/Atmospheric_methane
Note that it does break down into more CO2 than the weight of the methane, because the added oxygen is heavy.
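The arithmetic (my own, worth double-checking): complete oxidation is

    CH4 + 2 O2 -> CO2 + 2 H2O

so 16 g/mol of methane becomes 44 g/mol of CO2, i.e. roughly 2.75x the mass.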
The real risk is if warming the arctic starts to release a lot of stored methane, which would be very bad.
John Gustafson is on the board of Rex Computing, a small semiconductor startup. They claim to have taped out last year; I don't know whether the chip has been validated or whether it has posits in silicon, but I know that is one of their longer-term goals.
I find Gustafson's UNUMS 2.0 more compelling than the first version.
Founder of REX Computing here, we taped out in July of 2016 and got our silicon brought up and working back in February. I gave a talk at Stanford showing our hardware and a tiny bit of software: https://www.youtube.com/watch?v=ki6jVXZM2XU
Type 2 unums are pretty much entirely deprecated by type 3 unums (now given the name 'posits' as referred to in the OP's linked paper)... they are basically superior in every way, and I recommend watching John's February 2nd talk at Stanford.
I may be somewhat qualified to speculate... Based on my experience with both Intel's and AMD's OpenCL implementations for their CPUs, I suggest that Intel has a much better vectorizing compiler than AMD. The benchmarks they are running use different compilers for each CPU. If it were simple C code compiled with GCC for each CPU, the comparison would be fairer. It would be interesting to see the results for AMD's OpenCL compiler on the i7 and Intel's compiler on Ryzen.
"In addition to ditching the requirement for regular password changes, the NIST is also advising sites to allow users to create passwords that are at least 64 characters long and include spaces so people can create pass phrases that may be easier to remember and to ditch special character requirements."