Hacker News: artemonster's comments

The very first sentence: "Large language models (LLMs) have shown promise in register-transfer level (RTL) design automation". I want to see some serious proof for this shitty claim. LLMs excel at slop webapp codegen because that code is usually highly modular, composable and easy to reason about, but LLMs' understanding of RTL is just pure dogshit. Take a simple signaling protocol, even a well-documented one with some temporal behaviour and ready-made assertions that formal verification tools pick up for static proving: none of this helps any top-tier LLM grok what's happening. State explosion, temporal dependencies, no composition: RTL is not code, it's the construction of complex machinery, and LLMs suck balls at it. And none of this will go away if you slop some low-quality DSL for netlists into existence.

I had Claude build a spiking neural net example for classifying MNIST digits on an FPGA last week. It wrote the Verilog and seemed to deal fine with timing issues. The FPGA receives a binarized image, downsampled to 14x14, via SPI and runs the classification: ~8K LUTs in an ECP5 85K, 95% accuracy. Claude even built me an app for the PC side that lets you draw a digit and then sends it to the FPGA via SPI.

I've also had it make mods to an existing RISC-V processor to add custom accelerators. Did fine there as well.


> MNIST digits in FPGA

Your example suffers from one of the same problems as the paper. They exclusively used textbook-level problems in their evaluation of the models' HDL coding ability. Those problems are effectively poisoned: the training data contains many, many copies of their solutions, done in different ways.


I've used AI successfully for some things, but for SystemVerilog it's pretty rubbish. Too little training data, I assume, and everything is behind firewalls.

However, it's not completely useless: I've used it successfully for boilerplate for small demos and bug reproducers, and it has clarified some things for me (obviously I double-checked what it said against the LRM). So I guess it's fair to say that they have "shown promise", in the same way that my 6-year-old shows promise.

AI is advancing so quickly though, I bet it will be pretty good in a few years.

> RTL is not code

Of course it is. It executes on an unusual machine, but it's clearly code. I'll never understand this "hardware design is totally unlike software design" attitude that hardware designers have. Is it just so they feel special? They're really quite similar.


> RTL is not code

>>Of course it is.

Most programming languages have that one key property that I've mentioned above: ABSTRACTION. You can reason well about a function that calls another function, addTheseTwoStrangeObjectsTogether(arg, arg2), and make totally valid assumptions about how and what would happen. "Executing" that code is a depth-first call graph walk, and a pretty linear one too. You cannot do these things by reading RTL code: the state space is enormous, and there is no "unusual machine" that executes it (if you mean simulators, that's a different thing). Also, you cannot reason even about the simplest instances, since they are stateful and that is in no way exposed via their interface connections.

>AI is advancing so quickly though, I bet it will be pretty good in a few years.

Time will tell :) Cheers


I dunno what you're talking about. The whole point of the module hierarchy is abstraction.

> also you cannot reason even about simplest instances since they are stateful

Uhm... Classes? Globals?


Good catch! You want me to rewrite the paragraph to sound less like an LLM? (sigh)

This resonates with my experience. Once every 3 years I try Linux as the primary OS on my home PC: I do small stuff with C/Python, browse the web and play Factorio. I use Linux in a VM at my job daily, so I am not a beginner, but gosh, Linux sucks. Everything breaks constantly, even when doing NOTHING. Nothing ever installs on the first try; you always end up googling some stupid error message and stumbling upon 250 other idiots trying to solve the same issue. After trying 5 solutions, one (or a combo) will hopefully work. Then you can hang the entire system with a stupid Python script or your own buggy program, and I miss an unkillable, always-working task manager that can recover from almost every hang (and just stfu about REISUB) without needing to restart the whole system and killing my FKIN FLOW! Ugh. I just use WSL2 for the rare cases where I need my Unix build tools, and I've abandoned the idea of switching to Linux forever. Life is too short to waste it googling nonsense about shit that should just fucking work.

Is there any research on composing state machines?


All fine, but where is the pelican on a bicycle?


I am always puzzled by such articles. It's actually very well made: the drawings are good and the little interactive pipeline animations are fine. But to follow it you must already know and understand what it's written about, and if you don't, the content is just noise to you.


The article does say what it expects you to know before reading. However, it has a dead link to the knowledge it wants you to know.


Author here: thanks for flagging the dead link! Unfortunately, I had to remove it. I couldn't find the original slides.


Hi, I'm the author! Thanks for saying it's well made :).

I actually agree with you, the intended audience isn't someone who has never heard of CPUs before.

I tend to either write for myself (you know the saying: you don't understand something until you try to explain it) or for the self-studying person who is looking for that one explanation where everything finally clicks. I always get a lot out of those types of posts myself, so I like to create them for others too.


You could use colors in the step-by-step simulation to show dependencies. Also show tooltips/comments when the things you described above happen. Ideally, one should be able to press next, next, next in the simulation and understand what happens better than from the paragraph description above.


Imagine that, in steampunk fashion, we'd get an alternative future timeline where computer tech froze in the 80s due to some physical limitation that prohibited shrinking transistors. All typical laptops would have the same config as this awesome project. What would society become?


Speed certainly wouldn't be there, but capabilities would. Plenty could get done on those old machines — most of it had to do with programmers having the imagination & skill to be able to shoehorn their ideas into spaces they weren't meant to be crammed into.

One memory this project brought to mind for me was a hack I came across which allowed simultaneously running DOS 3.3 & ProDOS on a 128k Apple II, giving each 64k (well, a little less due to overhead) & a way to switch between the two with a simple command. Two programs couldn't run at once, but one could step between the two OSes to run programs made for each pretty seamlessly. If this sort of thing was possible on basic consumer hardware, ten or twenty years of development would have led to many far more interesting & useful things.


Nah, something like LLMs wouldn't be possible due to sheer power consumption: abstract (FL)OPs/µW is billions of times worse than with modern tech. I had Claude do some back-of-the-napkin calcs for me: a single LLM prompt on 6502-era tech would cost over 3k EUR, vs a fraction of a cent today, DISREGARDING WALL TIME (which is ridiculously impractical).
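The napkin math behind a claim like this fits in a few lines. Below is a minimal sketch, assuming the common rule of thumb of ~2 FLOPs per model parameter per generated token (dense transformer inference, ignoring attention and memory traffic). The function names and every number you feed in are hypothetical inputs, not the figures from the calculation mentioned above.

```python
J_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def prompt_energy_cost(params: float, tokens: int, flop_per_joule: float,
                       price_per_kwh: float) -> float:
    """Rough electricity cost of one LLM prompt.

    Assumes ~2 FLOPs per parameter per token (a rule of thumb for dense
    transformer inference); all inputs are user-supplied assumptions.
    """
    total_flops = 2.0 * params * tokens
    joules = total_flops / flop_per_joule
    return joules / J_PER_KWH * price_per_kwh


def prompt_wall_time_s(params: float, tokens: int, flops_per_second: float) -> float:
    """Wall-clock time for the same prompt at an assumed sustained FLOP rate."""
    return 2.0 * params * tokens / flops_per_second
```

Because cost scales as 1 / (FLOPs per joule), whatever efficiency gap you assume between an 8-bit-era machine and a modern accelerator multiplies the per-prompt cost directly; the wall-time function makes the "disregarding wall time" caveat explicit.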


I believe the actual silicon of a 6502 is much smaller than the DIP package, so even if we couldn't shrink the silicon itself much more, you could just take up more space inside the package and use a package with more pins, like current CPU designs. You would probably hit a bottleneck at some point, since I believe the speed of light limits processing speed eventually, but then I'd expect we'd just move to massively parallel systems, with multiple cores acting somewhat independently.


Okay, what if something else had prevented anything better than a 6502 from being available to the mass market?


The 6502 package would probably shrink to something like a BGA package, and you could probably make some kind of "multicore" system using 6502 processors. I'm not knowledgeable enough to say how feasible that would be, but you could probably use shared memory regions to pass data between them and run code in parallel.

If you are absolutely limited to 6502 DIP chips, large mainframe systems with single 6502-based "terminals"/"thin clients" would probably be more prevalent. The mainframes could use approaches similar to the Transputer or the Connection Machine, using large numbers of (comparatively) low-power processors to make a single, more powerful computer. Both used custom processors and appeared in the mid-1980s. You could probably fairly easily create a "graphics card" style system composed of many 6502 cores in a SIMD configuration.

I don't know how easy it would be to implement Wi-Fi or Ethernet with only 6502 chips, so communication with the mainframe might be quite slow.


Isn't this basically the idea behind Collapse OS? Chin up! That could still be our future.


I was thinking lately about how much memory you could handle on a 6502. The BBC Micro had a 16KB window paged between up to 16 ROM/RAM banks, but if you could have 256 banks you could do 4MB. One problem is that this would require a very large PCB. Another problem is that the OS searches for commands in all the ROMs, which would become slow with so many banks; one solution would be to limit the ROMs to the first few banks and let the rest be RAM.

It could be useful for some sort of minicomputer for business applications.
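Here is a minimal sketch of the 256-bank idea, modelled on the BBC Micro's 16 KiB sideways window at &8000–&BFFF. The 8-bit bank register and the flat physical map (banked memory placed above the 64 KiB main space) are assumptions for illustration; the real machine only selects among 16 banks.

```python
WINDOW_BASE = 0x8000        # sideways window starts at &8000
WINDOW_SIZE = 0x4000        # 16 KiB, so the window ends at &BFFF
NUM_BANKS = 256             # hypothetical 8-bit bank select register
MAIN_SIZE = 0x10000         # the CPU's ordinary 64 KiB address space
BANKED_CAPACITY = NUM_BANKS * WINDOW_SIZE  # 256 x 16 KiB = 4 MiB


def physical_address(cpu_addr: int, bank: int) -> int:
    """Map a 16-bit CPU address to a flat physical address.

    Addresses inside the window are redirected into the currently selected
    bank, which (by assumption) lives above the 64 KiB main space; every
    other address passes straight through.
    """
    if not 0 <= cpu_addr < MAIN_SIZE:
        raise ValueError("CPU address must fit in 16 bits")
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("bank must fit in 8 bits")
    if WINDOW_BASE <= cpu_addr < WINDOW_BASE + WINDOW_SIZE:
        return MAIN_SIZE + bank * WINDOW_SIZE + (cpu_addr - WINDOW_BASE)
    return cpu_addr
```

Scaling up is just a wider bank register: each extra select bit doubles BANKED_CAPACITY without changing anything the CPU sees.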


The Commodore REU (RAM Expansion Unit) architecture for the C64/C128 allows for up to 16 MiB: 256 banks of 256 pages of 256 bytes.

Due to the lack of support hardware in the C64 (no hardware RAM bank switching/MMU), this memory is not bank-switched in and directly addressable by the CPU; it's copied on request by DMA into actual system RAM. But in some sense, a C64 with a 16 MiB REU is a 6502 with 16 MiB of RAM.

But yeah, you want CPU-addressable RAM with real bank switching. You couldn't really do 16 MiB that way: with 256 banks, each bank would have to cover the entire 64 KiB memory space, and you wouldn't want to bank-switch all of it. The Commander X16 (a modern hobbyist 6502 computer) supports up to 2 MiB with hardware that can switch 256 banks into an 8 KiB window (2 MiB / 256 banks = 8 KiB).

Let's say you design something with 32 KiB pages instead -- that seems kind of plausible, depending on what the system does -- you could then do 256*32 = 8 MiB and still have 32 KiB of non-paged memory space available. I think this looks like just about the maximum you would want to do without the code or hardware getting too hairy.
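To make the REU point concrete: the CPU never addresses REU memory directly; it sets up a transfer and the expansion's DMA controller copies bytes. A minimal sketch, loosely modelled on the REU's "stash" (C64 to REU) and "fetch" (REU to C64) transfer directions. The class and method names are hypothetical, and none of the real register interface is modelled.

```python
REU_SIZE = 256 * 256 * 256  # 256 banks x 256 pages x 256 bytes = 16 MiB


def reu_address(bank: int, page: int, offset: int) -> int:
    """Compose a 24-bit REU address from its bank, page and byte offset."""
    for part in (bank, page, offset):
        if not 0 <= part <= 0xFF:
            raise ValueError("each part must fit in 8 bits")
    return (bank << 16) | (page << 8) | offset


class ToyREU:
    """Expansion RAM that is only reachable through block copies (DMA)."""

    def __init__(self, size: int = REU_SIZE):
        self.mem = bytearray(size)

    def _check(self, c64_ram: bytearray, c64_addr: int, reu_addr: int, length: int) -> None:
        if c64_addr < 0 or reu_addr < 0 or length < 0:
            raise ValueError("negative address or length")
        if c64_addr + length > len(c64_ram) or reu_addr + length > len(self.mem):
            raise ValueError("transfer out of range")

    def stash(self, c64_ram: bytearray, c64_addr: int, reu_addr: int, length: int) -> None:
        """Copy a block from C64 RAM into the REU."""
        self._check(c64_ram, c64_addr, reu_addr, length)
        self.mem[reu_addr:reu_addr + length] = c64_ram[c64_addr:c64_addr + length]

    def fetch(self, c64_ram: bytearray, c64_addr: int, reu_addr: int, length: int) -> None:
        """Copy a block from the REU back into C64 RAM."""
        self._check(c64_ram, c64_addr, reu_addr, length)
        c64_ram[c64_addr:c64_addr + length] = self.mem[reu_addr:reu_addr + length]
```

The contrast with real bank switching is that a program has to explicitly copy data into its 64 KiB space before it can touch it, rather than just flipping a bank register.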


Depends entirely on what banking scheme you use. Nothing stops you from adding, e.g., an 8-bit banking register (or even two of them, one for instruction fetches and another for normal memory reads/writes) to serve as bits 23–16 of a 24-bit memory bus. That's what the WDC 65C816 from 1985 does, though it also adds a full 16-bit mode.

And if you have a 16-bit CPU, you can do all kinds of silly stuff. For instance, you can have four 16-bit MSRs, let's call them BANK0–BANK3, selected by the two upper bits of a 16-bit address; the selected register provides the top 16 bits for the bus, while the lower 14 bits come from the original address. That already gives you 30 bits, for 1 GiB of addressable physical memory (and having 4 banks available at the same time instead of just 2 is way more comfortable). Nothing stops you from adding yet another four 16-bit registers, BANK0_TOP–BANK3_TOP, to serve as even higher 16 bits of the total address. That'd give you 16+16+14 = 46 bits of physical address (64 TiB), which is only slightly less than the virtual address space x64 gave you for many years (48 bits, 256 TiB).
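The BANK0–BANK3 scheme is easy to sanity-check in code. A minimal sketch of the hypothetical mapper: the register names come from the comment, everything else (class name, layout) is assumed. The top two address bits select a register pair, and the physical address is the concatenation TOP:BANK:offset.

```python
OFFSET_BITS = 14                      # low 14 bits pass through unchanged
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # 0x3FFF, a 16 KiB window
BANK_BITS = 16                        # width of each BANKn / BANKn_TOP register


class ToyBankMapper:
    """Hypothetical mapper: BANK0-BANK3 plus optional BANK0_TOP-BANK3_TOP.

    Register values are assumed to be set to 16-bit values by the caller.
    """

    def __init__(self) -> None:
        self.bank = [0, 0, 0, 0]      # supply physical bits 29..14
        self.bank_top = [0, 0, 0, 0]  # supply physical bits 45..30

    def translate(self, addr: int) -> int:
        if not 0 <= addr <= 0xFFFF:
            raise ValueError("CPU address must fit in 16 bits")
        sel = addr >> OFFSET_BITS     # the two upper bits pick BANK0..BANK3
        return ((self.bank_top[sel] << (OFFSET_BITS + BANK_BITS))
                | (self.bank[sel] << OFFSET_BITS)
                | (addr & OFFSET_MASK))
```

With the TOP registers left at zero, the highest reachable address is 2^30 - 1 (1 GiB); with them in use it is 2^46 - 1 (64 TiB).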


I was trying to get a grasp on what would be practical.

Even 4MB would take you hours to load from floppies with a 6502.

Terabytes with a 68000 would also be impractical.


> Even 4MB would take you hours to load from floppies with a 6502.

Depends on your clock. Also, you could use some dedicated hardware like a DMA controller, e.g. an 8257 or 8237. From the 8257's datasheet:

    Speed

    The 8257 uses four clock cycles to transfer byte of
    data. No cycles are lost in the master to master transfer
    maximizing bus efficiency. 2MHz clock input will
    allow the 8257 to transfer at rate of 500K bytes/second.

And I recall the 8237 could do even better, if wired and programmed properly.
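The datasheet numbers turn directly into load times. A minimal sketch: the 2 MHz clock and 4 cycles per byte come from the quoted 8257 text, while the 400 bytes/s "slow floppy" rate is purely an assumed figure for comparison. The DMA rate only covers the bus side; the drive itself would still be the bottleneck.

```python
MIB = 1024 * 1024


def dma_bytes_per_second(clock_hz: float, cycles_per_byte: int) -> float:
    """Peak DMA throughput when each byte costs a fixed number of clocks."""
    return clock_hz / cycles_per_byte


def transfer_seconds(n_bytes: int, bytes_per_second: float) -> float:
    """Time to move n_bytes at a constant rate."""
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    rate_8257 = dma_bytes_per_second(2_000_000, 4)  # 500,000 bytes/s per the datasheet quote
    slow_floppy = 400.0                              # assumed bytes/s, for illustration only
    print(f"4 MiB via 8257 DMA: {transfer_seconds(4 * MIB, rate_8257):.1f} s")
    print(f"4 MiB at 400 B/s:   {transfer_seconds(4 * MIB, slow_floppy) / 3600:.1f} h")
```

At the 8257's rate, 4 MiB moves in seconds; at the assumed floppy rate it takes hours, which is the scale of the objection about loading 4MB from floppies.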


Hard drives were available for the 6502. They were expensive ($10k for a 10MB drive, as I recall). Prices came down a lot, but they were never affordable in the 1980s.

Processing terabytes with a single CPU was impractical, but in theory you could connect that much storage.


I know someone who, in the 1990s, had 5MB connected to his Atari. He had two different expansions and used all the memory as a RAM disk; as a result, his BBS was the most responsive remote system I've ever used, including ssh to the server under my desk (open question: was it really, or is this nostalgia?).


The SN74LS610 family chips were specifically designed to do bank-switching memory management like this for 8-bit micros.


> Imagine in steampunk fashion

See The 8-bit Guy regarding what the world would be like if we were still limited to vacuum tubes: https://www.youtube.com/watch?v=mEpnRM97ACQ (video)


I think people would just try to make larger chips. You can still do a 32-bit RISC, even on the same node as the 6502. It's just a bigger chip, basically like the ARM2 or something. Then you can do a multi-core version of that.


Laptops would be a lot less common. If computers were stuck in this era for that long, fewer people would be interested. Prices would be high.


Apple XXVgs and Amiga 15,000, I’m digging this alternative.


Please stop bickering about Verilog vs VHDL: if you use NBAs (non-blocking assignments), the scheduler works exactly the same in modern-day simulators. There is no crown jewel in VHDL anymore. Also, the type system is annoying. It's just in your way, not helping at all.
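A minimal sketch of why NBAs make statement order irrelevant, as a toy Python model rather than a real simulator: with non-blocking semantics every right-hand side is evaluated against the pre-edge values before any register is updated, while blocking semantics let later statements see earlier updates. The example is a hypothetical 3-stage shift register (q1 <= d; q2 <= q1; q3 <= q2;).

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Assignment = Tuple[str, Callable[[State], int]]


def edge_nonblocking(state: State, stmts: List[Assignment]) -> State:
    """Evaluate every RHS on the pre-edge state, then commit all updates at once."""
    updates = {lhs: rhs(state) for lhs, rhs in stmts}  # evaluation phase
    return {**state, **updates}                          # update phase


def edge_blocking(state: State, stmts: List[Assignment]) -> State:
    """Each assignment takes effect immediately; later statements see it."""
    new_state = dict(state)
    for lhs, rhs in stmts:
        new_state[lhs] = rhs(new_state)
    return new_state


# Hypothetical 3-stage shift register: q1 <= d; q2 <= q1; q3 <= q2;
SHIFT: List[Assignment] = [
    ("q1", lambda s: s["d"]),
    ("q2", lambda s: s["q1"]),
    ("q3", lambda s: s["q2"]),
]
```

Starting from d=1 and all stages at 0, edge_nonblocking shifts the 1 into q1 only, whichever order SHIFT is listed in. edge_blocking in the listed order pushes it straight through to q3, and only behaves like a shift register if the statements are reversed.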


You're not wrong, but blocking assignments (and their VHDL equivalent, variables) are useful as local variables in a process/always block, for instance to factor out common sub-expressions instead of repeating them. So using only non-blocking assignments everywhere would lead to uglier code.


Of course blocking assignment is used too, and even in that always_comb case the scheduler splits evaluation and assignment into 2 phases!


Draw yourself an SR latch and try simulating it. Or a circuit known as a "pulse generator".


Both SystemVerilog and VHDL have AMS extensions for simulating analog circuits. They work pretty well but you also pay a pretty penny for the simulator licenses for them.


Those are analog circuits, if you put them in your digital design you are doing something wrong.


Don't know if you're trolling. You can build an SR latch from 2 NANDs or 2 NORs; there are plenty of *digital* circuits with that functionality, and yes, there are (very rare) cases where you construct one out of logic instead of using a library cell. A pulse circuit is AND(not(not(not(a))),a), also rarely used, but used nonetheless. To properly model/simulate them, you need delta cycles.
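A minimal sketch of what "you need delta cycles" means, as a toy zero-delay simulator in Python (not how any particular commercial simulator is implemented): every gate is re-evaluated from the previous values, all outputs update together, and this repeats until nothing changes. It covers both circuits from the thread: a cross-coupled NOR SR latch and the AND(not(not(not(a))),a) pulse generator. The net names are assumptions for illustration.

```python
from typing import Callable, Dict, List

Nets = Dict[str, int]
Gates = Dict[str, Callable[[Nets], int]]


def NOT(a: int) -> int:
    return 1 - a


def AND(a: int, b: int) -> int:
    return a & b


def NOR(a: int, b: int) -> int:
    return 1 - (a | b)


def settle(nets: Nets, gates: Gates, max_deltas: int = 100) -> List[Nets]:
    """Run delta cycles until no gate output changes.

    Returns one snapshot per delta cycle, starting with the initial values.
    Raises if the network never settles (a zero-delay oscillation).
    """
    history = [dict(nets)]
    for _ in range(max_deltas):
        updates = {out: fn(nets) for out, fn in gates.items()}  # evaluate
        if all(nets[out] == val for out, val in updates.items()):
            return history
        nets = {**nets, **updates}                              # update
        history.append(nets)
    raise RuntimeError("no stable state: zero-delay oscillation")


# Cross-coupled NOR SR latch.
SR_LATCH: Gates = {
    "q": lambda n: NOR(n["r"], n["qn"]),
    "qn": lambda n: NOR(n["s"], n["q"]),
}

# Pulse generator: y = AND(not(not(not(a))), a), one gate per net.
PULSE: Gates = {
    "n1": lambda n: NOT(n["a"]),
    "n2": lambda n: NOT(n["n1"]),
    "n3": lambda n: NOT(n["n2"]),
    "y": lambda n: AND(n["n3"], n["a"]),
}
```

Setting s=1 from the reset state latches q=1 after two delta cycles. Raising a to 1 on PULSE, starting from the settled a=0 values (n1=1, n2=0, n3=1, y=0), gives y = 0, 1, 1, 1, 0 across successive deltas: a pulse three delta cycles wide that occupies zero simulated time. Releasing s and r together after s=r=1 (q=qn=0) makes settle raise, the classic zero-delay latch race.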


I'm not sure if you are trolling. 99.999% of digital design is "if rising edge clk new_state <= fn(old_state, input)", with an (a)sync reset. The language should make that the default and simple to do, and anything else out of the ordinary hard. Now it's more the other way around.


All circuits are analog when physically realized, the digital view is an abstraction.


New process step: Cheese Vapor Deposition


I want that on my waffler


