A Wyld Ride
It's Been Wyrd
"It's dangerous to go alone, here take this.
It'll help."
It has been a wyld ride, with a y to show how weird and cool it's been. Only a month ago, I signed up for Windsurf IDE as a trial run with vibe coding, using it to build a browser-based game for a game jam that sounded fun. Thanks, Pietr, for the inspiration to have fun.
I learned a lot, and had even more fun along the way.
I learned how to spot a loop while vibe coding, and how to surf/steer the agent out of it: break the tide, crest the wave, keep paddling out, and catch the larger waves of progress.
I learned how to manage my own VPS, deploy my own apps, and network them with a reverse proxy and SSL. Mind you, I've done each of these before, but I have clearly leveled up my skills at this point.
I learned how to build and use webhooks, something I'd been putting off for no good reason for a couple of years now. It took less than an hour to build and deploy my first one. Now I have multiple in active deployment.
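If you've been putting webhooks off like I was, here's roughly the shape of one. This is a minimal sketch assuming Flask; the /deploy-hook endpoint and the secret are illustrative placeholders, not my actual setup, and the signature check follows the GitHub-style header convention.

```python
# Minimal webhook receiver sketch (endpoint name and secret are placeholders).
# Verifies an HMAC signature the way GitHub-style webhooks do, then reacts.
import hmac
import hashlib
from flask import Flask, request, abort

app = Flask(__name__)
SECRET = b"replace-with-your-shared-secret"

@app.route("/deploy-hook", methods=["POST"])
def deploy_hook():
    # GitHub sends "sha256=<hexdigest>" of the raw body in this header.
    signature = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        abort(401)
    payload = request.get_json(silent=True) or {}
    # React to the event: kick off a redeploy, post a notification, etc.
    print("received event:", payload.get("action", "unknown"))
    return {"ok": True}, 200

if __name__ == "__main__":
    app.run(port=9000)
```

That's the whole trick: one endpoint, one signature check, then whatever automation you want on the other side.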
And every frontier AI lab has made major releases in this last month. I can't even begin to explain how awesome that has been for some things I've been working on. What I can say is that the loop patterns I had issues with in Windsurf, before deciding it wasn't the tool for me, are eerily similar to the patterns of mistakes I see in OpenAI's o4-mini model.
Despite the higher context limit, it still loses the plot partway through a large task. If these bigger context windows aren't really paying off in better results per unit of compute, I think we need smarter context management.
Which brings me back to @VictorTaelin's post on x.com about their project NeoGen. They mused that the frontier labs may achieve AGI before they can complete their project, and floated the idea of releasing an LLM that uses NeoGen under the hood, like a tool call.
I suggested they use the BitNet 1.58-bit architecture to train that model, to help take advantage of the incredible speed and efficiency of NeoGen. https://x.com/ErroneousGaes/status/1913265681300082952
Imagine a model like this being small enough to function as both agent and tool in a self-contained app/process that can live on edge devices and be used by the "smarter" agents/systems as repair and engineering teams for user devices and networks. You wrap the BitNet/NeoGen model in an agent wrapper that includes MCP and A2A protocol support, then put the whole thing in a cloneable container format so that a smart/orchestrator agent can call on as many of these BitNet/NeoGen agents as it needs to achieve optimal results for its goals.
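To make that shape concrete, here's a toy sketch of an orchestrator cloning containerized worker agents on demand. Everything in it is hypothetical: the image name, the port, and the JSON-over-HTTP /task endpoint stand in for whatever a real MCP/A2A-speaking wrapper would actually expose.

```python
# Toy orchestrator sketch: clone containerized BitNet/NeoGen worker agents on demand.
# The image name, ports, and /task endpoint are hypothetical placeholders.
import json
import subprocess
import urllib.request

IMAGE = "bitnet-neogen-agent:latest"  # hypothetical container image

def spawn_worker(index: int) -> tuple[str, int]:
    """Start one cloned worker container, exposing its (assumed) HTTP task port."""
    host_port = 8100 + index
    container_id = subprocess.check_output(
        ["docker", "run", "-d", "--rm",
         "-p", f"{host_port}:8000",  # assumes the wrapper serves tasks on :8000
         IMAGE],
        text=True,
    ).strip()
    return container_id, host_port

def send_task(host_port: int, task: dict) -> dict:
    """Hand a task to a worker over a hypothetical JSON-over-HTTP endpoint."""
    req = urllib.request.Request(
        f"http://localhost:{host_port}/task",
        data=json.dumps(task).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # The orchestrator decides how many clones it needs and fans the work out.
    workers = [spawn_worker(i) for i in range(3)]
    results = [send_task(port, {"goal": "diagnose wifi dropouts"}) for _, port in workers]
    print(results)
```

The point isn't the plumbing; it's that the worker is cheap enough to clone that scaling the pool up or down becomes a one-line decision for the orchestrator.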
Because if anything has been learned in the last four years of AI becoming more and more useful, it's that everything is layers upon layers.
Lots of people pick an architecture they think is going to produce AGI. They form into camps chanting for their chosen model type, whether that is Transformers, Diffusion, etc.
This is the same pattern humans follow for all things in society and life. Pick a favorite, build an identity around that favorite, and fight to the death to preserve the idea that their favorite is best because they are best.
I believe we will find that AGI comes from properly organizing all these different cognitive architectures into the right layers of a larger dynamic-interactive system.
I imagine using RNNs for components of memory layers, alongside various RAG and CAG systems, to simulate conscious and subconscious memory circuits/processes. The ability to retrain them repeatedly as they are used, by hot-swapping versions of the same model, makes them ideal for this type of use. Diffusion transformers, on the other hand, are ideal for video generation, which is ultimately a form of self-image/imagination circuit/process.
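Here's a toy sketch of what I mean by hot-swapping, with every name hypothetical: the rest of the system talks to a stable interface while newer versions of the underlying memory model are trained offline and swapped in without taking anything down.

```python
# Toy sketch of a hot-swappable memory component (all names hypothetical).
# The rest of the system talks to a stable interface, while newer versions of
# the underlying model can be trained offline and swapped in while running.
import threading

class MemoryModel:
    """Stand-in for an RNN-based (or RAG/CAG-backed) memory module."""
    def __init__(self, version: int):
        self.version = version

    def recall(self, query: str) -> str:
        return f"[v{self.version}] recalled context for: {query}"

class HotSwappableMemory:
    def __init__(self, initial: MemoryModel):
        self._model = initial
        self._lock = threading.Lock()

    def recall(self, query: str) -> str:
        with self._lock:            # readers always see a complete model
            return self._model.recall(query)

    def swap(self, new_model: MemoryModel) -> None:
        with self._lock:            # atomically replace the running version
            self._model = new_model

memory = HotSwappableMemory(MemoryModel(version=1))
print(memory.recall("what did the user ask yesterday?"))
memory.swap(MemoryModel(version=2))   # trained offline, swapped in without downtime
print(memory.recall("what did the user ask yesterday?"))
```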
I believe what will ramp AGI-type systems beyond human ability will come from the robotics market, as robots add more and more non-human sensory data to their training datasets.
Increasing the dimensions of data modality is the best way to continue scaling up the available training data as we reach for more intelligent AI systems. Combining the kinds of sensors used in robots provides a wealth of new dimensions beyond the basic five physical senses we currently imitate alongside language and image/audio/video generation.
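As a rough illustration of what I mean by added dimensions, here's a sketch of a fused training sample; the sensor names and shapes are purely illustrative, not any real dataset's schema.

```python
# Toy sketch of a fused multimodal training sample (sensor names are illustrative).
# Each extra sensor stream is another dimension of training data beyond the
# text/image/audio/video modalities we already imitate.
from dataclasses import dataclass, field

@dataclass
class MultiModalSample:
    text: str               # language, as today
    image: list[float]      # flattened camera frame (placeholder)
    audio: list[float]      # microphone window (placeholder)
    lidar: list[float]      # depth sweep, no direct human analogue
    imu: list[float]        # acceleration/orientation, a balance sense
    thermal: list[float]    # temperature map, beyond human vision
    metadata: dict = field(default_factory=dict)

sample = MultiModalSample(
    text="robot picks up the mug",
    image=[0.0] * 16,
    audio=[0.0] * 8,
    lidar=[1.2, 1.1, 0.9],
    imu=[0.0, 0.0, 9.8],
    thermal=[21.5, 22.0],
)
print(len(sample.lidar), sample.metadata)
```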
Our tendency is to bias toward a single modality, or two at most, when we ourselves are built to think and feel using far more than the classical five physical senses we use to label all experience outside ourselves. Why wouldn't an AI benefit from the same? I think it would. The challenge at the moment is learning how to bridge them all into one whole that can recursively self-improve. But I also think we are at the point where that is a challenge we can solve in fairly short order, if research goes in the right directions.
Who knows where we'll be six months from today?