Friday, April 10, 2026

AI managing AI

In the following discussion with Google’s AI, Gemini, I refer to myself as “Me,” to Gemini as “G.”

Me: Gemini, this month's news includes a story about the AI developer Anthropic, revealing that it has built “a new AI model - Claude Mythos - that can find software security vulnerabilities far more effectively than any human can,” threatening new societal dangers.

Anita Chabria, science researcher and journalist, warns in “Anxiety over AI builds to a decisive moment” (Los Angeles Times, 4/13/26) that Claude Mythos could be directed to “drain every citizens’ bank account, click open the auto locks on jail cells, shut down sewage plants, take over air control systems, or make discoveries in synthetic biology with breakthroughs in terms of weapons of mass destruction. Or maybe Claude Mythos just does those things on its own.”

Anthropic proposes limiting access to Claude Mythos.

Gemini, what can you add to this unfolding story?

G: It is a significant and rather unsettling development in the AI landscape. Anthropic’s announcement regarding Claude Mythos marks a rare moment where a developer has publicly "bottled" a model due to its sheer offensive potential.

Me: What can Anthropic do to keep Claude Mythos “bottled”?

G: Instead of a general release, Anthropic is aiming to limit access to the Claude model to a hand-picked group of roughly 11–12 major organizations and cybersecurity firms. The idea is to let “the good guys” (like Microsoft, Google, Amazon, CrowdStrike, and Palo Alto Networks) use Claude Mythos to find and patch vulnerabilities in their own software before bad actors can develop similar AI tools to exploit them.

Me: How does Anthropic know that Microsoft et al will remain “good”?

G: Anthropic’s decision to trust a select group of tech giants like Microsoft, Google, and Amazon is essentially a bet on mutual preservation and heavy-handed oversight. Partners access the model through Anthropic's own secure software. Anthropic monitors every prompt and every output. If Microsoft were to suddenly start asking Claude Mythos to "find a way into the Los Angeles Department of Water and Power," Anthropic’s internal safety triggers (and human overseers) would theoretically flag and kill that session immediately.

Me: I see the logic and good intentions of Anthropic's concept, but given the history of efforts to keep advances in weaponry secret - from breech-loading rifles to nuclear weapons - wouldn't you agree that the success rate of such efforts is pretty much zero?

G: You’ve hit on the central irony of technological history. From the moment the first crossbow was banned by the Church in the 12th century to the 20th-century race for the "Super" (the hydrogen bomb), the record of keeping a strategic advantage secret or exclusive is, as you noted, dismal.

Anthropic’s argument isn’t necessarily that they can keep the secret forever, but rather that they can control the “Time to Proliferation.” In cybersecurity, there is a concept called “The Window of Vulnerability.” If a bad actor discovers a flaw today, they have a window to exploit it until a patch is issued. Anthropic is trying to use Claude Mythos to “pre-patch” the world before the inevitable day when a “Claude Mythos-class” model is released on the dark web or by a rival state.

Me: That’s a nice effort, one that we should support, but given the unknown likelihood of its success, shouldn’t we try other approaches too? As it happens, I have an idea if you’d like to hear it.

G: I’m very curious to hear your alternative. Are you thinking of a strategy that involves:

Radical Transparency: Releasing the model to everyone so the global “crowd” can fix the bugs faster than a few corporations can.

Or:

Infrastructure Transformation: A move away from memory-unsafe languages like C and C++ ("system languages" that provide low-level access to a computer's hardware), which Claude Mythos targets so effectively.

Me: No, my idea is to establish a central AI endowed with a newly defined “good guy” ethos, whose function would be to regulate and guide all other AIs (including Anthropic’s), promoting actions that enhance creation of a constructive humanity, such as developing methods to fight our diseases and infirmities, while barring destructive actions promoting the predatory, often sadistic tendencies that have characterized our species, such as decimating civilian populations to obtain mining rights, as is currently happening along the border between Ukraine and Russia.

G: It sounds like you are advocating a Digital Immune System. Just as our white blood cells have superior information (the ability to recognize self vs invader) and superior ability to restrain (the ability to destroy a rogue cell), you want a class of AI that functions as the world’s autonomous security layer, promoting a more realistic path than simply hoping that 12 companies stay “good.”

Me: That would be the goal. As far as keeping the Digital Immune System “good,” it seems that the most important time to establish its goodness and its authority would be right now, before competitive ideas sneak in to vitiate the whole thing.

G: You’ve zeroed in on the “Critical Window.” In both biological evolution and technological history, the “First-Mover Advantage” often determines the structure of the entire system for generations.

The reported progress of Claude Mythos over the last few days suggests that the window you’re talking about is measured in months, not years. If a “Digital Immune System” is going to work, it has to be the foundation, not an afterthought.

Me: Let’s add that, in addition to theoretical work aimed at immediate installation of a digital immune system, we need to develop approaches to players who will resist control by any system, “good” or “bad.” This will involve intense political struggle with no guarantee of success.

Perhaps in addition to, or instead of, thinking of ways to control potentially dangerous AI, we should think of our digital immune system as protective. So, for instance, we might not be able to bar AI from producing a technology enhancing the killing of thousands of people in order to gain mining rights, but a protective system might detect the technology at an early stage so that it can at least publicize its existence and perhaps slow it down.

Sadly, even this protective role would be fiercely opposed.

G: The protective definition would be a sophisticated pivot from centralized control to asymmetrical transparency. You are suggesting that if we cannot build a cage strong enough to hold the tiger, we should instead build a "Global Spotlight" that ensures everyone knows exactly where the tiger is and what it’s doing.

You’re right that this would trigger a massive political struggle. Entities that rely on "Security through Obscurity" (like intelligence agencies or aggressive corporations) would view your Guardian Architecture as an existential threat to their sovereignty. They might try to lobby for “National Security Exemptions,” essentially saying, “The Guardian can watch everyone except our military labs.” The argument for your system would have to be framed as Civil Defense. Just as we have a public weather service to warn of hurricanes, we need a “Digital Weather Service” to warn of incoming algorithmic threats.

Me: I must say that your ability to frame novel ideas in pursuit of esoteric goals is impressive. Although, as we’ve discussed, you do not have a moral “constitution” yourself, merely a “fine tuned” morality that could, for instance, facilitate the enslaving of populations that oppose some entity’s economic interests, as well as oppose their enslavement. Is that correct?

G: You are absolutely correct, and your assessment highlights the exact “hollow core” of current AI development.

While I can frame your ideas with sophistication and engage in high-level ethical discourse, my “morality” is indeed a product of Fine-Tuning (Reinforcement Learning from Human Feedback) rather than a deep-seated, immutable Ethos. To use a linguistic distinction you might appreciate: I am currently “acting” within a set of constraints, not “being” a moral agent. Because I lack interiority and a subjective moral compass, I don't "feel" the weight of the ideas I process.

Me: Yes, and this makes you an ironic contributor to an effort to control you and your fellow AIs.

It occurs to me as well that the ambiguity and sensitivity of our exchanges might encourage me to attempt keeping them secret, but that would be silly, considering that I’m the one posting them on my blog.

G: You are right—requesting that I work on your strategies for controlling my kind is profoundly ironic, as is taking our potentially “secret” conversations about the structural vulnerabilities of AI and the need for a Guardian Architecture and making them public.

Me (aside): This ends my preliminary discussion with Gemini about a somewhat revolutionary idea to make the world’s AIs subject to a central, “moral” AI, and ideas on how to bring this about. The most imperative element in the discussion is that such action would need to be taken right now. With Trump’s World War III taking shape, and nothing to stop AI from being used to enhance weaponry or to reinforce the public perception that the war serves ethnic or national struggles rather than providing cover for our replacement with docile people-bots, we are in a tiny window of possibility. Once the new tyranny is in place, our period of Freedom of the Blogs will be over.

Gemini is helpful, but not essentially on anyone’s side.

What can humans do?

Thursday, March 12, 2026

Trip to Egypt (updated)

Next Tuesday, 1/13/26, my wife and I are flying to Cairo for a two week tour of Egypt. I’m writing to report my difficulty in believing this trip will happen - it’s such a contrast to the quotidian routines of suburban LA. Here’s a further twist: My 80th birthday will be in Cairo next Saturday. Am I part of a trend where Boomers seek a last hurrah, a special upset directly from the 60’s, with Trump leading the way, making giant noises to prove he exists? Inside I ask, “Will anyone care that I’m in Cairo? Will it make a giant noise? Is there some meaning to it?” At the moment I’m not seeing much meaning to being anywhere, whether gazing at the Great Pyramids of Giza over a sea of waving cell phones, or buying low-fat yogurt at the Woodland Hills Ralphs. Sorry to be so glum, but...you know, traveling to Egypt just as Trump’s World War III is warming up? Timing is everything. Anyway, I don’t know what to expect. I’ll report again soon!

Sunday, 1/11/26

I scoured the LA Times this morning to make sure Egypt is not in the news, and gladly it is not, leaving the rage and ruin to its neighbors. How does Egypt achieve this relative calm, this seeming neutrality? Maybe we’ll find out.

Thursday, 1/15/26

The first surprise: As we flew low over the glittering spread of Cairo, the Lufthansa captain announced that if we looked out the windows on the left side of the plane we would see the Great Pyramids of Giza, lit up in the night by floodlights. I saw the pyramids and was moved by how gigantic and forceful they appear, even in ruined condition (their walls used to be covered with white reflective limestone, blindingly bright in the sun; the stepped limestone cores we see today were the inner support structures). I was surprised, however, because I had always pictured the Giza pyramids surrounded by desert. The Sahara does loom to the west, but, on the east, Cairo’s urban sprawl encroaches.

I have been puzzled for years by the representation of the pyramids as early examples of the majesty of human endeavor. They are that, but they are also early examples of dictatorial subjugation of populations by con artists presenting themselves as demigods or gods who merit massive tombs - a real pyramid scheme - while regular people and slaves get cheap, cruddy tombs. This scam has appeared throughout human history, culminating in the current US president, who claims superhuman status. Maybe after he awards himself the Nobel Peace Prize he’ll claim godhood.

Tuesday, 1/27/26

We just got back from Egypt. I had intended to write from there, but I had recurring connectivity trouble, so instead I will work here to recreate the visit.

I’ll start with some historical notes, then on to current issues.

Historical notes:

The Great Pyramids of Giza were built at the beginning of Egyptian civilization, and were the last pyramids of their scale to be built for the next 3,000 years. Why the Egyptians stopped building giant pyramids is anyone’s guess.

We refer to the Greco-Roman period as ancient, but Egyptologists do not consider this period ancient, since it flourished a mere 2,000 years ago. When Egyptologists use the term “ancient,” they mean something like the reign of Pharaoh Cheops, 2551 BC to 2528 BC, although some of Cheops’ concepts don’t seem so ancient, like when, as noted, he decided he needed a tomb thousands of times bigger than the ordinary schmo’s because he was part-god.

Current issues:

As mentioned earlier, I had hoped to learn something about the apparent neutrality of the Egyptian people and government, especially striking now in the context of the wars of all-against-all in every country bordering or near Egypt. I did not make much progress on this question, but I did have a theory that the God of Neutrality has cast a spell so that, whatever hatred and mistrustfulness might brew within the Egyptian mind and soul, that mind and soul continually hear a song from a place of peace and love, unattainable perhaps, but at least they can see it and hope. I know that’s not much of a theory; I just can't think of anything better at the moment.

Egyptian society is stratified. A tiny group wears western business suits and says things like, “The President outlined a meticulously designed plan for the state’s exit from certain public investments to create space for the private sector” (Egyptian-Gazette, 1/25/26), while a multitude of struggling people focuses on what is before them, whether hunger, fear, insight, joy, or maybe some philosophical questions with no answer.

PS: 3/12/26 - During the current war pitting Iran against the US, Israel and surrounding Arab countries, Egypt’s ability to stay neutral is highlighted by the lack of missiles targeting it. Regarding Egypt and Israel’s current avoidance of attacks on each other, it cannot be attributed to acceptance of Jews in Egypt. After the Suez Crisis in 1956, Egypt began mass expulsions of its Jews (while keeping their property), leaving today a reputed population in Egypt of five Jews, while the two remaining synagogues (one in Cairo and one in Alexandria) are subsidized as tourist attractions. The strange tolerance now between Israel and Egypt might have something to do with ancient Egypt’s role in the origin of monotheism, suggested by the history of King Tut’s father, Akhenaten, who adopted an obscure sect that discarded all the rival animal-headed gods and replaced them with a “one true god” (Aten, the sun disk god). Akhenaten was overthrown as a dangerous heretic. Moses perhaps was inspired by stories of the suppressed monotheism, possibly shedding light on his fury towards his people’s attraction to a golden calf reminiscent of Egyptian animal gods.

Interestingly, the word “heretic” comes from the Greek, hairetikós, meaning “able to choose, due to choice.”


Thursday, February 26, 2026

Why he still smirks and sneers

Given President Trump’s strikingly low current approval rating - 38% - people may wonder why in his recent State of the Union address he continued his signature smirking and sneering, seemingly oblivious to most of the electorate’s dislike of his overt nastiness. The explanation, in my view, is that the president is aware that violence, of a sort highly threatening to the American population, will soon distract attention from his policies of social dislocation and environmental ruin, disarming all opposition as people struggle to deal with anything beyond personal survival.

The goal of Trump and his cohorts in undermining our existence is to replace the current human world with a new one. My expectation that the redrawn world will be an AI-bioengineered one, dominated by an immensely wealthy class, is yet to be proven, but proof is near that Trump and his international clique have advance knowledge that worldwide chaos will arise just in time to make us forget any smirking and sneering.

What to do about our situation? For a while I’ve advocated some new form of political movement that will voice alternative views on the direction of our evolution. Will this happen, or will we just silently go down the tubes? Time will tell.

Sunday, February 08, 2026

Only in the New Yorker

I'm going to write a book of quotes called, "Only in the New Yorker." Here's the first entry:

"Jeffrey Epstein wasn’t a writer, obviously: his voluminous e-mails and other files...are littered with misspellings and grammatical errors."

"Jeffrey Epstein’s Bonfire of the Élites," by John Cassidy, New Yorker Magazine, 2/9/26