Tuesday, July 30, 2024

Center Of Mass (Or How To Think Strategically About Generative AI)

It may seem like generative AI is moving too fast right now for cogent strategic thinking.  At the edges of it, that is probably right.  Those "up in the high country," as Lloyd Bridges might put it, are dealing with incalculably difficult technical and ethical challenges and opportunities as each new version of Claude, ChatGPT, Gemini, Llama, or other foundational large language model tries to outperform yesterday's release.

 

That said, while all this churn and hype is very real at the margins, I have seen a fairly stable center start to emerge since November 2022, when ChatGPT was first released.  What do I mean, then, by "a fairly stable center"?

For the last 20 months, my students, colleagues, and I have been using a wide variety of generative AI models on all sorts of problems.  Much of this effort has been exploratory, designed to test these tools against realistic, if not real, problems.  Some of it has been real, though--double-checked and verified--real products for real people.  

It has never been standalone work, however.  No one in the center of mass is ready or comfortable completely turning over anything but scut work to the AIs.  In short, anyone who uses a commercially available AI on a regular basis to do regular work rapidly comes to see these tools as useful assistants, unable to do most work unsupervised, but of enormous benefit otherwise. 

What else have I learned over the last 20 months? 

As I look at much of what I have written recently, it has almost all been about generative AI and how to think about it.  My target audience has always been regular people looking for an edge in doing regular work--the center of mass.  My goal has been to find the universals--the things that I think are common to a "normal" experience with generative AI.  I don't want to trivialize the legitimate concerns about what generative AIs might be able to do in the future, nor to suggest I have some sort of deep technical insights into how it all works or how to make it better.  I do want to understand, at scale, what it might be good for today and how best to think about it strategically.

My sources of information include my own day-to-day experience of the grind with and without generative AI.  I can supplement that with the experiences of dozens of students and my faculty colleagues (as well as with what little research is currently available).  Altogether, we think we have learned a lot of "big picture" lessons.  Seven to be exact:
  1. Generative AI is neither a savior nor Satan.  Most people start out in one of these two camps.  The more you play around with generative AIs, the more you realize that both points of view are wrong and that the truth is more nuanced.
  2. Generative AI is so fast it fools you into thinking it is better than it is.  Generative AI is blindingly fast.  A study done last year using writing tasks for midlevel professionals found that participants were 40% faster at completing the task when they used the then-current version of ChatGPT.  Once they got past the awe they felt at the speed of the response, however, most of my students said the quality of the output was little better than average.  That same study found much the same thing: speed improved 40%, but the average quality of the writing improved only 18%.
  3. Generative AI is better at form than content.  Content is what you want to say and form is how you want to say it.  Form can be vastly more important than content if the goal is to communicate effectively.  You'd probably explain Keynesian economics to middle-schoolers differently than you would to PhD candidates, for example.  Generative AI generally excels at re-packaging content from one form to another.  
  4. Generative AI works best if you already know your stuff.  Generative AI is pretty good and it is getting better fast.  But it does make mistakes.  Sometimes it is just plain wrong and sometimes it makes stuff up.  If you know your discipline already, most of these errors are easy to spot and correct.  If you don't know your discipline already, then you are swimming at your own risk.
  5. Good questions are becoming more valuable than good answers.  In terms of absolute costs to an individual user, generative AI is pretty cheap, and the cost of a good or good-enough answer is plummeting as a result.  This, in turn, implies that the value of a good question is going up.  Figuring out how to ask better questions at scale is one largely unexplored way to get a lot more out of a generative AI investment.
  6. Yesterday's philosophy is tomorrow's AI safeguard.  AI is good at some ethical issues, lousy at others (and is a terrible forecaster).  A broad understanding of a couple thousand years of philosophical thinking about right and wrong can actually help you navigate these waters.
  7. There is a difference between intelligence and wisdom.  There is a growing body of researchers who are looking beyond the current fascination with artificial intelligence and towards what some of them are calling "artificial wisdom."  This difference--between intelligence and wisdom--is a useful distinction that captures much of the strategic unease with current generative AIs in a single word.
These "universals" have all held up pretty well since I first started formulating them a little over a year ago.  While I am certain they will change over time and that I might not be able to attest to any of them this time next year, right now they represent useful starting points for a wide variety of strategic thought exercises about generative AIs.

Monday, July 8, 2024

How Good AIs Make Tough Choices

Rushworth Kidder, the ethicist, died 12 years ago. I never met him, but his book "How Good People Make Tough Choices" left a mark. It was required reading in many of my classes, and I still think it is the best book available on the application of philosophy to the moral problems of today.  

Why?  For a start, it is well-organized and easy to read.  Most importantly, though, it doesn't get lost in the back-and-forth that plagues some philosophical discussions.  Instead, it tries to provide a modicum of useful structure to help normal people make hard decisions.  In the tradition of some of the earliest philosophers, it is about the application of philosophical thinking to everyday life, not about abstract theorizing.

Don't get me wrong.  I am not against abstract theorizing.  I'm a futurist.  Speculation masquerading as analysis is what I do for a living, after all.  It is just that, at some point, we are all faced with tough decisions, and we can either let the wisdom of hundreds of philosophers over thousands of years inform our thinking or we can go on instinct.  William Irvine put the consequences even more directly: 

"Why is it important to have such a philosophy? Because without one, there is a danger that you will mislive—that despite all your activity, despite all the pleasant diversions you might have enjoyed while alive, you will end up living a bad life. There is, in other words, a danger that when you are on your deathbed, you will look back and realize that you wasted your one chance at living."

One of the most common questions I get asked these days sits at the intersection of these "tough choices" Kidder was talking about and artificial intelligence.  There is a lot of (justifiable) hand-wringing over what we can, and should, turn over to AIs on the one hand, and over the consequences of not turning enough over to them on the other.

For me, these questions begin with another:  What can AIs do already?  In other words, where can AIs clearly outperform humans today?  Fortunately, Stanford collates exactly these kinds of results in an annual AI Index (Note:  They don't just collate them, they also put them in plain English with clear charts--well done, Stanford!).  The results are summarized in the table below:

Items in dark red are where AIs have already surpassed humans.  The light red is where there is evidence that AIs will surpass humans soon.  This table was put together with help from Claude 3, the AI I think does the best job of reading papers.  I spot-checked a number of the results and they were accurate, but your mileage may vary.  The estimated time to surpass humans is all Claude, but the time frames seem reasonable to me as well.  If you want the full details, you should check out the Stanford AI Index, which you should do even if you don't want the full details.

The most interesting row (for this post, at least) is the "Moral Reasoning" row.  Here there is a new benchmark, the MoCa benchmark for moral reasoning.  The index highlighted the emergence of harder benchmarks over the last year, stating, "AI models have reached performance saturation on established benchmarks such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones."  In other words, AIs were getting so good, so fast that researchers had to come up with a whole slew of new tests for them to take, including the MoCa benchmark.

MoCa is a clever little benchmark that uses moral and causal challenges from existing cognitive science papers where humans tended to agree on factors and outcomes.  The authors of the paper then present these same challenges to a wide variety of AIs and score the AIs based on something called "discrete agreement" with human judges.  Discrete agreement appears, by the way, to be the scientific name for just plain "agreement"--go figure.  The chart below is from the AI Index, not the original paper, but it summarizes the results:

From the Stanford AI Index.  Scores are from 0-100 with higher scores equaling higher agreement with human judgement.  

If you are scoring things at home, this chart makes AIs look pretty good until you realize that the y-axis doesn't include the full range of possible values (a little data-viz sleight of hand there...).  This sort of professorial nit-picking might not matter, though.  The MoCa study was published in late 2023, and there is already a 2024 study out of the University of North Carolina and the Allen Institute that shows significant improvement--albeit on a different benchmark and with a new LLM.  Specifically, the researchers found "that advice from GPT-4o is rated as more moral, trustworthy, thoughtful, and correct than that of the popular The New York Times advice column, The Ethicist."  See the full chart from the paper below:

Taken from "Large Language Models as Moral Experts? GPT-4o Outperforms Expert Ethicist in Providing Moral Guidance" in pre-print here:  https://europepmc.org/article/PPR/PPR859558 

While these results suggest improvement as models get larger and more sophisticated, I don't think I would be ready to turn over moral authority for the kinds of complex, time-sensitive, and often deadly decisions that military professionals routinely have to make to the AIs anytime soon.

OK.  

Stop reading now.

Take a breath.

(I am trying to keep you from jumping to a conclusion.)  

As you read the paragraph above (the one that begins, "While these results..."), you probably thought one of two things.  Some of you may have thought, "Yeah, the AIs aren't ready now, but they will be and soon.  It's inevitable."  Others of you may have thought, "Never.  It will never happen.  AIs simply cannot replace humans for these kinds of complex moral decisions."  Both positions have good arguments in favor of them.  Both positions also suffer from some major weaknesses.  In classic Kidder-ian fashion, I want to offer you a third way--a more nuanced way--out of this dilemma.

Kidder called this "third way forward, a middle ground between two seemingly implacable alternatives" a trilemma.  He felt that taking the time to try to re-frame problems as trilemmas was an enormously useful way to help solve them. It was about stepping back long enough to imagine a new way forward.  The role of the process, he said, "is not always to determine which of two courses to take. It is sometimes to let the mind work long enough to uncover a third."

What is this third way? Once again, Kidder comes in handy.  He outlined three broad approaches to moral questions:

  • Rules-based thinking (e.g. Kant and the deontologists, etc.)
  • Ends-based thinking (e.g. Bentham and the utilitarians, etc.)
  • Care-based thinking (e.g. The Golden Rule and virtually every religion in the world)
Each of these ways of looking at moral dilemmas intersects with AIs and humans in different ways.

AI is already extremely good at rules-based thinking, for example.  We see this in instances as trivial as programs that play Chess and Go, and we see it in military systems as sophisticated as Patriot and Phalanx.  If we can define a comprehensive rule set (a big “if”) that reliably generates fair and good outcomes, then machines likely can and should be allowed to operate independently.
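
To make "rules-based" concrete, here is a deliberately simplified sketch in Python.  Every attribute, threshold, and rule in it is invented for illustration--this is not how Patriot, Phalanx, or any real system works--but it shows the essential property: the machine applies a fixed, auditable rule set and nothing else.

```python
# Illustrative only: invented attributes, thresholds, and rules.
from dataclasses import dataclass

@dataclass
class Track:
    is_inbound: bool        # closing on the defended asset?
    speed_mps: float        # measured speed in meters per second
    responds_to_iff: bool   # answered the identification-friend-or-foe challenge?

def engage(track: Track) -> bool:
    """Return True only if every rule in the fixed, auditable rule set is satisfied."""
    rules = [
        track.is_inbound,           # Rule 1: must be heading toward the defended asset
        track.speed_mps > 250,      # Rule 2: faster than anything friendly expected (invented threshold)
        not track.responds_to_iff,  # Rule 3: must have failed the IFF challenge
    ]
    return all(rules)

print(engage(Track(is_inbound=True, speed_mps=300.0, responds_to_iff=False)))  # True
print(engage(Track(is_inbound=True, speed_mps=300.0, responds_to_iff=True)))   # False
```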

Ends-based thinking, on the other hand, requires machines to be able to reliably forecast outcomes derived from actions, including second, third, fourth, etc. order consequences.  Complexity Theory (specifically the concept of sensitive dependence on initial conditions) suggests that perfect forecasting is a mathematical impossibility, at least in complex scenarios.  Beyond the math, practical experience indicates that perfection in forecasting is an unrealistic standard.  All this, in turn, suggests that the standard for a machine cannot be perfection.  Rather, it should be “Can it do the job better than a human?”
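
If "sensitive dependence on initial conditions" sounds abstract, the sketch below makes it concrete.  It uses the logistic map, a standard textbook toy from chaos theory rather than a model of any real forecasting problem: two forecasts that start out differing by one part in a million bear no resemblance to each other within a few dozen steps.

```python
# The logistic map: a one-line model famous for chaotic behavior at r = 4.
def logistic_map(x0: float, r: float = 4.0, steps: int = 40) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.400000)   # one forecaster's starting estimate
b = logistic_map(0.400001)   # another, off by one part in a million

for step in (0, 10, 20, 30, 40):
    print(f"step {step:2d}: {a[step]:.4f} vs {b[step]:.4f}  (gap {abs(a[step] - b[step]):.4f})")
```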

The “Can the machine do the job better than a human?” question is actually composed of at least three different sub-questions:
  • Can the machine do the job better than all humans?  An appropriate standard for zero-defect environments.
  • Can the machine do the job better than the best humans?  An appropriate standard for environments where there is irreducible uncertainty.
  • Can the machine do the job better than most humans?  A standard that is appropriate where solutions need to be implemented at scale.
If "the job" we are talking about is forecasting, in turns out that the answer, currently, is: Not so much. Philipp Schoenegger, from the London School of Economics, and Peter Park from MIT recently posted a paper to ArXiv where they showed the results of entering GPT-4 into a series of forecasting challenges on Metaculus. For those unfamiliar with Metaculus, it is a public prediction market that looks to crowdsource answers to questions such as Will the People's Republic of China control at least half of Taiwan before 2050? or Will there be Human-machine intelligence parity before 2040?

The results of the study? Here, I'll let them tell you:
"Our findings from entering GPT-4 into a real-world forecasting tournament on the Metaculus platform suggest that even this state-of-the-art LLM has unimpressive forecasting capabilities. Despite being prompted with established superforecasting techniques and best-practice prompting approaches, GPT-4 was heavily outperformed by the forecasts of the human crowd, and did not even outperform a no-information baseline of predicting 50% on every question."

Ouch.
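
It is worth unpacking what "a no-information baseline of predicting 50% on every question" means in practice.  One common way to score probabilistic forecasts is the Brier score--the mean squared error between the forecast probability and the 0/1 outcome--and whether or not that was the exact metric used in the tournament, the hypothetical sketch below shows why a constant 50% forecast is the floor any useful forecaster should clear.

```python
# Hypothetical data; the point is the arithmetic, not the specific study.
import random

def brier(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

random.seed(0)
outcomes = [random.randint(0, 1) for _ in range(1000)]   # imaginary resolved yes/no questions

always_50 = [0.5] * len(outcomes)                        # the no-information baseline
skilled = [0.8 if o else 0.2 for o in outcomes]          # a hypothetical well-calibrated crowd

print(f"constant 50% baseline:      {brier(always_50, outcomes):.3f}")  # exactly 0.250, no matter what happens
print(f"hypothetical skilled crowd: {brier(skilled, outcomes):.3f}")    # 0.040
```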

Ends-based thinking is very much a part of most military decisions. If AIs don't forecast well and ends-based thinking requires good forecasting skills, then it might be tempting to write AIs off, at least for now. The trilemma approach helps us out in this situation as well, however. Powerful stories of hybrid human/machine teams accomplishing more than either machines or humans alone are starting to appear. As more and more of these stories accumulate, it should be possible to detect the "golden threads," the key factors that allow the human and machine to optimally integrate.

Finally, Kidder defined care-based thinking as “putting love for others first.”  It is here that machines are at their weakest against humans.  There are no benchmarks (yet) for concepts such as “care” and “love.”  Furthermore, no one seems to expect these kinds of true feelings from an AI anytime soon.  Likewise, care-based thinking requires a deep and intuitive understanding of the multitude of networks in which all humans find themselves embedded.  

While the machines have no true ability to demonstrate love or compassion, they can simulate these emotions quite readily.  Whether it is because of anthropomorphic bias, the loneliness epidemic, or other factors, humans can and do fall in love with AIs regularly.  This tendency turns the AIs' weakness into a strength in the hands of a bad faith actor.  AIs optimized to elicit sensitive information from unsuspecting people are likely already available or will be soon.

Beyond the three ways of thinking about moral problems, Kidder went on to define four scenarios that are particularly difficult for humans and are likely to be equally challenging for AIs. Kidder refers to these as “right vs right” scenarios, “genuine dilemmas precisely because each side is firmly rooted in one of our basic, core values.” They include:
  • Truth vs. loyalty
  • Individual vs. community
  • Short-term vs. long-term
  • Justice vs. mercy
Resolving these kinds of dilemmas involves more than just intelligence. These kinds of problems seem to require a different characteristic--wisdom--and wisdom, like intelligence, can, theoretically at least, be artificial.

Artificial Wisdom is a relatively new field (almost 75% of the articles in Google Scholar that mention Artificial Wisdom have been written since 2020). The impetus behind this research seems to be a genuine concern that intelligence is not sufficient for the challenges that face humanity. As Jeste, et al. put it, “The term “intelligence” does not best represent the technological needs of advancing society, because it is “wisdom”, rather than intelligence, that is associated with greater well-being, happiness, health, and perhaps even longevity of the individual and the society.”

I have written about artificial wisdom elsewhere and I still think it is a useful way to think about the problem of morality and AIs. For leaders, "wisdom" is a useful shorthand for communicating many of the concerns they have about turning operations, particularly strategic operations, over to AIs. I think it is equally useful for software developers, however. Wisdom, conceptually, is very different from intelligence but no less desirable. Using the deep literature about wisdom to help reframe problems will likely lead to novel and useful solutions.

Monday, February 5, 2024

The Battle of Moore's Chasm And Who Will Win The Next War

There is a battle going on right now.  It is being fought by every military in the world.  

Victory in this battle is crucial.  The militaries on the winning side will likely be on the winning side of the next large-scale war.  The losers will likely be forgotten, studied only for the mistakes they made.

This is the Battle of Moore's Chasm.

This battle is taking place everywhere.  There are physical manifestations of it in Ukraine, the Taiwan Strait, and Gaza, but there are equally important conceptual and theoretical manifestations of it in the Pentagon, on Arbatskaya Square in Moscow, and deep inside the August 1 Building in Beijing.

What this battle is about and how to win it are the subjects of this article.

What Is The Battle Of Moore's Chasm?

To understand this battle, it is necessary first to travel back to 1962.  It was then that a young professor of rural sociology, Everett Rogers, published what was to become the second most cited book in all the social sciences, Diffusion of Innovations.

While the book contains much that is still relevant today, the part that is important to the current battle is the idea that the "market" for an idea, an innovation, a new concept, or a technology generally follows a bell curve and that this bell curve can be divided into five major sections of users (See chart below):  Innovators, Early Adopters, Early Majority, Late Majority and Laggards. 

Source:  https://en.wikipedia.org/wiki/Diffusion_of_innovations

Fast forward to 1989, when two researchers at the famous consulting firm Regis McKenna, Inc. (RMI), Warren Schirtzinger and James Lee, hypothesized and then demonstrated that there was a "chasm" between the Early Adopters and the Early Majority.  

This chasm existed largely due to the different motivations of the members of these groups.  Innovators and Early Adopters are very much into cool, new things.  They tend to be more enamored with the potential of a new technology or process than they are with the utility or scalability of these products.  Early and Late Majority motivations, on the other hand, typically have more to do with solving particular problems and doing so at the lowest cost and at a scale that is appropriate for their organization.

Another researcher at RMI, Geoffrey Moore, picked up on the idea and, in 1991, published what was to become one of the most influential business books ever, Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers. Now in its third edition, it has sold over a million copies and is considered by Inc. magazine as one of the top ten marketing books ever written. Think Insights has a good article that lays out the main ideas in detail, but for our purposes, their chart showing the chasm is sufficient:

Think Insights (January 3, 2024) Crossing The Chasm – Technology Adoption Lifecycle. Retrieved from https://thinkinsights.net/strategy/crossing-the-chasm/.

Most importantly, Moore's Chasm has become synonymous with the place where good ideas go to die.  Whether it is a lack of capital, innovator inexperience, or an inability to get traction in the much more lucrative Early and Late Majority markets, failure to bridge the chasm leads, at best, to relegation to a niche market and, at worst, to inevitable decline and bankruptcy. 

While almost all of these ideas and the literature accompanying the chasm have come out of business journals, the concept correlates directly with issues faced by militaries around the world.  Indeed, Secretary of the Army Christine Wormuth recently said:

“This is a crucial moment for the Army to summon our ingenuity, to innovate and invest in emerging technologies, to test and develop in uncharted areas like artificial intelligence and contested domains like space and cyber, to reshape and transform the force to be more adaptable and flexible.”

Yet, across the globe, the gap between how much militaries want to innovate and how much they are actually innovating seems to be heading in the wrong direction.  As the Boston Consulting Group highlighted in its report last year on the defense innovation readiness gap:

"One of (the report's) most important findings is that the defense innovation readiness gap significantly increased in the year since our first study. Across 10 of the 11 dimensions of readiness assessed, MoDs failed to match their 2021 results, by an average of 8%."

Moreover, there is some evidence to suggest that this chasm exists within the US Department of Defense as well.  A recent report by the Center for Security and Emerging Technology put it this way:

"However, under the DOD’s current organizational structure, defense innovation is disconnected from defense procurement. This division limits innovation offices’ ability to affect technological change across the military and excuses procurement offices from integrating cutting-edge capabilities into major systems and platforms." (Italics mine)

The Battle of Moore's Chasm is real, and right now, no one is winning.

Who Will Fight This Battle?

While there are a number of possible ways to win the battle (the CSET paper, for example, references three), all of these courses of action require the right people to implement them.  Acquisition officers, policy wonks, commanders, and others all do and will have their role to play.  The most important warrior in this battle, however, is the innovation champion.

Developed about the same time as Diffusion of Innovations Theory, the idea of an innovation champion was first put forward by Dr. Donald Schön in the Harvard Business Review article, "Champions for radical new inventions."  Since then, thousands of articles (Google Scholar says about 2140) have been written about the role, traits, and importance of innovation champions in driving modernization and incorporating emerging technologies across a wide variety of fields.  

All of the more modern definitions of innovation champion are similar to the one developed by researchers at the German Graduate School of Management and Law:  "an innovation champion is an individual or a group of individuals who is willing to take risks to enthusiastically promote innovations through the various stages of the development process."


This same paper identified five skills, seven traits, and three different kinds of knowledge that were characteristic of innovation champions based on a systematic literature analysis looking at 85 of the most influential journal articles on the topic (See image to the left).

The approach here is similar to the approach taken by the US Army in teaching leadership.  With leadership, the Army focuses on Attributes (roughly equivalent to Traits in the chart to the left) and Competencies (roughly equivalent to Skills and Knowledge in the chart).  A fundamental premise of Army leadership training is that "most people have leadership potential and can learn to be effective leaders."  The same could be said, perhaps, for innovation champions.

While the approach is similar, there is not a one-to-one correlation between what the Army thinks makes a good leader and what is necessary for an innovation champion (See chart below and to the right).

Source:  ADP 6-22, Army Leadership and the Profession, 2019

 
In short, while routine Army leadership training likely covers many of the attributes of an innovation champion, it is equally likely that there are several gaps that will need to be filled if the Army is to have the warriors it needs for the ongoing battle.

Specifically, having the minimal technical knowledge necessary to champion particular innovations jumps out as one such requirement.  Many soldiers are so deeply involved in the day-to-day activities of running the Army or fighting in the country's conflicts that they have little time for understanding arcane emerging technologies such as 3D printing, quantum computing, synthetic biology, 6G and 7G telecommunications systems, augmented reality, and others.  Yet decisions regarding the development, testing, and fielding of these technologies, potentially costing billions of dollars, will need to be made regularly and soon if the US Army's technical advantage is to remain.

Likewise, would-be innovation champions will need to learn the transformational leadership skills necessary to manage teams of experts from disparate fields.  Most military officers have grown up in an environment similar to Machiavelli's Kingdom of the Turk, which "is governed by one lord, the others are his servants; and, dividing his kingdom into sanjaks, he sends there different administrators, and shifts and changes them as he chooses."  

This hierarchical organization with its emphasis on commanders and their intent suddenly gives way when confronted by interdisciplinary teams of experts and contractors in the diverse technical fields common to innovation activities.  Here the comfortable chain of command often is replaced with something akin to Machiavelli's Kingdom of the Franks, where officers find themselves "placed in the midst of an ancient body of lords, acknowledged by their own subjects, and beloved by them; they have their own prerogatives, nor can the king take these away except at his peril."  Leading innovation activities, in short, requires different skills than leading at the tactical and operational levels.

Where Will These Champions Come From?

Some of these Skills and Knowledge categories also typically require a certain level of experience.  For example, all officers understand their organization to a certain extent, but it takes a relatively senior officer to have a feel for the entire enterprise.  Likewise, officers, as they move from one assignment to another, develop useful networks, but the kind of depth and breadth necessary to lead innovation activities typically requires a deeper rolodex.  

This kind of officer--one with the experience, organizational understanding, and networks to do this kind of work--is generally found at the level of Lieutenant Colonel or Colonel, the O5s and O6s of the Army.  LTC Richard Brown put it bluntly in his essay for AUSA, "Staff colonels are the Army’s innovation center of gravity."

Officers this senior can often come with some baggage as well, however.  For example, unless an officer's career has been carefully managed, it is certainly possible that some of the essential Traits of an innovation champion, such as creativity, risk-taking, or optimism, have been suppressed or even beaten out by an unforgiving system.  Fortunately, the right training and environment allows much of this damage to be repaired.  Creativity, for example, "is something you practice...not just a talent you are born with."

All this--filling in technical knowledge and leadership gaps while simultaneously re-energizing officers closer to the end of their careers than to the beginning--is, in military terms, a "heavy lift," a difficult, perhaps impossible, job.  Making it even more challenging is the fact that there is only one realistic opportunity to do it and that is at a senior service college.  In the Army's case, that is the US Army War College.  

The War College, as it turns out, is the critical chokepoint in the Battle of Moore's Chasm.

The 10-month stint at the War College comprises the last in-depth, formal military education most senior officers will receive.  After this, they typically move on to senior staff positions or take command of brigade-sized units.  Relatively few of these graduates will go on to become generals, and most will complete only one or two more assignments before retiring.  If officers don't get this kind of specialized education and training at the War College, they are unlikely to get it once they are back in the field.

Fortunately, I think the War College understands this generally, and I am involved in two specific activities that are deliberately designed to address these challenges:  the Futures Seminar and the Futures Lab.

The Futures Seminar uses real questions from real senior defense officials to jumpstart a yearlong project in which students not only delve deep into the world of technology and more generalized "futures thinking" but also gain practical skills in managing highly diverse teams of experts as they seek to integrate their thinking in pursuit of the best possible answer to their sponsor's question.

The Futures Lab also seeks to fill the tech knowledge gap but in a more hands-on way, allowing students an opportunity to spend as much or as little time as they want learning the ins-and-outs of technologies such as 3D printing, drones, virtual reality, and robots.  With a wide variety of technologies and expert assistance available, the Lab creates an environment designed to re-awaken creativity, enthusiasm, and risk-taking.

Who Will Win?

Andrew Krepinevich, a military strategist and award-winning author, in his recent book, The Origins of Victory: How Disruptive Military Innovation Determines the Fates of Great Powers, states:

"Viewed from a lagging competitor’s perspective, failing to keep pace in exploiting the potential of an emerging military revolution risks operating at a severe disadvantage. Consequently, the common challenge for all major-power militaries in a period of military revolution is to be the first to identify its salient characteristics and exploit its potential. Silver medals are not awarded to those who come in second."

If the side that innovates best, that not only employs emerging technologies but also combines them into a system where the whole can be more than the sum of its parts, is the side that wins, then the crucial battle, the first fight, is the Battle of Moore's Chasm, and the US Army will need trained and ready innovation champions to win it.

Note:  The views expressed are those of the author and do not necessarily reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government. 

Tuesday, December 12, 2023

Forget Artificial Intelligence. What We Need Is Artificial Wisdom

I have been thinking a lot about what it means to be "wise" in the 21st Century.

Wisdom, for many people, is something that you accrue over a lifetime.  "Wisdom is the daughter of experience," insisted Leonardo da Vinci.  Moreover, the sense that experience and wisdom are linked seems universal.  There's an African proverb, for example, of which I am particularly fond that claims, "When an old person dies, a library burns to the ground."  

Not all old people are wise, of course.  Experience sometimes erodes a person, like the steady drip-drip of water on a stone, such that, in the end, there is nothing but a damn fool left.  We have long had sayings about that as well.

Experience, then, probably isn't the only way to become wise and may not even be a necessary pre-condition for wisdom.  How then to define it?

One thing I do know is that people still want wisdom, at least in their leaders.  I know this because I asked my contacts on LinkedIn about it.  One hundred responses later, virtually everyone said they would rather have a wise leader than an intelligent one.  

These results suggest something else as well:  That people know wisdom when they see it.  In other words, the understanding of what wisdom is or isn't is not something that is taught but rather something that is learned implicitly, by watching and evaluating the actions of ourselves and others.

Nowhere is this more obvious than in the non-technical critiques of artificial intelligence (AI).  The authors of these critiques seem nervous, even frightened, about the elements of humanity that are missing in the flawed but powerful versions of AI that have recently been released upon the world.  The AIs, in their view, seem to lack moral maturity, reflective strategic decision-making, and an ability to manage uncertainty, and no one, least of all these authors, wants AIs without these attributes to be making decisions that might change, well, everything.  This angst seems to be a shorthand for a simpler concept, however:  We want these AIs to not just be intelligent, but to be wise.

For me, then, a good bit of the conversation about AI safety, AI alignment, and "effective altruism" comes down to how to define wisdom.  I'm not a good enough philosopher (or theologian) to have the answer to this but I do have some hypotheses.

First, when I try to visualize a very intelligent person who has only average wisdom, I imagine a person who knows a large number of things.  Their knowledge is encyclopedic but their ability to pull things together is limited.  They lack common sense.  In contrast, when I try to imagine someone who is very wise but of just average intelligence, I imagine someone who knows considerably less but can see the connections between things better and, as a result, can envision second and third order consequences.  The image below visualizes how I see this difference:

This visualization, in turn, suggests where we might find the tools to better define artificial wisdom:  in network research, graph theory, and computational social science.
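
For the curious, here is a rough sketch of that intuition in code.  It is my toy framing, not an established metric: the "intelligent" graph has many facts but few connections among them, the "wise" graph has fewer facts but the same number of connections, and the question is how much of the graph can be reached within three hops--a stand-in for seeing second- and third-order consequences.

```python
# Toy illustration only; requires the networkx library.
import networkx as nx

# Same number of connections (edges) in both graphs; only the number of "facts" (nodes) differs.
intelligent = nx.gnm_random_graph(n=200, m=220, seed=1)   # many facts, sparsely connected
wise = nx.gnm_random_graph(n=50, m=220, seed=1)           # fewer facts, densely connected

def reachable_within(graph: nx.Graph, hops: int) -> float:
    """Average fraction of the other nodes reachable from a node in <= `hops` steps."""
    total = 0
    for node in graph.nodes:
        total += len(nx.single_source_shortest_path_length(graph, node, cutoff=hops)) - 1
    return total / (graph.number_of_nodes() * (graph.number_of_nodes() - 1))

# The denser graph should reach nearly everything within 3 hops; the sparse one only a small fraction.
for name, graph in [("intelligent (200 facts)", intelligent), ("wise (50 facts)", wise)]:
    print(f"{name}: {reachable_within(graph, hops=3):.0%} of the graph within 3 hops")
```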

I also think there are some hints lurking in biology, psychology, and neuroscience--specifically, in the study of cognitive biases.  Over the last 30 years or so, cognitive biases have come to be seen in many disciplines as "bad things"--predictable human failures in logical reasoning.  Recently, though, some of the literature has started to question this interpretation.  If cognitive biases are so bad, if they keep us from making rational decisions, then why aren't we all dead?  Why haven't evolutionary pressures weeded out the illogical?  

If you accept the premise that cognitive biases evolved in humans because they were useful (even if only on the savannahs of east Africa), then it rather raises the question, "What did they help us do?"

My favorite attempt at answering this question is the Cognitive Bias Codex (See image below).

By Jm3 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=51528798

Here the authors grouped all of the known cognitive biases into four major categories sorted by what they helped us do:

  • What we should remember
  • What to do when we have too much information
  • What to do when there is not enough meaning
  • What to do when we need to act fast

Interestingly, all of these areas are new areas of research in the AI community (For examples see:  Intentional Forgetting in Artificial Intelligence Systems: Perspectives and Challenges and Intentional Forgetting in Distributed Artificial Intelligence).  

Even the need to act fast, which seems like something at which AI excels, becomes more about wisdom than intelligence when decomposed.  Consider some of the Codex's sub-categories within the need to act fast:

  • We favor simple-looking options and complete information over complex, ambiguous options.
  • To avoid mistakes, we aim to preserve autonomy and group status, and avoid irreversible decisions.
  • To get things done, we tend to complete things we've invested time and energy in.
  • To stay focused, we favor the immediate, relatable thing in front of us.
  • To act, we must be confident we can make an impact and feel what we do is important.

All of these seem to have more to do with wisdom than intelligence.  Furthermore, true wisdom would be most evident in knowing when to apply these rules of thumb and when to engage more deliberative System 2 skills.

As I said, these are just hypotheses, just guesses, based on how I define wisdom.  Despite having thought about it for quite some time, I am virtually certain that I still don't have a good handle on it.

But that is not to say that I don't think there is something there.  Even if only used to help communicate to non-experts the current state of AI (e.g. "Our AIs exhibit some elements of general intelligence but very little wisdom"), it can, perhaps, help describe the state of the art more clearly while also driving research more directly.  

In this regard, it is also worth noting that modern AI dates back to at least the 1950s, and that it has gone through two full-blown AI "winters" where most scientists and funders thought that AI would never go anywhere.  In other words, it has taken many years and been a bit of a roller coaster ride to get to where we are today.  It would seem unrealistic to expect artificial wisdom to follow a different path, but it is, I would argue, a path worth taking.

Note:  The views expressed are those of the author and do not necessarily reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government. 

Monday, October 30, 2023

The Catch 22 Of Generative AI

A true 3D chart done in the style of Leonardo da Vinci (courtesy MidJourney)
I have always wanted to be able to easily build true 3D charts.  Not one of those imitation ones that just insert a drop shadow behind a 2D column and call it "3D," mind you.  I am talking about a true 3D chart with X, Y, and Z axes.  While I am certain that there are proprietary software packages that do this kind of thing for you, I'm cheap, and the free software is either clunky or buggy, and I don't have time for either.

I was excited, then, when I recently watched a video that claimed that ChatGPT could write Python scripts for Blender, the popular open source animation and 3D rendering tool.  I barely know how to use Blender and do not code in Python at all, but am always happy to experiment with ChatGPT.

Armed with very little knowledge and a lot of hope, I opened up ChatGPT and asked it to provide a Python script for Blender that would generate a 3D chart with different colored dots at various points in the 3D space.  I hit enter and was immediately rewarded with what looked like 50 or so lines of code doing precisely what I asked!

I cut and pasted the code into Blender, hit run, and...I got an error message.  So, I copied the error message and pasted it into ChatGPT and asked it to fix the code.  The machine apologized(!) to me for making the mistake and produced new code that it claimed would fix the issue.  

It didn't.

I tried again and again.  Six times I went back to ChatGPT, each time with slightly different error messages from Blender.  Each time, after the "correction," the program failed to run and I received a new error message in return.

Now, I said I didn't know how to code in Python, but that doesn't mean I can't code.  Looking over the error messages, it was obvious to me that the problem was almost certainly something simple, something any Python coder would be able to figure out, correct, and implement.  Such a coder would have saved a vast amount of time as, even when you know what you are doing, 50 lines of code takes a good bit of time to fat-finger.  
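
For reference, here is a minimal sketch of the kind of script I was after, assuming a recent version of Blender and its built-in bpy module.  It is illustrative only--not the code ChatGPT generated--and it simply drops colored spheres at arbitrary (x, y, z) points when run from Blender's Scripting workspace.

```python
import bpy  # available inside Blender's scripting environment

points = [
    # (x, y, z, RGBA color)
    (0, 0, 0, (1.0, 0.0, 0.0, 1.0)),   # red
    (1, 2, 1, (0.0, 1.0, 0.0, 1.0)),   # green
    (2, 1, 3, (0.0, 0.0, 1.0, 1.0)),   # blue
]

for x, y, z, color in points:
    bpy.ops.mesh.primitive_uv_sphere_add(radius=0.1, location=(x, y, z))  # add one "dot"
    sphere = bpy.context.object                                           # the sphere just created
    material = bpy.data.materials.new(name=f"dot_{x}_{y}_{z}")
    material.diffuse_color = color                                        # RGBA, Blender 2.8+
    sphere.data.materials.append(material)
```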

In other words, for generative AI to be helpful to me, I would need to know Python, but the reason I went to a generative AI in the first place was because I didn't know Python!  

And therein lies the Catch-22 of generative AI.  

I have seen this same effect in a variety of other situations.  I asked another large language model, Anthropic's Claude, to write a draft of a safety SOP.  It generated a draft very quickly and with surprising accuracy.  There were, however, a number of things that needed to be fixed.  Having written my fair share of safety SOPs back in the day, I was able to quickly make the adjustments.  It saved me a ton of time.  Without understanding what a good safety SOP looked like to begin with, however, the safety SOP created by generative AI risked being, well, unsafe.

At one level, this sounds a lot like some of my previous findings on generative AI, such as "Generative AI is a mind-numbingly fast but incredibly average staff officer" or "Generative AI is better at form than content."  And it is.

At another level, however, it speaks to the need for an education system that keeps up with advancements in generative AI while simultaneously maintaining pre-generative-AI standards.  The only way, at least for now, to use generative AI safely will be to know more than the AI about the AI's outputs--to know enough to spot the errors.  The only way, in turn, to know more than generative AI is to learn it the old-fashioned way--grind through the material on your own until you are comfortable that you understand it.  Ironically, AI may be able to speed up the grind, but the learning is still on you.  

At another, deeper, level, it is more disturbing.  I worry that people will ask generative AI about things that they think they know but don't.  Blender acted as a check on both my ignorance and the AI's errors in the first example.  My own experience with safety SOPs acted as a check on the AI in the second example.  What about areas such as political science, security studies, and military strategy, where subjectivity reigns?  What if there aren't any checks on the answers generative AI produces?  Dumb questions will lead to incorrect answers, which will lead to dumber questions and more incorrect answers--a sort of AI-powered Dunning-Kruger death spiral.  

This mirrors, of course, one of the many concerns of AI experts.  I also know that there are many good people working hard to ensure that these kinds of scenarios rarely, if ever, play themselves out.  That said, I am reminded of an old saying, usually attributed to Mark Twain, that was a near perfect forecast of the problems with social media:  “A lie can travel halfway around the world while the truth is putting on its shoes.”  Perhaps that should be updated for the modern age:  "An AI-energized chain reaction of stupid can destroy the world while the prudent are still slipping on their Crocs."  

Not as catchy, I suppose, but equally prescient?