Tuesday, July 30, 2024

Center Of Mass (Or How To Think Strategically About Generative AI)

It may seem like generative AI is moving too fast right now for cogent strategic thinking.  At the edges of it, that is probably right.  Those "up in the high country," as Lloyd Bridges might put it (see clip below), are dealing with incalculably difficult technical and ethical challenges and opportunities as each new version of Claude, ChatGPT, Gemini, Llama, or other foundational large language model tries to outperform yesterday's release.

 

That said, while all this churn and hype is very real at the margins, I have seen a fairly stable center start to emerge since November 2022, when ChatGPT was first released.  What do I mean, then, by "a fairly stable center"?

For the last 20 months, my students, colleagues, and I have been using a wide variety of generative AI models on all sorts of problems.  Much of this effort has been exploratory, designed to test these tools against realistic, if not real, problems.  Some of it, though, has been real--double-checked and verified products for real people.  

It has never been standalone work, however.  No one in the center of mass is ready or comfortable completely turning over anything but scut work to the AIs.  In short, anyone who uses commercially available AIs on a regular basis to do regular work rapidly comes to see them as useful assistants--unable to do most work unsupervised, but of enormous benefit otherwise. 

What else have I learned over the last 20 months? 

As I look at much of what I have written recently, it has almost all been about generative AI and how to think about it.  My target audience has always been regular people looking for an edge in doing regular work--the center of mass.  My goal has been to find the universals--the things that I think are common to a "normal" experience with generative AI.  I don't want to trivialize the legitimate concerns about what generative AIs might be able to do in the future, nor to suggest I have some sort of deep technical insights into how it all works or how to make it better.  I do want to understand, at scale, what it might be good for today and how best to think about it strategically.

My sources of information include my own day-to-day experience of the grind with and without generative AI.  I can supplement that with the experiences of dozens of students and my faculty colleagues (as well as with what little research is currently available).  Altogether, we think we have learned a lot of "big picture" lessons.  Seven, to be exact:
  1. Generative AI is neither a savior nor Satan.  Most people start out in one of these two camps.  The more you play around with generative AIs, the more you realize that both points of view are wrong and that the truth is more nuanced.
  2. Generative AI is so fast it fools you into thinking it is better than it is.  Generative AI is blindingly fast.  A study done last year using writing tasks for midlevel professionals found that participants were 40% faster at completing the task when they used the then-current version of ChatGPT.  Once they got past the awe they felt at the speed of the response, however, most of my students said the quality of the output was little better than average.  The same study found a similar pattern: speed improved 40%, but the average quality of the writing improved only 18%.
  3. Generative AI is better at form than content.  Content is what you want to say and form is how you want to say it.  Form can be vastly more important than content if the goal is to communicate effectively.  You'd probably explain Keynesian economics to middle-schoolers differently than you would to PhD candidates, for example.  Generative AI generally excels at re-packaging content from one form to another.  
  4. Generative AI works best if you already know your stuff.  Generative AI is pretty good and it is getting better fast.  But it does make mistakes.  Sometimes it is just plain wrong and sometimes it makes stuff up.  If you know your discipline already, most of these errors are easy to spot and correct.  If you don't know your discipline already, then you are swimming at your own risk.
  5. Good questions are becoming more valuable than good answers.  In terms of absolute costs to an individual user, generative AI is pretty cheap, and the cost of a good or good-enough answer is plummeting as a result.  This, in turn, implies that the value of a good question is going up.  Figuring out how to ask better questions at scale is one largely unexplored way to get a lot more out of a generative AI investment.
  6. Yesterday's philosophy is tomorrow's AI safeguard.  AI is good at some ethical issues, lousy at others (and is a terrible forecaster).  A broad understanding of a couple thousand years of philosophical thinking about right and wrong can actually help you navigate these waters.
  7. There is a difference between intelligence and wisdom.  A growing body of researchers is looking beyond the current fascination with artificial intelligence and towards what some of them are calling "artificial wisdom."  This difference--between intelligence and wisdom--is a useful distinction that captures much of the strategic unease with current generative AIs in a single word.
These "universals" have all held up pretty well since I first started formulating them a little over a year ago.  While I am certain they will change over time and that I might not be able to attest to any of them this time next year, right now they represent useful starting points for a wide variety of strategic thought exercises about generative AIs.

Monday, July 8, 2024

How Good AIs Make Tough Choices

Rushworth Kidder, the ethicist, died 12 years ago. I never met him, but his book "How Good People Make Tough Choices" left a mark. It was required reading in many of my classes, and I still think it is the best book available on the application of philosophy to the moral problems of today.  

Why?  For a start, it is well-organized and easy to read.  Most importantly, though, it doesn't get lost in the back-and-forth that plagues some philosophical discussions.  Instead, it tries to provide a modicum of useful structure to help normal people make hard decisions.  In the tradition of some of the earliest philosophers, it is about the application of philosophical thinking to everyday life, not about abstract theorizing.

Don't get me wrong.  I am not against abstract theorizing.  I'm a futurist.  Speculation masquerading as analysis is what I do for a living, after all.  It is just that, at some point, we are all faced with tough decisions, and we can either let the wisdom of hundreds of philosophers over thousands of years inform that thinking or we can go on instinct.  William Irvine put the consequences even more directly: 

"Why is it important to have such a philosophy? Because without one, there is a danger that you will mislive—that despite all your activity, despite all the pleasant diversions you might have enjoyed while alive, you will end up living a bad life. There is, in other words, a danger that when you are on your deathbed, you will look back and realize that you wasted your one chance at living."

One of the most common questions I get asked these days sits at the intersection of these "tough choices" Kidder was talking about and artificial intelligence.  There is a lot of (justifiable) hand-wringing over the questions of what can we, should we, turn over to AIs on the one hand, and what are the consequences of not turning over enough to the AIs on the other.

For me, these questions begin with another:  What can AIs do already?  In other words, where can AIs clearly outperform humans today?  Fortunately, Stanford collates exactly these kinds of results in an annual AI Index (Note:  They don't just collate them, they also put them in plain English with clear charts--well done, Stanford!).  The results are summarized in the table below:

Items in dark red are where AIs have already surpassed humans.  The light red is where there is evidence that AIs will surpass humans soon.  This table was put together with help from Claude 3, the AI I think does the best job of reading papers.  I spot-checked a number of the results and they were accurate, but your mileage may vary.  The estimated times to surpass humans are all Claude, but the time frames seem reasonable to me as well.  If you want the full details, you should check out the Stanford AI Index, which you should do even if you don't want the full details.

The most interesting row (for this post, at least) is the "Moral Reasoning" row.  Here there is a new benchmark, the MoCa benchmark for moral reasoning.  The index highlighted the emergence of harder benchmarks over the last year, stating, "AI models have reached performance saturation on established benchmarks such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones."  In other words, AIs were getting so good, so fast that researchers had to come up with a whole slew of new tests for them to take, including the MoCa benchmark.

MoCa is a clever little benchmark that uses moral and causal challenges from existing cognitive science papers where humans tended to agree on factors and outcomes.  The authors of the paper then presented these same challenges to a wide variety of AIs and scored the AIs based on something called "discrete agreement" with human judges.  Discrete agreement appears, by the way, to be the scientific name for just plain "agreement"--go figure.  The chart below is from the AI Index, not the original paper, but summarizes the results:

From the Stanford AI Index.  Scores are from 0-100 with higher scores equaling higher agreement with human judgement.  

If you are scoring things at home, this chart makes AIs look pretty good until you realize that the y-axis doesn't include the full range of possible values (a little data-viz sleight of hand there...).  This sort of professorial nit-picking might not matter, though.  This was a study published in late 2023, and there is already a 2024 study out of the University of North Carolina and the Allen Institute that shows significant improvement--albeit on a different benchmark and with a new LLM.  Specifically, the researchers found "that advice from GPT-4o is rated as more moral, trustworthy, thoughtful, and correct than that of the popular The New York Times advice column, The Ethicist."  See the full chart from the paper below:

Taken from "Large Language Models as Moral Experts? GPT-4o Outperforms Expert Ethicist in Providing Moral Guidance" in pre-print here:  https://europepmc.org/article/PPR/PPR859558 

While these results suggest improvement as models get larger and more sophisticated, I don't think I would be ready anytime soon to turn over to the AIs moral authority for the kinds of complex, time-sensitive, and often deadly decisions that military professionals routinely have to make.

OK.  

Stop reading now.

Take a breath.

(I am trying to keep you from jumping to a conclusion.)  

As you read the paragraph above (the one that begins, "While these results..."), you probably thought one of two things.  Some of you may have thought, "Yeah, the AIs aren't ready now, but they will be and soon.  It's inevitable."  Others of you may have thought, "Never.  It will never happen.  AIs simply cannot replace humans for these kinds of complex moral decisions."  Both positions have good arguments in favor of them.  Both positions also suffer from some major weaknesses.  In classic Kidder-ian fashion, I want to offer you a third way--a more nuanced way--out of this dilemma.

Kidder called this "third way forward, a middle ground between two seemingly implacable alternatives" a trilemma.  He felt that taking the time to try to re-frame problems as trilemmas was an enormously useful way to help solve them. It was about stepping back long enough to imagine a new way forward.  The role of the process, he said, "is not always to determine which of two courses to take. It is sometimes to let the mind work long enough to uncover a third."

What is this third way? Once again, Kidder comes in handy.  He outlined three broad approaches to moral questions:

  • Rules-based thinking (e.g. Kant and the deontologists, etc.)
  • Ends-based thinking (e.g. Bentham and the utilitarians, etc.)
  • Care-based thinking (e.g. The Golden Rule and virtually every religion in the world)
Each of these ways of looking at moral dilemmas intersects with AIs and humans in different ways.

AI is already extremely good at rules-based thinking, for example.  We see this in instances as trivial as programs that play Chess and Go, and we see it in military systems as sophisticated as Patriot and Phalanx.  If we can define a comprehensive rule set (a big “if”) that reliably generates fair and good outcomes, then machines likely can and should be allowed to operate independently.

Ends-based thinking, on the other hand, requires machines to be able to reliably forecast the outcomes of actions, including second-, third-, and higher-order consequences.  Complexity theory (specifically the concept of sensitive dependence on initial conditions) suggests that perfect forecasting is a mathematical impossibility, at least in complex scenarios.  Beyond the math, practical experience indicates that perfection in forecasting is an unrealistic standard.  All this, in turn, suggests that the standard for a machine cannot be perfection.  Rather, it should be “Can it do the job better than a human?”

The “Can the machine do the job better than a human?” question is actually composed of at least three different sub-questions:
  • Can the machine do the job better than all humans?  An appropriate standard for zero-defect environments.
  • Can the machine do the job better than the best humans?  An appropriate standard for environments where there is irreducible uncertainty.
  • Can the machine do the job better than most humans?  A standard that is appropriate where solutions need to be implemented at scale.
If "the job" we are talking about is forecasting, in turns out that the answer, currently, is: Not so much. Philipp Schoenegger, from the London School of Economics, and Peter Park from MIT recently posted a paper to ArXiv where they showed the results of entering GPT-4 into a series of forecasting challenges on Metaculus. For those unfamiliar with Metaculus, it is a public prediction market that looks to crowdsource answers to questions such as Will the People's Republic of China control at least half of Taiwan before 2050? or Will there be Human-machine intelligence parity before 2040?

The results of the study? Here, I'll let them tell you:
"Our findings from entering GPT-4 into a real-world forecasting tournament on the Metaculus platform suggest that even this state-of-the-art LLM has unimpressive forecasting capabilities. Despite being prompted with established superforecasting techniques and best-practice prompting approaches, GPT-4 was heavily outperformed by the forecasts of the human crowd, and did not even outperform a no-information baseline of predicting 50% on every question."

Ouch.
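That "no-information baseline of predicting 50% on every question" is easy to make concrete with the Brier score, a standard accuracy measure for probability forecasts (lower is better).  A minimal sketch--the questions, outcomes, and crowd numbers below are invented purely for illustration:

```python
def brier(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical resolved yes/no questions (1 = it happened, 0 = it didn't)
outcomes = [1, 0, 0, 1, 0]
crowd    = [0.9, 0.2, 0.1, 0.7, 0.3]   # a reasonably skilled crowd
baseline = [0.5] * len(outcomes)        # the no-information baseline

print(brier(crowd, outcomes))     # about 0.048
print(brier(baseline, outcomes))  # exactly 0.25, no matter how questions resolve
```

Answering 50% on everything always scores 0.25, which is why it is the floor any useful forecaster should beat; per the study, the human crowd cleared it and GPT-4 did not.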

Ends-based thinking is very much a part of most military decisions. If AIs don't forecast well and ends-based thinking requires good forecasting skills, then it might be tempting to write AIs off, at least for now. The trilemma approach helps us out in this situation as well, however. Powerful stories of hybrid human/machine teams accomplishing more than machines or humans alone are starting to appear. As more and more of these stories accumulate, it should be possible to detect the "golden threads"--the key factors that allow humans and machines to optimally integrate.

Finally, Kidder defined care-based thinking as “putting love for others first.”  It is here that machines are at their weakest against humans.  There are no benchmarks (yet) for concepts such as “care” and “love.”  Furthermore, no one seems to expect these kinds of true feelings from an AI anytime soon.  Likewise, care-based thinking requires a deep and intuitive understanding of the multitude of networks in which all humans find themselves embedded.  

While the machines have no true ability to demonstrate love or compassion, they can simulate these emotions quite readily.  Whether it is because of anthropomorphic bias, the loneliness epidemic, or other factors, humans can and do fall in love with AIs regularly.  This tendency turns the AIs' weakness into a strength in the hands of a bad faith actor.  AIs optimized to elicit sensitive information from unsuspecting people are likely already available or will be soon.

Beyond the three ways of thinking about moral problems, Kidder went on to define four scenarios that are particularly difficult for humans and are likely to be equally challenging for AIs. Kidder refers to these as “right vs right” scenarios, “genuine dilemmas precisely because each side is firmly rooted in one of our basic, core values.” They include:
  • Truth vs. loyalty
  • Individual vs. community
  • Short-term vs. long-term
  • Justice vs. mercy
Resolving these kinds of dilemmas involves more than just intelligence. These kinds of problems seem to require a different characteristic--wisdom--and wisdom, like intelligence, can, theoretically at least, be artificial.

Artificial Wisdom is a relatively new field (almost 75% of the articles in Google Scholar that mention Artificial Wisdom have been written since 2020). The impetus behind this research seems to be a genuine concern that intelligence is not sufficient for the challenges that face humanity. As Jeste et al. put it, “The term ‘intelligence’ does not best represent the technological needs of advancing society, because it is ‘wisdom’, rather than intelligence, that is associated with greater well-being, happiness, health, and perhaps even longevity of the individual and the society.”

I have written about artificial wisdom elsewhere and I still think it is a useful way to think about the problem of morality and AIs. For leaders, "wisdom" is a useful shorthand for communicating many of the concerns they have about turning operations, particularly strategic operations, over to AIs. I think it is equally useful for software developers, however. Wisdom, conceptually, is very different from intelligence but no less desirable. Using the deep literature about wisdom to help reframe problems will likely lead to novel and useful solutions.

Monday, February 5, 2024

The Battle of Moore's Chasm And Who Will Win The Next War

There is a battle going on right now.  It is being fought by every military in the world.  

Victory in this battle is crucial.  The militaries on the winning side will likely be on the winning side of the next large-scale war.  The losers will likely be forgotten, studied only for the mistakes they made.

This is the Battle of Moore's Chasm.

This battle is taking place everywhere.  There are physical manifestations of it in Ukraine, the Taiwan Strait, and Gaza, but there are equally important conceptual and theoretical manifestations of it in the Pentagon, on Arbatskaya Square in Moscow, and deep inside the August 1 Building in Beijing.

What this battle is about and how to win it are the subjects of this article.

What Is The Battle Of Moore's Chasm?

To understand this battle it is necessary, at first, to travel back to 1962.  It was then that a young professor of rural sociology, Everett Rogers, published what was to become the second most cited book in all the social sciences, Diffusion of Innovations. 

While the book contains much that is still relevant today, the part that is important to the current battle is the idea that the "market" for an idea, an innovation, a new concept, or a technology generally follows a bell curve and that this bell curve can be divided into five major sections of users (See chart below):  Innovators, Early Adopters, Early Majority, Late Majority and Laggards. 

Source:  https://en.wikipedia.org/wiki/Diffusion_of_innovations
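Those five segments are not arbitrary.  Rogers cut the bell curve at whole standard deviations of adoption time, so the familiar percentages fall straight out of the normal distribution.  A quick sketch, using only the standard library:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Segment boundaries in standard deviations from the mean adoption time
cuts = [-2, -1, 0, 1]
edges = [0.0] + [norm_cdf(z) for z in cuts] + [1.0]
labels = ["Innovators", "Early Adopters", "Early Majority",
          "Late Majority", "Laggards"]

for label, lo, hi in zip(labels, edges, edges[1:]):
    print(f"{label:<15} {100 * (hi - lo):4.1f}%")
```

This prints roughly 2.3%, 13.6%, 34.1%, 34.1%, and 15.9%, which Rogers rounded to the canonical 2.5 / 13.5 / 34 / 34 / 16.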

Fast forward to 1989, when two researchers at the famous consulting firm, Regis McKenna, Inc. (RMI), Warren Schirtzinger and James Lee, hypothesized and then demonstrated that there was a "chasm" between the early adopters and the early majority.  

This chasm existed largely due to the different motivations of the members of these groups.  Innovators and Early Adopters are very much into cool, new things.  They tend to be more enamored with the potential of a new technology or process than they are with the utility or scalability of these products.  Early and Late Majority motivations, on the other hand, typically have more to do with solving particular problems and doing so at the lowest cost and at a scale that is appropriate for their organization.

Another researcher at RMI, Geoffrey Moore, picked up on the idea and, in 1991, published what was to become one of the most influential business books ever, Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers. Now in its third edition, it has sold over a million copies and is considered by Inc. magazine as one of the top ten marketing books ever written. Think Insights has a good article that lays out the main ideas in detail, but for our purposes, their chart showing the chasm is sufficient:

Think Insights (January 3, 2024) Crossing The Chasm – Technology Adoption Lifecycle. Retrieved from https://thinkinsights.net/strategy/crossing-the-chasm/.

Most importantly, Moore's Chasm has become synonymous with the place where good ideas go to die.  Whether it is a lack of capital, innovator inexperience, or an inability to get traction in the much more lucrative Early and Late Majority markets, failure to bridge the chasm leads, at best, to relegation to a niche market and, at worst, to inevitable decline and bankruptcy. 

While almost all of these ideas and the literature accompanying the chasm have come out of business journals, the chasm has direct and immediate relevance to issues faced by militaries around the world.  Indeed, Secretary of the Army Christine Wormuth recently said:

“This is a crucial moment for the Army to summon our ingenuity, to innovate and invest in emerging technologies, to test and develop in uncharted areas like artificial intelligence and contested domains like space and cyber, to reshape and transform the force to be more adaptable and flexible.”

Yet, across the globe, the difference between how much militaries want to innovate and how much they are actually innovating seems to be heading in the wrong direction.  As the Boston Consulting Group highlighted in its report last year on the defense innovation readiness gap:

"One of (the report's) most important findings is that the defense innovation readiness gap significantly increased in the year since our first study. Across 10 of the 11 dimensions of readiness assessed, MoDs failed to match their 2021 results, by an average of 8%."

Moreover, there is some evidence to suggest that this chasm exists within the US Department of Defense as well.  A recent report by the Center for Security and Emerging Technology put it this way:

"However, under the DOD’s current organizational structure, defense innovation is disconnected from defense procurement. This division limits innovation offices’ ability to affect technological change across the military and excuses procurement offices from integrating cutting-edge capabilities into major systems and platforms." (Italics mine)

The Battle of Moore's Chasm is real, and right now, no one is winning.

Who Will Fight This Battle?

While there are a number of possible ways to win the battle (the CSET paper, for example, references three), all of these courses of action require the right people to implement them.  Acquisition officers, policy wonks, commanders, and others all do and will have their role to play.  The most important warrior in this battle, however, is the innovation champion.

Developed about the same time as Diffusion of Innovations Theory, the idea of an innovation champion was first put forward by Dr. Donald Schön in the Harvard Business Review article, "Champions for radical new inventions."  Since then, thousands of articles (Google Scholar counts about 2,140) have been written about the role, traits, and importance of innovation champions in driving modernization and incorporating emerging technologies across a wide variety of fields.  

All of the more modern definitions of innovation champion are similar to the one developed by researchers at the German Graduate School of Management and Law:  "an innovation champion is an individual or a group of individuals who is willing to take risks to enthusiastically promote innovations through the various stages of the development process."


This same paper identified five skills, seven traits, and three different kinds of knowledge that were characteristic of innovation champions based on a systematic literature analysis looking at 85 of the most influential journal articles on the topic (See image to the left).

The approach here is similar to the approach taken by the US Army in teaching leadership.  With leadership, the Army focuses on Attributes (roughly equivalent to Traits in the chart to the left) and Competencies (roughly equivalent to Skills and Knowledge in the chart).  A fundamental premise of Army leadership training is that "most people have leadership potential and can learn to be effective leaders."  The same could be said, perhaps, for innovation champions.

While the approach is similar, there is not a one-to-one correlation between what the Army thinks makes a good leader and what is necessary for an innovation champion (See chart below and to the right).

Source:  ADP 6-22, ARMY LEADERSHIP
AND THE PROFESSION, 2019

 
In short, while routine Army leadership training likely covers many of the attributes of an innovation champion, it is equally likely that there are several gaps that will need to be filled if the Army is to have the warriors it needs for the ongoing battle.

Specifically, having the minimal technical knowledge necessary to champion particular innovations jumps out as one such requirement.  Many soldiers are so deeply involved in the day-to-day activities of running the Army or fighting the country's conflicts that they have little time for understanding arcane emerging technologies such as 3D printing, quantum computing, synthetic biology, 6G and 7G telecommunications systems, augmented reality, and others. Yet decisions, potentially costing billions of dollars, regarding the development, testing, and fielding of these technologies will need to be made regularly and soon if the US Army's technical advantage is to remain.

Likewise, would-be innovation champions will need to learn the transformational leadership skills necessary to manage teams of experts from disparate fields.  Most military officers have grown up in an environment similar to Machiavelli's Kingdom of the Turk, which "is governed by one lord, the others are his servants; and, dividing his kingdom into sanjaks, he sends there different administrators, and shifts and changes them as he chooses."  

This hierarchical organization with its emphasis on commanders and their intent suddenly gives way when confronted by interdisciplinary teams of experts and contractors in the diverse technical fields common to innovation activities.  Here the comfortable chain of command often is replaced with something akin to Machiavelli's Kingdom of the Franks, where officers find themselves "placed in the midst of an ancient body of lords, acknowledged by their own subjects, and beloved by them; they have their own prerogatives, nor can the king take these away except at his peril."  Leading innovation activities, in short, requires different skills than leading at the tactical and operational levels.

Where Will These Champions Come From?

Some of these Skills and Knowledge categories also typically require a certain level of experience.  For example, all officers understand their organization to a certain extent, but it takes a relatively senior officer to have a feel for the entire enterprise.  Likewise, officers develop useful networks as they move from one assignment to another, but the kind of depth and breadth necessary to lead innovation activities typically requires a deeper Rolodex.  

The kind of officer with the experience, organizational understanding, and networks to do this kind of work is generally found at the level of Lieutenant Colonel or Colonel--the O5s and O6s of the Army.  LTC Richard Brown put it bluntly in his essay for AUSA, "Staff colonels are the Army’s innovation center of gravity."

Officers this senior can often come with some baggage as well, however.  For example, unless an officer's career has been carefully managed, it is certainly possible that some of the essential Traits of an innovation champion, such as creativity, risk-taking, or optimism, have been suppressed or even beaten out by an unforgiving system.  Fortunately, the right training and environment allow much of this damage to be repaired.  Creativity, for example, "is something you practice...not just a talent you are born with."

All this--filling in technical knowledge and leadership gaps while simultaneously re-energizing officers closer to the end of their careers than to the beginning--is, in military terms, a "heavy lift," a difficult, perhaps impossible, job.  Making it even more challenging is the fact that there is only one realistic opportunity to do it and that is at a senior service college.  In the Army's case, that is the US Army War College.  

The War College, as it turns out, is the critical chokepoint in the Battle of Moore's Chasm.

The 10-month stint at the War College comprises the last in-depth, formal military education most senior officers will receive.  After this, they typically move on to senior staff positions or take command of brigade-sized units.  Relatively few of these graduates will go on to become generals, and most will complete only one or two more assignments before retiring.  If officers don't get it at the War College, they are unlikely to get this kind of specialized education and training once they get back to the field.

Fortunately, I think the War College understands this generally and I am involved in two specific activities that are deliberately designed to address these challenges, the Futures Seminar and the Futures Lab.

The Futures Seminar uses real questions from real senior defense officials to jumpstart a year-long project.  Students not only delve deep into the world of technology and more generalized "futures-thinking" but also gain practical skills in managing highly diverse teams of experts as they integrate their thinking in pursuit of the best possible answer to their sponsor's question.

The Futures Lab also seeks to fill the tech knowledge gap but in a more hands-on way, allowing students an opportunity to spend as much or as little time as they want learning the ins-and-outs of technologies such as 3D printing, drones, virtual reality, and robots.  With a wide variety of technologies and expert assistance available, the Lab creates an environment designed to re-awaken creativity, enthusiasm, and risk-taking.

Who will win?

Andrew Krepinevich, a military strategist and award-winning author, states in his recent book, The Origins of Victory: How Disruptive Military Innovation Determines the Fates of Great Powers:

"Viewed from a lagging competitor’s perspective, failing to keep pace in exploiting the potential of an emerging military revolution risks operating at a severe disadvantage. Consequently, the common challenge for all major-power militaries in a period of military revolution is to be the first to identify its salient characteristics and exploit its potential. Silver medals are not awarded to those who come in second."

If the side that innovates best, that not only employs emerging technologies but also combines them into a system where the whole can be more than the sum of its parts, is the side that wins, then the crucial battle, the first fight, is the Battle of Moore's Chasm, and the US Army will need trained and ready innovation champions to win it.

Note:  The views expressed are those of the author and do not necessarily reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government.