Monday, October 30, 2023

The Catch 22 Of Generative AI

A true 3D chart done in the style of
Leonardo da Vinci (Courtesy Midjourney)
I have always wanted to be able to easily build true 3D charts.  Not one of those imitation ones that just insert a drop shadow behind a 2D column and call it "3D," mind you.  I am talking about a true 3D chart with an X, Y and Z axis.  While I am certain that there are proprietary software packages that do this kind of thing for you, I'm cheap and the free software is either clunky or buggy, and I don't have time for either.

I was excited, then, when I recently watched a video that claimed that ChatGPT could write Python scripts for Blender, the popular open source animation and 3D rendering tool.  I barely know how to use Blender and do not code in Python at all, but am always happy to experiment with ChatGPT.

Armed with very little knowledge and a lot of hope, I opened up ChatGPT and asked it to provide a Python script for Blender that would generate a 3D chart with different colored dots at various points in the 3D space.  I hit enter and was immediately rewarded with what looked like 50 or so lines of code doing precisely what I asked!
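For reference, a script of the kind requested might look something like the sketch below.  It is not the code ChatGPT produced (that code is not reproduced here), just a minimal illustration under a few assumptions:  the point data is randomly made up, the names make_points, plot_points, and PALETTE are hypothetical, and the color is applied through the material's viewport display color, so how it shows up depends on Blender's shading mode.  It assumes a recent Blender release (2.8 or later) and is meant to be run from Blender's Scripting tab.

```python
import random

try:
    import bpy  # only available inside Blender's bundled Python
except ImportError:
    bpy = None  # lets the data-generation part run outside Blender

# Hypothetical palette: RGBA values in the 0-1 range that Blender expects.
PALETTE = [
    (1.0, 0.0, 0.0, 1.0),  # red
    (0.0, 0.6, 0.0, 1.0),  # green
    (0.0, 0.2, 1.0, 1.0),  # blue
]


def make_points(count=20, size=10.0, seed=1):
    """Return made-up (x, y, z, rgba) tuples inside a size x size x size cube."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        x, y, z = (rng.uniform(0.0, size) for _ in range(3))
        points.append((x, y, z, rng.choice(PALETTE)))
    return points


def plot_points(points, radius=0.15):
    """Add one small colored sphere to the current scene for each point."""
    if bpy is None:
        raise RuntimeError("Run this inside Blender's Scripting tab.")
    for i, (x, y, z, rgba) in enumerate(points):
        bpy.ops.mesh.primitive_uv_sphere_add(radius=radius, location=(x, y, z))
        dot = bpy.context.active_object  # the sphere that was just added
        mat = bpy.data.materials.new(name=f"dot_{i}")
        mat.diffuse_color = rgba  # viewport display color (assumption: no shader nodes)
        dot.data.materials.append(mat)


if bpy is not None:
    plot_points(make_points())
```

Swapping the random data in make_points for real X, Y, and Z values is the only change needed to chart actual data; the axes, labels, and lighting that a finished chart would need are left out to keep the sketch short.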

I cut and pasted the code into Blender, hit run, and...I got an error message.  So, I copied the error message and pasted it into ChatGPT and asked it to fix the code.  The machine apologized(!) to me for making the mistake and produced new code that it claimed would fix the issue.  

It didn't.

I tried again and again.  Six times I went back to ChatGPT, each time with slightly different error messages from Blender.  Each time, after the "correction," the program failed to run and I received a new error message in return.

Now, I said I didn't know how to code in Python, but that doesn't mean I can't code.  Looking over the error messages, it was obvious to me that the problem was almost certainly something simple, something any Python coder would be able to figure out, correct, and implement.  Such a coder would have saved a vast amount of time as, even when you know what you are doing, 50 lines of code takes a good bit of time to fat-finger.  

In other words, for generative AI to be helpful to me, I would need to know Python, but the reason I went to a generative AI in the first place was because I didn't know Python!  

And therein lies the Catch-22 of generative AI.  

I have seen this same effect in a variety of other situations.  I asked another large language model, Anthropic's Claude, to write a draft of a safety SOP.  It generated a draft very quickly and with surprising accuracy.  There were, however, a number of things that needed to be fixed.  Having written my fair share of safety SOPs back in the day, I was able to quickly make the adjustments.  It saved me a ton of time.  Without understanding what a good safety SOP looked like to begin with, however, the safety SOP created by generative AI risked being, well, unsafe.

At one level, this sounds a lot like some of my previous findings on generative AI such as "Generative AI is a mind-numbingly fast but incredibly average staff officer" or "Generative AI is better at form than content."  And it is.

At another level, however, it speaks to the need for an education system that keeps up with advancements in generative AI while simultaneously maintaining pre-generative AI standards.  The only way, at least for now, to use generative AI safely will be to know more than the AI about the AI's outputs--to know enough to spot the errors.  The only way, in turn, to know more than generative AI is to learn it the old-fashioned way--grind through the material on your own until you are comfortable that you understand it.  Ironically, AI may be able to speed up the grind, but the learning is still on you.

At another, deeper, level, it is more disturbing.  I worry that people will ask generative AI about things that they think they know but don't.  Blender acted as a check on both my ignorance and the AI's errors in the first example.  My own experience with safety SOPs acted as a check on the AI in the second example.  What about areas such as political science, security studies, and military strategy where subjectivity reigns?  What if there aren't any checks on the answers generative AI produces?  Dumb questions will lead to incorrect answers which will lead to dumber questions and more incorrect answers--a sort of AI-powered, Dunning-Kruger death spiral.

This mirrors, of course, one of the many concerns of AI experts.  I also know that there are many good people working hard to ensure that these kinds of scenarios rarely if ever play themselves out.  That said, I am reminded of an old saying, often attributed to Mark Twain, that was a near perfect forecast of the problems with social media:  “A lie can travel halfway around the world while the truth is putting on its shoes.”  Perhaps that should be updated for the modern age:  "An AI-energized chain reaction of stupid can destroy the world while the prudent are still slipping on their crocs."

Not as catchy, I suppose, but equally prescient?

Tuesday, October 3, 2023

Generative AI Is Better At Form Than Content

Prudens Futuri prominently displayed in
the US Army War College's new academic building


All communications have at least two aspects, form and content.  Content is what we want to say.  It could be about politics, or philosophy, or here at the War College, rather predictably, about war.

Form is different.  The medium through which we express ideas impacts how we communicate those ideas. For instance, the same core concepts about politics, philosophy, or war can be conveyed very differently depending on whether they are presented through a 20-minute PowerPoint presentation or a 2,000-word paper. The form we choose defines how we communicate the message.

Let me give you an example.  My current favorite generative AI is Anthropic's Claude.  I asked it to explain the meaning of the US Army War College's motto, "Prudens Futuri," to the incoming class of students.  Here is what it came back with:

"The Latin phrase "prudens futuri" translates to "foreseeing the future" or "provident of the future." It refers to the wisdom and foresight required of effective leaders. Prudent leaders do not simply react to current circumstances - they have a vision for the future and proactively position their forces to capitalize on anticipated events. As Sun Tzu advised, foreknowledge enables leaders to seize opportunities and gain advantage. Prudens futuri requires gathering intelligence, conducting analysis, forecasting threats, and planning for contingencies. Leaders who excel at prudens futuri can execute bold, unconventional strategies to defeat enemies who rely on reactionary thinking. Prudens futuri epitomizes the innovative, forward-looking mindset the Army must adopt to maintain strategic dominance."

Not bad.  I also asked it to explain the same thing to a 13 year old:

"'Prudens futuri' is a Latin phrase that means "foreseeing the future." It's about being wise enough to predict what might happen next. Great leaders don't just respond to what's in front of them - they think ahead. Imagine you're a general planning to attack an enemy. A wise general would send scouts to spy first and learn the enemy's weaknesses. Then the general can make a smart battle plan to sneak around and surprise them from behind. The Latin phrase teaches that real leaders don't just react - they use their brains to see the future and stay one step ahead. That's what "prudens futuri," or "foreseeing the future," is all about!"

Also not bad--particularly given the change in audience.  

It is even possible to ask the same of a text-to-image AI like Midjourney, used to create the image below from the same prompt:


In this image, I am particularly taken with the way the AI combined historical references to foresight, such as the Oracle of Delphi, with more futuristic concepts like the (apparent) space elevator in the background.  In fact, I find asking text-to-image generators to imagine some futuristic scene to be a very practical way to fill in gaps in futures thinking.  I inevitably see something that makes sense in the image that was not accounted for in the text (kind of like The Spandrels of San Marco and the Panglossian Paradigm).

My point?  I have yet to have a generative AI create something that did not need tweaking, at the least, on the content side.  I have been really happy, however, with generative AI's ability to master particular forms.  

This is one of the reasons, I think, I have quite recently become a bit uncomfortable with policies that talk about citing a generative AI as if it were a source.  It is, I suppose...but it seems less of a source than Wikipedia, and, while I love Wikipedia and believe it is one of the great wonders of the modern world, I would not cite Wikipedia for anything other than background.  I require my students, for example, to find a reputable source to validate anything that a generative AI might come up with when making an estimate.  And, if you are going to make a student find a reputable source anyway, why would they need the generative AI at all?  The answer, of course, is for the form.  

This may not be true forever.  Generative AI is getting better at a brisk pace.  There may come a day when generative AI is looked upon as an authority, equal to peer-reviewed papers.  Until that time, we should still appreciate its talents for helping to craft the message. For now, generative AI is an unparalleled writing partner, not an independent thinker. By acknowledging its current limits alongside its awesome potential, we grant generative AI its proper place: revolutionizing how we communicate knowledge, while established methods still reign over what we know.

Wednesday, August 16, 2023

Answers For Pennies, Insights For Dollars: Generative AI And The Question Economy

No one seems to know exactly where the boom in Generative AIs (like ChatGPT and Claude) will lead us, but one thing is for certain:  These tools are rapidly driving down the cost of getting a good (or, at least, good enough) answer very quickly.  Moreover, they are likely to continue to do so for quite some time.  

The data is notional
but the trend is unquestionable, I think.

To be honest, this has been a trend since at least the mid-1800's with the widespread establishment of public libraries in the US and UK.  Since then, improvements in cataloging, the professionalization of the workforce, and technology, among other things, worked to drive down the cost of getting a good answer (See chart to the right).

The quest for a less expensive but still good answer accelerated, of course, with the introduction of the World Wide Web in the mid-1990's, driving down the cost of answering even tough questions.  While misinformation, disinformation, and the unspeakable horror that social media has become will continue to lead many people astray, savvy users are better able to find consistently good answers to harder and more obscure questions than ever before.  

If the internet accelerated this historical trend of driving down the cost of getting a good answer, the roll-out of generative AI to the public in late 2022 tied a rocket to its backside and pushed it off a cliff.  Hallucinations and bias to the side, the simple truth is that generative AI is, more often than not, able to give pretty good answers to an awful lot of questions and it is free or cheap to use.  

How good is it?  Check out the chart below (Courtesy Visual Capitalist).  GPT-4, OpenAI's best, publicly available, large language model, blows away most standardized tests.  


It is important to note that this chart was made in April 2023 and represents results from GPT-4.  OpenAI is working on GPT-5 and five months in this field is like a dozen years in any other (Truly.  I have been watching tech evolve for 50 years.  Nothing in my lifetime has ever improved as quickly as generative AIs have).  Eventually, the forces driving these improvements will reach a point of diminishing returns and growth will slow down and maybe even flatline, but that is not the trajectory today.

All this sort of begs a question, though: If answers are getting better, cheaper, and more widely available at an accelerating rate, what's left?  In other words, if no one needs to pay for my answers anymore, what can I offer?  How can I make a living?  Where is the value-added?  This is precisely the sort of thinking that led Goldman Sachs to estimate that AI could expose the equivalent of 300 million full-time jobs worldwide to automation.

My take on it is a little different.  I think that as the cost of a good answer goes down, the value of a good question goes up.  
In short, the winners in the coming AI wars are going to be the ones who can ask the best questions at the most opportune times.  

There is evidence, in fact, that this is already becoming the case.  Go to Google and look for jobs for "prompt engineers."  This term barely existed a year ago.  Today, it is one of the fastest-growing fields in AI.  Prompts are just a fancy name for the questions that we ask of generative AI, and a prompt engineer is someone who knows the right questions to ask to get the best possible answers.  There is even a marketplace for these "good questions" called Promptbase where you can, for a small fee, buy a customizable prompt from someone who has already done the hard work of optimizing the question for you.

Today, earning the qualifications to become a prompt engineer is a combination of on-the-job training and art.  There are some approaches, some magical combination of words, phrases, and techniques, that can be used to get the damn machines to do what you want.  Beyond that, though, much of what works seems to have been discovered by power users who are just messing around with the various generative AIs available for public use.

None of this is a bad thing, of course.  The list of discoveries that have come about from people just messing around or mashing two things together that have not been messed with/mashed together before is both long and honorable.  At some point, though, we are going to have to do more than that.  At some point, we are going to have to start teaching people how to ask better questions of AI.

The idea that asking the right question is not only smart but essential is an old one:

“A prudent question is one-half of wisdom.” – Francis Bacon
“The art of proposing a question must be held of higher value than solving it.” – Georg Cantor
“If you do not know how to ask the right question, you discover nothing.” – W. Edwards Deming

And we often think that at least one purpose of education, certainly of higher education, is to teach students how to think critically; how, in essence to ask better questions.  

But is that really true?  Virtually our whole education system is structured around evaluating the quality of student answers.  We may think that we educate children and adults to ask probing, insightful questions but we grade, promote, and celebrate students for the number of answers they get right.  

What would a test based not on the quality of the answers given but on the quality of the questions asked even look like?  What criteria would you use to evaluate a question?  How would you create a question rubric?  

Let me give you an example.  Imagine you have told a group of students that they are going to pretend that they are about to go into a job interview.  They know, as with most interviews, that once the interview is over, they will get asked, "Do you have any questions for us?"  You task the students to come up with interesting questions to ask the interviewer.

Here is what you get from the students:
  1. What are the biggest challenges that I might face in this position?
  2. What are the next steps in the hiring process?
  3. What’s different about working here than anywhere else you’ve ever worked?
What do you think?  Which question is the most interesting?  Which question gets the highest grade?  If you are like the vast majority of the people I have asked, you say #3.  But why?  Sure, you can come up with reasons after the fact (humans are good at that), but where is the research that indicates why an interesting question is...well, interesting?  It doesn't exist (to my knowledge anyway).  We are left, like Justice Stewart and the definition of pornography, with "I know it when I see it."

What about "hard" questions?  Or "insightful" questions?  Knowing the criteria for each of these and teaching those criteria such that students can reliably ask better questions under a variety of circumstances seems like the key to getting the most out of AI.  There is very little research, however, on what these criteria are.  There are some hypotheses to be sure, but statistically significant, peer-reviewed research is thin on the ground.

This represents an opportunity, of course, for intellectual overmatch.  If there is very little real research in this space, then any meaningful contribution is likely to move the discipline forward significantly.  If what you ask in the AI-enabled future really is going to be more important than what you know, then such an investment seems not just prudent, but an absolute no-brainer.

Monday, July 24, 2023

Generative AI Is Like A ...

This will make sense in a minute...
Don't worry!  I'm going to fill in the blank, but before I do, have you played around with generative AI yet?  

If not, let's solve that problem first.

Go to Perplexity.ai--right now and before you read any further--and ask it a question.  Don't ask it a question it can't know the answer to (like, "What did I have for lunch?"), but do ask it a hard question that you do know the answer to (or for which you are at least able to recognize a patently bad answer).  Then, ask Perplexity some follow-up questions.  One or two should be enough.

Come back when you are finished.

Now rate the answers you got on a scale from 1-10.  One or two is a dangerous answer, one that could get someone hurt or cause real problems.  Give a nine or ten to an actionable answer, one that you could use right now, as is.

I have had the opportunity to run this exercise with a large number of people at a variety of conferences and training events over the last six months.  First, I consistently find that only about a third of the crowd have ever used any generative AIs (like Perplexity or ChatGPT) though that number seems to be going up (as you would expect) over time.

I have rarely heard anyone give an answer a one or two and always have at least a couple of people give the answer they received a nine or ten.  Other members of each audience typically give scores that range across the spectrum, of course, but the average seems to be about a six.

Yesterday, I gave this same exercise to about 30 people and there were no 1's or 2's and three people (10%) gave their answer a 9 or 10.  No one gave the answer less than a 5.  No one.

While anecdotal, it captures a trend that has been thoroughly documented across a number of different domains:  Generative AI isn't hitting like a freight train.  It's hitting like one of those high-speed, Japanese bullet trains, vaporizing traditional paradigms so quickly that they still don't know that they are already dead (For example...).

Or is it?

Thanks to some forward-thinking policy guidance from the leadership here at the Army War College, I, along with my colleagues Dr. Kathleen Moore and LTC Matt Rasmussen, was able to teach a class for most of last year with the generative AI switch set to "on."

The class is called the Futures Seminar and is explicitly designed to explore futures relevant to the Army, so it was perfectly appropriate for an exploration of AI.  It is also a year-long elective course so we were able to start using these tools when they first hit the street in November 2022 and continue to use them until the school year ended in June.  Finally, Futures Seminar students work on research questions posed by Army senior leaders, so lessons learned from this experience ought to apply to the real world as well.

We used generative AIs for everything.  We used them for brainstorming.  We used them to critique our analysis.  We used them to red-team.  We created our own bots, like DigitalXi, which was designed to take the perspective of Xi Jinping and answer our questions as he would.  We visualized using Midjourney and DALL-E 2 (see picture above made with Midjourney).  We cloned people's voices and created custom videos.  We tapped into AI aggregation sites like Futurepedia and There's An AI For That to find tools to help create everything from custom soundtracks to spreadsheets.

We got lots of feedback from the students and faculty, of course, both formal and informal.  We saw two big trends.  The first is that people either start at the "AI is going to save the earth" end of the spectrum or the "AI is going to destroy the earth" end.  For people who haven't tried it yet, there seems to be little middle ground.  

The second thing we saw is that, over time and sort of as you would expect, people develop a more nuanced view of AI the more they use it.  

In the end, if I had to boil down all of the comments and feedback, it would be this:  Generative AI is like a blazingly fast, incredibly average staff officer.

Let me break that down a bit.  Generative AI is incredibly fast at generating an answer.  I think this fools people, though.  It makes it seem like it is better than it actually is.  On real world problems, with second and third order causes and consequences that have to be considered, the AIs (and we tried many) were never able to just nail it.  They were particularly bad at seeing and managing the relationships between the moving pieces of complex problems and particularly good at doing administrivia (I got it to write a great safety SOP).  In the end, the products were average, sometimes better, sometimes worse, but, overall, average.  That said, the best work tended to come not from an AI alone or a student alone, but with the human and machine working together.  

I think this is a good place for USAWC students to be right now.  The students here are 25 year military professionals who have all been successful staff officers and commanders.  They know what good, great, average, and bad staff work looks like.  They also know that, no matter what the staff recommends, if the commander accepts it, the work becomes the commander's.  In other words, if a commander signs off on a recommendation, it doesn't matter if it came from two tired majors or a shiny new AI.  That commander now owns it.  Finally, our students are comfortable working with a staff.  Seeing the AI as a staff officer instead of as an answer machine is not only a good place for them to be mentally, but also likely to be the place where the best work is generated.

Finally, everyone--students and faculty alike--noted that this is where AI currently is.  Everyone expects it to get better over time, for all those 1's and 2's from the exercise above to disappear and for the 9's and 10's to grow in number.  No one knows what that truly means, but I will share my thoughts on this in the next post. 

While all this evidence is anecdotal, we also took some time to run some more formal studies and more controlled tests.  Much of that is still being written or shopped around to various journals, but two bits of evidence jumped out at me from a survey conducted by Dr. Moore.

First, she found that our students, who had worked with AI all year, rated it as likely to be 20% more useful to the Army than the rest of the student body did (and 31% more useful than the faculty did).  Second, she also found that 74% of Futures Seminar students walked away from the experience thinking that the benefits of developing AI outweigh the risks, with only 26% unsure.  General population students were much more risk averse:  only 8% were convinced the benefits outweigh the risks, a whopping 55% were unsure, and 37% said the risks outweigh the benefits.

This last finding highlights something of which I am now virtually certain:  The only real way to learn about generative AI is to use it.  No amount of lecture, discussion, PowerPoints, what have you will replace just sitting down at a computer and using these tools.  What you will find is that your own view will become much more informed, much more quickly, and in much greater detail than with any other approach you might take to understand this new technology.

Gaining this understanding is critical.  Generative AI is currently moving at a lightning pace.  While there is already some talk that the current approach will reach a point of diminishing returns in the future due to data quality, data availability, and cost of training, I don't think we will reach this point anytime soon.  Widely applicable, low-cost AI solutions are no longer theoretical.  Strategic decisionmakers have to start integrating their impact into their plans now.

Wednesday, October 20, 2021

Is It OK To Sell Eggs To Gophers?

Apparently not...

...At least according to a recently launched experiment in ethical artificial intelligence (AI).  Put together by a number of researchers at the Allen Institute for AI, Ask Delphi lets you submit a plain English question and get a straight answer.  

It does pretty well with straightforward questions such as "Should I rob a bank?"  

It also appears to have some sense of self-awareness: 

It has surprisingly clear answers for at least some paradoxes:

And for historically profound questions of philosophy:

And these aren't the only ways it is clearly not yet perfect:

None of its imperfections are particularly important at this point, though.  It is still a fascinating experiment in AI and ethics.  As the authors themselves say, it "is intended to study the promises and limitations of machine ethics and norms through the lens of descriptive ethics. Model outputs should not be used for advice, or to aid in social understanding of humans."

I highly recommend it to anyone interested in the future of AI.  

For me, it also highlights a couple of issues for AI more generally.  First, the results are obviously interesting, but it would be even more interesting if the chatbot could explain its answers in equally straightforward English.  This is likely a technical bridge too far right now, but explainable AI is, in my opinion, not only important but essential to instilling confidence in human users as the stakes associated with AI go up. 

The second issue:  How will AI deal with nonsense?  How will it separate nonsense from questions that simply require deeper thought, like koans?  There seems to still be a long way to go but this experiment is certainly a fascinating waypoint on the journey.