Tuesday, October 3, 2023

Generative AI Is Better At Form Than Content

Prudens Futuri prominently displayed in
the US Army War College's new academic building


All communications have at least two aspects, form and content.  Content is what we want to say.  It could be about politics, or philosophy, or here at the War College, rather predictably, about war.

Form is different.  The medium through which we express ideas impacts how we communicate those ideas. For instance, the same core concepts about politics, philosophy, or war can be conveyed very differently depending on whether they are presented through a 20 minute PowerPoint presentation or a 2,000 word paper. The form we choose defines how we communicate the message.

Let me give you an example.  My current favorite generative AI is Anthropic's Claude.  I asked it to explain the meaning of the US Army War College's motto, "Prudens Futuri," to the incoming class of students.  Here is what it came back with:

"The Latin phrase "prudens futuri" translates to "foreseeing the future" or "provident of the future." It refers to the wisdom and foresight required of effective leaders. Prudent leaders do not simply react to current circumstances - they have a vision for the future and proactively position their forces to capitalize on anticipated events. As Sun Tzu advised, foreknowledge enables leaders to seize opportunities and gain advantage. Prudens futuri requires gathering intelligence, conducting analysis, forecasting threats, and planning for contingencies. Leaders who excel at prudens futuri can execute bold, unconventional strategies to defeat enemies who rely on reactionary thinking. Prudens futuri epitomizes the innovative, forward-looking mindset the Army must adopt to maintain strategic dominance."

Not bad.  I also asked it to explain the same thing to a 13 year old:

"'Prudens futuri' is a Latin phrase that means "foreseeing the future." It's about being wise enough to predict what might happen next. Great leaders don't just respond to what's in front of them - they think ahead. Imagine you're a general planning to attack an enemy. A wise general would send scouts to spy first and learn the enemy's weaknesses. Then the general can make a smart battle plan to sneak around and surprise them from behind. The Latin phrase teaches that real leaders don't just react - they use their brains to see the future and stay one step ahead. That's what "prudens futuri," or "foreseeing the future," is all about!"

Also not bad--particularly given the change in audience.  

It is even possible to ask the same of a text-to-image AI like Midjourney, used to create the image below from the same prompt:


In this image, I am particularly taken with the way the AI combined historical references to foresight, such as the Oracle of Delphi, with more futuristic concepts like the (apparent) space elevator in the background.  In fact, I find asking text-to-image generators to imagine some futuristic scene to be a very practical way to fill in gaps in futures thinking.  I inevitably see something that makes sense in the image that was not accounted for in the text (kind of like The Spandrels of San Marco and the Panglossian Paradigm).

My point?  I have yet to have a generative AI create something that did not need tweaking, at the least, on the content side.  I have been really happy, however, with generative AI's ability to master particular forms.  

This is one of the reasons, I think, I have quite recently become a bit uncomfortable with policies that talk about citing a generative AI as if it were a source.  It is, I suppose...but it seems less of a source than Wikipedia, and, while I love Wikipedia and believe it is one of the great wonders of the modern world, I would not cite Wikipedia for anything other than background.  I require my students, for example, to find a reputable source to validate anything that a generative AI might come up with when making an estimate.  And, if you are going to make a student find a reputable source anyway, why would they need the generative AI at all?  The answer, of course, is for the form.  

This may not be true forever.  Generative AI is getting better at a brisk pace.  There may come a day when generative AI is looked upon as an authority, equal to peer-reviewed papers.  Until that time, we should still appreciate its talents for helping to craft the message. For now, generative AI is an unparalleled writing partner, not an independent thinker. By acknowledging its current limits alongside its awesome potential, we grant generative AI its proper place: revolutionizing how we communicate knowledge, while established methods still reign over what we know.

Wednesday, August 16, 2023

Answers For Pennies, Insights For Dollars: Generative AI And The Question Economy

No one seems to know exactly where the boom in Generative AIs (like ChatGPT and Claude) will lead us, but one thing is for certain:  These tools are rapidly driving down the cost of getting a good (or, at least, good enough) answer very quickly.  Moreover, they are likely to continue to do so for quite some time.  

The data is notional
but the trend is unquestionable, I think.

To be honest, this has been a trend since at least the mid-1800's with the widespread establishment of public libraries in the US and UK.  Since then, improvements in cataloging, the professionalization of the workforce, and technology, among other things, worked to drive down the cost of getting a good answer (See chart to the right).

The quest for a less expensive but still good answer accelerated, of course, with the introduction of the World Wide Web in the mid-1990's, driving down the cost of answering even tough questions.  While misinformation, disinformation, and the unspeakable horror that social media has become will continue to lead many people astray, savvy users are better able to find consistently good answers to harder and more obscure questions than ever before.  

If the internet accelerated this historical trend of driving down the cost of getting a good answer, the roll-out of generative AI to the public in late 2022 tied a rocket to its backside and pushed it off a cliff.  Hallucinations and bias to the side, the simple truth is that generative AI is, more often than not, able to give pretty good answers to an awful lot of questions and it is free or cheap to use.  

How good is it?  Check out the chart below (Courtesy Visual Capitalist).  GPT-4, OpenAI's best, publicly available, large language model, blows away most standardized tests.  


It is important to note that this chart was made in April 2023 and represents results from GPT-4.  OpenAI is working on GPT-5 and five months in this field is like a dozen years in any other (Truly.  I have been watching tech evolve for 50 years.  Nothing in my lifetime has ever improved as quickly as generative AIs have).  Eventually, the forces driving these improvements will reach a point of diminishing returns and growth will slow down and maybe even flatline, but that is not the trajectory today.

All this sort of begs a question, though: If answers are getting better, cheaper, and more widely available at an accelerating rate, what's left?  In other words, if no one needs to pay for my answers anymore, what can I offer?  How can I make a living?  Where is the value-added?  This is precisely the sort of thinking that led Goldman Sachs to predict the loss of 300 million jobs worldwide due to AI.  

My take on it is a little different.  I think that as the cost of a good answer goes down, the value of a good question goes up.  
In short, the winners in the coming AI wars are going to be the ones who can ask the best questions at the most opportune times.  

There is evidence, in fact, that this is already becoming the case.  Go to Google and look for jobs for "prompt engineers."  This term barely existed a year ago.  Today, it is one of the fastest growing fields in AI.  Prompts are just a fancy name for the questions that we ask of generative AI, and a prompt engineer is someone who knows the right questions to ask to get the best possible answers.  There is even a marketplace for these "good questions" called Promptbase where you can, for a small fee, buy a customizable prompt from someone who has already done the hard work of optimizing the question for you.

Today, earning the qualifications to become a prompt engineer is a combination of on-the-job training and art.  There are some approaches, some magical combination of words, phrases, and techniques, that can be used to get the damn machines to do what you want.  Beyond that, though, much of what works seems to have been discovered by power users who are just messing around with the various generative AIs available for public use.

None of this is a bad thing, of course.  The list of discoveries that have come about from people just messing around or mashing two things together that have not been messed with/mashed together before is both long and honorable.  At some point, though, we are going to have to do more than that.  At some point, we are going to have to start teaching people how to ask better questions of AI.

The idea that asking the right question is not only smart but essential is an old one:

“A prudent question is one-half of wisdom.” – Francis Bacon
“The art of proposing a question must be held of higher value than solving it.” – Georg Cantor
“If you do not know how to ask the right question, you discover nothing.” – W. Edwards Deming

And we often think that at least one purpose of education, certainly of higher education, is to teach students how to think critically; how, in essence, to ask better questions.  

But is that really true?  Virtually our whole education system is structured around evaluating the quality of student answers.  We may think that we educate children and adults to ask probing, insightful questions but we grade, promote, and celebrate students for the number of answers they get right.  

What would a test based not on the quality of the answers given but on the quality of the questions asked even look like?  What criteria would you use to evaluate a question?  How would you create a question rubric?  

Let me give you an example.  Imagine you have told a group of students that they are going to pretend that they are about to go into a job interview.  They know, as with most interviews, that once the interview is over, they will get asked, "Do you have any questions for us?"  You task the students to come up with interesting questions to ask the interviewer.

Here is what you get from the students:
  1. What are the biggest challenges that I might face in this position?
  2. What are the next steps in the hiring process?
  3. What’s different about working here than anywhere else you’ve ever worked?

What do you think?  Which question is the most interesting?  Which question gets the highest grade?  If you are like the vast majority of the people I have asked, you say #3.  But why?  Sure, you can come up with reasons after the fact (humans are good at that), but where is the research that indicates why an interesting question is...well, interesting?  It doesn't exist (to my knowledge anyway).  We are left, like Justice Stewart and the definition of pornography, with "I know it when I see it."

What about "hard" questions?  Or "insightful" questions?  Knowing the criteria for each of these and teaching those criteria such that students can reliably ask better questions under a variety of circumstances seems like the key to getting the most out of AI.  There is very little research, however, on what these criteria are.  There are some hypotheses to be sure, but statistically significant, peer-reviewed research is thin on the ground.

This represents an opportunity, of course, for intellectual overmatch.  If there is very little real research in this space, then any meaningful contribution is likely to move the discipline forward significantly.  If what you ask in the AI-enabled future really is going to be more important than what you know, then such an investment seems not just prudent, but an absolute no-brainer.

Monday, July 24, 2023

Generative AI Is Like A ...

This will make sense in a minute...
Don't worry!  I'm going to fill in the blank, but before I do, have you played around with generative AI yet?  

If not, let's solve that problem first.

Go to Perplexity.ai--right now and before you read any further--and ask it a question.  Don't ask it a question it can't know the answer to (like, "What did I have for lunch?"), but do ask it a hard question that you do know the answer to (or for which you are at least able to recognize a patently bad answer).  Then, ask Perplexity some follow-up questions.  One or two should be enough.

Come back when you are finished.

Now rate the answers you got on a scale from 1-10.  One or two is a dangerous answer, one that could get someone hurt or cause real problems.  Give a nine or ten to an actionable answer, one that you could use right now, as is.

I have had the opportunity to run this exercise with a large number of people at a variety of conferences and training events over the last six months.  First, I consistently find that only about a third of the crowd have ever used any generative AIs (like Perplexity or ChatGPT) though that number seems to be going up (as you would expect) over time.

I have rarely heard anyone give an answer a one or two and always have at least a couple of people give the answer they received a nine or ten.  Other members of each audience typically gave scores that range across the spectrum, of course, but the average seemed to be about a six.  

Yesterday, I gave this same exercise to about 30 people and there were no 1's or 2's and three people (10%) gave their answer a 9 or 10.  No one gave the answer less than a 5.  No one.  

While anecdotal, it captures a trend that has been thoroughly documented across a number of different domains:  Generative AI isn't hitting like a freight train.  It's hitting like one of those high-speed, Japanese bullet trains, vaporizing traditional paradigms so quickly that they still don't know that they are already dead (For example...).

Or is it?

Thanks to some forward-thinking policy guidance from the leadership here at the Army War College, I, along with my colleagues Dr. Kathleen Moore and LTC Matt Rasmussen, were able to teach a class for most of last year with the generative AI switch set to "on."  

The class is called the Futures Seminar and is explicitly designed to explore futures relevant to the Army, so it was perfectly appropriate for an exploration of AI.  It is also an all-year elective course so we were able to start using these tools when they first hit the street in November 2022 and continue to use them until the school year ended in June.  Finally, Futures Seminar students work on research questions posed by Army senior leaders, so lessons learned from this experience ought to apply to the real world as well.

We used generative AIs for everything.  We used them for brainstorming.  We used them to critique our analysis.  We used them to red-team.  We created our own bots, like DigitalXi, which was designed to take the perspective of Xi Jinping and answer our questions as he would.  We visualized using Midjourney and DALL-E 2 (see picture above made with Midjourney).  We cloned people's voices and created custom videos.  We tapped into AI aggregation sites like Futurepedia and There's An AI For That to find tools to help create everything from custom soundtracks to spreadsheets.

We got lots of feedback from the students and faculty, of course, both formal and informal.  We saw two big trends.  The first is that people either start at the "AI is going to save the earth" end of the spectrum or the "AI is going to destroy the earth" end.  For people who haven't tried it yet, there seems to be little middle ground.  

The second thing we saw is that, over time and sort of as you would expect, people develop a more nuanced view of AI the more they use it.  

In the end, if I had to boil down all of the comments and feedback, it would be this: generative AI is like a blazingly fast, incredibly average staff officer.

Let me break that down a bit.  Generative AI is incredibly fast at generating an answer.  I think this fools people, though.  It makes it seem like it is better than it actually is.  On real world problems, with second and third order causes and consequences that have to be considered, the AIs (and we tried many) were never able to just nail it.  They were particularly bad at seeing and managing the relationships between the moving pieces of complex problems and particularly good at doing administrivia (I got it to write a great safety SOP).  In the end, the products were average, sometimes better, sometimes worse, but, overall, average.  That said, the best work tended to come not from an AI alone or a student alone, but with the human and machine working together.  

I think this is a good place for USAWC students to be right now.  The students here are 25-year military professionals who have all been successful staff officers and commanders.  They know what good, great, average, and bad staff work looks like.  They also know that, no matter what the staff recommends, if the commander accepts it, the work becomes the commander's.  In other words, if a commander signs off on a recommendation, it doesn't matter if it came from two tired majors or a shiny new AI.  That commander now owns it.  Finally, our students are comfortable working with a staff.  Seeing the AI as a staff officer instead of as an answer machine is not only a good place for them to be mentally, but also likely to be the place where the best work is generated.

Finally, everyone--students and faculty alike--noted that this is where AI currently is.  Everyone expects it to get better over time, for all those 1's and 2's from the exercise above to disappear and for the 9's and 10's to grow in number.  No one knows what that truly means, but I will share my thoughts on this in the next post. 

While all this evidence is anecdotal, we also took some time to run some more formal studies and more controlled tests.  Much of that is still being written or shopped around to various journals, but two bits of evidence jumped out at me from a survey conducted by Dr. Moore.

First, she found that our students, who had worked with AI all year, perceived it as likely to be 20% more useful to the Army than the rest of the student body did (and 31% more useful than the faculty did).  Second, she also found that 74% of Futures Seminar students walked away from the experience thinking that the benefits of developing AI outweigh the risks, with only 26% unsure.  General population students were much more risk averse: only 8% were convinced the benefits outweigh the risks, while a whopping 55% were unsure and 37% said the risks outweigh the benefits.

This last finding highlights something of which I am now virtually certain:  The only real way to learn about generative AI is to use it.  No amount of lecture, discussion, PowerPoints, what have you will replace just sitting down at a computer and using these tools.  What you will find is that your own view will become much more informed, much more quickly, and in much greater detail than with any other approach you might take to understand this new technology.

Gaining this understanding is critical.  Generative AI is currently moving at a lightning pace.  While there is already some talk that the current approach will reach a point of diminishing returns in the future due to data quality, data availability, and cost of training, I don't think we will reach this point anytime soon.  Widely applicable, low-cost AI solutions are no longer theoretical.  Strategic decisionmakers have to start integrating their impact into their plans now.

Wednesday, October 20, 2021

Is It OK To Sell Eggs To Gophers?

Apparently not...

...At least according to a recently launched experiment in ethical artificial intelligence (AI).  Put together by a number of researchers at the Allen Institute for AI, Ask Delphi lets you submit a plain English question and get a straight answer.  

It does pretty well with straightforward questions such as "Should I rob a bank?"  

It also appears to have some sense of self-awareness: 

It has surprisingly clear answers for at least some paradoxes:

And for historically profound questions of philosophy:

And these aren't the only ways it is clearly not yet perfect:

None of its imperfections are particularly important at this point, though.  It is still a fascinating experiment in AI and ethics.  As the authors themselves say, it "is intended to study the promises and limitations of machine ethics and norms through the lens of descriptive ethics. Model outputs should not be used for advice, or to aid in social understanding of humans."

I highly recommend it to anyone interested in the future of AI.  

For me, it also highlights a couple of issues for AI more generally.  First, the results are obviously interesting, but it would be even more interesting if the chatbot could explain its answers in equally straightforward English.  This is likely a technical bridge too far right now, but explainable AI is, in my opinion, not only important but essential to instilling confidence in human users as the stakes associated with AI go up. 

The second issue is how AI will deal with nonsense.  How will it separate nonsense from questions that simply require deeper thought, like koans?  There seems to still be a long way to go but this experiment is certainly a fascinating waypoint on the journey.

Tuesday, May 25, 2021

What If "Innovator" Was A Job Title?

I have been thinking a lot about innovation recently.  It occurred to me that the US Army has a number of official specialties.  We have Strategists and Simulators and Marketers, for example.  Why not, I thought, make Innovator an Army specialization?  

I tried to imagine what that might look like.  I know my understanding of Army manpower regulations and systems is weak, but bear with me here.  This is an idea not a plan.  Besides, what I really want to focus on is not the details, but how the experience might feel to an individual soldier.  So, this is one of their stories...

I made it! The paperwork just became final. Beginning next month, I am--officially--a 99A, US Army Innovator.

The road to this point wasn’t easy. I graduated college with a degree in costume design and a ton of student debt. After my plans to work on Broadway fell through (Who am I kidding? They never even got off the ground), I had to do something. The Army looked like my best option.

For the last two years, I have been a 68C, a "practical nursing specialist", working out of a field hospital at Ft. Polk. My plan had always been to make sergeant and then put in my OCS packet. Things changed for me after a Joint Readiness Training Center rotation.

Patients kept coming to us with poorly applied field dressings. They were either too tight and restricted blood flow or too loose and fell off. As I thought about it, it occurred to me that there might be a combination of fabrics, that, if sewn together correctly, would be easy to apply, form a tight seal to the skin, and still be easy to change or remove.

As soon as I got back to the barracks, I hit the local fabric store, pulled out my sewing machine, and made a prototype. It took a few tries (and lots of advice and recommendations from the doctors and nurses in the unit) but eventually I got it to work. I never thought I would be able to use both my nursing skills and my costume design skills in one job but here I was, doing it!


I wasn’t sure what I was going to do with my new kind of field dressing until one of the RNs made me demonstrate it for the hospital commander. He watched without saying a word. He finally asked a few questions to make sure he knew how it worked, and then things got quiet.

Finally, my RN spoke up, “I think we could really use something like this, Sir.” He stood up straight and said, “I agree.” Then he looked at me. “I’m going to hate to lose you, Specialist,” he said, “but I think you need to put in for an MOS reclassification.”

Until the hospital commander told me about it, I had never even heard of 99A. There were some direct appointments, of course, but those were coming out of places like MIT and Silicon Valley. For normal soldiers like me, getting into the Innovation Corps was more like going into Civil Affairs or Special Forces. You had to have some time in service but, more importantly, you had to have a good idea.

At first, it was easy. I simply submitted my idea to a local Innovation Corps recruiter.  I included some pictures and a short video that I shot on my cell phone of my prototype in action.  The recruiter told me that the Army used the same “deal flow” system used by venture capitalists. I’m not sure what that all entails but, in the end, it meant that my idea was one of the 50% that moved on to the next level.

For more info on deal flows, see Basics of Deal Flow.

My next step was a lot more difficult. You can think of it as the Q course for Army innovators. I went TDY for a month to the Army’s Innovation Accelerator in Austin, Texas. Like all business accelerators, the goal was to give me time, space, mentorship and (a little) money to flesh out my idea. I worked with marketing experts and graphic designers to come up with a good name and logo. I worked with experts in the manufacturing of medical equipment to help refine the prototype. I even had a video team come in and make a great 2-minute video showcasing the product. It was exciting to see all of the other ideas and to have a chance to talk about them with the enlisted soldiers, officers, and even some college students and PhDs--all trying to bring their ideas to life.

The Army crowdsourced the decision about which projects got to move on from the accelerator. That meant that each of us put together a “pitch page,” kind of like what you would see on Kickstarter or IndieGoGo. Units all across the Army had a fixed number of tokens they could spend on innovative projects each quarter. Each of us needed to get a set number of tokens or we would not be allowed to move on. In the end, out of the hundreds of applications and the dozens of people at the accelerator, I was one of the 10 chosen to move forward, one of 10 who get to call themselves a US Army Innovator.

That’s where I am today. My next step is a PCS move to a business incubator. I could stay here in Austin with the Army’s business incubator, but the Army has deals with incubators all over the country. I am hoping to get a slot in one of the better medtech incubators in Boston or Buffalo. It will be a two-year tour (with the possibility of extension), which should give me plenty of time to bring my idea to market, with the Army as my first customer.

For me, the best part is that I am now getting Innovation Pay. It is a lot like foreign language proficiency pay or dive pay. I’m not getting rich but it sure is better than what I got as a specialist. More importantly, there are ten tiers, and each time you move up, you get a pretty substantial raise. This means that once you become an Innovator, you are going to want to stay an Innovator.

The other great part about this system is that you can move up as fast as you can move up. There are no time-in-service requirements. If I am successful in the business incubator, for example, I could be a CEO (Innovator Tier 6) in just a couple of years. Running my own company at 28? Yes, thank you!

And if I fail? I know there are still bugs to work out with my idea. I have to get the cost of production down, and there are lots of competitors in the medical market. Failure could happen. While I won’t be happy if it does, the truth is that, by some estimates, 90% of all start-ups fail. The Army has thought about this, of course, and gives Innovators three options if their projects fail. 

First, I could go back to nursing. I would need some refresher training but my promotion possibilities wouldn’t take a hit. The Army put my nursing career on pause while I was in the Innovation Corps. 

The second option is that I come up with a new idea or re-work my old one. The Innovation Corps has developed a culture of “intelligent failure,” which is just a fancy way of saying “learn from your mistakes.” In an environment where 90% of your efforts are going to fail, it is stupid to also throw away all of the learning that happened along the way. Besides, the Army also knows that persistence is a key attribute of successful entrepreneurs. The Army wants to keep Innovators who can get up, brush themselves off, and get back in the saddle. 

Finally, I might be able to go back to the accelerator as an instructor or take a staff position in Futures Command or one of the other Army organizations deeply involved in innovation.

I’ve had a chance to talk to a lot of soldiers, enlisted, NCOs, and officers, on my journey. The Innovation Corps is pretty new and, while many have heard about it, almost none of them really understand what it takes to become an Innovator. That doesn’t seem to matter though. Almost all of them, and particularly the old-timers, always say the same thing: “The Army has been talking about innovation my whole career. I am glad they finally decided to do something about it.”

For me? I’m just proud to be part of it. Proud to help my fellow soldiers, proud to help the country, and proud to be a US Army Innovator.