
Framing AI strategy


Zach Stein-Perlman, 6 February 2023

Strategy is the activity or project of doing research to inform interventions to achieve a particular goal. AI strategy is strategy from the perspective that AI is important, focused on interventions to make AI go better. An analytic frame is a conceptual orientation that makes salient some aspects of an issue, including cues for what needs to be understood, how to approach the issue, what your goals and responsibilities are, what roles to see yourself as having, what to pay attention to, and what to ignore.

This post discusses ten strategy frames, focusing on AI strategy. Some frames are comprehensive approaches to strategy; some are components of strategy or prompts for thinking about an aspect of strategy. This post focuses on meta-level exploration of frames, but the second and final sections have some object-level thoughts within a frame.

Sections are overlapping but independent; focus on the sections that aren't already in your toolbox of approaches to strategy.

Epistemic status: exploratory, brainstormy.

Make a plan

See Jade Leung's Priorities in AGI governance research (2022) and How can we see the impact of AI strategy research? (2019).

One output of strategy is a plan describing relevant (kinds of) actors' behavior. More generally, we can aim for a playbook: something like a function from (sets of observations about) world-states to plans. A plan is good insofar as it improves important decisions in the counterfactual where you try to implement it, in expectation.

To make a plan or playbook, identify (kinds of) actors that may be affectable, then determine

  1. what they could do,
  2. what it would be good for them to do,
  3. what their incentives are (if relevant), and then
  4. how to cause them to act better.

It is also possible to focus on decisions rather than actors: determine which decisions you want to affect (presumably because they are important and affecting them seems tractable) and how you can affect them.

For AI, relevant actors include AI labs, states (notably America), non-research non-governmental organizations (notably standard-setters), compute providers, and the AI risk and EA communities.

Insofar as an agent (not necessarily an actor that can take directly important actions) has unique abilities and is likely to try to execute good ideas you have, it can be useful to focus on what the agent can do, or how to leverage the agent's unique abilities, rather than backchaining from what would be good.

Affordances

As in the previous section, a natural way to improve the future is to identify relevant actors, determine what it would be good for them to do, and cause them to do those things. "Affordances" in strategy are "possible partial future actions that could be communicated to relevant actors, such that they would take relevant actions." The motivation for seeking and improving affordances is that there probably exist actions that would be great and that relevant actors would be happy to take, but that they wouldn't devise or recognize by default. Finding great affordances is aided by a deep understanding of how an actor thinks and what its incentives are, as well as a deep external understanding of the actor, to focus on its blind spots and identify feasible actions. Separately, the actor's participation would sometimes be essential.

Affordances are relevant not just to cohesive actors but also to non-structured groups. For example, for AI strategy, finding affordances for ML researchers (as individuals or for collective action) could be valuable. Perhaps there also exist great possible affordances that don't depend much on the actor: generally useful actions that people just aren't aware of.

For AI, two relevant kinds of actors are states (notably America) and AI labs. One way to discover affordances is to brainstorm the kinds of actions particular actors can take, then find creative new plans within that list. Going less meta, I made lists of the kinds of actions states and labs can take that may be strategically significant, since such lists seem worthwhile and I haven't seen anything like them.

Kinds of things states can do that may be strategically relevant (or consequences or characteristics of possible actions):

  • Regulate (and enforce regulation in their jurisdiction and investigate possible violations)
  • Expropriate property and nationalize companies (in their territory)
  • Perform or fund research (notably including through Manhattan/Apollo-style projects)
  • Acquire capabilities (notably including military and cyber capabilities)
  • Support particular people, companies, or states
  • Disrupt or attack particular people, companies, or states (outside their territory)
  • Affect what other actors believe at the object level
    • Share information
    • Make information salient in a way that predictably affects beliefs
    • Express attitudes that others will follow
  • Negotiate with other actors, or affect other actors' incentives or meta-level beliefs
  • Make agreements with other actors (notably including contracts and treaties)
  • Establish standards, norms, or regulations
  • Make unilateral declarations (as an international legal commitment) [less important]

Kinds of things AI labs can do (or choose not to do) that may be strategically relevant (or consequences or characteristics of possible actions):

  • Deploy an AI system
  • Pursue capabilities
    • Pursue risky (and more or less alignable) systems
    • Pursue systems that enable risky (and more or less alignable) systems
    • Pursue weak AI that is mostly orthogonal to progress in risky stuff, for a specific (strategically significant) task or purpose
      • This could enable or abate catastrophic risks besides unaligned AI
  • Do alignment (and related) research (or: decrease the alignment tax by doing technical research)
  • Advance global capabilities
    • Publish capabilities research
    • Cause investment or spending in big AI projects to increase
  • Advance alignment (or: decrease the alignment tax) in ways other than doing technical research
    • Support and coordinate with external alignment researchers
  • Attempt to align a particular system (or: try to pay the alignment tax)
  • Interact with other labs
    • Coordinate with other labs (notably including coordinating to avoid risky systems)
      • Make themselves transparent to each other
      • Make themselves transparent to an external auditor
      • Merge
      • Effectively commit to share upsides
      • Effectively commit to stop and assist
    • Affect what other labs believe at the object level (about AI capabilities or risk in general, or regarding particular memes)
    • Negotiate with other labs, or affect other labs' incentives or meta-level beliefs
  • Affect public opinion, media, and politics
    • Publish research
    • Make demos or public statements
    • Release or deploy AI systems
  • Improve their culture or operational adequacy
    • Improve operational security
    • Affect attitudes of effective leadership
    • Affect attitudes of researchers
    • Make a plan for alignment (e.g., OpenAI's); share it; update and improve it; and coordinate with capabilities researchers, alignment researchers, or other labs if relevant
    • Make a plan for what to do with powerful AI (e.g., CEV or some specification of the long reflection), share it, update and improve it, and coordinate with other actors if relevant
    • Improve their ability to make themselves (selectively) transparent
  • Try to better understand the future, the strategic landscape, risks, and possible actions
  • Acquire resources
    • E.g., money, hardware, talent, influence over states, status/reputation/trust
    • Capture scarce resources
      • E.g., language data from language model users
  • Affect other actors' resources
    • Affect the flow of talent between labs or between projects
  • Plan, execute, or participate in pivotal acts or processes

(These lists also exist on the AI Impacts wiki, where they may be improved in the future: Affordances for states and Affordances for AI labs. These lists are written from an alignment-focused and misuse-aware perspective, but prosaic risks may be important too.)

Maybe making or reading lists like these can help you find good tactics. But novel affordances are necessarily not things that are already part of an actor's behavior.

Maybe making lists of relevant things relevant actors have done in the past would illustrate possible actions, build intuition, or aid communication.

This frame seems like a potentially useful complement to the standard approach of backchaining from goals to actions of relevant actors. And it seems good to understand actions that should be items on lists like these (both understanding those list-items well and expanding or reframing the lists) so you can find opportunities.

Intermediate goals

No great sources are public, but illustrating this frame, see "Catalysts for success" and "Scenario variables" in Marius Hobbhahn et al.'s What success looks like (2022). On goals for AI labs, see Holden Karnofsky's Nearcast-based "deployment problem" analysis (2022).

An intermediate/instrumental goal is a goal that is valuable because it promotes one or more final/terminal goals. ("Goal" sounds discrete and binary, like "there exists a treaty to prevent risky AI development," but sometimes should be continuous, like "gain resources and influence.") Intermediate goals are useful because we often need more specific and actionable goals than "make the future go better" or "make AI go better."

Knowing what specifically would be good for people to do is a bottleneck on people doing useful things. If the AI strategy community had greater strategic clarity, in terms of knowledge about the future and particularly intermediate goals, it could better utilize people's labor, influence, and resources. Perhaps an overlapping strategy framing is finding or unlocking effective opportunities to spend money. See Luke Muehlhauser's A personal take on longtermist AI governance (2021).

It is also sometimes useful to consider goals about particular actors.

Threat modeling

Illustrating threat modeling for the technical component of AI misalignment, see the DeepMind safety team's Threat Model Literature Review and Clarifying AI X-risk (2022), Sam Clarke and Sammy Martin's Distinguishing AI takeover scenarios (2021), and GovAI's Survey on AI existential risk scenarios (2021).

The goal of threat modeling is deeply understanding one or more threats for the purpose of informing interventions. A great causal model of a threat (or class of possible failures) can let you identify points of intervention and determine what countering the threat would require.

A related project involves assessing all threats (in a certain class) rather than a particular one, to help account for and prioritize between different threats.

Technical AI safety research informs AI strategy through threat modeling. A causal model of (part of) AI risk can generate a model of AI risk abstracted for strategy, with relevant features made salient and irrelevant details black-boxed. This abstracted model gives us information including necessary and sufficient conditions or intermediate goals for averting the relevant threats. These in turn can inform affordances, tactics, policies, plans, influence-seeking, and more.

Theories of victory

I'm not aware of great sources, but illustrating this frame, see Marius Hobbhahn et al.'s What success looks like (2022).

Considering theories of victory is another natural frame for strategy: consider scenarios where the future goes well, then find interventions to nudge our world toward those worlds. (Insofar as it is not clear what the future going well means, this approach also involves clarifying that.) To find interventions to make our world like a victorious scenario, I sometimes try to find necessary and sufficient conditions for the victory-making aspect of that scenario, then consider how to cause those conditions to hold.

Great threat-model analysis can be an excellent input to theory-of-victory analysis, to clarify the threats and what their solutions must look like. And it may be useful to consider scenarios in which the future goes well and scenarios where it doesn't, then examine the differences between those worlds.

Tactics and policy development

Collecting progress on possible government policies, see GovAI's AI Policy Levers (2021) and GCRI's Policy ideas database.

Given a model of the world and high-level goals, we must figure out how to achieve those goals in the messy real world. For a goal, what would cause success, which of those possibilities are tractable, and how could they become more likely to occur? For a goal, what are necessary and sufficient conditions for success and how could those occur in the real world?

Memes & frames

I'm not aware of great sources on memes & frames in strategy, but see Jade Leung's How can we see the impact of AI strategy research? (2019). See also the academic literature on framing, e.g. Robert Entman's Framing (1993).

("Frames" in this context refers to the lenses through which people interpret the world, not the analytic, research-y frames discussed in this post.)


If certain actors held certain attitudes, they would make better decisions. One way to affect attitudes is to spread memes. A meme could be explicit agreement with a specific proposition; the attitude that certain organizations, projects, or goals are (seen as) shameful; the attitude that certain ideas are sensible and respectable or not; or merely a tendency to pay more attention to something. The goal of meme research is finding good memes (memes that would improve decisions if widely accepted, or accepted by a particular set of actors, and that are tractable to spread) and figuring out how to spread them. Meme research is complemented by work actually causing those memes to spread.

For example, potential good memes in AI safety include things like AI is powerful but not robust, and in particular [specification gaming or Goodhart or distributional shift or adversarial attack] is a big deal. Perhaps misalignment as catastrophic accidents is easier to understand than misalignment as powerseeking agents, or vice versa. And perhaps misuse risk is easy to understand and unlikely to be catastrophically misunderstood, but less valuable-if-spread.

A frame tells people what to notice and how to make sense of an aspect of the world. Frames can be internalized by a person or contained in a text. Frames for AI might include frames related to consciousness, Silicon Valley, AI racism, national security, or specific kinds of applications such as chatbots or weapons.

Higher-level research may be valuable. This could involve topics like how to communicate ideas about AI safety, or even how to communicate ideas and how groups form beliefs.

This approach to strategy could also involve researching how to stifle bad memes, like perhaps "powerful actors are incentivized to race for highly capable AI" or "we need a Manhattan Project for AI."

Exploration, world-modeling, and forecasting

Sometimes strategy greatly depends on particular questions about the world and the future.

More generally, you can reasonably expect that increasing clarity about important-seeming aspects of the world and the future will inform strategy and interventions, even without thinking about specific goals, actors, or interventions. For AI strategy, exploration includes central questions about the future of AI and relevant actors, understanding the effects of possible actions, and perhaps also topics like decision theory, acausal trade, digital minds, and anthropics.

Constructing a map is part of many different approaches to strategy. This roughly involves understanding the landscape and finding analytically useful concepts, like reframing "victory means causing AI systems to be aligned" to "it's necessary and sufficient to cause the alignment tax to be paid," so it's necessary and sufficient to reduce the alignment tax and increase the amount-of-tax-that-would-be-paid such that the latter is greater.
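As a minimal formalization of that reframing (the symbols below are my own labels for illustration, not notation from this post or its sources): let T be the alignment tax, the extra cost of building aligned rather than unaligned systems, and let W be the amount of that cost relevant developers are willing and able to pay. The reframed victory condition is then roughly:

```latex
% Illustrative sketch; T and W are hypothetical labels, not established notation.
%   T = alignment tax (extra cost of building aligned rather than unaligned systems)
%   W = amount of that cost relevant developers are willing and able to pay
W \geq T
% So interventions can aim to decrease T (e.g., technical alignment research),
% to increase W (e.g., coordination, incentives, governance), or both.
```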

One exploratory, world-model-y goal is a high-level understanding of the strategic landscape. One possible approach to this goal is making a map of relevant possible events, phenomena, actions, propositions, uncertainties, variables, and/or analytic nodes.

Nearcasting

Discussing nearcasting, see Holden Karnofsky's AI strategy nearcasting (2022). Illustrating nearcasting, see Karnofsky's Nearcast-based "deployment problem" analysis (2022).

Holden Karnofsky defines "AI strategy nearcasting" as

trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's. One (but not the only) version of this assumption would be "Transformative AI will be developed soon, using methods like what AI labs focus on today."

When I think about AI strategy nearcasting, I ask:

  • What would a near future in which powerful AI could be developed look like?
  • In this possible world, what goals should we have?
  • In this possible world, what important actions could relevant actors take?
    • And what facts about the world make those actions possible? (For example, some actions would require that a lab has certain AI capabilities, or that most people believe a certain thing about AI capabilities, or that all leading labs believe in AI risk.)
  • In this possible world, what interventions are available?
  • Relative to this possible world, how should we expect the real world to be different?
  • And how do those differences affect the goals we should have, and the interventions that are available to us?

Nearcasting seems to be a useful tool for

  • predicting relevant events concretely and
  • forcing you to notice how you think the world will be different in the future and how that matters.

Leverage

I'm not aware of other public writeups on leverage. See also Daniel Kokotajlo's What considerations influence whether I have more influence over short or long timelines? (2020). Related concept: crunch time.

When doing strategy and planning interventions, what should you focus on?

A major subquestion is: how should you prioritize focus between possible worlds? Ideally you would prioritize working on the worlds that working on has highest expected value, or something like the worlds that have the greatest product of probability and how much better they would go if you worked on them. But how can you guess which worlds are high-leverage for you to work on? There are various reasons to prioritize certain possible worlds, both for reasoning about strategy and for evaluating possible interventions. For example, it seems higher-leverage to work on making AI go well conditional on human-level AI appearing in 2050 than in 3000: the former is more foreseeable, more affectable, and more neglected.
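As a rough illustration of that "product of probability and counterfactual improvement" idea, here is a minimal sketch; the worlds and numbers are made-up placeholders, not estimates from this post or its sources:

```python
# Toy leverage calculation: score each possible world by
# P(world) * (how much better that world would go if you worked on it).
# All worlds and numbers below are illustrative placeholders.

worlds = {
    # name: (probability, counterfactual_improvement_if_worked_on)
    "human-level AI by 2050, fast takeoff": (0.15, 0.30),
    "human-level AI by 2050, slow takeoff": (0.25, 0.10),
    "human-level AI around 3000":           (0.05, 0.01),
}

def leverage(probability: float, improvement: float) -> float:
    """Expected value of focusing on a world: probability times improvement."""
    return probability * improvement

scores = {name: leverage(p, d) for name, (p, d) in worlds.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The considerations below can then be read as reasons to adjust the probability or improvement terms up or down for particular worlds.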

We currently lack a good account of leverage, so (going less meta) I'll begin one for AI strategy here. Given a baseline of weighting possible worlds by their probability, all else equal, you should generally:

  • Upweight worlds that you have more control over and that you can better plan for
    • Upweight worlds with short-ish timelines (since others will exert more influence over AI in long-timelines worlds, since we have more clarity about the nearer future, and since we can revise strategies in long-timelines worlds)
    • Take into account future strategy research
      • For example, if you focus on the world in 2030 (or assume that human-level AI is developed in 2030), you will be deferring, not neglecting, some work on 2040
      • For example, if you focus on worlds in which important events happen without much advance warning or clearsightedness, you will be deferring, not neglecting, some work on worlds in which important events happen foreseeably
    • Focus on what you can better plan for and influence; for AI, perhaps this means:
      • Short timelines
      • The deep learning paradigm continues
      • Powerful AI is resource-intensive
      • Maybe some propositions about risk awareness, warning shots, and world-craziness
    • Upweight worlds where the probability of victory is relatively close to 50%
    • Upweight more neglected worlds (think on the margin)
  • Upweight short-timelines worlds insofar as there is more non-AI existential risk in long-timelines worlds
  • Upweight analysis that better generalizes to or improves other worlds
  • Notice the probability that you live in a simulation (if that's decision-relevant; unfortunately, the practical implications of living in a simulation are currently unclear)
  • Upweight worlds that you have better personal fit for analyzing
    • Upweight worlds where you have more influence, if relevant
  • Consider side effects of doing strategy, including what you gain knowledge about, testing fit, and gaining credible signals of fit

In practice, I tentatively think the biggest (analytically useful) considerations for weighting worlds beyond probability are generally:

  1. Short timelines
    1. More foreseeable
    2. More affectable
    3. More neglected (by the AI strategy community)
      1. Future people can work on the further future
        1. The AI strategy field is likely to be bigger in the future
    4. Less planning or influence exerted from outside the AI strategy community
  2. Fast takeoff
    1. Shorter, less foreseeable a certain time in advance, and less salient to the world in advance
      1. More neglected by the AI strategy community; the community would have a longer clear-sighted period to work on slow takeoff
      2. Less planning or influence exerted from outside the AI strategy community

(But there are presumably diminishing returns to focusing on particular worlds, at least at the community level, so the community should diversify the worlds it analyzes.) And I'm most confused about

  1. Upweighting worlds where the probability of victory is closer to 50% (I'm confused about what the probability of victory is in various possible worlds),
  2. How leverage relates to variables like total influence exerted to affect AI (the rest of the world exerting influence means that you have less relative influence insofar as you're pulling the rope along similar axes, but some interventions are amplified by something like greater attention on AI) (and related variables like attention on AI and general craziness due to AI), and
  3. The probability and implications of living in a simulation.

A background assumption or approximation in this section is that you allocate research toward a world and the research is effective just if that world obtains. This assumption is somewhat crude: the impact of most research isn't so binary, fully effective in some possible futures and entirely ineffective in the rest. And thinking in terms of influence over a world is crude: influence depends on the person and on the intervention. Nevertheless, reasoning about leverage in terms of worlds to allocate research toward might sometimes be useful for prioritization. And we might discover a better account of leverage.

Leverage considerations should include not just prioritizing between possible worlds but also prioritizing within a world. For example, it seems high-leverage to focus on important actors' blind spots and on certain important decisions or "crunchy" periods. And for AI strategy, it may be high-leverage to focus on the first few deployments of powerful AI systems.


Strategy work is complemented by

  1. actually executing interventions, especially causing actors to make better decisions,
  2. gaining resources to better execute interventions and improve strategy, and
  3. field-building to better execute interventions and improve strategy.

An individual's strategy work is complemented by informing the relevant community of their findings (e.g., for AI strategy, the AI strategy community).

In this post, I don't try to make an ontology of AI strategy frames, or do comparative analysis of frames, or argue about the AI strategy community's prioritization between frames. But these all seem like reasonable things for someone to do.

Related sources are linked above as relevant; see also Sam Clarke's The longtermist AI governance landscape (2022), Allan Dafoe's AI Governance: Opportunity and Theory of Impact (2020), and Matthijs Maas's Strategic Perspectives on Long-term AI Governance (2022).

If I wrote a post on "Framing AI governance," it would substantially overlap with this list, and it would substantially draw on The longtermist AI governance landscape. See also Allan Dafoe's AI Governance: A Research Agenda (2018) and hanadulset and Caroline Jeanmaire's A Map to Navigate AI Governance (2022). I don't know whether a similar "Framing technical AI safety" would make sense; if so, I'd be excited about such a post.

Many thanks to Alex Gray. Thanks also to Linch Zhang for discussion of leverage and to Katja Grace, Eli Lifland, Rick Korzekwa, and Jeffrey Heninger for comments on a draft.
