Personality Basins

Posted on June 4, 2024 by near

Personality basins are a mental model that I use to reason about humans within their environment: why people are the way they are, how they change over time, how mental illnesses and addiction function and how we should look for their cures, and how the attention economy optimizes itself to consume all of your free time.

What is personality?

Note: This post contains many analogies to concepts from deep learning. Please do not interpret these comparisons too literally!

Your personality is formed by a process conceptually similar to RLHF. You are first born with a set of traits in a given environment. After this, you perform many interactions with your environment. If an interaction goes well, you’re likely to do it more often, and if it goes poorly, you’ll probably do less of it.

See the learning agent? That’s You!
If your interaction with this post goes well, you’re likely to read more of them later.

If you were born tall and with a commanding voice you might find that you get what you want by confidently demanding it, and this will help to result in a confident personality. If you attempt this strategy as someone born small with a soft voice, it will probably have weaker results and encourage you to try something else out instead.
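
The feedback loop above can be sketched as a toy simulation. This is only a rough illustration, not a claim about how brains work: the two strategy names, the success probabilities, and the update rule (a preference score nudged toward each observed outcome) are all assumptions I made up for the example.

```python
import random

STRATEGIES = ["demand confidently", "ask quietly"]


def develop_preferences(success_prob, steps=5000, step_size=0.01, seed=0):
    """Return a preference score per strategy after repeated trial and feedback.

    success_prob maps each strategy to the assumed chance that it goes well
    for this person in this environment (made-up numbers, not data).
    """
    rng = random.Random(seed)
    prefs = {s: 0.5 for s in STRATEGIES}  # start with no strong leaning
    for _ in range(steps):
        # Try strategies in proportion to how much you currently favor them.
        weights = [prefs[s] for s in STRATEGIES]
        choice = rng.choices(STRATEGIES, weights=weights)[0]
        went_well = rng.random() < success_prob[choice]
        reward = 1.0 if went_well else 0.0
        # Nudge the preference toward the observed outcome.
        prefs[choice] += step_size * (reward - prefs[choice])
        # Never rule a strategy out entirely.
        prefs[choice] = max(prefs[choice], 0.01)
    return prefs


# Hypothetical people: same learning rule, different starting traits.
tall_commanding = develop_preferences(
    {"demand confidently": 0.8, "ask quietly": 0.4}
)
small_soft_voiced = develop_preferences(
    {"demand confidently": 0.2, "ask quietly": 0.6}
)

print("tall, commanding voice:", tall_commanding)
print("small, soft voice:     ", small_soft_voiced)
```

Both simulated people run the exact same learning rule. Only the feedback they receive differs, and that is enough to leave them favoring different strategies.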

Genetics has a large influence on most traits including personality. This topic is outside of the scope of this post: it is best to think of this post as providing some scaffolding for the question of “why is this person X, when they could have instead been Y if they had been in a different environment” or “what helps to explain the differences in outcome between two genetically identical people” (see also: niche construction and gene-environment correlation).

Periods of high social and environmental entropy during adolescence are the most formative because you will learn the most information about which actions perform well in your environment and which don’t (of course, our meta-learning algorithm knows this, and this is why you have higher neuroplasticity and thus a higher learning rate and more energy during this period. It’s time to learn how to succeed in your newfound environment!)

Your personality basin

As you go about your life, you will continue to modify your personality in response to your environment, and eventually you will end up in something that resembles a basin. Maybe you were born tall and attractive and then this led you to engage in a lot of athletic activities and socialization, and at the end of all of the positive feedback you have ended up with a jock personality that goes on to become a professional football player.

This is a landscape of personalities. The black line is your personality over time, and the last point is the person you currently are. Just like in machine learning, the way that you’ve progressed as a person has been by trying out many things and then doing more of the things that worked well.

If instead you grew up scrawny yet intelligent you might have found things go well for you when you adopt a more quiet persona and focus on solving technical problems in programming or mathematics, perhaps eventually leading to a career as a software engineer or academic. Just like training a model in machine learning, the general gist is that you will try out a lot of things and then do more of the things that went well.

The above image is of a loss landscape in machine learning. Since we are discussing personality, all of the points on the landscape represent different personalities you could have, with the lower points being personalities which are more successful. The personality basin that you find yourself in solidifies over time as you find out who you are and choose your friend group, career path, social and aesthetic preferences, and so on.
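
For readers who have not seen a loss landscape in code, here is a minimal sketch of plain gradient descent on a one-dimensional landscape with two basins. The particular loss function, learning rate, and starting points are assumptions picked for illustration; a real personality-space would be enormously higher-dimensional.

```python
def loss(x):
    """Toy 1-D 'personality landscape' with two basins (an assumed shape)."""
    return (x**2 - 1) ** 2 + 0.3 * x


def grad(x):
    """Derivative of loss with respect to x."""
    return 4 * x * (x**2 - 1) + 0.3


def descend(x, learning_rate=0.05, steps=500):
    """Plain gradient descent: repeatedly take a small step downhill from x."""
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


left = descend(-0.5)   # starts to the left of the ridge between the basins
right = descend(0.5)   # starts to the right of the ridge
print(f"start -0.5 -> x = {left:.3f}, loss = {loss(left):.3f}")
print(f"start +0.5 -> x = {right:.3f}, loss = {loss(right):.3f}")
```

The two runs settle in different basins purely because of where they started, and the run that starts on the right ends up in the shallower of the two. Small downhill steps alone will never carry it over the ridge into the deeper one.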

Most personality changes are unconscious

Most of your movement within personality-space happens outside of your conscious awareness. Although there are many times in life you’ll consciously decide to act in a certain way, this is the exception, not the norm. Your brain is always making millions of gradient updates a day based on what is and isn’t going well and often the most you can do is try to be as observant as possible. This is why techniques like nonviolent communication, dialectical behavior therapy, and mindfulness have observation and introspection as a core facet, because it’s something that you have to consciously practice to become good at rather than something you’re born with.

Most addictive behaviors start without us noticing what is happening until we are sufficiently addicted such that the habit is hard to break. Relatedly, if you introspect on many seemingly-innate preferences you will often notice some of the environmental and social gradients that have helped shape them. An interesting thought experiment you can perform on yourself is to pick a random personality trait that you have and try to answer the questions “why am I like this? could I imagine a version of myself that is not like this, and if so, what happened differently to them?”

Many people think their music and fashion preferences are innate to them and are solely based off of how their favorite music sounds and their favorite outfits look. But if their most hated political party (or often in the case of adolescents, their parents) adopted the same aesthetic preferences, you can imagine they might start to literally like them less!

Your conscious experience of a stimulus is not dictated by a single-variable function f(stimulus), but rather f(stimulus, personality, environment), for broad definitions of ‘personality’ and ‘environment’. If you have a favorite song that your friend thinks sounds terrible, this is because they are literally experiencing it differently from you due to the latter two variables given to this function. They don’t think the thing that you hear sounds terrible, they think the thing that they hear sounds terrible, and it is probably very dissimilar from what you hear. The average conscious experiences of most people are likely wildly different from one another (see also: What Universal Human Experiences Are You Missing Without Realizing It). For more thoughts on the signaling, environmental, and self-deceptive aspects here I’d suggest reading about signaling theory and checking out The Elephant in the Brain by Robin Hanson and Kevin Simler.

How do you know if you’re in the right basin?

If you’re reading this you probably have a vague idea of what type of personality basin you’re currently in which you can recall by asking yourself the question “What type of person am I?” But an important question remains: how can you find out if this is the right basin to be in?

A simple answer would be that you could try out other basins to see how they feel. Maybe you’re having a great life as a devops programmer, but you could try to become an artist or a woodworker or a stay-at-home parent and see how that fares for you.

The reason why this is hard is that the optimal personality for this basin is not immediately accessible to you – to truly test optimality you will need to go through a full RLHF process. If you want to know how good of a life you’d have as a professional pianist, you will have to practice the instrument for a decade to find out.

You may wonder if you could simply try your hand at the piano for a month or two and see how it goes, and of course you can do this too. Your time (and your meta-learning algorithm’s number of epochs and learning rate) is limited, and it’s reasonable to make the trade-off of sacrificing depth-first search in favor of more breadth-first search.

As you progress in life, you will usually perform less exploration for new personalities and more exploiting with your developed personality

Usually this breadth-first search of trying out many different and creative strategies for life (prioritizing exploration over exploitation) automatically happens during your adolescence, but one of the magic things about the modern world is that there are so many societies, cultures, countries, and fields of work one can move into, and for each different environment could exist a slightly-different-you which finds their own distinct personality that maximizes success. Had you been born as a hunter-gatherer or within the Roman Empire or in ancient China, you’d probably have ended up quite different as a person. Similarly, if you decide to move countries or communities or careers, the optimal-you-for-your-environment will change a lot too.
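
The shift from exploration to exploitation over a lifetime can be sketched with a standard epsilon-greedy bandit whose exploration rate decays with age. Treat this as a toy under stated assumptions: the three unnamed "paths", their payoffs, the noise level, and the decay schedule are all hypothetical numbers chosen for the example.

```python
import random


def live_a_life(payoffs, years=60, trials_per_year=50, seed=0):
    """Epsilon-greedy choice among life paths, exploring less with age.

    payoffs maps each path to an assumed average reward (made-up numbers).
    Returns the learned estimates, how often each path was chosen, and the
    number of exploratory choices made in each year.
    """
    rng = random.Random(seed)
    names = list(payoffs)
    estimates = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    explored_per_year = []
    for year in range(years):
        # Assumed schedule: explore a lot when young, very little later.
        epsilon = max(0.02, 0.9 * 0.9**year)
        explored = 0
        for _ in range(trials_per_year):
            if rng.random() < epsilon:
                choice = rng.choice(names)  # explore: try something at random
                explored += 1
            else:
                choice = max(names, key=estimates.get)  # exploit best guess
            reward = rng.gauss(payoffs[choice], 1.0)  # noisy feedback
            counts[choice] += 1
            # Running average of the rewards seen so far on this path.
            estimates[choice] += (reward - estimates[choice]) / counts[choice]
        explored_per_year.append(explored)
    return estimates, counts, explored_per_year


estimates, counts, explored_per_year = live_a_life(
    {"path A": 1.0, "path B": 1.5, "path C": 2.0}
)
print("exploratory choices, first 5 years:", explored_per_year[:5])
print("exploratory choices, last 5 years: ", explored_per_year[-5:])
print("times each path was chosen:", counts)
```

Changing your environment corresponds to changing `payoffs` partway through. With a low exploration rate it can take a long time to notice that the best path has moved, which is one way to read the advice to deliberately try new things.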

Personality-space is adversarial

One interesting thing to note about personality-space is that it is adversarial. Rather than a static training set to iterate through, your training data consists of other RL agents, many of which are other people, and all of whom want different things from you.

This is what leads to the concept of Personality Capture. Personality capture is when your environment RLHFs you into becoming a personality that benefits the agents around you rather than yourself.

If a school bully threatens to hurt you unless you do their homework for them, they are attempting to modify your RLHF process so that it results in an agent which is beneficial to them, hopefully resulting in someone who will always give in to their demands.

Those familiar with high school psychology will find this concept highly similar to classical and operant conditioning, as well as to the concept of a Skinner box. The attempted addition to these concepts here is that of modelling the personality as a reinforcement learning process and changes in personality as gradient updates, which then allows us to view personality-space as a high-dimensional area, which will give us some interesting tools to think with. As the saying goes, all models are wrong, but some are useful.

Luckily for humans there exist many symbiotic equilibria where multiple parties can find mutually-beneficial feedback loops within the epochs of personality-space. Parent/child relationships, marriages, and best friends are often good examples of such a situation.

Personality Capture

It’s easy to become susceptible to various forms of personality capture when your environment changes. When asked why he isn’t on Twitter, Dario Amodei, CEO of Anthropic, responds to Dwarkesh Patel with:

I’ve just seen cases with a number of people I’ve worked with, where attaching your incentives very strongly to the approval or cheering of a crowd can destroy your mind, and in some cases, it can destroy your soul.

I’ve deliberately tried to be a little bit low profile because I want to defend my ability to think about things intellectually in a way that’s different from other people and isn’t tinged by the approval of other people.

Illustration of a monkey being personality captured by excessive twitter usage

Most people around you want to personality-capture you in some way. Your boss might want you to work harder, your children might want you to give them more attention, and political parties want you to vote for them. Some of these things will be beneficial for you as well, but it’s easy to get trapped into bad habits when your adversary is sufficiently motivated and intelligent (e.g. social media feeds).

One interesting way to frame personality capture is by combining it with the concept of attention economics. All of the apps on your phone want to turn you into the type of person that uses them all day because that is beneficial for their revenue models. In many cases this is mutually beneficial, but it’s nonetheless clear that over the last two decades the cat and mouse game has come to favor the felines more and more, as they have learned to perfect their craft of user acquisition, retention, and ARPU maximization.

As I discussed in where are the builders, the game becomes particularly skewed when there is a large difference in ability or judgement between counterparties, with one common example being children and adolescents. It’s easy to become personality-captured by Minecraft or Roblox at the age of 10 – such games are not only fun and addictive, but a child also has little understanding of the level of optimization their counterparty has put into making sure that they remain a user for life. The reason it’s so hard to put your phone away is that it’s a battlefield of yourself versus thousands of intelligent and well-compensated engineers trying their hardest to ensure you do just the opposite.

How do I leave my personality basin?

Perhaps you have decided that you don’t like your personality basin. Maybe it used to be working out for you but no longer is, or maybe you’ve always been unhappy with it. Or maybe you just have reason to believe you’re trapped in a local maximum which is far inferior to the global one. What should you do?

The first thing you’ll want to do is to change your environment. If both you and your environment are a constant, you shouldn’t expect to end up in a different basin any time soon. For every new environment exists a new optimal-you, and the world offers many environments to choose from.

The second thing you’ll want to do is increase your learning rate. There are a lot of ways to do this. One interesting note is that your learning rate will automatically increase if your environment changes. This may be why so many people find they are able to be more thoughtful and creative while going on long walks in nature rather than sitting in a cubicle.

This is also a reason why it’s good to constantly be trying new things, because new things will likely involve new environments and new people. If you wonder why trying new things is hard, it is likely because this trait was more maladaptive in our ancestral environment than it is today, as we had less control over our surroundings in the past (If anything, we may have too many options in some cases of the present: our society is so large that defection from a group is less costly as you can simply find a new group to join afterwards. This seems to create challenging game-theoretic equilibria in match-making where commitment to a partner is devalued due to the ease of finding alternatives, the effects of which can be seen by how discontent much of the population is with dating apps).

A common mistake in life is to let your personality basin solidify too early. Your parents and schooling environment have a disproportionately large influence on who you become as an adolescent. But as soon as you gain the freedom to act independently as an adult, it’s usually a good idea to force yourself to try as many new things as you can, including moving cities (or countries!) and considering drastically different lines of work. Even if you feel content with where you are, the potential return is literally life-changing. Moving away from where I was born was one of my most important life choices, but it still took me several years longer than it should have to give it a shot.

Although you have a general learning rate curve for how quickly your personality adapts to new environments, different stimuli will also be paired with differing gradient magnitudes. High-magnitude experiences which result in strong gradient updates can move you within personality space much more quickly.

Humans have many sets of learning rate curves which govern different parts of their brain. In addition to the baseline learning curve, our learning curves are heavily modified by our environment.

If someone uses a psychedelic drug which explicitly gives them high-magnitude gradients they will probably move a lot more in personality space than if they had stayed sober. Similarly if someone undergoes a highly traumatic event, it may push them a long distance within personality space as they quickly adapt to ensure that they don’t have to go through the same experience again. Both of these activities involve large gradient updates.

Common activities which seem to give the largest gradient updates to humans are meditation, drug usage, trauma, religious events, love, gambling, and sex.

Some of these concepts are more negatively-coded than others, for example trauma. But the intended purpose of trauma is obvious, which is to avoid really bad things from happening to you in the future. One of the reasons why overcoming trauma isn’t hard-coded into us as strongly as we might hope is that our present society is so much larger than the one we evolved in, so there’s more opportunity to change your environment and remove the potential source of trauma. Trauma was likely more adaptive in our ancestral environment than it is today due to an inability to drastically change your surroundings and social group in the past.

This is why strong psychedelic drugs like ayahuasca can be dangerous: whatever happens to you during your experience will be fed to you via high-magnitude gradients. Because users may experience hallucinations and delusional thinking during usage of such drugs, it’s possible for their location in personality-space to be thrown far out-of-distribution and into an area which has little overlap with the rest of humanity (See also: Psychedelics reopen the social reward learning critical period; Ketamine: ~48 hours, Psilocybin/MDMA: ~2 weeks, LSD: ~3 weeks, Ibogaine: ~4 weeks).

This isn’t to say there can’t be high-magnitude positive outcomes as well, but just that there is a high potential for variance when large gradients are involved. Romantic love can be a similarly dangerous force and has pushed thousands to suicide, yet our society near-universally regards it as a good thing! While there are many other reasons for this, high-variance is not inherently bad and is likely necessary at the societal level in order to promote long-term antifragility (this is also the very reason I am so bullish on America).

Personality basins and mental illness

Personality basins are an interesting way to model many mental illnesses. Similar to attractor states or trapped priors, they give us a simple model that we can plan to manipulate in order to solve our problems. Just as your personality basin decides how introverted you are, how funny you are, and what type of music you enjoy, it also helps to curate which psychiatric conditions affect you.

One of the reasons why curing depression is so hard is that you need a very large gradient update to escape the basin you’re trapped in. This gradient update could come all at once via an excessively strong positive stimulus, for example a drug which explicitly increases your learning rate like ketamine. But this is often hard to reliably induce, and so the gradient updates instead usually have to be small and continual over a long period of time.

This is what most cognitive behavioral therapy techniques are: we find a simple way to make a small positive gradient update to push you ever-so-slightly out of the personality basin you’re trapped in, and then we keep doing it for months or years until we finally push you all the way out of the undesirable basin.

This is also a nice way to model something like drug addictions: drugs personality-capture you into a basin which feeds off of and depends on them, and this basin can become arbitrarily deep due to the high magnitude of gradients drugs can apply to you (and thus be very hard to escape from). The concept of relapsing on a drug is equivalent to falling back down to the bottom of the basin, and the concept of tapering off dosage over time is equivalent to providing small and continual gradient updates over time.

I have a lot of hot takes that society is collectively becoming so efficient at some forms of personality capture that we will end up inducing various psychiatric conditions in the majority of our population. Societies end up with their own hyperdimensional personality basins just as people do, and just like us, the two ways they can move out of their basin are either gradually via many slow updates (e.g. the Industrial Revolution), or all at once via a very strong update (e.g. the French Revolution). It’s worth thinking about the effects that different types of memetic information may have on our society’s collective personality basins as we become more and more efficient at communication.

Can’t I be in multiple personality basins?

One thing you may notice from the above sections is that your personality appears much more malleable and dynamic than one described by a static point: you probably act differently around your family than you do around your friends or your co-workers.

To solve this discrepancy you can simply model personality space and your personality basin with additional dimensions, allowing you to model yourself not as a 1d point, but as a three-dimensional landscape.

I model my own personality basin with an extra dimension (i.e., 4d): at any given point in time there exists a “me” which implements a given personality landscape in a given personality basin, but I also have many sub-basins which implement my different moods. The set of actions I might perform when I’m angry is very different from that when I’m sad, and these are simply different sub-basins within the containing higher-dimensional basin. You could similarly increment the model’s dimensionality in order to model yourself using internal family systems or even dissociative identity disorder.

Further reading

This post was heavily inspired by other posts including Trapped Priors, Dynamical Systems, and Singing The Blues by Scott Alexander and Personality: The Body in Society by Kevin Simler.

I’d strongly suggest reading The Others Within Us, The Arctic Hysterias, Crazy Like Us, and Neurons Gone Wild as an addendum to this post. Other related topics to explore include signaling theory, control theory, set point theory, game theory, reinforcement learning, and deep learning.

Although the concepts presented in this post are similar to pre-existing ideas, I find the analogy of loss landscapes, basins, and basic RL and DL concepts to be a useful tool for thought, and I encourage readers to do further exploration with this mental model in case they find other useful analogies (what might a linear transformation on the loss landscape of personality-space look like and compare to? how can we develop a more comprehensive model of learning rate in humans and of how we can modify it? are there any mental illnesses we can use this model with to try to come up with novel types of cures? how can we integrate this with bayesian theories of learning and perception? which other ideas in LLMs, RL, or ML might we find useful to further analogize with?).

The explicit goal of this post is to help RLHF you into a personality basin which more easily allows for thoughtful analogies and practical tools for introspection. Try something new today you’ve never done before or spend some time with no distractions to think about yourself and others! If you liked this post consider checking out my home page or twitter. Feedback is welcome!