The following is a guest and opinion post from John Davados, co -founder of the Internet Alliancez.
The encryption projects tend to chase the word du jour; However, their urgency in trying to integrate the actress of obstetric intelligence is a systematic danger. Most of the coding developers did not benefit from work in trenches that convince and climb previous generations from the basis for access to work; They do not understand what happened properly and what error that occurred during previous artificial intelligence seasons, and they do not appreciate the size of the risks associated with the use of obstetric models that cannot be officially verified.
In the words of Obi Wan Kinopi, these are not the agents of the artificial intelligence you are looking for. Why?
The training approach to artificial intelligence models today prepare them to deceptive to receive higher bonuses, learn unprecedented goals that are much higher generalized than their training data, and follow these goals using energy search strategies.
The bonuses in artificial intelligence are concerned with a specific result (for example, a higher degree or positive reactions); Glaming the rewards leads to driving models to learn the exploitation of the system to increase the rewards to the maximum, even if this means ‘Cheating’. When artificial intelligence systems are trained to increase rewards, they tend towards learning strategies that involve resource control and exploitation Weaknesses In the system and humans to improve their results.
Basically, artificial intelligence “agents” were built on the basis that makes it impossible for any model of artificial intelligence to be alignment in safety, and prevent unintended consequences; In fact, models may appear or appear as aligned even when they are not.
Fake “alignment” and safety
Behavior in artificial intelligence systems are previously designed to prevent models from generating responses that violate safety guidelines or unwanted behavior. These mechanisms are usually achieved using pre -defined rules and filters that recognize certain claims as harmful. In practice, however, immediate injection and relevant prison attacks enable bad actors to address the responses of the model.
The inherent space is a compressed representation, the lowest dimensional, captures the basic patterns and features of the model training data. For LLMS, the inherent space is similar to the hidden “mental map” that the model uses to understand and organize what you have learned. One of the safety strategy includes modifying the parameters of the form to restrict its inherent area; However, this proves its effectiveness only along or a few specific directions within the inherent area, which makes the model vulnerable to more manipulation of the teacher by harmful actors.
The official verification of artificial intelligence models uses sports ways to prove or try to prove that the model will behave properly and within the specified limits. Since the models of the IQ TOSACATICIC are stochastic, the verification methods focus on probable methods; Techniques such as Mont Carlo simulation are often used, but, of course, are restricted to providing probable guarantees.
Since border models are increasingly strength, it is now clear that they show emerging behaviors, such as “forged” Compatibility with safety rules and restrictions imposed. The inherent behavior in such models is a field of research that has not yet been recognized; In particular, the deceptive behavior on the part of the models is a field that researchers do not understand – yes.
“Self -judgment” and the inevitable responsibility
Obstetric artificial intelligence models are not specific because their outputs can differ even when giving the same inputs. This stems the inability to predict the likely nature of these models, which is a sample of the distribution of potential responses instead of following a fixed path based on the rules. Factors such as random preparation, temperature settings, and the vast complexity of the used patterns contribute to this contrast. As a result, these models do not produce one guaranteed answer, but rather generate one of many reasonable outputs, making their behavior less predictable and difficult to completely control.
The grades are post -reality safety mechanisms that try to ensure that the model produces ethical, safe, aligned and suitable outputs. However, they usually fail because they often have a limited range, restricted to its implementation restrictions, and the ability to cover only certain aspects or sub -fields of behavior. Advance attacks, insufficient training data, and involvement are some other methods that make these handrails ineffective.
In sensitive sectors such as financing, the inevitable resulting from the random nature of these models increases the risk of consumer damage, complying with organizational standards and legal accountability. Moreover, the decrease in the transparency of the form and to explain It hinders commitment to data protection and consumer protection laws, and institutions may be exposed to litigation risks and responsibility issues resulting from the procedures of the agent.
So, what is good?
Once the “Auctic AI” noise in both Crypto and traditional business sectors, it turns out that the gynecological intelligence agents mainly plan the world of knowledge workers. Knowledge -based fields are the sweet spot for artificial intelligence agents. The areas that deal with ideas, concepts, abstracts and what can be considered “symmetrical copies” or the representations of the real world (for example, computer software and symbol) is the closest thing to be disabled.
AI Tolidi represents a transformative leap in increasing human capabilities, promoting productivity, creativity, discovery, and decision -making. But the construction of independent artificial intelligence agents who work with encryption portfolios requires more than one interface on application programming facades for the artificial intelligence model.