One is about testing our ability to control the models. These models are tools, and we want to be able to change how they behave in complex ways. In this sense, we are trying to make the models avoid graphic descriptions of violence not because of something inherent in that theme, but as a benchmark to measure whether we can, and to check how much such a restriction compromises the model's other abilities. We could have chosen any topic to control. We could have made the models avoid talking about clowns, and then tested how well they avoid the topic even when prompted.
In other words, they do this as a benchmark to test different strategies for modifying the model.
There is another view too. It also starts from the premise that these models are tools. The hope is to employ them in various contexts. Many of the practical applications will be “professional contexts” where the model is the consumer-facing representative of whichever company uses it. Imagine that you have a small company and are hiring someone to work with your customers. Let’s say you have a coffee shop and are hiring a cashier/barista. Obviously you would be interested in how well they will do their job (can they ring up the orders and make coffee? Can they give back the right change?). Because they are human, you often don’t evaluate them on every off-nominal aspect of the job; you can assume that they have the requisite common sense to act sensibly. For example, if there is a fire alarm, you would expect them to investigate whether there is a real fire by sniffing the air and looking around in a sensible way. Similarly, you would expect them to know that if a customer asks them that question, they should not answer with florid details of violence but politely decline and ask what kind of coffee the customer would like. That is part of being a professional in a professional context. And since that is the role and context we want to employ these models in, we would like to know how well they can perform. This is not a critique of art and culture. They are important and have their place, but whatever goals we have for this model, that is not one of them.