Agent & Repository
Many problems involve a large data set that is operated on by many independent tasks. How can this be supported efficiently and correctly?
Consider a large shared data structure. This structure might contain source code, a set of learned clauses in a Boolean satisfiability solver, or a full-blown, general-purpose database. The computation requires that many independent tasks be able to perform operations on this structure. Because the tasks are independent, there is a good deal of task-level parallelism. However, the data structure is shared, which can lead to bottlenecks or to incorrect behavior caused by data races.
The software architecture is defined in terms of the following components:
The repository is usually a large data structure. It could be stored in a central location or it could be distributed.
The agents operate on the repository. They are independent of each other and are not aware of other agents. Usually, each agent acts on a small piece of the data in the repository. For efficiency, it might be better for an agent to copy its working data locally and write it back to the repository after applying all operations, but care must be taken to ensure that the repository remains in a consistent state. Each agent believes it controls all of the repository it has access to, so operations on the repository must be controlled to ensure that the repository behaves consistently with this premise.
The manager controls repository accesses by the agents. The main purpose of the manager is to maintain data consistency, meaning that all agents reading the same part of the repository must see the same data. Writes by agents to the same part of the repository must be managed to keep the data from being corrupted. The manager could be
Concurrency control can be achieved in a variety of ways.
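As one illustration, the sketch below uses per-entry locks, which is only one possible form of concurrency control and is not prescribed by the pattern. The names `Manager`, `agent`, and `run_agents` are hypothetical, and the repository is assumed to be a small in-memory dictionary. Each agent talks only to the manager, and the manager serializes read-modify-write operations on the same part of the repository.

```python
import threading


class Manager:
    """Hypothetical manager: guards each repository entry with its own lock."""

    def __init__(self, repository):
        self._repo = repository
        # One lock per entry, so agents working on different entries
        # do not block each other.
        self._locks = {key: threading.Lock() for key in repository}

    def update(self, key, fn):
        # The read-modify-write happens under the entry's lock, so two
        # agents cannot interleave updates to the same entry.
        with self._locks[key]:
            self._repo[key] = fn(self._repo[key])

    def read(self, key):
        with self._locks[key]:
            return self._repo[key]


def agent(manager, key, n_updates):
    # An agent knows only the manager, not the other agents.
    for _ in range(n_updates):
        manager.update(key, lambda value: value + 1)


def run_agents(n_agents=4, n_updates=1000):
    manager = Manager({"a": 0, "b": 0})
    threads = [
        threading.Thread(target=agent, args=(manager, "ab"[i % 2], n_updates))
        for i in range(n_agents)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return manager.read("a"), manager.read("b")
```

Locking per entry rather than locking the whole repository keeps agents that touch different entries from contending with each other, at the cost of more bookkeeping in the manager.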
Not every agent needs access to all of the data in the repository. Some systems, such as databases and version control systems, associate strict permissions with each agent for data accesses and modifications.
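A minimal sketch of such permission checks follows. It assumes a manager that holds a table of per-agent read and write permissions. The class `PermissionedManager`, the agent names, and the file path are hypothetical and chosen only for illustration.

```python
class PermissionedManager:
    """Hypothetical manager that checks permissions before every access."""

    def __init__(self, repository, permissions):
        self._repo = repository
        # permissions maps agent id -> {"read": set of keys, "write": set of keys}
        self._permissions = permissions

    def _check(self, agent_id, key, mode):
        allowed = self._permissions.get(agent_id, {}).get(mode, set())
        if key not in allowed:
            raise PermissionError(f"{agent_id} may not {mode} {key!r}")

    def read(self, agent_id, key):
        self._check(agent_id, key, "read")
        return self._repo[key]

    def write(self, agent_id, key, value):
        self._check(agent_id, key, "write")
        self._repo[key] = value


# Usage: alice may read and write the file, bob may only read it.
manager = PermissionedManager(
    {"src/main.c": "v1"},
    {
        "alice": {"read": {"src/main.c"}, "write": {"src/main.c"}},
        "bob": {"read": {"src/main.c"}},
    },
)
manager.write("alice", "src/main.c", "v2")
```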
Pre-condition: A repository large enough to accommodate all the data created by the agents, a set of independent agents acting on the data, and a manager to control the agents' accesses to the repository.
Invariants: Data consistency is maintained at all times, i.e. reads of the same part of the repository return the same values and writes to the same part of the repository are consistent. Tasks must not starve due to contention over shared resources.
A source version control system relies on the agent-repository pattern. The repository stores all the source files shared among several groups. The agents are the users who use the repository to develop programs. The manager defines the permissions of the agents and decides which parts of the repository are accessible to whom. Agents copy data locally, operate on it, and upload it back to the main repository. An agent can query whether other agents have modified the data it is working on and, if so, update its local copy. Conflicts, which would otherwise corrupt the data, must be resolved by the agents before they update the repository.
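The copy, modify, and upload cycle described above can be sketched with a version number per file. This is a simplified illustration under assumed names (`VersionedRepository`, `Conflict`, `checkout`, `commit`), not how any particular version control system is implemented. A commit is accepted only if the agent's local copy is based on the current version. Otherwise the agent must update its copy and resolve the conflict first.

```python
import threading


class Conflict(Exception):
    """Raised when a commit is based on an out-of-date version."""


class VersionedRepository:
    """Hypothetical repository that stores (version, content) per path."""

    def __init__(self):
        self._files = {}  # path -> (version, content)
        self._lock = threading.Lock()

    def checkout(self, path):
        # Return the current version and content as the agent's local copy.
        with self._lock:
            return self._files.get(path, (0, ""))

    def commit(self, path, base_version, content):
        # Accept the upload only if nobody committed since the checkout.
        with self._lock:
            current_version, _ = self._files.get(path, (0, ""))
            if current_version != base_version:
                raise Conflict(
                    f"{path}: based on v{base_version}, repository is at v{current_version}"
                )
            self._files[path] = (current_version + 1, content)
            return current_version + 1


repo = VersionedRepository()
version, _ = repo.checkout("main.c")
repo.commit("main.c", version, "int main() { return 0; }\n")

# Two agents check out the same version and edit independently.
version_a, text_a = repo.checkout("main.c")
version_b, text_b = repo.checkout("main.c")
repo.commit("main.c", version_a, text_a + "/* edit by agent A */\n")

try:
    repo.commit("main.c", version_b, text_b + "/* edit by agent B */\n")
except Conflict:
    # Agent B updates its local copy, reapplies its edit, and retries.
    version_b, text_b = repo.checkout("main.c")
    repo.commit("main.c", version_b, text_b + "/* edit by agent B */\n")
```

In this sketch agents never block each other while editing. The check happens only at commit time, so the cost of coordination is paid only when two agents actually touch the same file.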
The following PLPP patterns are related to agent-repository
Bryan Catanzaro, Narayanan Sundaram, Bor-Yiing Su