Evolution of rewards and learning mechanisms in Cyber Rodents
Identifying design principles for reward functions is a major challenge in both artificial intelligence and neuroscience. Successful acquisition of a task usually requires rewards to be given not only for reaching goals but also for intermediate states, so as to promote effective exploration. We propose a method for designing “intrinsic” rewards for autonomous robots by combining constrained policy gradient reinforcement learning with embodied evolution. To validate the method, we use the Cyber Rodent robots, for which collision avoidance, recharging from battery packs, and “mating” by software reproduction provide the three major “extrinsic” rewards. We show in hardware experiments that the robots can find appropriate intrinsic rewards for the visual properties of battery packs and potential mating partners, and that these rewards promote approach behaviors.
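
As an illustration of the general idea only, the following minimal Python sketch shows how an intrinsic reward term might be added to a sparse extrinsic reward and how its parameters might be evolved. Everything in it is an assumption made for illustration: the linear form of the intrinsic reward, the function names (`intrinsic_reward`, `total_reward`, `mutate`, `evolve`), and the centralized, generational selection scheme, which stands in for the asynchronous, mating-based embodied evolution used on the robots. The constrained policy gradient learner itself is not shown.

```python
import random


def intrinsic_reward(weights, features):
    """Intrinsic reward as a weighted sum of sensory features (assumed linear form)."""
    return sum(w * f for w, f in zip(weights, features))


def total_reward(extrinsic, weights, features):
    """Reward seen by the learner: sparse extrinsic reward plus the intrinsic term."""
    return extrinsic + intrinsic_reward(weights, features)


def mutate(weights, sigma, rng):
    """Perturb each reward weight with zero-mean Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def evolve(population, fitness, sigma, rng):
    """One generation: keep the fitter half, refill with mutated copies of survivors.

    Hypothetical stand-in for embodied evolution; real robots would exchange
    parameters asynchronously when they meet, not in synchronized generations.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: max(1, len(ranked) // 2)]
    children = [
        mutate(rng.choice(survivors), sigma, rng)
        for _ in range(len(population) - len(survivors))
    ]
    return survivors + children
```

In such a setup, `fitness` would be measured from extrinsic outcomes alone (for example, how often a robot recharges or mates), so that the evolved weights are judged by how much they help learning rather than by the intrinsic reward they themselves produce.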