Does the AI prefer sources higher up in the dataset? Or does the ranking just reflect how much data each source contains? If they're ranked by "tokens," is the ranking indicative of quality or just quantity?
Training an AI on Reddit or forum comment chains would be extremely useful for teaching it how to direct online conversations.