Visualizing Redundant Data Validation

Ushahidi
May 9, 2010

The following visualizations represent the various methods that go into calculating the reputation and veracity scores for users and content within the SwiftRiver platform. They are in part a response to this comment from reader Charles Bernard on this post. His comment:

In many instances, there are entities with a vested interest in preventing valid information regarding things such as voting, battles and even disasters, both natural and man-made. For nearly any human effort, there exist a group of entities which would profit by either the details or the extent of a problem being kept from the public–and that can include relief agencies. While tracking particular sources and their validity of reports is a step in the right direction, some entities, in particular governments and large corporations have access to the resources needed to generate thousands or even 100,00s of thousands of false data reports, flooding the system with misinformation.

In other words, what steps are we taking to prevent individuals with malicious intent from gaming SwiftRiver? Here was my response:

With Swift, we aren’t just validating content, we’re also validating users: users validate each other, and content validates users. Content can also be used to verify other content. This creates a system that’s difficult to dupe, as anyone looking to falsify information would need to generate thousands of false reports from a number of different ‘users’, locations, and media channels. What would be absolutely possible is for a group to download Swift, set up their own instance with all sorts of fake information, and publicize it as fact. However, our distributed, decentralized reputation system, River ID, would show that outside of that instance’s ‘ecosystem’ no one trusts those users or the instance. If the administrators opt out of tracking…they also forfeit any sort of benefits that come from River ID (trust from users who don’t know you or your site). In this case falsifying information is indeed easy, but promoting it becomes self-defeating: the more people outside your influence see it, the less authority your Swift instance (with all its fake reports) actually holds.

I thought these concepts might be hard to grasp, so I made the following arc diagrams to give a visual representation of what I actually mean. Click the images for high-resolution versions. In the images below, the light grey color simply marks content that isn't important for what that particular chart is showing you.

Fig. 1 Individual Voting Against the Community

Figure 1 represents the most classic scenario of 'gaming': spam, bots or human individuals who are trying to vote bogus content 'up' so it will be weighted higher than other content.

Section "A" represents User 1. Section "B" represents the activity of User 2 (our spammer). Section "E" represents the community within this particular Swift instance. Section "F" represents the users of our distributed trust system River ID, or the global SwiftRiver economy. Section "C" represents individual content items. Section "D" represents the source that content is coming from.

The thickness of the lines connecting the users to the content and the source represents how they've voted on those particular things. The thickness of the lines for User 2 tells us that he's rating these things very highly. Perhaps they come from his blog, and he wants them at the top! The thinner lines from the local community of the SwiftRiver instance, as well as from the global users, tell us that these content sources are suspect. We can see that User 1 (who represents our average, active user) is voting closer to how the community is voting, and in fact rates both the content and the source even more harshly than the community does (represented by thinner lines).

This dynamic relationship between users and their interactions with content (in contrast to the local and global community) is considered when scoring users, content, and sources. In this case the person voting against the tide is actually damaging his or her own reputation, both locally and globally.
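The Figure 1 scenario can be sketched in a few lines of Python. This is only an illustration of the idea, not SwiftRiver's actual scoring: the function names, the 0 to 1 vote scale, the neutral value of 0.5 and the 50/50 blend of local and global agreement are all assumptions made up for this example.

```python
from statistics import mean

def vote_agreement(user_votes, community_votes):
    """How closely a user's votes track a community's, from 0.0 to 1.0.

    Both arguments map an item id to a vote in [0, 1] (hypothetical scale).
    """
    shared = [item for item in user_votes if item in community_votes]
    if not shared:
        return 0.5  # nothing to compare against: stay neutral (assumption)
    gaps = [abs(user_votes[item] - community_votes[item]) for item in shared]
    return 1.0 - mean(gaps)

def reputation_signal(user_votes, local_votes, global_votes, local_weight=0.5):
    """Blend agreement with the local community (E) and the global one (F)."""
    local = vote_agreement(user_votes, local_votes)
    global_ = vote_agreement(user_votes, global_votes)
    return local_weight * local + (1.0 - local_weight) * global_

# Made-up votes on two content items (C) and their source (D).
local_community = {"item-1": 0.30, "item-2": 0.20, "source": 0.25}
global_community = {"item-1": 0.20, "item-2": 0.20, "source": 0.20}
user_1 = {"item-1": 0.20, "item-2": 0.10, "source": 0.15}  # harsher than the community
user_2 = {"item-1": 1.00, "item-2": 1.00, "source": 1.00}  # our spammer

print("User 1:", reputation_signal(user_1, local_community, global_community))
print("User 2:", reputation_signal(user_2, local_community, global_community))
```

With these made-up numbers User 1 ends up with a far higher signal than User 2, which mirrors the thin and thick lines in the diagram: voting everything 'up' against both the local and the global community is what costs User 2 reputation.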
However, this isn't the only thing we consider; otherwise it would encourage conformity, which also isn't good (sometimes the outlier knows something the rest don't).

Fig. 2 Factors Considered in Rating Content

In Figure 2 we can see that things like Time, Location and Activeness, as well as Global and Local interaction, are all considered.

Time (green) and Location (dark grey) are optional, for scenarios like a conflict or war. The content producer's location, or proximity to 'ground zero', tells the system to factor this into its score. The length of time between the initial event and the moment the content is produced may also tell us a lot. Things like 'time' and 'location' are optional because if your Swift instance is tracking something like a political scandal, time and proximity may not actually add any value to authority calculations.

Purple represents how active Users 1 and 2 are. In and of itself, how much someone uses a Swift instance is irrelevant. It could mean that they are an eager member providing valuable assistance, or it could mean they are attempting a brute force attack on the system similar to the Figure 1 scenario. However, when coupled with other factors, frequency of interaction is considered and can positively or negatively weight the score for a user.

Fig. 3 Ratings Visible to Users

In Figure 3 I'm illustrating what information is visibly shared in the scenarios above. The trust the local community has for Users 1 and 2 is displayed. The trust the global River ID system has for Users 1 and 2 is also displayed. Thus, the trust Users 1 and 2 should have for each other can be inferred.
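Going back to the factors in Figure 2, here is a rough Python sketch of how optional factors and activeness could be combined. It is a guess at the shape of such a calculation, not the real SwiftRiver math: the `Signals` fields, the weights, the `ACTIVITY_GAIN` constant and the 0 to 1 scales are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; time and location count less than trust here.
DEFAULT_WEIGHTS = {"local": 1.0, "global": 1.0, "time": 0.5, "location": 0.5}
ACTIVITY_GAIN = 0.5  # how strongly activeness amplifies the other factors (assumption)

@dataclass
class Signals:
    local_trust: float                      # 0..1, from the local community
    global_trust: float                     # 0..1, from the global community
    activeness: float                       # 0..1, how much the user interacts
    time_score: Optional[float] = None      # None means the factor is switched off
    location_score: Optional[float] = None  # None means the factor is switched off

def combined_score(s, weights=DEFAULT_WEIGHTS):
    factors = {
        "local": s.local_trust,
        "global": s.global_trust,
        "time": s.time_score,
        "location": s.location_score,
    }
    # Only enabled factors take part, so switching one off does not drag the score down.
    enabled = {name: value for name, value in factors.items() if value is not None}
    total_weight = sum(weights[name] for name in enabled)
    base = sum(weights[name] * value for name, value in enabled.items()) / total_weight
    # Activeness is neutral on its own: it only pushes the score further in
    # whichever direction the other factors already point (0.5 is the midpoint).
    amplified = base + s.activeness * ACTIVITY_GAIN * (base - 0.5)
    return min(1.0, max(0.0, amplified))

eager_member = Signals(local_trust=0.8, global_trust=0.8, activeness=1.0)
brute_forcer = Signals(local_trust=0.2, global_trust=0.2, activeness=1.0)
print("eager member:", combined_score(eager_member))
print("brute forcer:", combined_score(brute_forcer))
```

The design choice worth noticing is that activeness never sets the direction of the score. Two equally active users end up far apart because the trust signals differ, which is the "eager member versus brute force attack" distinction described above.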

Swift's strength is in multiple points of redundancy. All scores are calculated against a multitude of other factors which may or may not be independent of the local community. This allows users to build scores more organically than x=bad y=good. There are some probabilistic calculations as well as algorithmic intricacies that make all this a lot more complex (a lot of math beyond my paygrade). We also calculate things like tags and content influence, which compound the complexity.

Unless the local Swift instance administrators opt in to participating in the global Swift ecosystem, their instance only holds authority with the people using it. In theory, their 'gaming' would then be contained to their local Swift instance. The fact that global authority isn't considered would be an indicator that the public shouldn't trust it. If they do opt in to the global ecosystem, it becomes increasingly hard to continue gaming the system, as their scores are constantly weighted against the global community's.

Because Swift is open source, it's easy to reverse engineer or hack parts of the local system. But this is why we announced Swift Web Services last month: core components of the global system are centralized and well protected. This protects the global ecosystem, but still allows for independent uses of SwiftRiver and all of its components as open, locally deployable apps.

Some users, for example election monitors, may not want their SwiftRiver instance online at all. In that case, global authority doesn't matter; the instance can and should only be influential amongst the people using it. This is why we opted for cloud solutions in addition to local deployment options, yet another redundancy to ensure the platform's usefulness in multiple scenarios.

Post any follow-up questions to the newsgroup or in the comments below.