Social Network Analysis from Project Management Data
This is what I'll be investigating for the next week. So far, I haven't found any closely similar projects, but the field itself is daunting. "Who should fix this bug," discussed in a previous post, was an ambitious analysis tool of bug tracking information with a smaller results scope and a better sense of what would constitute success, and they got only so-so results out of their project. Wikipedia's coverage of building social network graphs is making my head explode. It's a lot to take in, so I'll try to list issues to investigate here:
- Where can I get input data? Ideally, I'd grab the full backend database for an instantiation of a Trac variant supporting a real, somewhat long-lived and complex project. I'm told I should ask Greg Wilson and David Wolover about getting DrProject history.
- Once I have data and have processed it, what do I plan on doing with it? How would I test my results? The bug assignment team could compare predicted bugfix assignees to who actually closed the ticket, what's my metric? Would a comparison to some sort of aggregated graph of contact like from Google's social graphing results be fruitful? It's unlikely, since we're ranking social contact within a work environment, while most social networking data online is voluntary. Maybe some sort of survey set up for participants to rate their working relationships with one another? This seems like the best route, but I'd have to set it up ahead of time so as not to fall into post-hoc analysis trap.
- What about the graph itself? Should connections be directionally weighted (I think that's the term) that is, if everyone contacts the intern to assign her small tasks but she usually only contacts her direct supervisor, should we keep track of the distinction or just collapse it into "has contact with many people"? Should we count mentions of each other's names in communication? Changes in assigned-to status from A to B as a link between them? Actual emails? Should some links count more than others? By how much? What sort of crazy voodoo could possibly guide my choice there? I think one thing to do would be to construct different graphs for different contact types, with the ability to overlay/combine them later. Another possibility is to take a page from how these scientists run their models and gather survey results first, then run experiments on our program to change weightings to get it to closely match the survey results.
- What sort of out-of-the-box solutions are available to me for visualizing social networks? What about for graphs like this in general?
- Should I be planning on making something that's specifically suited to their team? Or a more general tool?
- Apparently people are trying for an open standard on disambiguating social links. They're kind of cool, and could be useful for a variety of our projects here.