Tuesday, May 19, 2009

Social networking on Trac: pinning down the specifics

I'm working with Ainsley Lawson now, and I'll defer to her excellent post for a summary of the purpose and evaluation issues she's been researching. Speaking with her about the project gave me a much clearer idea of the specifics of the network we'll be building, so here goes the clearest summary I can make so far:

To build the graphs:

Create a social relationship graph.

Look at email to: and from: fields in the tracked communications and give each pair of people a relationship point for each time one emails the other. Use the relationship points to determine strength of connection in a relationship graph.
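A rough sketch of the counting step, assuming the tracked communications have already been parsed into (sender, recipients) tuples. That input format and the name relationship_points are my own placeholders, not anything Trac provides:

```python
from collections import Counter

def relationship_points(emails):
    """Count one point per (sender, recipient) occurrence, per unordered pair.

    `emails` is a hypothetical pre-parsed format: (sender, [recipients]).
    """
    points = Counter()
    for sender, recipients in emails:
        for recipient in set(recipients):
            if recipient != sender:
                # Unordered pair: A->B and B->A feed the same edge.
                points[frozenset((sender, recipient))] += 1
    return points

emails = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice"]),
    ("alice", ["bob"]),
]
social = relationship_points(emails)
print(social[frozenset(("alice", "bob"))])    # alice<->bob edge weight
print(social[frozenset(("alice", "carol"))])  # alice<->carol edge weight
```

The edge weights in the Counter are the strengths of connection; pairs that never emailed each other simply come back as 0.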

Create a code relatedness graph.

For each pair of code modules, give them a relatedness point for each time they've been checked in at the same time. This code relatedness thing could get much more complex, but I understand there's a lot of source visualization software out there that's already solved these problems, so we could look at them.
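The simplest version of this could look something like the sketch below, assuming each check-in has already been reduced to a list of the module names it touched (a made-up input format; pulling that out of the repository log is not shown):

```python
from collections import Counter
from itertools import combinations

def relatedness_points(commits):
    """One relatedness point per pair of modules per shared check-in.

    `commits` is a hypothetical list of lists of module names.
    """
    points = Counter()
    for modules in commits:
        # Sort so each pair always shows up as the same (a, b) key.
        for a, b in combinations(sorted(set(modules)), 2):
            points[(a, b)] += 1
    return points

commits = [
    ["parser.py", "lexer.py"],
    ["parser.py", "lexer.py", "cli.py"],
    ["cli.py"],
]
relatedness = relatedness_points(commits)
print(relatedness[("lexer.py", "parser.py")])
```

A check-in that touches a single module adds no points, which seems right for a pairwise measure.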

Create a module-by-module expertise listing.

For each code module, look at the subversion history and record the number of lines of code each distinct author has added, changed, and deleted over the life of the module (LOC edited).
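As a sketch, assuming the history has been boiled down to one record per author per check-in per module, with added/changed/deleted line counts (the record layout and function names here are hypothetical):

```python
from collections import defaultdict

def expertise_listing(changes):
    """Sum LOC edited (added + changed + deleted) per module per author.

    `changes` is a hypothetical list of
    (module, author, added, changed, deleted) records.
    """
    loc = defaultdict(lambda: defaultdict(int))
    for module, author, added, changed, deleted in changes:
        loc[module][author] += added + changed + deleted
    return loc

def experts(loc, module):
    """Authors of a module, heaviest editor first."""
    per_author = loc.get(module, {})
    return sorted(per_author.items(), key=lambda kv: kv[1], reverse=True)

changes = [
    ("parser.py", "alice", 100, 20, 5),
    ("parser.py", "bob", 10, 0, 0),
    ("parser.py", "alice", 0, 30, 0),
    ("lexer.py", "bob", 50, 5, 5),
]
loc = expertise_listing(changes)
print(experts(loc, "parser.py"))
```

The same experts() lookup would also serve a "who knows this module?" search.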

Create a shared authorship graph. This one's still very rough:
  • For each pair of people, for each code module, give them min(A's LOC edited, B's LOC edited) shared authorship points.
  • For each pair of people, for each pair of related code modules, give them (min(A's LOC edited in both, B's LOC edited in both)*relatedness/something) shared authorship points.
  • Rationale: two heavy editors should get a higher rating than one heavy editor and one light editor, hence the min() construction.
  • Edits in related modules should count for less than edits in the same module, hence the "/something" denominator, which would probably be tuned by trial and error until the graph lines up with what the coders report about their network in a survey.
  • Total shared authorship points between each pair of people is the strength of connection in the graph.
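Here is one way the two rules above could be coded up. It leans on two assumptions of mine: "LOC edited in both" is read as the sum of a person's LOC edited across the two related modules, and the "/something" denominator is a plain tunable parameter called divisor, with 10.0 picked arbitrarily:

```python
from itertools import combinations

def shared_authorship(loc, relatedness, divisor=10.0):
    """Shared authorship points per pair of people.

    loc:         {module: {author: LOC edited}}
    relatedness: {(module_a, module_b): relatedness points}
    divisor:     stand-in for the undetermined "/something" (assumption)
    """
    authors = sorted({a for per_module in loc.values() for a in per_module})
    points = {}
    for a, b in combinations(authors, 2):
        total = 0.0
        # Same-module rule: min of the two people's LOC edited.
        for per_module in loc.values():
            total += min(per_module.get(a, 0), per_module.get(b, 0))
        # Related-module rule, scaled down by relatedness / divisor.
        for (m1, m2), rel in relatedness.items():
            a_loc = loc.get(m1, {}).get(a, 0) + loc.get(m2, {}).get(a, 0)
            b_loc = loc.get(m1, {}).get(b, 0) + loc.get(m2, {}).get(b, 0)
            total += min(a_loc, b_loc) * rel / divisor
        points[(a, b)] = total
    return points

loc = {
    "parser.py": {"alice": 155, "bob": 10},
    "lexer.py": {"bob": 60},
}
relatedness = {("lexer.py", "parser.py"): 2}
authorship = shared_authorship(loc, relatedness)
print(authorship[("alice", "bob")])
```

In this toy data Alice and Bob share only 10 points from parser.py directly, but Bob's lexer.py work is related to Alice's parser.py work, so the related-module rule adds min(155, 70) * 2 / 10 on top.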
So what do we do with them?

The primary purpose would be to decide on a threshold difference between relationship points and shared authorship points at which we'd consider a pair of people not to be communicating effectively. If Alice and Bob have 2000 authorship points but only 500 relationship points, we would add them to each other's recommended collaborators feed, available as a widget down the side of the Trac project home page with a link to one another's emails or something.
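A minimal sketch of that check, assuming both graphs are stored as dictionaries keyed by an unordered pair of names. The threshold of 1000 is invented for the example, and comparing raw points directly assumes the two scores are on comparable scales, which they may well not be:

```python
def recommended_collaborators(authorship, social, threshold):
    """Pairs whose authorship points exceed their relationship points
    by at least `threshold` get added to each other's feed."""
    feed = {}
    for pair, auth_points in authorship.items():
        gap = auth_points - social.get(pair, 0)
        if gap >= threshold:
            a, b = sorted(pair)
            feed.setdefault(a, []).append(b)
            feed.setdefault(b, []).append(a)
    return feed

authorship = {
    frozenset(("alice", "bob")): 2000,
    frozenset(("alice", "carol")): 600,
}
social = {
    frozenset(("alice", "bob")): 500,
    frozenset(("alice", "carol")): 550,
}
feed = recommended_collaborators(authorship, social, threshold=1000)
print(feed)
```

Alice and Bob have a gap of 1500 and end up in each other's feeds; Alice and Carol have a gap of 50 and don't.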

Other possibilities:
  • People can input the name of a module and get back a list of the experts on that module (determined by LOC edited), and maybe a list of related module expertise search links.
  • To really reach, the above could be smarter, perhaps. If I'm writing an in-trac email or bug report that mentions modules by name, it could automatically suggest additional people to copy the ticket to.
  • You could have a list of experts in modules you've recently checked in as a quick-contact box (with manual add and stickying people allowed).
  • Managers can see a visualization of discrepancies between the social and shared authorship graphs to help diagnose organizational inefficiencies.
  • When Bob shows up on Alice's collaborators feed, she can click "Who's Bob?" and see a graph of the social network with paths between her and Bob highlighted.
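For the "Who's Bob?" idea, the path to highlight could be found with a plain breadth-first search over the social graph. This sketch ignores edge weights and returns just one shortest path; both are simplifications of mine:

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for one shortest chain of people
    linking `start` to `goal`; returns None if there is none."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for other in sorted(neighbours.get(person, ())):
            if other not in previous:
                previous[other] = person
                queue.append(other)
    return None

edges = [("alice", "carol"), ("carol", "bob"), ("alice", "dave")]
print(shortest_path(edges, "alice", "bob"))
```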
Things to consider
  • Should expertise slowly expire? It could make sense for experience within the last year to count more than experience from several years back. This would mean counting expertise points as LOC edited as a function of time - not hard to do since we'll be getting our info from diffs anyways, but it stinks of unnecessary complexity.
  • Should we allow for diff-by-diff updates of the graphs, or assume it'll just be fully rebuilt once a week or whatever? Probably the latter to start off, until we have an idea of just how big the organization is.
  • Must make sure to keep in mind that we're doing all this fancy footwork in order to deliver a final product that's extremely simple so people might actually use it. Other social network graphing solutions exist; we need to focus on making ours simple and directed. The recommended collaborators feature fits, but not all of the others do.
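If expertise does expire, one cheap way to do it would be to weight each edit by its age. The exponential half-life used here is purely my assumption (nothing above commits to a particular decay curve), and the one-year default is arbitrary:

```python
from datetime import date

def decayed_loc(edits, today, half_life_days=365.0):
    """Sum LOC edited, halving an edit's weight every `half_life_days`.

    `edits` is a hypothetical list of (date of edit, LOC edited).
    """
    total = 0.0
    for edit_date, loc in edits:
        age_days = (today - edit_date).days
        total += loc * 0.5 ** (age_days / half_life_days)
    return total

edits = [
    (date(2009, 5, 19), 100),  # edited today: full weight
    (date(2008, 5, 19), 100),  # 365 days old: half weight
]
print(decayed_loc(edits, today=date(2009, 5, 19)))
```

Swapping this in for the raw LOC edited sum would leave everything downstream unchanged, which keeps the extra complexity in one place.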

Note: Thanks to Ainsley for terminology correction, and please see her similar post for more information on these ideas.
