Abstract:
Software developers are increasingly having conversations about software
development via online chat services. In particular, developers are
turning to public chat communities hosted on services such as Slack,
IRC, and Gitter to discuss specific programming languages or
technologies. The emerging trend of increased participation in developer
chats motivated us to investigate and develop techniques to extract
useful knowledge available in developers’ chat communication channels.
In a preliminary study, we found several new opportunities in mining
chat conversations. We found that chats contain valuable information,
such as descriptions of code snippets and specific APIs, good
programming practices, and causes of common errors/exceptions. We also
observed that developers use chats to share opinions on best practices,
APIs, and tools. Q&A forums such as Stack Overflow, by contrast,
explicitly forbid opinion-based posts on their sites. The availability
of this information in chats may lead to new mining opportunities for
software tools.
Unlike many other sources of software development-related
communication, the information in chat forums is shared in an
unstructured, informal, and asynchronous manner. There is no predefined
delineation of conversation in chats; multiple questions are discussed
and answered in parallel by different participants. Therefore, a
technique is required to separate, or disentangle, the conversations for
analysis by researchers or automatic mining tools. Understanding the
quality of the information in the mining source is essential for
building effective data-driven software tools. Currently, chat
platforms lack a formal mechanism for quality assessment. Thus, in this
talk I will focus on automatic techniques for chat disentanglement,
quality assessment, and extraction of opinions from developer chats.