2019
Amirreza Shirani, Bowen Xu, David Lo, Thamar Solorio, Amin Alipour
Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models (Conference)
AAAI Reasoning for Complex Question Answering Workshop (AAAI 2019), 2019.
Tags: Community question answering, cQA, Question relatedness, stack overflow
@conference{shirani2019question,
title = {Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models},
author = {Amirreza Shirani and Bowen Xu and David Lo and Thamar Solorio and Amin Alipour},
url = {https://arxiv.org/pdf/1905.01966.pdf},
year = {2019},
date = {2019-01-03},
booktitle = {AAAI Reasoning for Complex Question Answering Workshop (AAAI 2019)},
abstract = {Domain-specific community question answering is becoming an integral part of professions. Finding related questions
and answers in these communities can significantly improve
the effectiveness and efficiency of information seeking. Stack
Overflow is one of the most popular communities that is being
used by millions of programmers. In this paper, we analyze
the problem of predicting knowledge unit (question thread)
relatedness in Stack Overflow. In particular, we formulate the
question relatedness task as a multi-class classification problem with four degrees of relatedness.
We present a large-scale dataset with more than 300K pairs.
To the best of our knowledge, this dataset is the largest
domain-specific dataset for Question-Question relatedness.
We present the steps that we took to collect, clean, process,
and assure the quality of the dataset. The proposed dataset
on Stack Overflow is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the
prediction of relatedness in technical community question-answering forums.
We adapt a neural network architecture and a traditional
model for this task that effectively utilize information from
different parts of knowledge units to compute the relatedness
between them. These models can be used to benchmark novel
models, as they perform well in our task and in a closely similar task.},
keywords = {Community question answering, cQA, Question relatedness, stack overflow},
pubstate = {published},
tppubtype = {conference}
}
Domain-specific community question answering is becoming an integral part of professions. Finding related questions and answers in these communities can significantly improve the effectiveness and efficiency of information seeking. Stack Overflow is one of the most popular communities that is being used by millions of programmers. In this paper, we analyze the problem of predicting knowledge unit (question thread) relatedness in Stack Overflow. In particular, we formulate the question relatedness task as a multi-class classification problem with four degrees of relatedness.
We present a large-scale dataset with more than 300K pairs. To the best of our knowledge, this dataset is the largest domain-specific dataset for Question-Question relatedness. We present the steps that we took to collect, clean, process, and assure the quality of the dataset. The proposed dataset on Stack Overflow is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the prediction of relatedness in technical community question-answering forums.
We adapt a neural network architecture and a traditional model for this task that effectively utilize information from different parts of knowledge units to compute the relatedness between them. These models can be used to benchmark novel models, as they perform well in our task and in a closely similar task.