NLP - ML - RL - Data Science

Understanding the complexities of human language with algorithms

Ethical Challenges in Data-driven Dialog Systems

Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau

The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. There are well-documented instances where interactions with these systems have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven systems, the rise of adversarial examples, potential sources of privacy violations, safety concerns, special considerations for reinforcement learning systems, and reproducibility concerns. We also suggest areas stemming from these issues that deserve further investigation. Through this initial survey, we hope to spur research leading to robust, safe, and ethically sound dialogue systems.

  • Paper : Accepted at the AAAI/ACM conference on Ethics and Safety, 2017
  • Webpage
  • Code

Adversarial Generation of Dialog with contextual information

Koustuv Sinha, Prasanna Parthasarathy, Peter Henderson and Joelle Pineau

Generative models such as Seq2Seq, HRED, and VHRED have typically been able to generate dialog responses over short contexts. Recently, researchers have been investigating whether Generative Adversarial Networks can enhance these models so that they generate appropriate responses. Our work investigates the role of context and explores whether adversarial training can help generative models handle varied contexts.

  • Paper : WIP

Analyzing and predicting Interactions in Literature

Koustuv Sinha, Derek Ruths and Andrew Piper

In literature, social networks among characters form on the basis of their interactions with one another. Some characters interact directly, while others are connected through implied interactions. Our research aims to devise a novel method to detect these interactions linguistically, and to analyze which features of the discourse are most prominent in identifying them.

  • Paper : “On the unreasonable complexity of detecting social interactions in literature” - In review

Gtopics - Inferring human-oriented topics from data

Koustuv Sinha, Derek Ruths and David Jurgens

When training a topic-based document classification system, how do we infer which topics to train on? Traditionally, researchers have used unsupervised topic models such as LDA to generate clusters and then tag the meaningful topics. Our work, however, aims to show that meaningful topics can be inferred from social media discourse itself, which provides an easy way to train a classifier on the topics inferred by our pipeline.

  • Paper : WIP

Real Time Crime News Extraction

Saptarsi Goswami, Koustuv Sinha, Debasish Banik, Urmi Saha, Subhashree Bose

In the Indian context, crime data from the NCRB is a major source for researchers. However, NCRB data is at least two years old, so it is hard for researchers to understand current crime trends. Our work extracts real-time crime data from online sources, such as newspapers and tweets, to create an up-to-date dataset for researchers to work on.

  • Paper : “Extraction, Identification & Classification of Crime Data from Real Time News Sources”