Research Questions & Motivation

A common mistake in a research project is to start without a research goal in mind. Students often seek out data and then explore it aimlessly, with no direction. While exploration is certainly part of Data Science, you should have specific questions in mind that you intend to answer. Your questions should revolve around identifying correlations, not causation. Don’t worry if you don’t expose an obvious correlation; a plot that shows no correlation between two variables is still a successful result.

Another common mistake is to take on a project that is only mildly interesting and has no consequence or impact. For example, one might identify the most popular genre of movies without explaining why we should care. Perhaps there is a very good reason we should care; if so, state it. Don’t simply answer mild curiosities.

Keep your research questions updated

Formulating a research question is a crucial step in conducting research, and the question itself often evolves over time. As researchers delve deeper into the subject matter, gain new insights, and adapt to changing circumstances, the initial research question may be modified or even completely transformed. Changes can arise for many reasons, such as:

  1. Pilot Study or Data Collection
  2. Data Analysis and Initial Findings
  3. Feedback and Peer Review

Focus on different demographics

Having multiple research questions that focus on different areas or demographics within a project can enhance the depth and breadth of your study. Here’s a description of how to accomplish this and a specific example to illustrate the concept:

To create multiple research questions focusing on different areas or demographics, you should first identify the key dimensions you wish to explore within the broader research topic. These dimensions could be distinct variables, subgroups, or specific factors of interest. Each research question should be clear, relevant, and aligned with the unique characteristics of the subgroup or area under investigation.

Examples

Research Questions

  1. How does the number and rate of aviation accidents correlate with different aspects of the temporal dimension (technological progression)?
  2. What is the relationship between aircraft mechanical factors and the likelihood of aviation accidents?
  3. How do external environmental factors such as geographical patterns and weather conditions correlate with the occurrence of aviation accidents?
  4. What is the role of human factors, including communication, crew-related factors, and staffing levels, in aviation accidents and incidents?
  5. To what extent can machine learning extract factorial information from personal narratives of aviation accidents?

Comments

In this example, the researchers have formulated five distinct research questions, each focusing on a different area or demographic. By having multiple research questions, the researchers can collect data tailored to each potential hazard in aerospace engineering as well as to the machine learning component.

The first two focus on aircraft accidents and their connection to technology. The third question covers environmental and geographical factors. The fourth centers on human behavior and psychology. Lastly, the fifth question aims to draw connections using a Machine Learning model.

The different perspectives covered in this research (environmental, technological, psychological, and more) make it complex and incredibly insightful. They also set the stage for comparing findings across questions.

Other Examples (Good and Bad)

Good Research Questions (Clear, Specific, and Data-Driven)

  1. How does air quality vary across different cities, and what factors contribute most to poor air quality?
    (This question allows for data collection on air quality, weather, and pollution sources, making it suitable for Pandas analysis.)

  2. What are the trends in student performance based on study time and extracurricular activities?
    (You can analyze datasets on student grades, study habits, and activity participation to find correlations.)

  3. How do different machine learning algorithms perform on predicting house prices based on real estate data?
    (This allows for a comparative analysis using regression models and Pandas for data handling.)
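
As a minimal sketch of how a question like number 2 above might translate into a Pandas analysis, the snippet below computes the correlation between each candidate factor and student grades. The column names (study_hours, activities, grade) and all of the values are hypothetical, invented only for illustration; a real project would load an actual dataset instead.

```python
import pandas as pd

# Hypothetical toy records, invented purely for illustration (not real data).
students = pd.DataFrame(
    {
        "study_hours": [2, 4, 6, 8, 10, 12],
        "activities": [3, 0, 2, 1, 3, 0],
        "grade": [55, 62, 70, 78, 85, 93],
    }
)

# Pairwise Pearson correlation between every numeric column.
corr = students.corr()

# Correlation of each candidate factor with the outcome of interest.
grade_corr = corr["grade"].drop("grade")
print(grade_corr)
```

In this toy data, study_hours correlates strongly with grade while activities shows only a weak relationship. Both outcomes are worth reporting: a weak or absent correlation still answers the research question.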

Poor Research Questions (Too Broad, Opinion-Based, or Lacking Data)

  1. Why do students struggle with math?
    (Too vague and qualitative; lacks a clear dataset or measurable variables.)

  2. Is social media good or bad for mental health?
    (Overly broad and subjective; needs a more specific, data-driven approach, such as sentiment analysis of social media activity.)

  3. How has technology changed over time?
    (Too general; should focus on a specific technology and measurable aspects like adoption rates or performance metrics.)