data collection
(noun)
Data collection is the process of preparing and collecting data.
Examples of data collection in the following topics:
-
Defining the Sample and Collecting Data
- Defining the sample and collecting data are key parts of all empirical research, both qualitative and quantitative.
- Sampling and data collection are key components of this process.
- In both cases, it behooves the researcher to create a concrete list of goals for collecting data.
- Good data collection involves following the defined sampling process, keeping the data in time order, noting comments and other contextual events, and recording non-responses.
- Natural scientists collect data by measuring and recording a sample of the thing they're studying, such as plants or soil.
-
Use of Existing Sources
- The study of sources collected by someone other than the researcher, also known as archival research or secondary data research, is an essential part of sociology.
- In archival research or secondary research, the focus is not on collecting new data but on studying existing texts.
- Common sources of secondary data for social science include censuses, organizational records, field notes, semi-structured and structured interviews, and other forms of data collected through quantitative methods or qualitative research.
- Primary data, by contrast, are collected by the investigator conducting the research.
- The primary reason is that secondary data analysis saves time that would otherwise be spent collecting data.
-
Objective vs. Critical vs. Subjective
- The selection of data (this selection reveals data the author believes is reliable whether or not it is)
- If the researcher decides to collect their own data, then they must:
- Decide how to analyze the data collected (if mathematically, which protocols and which software program will be used; if qualitatively, which themes will be looked for and/or which software program will be used)
- Decide how to measure or categorize the data (if mathematically, what set of parameters counts as a good measure, and if qualitatively what must a category contain)
- If the researcher decides to use secondary data, this becomes even more complicated.
-
Ego network data
- Surveys may be used to collect information on ego networks.
- Data collected in this way cannot directly inform us about the overall embeddedness of the networks in a population, but it can give us information on the prevalence of various kinds of ego networks in even very large populations.
- When data are collected this way, we essentially have a data structure that is composed of a collection of networks.
- The second major way in which ego network data arise is by "extracting" them from regular complete network data.
- For this task, the Data>Egonet tool is ideal.
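The idea of "extracting" an ego network from complete network data can be illustrated with a minimal Python sketch. It is not the implementation behind the Data>Egonet tool; the function name `ego_network`, the dictionary-of-sets representation, and the sample actors are all assumptions made for illustration.

```python
def ego_network(adjacency, ego):
    """Return the members and ties of ego's one-step neighborhood.

    adjacency: dict mapping each node to the set of nodes it sends ties to
    (a hypothetical representation of complete, directed network data).
    """
    # Alters are everyone ego sends a tie to or receives a tie from.
    alters = set(adjacency.get(ego, set()))
    alters |= {node for node, targets in adjacency.items() if ego in targets}
    alters.discard(ego)
    members = alters | {ego}
    # Keep only ties whose two endpoints are both inside the neighborhood.
    ties = {
        (a, b)
        for a in members
        for b in adjacency.get(a, set())
        if b in members
    }
    return members, ties


# Made-up complete network of five actors.
network = {
    "ann": {"bob", "cat"},
    "bob": {"cat"},
    "cat": set(),
    "dan": {"eve"},
    "eve": {"ann"},
}
members, ties = ego_network(network, "ann")
```

Here "dan" is left out because he has no direct tie with "ann", while the tie between the alters "bob" and "cat" is kept because both belong to the neighborhood.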
-
Introduction to relations
- The other half of the design of network data has to do with what ties or relations are to be measured for the selected nodes.
- There is also a second kind of sampling of ties that always occurs in network data.
- When we collect network data, we are usually selecting, or sampling, from among a set of kinds of relations that we might have measured.
-
Analyzing Data and Drawing Conclusions
- In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA).
- Usually the approach is decided before data is collected.
- In an exploratory analysis, no clear hypothesis is stated before analyzing the data, and the data is searched for models that describe the data well.
- How data is coded depends entirely on what the researcher hopes to discover in the data; the same qualitative data can be coded in many different ways, calling attention to different aspects of the data.
- Coded data is quantifiable.
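As a rough illustration of how coded qualitative data become quantifiable, the sketch below assigns codes to free-text responses and then counts them. The codebook, the keyword-matching rule, and the responses are hypothetical; in practice coding is usually done by trained readers, and a different codebook applied to the same text would yield different counts.

```python
from collections import Counter

# Hypothetical codebook: a code applies when any of its keywords appears.
codebook = {
    "cost": ["price", "expensive", "afford"],
    "trust": ["trust", "honest"],
}

# Made-up interview excerpts.
responses = [
    "It was too expensive for us.",
    "I trust the staff, but the price is high.",
    "No comment.",
]


def code_response(text, codebook):
    """Return the set of codes whose keywords occur in the text."""
    lowered = text.lower()
    return {
        code
        for code, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    }


coded = [code_response(response, codebook) for response in responses]
# Once coded, the data can be tallied like any other categorical variable.
counts = Counter(code for codes in coded for code in codes)
```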
-
Sampling ties
- Because we collect information about ties between all pairs or dyads, full network data give a complete picture of relations in the population.
- Unfortunately, full network data can also be very expensive and difficult to collect.
- This kind of approach can be quite effective for collecting a form of relational data from very large populations, and can be combined with attribute-based approaches.
- Such data are, in fact, micro-network data sets -- samplings of local areas of larger networks.
- Data like these are not really "network" data at all.
-
Introduction
- For a classic study of the American South (Deep South, University of Chicago Press, 1941), Davis and his colleagues collected data on which of 18 women were present at each of the 14 events of the "social season" in a community.
- The Davis data is a bit different.
- Data like these involve two levels of analysis (or two "modes").
- Often, such data are termed "affiliation" data because they describe which actors are affiliated with (present at, or members of) which macro structures.
- The data set has two modes: donors and initiatives.
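A two-mode data set of this kind can be pictured as an actor-by-event matrix. The sketch below uses a tiny invented attendance record (not the Davis data) to build that matrix and to derive a one-mode actor-by-actor matrix counting shared events; the variable names are assumptions for illustration only.

```python
# Hypothetical affiliation data: which actors were present at which events.
attendance = {
    "A": {"e1", "e2"},
    "B": {"e2", "e3"},
    "C": {"e3"},
}

actors = sorted(attendance)
events = sorted(set().union(*attendance.values()))

# Two-mode matrix: rows are actors, columns are events, 1 means "present".
incidence = [
    [1 if event in attendance[actor] else 0 for event in events]
    for actor in actors
]

# One-mode projection: entry (i, j) counts the events actors i and j share.
co_attendance = [
    [len(attendance[a] & attendance[b]) for b in actors]
    for a in actors
]
```

The diagonal of `co_attendance` holds the number of events each actor attended, and the off-diagonal cells give the overlap between pairs of actors.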
-
Selecting sub-sets of the data
- Data>Remove isolates creates a new data set that contains all cases that are not isolated.
- Sometimes, when we collect information by doing a census of all the actors of a given type, or in a given location, some are "isolated." While this is usually an interesting social fact, we may wish to focus our attention on the community of actors who are connected (though not necessarily forming a single "component").
- Data>Unpack is a tool for creating a new data set that contains a sub-set of matrices from a larger data set.
- Data>Join is a tool that can be used to combine separate sets of data into a new data set.
- Often we collect attribute information about actors in several different settings (e.g. several classrooms in a school) and store these as separate files.
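The operations described above (dropping isolates and joining separately stored attribute files) can be sketched in plain Python. These are not the routines behind the Data>Remove isolates or Data>Join tools; the function names, the dictionary-based storage, and the sample data are assumptions, and every actor is assumed to appear as a key in the network dictionary.

```python
def remove_isolates(adjacency):
    """Drop nodes that neither send nor receive a tie (self-ties ignored)."""
    connected = set()
    for node, targets in adjacency.items():
        for target in targets:
            if target != node:
                connected.update((node, target))
    return {
        node: {t for t in targets if t in connected}
        for node, targets in adjacency.items()
        if node in connected
    }


def join_attributes(*datasets):
    """Combine separate actor-attribute dictionaries into one data set."""
    joined = {}
    for data in datasets:
        overlap = joined.keys() & data.keys()
        if overlap:
            # Refuse to silently overwrite an actor recorded in two files.
            raise ValueError(f"duplicate actors: {sorted(overlap)}")
        joined.update(data)
    return joined


# Made-up census in which "cat" has no ties at all.
network = {"ann": {"bob"}, "bob": set(), "cat": set()}
trimmed = remove_isolates(network)

# Made-up attribute files, one per classroom.
room_1 = {"ann": {"grade": 5}}
room_2 = {"bob": {"grade": 6}}
attributes = join_attributes(room_1, room_2)
```

Raising an error on duplicate actors is one possible design choice; a real workflow might instead merge or flag the conflicting records.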
-
Introduction: Manipulating network data structures
- It is possible for a data structure or data object to have more than two dimensions.
- Network analysts work with a variety of data structures.
- One major "type" of data structure is the actor-by-actor matrix (like the friendship data above).
- Network analysts think of this kind of "rectangular" array of actors by attributes simply as a collection of vectors.
- The "rectangular" data structure (called an "attribute" data set) is used in a number of ways in network analysis.