data transformation
(noun)
The application of a deterministic mathematical function to each point in a data set.
Examples of data transformation in the following topics:
-
When to Use These Tests
- In statistics, "ranking" refers to the data transformation in which numerical or ordinal values are replaced by their rank when the data are sorted.
- Data transformation refers to the application of a deterministic mathematical function to each point in a data set—that is, each data point $z_i$ is replaced with the transformed value $y_i = f(z_i)$, where $f$ is a function.
- Data can also be transformed to make them easier to visualize.
- Indicate why and how data transformation is performed and how this relates to ranked data.
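As a rough illustration of the definition $y_i = f(z_i)$ applied to ranking, the sketch below replaces each value with its rank in the sorted data. The function name `rank_transform` and the sample values are made up for this example, and giving tied values the average of their ranks is an assumption, since the excerpts above do not say how ties are handled.

```python
def rank_transform(values):
    """Replace each value with its 1-based rank; ties share the average rank (assumed convention)."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


data = [3.4, 5.1, 2.6, 7.3, 5.1]   # hypothetical measurements
ranked = rank_transform(data)
print(ranked)
```

The two tied values of 5.1 occupy sorted positions 3 and 4, so each receives rank 3.5 under the averaging convention assumed here.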
-
Exercises
- If the arithmetic mean of log10-transformed data were 3, what would be the geometric mean?
- Using Tukey's ladder of transformation, transform the following data using a λ of 0.5: 9, 16, 25.
- In the ADHD case study, transform the data in the placebo condition (D0) with λ's of .5, 0, -.5, and -1.
- How does the skew in each of these compare to the skew in the raw data?
- Which transformation leads to the least skew?
-
Transforming data (special topic)
- When data are very strongly skewed, we sometimes transform them so they are easier to model.
- A transformation is a rescaling of the data using a function.
- Transformed data are sometimes easier to work with when applying statistical models because the transformed data are much less skewed and outliers are usually less extreme.
- While there is a positive association in each plot, the transformed data show a steadier trend, which is easier to model than the untransformed data.
- (b) A scatterplot of the same data but where each variable has been log-transformed.
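The claim that transformed data are less skewed can be checked numerically. The following is a minimal sketch, assuming a moment-based skewness statistic and a small made-up data set with a long right tail; neither comes from the source.

```python
import math


def sample_skewness(xs):
    """Moment-based skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5


raw = [1, 2, 2, 3, 4, 6, 9, 15, 40, 120]      # hypothetical, strongly right-skewed
logged = [math.log10(x) for x in raw]          # log transform of every point

raw_skew = sample_skewness(raw)
log_skew = sample_skewness(logged)
print(f"skew of raw data:        {raw_skew:.2f}")
print(f"skew of log-transformed: {log_skew:.2f}")
```

For these values the log-transformed data have a noticeably smaller skewness than the raw data, and the largest observation sits much closer to the rest.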
-
Log Transformations
- State how a log transformation can help make a relationship clear.
- The log transformation can be used to make highly skewed distributions less skewed.
- The comparison of the means of log-transformed data is actually a comparison of geometric means.
- Therefore, if the arithmetic means of two sets of log-transformed data are equal then the geometric means are equal.
- Scatter plots of brain weight as a function of body weight in terms of both raw data (upper panel) and log-transformed data (lower panel).
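The link between log-transformed means and geometric means can be sketched as follows: the arithmetic mean of the log10 values, raised back as a power of 10, equals the geometric mean of the original values. The function name and sample data are illustrative only.

```python
import math


def geometric_mean(xs):
    """Geometric mean computed as 10 raised to the arithmetic mean of the log10 values."""
    mean_log = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** mean_log


values = [1, 10, 100]                      # hypothetical positive data
gm = geometric_mean(values)

# Cross-check against the direct definition: the n-th root of the product.
direct = math.prod(values) ** (1 / len(values))
print(gm, direct)
```

Because the back-transformation is one-to-one, two data sets whose log-transformed values have equal arithmetic means must also have equal geometric means.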
-
Conclusion
- We've described some of the basic "nuts and bolts" tools for entering and transforming network data.
- The "bigger picture" is to think about network data (and any other data, for that matter) as having "structure." Once you begin to see data in this way, you can begin to better imagine the creative possibilities: for example, treating actor-by-attribute data as actor-by-actor, or treating it as attribute-by-attribute.
- Different research problems may call for quite different ways of looking at, and transforming, the same data structures.
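One common way to turn actor-by-attribute data into actor-by-actor or attribute-by-attribute data is to multiply the matrix by its transpose. The sketch below assumes a binary actor-by-attribute matrix and uses this cross-product approach; it is one possible transformation, not necessarily the one the source text has in mind, and the matrix is invented for illustration.

```python
def transpose(matrix):
    """Swap rows and columns."""
    return [list(row) for row in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical data: rows are actors, columns are attributes (1 = actor has the attribute).
actor_by_attribute = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]

# Entry (i, j) counts the attributes that actors i and j share.
actor_by_actor = matmul(actor_by_attribute, transpose(actor_by_attribute))

# Entry (p, q) counts the actors that hold both attribute p and attribute q.
attribute_by_attribute = matmul(transpose(actor_by_attribute), actor_by_attribute)

print(actor_by_actor)
print(attribute_by_attribute)
```

The same rectangular data structure yields two different square matrices, depending on whether the research question is about actors or about attributes.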
-
Box-Cox Transformations
- Data that are normal lead to a straight line on the q-q plot.
- Such data are often strongly skewed, as is clear from Figure 3.
- The kernel density plot of the optimally transformed data is shown in the left frame of Figure 4.
- (L) Density plot of the 1973 British income data.
- (L) Density plot of the 1973 British income data transformed with λ = 0.21.
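The excerpts above do not state the transformation formula, so the sketch below assumes the standard one-parameter Box-Cox form: $(x^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\ln x$ for $\lambda = 0$. The function name and the sample incomes are hypothetical; only the value λ = 0.21 is taken from the text.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a single positive value (assumed standard form)."""
    if x <= 0:
        raise ValueError("the Box-Cox transform is defined for positive data only")
    if lam == 0:
        return math.log(x)          # limiting case as lambda approaches 0
    return (x ** lam - 1) / lam


incomes = [500, 1200, 2500, 9000, 40000]          # made-up, right-skewed values
transformed = [box_cox(x, 0.21) for x in incomes]  # lambda = 0.21 as quoted above
print([round(y, 2) for y in transformed])
```

The transform is increasing in x, so it preserves the ordering of the data while pulling in the long right tail.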
-
Linear Transformations
- Often it is necessary to transform data from one measurement scale to another.
- To transform feet to inches, you simply multiply by 12.
- Similarly, to transform inches to feet, you divide by 12.
- The transformation consists of multiplying by a constant and then adding a second constant.
- Such transformations are therefore called linear transformations.
-
The Discrete Fourier Transform
- Suppose we have discrete data, not a continuous function.
- This is the discrete version of the Fourier transform (DFT).
- $f_n$ are the data and $c_k$ are the harmonic coefficients of a trigonometric function that interpolates the data.
- In the handout you will see some Mathematica code for computing and displaying discrete Fourier transforms.
- The reason is that Mathematica uses a special algorithm called the FFT (Fast Fourier Transform).
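For comparison with the FFT, here is a minimal sketch of the DFT computed directly from its defining sum, which takes on the order of N² operations for N samples. The sign of the exponent and the absence of a normalizing factor are assumptions; texts and software packages differ on both conventions, and the excerpts above do not show the formula.

```python
import cmath


def dft(samples):
    """Direct DFT: c_k = sum over n of f_n * exp(-2*pi*i*k*n/N) (assumed convention)."""
    size = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


# One cycle of a cosine sampled at four points.
signal = [1.0, 0.0, -1.0, 0.0]
coefficients = dft(signal)
for k, c in enumerate(coefficients):
    print(k, round(c.real, 6), round(c.imag, 6))
```

An FFT produces the same coefficients (up to the chosen convention) in roughly N log N operations, which is why library routines rely on it for long data sets.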
-
Analyzing Data
- Data analysis is an important step in the marketing research process in which data are organized, reviewed, verified, and interpreted.
- Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
- In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA).
- All are varieties of data analysis.
- Summarize the characteristics of data preparation and the methodology of data analysis.
-
Tukey Ladder of Powers
- Plotting the data on a scatter diagram is the first step.
- These data are plotted two ways in Figure 1.
- The right frame displays the transformed data, together with the linear fit for the 1790-1960 period.
- The demonstration in Figure 7 shows distributions of the data from the Stereograms case study as transformed with various values of λ.
- Keep in mind that λ = 1 is the raw data.
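The ladder of powers can be sketched as a single function of λ. This version assumes the common convention of using $x^\lambda$ for positive λ, $\log x$ for λ = 0, and $-(x^\lambda)$ for negative λ, with the sign flip keeping the order of the data unchanged; the excerpts above confirm only that λ = 1 leaves the data as they are. The function name and the data values are illustrative.

```python
import math


def tukey_ladder(x, lam):
    """Tukey ladder-of-powers transform of one positive value (assumed convention)."""
    if x <= 0:
        raise ValueError("this sketch handles positive data only")
    if lam > 0:
        return x ** lam
    if lam == 0:
        return math.log(x)
    return -(x ** lam)              # negate so larger x still maps to larger output


data = [4, 36, 100]                 # hypothetical values
for lam in (1, 0.5, 0, -0.5, -1):
    print(lam, [round(tukey_ladder(x, lam), 4) for x in data])
```

With λ = 1 the output equals the input, and moving down the ladder toward negative λ compresses large values more and more strongly.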