
Load R package

Preprocessing

The function preprocessing() takes a .csv file of summarized protein abundances, exported from Spectronaut. The most important columns that need to be included in this file are: R.Condition, R.Replicate, PG.ProteinAccessions, PG.ProteinNames, PG.NrOfStrippedSequencesIdentified, and PG.Quantity. This function reformats the data and provides some initial filtering (based on the number of unique peptides). The steps below describe what preprocessing() does.

1. Loads the raw data

  • If the raw data is in a .csv file Toy_Spectronaut_Data.csv, specify the fileName to read the raw data file into R.

  • If the raw data is stored as an .RData file Toy_Spectronaut_Data.RData, first load the data file directly, then specify the dataSet in the function.

2. Filters out identified proteins that exhibit “NaN” quantitative values

NaN, which stands for ‘Not a Number,’ can be found in the PG.Quantity column for proteins that were identified by MS and MS/MS evidence in the raw data, but all peptides from that protein lack an associated integrated peak area or intensity. This usually occurs in low abundance peptides that exhibit intensities close to the limit of detection resulting in poor signal-to-noise (S/N) and/or when there is interference from other co-eluting peptide ions with very similar or identical m/z values that lead to difficulty in parsing out individual intensity profiles.

3. Applies a unique peptides per protein filter

General practice in the proteomics field is to filter out proteins from which only 1 unique peptide was identified. This adds confidence to results already filtered to a 1% false discovery rate (FDR), since proteins that are identified with 2 or more peptides are less likely to be false positives. We recommend filtering out these protein entries in order to focus on more confident targets in the identified proteome. The 1-peptide proteins remain available in the original protein report from Spectronaut.

4. Adds accession numbers to identified proteins without informative names

Spectronaut reports contain 4 different columns of identifying information:

  • PG.Genes, which is the gene name (e.g. CDK1).
  • PG.ProteinAccessions, which is the UniProt identifier number for a unique entry in the online database (e.g. P06493).
  • PG.ProteinDescriptions, which is the protein name as provided on UniProt (e.g. cyclin-dependent kinase 1).
  • PG.ProteinNames, which is a concatenation of an identifier and the species (e.g. CDK1_HUMAN).

Every entry in UniProt will have an accession number, but may not have all of the other identifiers, due to incomplete annotation. Because UniProt includes entries for fragments of proteins and some protein entries are redundant, a peptide can match multiple entries for the same protein, which generates multiple possible identifiers in Spectronaut. Further, the ProteinNames entry in Spectronaut can switch formats: the preference is accession number and species, but it can also be gene name and species instead.

This option tells msDiaLogue to substitute the accession number for an identifier if it tries to pull an identifier from a column with no information.

Note: Not all proteins can be identified unambiguously. In many cases, the identified peptides can be found in multiple protein sequences, which yields a protein group or protein cluster rather than a single protein identification. When this happens, the accession numbers for all potential matches are concatenated into one string, separated by periods. When you see long strings of multiple identifiers later in your data processing, this is why. Spectronaut sorts these alphanumerically, so you should not assume that the first protein in the list is most likely to be correct (as is the case in other search algorithms).

5. Saves a document to your working directory with all filtered out data, if desired

If saveRm = TRUE, the data removed in step 2 (preprocess_Filtered_Out_NaN.csv) and step 3 (preprocess_Filtered_Out_Unique.csv) will be saved in the current working directory.
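Steps 2 to 4 above can be illustrated with a minimal base R sketch. This is not the internal code of preprocessing(); it only mimics the described logic on an invented long-format report. The column names follow the Spectronaut export listed earlier, while the accessions, names, values, and the variable names (report, kept, removedNaN, removedUnique, minUnique) are hypothetical.

```r
# Toy long-format report; values and identifiers are invented for illustration.
report <- data.frame(
  R.Condition = c("A", "A", "A", "A"),
  R.Replicate = c(1, 1, 1, 1),
  PG.ProteinAccessions = c("ACC0001", "ACC0002", "ACC0003", "ACC0004"),
  PG.ProteinNames = c("PROT1_HUMAN", "", "PROT3_HUMAN", "PROT4_HUMAN"),
  PG.NrOfStrippedSequencesIdentified = c(3, 2, 1, 5),
  PG.Quantity = c(1500.2, 320.7, 88.1, NaN),
  stringsAsFactors = FALSE
)

# Step 2: set aside rows whose quantity is NaN.
isNaN <- is.nan(report$PG.Quantity)
removedNaN <- report[isNaN, ]
kept <- report[!isNaN, ]

# Step 3: keep proteins identified by at least `minUnique` unique peptides.
minUnique <- 2
isLow <- kept$PG.NrOfStrippedSequencesIdentified < minUnique
removedUnique <- kept[isLow, ]
kept <- kept[!isLow, ]

# Step 4: fall back to the accession number where the name is blank.
blank <- is.na(kept$PG.ProteinNames) | kept$PG.ProteinNames == ""
kept$PG.ProteinNames[blank] <- kept$PG.ProteinAccessions[blank]

print(kept[, c("PG.ProteinNames", "PG.Quantity")])
```

In this sketch, removedNaN and removedUnique play the role of the two tables that saveRm = TRUE writes to disk in step 5.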

As part of preprocessing(), a histogram of log2-transformed protein abundances is provided. This is a helpful way to confirm that the data have been read in correctly, and that there are no issues with the numerical values of the protein abundances. Ideally, this histogram will appear fairly symmetrical (bell-shaped) without too much skew towards smaller or larger values.
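The same kind of check can be done by hand with base R. The sketch below is only an illustration of the idea, not the plotting code used by the package (the warnings shown in the output further down suggest it uses ggplot2); the abundance values and variable names are illustrative.

```r
# Illustrative raw abundances, including one missing value.
quantity <- c(20.9, 263.9, 669.8, 1963.5, 117803.5, NA)
logQuantity <- log2(quantity)

# hist() ignores non-finite values; plot = FALSE returns the bin counts
# without drawing, so they can be inspected directly.
h <- hist(logQuantity, breaks = 10, plot = FALSE)
print(h$counts)
```

Dropping plot = FALSE draws the histogram, which is the quickest way to judge symmetry by eye.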

## if the raw data is in a .csv file
fileName <- "../tests/testData/Toy_Spectronaut_Data.csv"
dataSet <- preprocessing(fileName,
                         filterNaN = TRUE, filterUnique = 2,
                         replaceBlank = TRUE, saveRm = TRUE)
Note: preprocessing() does not perform a transformation on your data. You still need to use the function transform().
## if the raw data is in an .RData file
load("../tests/testData/Toy_Spectronaut_Data.RData")
dataSet <- preprocessing(dataSet = Toy_Spectronaut_Data,
                         filterNaN = TRUE, filterUnique = 2,
                         replaceBlank = TRUE, saveRm = TRUE)
#> Warning: Removed 62 rows containing non-finite outside the scale range
#> (`stat_bin()`).

#> Summary of Full Data Signals (Raw):
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>     20.93    263.87    669.79   6897.92   1963.53 117803.49

#> Levels of Condition: 100pmol 200pmol 50pmol 
#> Levels of Replicate: 1 2 3 4
R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN KRT16_MOUSE F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
100pmol 1 1547.983 3168.32568 2819.7874 318.54376 495.5136 456.3309 213.21727 237.1306 111209.7 10737.953 15097.67 1799.391 630.1937 1311.8127 1279.6390 280.6318 299.51523 1154.5566 16461.2012 179.3190 516.1104 1234.587 27599.42 13798.590 23840.03 614.0895 990.5613 440.0417 132.31737 150.6033 3578.014 26872.50 109.55331 211.6450 1292.5234 1963.5321 189.79155 1106.1482 981.11432 180.6320 199.14555 209.7806 NA NA NA
100pmol 2 1680.730 4576.37158 1061.9502 404.25836 556.8611 501.0473 184.89574 314.0320 111659.9 10655.384 15840.28 NA 575.0490 1114.2773 1294.9751 271.8160 248.04329 1032.0381 1460.7496 213.1137 492.3771 1186.433 27221.59 13880.411 23963.31 640.2153 1077.4829 364.5241 128.78983 128.2592 3412.794 26742.22 155.37483 348.6104 1066.3511 1509.1512 153.90802 1303.6520 388.65823 122.7458 751.19849 247.3832 1420.1351 NA NA
100pmol 3 1414.811 4675.13281 2177.8496 275.09167 559.3206 NA 111.24314 501.2060 105982.9 10663.714 15022.21 NA 613.3968 1224.3837 946.0795 309.7599 270.67770 1808.1924 21555.3555 200.7485 342.1992 1227.435 26587.62 13723.719 22957.35 551.6828 1176.7791 319.0364 NA 118.5104 3499.113 26124.20 91.82145 319.1320 1003.3372 1342.4712 143.12419 1352.7024 430.13318 144.6799 171.13177 221.9161 1889.0665 835.6825 NA
100pmol 4 1620.490 3828.19971 2062.8384 385.05573 558.0967 422.0465 84.27336 334.6389 104442.6 10843.115 15160.49 NA 886.5406 1148.7343 1091.7800 NA 229.40149 901.5703 22937.2500 240.7981 418.1846 1190.952 26168.72 13944.603 22311.30 438.5425 1162.6656 351.5390 NA 137.8860 3481.821 25910.39 88.26187 217.7478 489.8084 1721.8601 99.95578 990.6649 393.55930 134.5238 145.17339 216.3736 1610.2407 950.3087 913.3416
200pmol 1 1512.770 4232.05078 2004.8613 338.27777 156.3478 364.5416 146.80331 NA 109245.3 19524.863 21577.97 2212.190 491.7787 1246.4460 1080.4132 270.1487 252.09808 1454.3271 21113.4512 223.8396 313.7860 1176.982 48693.35 24344.188 41234.67 364.7307 1203.0853 385.5154 65.40555 151.0895 3553.484 26261.47 81.22160 185.4865 939.8899 2149.7632 131.13179 381.0588 429.62201 239.4998 145.04378 424.7914 2337.8496 NA 837.8737
200pmol 2 1480.490 3496.84155 2177.9534 NA 550.4083 NA 135.78349 295.8571 113357.5 20072.297 22968.96 NA 669.7894 1068.2001 NA 285.4891 259.50000 1049.7526 25760.0527 190.3054 452.8294 1220.266 49866.29 24742.227 42899.43 633.5656 1234.5601 414.1271 NA 135.8605 3686.869 27638.89 69.56509 250.4035 1020.4291 725.6615 116.20615 877.0164 438.22589 133.4297 160.92671 155.0986 NA 1053.8444 1000.5491
200pmol 3 1555.834 356.43225 2280.6846 379.62103 564.2863 496.0772 103.30424 473.9141 114321.8 20787.127 20720.13 1451.198 586.7260 1378.0652 1194.8448 291.6754 184.18954 1123.7469 NA 174.5702 432.1681 1216.306 50704.73 24803.633 42904.95 446.4135 1082.7312 357.6343 NA 129.0676 3530.710 27101.22 62.08423 136.7023 1171.5715 1675.6870 109.60301 938.3956 568.89239 315.7039 146.75146 198.4779 1397.9890 837.2197 694.5791
200pmol 4 1529.628 350.70822 2223.3093 410.82349 292.9041 522.1325 95.18819 318.4948 116439.8 19924.240 22153.40 NA 539.0703 923.3237 1115.3848 322.9086 97.65465 957.0436 NA 164.7767 NA 1183.197 53744.70 26381.047 43279.84 527.1628 1121.3438 342.5055 NA 121.3068 3751.769 27545.24 70.39470 199.2453 996.0696 1696.6189 125.31519 611.6407 506.49115 204.4332 161.96100 376.5362 895.9138 NA NA
50pmol 1 1480.210 561.38837 189.9275 264.24271 308.9420 NA 599.90497 192.3859 117803.5 6758.298 12183.81 NA 594.8999 899.5010 1163.1122 291.4431 176.21545 620.2048 14107.1250 152.5492 292.2440 1186.543 16408.28 7169.955 14728.67 2984.7190 1029.7336 288.4770 891.24725 129.7482 3547.950 25668.78 846.95880 146.3040 NA 461.3821 86.84789 373.6308 49.93938 236.2902 20.92994 142.3466 NA NA NA
50pmol 2 1486.144 NA 1462.2559 325.74991 351.2331 NA 254.75084 308.6775 110086.7 6721.135 12521.78 NA 582.8912 531.7106 1119.5256 287.1180 103.58258 849.2368 24912.3613 140.6493 362.3117 1260.574 16444.63 7797.536 14736.71 857.5026 NA 361.4482 179.10303 166.8891 3530.004 26351.25 207.83086 165.6463 265.2173 1184.9562 93.91448 768.2026 489.40918 146.9422 88.41573 101.6087 NA NA NA
50pmol 3 1468.554 42.51457 1364.9075 83.99377 296.5147 396.0038 257.78970 279.2477 105640.2 6172.877 11926.22 1373.660 569.8922 NA 1067.0791 294.0919 88.48861 738.7719 666.5015 NA NA 1175.953 16618.11 7432.793 14160.20 916.4893 992.5451 319.6350 128.63672 120.6974 3458.023 26017.54 203.64948 132.5755 291.4759 932.9668 93.50905 547.0935 263.86734 313.0341 111.88376 85.4563 NA NA NA
50pmol 4 1497.531 927.07886 1435.5588 275.60831 242.4643 425.7305 197.71338 382.4084 110446.0 6028.398 12021.50 NA NA 593.1353 1302.1250 339.3387 30.13688 873.1840 15711.3106 142.4270 291.5121 1150.711 16282.51 7543.633 14758.73 886.7808 1138.6193 NA 152.56187 NA 3575.316 25969.99 190.47060 220.1901 676.8246 996.8993 31.57284 523.4712 450.08408 164.1874 143.96025 135.2896 NA NA NA

Transformation

Raw mass spectrometry intensity measurements are often unsuitable for direct statistical modeling because the shape of the data is usually not symmetrical and the variance is not consistent across the range of intensities. Most proteomic workflows will convert these raw values with a log2 transformation, which both reshapes the data into a more symmetrical distribution, making it easier to interpret mean-based fold changes, and also stabilizes the variance across the intensity range (i.e. reduces heteroscedasticity).

dataTran <- transform(dataSet, logFold = 2)

R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN KRT16_MOUSE F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
100pmol 1 10.59617 11.629505 11.461371 8.315348 8.952781 8.833937 7.736180 7.889538 16.76292 13.39043 13.88204 10.81329 9.299651 10.357346 10.321521 8.132535 8.226486 10.173123 14.006782 7.486384 9.011536 10.26981 14.75235 13.75223 14.54110 9.262305 9.952103 8.781496 7.047859 7.234610 11.80494 14.71384 6.775489 7.725502 10.335975 10.939236 7.568272 10.111329 9.938277 7.496910 7.637679 7.712738 NA NA NA
100pmol 2 10.71487 12.159989 10.052500 8.659134 9.121174 8.968803 7.530568 8.294768 16.76875 13.37929 13.95131 NA 9.167541 10.121893 10.338709 8.086487 7.954448 10.011280 10.512493 7.735480 8.943620 10.21241 14.73246 13.76076 14.54854 9.322413 10.073449 8.509870 7.008875 7.002919 11.73674 14.70683 7.279609 8.445472 10.058467 10.559522 7.265925 10.348343 8.602358 6.939530 9.553050 7.950604 10.471813 NA NA
100pmol 3 10.46639 12.190792 11.088689 8.103769 9.127531 NA 6.797573 8.969260 16.69347 13.38042 13.87481 NA 9.260677 10.257840 9.885818 8.275007 8.080432 10.820332 14.395759 7.649245 8.418693 10.26143 14.69847 13.74438 14.48667 9.107695 10.200628 8.317577 NA 6.888870 11.77277 14.67310 6.520759 8.318009 9.970591 10.390675 7.161124 10.401629 8.748640 7.176720 7.418964 7.793871 10.883458 9.706811 NA
100pmol 4 10.66221 11.902450 11.010415 8.588923 9.124371 8.721258 6.397005 8.386462 16.67235 13.40449 13.88803 NA 9.792043 10.165829 10.092467 NA 7.841731 9.816296 14.485405 7.911680 8.707996 10.21790 14.67556 13.76742 14.44549 8.776573 10.183221 8.457541 NA 7.107332 11.76563 14.66124 6.463718 7.766514 8.936074 10.749752 6.643218 9.952253 8.620437 7.071718 7.181633 7.757381 10.653061 9.892252 9.835011
200pmol 1 10.56298 12.047141 10.969287 8.402065 7.288615 8.509940 7.197741 NA 16.73721 14.25302 14.39727 11.11126 8.941866 10.283605 10.077367 8.077610 7.977841 10.506136 14.365875 7.806321 8.293637 10.20088 15.57144 14.57129 15.33157 8.510688 10.232523 8.590645 6.031341 7.239260 11.79502 14.68066 6.343792 7.535170 9.876348 11.069962 7.034874 8.573870 8.746924 7.903880 7.180345 8.730611 11.190966 NA 9.710589
200pmol 2 10.53186 11.771837 11.088757 NA 9.104358 NA 7.085164 8.208757 16.79052 14.29292 14.48740 NA 9.387564 10.060966 NA 8.157292 8.019591 10.035834 14.652848 7.572173 8.822824 10.25298 15.60578 14.59469 15.38867 9.307350 10.269781 8.693930 NA 7.085982 11.84818 14.75441 6.120292 7.968111 9.994960 9.503153 6.860543 9.776460 8.775531 7.059936 7.330260 7.277041 NA 10.041446 9.966576
200pmol 3 10.60347 8.477484 11.155251 8.568416 9.140283 8.954421 6.690756 8.888482 16.80274 14.34340 14.33875 10.50303 9.196543 10.428428 10.222608 8.188220 7.525047 10.134101 NA 7.447663 8.755449 10.24829 15.62983 14.59826 15.38886 8.802237 10.080459 8.482341 NA 7.011984 11.78574 14.72607 5.956155 7.094894 10.194229 10.710537 6.776144 9.874052 9.152012 8.302428 7.197231 7.632834 10.449137 9.709462 9.439995
200pmol 4 10.57897 8.454127 11.118493 8.682375 8.194285 9.028272 6.572711 8.315126 16.82923 14.28224 14.43524 NA 9.074329 9.850693 10.123326 8.334982 6.609617 9.902441 NA 7.364369 NA 10.20847 15.71383 14.68721 15.40141 9.042105 10.131013 8.419983 NA 6.922516 11.87336 14.74952 6.137395 7.638402 9.960103 10.728447 6.969417 9.256541 8.984393 7.675486 7.339503 8.556645 9.807216 NA NA
50pmol 1 10.53159 9.132855 7.569305 8.045720 8.271192 NA 9.228590 7.587860 16.84602 12.72244 13.57268 NA 9.216503 9.812981 10.183775 8.187071 7.461197 9.276601 13.784136 7.253131 8.191030 10.21255 14.00214 12.80775 13.84634 11.543379 10.008055 8.172313 9.799682 7.019571 11.79277 14.64773 9.726148 7.192825 NA 8.849818 6.440419 8.545470 5.642106 7.884416 4.387496 7.153265 NA NA NA
50pmol 2 10.53736 NA 10.513980 8.347621 8.456285 NA 7.992943 8.269956 16.74828 12.71449 13.61215 NA 9.187083 9.054498 10.128672 8.165500 6.694638 9.730023 14.604574 7.135959 8.501088 10.29986 14.00533 12.92880 13.84713 9.743997 NA 8.497645 7.484646 7.382746 11.78545 14.68558 7.699266 7.371963 8.051031 10.210618 6.553276 9.585343 8.934897 7.199104 6.466231 6.666879 NA NA NA
50pmol 3 10.52018 5.409885 10.414587 6.392210 8.211960 8.629371 8.010051 8.125402 16.68880 12.59173 13.54185 10.42381 9.154545 NA 10.059451 8.200124 6.467420 9.528985 9.380464 NA NA 10.19961 14.02047 12.85969 13.78955 9.839974 9.954989 8.320282 7.007159 6.915251 11.75573 14.66720 7.669944 7.050670 8.187233 9.865682 6.547034 9.095644 8.043669 8.290176 6.805857 6.417115 NA NA NA
50pmol 4 10.54837 9.856548 10.487397 8.106476 7.921629 8.733797 7.627267 8.578971 16.75298 12.55756 13.55333 NA NA 9.212217 10.346652 8.406582 4.913458 9.770142 13.939516 7.154078 8.187412 10.16831 13.99104 12.88104 13.84928 9.792434 10.153070 NA 7.253251 NA 11.80386 14.66456 7.573424 7.782606 9.402638 9.961304 4.980612 9.031966 8.814051 7.359200 7.169527 7.079907 NA NA NA
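The transformed table can be spot-checked against the raw table with base R's log2(). The sketch below takes three raw values from the first row (100pmol, replicate 1) of the preprocessing() output above; the variable names are only for illustration.

```r
# Raw values copied from the 100pmol / replicate 1 row of the raw table.
raw <- c(NUD4B_HUMAN = 1547.983, ALBU_BOVIN = 111209.7, NBDY_HUMAN = NA)

# log2 keeps missing values missing.
logged <- log2(raw)
print(round(logged, 5))
```

The results should agree with the corresponding entries of the transformed table to the displayed precision, and the NA stays NA.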

Filtering

In some cases, a researcher may wish to filter out a specific protein or proteins from the dataset. The most common instance of this would be proteins identified from the common contaminants database, where the identification is necessary to avoid incorrect matching but the result is irrelevant to the experimental question and would not be included in data visualization. Other scenarios might include a mixed-species experiment where the researcher wants to evaluate data from only one species at a time. This step allows you to set aside specific proteins from downstream analysis, using the gene_species identifier format.

Note: The proteins to be selected or removed are the union of those specified in listName and those matching the regular expression pattern in regexName.
Keep in mind: Removal of any proteins, including common contaminants, will affect any global calculations performed after this step (such as normalization). This should not be done without a clear understanding of how this will affect your results.

Case 1. Remove proteins specified by the user in this step and keep everything else.

For example, the proteins named “ALBU_BOVIN” and those containing “HUMAN” are chosen to be filtered out.

filterOutIn(dataTran, listName = "ALBU_BOVIN", regexName = "HUMAN",
            removeList = TRUE, saveRm = TRUE)

where removeList = TRUE indicates the removal of proteins from the union of listName and regexName in dataTran. Please note that if saveRm = TRUE, the excluded data (“ALBU_BOVIN” + “*HUMAN”) will be saved as a .csv file named filtered_out_data.csv in the current working directory.

R.Condition R.Replicate CYC_BOVIN TRFE_BOVIN KRT16_MOUSE ADH1_YEAST LYSC_CHICK BGAL_ECOLI
100pmol 1 13.39043 13.88204 10.81329 14.75235 13.75223 14.54110
100pmol 2 13.37929 13.95131 NA 14.73246 13.76076 14.54854
100pmol 3 13.38042 13.87481 NA 14.69847 13.74438 14.48667
100pmol 4 13.40449 13.88803 NA 14.67556 13.76742 14.44549
200pmol 1 14.25302 14.39727 11.11126 15.57144 14.57129 15.33157
200pmol 2 14.29292 14.48740 NA 15.60578 14.59469 15.38867
200pmol 3 14.34340 14.33875 10.50303 15.62983 14.59826 15.38886
200pmol 4 14.28224 14.43524 NA 15.71383 14.68721 15.40141
50pmol 1 12.72244 13.57268 NA 14.00214 12.80775 13.84634
50pmol 2 12.71449 13.61215 NA 14.00533 12.92880 13.84713
50pmol 3 12.59173 13.54185 10.42381 14.02047 12.85969 13.78955
50pmol 4 12.55756 13.55333 NA 13.99104 12.88104 13.84928

Case 2. Keep the proteins specified by the user in this step and remove everything else.

Alternatively, if we would like to keep proteins like “ALBU_BOVIN” and “*HUMAN”, simply set removeList = FALSE.

filterOutIn(dataTran, listName = "ALBU_BOVIN", regexName = "HUMAN",
            removeList = FALSE)
R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
100pmol 1 10.59617 11.629505 11.461371 8.315348 8.952781 8.833937 7.736180 7.889538 16.76292 9.299651 10.357346 10.321521 8.132535 8.226486 10.173123 14.006782 7.486384 9.011536 10.26981 9.262305 9.952103 8.781496 7.047859 7.234610 11.80494 14.71384 6.775489 7.725502 10.335975 10.939236 7.568272 10.111329 9.938277 7.496910 7.637679 7.712738 NA NA NA
100pmol 2 10.71487 12.159989 10.052500 8.659134 9.121174 8.968803 7.530568 8.294768 16.76875 9.167541 10.121893 10.338709 8.086487 7.954448 10.011280 10.512493 7.735480 8.943620 10.21241 9.322413 10.073449 8.509870 7.008875 7.002919 11.73674 14.70683 7.279609 8.445472 10.058467 10.559522 7.265925 10.348343 8.602358 6.939530 9.553050 7.950604 10.471813 NA NA
100pmol 3 10.46639 12.190792 11.088689 8.103769 9.127531 NA 6.797573 8.969260 16.69347 9.260677 10.257840 9.885818 8.275007 8.080432 10.820332 14.395759 7.649245 8.418693 10.26143 9.107695 10.200628 8.317577 NA 6.888870 11.77277 14.67310 6.520759 8.318009 9.970591 10.390675 7.161124 10.401629 8.748640 7.176720 7.418964 7.793871 10.883458 9.706811 NA
100pmol 4 10.66221 11.902450 11.010415 8.588923 9.124371 8.721258 6.397005 8.386462 16.67235 9.792043 10.165829 10.092467 NA 7.841731 9.816296 14.485405 7.911680 8.707996 10.21790 8.776573 10.183221 8.457541 NA 7.107332 11.76563 14.66124 6.463718 7.766514 8.936074 10.749752 6.643218 9.952253 8.620437 7.071718 7.181633 7.757381 10.653061 9.892252 9.835011
200pmol 1 10.56298 12.047141 10.969287 8.402065 7.288615 8.509940 7.197741 NA 16.73721 8.941866 10.283605 10.077367 8.077610 7.977841 10.506136 14.365875 7.806321 8.293637 10.20088 8.510688 10.232523 8.590645 6.031341 7.239260 11.79502 14.68066 6.343792 7.535170 9.876348 11.069962 7.034874 8.573870 8.746924 7.903880 7.180345 8.730611 11.190966 NA 9.710589
200pmol 2 10.53186 11.771837 11.088757 NA 9.104358 NA 7.085164 8.208757 16.79052 9.387564 10.060966 NA 8.157292 8.019591 10.035834 14.652848 7.572173 8.822824 10.25298 9.307350 10.269781 8.693930 NA 7.085982 11.84818 14.75441 6.120292 7.968111 9.994960 9.503153 6.860543 9.776460 8.775531 7.059936 7.330260 7.277041 NA 10.041446 9.966576
200pmol 3 10.60347 8.477484 11.155251 8.568416 9.140283 8.954421 6.690756 8.888482 16.80274 9.196543 10.428428 10.222608 8.188220 7.525047 10.134101 NA 7.447663 8.755449 10.24829 8.802237 10.080459 8.482341 NA 7.011984 11.78574 14.72607 5.956155 7.094894 10.194229 10.710537 6.776144 9.874052 9.152012 8.302428 7.197231 7.632834 10.449137 9.709462 9.439995
200pmol 4 10.57897 8.454127 11.118493 8.682375 8.194285 9.028272 6.572711 8.315126 16.82923 9.074329 9.850693 10.123326 8.334982 6.609617 9.902441 NA 7.364369 NA 10.20847 9.042105 10.131013 8.419983 NA 6.922516 11.87336 14.74952 6.137395 7.638402 9.960103 10.728447 6.969417 9.256541 8.984393 7.675486 7.339503 8.556645 9.807216 NA NA
50pmol 1 10.53159 9.132855 7.569305 8.045720 8.271192 NA 9.228590 7.587860 16.84602 9.216503 9.812981 10.183775 8.187071 7.461197 9.276601 13.784136 7.253131 8.191030 10.21255 11.543379 10.008055 8.172313 9.799682 7.019571 11.79277 14.64773 9.726148 7.192825 NA 8.849818 6.440419 8.545470 5.642106 7.884416 4.387496 7.153265 NA NA NA
50pmol 2 10.53736 NA 10.513980 8.347621 8.456285 NA 7.992943 8.269956 16.74828 9.187083 9.054498 10.128672 8.165500 6.694638 9.730023 14.604574 7.135959 8.501088 10.29986 9.743997 NA 8.497645 7.484646 7.382746 11.78545 14.68558 7.699266 7.371963 8.051031 10.210618 6.553276 9.585343 8.934897 7.199104 6.466231 6.666879 NA NA NA
50pmol 3 10.52018 5.409885 10.414587 6.392210 8.211960 8.629371 8.010051 8.125402 16.68880 9.154545 NA 10.059451 8.200124 6.467420 9.528985 9.380464 NA NA 10.19961 9.839974 9.954989 8.320282 7.007159 6.915251 11.75573 14.66720 7.669944 7.050670 8.187233 9.865682 6.547034 9.095644 8.043669 8.290176 6.805857 6.417115 NA NA NA
50pmol 4 10.54837 9.856548 10.487397 8.106476 7.921629 8.733797 7.627267 8.578971 16.75298 NA 9.212217 10.346652 8.406582 4.913458 9.770142 13.939516 7.154078 8.187412 10.16831 9.792434 10.153070 NA 7.253251 NA 11.80386 14.66456 7.573424 7.782606 9.402638 9.961304 4.980612 9.031966 8.814051 7.359200 7.169527 7.079907 NA NA NA
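The selection rule behind both cases (the union of exact names and regular-expression matches) can be sketched in base R as follows. This is an illustration on a small invented table, not the implementation of filterOutIn(); the objects toy, selected, removedCase, and keptCase are hypothetical.

```r
# Small wide-format table in the same layout as dataTran (values shortened).
toy <- data.frame(
  R.Condition = c("100pmol", "200pmol"),
  R.Replicate = c(1, 1),
  NUD4B_HUMAN = c(10.60, 10.56),
  ALBU_BOVIN = c(16.76, 16.74),
  CYC_BOVIN = c(13.39, 14.25),
  ADH1_YEAST = c(14.75, 15.57)
)

listName <- "ALBU_BOVIN"
regexName <- "HUMAN"
idCols <- c("R.Condition", "R.Replicate")
proteinCols <- setdiff(colnames(toy), idCols)

# Union of exact name matches and regular-expression matches.
selected <- union(
  proteinCols[proteinCols %in% listName],
  grep(regexName, proteinCols, value = TRUE)
)

# Analogue of removeList = TRUE: drop the selected proteins.
removedCase <- toy[, setdiff(colnames(toy), selected)]

# Analogue of removeList = FALSE: keep only the selected proteins.
keptCase <- toy[, c(idCols, selected)]

print(colnames(removedCase))
print(colnames(keptCase))
```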

Extension

Besides protein names, the function filterProtein() provides a similar function to filter proteins by additional protein information.

  • For Spectronaut: “PG.Genes”, “PG.ProteinAccessions”, “PG.ProteinDescriptions”, and “PG.ProteinNames”.

  • For Scaffold: “ProteinDescriptions”, “AccessionNumber”, and “AlternateID”.

filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
              text = c("Putative zinc finger protein 840", "Bovine serum albumin"),
              by = "PG.ProteinDescriptions",
              removeList = FALSE)

where proteinInformation is the file name for protein information, automatically generated by preprocessing(). In this case, the proteins whose "PG.ProteinDescriptions" match with “Putative zinc finger protein 840” or “Bovine serum albumin” will be kept. Note that the search value text is used for exact equality search.

R.Condition R.Replicate ZN840_HUMAN ALBU_BOVIN
100pmol 1 8.315348 16.76292
100pmol 2 8.659134 16.76875
100pmol 3 8.103769 16.69347
100pmol 4 8.588923 16.67235
200pmol 1 8.402065 16.73721
200pmol 2 NA 16.79052
200pmol 3 8.568416 16.80274
200pmol 4 8.682375 16.82923
50pmol 1 8.045720 16.84602
50pmol 2 8.347621 16.74828
50pmol 3 6.392210 16.68880
50pmol 4 8.106476 16.75298
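The exact-equality search can be pictured with a small lookup table. The sketch below assumes, purely for illustration, that the protein information file has one row per protein with the columns shown; the real layout of preprocess_protein_information.csv may differ, and proteinInfo, hits, and partial are hypothetical names.

```r
# Assumed (simplified) layout of a protein information table.
proteinInfo <- data.frame(
  PG.ProteinNames = c("ZN840_HUMAN", "ALBU_BOVIN", "CYC_BOVIN"),
  PG.ProteinDescriptions = c("Putative zinc finger protein 840",
                             "Bovine serum albumin",
                             "Cytochrome c"),
  stringsAsFactors = FALSE
)

text <- c("Putative zinc finger protein 840", "Bovine serum albumin")

# Exact equality: a description must match one of the strings in full.
hits <- proteinInfo$PG.ProteinNames[proteinInfo$PG.ProteinDescriptions %in% text]
print(hits)

# A partial string finds nothing under exact matching.
partial <- proteinInfo$PG.ProteinNames[proteinInfo$PG.ProteinDescriptions %in% "albumin"]
print(length(partial))
```

Under exact matching, a search for "albumin" alone would not select ALBU_BOVIN, so the full description has to be supplied.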

Normalization

Normalization is designed to address systematic biases in the data. Biases can arise from inadvertent sample grouping during generation or preparation, from variations in instrument performance during acquisition, from analysis of different peptide amounts across experiments, or from other sources. These factors can artificially mask or enhance actual biological changes.

Many normalization methods have been developed for large datasets, each with its own strengths and weaknesses. The following factors should be considered when choosing a normalization method:

  1. Experiment-Specific Normalization:
    Most experiments run with UConn PMF are normalized by injection amount at the time of analysis to facilitate comparison. “Amount” is measured by UV absorbance at 280 nm, a standard method for generic protein quantification.

  2. Assumption of Non-Changing Species:
    Most biological experiments implicitly assume that the majority of measured species in an experiment will not change across conditions. This assumption is more robust if there are thousands of species, compared to only hundreds, or tens, so for experiments of very different complexities (e.g. a purified protein vs. an immunoprecipitation vs. a full lysate), normalization should not be applied as a global process, but instead only on subsets of experiments that are relatively similar to each other.

So far, this package provides three normalization methods for use:

  1. “quant”: Quantile (Bolstad et al. 2003) (values in each run are ranked, quantile bins are applied to the entire dataset, and values in each run adjusted to their closest bin value)

  2. “median”: Protein-wise Median (a scalar factor is applied to each protein entry to make the median of each sample equal to that of every other sample)

  3. “mean”: Protein-wise Mean (a scalar factor is applied to each protein entry to make the mean of each sample equal to that of every other sample)

Quantile normalization is generally recommended by UConn SCS.
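To make the methods concrete, here is a minimal base R sketch of quantile normalization and of one common form of median normalization (an additive shift on the log scale). These are textbook versions written for illustration, not the code inside normalize(): the function names quantileNormalize and medianCenter are hypothetical, the toy matrix has proteins in rows and runs in columns (the transpose of the tables shown here), missing values are not handled, and ties are broken by order of appearance rather than averaged.

```r
# Toy log2 matrix: rows = proteins, columns = runs, no missing values.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("prot", 1:4), paste0("run", 1:3)))

quantileNormalize <- function(x) {
  # Rank the values within each run.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across runs.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

medianCenter <- function(x) {
  # Shift each run so that all run medians equal the average run median.
  med <- apply(x, 2, median, na.rm = TRUE)
  sweep(x, 2, med - mean(med))
}

qn <- quantileNormalize(x)
mc <- medianCenter(x)
print(qn)
print(apply(mc, 2, median))
```

After quantile normalization every run contains the same set of values, which is why the most abundant protein (ALBU_BOVIN) shows an identical value in every row of the normalized output.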

dataNorm <- normalize(dataTran, normalizeType = "quant")
#> Warning: Removed 55 rows containing non-finite outside the scale range
#> (`stat_boxplot()`).

The warning “Removed 55 rows containing non-finite outside the scale range” indicates the presence of 55 NA (Not Available) values in the data. These NA values arise when a protein was not identified in a particular sample or condition. They are automatically excluded when generating the boxplot but retained in the actual dataset.
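The number of missing values can be counted directly, which is a quick way to confirm what such a warning reports. The sketch below uses a small invented table in the same wide layout; toy, values, totalNA, and naPerProtein are hypothetical names.

```r
# Invented wide-format table with a few missing values.
toy <- data.frame(
  R.Condition = c("100pmol", "100pmol", "50pmol"),
  R.Replicate = c(1, 2, 1),
  KRT16_MOUSE = c(10.81, NA, NA),
  NBDY_HUMAN = c(NA, NA, NA),
  ALBU_BOVIN = c(16.76, 16.77, 16.85)
)

# Restrict to the protein columns before counting.
values <- toy[, setdiff(colnames(toy), c("R.Condition", "R.Replicate"))]

totalNA <- sum(is.na(values))         # overall count of missing values
naPerProtein <- colSums(is.na(values)) # missing values per protein

print(totalNA)
print(naPerProtein)
```

Applying the same two lines to the normalized data gives the total that the boxplot warning refers to, along with a per-protein breakdown.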

R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN KRT16_MOUSE F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
100pmol 1 10.37045 11.406514 10.956950 8.392426 8.710518 8.610420 7.829510 8.023133 16.75777 12.96499 13.97388 10.51096 9.136271 10.231965 10.048461 8.179306 8.279169 9.874410 14.201118 7.001503 8.832972 9.978488 15.16303 13.62766 14.44005 8.964155 9.574185 8.517979 6.420716 6.764393 12.07953 14.76033 6.004586 7.670711 10.129049 10.681337 7.242036 9.727210 9.376507 7.109682 7.393910 7.530379 NA NA NA
100pmol 2 11.40651 12.964987 9.727210 8.517979 8.832972 8.710518 7.242036 8.023133 16.75777 13.62766 14.20112 NA 8.964155 10.048461 10.231965 7.829510 7.670711 9.574185 10.681337 7.393910 8.610420 10.129049 15.16303 13.97388 14.44005 9.136271 9.978488 8.279169 6.764393 6.420716 12.07953 14.76033 7.109682 8.179306 9.874410 10.956950 7.001503 10.370449 8.392426 6.004586 9.376507 7.530379 10.510962 NA NA
100pmol 3 10.32522 11.893804 10.851852 7.868171 8.887142 NA 6.429596 8.646475 16.75777 12.81909 13.94448 NA 9.027184 9.940284 9.504539 8.074082 7.698334 10.467264 14.178809 7.413486 8.433160 10.022816 15.15272 13.55168 14.42169 8.758911 9.812352 8.213284 NA 6.777937 11.29130 14.74334 6.004586 8.311138 9.649565 10.097660 7.009435 10.195272 8.550204 7.121143 7.262869 7.555404 10.625304 9.244669 NA
100pmol 4 10.51096 12.079525 10.956950 8.279169 8.964155 8.610420 6.004586 8.023133 16.75777 12.96499 13.97388 NA 9.136271 10.048461 9.978488 NA 7.670711 9.376507 14.440054 7.829510 8.517979 10.231965 15.16303 13.62766 14.20112 8.710518 10.129049 8.179306 NA 7.109682 11.40651 14.76033 6.420716 7.530379 8.832972 10.681337 6.764393 9.874410 8.392426 7.001503 7.242036 7.393910 10.370449 9.727210 9.574185
200pmol 1 10.27762 12.256403 10.413259 8.356482 7.375266 8.476493 7.098768 NA 16.75777 13.10393 14.00189 10.74088 9.255807 10.077232 9.798890 8.142561 7.974610 10.164570 13.700017 7.644405 8.253204 9.927121 15.17286 14.22236 14.77650 8.577167 10.010298 8.782531 6.004586 7.222195 11.51625 14.45755 6.412260 7.506546 9.641587 10.556023 6.751494 8.669939 9.040857 7.792691 6.993948 8.905805 11.057043 NA 9.504539
200pmol 2 10.53171 11.078642 10.729201 NA 8.732804 NA 7.026553 8.267212 16.75777 12.49496 13.38772 NA 9.012072 10.120238 NA 8.149770 7.964140 9.954832 14.130668 7.608925 8.620541 10.229206 15.13045 13.88102 14.70670 8.866513 10.377729 8.386314 NA 7.145874 11.59517 14.38206 6.004586 7.766206 9.827232 9.232358 6.448756 9.504539 8.520402 6.807164 7.455730 7.307825 NA 10.036651 9.658384
200pmol 3 11.05704 8.142561 12.256403 8.356482 8.905805 8.782531 6.412260 8.669939 16.75777 14.00189 13.70002 10.74088 9.255807 10.413259 10.164570 7.792691 7.506546 10.010298 NA 7.375266 8.476493 10.277619 15.17286 14.22236 14.77650 8.577167 9.927121 8.253204 NA 6.993948 13.10393 14.45755 6.004586 7.098768 10.077232 11.516245 6.751494 9.798890 9.040857 7.974610 7.222195 7.644405 10.556023 9.641587 9.504539
200pmol 4 10.72920 8.520402 11.595175 8.732804 7.964140 9.012072 6.448756 8.149770 16.75777 13.38772 13.88102 NA 9.504539 9.954832 10.229206 8.267212 6.807164 10.036651 NA 7.455730 NA 10.531713 15.13045 14.13067 14.70670 9.232358 10.377729 8.386314 NA 7.026553 12.49496 14.38206 6.004586 7.608925 10.120238 11.078642 7.145874 9.658384 8.866513 7.766206 7.307825 8.620541 9.827232 NA NA
50pmol 1 10.72920 9.232358 7.766206 8.267212 8.732804 NA 9.658384 7.964140 16.75777 12.49496 13.88102 NA 9.504539 10.120238 10.377729 8.520402 7.608925 9.827232 14.130668 7.455730 8.620541 10.531713 14.70670 13.38772 14.38206 11.078642 10.229206 8.386314 10.036651 7.026553 11.59517 15.13045 9.954832 7.307825 NA 9.012072 6.807164 8.866513 6.448756 8.149770 6.004586 7.145874 NA NA NA
50pmol 2 10.96831 NA 10.662903 8.659793 8.785723 NA 8.190682 8.555305 16.75777 12.30540 13.84672 NA 9.753718 9.581714 10.189464 8.429646 7.035806 10.008606 14.686886 7.159242 9.099265 10.482590 14.36063 13.25790 14.10465 10.086066 NA 8.926299 7.806291 7.637117 11.47571 15.11842 8.017625 7.480137 8.298269 10.329078 6.459113 9.911682 9.362666 7.332126 6.004586 6.822962 NA NA NA
50pmol 3 11.59517 6.004586 10.729201 6.448756 8.732804 9.232358 8.149770 8.386314 16.75777 13.38772 14.13067 11.07864 9.658384 NA 10.377729 8.620541 7.026553 9.954832 9.827232 NA NA 10.531713 14.70670 13.88102 14.38206 10.036651 10.229206 9.012072 7.608925 7.455730 12.49496 15.13045 7.964140 7.766206 8.520402 10.120238 7.145874 9.504539 8.267212 8.866513 7.307825 6.807164 NA NA NA
50pmol 4 10.96831 10.008606 10.662903 8.298269 8.190682 8.785723 7.806291 8.659793 16.75777 12.30540 13.84672 NA NA 9.362666 10.482590 8.555305 6.004586 9.753718 14.360635 7.035806 8.429646 10.329078 14.68689 13.25790 14.10465 9.911682 10.189464 NA 7.332126 NA 11.47571 15.11842 7.637117 8.017625 9.581714 10.086066 6.459113 9.099265 8.926299 7.480137 7.159242 6.822962 NA NA NA

Imputation

The two primary MS/MS acquisition types implemented in large-scale MS-based proteomics have distinct advantages and disadvantages. Traditional Data-Dependent Acquisition (DDA) methods favor specificity in MS/MS sampling over comprehensive proteome coverage. Small peptide isolation windows (<3 m/z) result in MS/MS spectra that ideally contain fragmentation data from only one peptide. This specificity promotes clear peptide identifications but comes at the expense of added scan time. In DDA experiments, the number of peptides that can be selected for MS/MS is limited by instrument scan speed, so selection is prioritized by highest peptide abundance. Low-abundance peptides are sampled less frequently for MS/MS, which can result in variable peptide coverage and many missing protein values across large sample datasets.

Data-Independent Acquisition (DIA) methods promote comprehensive peptide coverage over specificity by sampling many peptides for MS/MS simultaneously. Sequential and large mass isolation windows (4-50 m/z) are used to isolate large numbers of peptides at once for concurrent MS/MS. This produces complicated fragmentation spectra, but these spectra contain data on every observable peptide. A major disadvantage with this type of acquisition is that DIA MS/MS spectra are incredibly complex and difficult to deconvolve. Powerful and relatively new software programs like Spectronaut are capable of successfully parsing out which fragment ions came from each co-fragmented peptide using custom libraries, machine learning algorithms, and precisely determined retention times or measured ion mobility data. Because all observable ions are sampled for MS/MS, DIA reduces missingness substantially compared to DDA, though not entirely.

The function dataMissing() summarizes the missingness for each protein. Setting plot = TRUE draws the missingness plot, and show_labels = TRUE displays the protein names in the printed plot. Note that the plot is not generated by default, and that plot generation time varies with project size.

dataMissing <- dataMissing(dataNorm, plot = TRUE, show_labels = TRUE)

The percentage in the protein labels represents the proportion of missing data in the samples for that protein. For instance, the label “ZN840_HUMAN (8%)” indicates that, within all observations for the protein “ZN840_HUMAN”, 8% of the data is missing. Additionally, the percentage in the legend represents the proportion of missing data in the whole dataset. In this case, 10.2% of the data in dataNorm is missing.

Regardless of plot generation, the function dataMissing() always returns a table providing the following information:

  • count_miss: The count of missing values for each protein.

  • pct_miss_col: The percentage of missing values for each protein.

  • pct_miss_tot: The percentage of missing values for each protein relative to the total missing values in the entire dataset.

NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN KRT16_MOUSE F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
count_miss 0 1.000000 0 1.000000 0 4.000000 0 1.000000 0 0 0 8.00000 1.000000 1.000000 1.000000 1.000000 0 0 2.000000 1.000000 2.000000 0 0 0 0 0 1.000000 1.000000 5.000000 1.000000 0 0 0 0 1.000000 0 0 0 0 0 0 0 6.00000 8.00000 8.00000
pct_miss_col 0 8.333333 0 8.333333 0 33.333333 0 8.333333 0 0 0 66.66667 8.333333 8.333333 8.333333 8.333333 0 0 16.666667 8.333333 16.666667 0 0 0 0 0 8.333333 8.333333 41.666667 8.333333 0 0 0 0 8.333333 0 0 0 0 0 0 0 50.00000 66.66667 66.66667
pct_miss_tot 0 1.818182 0 1.818182 0 7.272727 0 1.818182 0 0 0 14.54545 1.818182 1.818182 1.818182 1.818182 0 0 3.636364 1.818182 3.636364 0 0 0 0 0 1.818182 1.818182 9.090909 1.818182 0 0 0 0 1.818182 0 0 0 0 0 0 0 10.90909 14.54545 14.54545

For example, the protein “ZN840_HUMAN” has 1 NA value across the 12 samples, which is 8.33% of the observations for “ZN840_HUMAN” and 1.82% of the total missing data in the entire dataset.
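The three rows of this table can be reproduced by hand with base R. The following is a minimal sketch on a made-up toy table; the object names (toy, missing_summary, pct_overall) are illustrative, and this is not the internal implementation of dataMissing().

```r
# Toy abundance table: 4 samples (rows) x 3 proteins (columns); values are made up.
toy <- data.frame(
  P1 = c(10.1, NA, 9.8, 10.3),
  P2 = c(8.2, 8.4, 8.1, 8.0),
  P3 = c(NA, NA, 7.5, NA)
)

count_miss   <- colSums(is.na(toy))                 # number of NAs per protein
pct_miss_col <- 100 * count_miss / nrow(toy)        # share of that protein's observations
pct_miss_tot <- 100 * count_miss / sum(count_miss)  # share of all NAs in the dataset

# Overall missingness, analogous to the percentage shown in the plot legend.
pct_overall <- 100 * sum(count_miss) / (nrow(toy) * ncol(toy))

missing_summary <- rbind(count_miss, pct_miss_col, pct_miss_tot)
print(round(missing_summary, 2))
```

Applied to the numbers reported above, the same arithmetic gives 1/12 = 8.33% and 1/55 = 1.82% for “ZN840_HUMAN”, and 55 missing values out of 12 × 45 = 540 entries, or about 10.2%, for the whole dataset.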

Various imputation methods have been developed to address the missing-value issue by assigning a reasonable estimate to each missing quantitative value. So far, this package provides 10 imputation methods:

  1. impute.min_local(): Replaces missing values with the lowest measured value for that protein in that condition.

  2. impute.min_global(): Replaces missing values with the lowest measured value from any protein found within the entire dataset.

  3. impute.knn(): Replaces missing values using the k-nearest neighbors algorithm (Troyanskaya et al. 2001).

  4. impute.knn_seq(): Replaces missing values using the sequential k-nearest neighbors algorithm (Kim, Kim, and Yi 2004).

  5. impute.knn_trunc(): Replaces missing values using the truncated k-nearest neighbors algorithm (Shah et al. 2017).

  6. impute.nuc_norm(): Replaces missing values using nuclear-norm regularization (Hastie et al. 2015).

  7. impute.mice_cart(): Replaces missing values using classification and regression trees (Breiman et al. 1984; Doove, van Buuren, and Dusseldorp 2014; van Buuren 2018).

  8. impute.mice_norm(): Replaces missing values using Bayesian linear regression (Rubin 1987; Schafer 1997; van Buuren and Groothuis-Oudshoorn 2011).

  9. impute.pca_bayes(): Replaces missing values using Bayesian principal components analysis (Oba et al. 2003).

  10. impute.pca_prob(): Replaces missing values using probabilistic principal components analysis (Stacklies et al. 2007).

Additional methods will be added later.
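To make the difference between the first two methods in the list concrete, here is a minimal base-R sketch on made-up data. The helpers fill_min_local() and fill_min_global() are hypothetical names used only for illustration; they are not the package's functions, and they ignore any threshold on how many values must be present.

```r
# Toy data: one condition column plus two protein columns (values are made up).
toy <- data.frame(
  R.Condition = c("A", "A", "B", "B"),
  P1 = c(5.0, NA, 9.0, 8.0),
  P2 = c(7.0, 6.5, NA, 7.2)
)
prot <- c("P1", "P2")

# Hypothetical helper: fill each NA with the lowest value measured for that
# protein within the same condition.
fill_min_local <- function(df, cols, group = "R.Condition") {
  for (p in cols) {
    local_min <- ave(df[[p]], df[[group]],
                     FUN = function(v) min(v, na.rm = TRUE))
    df[[p]] <- ifelse(is.na(df[[p]]), local_min, df[[p]])
  }
  df
}

# Hypothetical helper: fill each NA with the lowest value measured for any
# protein in the whole table.
fill_min_global <- function(df, cols) {
  global_min <- min(as.matrix(df[cols]), na.rm = TRUE)
  for (p in cols) df[[p]][is.na(df[[p]])] <- global_min
  df
}

local_filled  <- fill_min_local(toy, prot)
global_filled <- fill_min_global(toy, prot)
print(local_filled)
print(global_filled)
```

In this toy case the missing P2 value in condition B is filled with the lowest P2 value in condition B under the local approach, and with the lowest value in the entire table under the global approach.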

For example, to impute the NA values of dataNorm using impute.min_local(), set reqPercentPresent to 0.51. This is the proportion of values that must be present for a given protein within a given condition before its missing values are imputed.

Note: There is no rule in the field of proteomics for filtering based on percentage of missingness, just as there is no rule for the number of replicates required to draw a conclusion. However, reproducible observations make conclusions more credible. Setting reqPercentPresent to 0.51 requires that a protein be observed in a majority of the replicates within a condition before its missing values are imputed. For 3 replicates, this requires 2 measurements to allow imputation of the 3rd value; for the 4 replicates in this example, 3 measurements are required. If too few measurements are present (for instance, only 1), the missing values remain NA and the protein is filtered out in a subsequent step.

dataImput <- impute.min_local(dataNorm, reportImputing = FALSE,
                              reqPercentPresent = 0.51)
R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN TMC5B_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN KRT16_MOUSE F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TCPR2_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN PIP_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN A0A7I2PK40_HUMAN NBDY_HUMAN H0Y5R1_HUMAN
100pmol 1 10.37045 11.406514 10.956950 8.392426 8.710518 8.610420 7.829510 8.023133 16.75777 12.96499 13.97388 10.51096 9.136271 10.231965 10.048461 8.179306 8.279169 9.874410 14.201118 7.001503 8.832972 9.978488 15.16303 13.62766 14.44005 8.964155 9.574185 8.517979 6.420716 6.764393 12.07953 14.76033 6.004586 7.670711 10.129049 10.681337 7.242036 9.727210 9.376507 7.109682 7.393910 7.530379 10.370449 NA NA
100pmol 2 11.40651 12.964987 9.727210 8.517979 8.832972 8.710518 7.242036 8.023133 16.75777 13.62766 14.20112 NA 8.964155 10.048461 10.231965 7.829510 7.670711 9.574185 10.681337 7.393910 8.610420 10.129049 15.16303 13.97388 14.44005 9.136271 9.978488 8.279169 6.764393 6.420716 12.07953 14.76033 7.109682 8.179306 9.874410 10.956950 7.001503 10.370449 8.392426 6.004586 9.376507 7.530379 10.510962 NA NA
100pmol 3 10.32522 11.893804 10.851852 7.868171 8.887142 8.610420 6.429596 8.646475 16.75777 12.81909 13.94448 NA 9.027184 9.940284 9.504539 8.074082 7.698334 10.467264 14.178809 7.413486 8.433160 10.022816 15.15272 13.55168 14.42169 8.758911 9.812352 8.213284 NA 6.777937 11.29130 14.74334 6.004586 8.311138 9.649565 10.097660 7.009435 10.195272 8.550204 7.121143 7.262869 7.555404 10.625304 9.244669 NA
100pmol 4 10.51096 12.079525 10.956950 8.279169 8.964155 8.610420 6.004586 8.023133 16.75777 12.96499 13.97388 NA 9.136271 10.048461 9.978488 7.829510 7.670711 9.376507 14.440054 7.829510 8.517979 10.231965 15.16303 13.62766 14.20112 8.710518 10.129049 8.179306 NA 7.109682 11.40651 14.76033 6.420716 7.530379 8.832972 10.681337 6.764393 9.874410 8.392426 7.001503 7.242036 7.393910 10.370449 9.727210 9.574185
200pmol 1 10.27762 12.256403 10.413259 8.356482 7.375266 8.476493 7.098768 8.149770 16.75777 13.10393 14.00189 10.74088 9.255807 10.077232 9.798890 8.142561 7.974610 10.164570 13.700017 7.644405 8.253204 9.927121 15.17286 14.22236 14.77650 8.577167 10.010298 8.782531 6.004586 7.222195 11.51625 14.45755 6.412260 7.506546 9.641587 10.556023 6.751494 8.669939 9.040857 7.792691 6.993948 8.905805 11.057043 NA 9.504539
200pmol 2 10.53171 11.078642 10.729201 8.356482 8.732804 8.476493 7.026553 8.267212 16.75777 12.49496 13.38772 NA 9.012072 10.120238 9.798890 8.149770 7.964140 9.954832 14.130668 7.608925 8.620541 10.229206 15.13045 13.88102 14.70670 8.866513 10.377729 8.386314 NA 7.145874 11.59517 14.38206 6.004586 7.766206 9.827232 9.232358 6.448756 9.504539 8.520402 6.807164 7.455730 7.307825 9.827232 10.036651 9.658384
200pmol 3 11.05704 8.142561 12.256403 8.356482 8.905805 8.782531 6.412260 8.669939 16.75777 14.00189 13.70002 10.74088 9.255807 10.413259 10.164570 7.792691 7.506546 10.010298 NA 7.375266 8.476493 10.277619 15.17286 14.22236 14.77650 8.577167 9.927121 8.253204 NA 6.993948 13.10393 14.45755 6.004586 7.098768 10.077232 11.516245 6.751494 9.798890 9.040857 7.974610 7.222195 7.644405 10.556023 9.641587 9.504539
200pmol 4 10.72920 8.520402 11.595175 8.732804 7.964140 9.012072 6.448756 8.149770 16.75777 13.38772 13.88102 NA 9.504539 9.954832 10.229206 8.267212 6.807164 10.036651 NA 7.455730 8.253204 10.531713 15.13045 14.13067 14.70670 9.232358 10.377729 8.386314 NA 7.026553 12.49496 14.38206 6.004586 7.608925 10.120238 11.078642 7.145874 9.658384 8.866513 7.766206 7.307825 8.620541 9.827232 NA 9.504539
50pmol 1 10.72920 9.232358 7.766206 8.267212 8.732804 NA 9.658384 7.964140 16.75777 12.49496 13.88102 NA 9.504539 10.120238 10.377729 8.520402 7.608925 9.827232 14.130668 7.455730 8.620541 10.531713 14.70670 13.38772 14.38206 11.078642 10.229206 8.386314 10.036651 7.026553 11.59517 15.13045 9.954832 7.307825 8.298269 9.012072 6.807164 8.866513 6.448756 8.149770 6.004586 7.145874 NA NA NA
50pmol 2 10.96831 6.004586 10.662903 8.659793 8.785723 NA 8.190682 8.555305 16.75777 12.30540 13.84672 NA 9.753718 9.581714 10.189464 8.429646 7.035806 10.008606 14.686886 7.159242 9.099265 10.482590 14.36063 13.25790 14.10465 10.086066 10.189464 8.926299 7.806291 7.637117 11.47571 15.11842 8.017625 7.480137 8.298269 10.329078 6.459113 9.911682 9.362666 7.332126 6.004586 6.822962 NA NA NA
50pmol 3 11.59517 6.004586 10.729201 6.448756 8.732804 9.232358 8.149770 8.386314 16.75777 13.38772 14.13067 11.07864 9.658384 9.362666 10.377729 8.620541 7.026553 9.954832 9.827232 7.035806 8.429646 10.531713 14.70670 13.88102 14.38206 10.036651 10.229206 9.012072 7.608925 7.455730 12.49496 15.13045 7.964140 7.766206 8.520402 10.120238 7.145874 9.504539 8.267212 8.866513 7.307825 6.807164 NA NA NA
50pmol 4 10.96831 10.008606 10.662903 8.298269 8.190682 8.785723 7.806291 8.659793 16.75777 12.30540 13.84672 NA 9.504539 9.362666 10.482590 8.555305 6.004586 9.753718 14.360635 7.035806 8.429646 10.329078 14.68689 13.25790 14.10465 9.911682 10.189464 8.386314 7.332126 7.026553 11.47571 15.11842 7.637117 8.017625 9.581714 10.086066 6.459113 9.099265 8.926299 7.480137 7.159242 6.822962 NA NA NA

If reportImputing = TRUE, the function instead returns a list that also contains a shadow data frame of imputation labels, where 1 marks an entry that has been imputed and 0 marks any other entry.
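The threshold rule and the 0/1 labels can be sketched for a single protein within a single condition. This is an illustration under stated assumptions, not the package's code: impute_if_present() is a hypothetical helper, and it assumes the rule is "proportion of observed values >= reqPercentPresent".

```r
# Hypothetical helper: impute with the local minimum only when enough
# replicates were observed, and return 0/1 labels marking imputed entries.
impute_if_present <- function(x, reqPercentPresent = 0.51) {
  observed <- !is.na(x)
  enough   <- mean(observed) >= reqPercentPresent  # assumed form of the rule
  shadow   <- integer(length(x))                   # 0 = measured, or left as NA
  if (enough && any(!observed)) {
    x[!observed]      <- min(x[observed])
    shadow[!observed] <- 1L                        # 1 = imputed
  }
  list(values = x, shadow = shadow)
}

# 50pmol values copied from the dataNorm rows shown earlier.
three_of_four <- impute_if_present(c(9.232358, NA, 6.004586, 10.008606))  # A0A7P0T808_HUMAN
two_of_four   <- impute_if_present(c(NA, NA, 9.232358, 8.785723))         # TMC5B_HUMAN

print(three_of_four)
print(two_of_four)
```

With four replicates, 3/4 = 0.75 passes the 0.51 threshold while 2/4 = 0.50 does not, which is consistent with the imputed table above: the missing A0A7P0T808_HUMAN value in 50pmol is filled, while the TMC5B_HUMAN values in 50pmol remain NA.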

After the above imputation, any entries that did not pass the percent present threshold will still have NA values and will need to be filtered out.

dataImput <- filterNA(dataImput, saveRm = TRUE)

where saveRm = TRUE indicates that the filtered data will be saved as a .csv file named filtered_NA_data.csv in the current working directory.
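Conceptually, this step keeps only the proteins that have no remaining NA values. A minimal base-R sketch on made-up data follows; the object names are hypothetical, the .csv export is left out, and this is not the actual implementation of filterNA().

```r
# Toy imputed table: P2 still contains an NA after imputation (values are made up).
toy <- data.frame(
  R.Condition = c("A", "A", "B", "B"),
  R.Replicate = c(1, 2, 1, 2),
  P1 = c(5.0, 5.2, 9.0, 8.0),
  P2 = c(7.0, NA, 6.8, 7.2),
  P3 = c(6.1, 6.3, 6.0, 6.2)
)

has_na  <- colSums(is.na(toy)) > 0        # TRUE for columns with any NA
kept    <- toy[, !has_na, drop = FALSE]   # sample annotations plus complete proteins
removed <- toy[, has_na, drop = FALSE]    # proteins that would be dropped

print(names(kept))
print(names(removed))
```

In the output that follows, the same idea reduces the 45 proteins of dataNorm to 38: the seven proteins that still contain NA values after imputation no longer appear.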

The dataImput is as follows:

R.Condition R.Replicate NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN
100pmol 1 10.37045 11.406514 10.956950 8.392426 8.710518 7.829510 8.023133 16.75777 12.96499 13.97388 9.136271 10.231965 10.048461 8.179306 8.279169 9.874410 7.001503 8.832972 9.978488 15.16303 13.62766 14.44005 8.964155 9.574185 8.517979 6.764393 12.07953 14.76033 6.004586 7.670711 10.129049 10.681337 7.242036 9.727210 9.376507 7.109682 7.393910 7.530379
100pmol 2 11.40651 12.964987 9.727210 8.517979 8.832972 7.242036 8.023133 16.75777 13.62766 14.20112 8.964155 10.048461 10.231965 7.829510 7.670711 9.574185 7.393910 8.610420 10.129049 15.16303 13.97388 14.44005 9.136271 9.978488 8.279169 6.420716 12.07953 14.76033 7.109682 8.179306 9.874410 10.956950 7.001503 10.370449 8.392426 6.004586 9.376507 7.530379
100pmol 3 10.32522 11.893804 10.851852 7.868171 8.887142 6.429596 8.646475 16.75777 12.81909 13.94448 9.027184 9.940284 9.504539 8.074082 7.698334 10.467264 7.413486 8.433160 10.022816 15.15272 13.55168 14.42169 8.758911 9.812352 8.213284 6.777937 11.29130 14.74334 6.004586 8.311138 9.649565 10.097660 7.009435 10.195272 8.550204 7.121143 7.262869 7.555404
100pmol 4 10.51096 12.079525 10.956950 8.279169 8.964155 6.004586 8.023133 16.75777 12.96499 13.97388 9.136271 10.048461 9.978488 7.829510 7.670711 9.376507 7.829510 8.517979 10.231965 15.16303 13.62766 14.20112 8.710518 10.129049 8.179306 7.109682 11.40651 14.76033 6.420716 7.530379 8.832972 10.681337 6.764393 9.874410 8.392426 7.001503 7.242036 7.393910
200pmol 1 10.27762 12.256403 10.413259 8.356482 7.375266 7.098768 8.149770 16.75777 13.10393 14.00189 9.255807 10.077232 9.798890 8.142561 7.974610 10.164570 7.644405 8.253204 9.927121 15.17286 14.22236 14.77650 8.577167 10.010298 8.782531 7.222195 11.51625 14.45755 6.412260 7.506546 9.641587 10.556023 6.751494 8.669939 9.040857 7.792691 6.993948 8.905805
200pmol 2 10.53171 11.078642 10.729201 8.356482 8.732804 7.026553 8.267212 16.75777 12.49496 13.38772 9.012072 10.120238 9.798890 8.149770 7.964140 9.954832 7.608925 8.620541 10.229206 15.13045 13.88102 14.70670 8.866513 10.377729 8.386314 7.145874 11.59517 14.38206 6.004586 7.766206 9.827232 9.232358 6.448756 9.504539 8.520402 6.807164 7.455730 7.307825
200pmol 3 11.05704 8.142561 12.256403 8.356482 8.905805 6.412260 8.669939 16.75777 14.00189 13.70002 9.255807 10.413259 10.164570 7.792691 7.506546 10.010298 7.375266 8.476493 10.277619 15.17286 14.22236 14.77650 8.577167 9.927121 8.253204 6.993948 13.10393 14.45755 6.004586 7.098768 10.077232 11.516245 6.751494 9.798890 9.040857 7.974610 7.222195 7.644405
200pmol 4 10.72920 8.520402 11.595175 8.732804 7.964140 6.448756 8.149770 16.75777 13.38772 13.88102 9.504539 9.954832 10.229206 8.267212 6.807164 10.036651 7.455730 8.253204 10.531713 15.13045 14.13067 14.70670 9.232358 10.377729 8.386314 7.026553 12.49496 14.38206 6.004586 7.608925 10.120238 11.078642 7.145874 9.658384 8.866513 7.766206 7.307825 8.620541
50pmol 1 10.72920 9.232358 7.766206 8.267212 8.732804 9.658384 7.964140 16.75777 12.49496 13.88102 9.504539 10.120238 10.377729 8.520402 7.608925 9.827232 7.455730 8.620541 10.531713 14.70670 13.38772 14.38206 11.078642 10.229206 8.386314 7.026553 11.59517 15.13045 9.954832 7.307825 8.298269 9.012072 6.807164 8.866513 6.448756 8.149770 6.004586 7.145874
50pmol 2 10.96831 6.004586 10.662903 8.659793 8.785723 8.190682 8.555305 16.75777 12.30540 13.84672 9.753718 9.581714 10.189464 8.429646 7.035806 10.008606 7.159242 9.099265 10.482590 14.36063 13.25790 14.10465 10.086066 10.189464 8.926299 7.637117 11.47571 15.11842 8.017625 7.480137 8.298269 10.329078 6.459113 9.911682 9.362666 7.332126 6.004586 6.822962
50pmol 3 11.59517 6.004586 10.729201 6.448756 8.732804 8.149770 8.386314 16.75777 13.38772 14.13067 9.658384 9.362666 10.377729 8.620541 7.026553 9.954832 7.035806 8.429646 10.531713 14.70670 13.88102 14.38206 10.036651 10.229206 9.012072 7.455730 12.49496 15.13045 7.964140 7.766206 8.520402 10.120238 7.145874 9.504539 8.267212 8.866513 7.307825 6.807164
50pmol 4 10.96831 10.008606 10.662903 8.298269 8.190682 7.806291 8.659793 16.75777 12.30540 13.84672 9.504539 9.362666 10.482590 8.555305 6.004586 9.753718 7.035806 8.429646 10.329078 14.68689 13.25790 14.10465 9.911682 10.189464 8.386314 7.026553 11.47571 15.11842 7.637117 8.017625 9.581714 10.086066 6.459113 9.099265 8.926299 7.480137 7.159242 6.822962

Summarization

Summarization provides a table of descriptive statistics for each protein in the final dataset. For each condition, it reports the number of observations together with summary statistics of the processed abundances: mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skewness, kurtosis, and standard error.

dataSumm <- summarize(dataImput, saveSumm = TRUE)
Condition Stat NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN
100pmol n 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.00000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000
100pmol mean 10.6532858 12.0862074 10.6232407 8.2644365 8.8486965 6.8764321 8.1789684 16.75777 13.0941802 14.0233390 9.0659703 10.0672927 9.9408634 7.9781022 7.8297313 9.8230915 7.4096023 8.5986327 10.0905795 15.1604531 13.6952175 14.3757288 8.8924636 9.8735187 8.2974346 6.7681823 11.7142153 14.7560796 6.3848929 7.9228836 9.6214990 10.6043210 7.0043419 10.0418355 8.6778910 6.8092287 7.8188307 7.5025180
100pmol sd 0.5083417 0.6509735 0.5994044 0.2816079 0.1066916 0.8168644 0.3116711 0.00000 0.3622393 0.1193272 0.0851570 0.1210474 0.3098987 0.1768750 0.2999079 0.4757395 0.3381959 0.1721820 0.1134697 0.0051583 0.1891971 0.1167288 0.1962332 0.2378070 0.1527627 0.2813447 0.4244383 0.0084914 0.5214944 0.3804140 0.5609915 0.3619004 0.1950282 0.2935700 0.4716454 0.5391294 1.0406244 0.0733600
100pmol median 10.4407055 11.9866645 10.9044011 8.3357977 8.8600566 6.8358159 8.0231329 16.75777 12.9649865 13.9738814 9.0817277 10.0484610 10.0134747 7.9517960 7.6845225 9.7242975 7.4036982 8.5641998 10.0759323 15.1630323 13.6276559 14.4308715 8.8615327 9.8954204 8.2462263 6.7711653 11.7430198 14.7603253 6.2126514 7.9250089 9.7619876 10.6813371 7.0054690 10.0348410 8.4713152 7.0555925 7.3283898 7.5303791
100pmol trimmed 10.6532858 12.0862074 10.6232407 8.2644365 8.8486965 6.8764321 8.1789684 16.75777 13.0941802 14.0233390 9.0659703 10.0672927 9.9408634 7.9781022 7.8297313 9.8230915 7.4096023 8.5986327 10.0905795 15.1604531 13.6952175 14.3757288 8.8924636 9.8735187 8.2974346 6.7681823 11.7142153 14.7560796 6.3848929 7.9228836 9.6214990 10.6043210 7.0043419 10.0418355 8.6778910 6.8092287 7.8188307 7.5025180
100pmol mad 0.1376914 0.4989033 0.0779088 0.1770303 0.0972461 0.9173216 0.0000000 0.00000 0.1081516 0.0217987 0.0808661 0.0801917 0.1879022 0.1813007 0.0204763 0.3690956 0.3054035 0.1314032 0.1116108 0.0000000 0.0563232 0.0136146 0.1880212 0.2347672 0.0740280 0.2559629 0.4989033 0.0000000 0.3084771 0.4747480 0.3554415 0.2043118 0.1783076 0.3469740 0.1169605 0.0886895 0.1125840 0.0185508
100pmol min 10.3252183 11.4065141 9.7272105 7.8681709 8.7105178 6.0045864 8.0231329 16.75777 12.8190919 13.9444754 8.9641549 9.9402840 9.5045393 7.8295103 7.6707114 9.3765069 7.0015026 8.4331596 9.9784883 15.1527157 13.5516770 14.2011179 8.7105178 9.5741849 8.1793065 6.4207163 11.2912962 14.7433424 6.0045864 7.5303791 8.8329716 10.0976598 6.7643932 9.7272105 8.3924265 6.0045864 7.2420363 7.3939101
100pmol max 11.4065141 12.9649865 10.9569499 8.5179795 8.9641549 7.8295103 8.6464751 16.75777 13.6276559 14.2011179 9.1362711 10.2319649 10.2319649 8.1793065 8.2791689 10.4672641 7.8295103 8.8329716 10.2319649 15.1630323 13.9738814 14.4400544 9.1362711 10.1290492 8.5179795 7.1096825 12.0795254 14.7603253 7.1096825 8.3111376 10.1290492 10.9569499 7.2420363 10.3704495 9.3765069 7.1211431 9.3765069 7.5554037
100pmol range 1.0812958 1.5584724 1.2297394 0.6498087 0.2536371 1.8249239 0.6233422 0.00000 0.8085640 0.2566425 0.1721162 0.2916809 0.7274256 0.3497962 0.6084576 1.0907573 0.8280077 0.3998121 0.2534766 0.0103166 0.4222044 0.2389365 0.4257533 0.5548643 0.3386730 0.6889662 0.7882292 0.0169829 1.1050961 0.7807586 1.2960775 0.8592901 0.4776431 0.6432390 0.9840804 1.1165567 2.1344706 0.1614936
100pmol skew 0.6975575 0.3239948 -0.7349674 -0.4906058 -0.2153643 0.0746172 0.7500000 NaN 0.6663690 0.7189584 -0.1695960 0.3387430 -0.4796405 0.1114972 0.7458050 0.3783514 0.0392202 0.3827303 0.1991275 -0.7500000 0.6668586 -0.7378644 0.2135796 -0.1711657 0.5944647 -0.0238336 -0.0237842 -0.7500000 0.4773224 -0.0050802 -0.4861771 -0.4499095 -0.0129955 0.0322210 0.6966507 -0.7281063 0.7407580 -0.6901803
100pmol kurtosis -1.7260359 -1.8707238 -1.6982833 -1.8448761 -1.9494341 -2.1798795 -1.6875000 NaN -1.7327385 -1.7058522 -2.2461322 -1.8404530 -1.8192490 -2.3106519 -1.6904097 -1.9467364 -1.8757984 -1.9169430 -2.1242147 -1.6875000 -1.7325055 -1.6961436 -2.1614305 -2.0285590 -1.8014680 -1.8757000 -2.4101207 -1.6875000 -1.9257747 -2.3455313 -1.8463172 -1.8132208 -1.8755701 -2.2325877 -1.7280983 -1.7033814 -1.6940017 -1.7210573
100pmol se 0.2541708 0.3254868 0.2997022 0.1408039 0.0533458 0.4084322 0.1558356 0.00000 0.1811196 0.0596636 0.0425785 0.0605237 0.1549494 0.0884375 0.1499539 0.2378698 0.1690980 0.0860910 0.0567349 0.0025791 0.0945985 0.0583644 0.0981166 0.1189035 0.0763814 0.1406723 0.2122191 0.0042457 0.2607472 0.1902070 0.2804957 0.1809502 0.0975141 0.1467850 0.2358227 0.2695647 0.5203122 0.0366800
200pmol n 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.00000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000
200pmol mean 10.6488941 9.9995022 11.2485095 8.4505628 8.2445037 6.7465842 8.3091726 16.75777 13.2471248 13.7426617 9.2570565 10.1413903 9.9978891 8.0880583 7.5631150 10.0415877 7.5210813 8.4008606 10.2414149 15.1516556 14.1141044 14.7415973 8.8133014 10.1732192 8.4520910 7.0971425 12.1775775 14.4198014 6.1065048 7.4951113 9.9165721 10.5958169 6.7744045 9.4079382 8.8671576 7.5851679 7.2449244 8.1196440
200pmol sd 0.3289432 1.9911564 0.8373275 0.1881608 0.7094445 0.3664652 0.2468009 0.00000 0.6260180 0.2671713 0.2010541 0.1943193 0.2312942 0.2050339 0.5491845 0.0887966 0.1270932 0.1803556 0.2480010 0.0244819 0.1612898 0.0403015 0.3108902 0.2385763 0.2290557 0.1059008 0.7605739 0.0435834 0.2038367 0.2849943 0.2242693 0.9901032 0.2858241 0.5064716 0.2453451 0.5268763 0.1931332 0.7646020
200pmol median 10.6304571 9.7995222 11.1621879 8.3564824 8.3484718 6.7376547 8.2084906 16.75777 13.2458280 13.7905186 9.2558074 10.0987350 9.9817300 8.1461655 7.7353428 10.0234743 7.5323272 8.3648486 10.2534127 15.1516556 14.1765162 14.7415973 8.7218404 10.1940134 8.3863145 7.0862137 12.0450654 14.4198014 6.0045864 7.5577355 9.9522318 10.8173322 6.7514941 9.5814615 8.9536853 7.7794486 7.2650103 8.1324727
200pmol trimmed 10.6488941 9.9995022 11.2485095 8.4505628 8.2445037 6.7465842 8.3091726 16.75777 13.2471248 13.7426617 9.2570565 10.1413903 9.9978891 8.0880583 7.5631150 10.0415877 7.5210813 8.4008606 10.2414149 15.1516556 14.1141044 14.7415973 8.8133014 10.1732192 8.4520910 7.0971425 12.1775775 14.4198014 6.1065048 7.4951113 9.9165721 10.5958169 6.7744045 9.4079382 8.8671576 7.5851679 7.2449244 8.1196440
200pmol mad 0.3347575 2.1765166 0.8761545 0.0000000 0.6980566 0.4553758 0.0870596 0.00000 0.6618078 0.2237764 0.1806811 0.1226153 0.2710782 0.0924032 0.3469760 0.0606518 0.1398648 0.1655239 0.2242486 0.0314340 0.0679752 0.0517460 0.2144922 0.2723765 0.0986746 0.1126230 0.7255181 0.0559598 0.0000000 0.1924863 0.2172056 0.7118132 0.2244197 0.2182024 0.1292411 0.1544898 0.1731190 0.9350762
200pmol min 10.2776193 8.1425613 10.4132587 8.3564824 7.3752661 6.4122599 8.1497697 16.75777 12.4949559 13.3877224 9.0120719 9.9548324 9.7988903 7.7926907 6.8071641 9.9548324 7.3752661 8.2532043 9.9271211 15.1304537 13.8810204 14.7066952 8.5771674 9.9271211 8.2532043 6.9939476 11.5162454 14.3820570 6.0045864 7.0987676 9.6415866 9.2323577 6.4487561 8.6699394 8.5204024 6.8071641 6.9939476 7.3078254
200pmol max 11.0570429 12.2564034 12.2564034 8.7328039 8.9058051 7.0987676 8.6699394 16.75777 14.0018871 14.0018871 9.5045393 10.4132587 10.2292060 8.2672115 7.9746103 10.1645698 7.6444045 8.6205408 10.5317133 15.1728576 14.2223649 14.7764995 9.2323577 10.3777288 8.7825308 7.2221951 13.1039337 14.4575457 6.4122599 7.7662064 10.1202381 11.5162454 7.1458739 9.7988903 9.0408573 7.9746103 7.4557295 8.9058051
200pmol range 0.7794236 4.1138421 1.8431447 0.3763215 1.5305390 0.6865077 0.5201698 0.00000 1.5069312 0.6141647 0.4924674 0.4584263 0.4303157 0.4745208 1.1674462 0.2097374 0.2691384 0.3673366 0.6045922 0.0424039 0.3413445 0.0698043 0.6551903 0.4506077 0.5293266 0.2282475 1.5876883 0.0754887 0.4076734 0.6674389 0.4786515 2.2838877 0.6971178 1.1289508 0.5204548 1.1674462 0.4617819 1.5979797
200pmol skew 0.1104304 0.0985966 0.1459698 0.7500000 -0.1912656 0.0093515 0.6449274 NaN 0.0043414 -0.3241272 0.0139784 0.4541046 0.0251965 -0.5691072 -0.4497921 0.4200330 -0.1005912 0.1958428 -0.1071276 0.0000000 -0.6030234 0.0000000 0.3944221 -0.0390722 0.5750774 0.1306423 0.1932636 0.0000000 0.7500000 -0.4413140 -0.1975122 -0.4230438 0.1788057 -0.6265175 -0.5283584 -0.6785550 -0.2152922 -0.0172273
200pmol kurtosis -1.9967978 -2.3005634 -2.1833090 -1.6875000 -2.1856515 -2.4193545 -1.7713239 NaN -1.9494170 -1.9815481 -1.8749421 -1.8267257 -2.4085003 -1.7742906 -1.9525773 -1.8484890 -2.2706637 -2.2173611 -1.8856612 -2.4375000 -1.8081396 -2.4375000 -2.0080976 -2.3926118 -1.7717059 -2.2279404 -2.2134019 -2.4375000 -1.6875000 -1.8538031 -2.1927634 -1.9000912 -1.8654864 -1.7689149 -1.8768601 -1.7272841 -1.9293663 -2.3195181
200pmol se 0.1644716 0.9955782 0.4186638 0.0940804 0.3547223 0.1832326 0.1234004 0.00000 0.3130090 0.1335856 0.1005271 0.0971597 0.1156471 0.1025170 0.2745923 0.0443983 0.0635466 0.0901778 0.1240005 0.0122409 0.0806449 0.0201508 0.1554451 0.1192882 0.1145278 0.0529504 0.3802869 0.0217917 0.1019184 0.1424971 0.1121346 0.4950516 0.1429120 0.2532358 0.1226726 0.2634382 0.0965666 0.3823010
50pmol n 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.00000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000 4.0000000
50pmol mean 11.0652499 7.8125342 9.9553033 7.9185075 8.6105032 8.4512816 8.3913880 16.75777 12.6233713 13.9262822 9.6052951 9.6068208 10.3568778 8.5314736 6.9189678 9.8860972 7.1716461 8.6447744 10.4687736 14.6152277 13.4461368 14.2433512 10.2782600 10.2093348 8.6777501 7.2864883 11.7603897 15.1244369 8.3934284 7.6429483 8.6746639 9.8868633 6.7178161 9.3454998 8.2512331 7.9571364 6.6190600 6.8997407
50pmol sd 0.3708294 2.1115541 1.4597325 0.9959061 0.2809902 0.8229885 0.3063097 0.00000 0.5173426 0.1372130 0.1226801 0.3575152 0.1220664 0.0795662 0.6676726 0.1165080 0.1981263 0.3160745 0.0959664 0.1699854 0.2963109 0.1601637 0.5386107 0.0229454 0.3383377 0.3091468 0.4929381 0.0069476 1.0544361 0.3132572 0.6136998 0.5930033 0.3291763 0.4604039 1.2832479 0.7029085 0.7121212 0.1642577
50pmol median 10.9683118 7.6184720 10.6629030 8.2827405 8.7328039 8.1702259 8.4708099 16.75777 12.4001796 13.8638704 9.5814615 9.4721898 10.3777288 8.5378538 7.0311799 9.8910321 7.0975242 8.5250934 10.5071517 14.6967905 13.3228123 14.2433512 10.0613581 10.2093348 8.6563070 7.2411414 11.5354444 15.1244369 7.9908823 7.6231715 8.4093360 10.1031519 6.6331386 9.3019021 8.5967555 7.8149531 6.5819142 6.8229624
50pmol trimmed 11.0652499 7.8125342 9.9553033 7.9185075 8.6105032 8.4512816 8.3913880 16.75777 12.6233713 13.9262822 9.6052951 9.6068208 10.3568778 8.5314736 6.9189678 9.8860972 7.1716461 8.6447744 10.4687736 14.6152277 13.4461368 14.2433512 10.2782600 10.2093348 8.6777501 7.2864883 11.7603897 15.1244369 8.3934284 7.6429483 8.6746639 9.8868633 6.7178161 9.3454998 8.2512331 7.9571364 6.6190600 6.8997407
50pmol mad 0.1772529 2.3927468 0.0491467 0.2910204 0.0392287 0.2849493 0.2027294 0.00000 0.1405153 0.0254266 0.1140448 0.1623808 0.0777336 0.0742326 0.4317120 0.1344530 0.0915028 0.1415103 0.0364151 0.0146847 0.0962357 0.2056452 0.1292707 0.0294612 0.4002908 0.3181482 0.0885564 0.0089205 0.2820706 0.3397979 0.1646671 0.1801448 0.2580102 0.4729686 0.8120600 0.6061193 0.8559462 0.0117113
50pmol min 10.7292010 6.0045864 7.7662064 6.4487561 8.1906821 7.8062907 7.9641396 16.75777 12.3054034 13.8467203 9.5045393 9.3626655 10.1894635 8.4296461 6.0045864 9.7537182 7.0358064 8.4296461 10.3290776 14.3606346 13.2579022 14.1046454 9.9116818 10.1894635 8.3863145 7.0265534 11.4757140 15.1184202 7.6371168 7.3078254 8.2982695 9.0120719 6.4591131 8.8665134 6.4487561 7.3321259 6.0045864 6.8071641
50pmol max 11.5951749 10.0086064 10.7292010 8.6597927 8.7857227 9.6583837 8.6597927 16.75777 13.3877224 14.1306676 9.7537182 10.1202381 10.4825900 8.6205408 7.6089249 10.0086064 7.4557295 9.0992648 10.5317133 14.7066952 13.8810204 14.3820570 11.0786419 10.2292060 9.0120719 7.6371168 12.4949559 15.1304537 9.9548324 8.0176249 9.5817142 10.3290776 7.1458739 9.9116818 9.3626655 8.8665134 7.3078254 7.1458739
50pmol range 0.8659739 4.0040200 2.9629945 2.2110366 0.5950406 1.8520929 0.6956531 0.00000 1.0823190 0.2839473 0.2491788 0.7575726 0.2931265 0.1908947 1.6043385 0.2548883 0.4199231 0.6696188 0.2026357 0.3460606 0.6231181 0.2774116 1.1669601 0.0397425 0.6257574 0.6105634 1.0192419 0.0120335 2.3177155 0.7097995 1.2834446 1.3170057 0.6867608 1.0451685 2.9139094 1.5343875 1.3032390 0.3387098
50pmol skew 0.5345800 0.0433923 -0.7489636 -0.6758677 -0.7319474 0.6482699 -0.4719094 NaN 0.6863654 0.7197869 0.1859854 0.5810642 -0.3693859 -0.1673018 -0.3638798 -0.0569700 0.5757638 0.5855057 -0.6293395 -0.7433054 0.6602470 0.0000000 0.7076234 0.0000000 0.0207596 0.1085464 0.7215694 0.0000000 0.6913135 0.1018067 0.6878480 -0.6741805 0.3120345 0.1554845 -0.4939928 0.2926118 0.0140868 0.7453289
50pmol kurtosis -1.7865070 -2.3876770 -1.6881769 -1.7283357 -1.6985019 -1.7416477 -1.8858332 NaN -1.7364638 -1.7097999 -2.2281384 -1.8279658 -1.8337864 -1.9347058 -1.8350942 -2.2311257 -1.8328016 -1.8239282 -1.7848527 -1.6921905 -1.7582380 -2.4375000 -1.7131106 -2.4375000 -2.4135931 -2.3139585 -1.7084245 -2.4375000 -1.7208848 -2.1366725 -1.7352502 -1.7295690 -2.0930099 -2.1113622 -1.8670313 -2.0889369 -2.4212634 -1.6904792
50pmol se 0.1854147 1.0557771 0.7298663 0.4979530 0.1404951 0.4114942 0.1531548 0.00000 0.2586713 0.0686065 0.0613401 0.1787576 0.0610332 0.0397831 0.3338363 0.0582540 0.0990632 0.1580372 0.0479832 0.0849927 0.1481555 0.0800818 0.2693054 0.0114727 0.1691689 0.1545734 0.2464691 0.0034738 0.5272181 0.1566286 0.3068499 0.2965017 0.1645881 0.2302020 0.6416239 0.3514543 0.3560606 0.0821289

The column “Stat” in the generated result includes the following statistics:

  • n: Number of observations.
  • mean: Mean.
  • sd: Standard deviation.
  • median: Median.
  • trimmed: Trimmed mean with a trim of 0.1.
  • mad: Median absolute deviation (from the median).
  • min: Minimum.
  • max: Maximum.
  • range: The difference between the maximum and minimum value.
  • skew: Skewness.
  • kurtosis: Kurtosis.
  • se: Standard error.
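As a rough cross-check, several of these statistics can be reproduced with base R. The sketch below uses the four imputed 50pmol values of LYSC_CHICK that appear in the pullProteinPath() output in this document. The variable names are illustrative, and skewness and kurtosis are left out because base R has no built-in functions for them.

```r
# Imputed 50pmol values for LYSC_CHICK (copied from the pullProteinPath() table)
x <- c(13.38772, 13.25790, 13.88102, 13.25790)

stats <- c(
  n       = length(x),
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),  # with n = 4, no value is actually trimmed
  mad     = mad(x),               # scaled by 1.4826 by default
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(length(x))
)
round(stats, 7)
```

Up to rounding of the inputs, these values agree with the LYSC_CHICK column of the 50pmol summary table above.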

Analysis

The function analyze() calculates the results that can be used in subsequent visualizations.

Note: Each of the following analyses compares data under two conditions. The order of the conditions affects downstream analysis, because the second condition serves as the reference for the comparison.

  • If only two conditions exist in the data and conditions is not specified, conditions will automatically be generated by sorting the unique values alphabetically and in ascending order.

  • If more than two conditions exist in the data, precisely two conditions for comparison must be specified via the argument conditions.

cond <- c("100pmol", "50pmol")
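The automatic ordering matters because condition labels are sorted as text, not as numbers. The following base R illustration shows the sorting rule described above; it is not the package's internal code, and the variable names are hypothetical.

```r
# Condition labels as they might appear in R.Condition
labels <- c("50pmol", "100pmol", "50pmol", "100pmol")

# Character sorting compares text left to right, so "100pmol" precedes "50pmol"
auto_cond <- sort(unique(labels))
auto_cond

# Under the rule above, the second element is the reference of the comparison
reference <- auto_cond[2]
reference
```

Specifying conditions explicitly, as done with cond above, avoids any surprise from text-based sorting.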

Student’s t-test

The Student’s t-test is used to compare the means between two conditions for each protein, reporting both the difference in means between the conditions and the P-value of the test.

Note: The difference is calculated by subtracting the mean of the second condition from the mean of the first condition (condition 1 - condition 2).
anlys_t <- analyze(dataImput, conditions = cond, testType = "t-test")
#> Data are essentially constant.
NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN
Difference -0.4119641 4.2736732 0.6679373 0.3459290 0.2381933 -1.5748494 -0.2124196 0 0.4708090 0.0970569 -0.5393248 0.4604719 -0.4160144 -0.5533715 0.9107635 -0.0630057 0.2379563 -0.0461417 -0.3781941 0.5452255 0.2490808 0.1323776 -1.385796 -0.3358160 -0.3803155 -0.5183059 -0.0461744 -0.3683573 -2.0085355 0.2799353 0.9468351 0.7174576 0.2865258 0.6963357 0.4266578 -1.1479077 1.1997707 0.6027773
P-value 0.2425281 0.0223332 0.4450987 0.5455559 0.1909446 0.0348158 0.3685157 NA 0.1922051 0.3275609 0.0005957 0.0767642 0.0683551 0.0041165 0.0651420 0.8119403 0.2805476 0.8086313 0.0024306 0.0076435 0.2145975 0.2343297 0.009692 0.0658082 0.1068935 0.0481610 0.8918516 0.0000000 0.0232926 0.3007562 0.0633640 0.0942018 0.1959323 0.0503672 0.5680738 0.0436514 0.1120976 0.0022576
Note: The Student’s t-test may produce the warning “Data are essentially constant,” which means that at least one protein has the same value in all samples. For such proteins, the t-test returns NA as the P-value.
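For a single protein, the calculation can be sketched with base R's t.test(). The values are the imputed LYSC_CHICK measurements from the pullProteinPath() table in this document. Whether analyze() calls t.test() with these exact settings is an assumption, and safe_p is a hypothetical helper written for this illustration.

```r
# Imputed LYSC_CHICK values (from the pullProteinPath() table)
x_100 <- c(13.62766, 13.97388, 13.55168, 13.62766)  # condition 1: 100pmol
x_50  <- c(13.38772, 13.25790, 13.88102, 13.25790)  # condition 2: 50pmol (reference)

# Difference = mean(condition 1) - mean(condition 2)
difference <- mean(x_100) - mean(x_50)

# Hypothetical helper: return NA when t.test() stops on constant data
safe_p <- function(a, b) {
  tryCatch(t.test(a, b)$p.value, error = function(e) NA_real_)
}

p_value <- safe_p(x_100, x_50)

# ALBU_BOVIN has the same value in every sample, so the test cannot be computed
p_constant <- safe_p(rep(16.75777, 4), rep(16.75777, 4))

c(Difference = difference, P.value = p_value, Constant = p_constant)
```

Note that t.test() defaults to Welch's unequal-variance test; passing var.equal = TRUE gives the classical pooled-variance Student's test.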

Moderated t-test

The main distinction between the Student’s and moderated t-tests (Smyth 2004) lies in how variance is computed. While the Student’s t-test calculates variance based on the data available for each protein individually, the moderated t-test utilizes information from all the chosen proteins to calculate variance.

anlys_mod.t <- analyze(dataImput, conditions = cond, testType = "mod.t-test")
#> Warning: Zero sample variances detected, have been offset away from zero
NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN
Difference -0.4119641 4.273673 0.6679373 0.3459290 0.2381933 -1.574849 -0.2124196 0 0.4708090 0.0970569 -0.5393248 0.4604719 -0.4160144 -0.5533715 0.9107635 -0.0630057 0.2379563 -0.0461417 -0.3781941 0.5452255 0.2490808 0.1323776 -1.3857964 -0.3358160 -0.3803155 -0.5183059 -0.0461744 -0.3683573 -2.0085355 0.2799353 0.9468351 0.7174576 0.2865258 0.6963357 0.4266578 -1.1479077 1.1997707 0.6027773
P-value 0.2079512 0.004610 0.3968637 0.4996458 0.1442800 0.023648 0.3400724 1 0.1583605 0.3241372 0.0002405 0.0381414 0.0355742 0.0007747 0.0337640 0.7936098 0.2433486 0.7960328 0.0018535 0.0004327 0.1823081 0.2219151 0.0014515 0.0242553 0.0694464 0.0354752 0.8844841 0.0000138 0.0085806 0.2694562 0.0465307 0.0649959 0.1605706 0.0311717 0.5272032 0.0286997 0.0827948 0.0003166
Note: The moderated t-test may produce the warning “Zero sample variances detected, have been offset away from zero.” This warning refers to proteins that exhibit identical quantitative values, either pre- or post-imputation, so that no variance is present across conditions for those proteins. It does not impede downstream analysis; it merely alerts users to the occurrence.
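In the notation of Smyth (2004), the borrowing of information across proteins can be summarized as follows. Here $s_g^2$ is the ordinary sample variance of protein $g$ with $d_g$ degrees of freedom, $s_0^2$ and $d_0$ are the prior variance and prior degrees of freedom estimated from all proteins, $\hat{\beta}_g$ is the estimated difference, and $v_g$ is the unscaled variance factor of that estimate:

```latex
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

Under the null hypothesis, $\tilde{t}_g$ follows a t-distribution with $d_0 + d_g$ degrees of freedom. Because the moderated variance combines the protein's own variance with the shared prior, a protein with zero sample variance still has a usable variance estimate, which is consistent with ALBU_BOVIN receiving a P-value of 1 here instead of NA.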

MA

Setting testType = "MA" generates the data for an MA plot, namely the protein-wise averages within each condition.

Note: The rows of the output are ordered by conditions, impacting the subsequent MA plot visualization. Specifically, the first row represents the protein-wise average of the first condition, and the second row represents the second condition.
anlys_MA <- analyze(dataImput, conditions = cond, testType = "MA")
NUD4B_HUMAN A0A7P0T808_HUMAN A0A8I5KU53_HUMAN ZN840_HUMAN CC85C_HUMAN C9JEV0_HUMAN C9JNU9_HUMAN ALBU_BOVIN CYC_BOVIN TRFE_BOVIN F8W0H2_HUMAN H0Y7V7_HUMAN H0YD14_HUMAN H3BUF6_HUMAN H7C1W4_HUMAN H7C3M7_HUMAN TLR3_HUMAN LRIG2_HUMAN RAB3D_HUMAN ADH1_YEAST LYSC_CHICK BGAL_ECOLI CYTA_HUMAN KPCB_HUMAN LIPL_HUMAN CO6_HUMAN BGAL_HUMAN SYTC_HUMAN CASPE_HUMAN DCAF6_HUMAN DALD3_HUMAN HGNAT_HUMAN RFFL_HUMAN RN185_HUMAN ZN462_HUMAN ALKB7_HUMAN POLK_HUMAN ACAD8_HUMAN
100pmol 10.65329 12.086207 10.623241 8.264437 8.848697 6.876432 8.178968 16.75777 13.09418 14.02334 9.065970 10.067293 9.940863 7.978102 7.829731 9.823092 7.409602 8.598633 10.09058 15.16045 13.69522 14.37573 8.892464 9.873519 8.297435 6.768182 11.71422 14.75608 6.384893 7.922884 9.621499 10.604321 7.004342 10.04184 8.677891 6.809229 7.818831 7.502518
50pmol 11.06525 7.812534 9.955303 7.918507 8.610503 8.451282 8.391388 16.75777 12.62337 13.92628 9.605295 9.606821 10.356878 8.531474 6.918968 9.886097 7.171646 8.644774 10.46877 14.61523 13.44614 14.24335 10.278260 10.209335 8.677750 7.286488 11.76039 15.12444 8.393428 7.642948 8.674664 9.886863 6.717816 9.34550 8.251233 7.957136 6.619060 6.899741

Visualization

This section provides a variety of options for getting a global view of your data, making comparisons, and highlighting trends. Keep in mind that data visualization is most effective when illustrating a point or answering a question you have about your data, and not as a means to find a point/question.

heatmap

The package offers two options for plotting the heatmap.

  • Option 1 utilizes the source package pheatmap, capable of plotting the dendrogram simultaneously. It is the default choice for heatmaps in this package.
visualize(dataImput, graphType = "heatmap",
          pkg = "pheatmap",
          cluster_cols = TRUE, cluster_rows = TRUE,
          show_colnames = TRUE, show_rownames = TRUE)

When protein names are excessively long, it is recommended to set show_rownames = FALSE to view the full heatmap.

  • Option 2 uses the source package ggplot2 to generate a ggplot object but does not include the dendrogram.
visualize(dataImput, graphType = "heatmap", pkg = "ggplot2")

In a heatmap, similar colors within a row indicate relatively consistent values, suggesting similar protein expression levels across different samples.

MA

An MA plot, short for “M vs. A plot,” uses two axes:

  • M axis (vertical): Represents the logarithm (usually base 2) of the fold change, or the ratio of the expression levels, between two conditions. It is calculated as: $M = \log_2(X/Y) = \log_2 X - \log_2 Y$
  • A axis (horizontal): Represents the average intensity of the two conditions, calculated as: $A = \frac{1}{2}\log_2(XY) = \frac{1}{2}\left[\log_2(X) + \log_2(Y)\right]$

Most proteins are expected to exhibit little variation, leading to the majority of points concentrating around the line M = 0 (indicating no difference between group means).

Note: Again, the order of conditions in analyze() determines how the MA plot is visualized. The second row of anlys_MA acts as the comparison reference: the first and second rows correspond to $\log_2 X$ and $\log_2 Y$, respectively.
visualize(anlys_MA, graphType = "MA", M.thres = 1, transformLabel = "Log2")
#> Warning: Removed 32 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).

where M.thres = 1 sets the M thresholds to −1 and 1. The points are split into three groups: significantly up ($M > 1$), not significant ($-1 \leq M \leq 1$), and significantly down ($M < -1$). The argument transformLabel = "Log2" prefixes the title, x-axis, and y-axis labels. Additionally, the warning message “Removed 32 rows containing missing values” indicates that 32 proteins are not significant; these points are not labeled.
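Because the values in anlys_MA are already log2-transformed, M and A reduce to a difference and an average of the two rows. The sketch below uses three proteins copied from the anlys_MA output above. The classification mirrors the description of M.thres, though the exact handling of boundary values inside visualize() is an assumption.

```r
# Protein-wise log2 averages for three proteins, copied from anlys_MA
ma <- rbind(
  "100pmol" = c(NUD4B_HUMAN = 10.65329, A0A7P0T808_HUMAN = 12.086207, CASPE_HUMAN = 6.384893),
  "50pmol"  = c(NUD4B_HUMAN = 11.06525, A0A7P0T808_HUMAN = 7.812534,  CASPE_HUMAN = 8.393428)
)

M <- ma["100pmol", ] - ma["50pmol", ]        # log2 X - log2 Y
A <- (ma["100pmol", ] + ma["50pmol", ]) / 2  # average log2 intensity

# Classify each protein against the thresholds -M.thres and M.thres
M.thres <- 1
status <- ifelse(M > M.thres, "up",
                 ifelse(M < -M.thres, "down", "not significant"))

data.frame(M = M, A = A, status = status)
```

The M values match the Difference row reported by the t-test analyses, since both are the first condition minus the second.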

Normalize

visualize(dataNorm, graphType = "normalize")
#> Warning: Removed 55 rows containing non-finite outside the scale range
#> (`stat_boxplot()`).

PCA

Principal component analysis (PCA) is a powerful technique used in data analysis to simplify and reduce the dimensionality of large datasets. It transforms original variables into uncorrelated components that capture the maximum variance. By selecting a subset of these components, PCA projects the data points onto these key directions, enabling visualization and analysis in a lower-dimensional space. This aids in identifying patterns and relationships within complex datasets.

In the visualization for graphType = "PCA_*", the arguments center and scale control whether the data are centered to zero mean and scaled to unit variance; both default to TRUE.

Note: Data scaling is done to ensure that the scale differences between different features do not affect the results of PCA. If not scaled, features with larger scales will dominate the computation of principal components (PCs).
Note: The most common error message for the PCA is “Cannot rescale a constant/zero column to unit variance.” This occurs when columns representing proteins contain only zeros or have constant values. Typically, there are two ways to address this error: one is to remove these proteins, and the other is to set scale = FALSE.

In the case of dataImput, one protein, namely “ALBU_BOVIN”, has constant values, leading to the error message. We choose to remove this protein in the PCA.

names(dataImput)[sapply(dataImput, function(col) length(unique(col)) == 1)]
#> [1] "ALBU_BOVIN"
dataPCA <- dataImput[, colnames(dataImput) != "ALBU_BOVIN"]
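The same check can be written so that it handles any number of constant columns. The sketch below uses a made-up data frame (toy, with hypothetical values, not the package data) and base R's prcomp() to reproduce the error and confirm that dropping the constant column resolves it. Whether visualize() relies on prcomp() internally is an assumption.

```r
# Toy data: two varying proteins and one constant protein (hypothetical values)
toy <- data.frame(
  P1 = c(10.1, 10.4, 9.8, 10.9),
  P2 = c(7.2, 6.9, 7.8, 7.1),
  P3 = rep(16.75, 4)
)

# Scaling a constant column to unit variance is impossible, so prcomp() stops
msg <- tryCatch(
  {
    prcomp(toy, center = TRUE, scale. = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Drop every column that holds a single unique value, then rerun
is_const <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
toy_pca  <- prcomp(toy[, !is_const, drop = FALSE], center = TRUE, scale. = TRUE)
summary(toy_pca)
```

Removing the protein keeps the remaining variables on a common scale, whereas scale = FALSE retains the protein but lets high-variance proteins dominate the PCs.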

PCA_scree

One way to decide how many PCs to retain is to examine a scree plot. The scree plot shows the eigenvalue of each PC, which measures the variance explained by that component; it can also be expressed as a proportion of the total variance.

visualize(dataPCA, graphType = "PCA_scree", center = TRUE, scale = TRUE,
          addlabels = TRUE, choice = "variance", ncp = 10)
visualize(dataPCA, graphType = "PCA_scree", center = TRUE, scale = TRUE,
          addlabels = TRUE, choice = "eigenvalue", ncp = 10)

where choice specifies the data to be plotted, either "variance" or "eigenvalue", addlabels = TRUE adds information labels at the top of bars/points, and ncp = 10 sets the number of dimensions to be displayed.
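To make the two choice options concrete, the quantities behind a scree plot can be computed directly. This sketch uses simulated data and base R's prcomp() instead of visualize(), so it illustrates the concept and not the package's internals.

```r
set.seed(1)
# Simulated matrix: 12 samples (rows) by 5 proteins (columns)
x <- matrix(rnorm(12 * 5), nrow = 12, ncol = 5)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

eigenvalue <- pca$sdev^2                          # variance captured by each PC
variance   <- 100 * eigenvalue / sum(eigenvalue)  # percentage of total variance

data.frame(PC = seq_along(eigenvalue), eigenvalue, variance,
           cumulative = cumsum(variance))
```

With scaled data, the eigenvalues sum to the number of variables, so an eigenvalue above 1 means that the PC explains more variance than a single original variable.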

PCA_ind

The primary PCA plot of individual data visually represents the distribution of individual observations in a reduced-dimensional space, typically defined by the PCs. The x and y axes of the PCA plot represent the PCs. Each axis corresponds to a linear combination of the original variables. Individual data points on the PCA plot represent observations (e.g., samples) from the original dataset. Points that are close to the origin (0, 0) are close to the “average” across all protein abundances. If sufficient samples are present, the plot also shows a 95% confidence ellipse and a centroid (the group mean) for each group (condition).

visualize(dataPCA, graphType = "PCA_ind", center = TRUE, scale = TRUE,
          addlabels = TRUE, addEllipses = TRUE, ellipse.level = 0.95)

PCA_var

This plot will be more useful if your analyses are based on a relatively small number of proteins. It represents the association, or loading, of each protein on the first two PCs. Longer arrows represent stronger associations.

Note: Proteins that are weakly associated with PC1 or PC2 may still be highly correlated with other PCs not being plotted. Consult the scree plot (and other available methods) to determine the appropriate number of PCs to investigate.
visualize(dataPCA, graphType = "PCA_var", center = TRUE, scale = TRUE,
          addlabels = TRUE)

PCA_biplot

The PCA biplot includes individual and variable plots. Again, with a large number of proteins, this plot can be unwieldy.

visualize(dataPCA, graphType = "PCA_biplot", center = TRUE, scale = TRUE,
          addEllipses = TRUE, ellipse.level = 0.95, label = "all")

t-test

The function visualize() can be applied to any t-test output. It generates two useful plots: a histogram of fold changes across the analyzed proteins and a histogram of P-values. The majority of proteins are expected to show very little change between conditions, so the fold change histogram will have a peak at around zero. Most P-values are expected to be non-significant (above 0.05). Depending on the strength of the treatment effect, there may be a peak of P-values near 0.

visualize(anlys_mod.t, graphType = "t-test")

Upset

The upset plot is a visual representation that helps display the overlap and intersection of sets or categories in a dataset. It is particularly useful for illustrating the presence or absence of elements in combinations of sets.

dataSort <- sortcondition(dataSet)
visualize(dataSort, graphType = "Upset")

This plot reveals that 42 proteins are shared by 50pmol, 100pmol, and 200pmol, while only 3 proteins are shared by 100pmol and 200pmol but not by 50pmol.

Venn

The Venn plot is another graphical representation of the relationships between sets. Each circle represents a set, and the overlapping regions show the elements that are shared between sets.

visualize(dataSort, graphType = "Venn",
          show_percentage = TRUE,
          fill_color = c("blue", "yellow", "green", "red"),
          saveVenn = TRUE)

where saveVenn = TRUE saves the data underlying the Venn plot, which contain logical columns representing the sets, as a .csv file named Venn_information.csv in the current working directory.

In the example above, 50pmol, 100pmol, and 200pmol groups share 42 proteins. Notably, 3 proteins are exclusively found in the 100pmol and 200pmol groups.
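The counts shown in the Upset and Venn plots are sizes of set intersections. A minimal base R sketch with made-up protein IDs (not the toy dataset) shows how such regions can be derived; the object names are hypothetical.

```r
# Hypothetical proteins detected in each condition
sets <- list(
  "50pmol"  = c("P1", "P2", "P3"),
  "100pmol" = c("P1", "P2", "P3", "P4"),
  "200pmol" = c("P1", "P2", "P4", "P5")
)

# Shared by all three conditions
shared_all <- Reduce(intersect, sets)

# Shared by 100pmol and 200pmol but absent from 50pmol
only_100_200 <- setdiff(intersect(sets[["100pmol"]], sets[["200pmol"]]),
                        sets[["50pmol"]])

lengths(list(all_three = shared_all, only_100_200 = only_100_200))
```

In the real data, the analogous counts are the 42 and 3 proteins described above.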

Volcano

A volcano plot is a graphical representation commonly used in proteomics and genomics to visualize differential expression analysis results. It is particularly useful for identifying significant changes in extensive data. It displays two important pieces of information about differences between conditions in a dataset:

  • Statistical significance (vertical): Represents the negative log10 of the P-value.

  • Fold change (horizontal): Represents the fold change.

visualize(anlys_mod.t, graphType = "volcano",
          P.thres = 0.05, logF.thres = 0.6)
#> Warning: Removed 29 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).
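The two thresholds divide the plot into regions. The sketch below applies them to five proteins copied from the anlys_mod.t output above. Whether visualize() uses strict or non-strict inequalities at the boundaries is an assumption, and the column names are illustrative.

```r
# Difference and P-value for five proteins, copied from anlys_mod.t
res <- data.frame(
  protein    = c("A0A7P0T808_HUMAN", "CASPE_HUMAN", "F8W0H2_HUMAN",
                 "POLK_HUMAN", "ALBU_BOVIN"),
  difference = c(4.273673, -2.0085355, -0.5393248, 1.1997707, 0),
  p_value    = c(0.004610, 0.0085806, 0.0002405, 0.0827948, 1)
)

P.thres    <- 0.05
logF.thres <- 0.6

res$neg_log10_p <- -log10(res$p_value)  # vertical axis of the volcano plot

# A protein is flagged only if it passes both thresholds
res$status <- "not significant"
res$status[res$p_value < P.thres & res$difference >  logF.thres] <- "up"
res$status[res$p_value < P.thres & res$difference < -logF.thres] <- "down"
res
```

F8W0H2_HUMAN has a very small P-value but a fold change below the threshold, and POLK_HUMAN has a large fold change but a P-value above 0.05, so neither is flagged. Applying the same rule to all 38 proteins in anlys_mod.t flags 9 of them, which is consistent with the 29 rows reported in the warning above.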

Other useful function

The function pullProteinPath() allows you to see the values associated with specific proteins that match exactly and/or proteins that match the regular expression pattern at each step of processing. This can be useful for questions such as, “Were all of the values for my favorite protein actually measured, or were some imputed?” or “Why didn’t my favorite protein make it to the final list? Where was it filtered out?”. It can also be used to check whether some given proteins’ fold-change might have been a processing artifact.

Check <- pullProteinPath(
  listName = c("LYSC_CHICK", "BGAL_ECOLI"),
  regexName = c("BOVIN"),
  by = "PG.ProteinNames",
  dataSetList = list(Initial = dataSet,
                     Transformed = dataTran,
                     Normalized = dataNorm,
                     Imputed = dataImput))
PG.ProteinNames PG.Genes PG.ProteinAccessions PG.ProteinDescriptions R.Condition R.Replicate Initial Transformed Normalized Imputed
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 100pmol 1 111209.703 16.76292 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 100pmol 2 111659.883 16.76875 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 100pmol 3 105982.914 16.69347 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 100pmol 4 104442.562 16.67235 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 200pmol 1 109245.289 16.73721 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 200pmol 2 113357.508 16.79052 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 200pmol 3 114321.836 16.80274 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 200pmol 4 116439.820 16.82923 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 50pmol 1 117803.492 16.84602 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 50pmol 2 110086.680 16.74828 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 50pmol 3 105640.203 16.68880 16.75777 16.75777
ALBU_BOVIN ALB CON__P02769 Bovine serum albumin 50pmol 4 110446.000 16.75298 16.75777 16.75777
BGAL_ECOLI lacZ P00722 Beta-galactosidase 100pmol 1 23840.031 14.54110 14.44005 14.44005
BGAL_ECOLI lacZ P00722 Beta-galactosidase 100pmol 2 23963.307 14.54854 14.44005 14.44005
BGAL_ECOLI lacZ P00722 Beta-galactosidase 100pmol 3 22957.350 14.48667 14.42169 14.42169
BGAL_ECOLI lacZ P00722 Beta-galactosidase 100pmol 4 22311.297 14.44549 14.20112 14.20112
BGAL_ECOLI lacZ P00722 Beta-galactosidase 200pmol 1 41234.672 15.33157 14.77650 14.77650
BGAL_ECOLI lacZ P00722 Beta-galactosidase 200pmol 2 42899.434 15.38867 14.70670 14.70670
BGAL_ECOLI lacZ P00722 Beta-galactosidase 200pmol 3 42904.945 15.38886 14.77650 14.77650
BGAL_ECOLI lacZ P00722 Beta-galactosidase 200pmol 4 43279.844 15.40141 14.70670 14.70670
BGAL_ECOLI lacZ P00722 Beta-galactosidase 50pmol 1 14728.673 13.84634 14.38206 14.38206
BGAL_ECOLI lacZ P00722 Beta-galactosidase 50pmol 2 14736.710 13.84713 14.10465 14.10465
BGAL_ECOLI lacZ P00722 Beta-galactosidase 50pmol 3 14160.203 13.78955 14.38206 14.38206
BGAL_ECOLI lacZ P00722 Beta-galactosidase 50pmol 4 14758.731 13.84928 14.10465 14.10465
CYC_BOVIN CYCS CON__P62894 Cytochrome c 100pmol 1 10737.953 13.39043 12.96499 12.96499
CYC_BOVIN CYCS CON__P62894 Cytochrome c 100pmol 2 10655.384 13.37929 13.62766 13.62766
CYC_BOVIN CYCS CON__P62894 Cytochrome c 100pmol 3 10663.714 13.38042 12.81909 12.81909
CYC_BOVIN CYCS CON__P62894 Cytochrome c 100pmol 4 10843.115 13.40449 12.96499 12.96499
CYC_BOVIN CYCS CON__P62894 Cytochrome c 200pmol 1 19524.863 14.25302 13.10393 13.10393
CYC_BOVIN CYCS CON__P62894 Cytochrome c 200pmol 2 20072.297 14.29292 12.49496 12.49496
CYC_BOVIN CYCS CON__P62894 Cytochrome c 200pmol 3 20787.127 14.34340 14.00189 14.00189
CYC_BOVIN CYCS CON__P62894 Cytochrome c 200pmol 4 19924.240 14.28224 13.38772 13.38772
CYC_BOVIN CYCS CON__P62894 Cytochrome c 50pmol 1 6758.298 12.72244 12.49496 12.49496
CYC_BOVIN CYCS CON__P62894 Cytochrome c 50pmol 2 6721.135 12.71449 12.30540 12.30540
CYC_BOVIN CYCS CON__P62894 Cytochrome c 50pmol 3 6172.877 12.59173 13.38772 13.38772
CYC_BOVIN CYCS CON__P62894 Cytochrome c 50pmol 4 6028.398 12.55756 12.30540 12.30540
LYSC_CHICK LYZ P00698 Lysozyme C 100pmol 1 13798.590 13.75223 13.62766 13.62766
LYSC_CHICK LYZ P00698 Lysozyme C 100pmol 2 13880.411 13.76076 13.97388 13.97388
LYSC_CHICK LYZ P00698 Lysozyme C 100pmol 3 13723.719 13.74438 13.55168 13.55168
LYSC_CHICK LYZ P00698 Lysozyme C 100pmol 4 13944.603 13.76742 13.62766 13.62766
LYSC_CHICK LYZ P00698 Lysozyme C 200pmol 1 24344.188 14.57129 14.22236 14.22236
LYSC_CHICK LYZ P00698 Lysozyme C 200pmol 2 24742.227 14.59469 13.88102 13.88102
LYSC_CHICK LYZ P00698 Lysozyme C 200pmol 3 24803.633 14.59826 14.22236 14.22236
LYSC_CHICK LYZ P00698 Lysozyme C 200pmol 4 26381.047 14.68721 14.13067 14.13067
LYSC_CHICK LYZ P00698 Lysozyme C 50pmol 1 7169.955 12.80775 13.38772 13.38772
LYSC_CHICK LYZ P00698 Lysozyme C 50pmol 2 7797.536 12.92880 13.25790 13.25790
LYSC_CHICK LYZ P00698 Lysozyme C 50pmol 3 7432.793 12.85969 13.88102 13.88102
LYSC_CHICK LYZ P00698 Lysozyme C 50pmol 4 7543.633 12.88104 13.25790 13.25790
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 100pmol 1 15097.670 13.88204 13.97388 13.97388
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 100pmol 2 15840.281 13.95131 14.20112 14.20112
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 100pmol 3 15022.215 13.87481 13.94448 13.94448
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 100pmol 4 15160.493 13.88803 13.97388 13.97388
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 200pmol 1 21577.973 14.39727 14.00189 14.00189
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 200pmol 2 22968.959 14.48740 13.38772 13.38772
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 200pmol 3 20720.127 14.33875 13.70002 13.70002
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 200pmol 4 22153.398 14.43524 13.88102 13.88102
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 50pmol 1 12183.812 13.57268 13.88102 13.88102
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 50pmol 2 12521.783 13.61215 13.84672 13.84672
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 50pmol 3 11926.220 13.54185 14.13067 14.13067
TRFE_BOVIN TF CON__Q0IIK2 Serotransferrin (UP merge to Q29443) 50pmol 4 12021.495 13.55333 13.84672 13.84672
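The combination of exact and pattern matching can be pictured with base R. This sketch assumes that the two criteria are combined with a logical OR, which is consistent with the five proteins returned above; the internal implementation of pullProteinPath() may differ.

```r
# A few protein names from the toy dataset
protein_names <- c("ALBU_BOVIN", "BGAL_ECOLI", "CYC_BOVIN", "LYSC_CHICK",
                   "NUD4B_HUMAN", "TRFE_BOVIN")

listName  <- c("LYSC_CHICK", "BGAL_ECOLI")  # exact matches
regexName <- c("BOVIN")                     # regular-expression matches

exact <- protein_names %in% listName
regex <- grepl(paste(regexName, collapse = "|"), protein_names)

# Keep a protein if it satisfies either criterion
matched <- protein_names[exact | regex]
matched
```

NUD4B_HUMAN is dropped because it is neither listed by name nor matched by the pattern.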

Reference

Bolstad, B. M., R. A. Irizarry, M. Astrand, and T. P. Speed. 2003. “A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias.” Bioinformatics 19 (2): 185–93. https://doi.org/10.1093/bioinformatics/19.2.185.
Breiman, L., J. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. New York, NY, USA: Routledge.
Doove, Lisa L., Stef van Buuren, and Elise Dusseldorp. 2014. “Recursive Partitioning for Missing Data Imputation in the Presence of Interaction Effects.” Computational Statistics & Data Analysis 72: 92–104. https://doi.org/10.1016/j.csda.2013.10.025.
Hastie, Trevor, Rahul Mazumder, Jason D. Lee, and Reza Zadeh. 2015. “Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.” Journal of Machine Learning Research 16 (104): 3367–3402. http://jmlr.org/papers/v16/hastie15a.html.
Kim, Ki-Yeol, Byoung-Jin Kim, and Gwan-Su Yi. 2004. “Reuse of Imputed Data in Microarray Analysis Increases Imputation Efficiency.” BMC Bioinformatics 5: 160. https://doi.org/10.1186/1471-2105-5-160.
Oba, Shigeyuki, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, and Shin Ishii. 2003. “A Bayesian Missing Value Estimation Method for Gene Expression Profile Data.” Bioinformatics 19 (16): 2088–96. https://doi.org/10.1093/bioinformatics/btg287.
Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. New York, NY, USA: John Wiley & Sons.
Schafer, Joseph L. 1997. Analysis of Incomplete Multivariate Data. New York, NY, USA: Chapman & Hall/CRC.
Shah, Jasmit S., Shesh N. Rai, Andrew P. DeFilippis, Bradford G. Hill, Aruni Bhatnagar, and Guy N. Brock. 2017. “Distribution Based Nearest Neighbor Imputation for Truncated High Dimensional Data with Applications to Pre-Clinical and Clinical Metabolomics Studies.” BMC Bioinformatics 18: 114. https://doi.org/10.1186/s12859-017-1547-6.
Smyth, Gordon K. 2004. “Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments.” Statistical Applications in Genetics and Molecular Biology 3 (1). https://doi.org/10.2202/1544-6115.1027.
Stacklies, Wolfram, Henning Redestig, Matthias Scholz, Dirk Walther, and Joachim Selbig. 2007. “pcaMethods–a Bioconductor Package Providing PCA Methods for Incomplete Data.” Bioinformatics 23 (9): 1164–67. https://doi.org/10.1093/bioinformatics/btm069.
Troyanskaya, Olga, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B. Altman. 2001. “Missing Value Estimation Methods for DNA Microarrays.” Bioinformatics 17 (6): 520–25. https://doi.org/10.1093/bioinformatics/17.6.520.
van Buuren, Stef. 2018. Flexible Imputation of Missing Data. New York, NY, USA: Chapman & Hall/CRC.
van Buuren, Stef, and Karin Groothuis-Oudshoorn. 2011. “Mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45 (3): 1–67. https://doi.org/10.18637/jss.v045.i03.