Filtering
Shiying Xiao, Charles Watt, Jennifer C. Liddle, Jeremy L. Balsbaugh, Timothy E. Moore
Department of Statistics, UConn
Proteomics and Metabolomics Facility, UConn
Statistical Consulting Services, UConn
2025-06-11
Source: vignettes/filtering.Rmd
Preliminary
## load R package
library(msDiaLogue)
## preprocessing
fileName <- "../tests/testData/Toy_Spectronaut_Data.csv"
dataSet <- preprocessing(fileName,
                         filterNaN = TRUE, filterUnique = 2,
                         replaceBlank = TRUE, saveRm = TRUE)
## transformation
dataTran <- transform(dataSet, logFold = 2)
Examples
Case 1. Remove proteins specified by the user in this step and keep everything else.
In the example below, the specific protein with the identifier
“ALBU_BOVIN” will be removed, as will any entries with an
identifier that contains the characters “HUMAN”. If removeList = TRUE,
this function will remove what you’ve specified and keep the rest.
filterOutIn(dataTran, listName = "ALBU_BOVIN", regexName = "HUMAN",
            removeList = TRUE, saveRm = TRUE)
R.Condition | R.Replicate | CYC_BOVIN | TRFE_BOVIN | KRT16_MOUSE | ADH1_YEAST | LYSC_CHICK | BGAL_ECOLI |
---|---|---|---|---|---|---|---|
100pmol | 1 | 13.39043 | 13.88204 | 10.81329 | 14.75235 | 13.75223 | 14.54110 |
100pmol | 2 | 13.37929 | 13.95131 | NA | 14.73246 | 13.76076 | 14.54854 |
100pmol | 3 | 13.38042 | 13.87481 | NA | 14.69847 | 13.74438 | 14.48667 |
100pmol | 4 | 13.40449 | 13.88803 | NA | 14.67556 | 13.76742 | 14.44549 |
200pmol | 1 | 14.25302 | 14.39727 | 11.11126 | 15.57144 | 14.57129 | 15.33157 |
200pmol | 2 | 14.29292 | 14.48740 | NA | 15.60578 | 14.59469 | 15.38867 |
200pmol | 3 | 14.34340 | 14.33875 | 10.50303 | 15.62983 | 14.59826 | 15.38886 |
200pmol | 4 | 14.28224 | 14.43524 | NA | 15.71383 | 14.68721 | 15.40141 |
50pmol | 1 | 12.72244 | 13.57268 | NA | 14.00214 | 12.80775 | 13.84634 |
50pmol | 2 | 12.71449 | 13.61215 | NA | 14.00533 | 12.92880 | 13.84713 |
50pmol | 3 | 12.59173 | 13.54185 | 10.42381 | 14.02047 | 12.85969 | 13.78955 |
50pmol | 4 | 12.55756 | 13.55333 | NA | 13.99104 | 12.88104 | 13.84928 |
If you want to exclude two sets of proteins and no specific ones
(e.g. contaminants and decoys, but not specifically albumin), you can
drop the listName argument entirely and set regexName to a vector of
patterns, like this:
filterOutIn(dataTran, regexName = c("DECOY", "CON__"),
            removeList = TRUE, saveRm = TRUE)
R.Condition | R.Replicate | NUD4B_HUMAN | A0A7P0T808_HUMAN | A0A8I5KU53_HUMAN | ZN840_HUMAN | CC85C_HUMAN | TMC5B_HUMAN | C9JEV0_HUMAN | C9JNU9_HUMAN | ALBU_BOVIN | CYC_BOVIN | TRFE_BOVIN | KRT16_MOUSE | F8W0H2_HUMAN | H0Y7V7_HUMAN | H0YD14_HUMAN | H3BUF6_HUMAN | H7C1W4_HUMAN | H7C3M7_HUMAN | TCPR2_HUMAN | TLR3_HUMAN | LRIG2_HUMAN | RAB3D_HUMAN | ADH1_YEAST | LYSC_CHICK | BGAL_ECOLI | CYTA_HUMAN | KPCB_HUMAN | LIPL_HUMAN | PIP_HUMAN | CO6_HUMAN | BGAL_HUMAN | SYTC_HUMAN | CASPE_HUMAN | DCAF6_HUMAN | DALD3_HUMAN | HGNAT_HUMAN | RFFL_HUMAN | RN185_HUMAN | ZN462_HUMAN | ALKB7_HUMAN | POLK_HUMAN | ACAD8_HUMAN | A0A7I2PK40_HUMAN | NBDY_HUMAN | H0Y5R1_HUMAN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100pmol | 1 | 10.59617 | 11.629505 | 11.461371 | 8.315348 | 8.952781 | 8.833937 | 7.736180 | 7.889538 | 16.76292 | 13.39043 | 13.88204 | 10.81329 | 9.299651 | 10.357346 | 10.321521 | 8.132535 | 8.226486 | 10.173123 | 14.006782 | 7.486384 | 9.011536 | 10.26981 | 14.75235 | 13.75223 | 14.54110 | 9.262305 | 9.952103 | 8.781496 | 7.047859 | 7.234610 | 11.80494 | 14.71384 | 6.775489 | 7.725502 | 10.335975 | 10.939236 | 7.568272 | 10.111329 | 9.938277 | 7.496910 | 7.637679 | 7.712738 | NA | NA | NA |
100pmol | 2 | 10.71487 | 12.159989 | 10.052500 | 8.659134 | 9.121174 | 8.968803 | 7.530568 | 8.294768 | 16.76875 | 13.37929 | 13.95131 | NA | 9.167541 | 10.121893 | 10.338709 | 8.086487 | 7.954448 | 10.011280 | 10.512493 | 7.735480 | 8.943620 | 10.21241 | 14.73246 | 13.76076 | 14.54854 | 9.322413 | 10.073449 | 8.509870 | 7.008875 | 7.002919 | 11.73674 | 14.70683 | 7.279609 | 8.445472 | 10.058467 | 10.559522 | 7.265925 | 10.348343 | 8.602358 | 6.939530 | 9.553050 | 7.950604 | 10.471813 | NA | NA |
100pmol | 3 | 10.46639 | 12.190792 | 11.088689 | 8.103769 | 9.127531 | NA | 6.797573 | 8.969260 | 16.69347 | 13.38042 | 13.87481 | NA | 9.260677 | 10.257840 | 9.885818 | 8.275007 | 8.080432 | 10.820332 | 14.395759 | 7.649245 | 8.418693 | 10.26143 | 14.69847 | 13.74438 | 14.48667 | 9.107695 | 10.200628 | 8.317577 | NA | 6.888870 | 11.77277 | 14.67310 | 6.520759 | 8.318009 | 9.970591 | 10.390675 | 7.161124 | 10.401629 | 8.748640 | 7.176720 | 7.418964 | 7.793871 | 10.883458 | 9.706811 | NA |
100pmol | 4 | 10.66221 | 11.902450 | 11.010415 | 8.588923 | 9.124371 | 8.721258 | 6.397005 | 8.386462 | 16.67235 | 13.40449 | 13.88803 | NA | 9.792043 | 10.165829 | 10.092467 | NA | 7.841731 | 9.816296 | 14.485405 | 7.911680 | 8.707996 | 10.21790 | 14.67556 | 13.76742 | 14.44549 | 8.776573 | 10.183221 | 8.457541 | NA | 7.107332 | 11.76563 | 14.66124 | 6.463718 | 7.766514 | 8.936074 | 10.749752 | 6.643218 | 9.952253 | 8.620437 | 7.071718 | 7.181633 | 7.757381 | 10.653061 | 9.892252 | 9.835011 |
200pmol | 1 | 10.56298 | 12.047141 | 10.969287 | 8.402065 | 7.288615 | 8.509940 | 7.197741 | NA | 16.73721 | 14.25302 | 14.39727 | 11.11126 | 8.941866 | 10.283605 | 10.077367 | 8.077610 | 7.977841 | 10.506136 | 14.365875 | 7.806321 | 8.293637 | 10.20088 | 15.57144 | 14.57129 | 15.33157 | 8.510688 | 10.232523 | 8.590645 | 6.031341 | 7.239260 | 11.79502 | 14.68066 | 6.343792 | 7.535170 | 9.876348 | 11.069962 | 7.034874 | 8.573870 | 8.746924 | 7.903880 | 7.180345 | 8.730611 | 11.190966 | NA | 9.710589 |
200pmol | 2 | 10.53186 | 11.771837 | 11.088757 | NA | 9.104358 | NA | 7.085164 | 8.208757 | 16.79052 | 14.29292 | 14.48740 | NA | 9.387564 | 10.060966 | NA | 8.157292 | 8.019591 | 10.035834 | 14.652848 | 7.572173 | 8.822824 | 10.25298 | 15.60578 | 14.59469 | 15.38867 | 9.307350 | 10.269781 | 8.693930 | NA | 7.085982 | 11.84818 | 14.75441 | 6.120292 | 7.968111 | 9.994960 | 9.503153 | 6.860543 | 9.776460 | 8.775531 | 7.059936 | 7.330260 | 7.277041 | NA | 10.041446 | 9.966576 |
200pmol | 3 | 10.60347 | 8.477484 | 11.155251 | 8.568416 | 9.140283 | 8.954421 | 6.690756 | 8.888482 | 16.80274 | 14.34340 | 14.33875 | 10.50303 | 9.196543 | 10.428428 | 10.222608 | 8.188220 | 7.525047 | 10.134101 | NA | 7.447663 | 8.755449 | 10.24829 | 15.62983 | 14.59826 | 15.38886 | 8.802237 | 10.080459 | 8.482341 | NA | 7.011984 | 11.78574 | 14.72607 | 5.956155 | 7.094894 | 10.194229 | 10.710537 | 6.776144 | 9.874052 | 9.152012 | 8.302428 | 7.197231 | 7.632834 | 10.449137 | 9.709462 | 9.439995 |
200pmol | 4 | 10.57897 | 8.454127 | 11.118493 | 8.682375 | 8.194285 | 9.028272 | 6.572711 | 8.315126 | 16.82923 | 14.28224 | 14.43524 | NA | 9.074329 | 9.850693 | 10.123326 | 8.334982 | 6.609617 | 9.902441 | NA | 7.364369 | NA | 10.20847 | 15.71383 | 14.68721 | 15.40141 | 9.042105 | 10.131013 | 8.419983 | NA | 6.922516 | 11.87336 | 14.74952 | 6.137395 | 7.638402 | 9.960103 | 10.728447 | 6.969417 | 9.256541 | 8.984393 | 7.675486 | 7.339503 | 8.556645 | 9.807216 | NA | NA |
50pmol | 1 | 10.53159 | 9.132855 | 7.569305 | 8.045720 | 8.271192 | NA | 9.228590 | 7.587860 | 16.84602 | 12.72244 | 13.57268 | NA | 9.216503 | 9.812981 | 10.183775 | 8.187071 | 7.461197 | 9.276601 | 13.784136 | 7.253131 | 8.191030 | 10.21255 | 14.00214 | 12.80775 | 13.84634 | 11.543379 | 10.008055 | 8.172313 | 9.799682 | 7.019571 | 11.79277 | 14.64773 | 9.726148 | 7.192825 | NA | 8.849818 | 6.440419 | 8.545470 | 5.642106 | 7.884416 | 4.387496 | 7.153265 | NA | NA | NA |
50pmol | 2 | 10.53736 | NA | 10.513980 | 8.347621 | 8.456285 | NA | 7.992943 | 8.269956 | 16.74828 | 12.71449 | 13.61215 | NA | 9.187083 | 9.054498 | 10.128672 | 8.165500 | 6.694638 | 9.730023 | 14.604574 | 7.135959 | 8.501088 | 10.29986 | 14.00533 | 12.92880 | 13.84713 | 9.743997 | NA | 8.497645 | 7.484646 | 7.382746 | 11.78545 | 14.68558 | 7.699266 | 7.371963 | 8.051031 | 10.210618 | 6.553276 | 9.585343 | 8.934897 | 7.199104 | 6.466231 | 6.666879 | NA | NA | NA |
50pmol | 3 | 10.52018 | 5.409885 | 10.414587 | 6.392210 | 8.211960 | 8.629371 | 8.010051 | 8.125402 | 16.68880 | 12.59173 | 13.54185 | 10.42381 | 9.154545 | NA | 10.059451 | 8.200124 | 6.467420 | 9.528985 | 9.380464 | NA | NA | 10.19961 | 14.02047 | 12.85969 | 13.78955 | 9.839974 | 9.954989 | 8.320282 | 7.007159 | 6.915251 | 11.75573 | 14.66720 | 7.669944 | 7.050670 | 8.187233 | 9.865682 | 6.547034 | 9.095644 | 8.043669 | 8.290176 | 6.805857 | 6.417115 | NA | NA | NA |
50pmol | 4 | 10.54837 | 9.856548 | 10.487397 | 8.106476 | 7.921629 | 8.733797 | 7.627267 | 8.578971 | 16.75298 | 12.55756 | 13.55333 | NA | NA | 9.212217 | 10.346652 | 8.406582 | 4.913458 | 9.770142 | 13.939516 | 7.154078 | 8.187412 | 10.16831 | 13.99104 | 12.88104 | 13.84928 | 9.792434 | 10.153070 | NA | 7.253251 | NA | 11.80386 | 14.66456 | 7.573424 | 7.782606 | 9.402638 | 9.961304 | 4.980612 | 9.031966 | 8.814051 | 7.359200 | 7.169527 | 7.079907 | NA | NA | NA |
Keep in mind that if you search for only “CON”, every entry with CON somewhere in its text string will be selected too. This is why the contaminants database uses two underscores to set off the identifier tag (CON__), so you can distinguish between contaminants and proteins with names like “condensin” or “ubiquitin-conjugating” or “domain-containing”.
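The difference between the two search terms can be checked with base R pattern matching. The sketch below is only an illustration: the identifiers are made up for this example and are not taken from the toy data set, and joining several patterns with "|" is an assumption about how a vector of search terms could be combined, not a description of what filterOutIn() does internally.

```r
# Illustrative identifiers (made up for this sketch, not from the toy data set)
ids <- c("CON__ALBU_BOVIN", "CON__TRYP_PIG", "DECOY_ZN840_HUMAN",
         "CONA_CANEN", "ZN840_HUMAN")

# A bare "CON" also matches ordinary identifiers that contain those letters
hitsLoose <- grepl("CON", ids)
print(ids[hitsLoose])

# "CON__" only matches entries that carry the contaminant tag
hitsTagged <- grepl("CON__", ids, fixed = TRUE)
print(ids[hitsTagged])

# Assumption: several search terms combined into one alternation pattern
pattern <- paste(c("DECOY", "CON__"), collapse = "|")
hitsCombined <- grepl(pattern, ids)
print(ids[hitsCombined])
```

Running a quick check like this on your own column names before filtering shows exactly which entries a search term would catch.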
If saveRm = TRUE, the filtered-out data (in the first example,
“ALBU_BOVIN” plus every identifier containing “HUMAN”) will be saved
as a .csv file named filtered_out_data.csv in the current working
directory, and you can inspect this file to see what was removed.
Case 2. Keep the proteins specified by the user in this step and remove everything else.
If we set removeList to FALSE, running this code will
remove everything you didn’t specify and keep only the entries that
matched your search terms.
filterOutIn(dataTran, listName = "ALBU_BOVIN", regexName = "HUMAN",
            removeList = FALSE)
R.Condition | R.Replicate | NUD4B_HUMAN | A0A7P0T808_HUMAN | A0A8I5KU53_HUMAN | ZN840_HUMAN | CC85C_HUMAN | TMC5B_HUMAN | C9JEV0_HUMAN | C9JNU9_HUMAN | ALBU_BOVIN | F8W0H2_HUMAN | H0Y7V7_HUMAN | H0YD14_HUMAN | H3BUF6_HUMAN | H7C1W4_HUMAN | H7C3M7_HUMAN | TCPR2_HUMAN | TLR3_HUMAN | LRIG2_HUMAN | RAB3D_HUMAN | CYTA_HUMAN | KPCB_HUMAN | LIPL_HUMAN | PIP_HUMAN | CO6_HUMAN | BGAL_HUMAN | SYTC_HUMAN | CASPE_HUMAN | DCAF6_HUMAN | DALD3_HUMAN | HGNAT_HUMAN | RFFL_HUMAN | RN185_HUMAN | ZN462_HUMAN | ALKB7_HUMAN | POLK_HUMAN | ACAD8_HUMAN | A0A7I2PK40_HUMAN | NBDY_HUMAN | H0Y5R1_HUMAN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100pmol | 1 | 10.59617 | 11.629505 | 11.461371 | 8.315348 | 8.952781 | 8.833937 | 7.736180 | 7.889538 | 16.76292 | 9.299651 | 10.357346 | 10.321521 | 8.132535 | 8.226486 | 10.173123 | 14.006782 | 7.486384 | 9.011536 | 10.26981 | 9.262305 | 9.952103 | 8.781496 | 7.047859 | 7.234610 | 11.80494 | 14.71384 | 6.775489 | 7.725502 | 10.335975 | 10.939236 | 7.568272 | 10.111329 | 9.938277 | 7.496910 | 7.637679 | 7.712738 | NA | NA | NA |
100pmol | 2 | 10.71487 | 12.159989 | 10.052500 | 8.659134 | 9.121174 | 8.968803 | 7.530568 | 8.294768 | 16.76875 | 9.167541 | 10.121893 | 10.338709 | 8.086487 | 7.954448 | 10.011280 | 10.512493 | 7.735480 | 8.943620 | 10.21241 | 9.322413 | 10.073449 | 8.509870 | 7.008875 | 7.002919 | 11.73674 | 14.70683 | 7.279609 | 8.445472 | 10.058467 | 10.559522 | 7.265925 | 10.348343 | 8.602358 | 6.939530 | 9.553050 | 7.950604 | 10.471813 | NA | NA |
100pmol | 3 | 10.46639 | 12.190792 | 11.088689 | 8.103769 | 9.127531 | NA | 6.797573 | 8.969260 | 16.69347 | 9.260677 | 10.257840 | 9.885818 | 8.275007 | 8.080432 | 10.820332 | 14.395759 | 7.649245 | 8.418693 | 10.26143 | 9.107695 | 10.200628 | 8.317577 | NA | 6.888870 | 11.77277 | 14.67310 | 6.520759 | 8.318009 | 9.970591 | 10.390675 | 7.161124 | 10.401629 | 8.748640 | 7.176720 | 7.418964 | 7.793871 | 10.883458 | 9.706811 | NA |
100pmol | 4 | 10.66221 | 11.902450 | 11.010415 | 8.588923 | 9.124371 | 8.721258 | 6.397005 | 8.386462 | 16.67235 | 9.792043 | 10.165829 | 10.092467 | NA | 7.841731 | 9.816296 | 14.485405 | 7.911680 | 8.707996 | 10.21790 | 8.776573 | 10.183221 | 8.457541 | NA | 7.107332 | 11.76563 | 14.66124 | 6.463718 | 7.766514 | 8.936074 | 10.749752 | 6.643218 | 9.952253 | 8.620437 | 7.071718 | 7.181633 | 7.757381 | 10.653061 | 9.892252 | 9.835011 |
200pmol | 1 | 10.56298 | 12.047141 | 10.969287 | 8.402065 | 7.288615 | 8.509940 | 7.197741 | NA | 16.73721 | 8.941866 | 10.283605 | 10.077367 | 8.077610 | 7.977841 | 10.506136 | 14.365875 | 7.806321 | 8.293637 | 10.20088 | 8.510688 | 10.232523 | 8.590645 | 6.031341 | 7.239260 | 11.79502 | 14.68066 | 6.343792 | 7.535170 | 9.876348 | 11.069962 | 7.034874 | 8.573870 | 8.746924 | 7.903880 | 7.180345 | 8.730611 | 11.190966 | NA | 9.710589 |
200pmol | 2 | 10.53186 | 11.771837 | 11.088757 | NA | 9.104358 | NA | 7.085164 | 8.208757 | 16.79052 | 9.387564 | 10.060966 | NA | 8.157292 | 8.019591 | 10.035834 | 14.652848 | 7.572173 | 8.822824 | 10.25298 | 9.307350 | 10.269781 | 8.693930 | NA | 7.085982 | 11.84818 | 14.75441 | 6.120292 | 7.968111 | 9.994960 | 9.503153 | 6.860543 | 9.776460 | 8.775531 | 7.059936 | 7.330260 | 7.277041 | NA | 10.041446 | 9.966576 |
200pmol | 3 | 10.60347 | 8.477484 | 11.155251 | 8.568416 | 9.140283 | 8.954421 | 6.690756 | 8.888482 | 16.80274 | 9.196543 | 10.428428 | 10.222608 | 8.188220 | 7.525047 | 10.134101 | NA | 7.447663 | 8.755449 | 10.24829 | 8.802237 | 10.080459 | 8.482341 | NA | 7.011984 | 11.78574 | 14.72607 | 5.956155 | 7.094894 | 10.194229 | 10.710537 | 6.776144 | 9.874052 | 9.152012 | 8.302428 | 7.197231 | 7.632834 | 10.449137 | 9.709462 | 9.439995 |
200pmol | 4 | 10.57897 | 8.454127 | 11.118493 | 8.682375 | 8.194285 | 9.028272 | 6.572711 | 8.315126 | 16.82923 | 9.074329 | 9.850693 | 10.123326 | 8.334982 | 6.609617 | 9.902441 | NA | 7.364369 | NA | 10.20847 | 9.042105 | 10.131013 | 8.419983 | NA | 6.922516 | 11.87336 | 14.74952 | 6.137395 | 7.638402 | 9.960103 | 10.728447 | 6.969417 | 9.256541 | 8.984393 | 7.675486 | 7.339503 | 8.556645 | 9.807216 | NA | NA |
50pmol | 1 | 10.53159 | 9.132855 | 7.569305 | 8.045720 | 8.271192 | NA | 9.228590 | 7.587860 | 16.84602 | 9.216503 | 9.812981 | 10.183775 | 8.187071 | 7.461197 | 9.276601 | 13.784136 | 7.253131 | 8.191030 | 10.21255 | 11.543379 | 10.008055 | 8.172313 | 9.799682 | 7.019571 | 11.79277 | 14.64773 | 9.726148 | 7.192825 | NA | 8.849818 | 6.440419 | 8.545470 | 5.642106 | 7.884416 | 4.387496 | 7.153265 | NA | NA | NA |
50pmol | 2 | 10.53736 | NA | 10.513980 | 8.347621 | 8.456285 | NA | 7.992943 | 8.269956 | 16.74828 | 9.187083 | 9.054498 | 10.128672 | 8.165500 | 6.694638 | 9.730023 | 14.604574 | 7.135959 | 8.501088 | 10.29986 | 9.743997 | NA | 8.497645 | 7.484646 | 7.382746 | 11.78545 | 14.68558 | 7.699266 | 7.371963 | 8.051031 | 10.210618 | 6.553276 | 9.585343 | 8.934897 | 7.199104 | 6.466231 | 6.666879 | NA | NA | NA |
50pmol | 3 | 10.52018 | 5.409885 | 10.414587 | 6.392210 | 8.211960 | 8.629371 | 8.010051 | 8.125402 | 16.68880 | 9.154545 | NA | 10.059451 | 8.200124 | 6.467420 | 9.528985 | 9.380464 | NA | NA | 10.19961 | 9.839974 | 9.954989 | 8.320282 | 7.007159 | 6.915251 | 11.75573 | 14.66720 | 7.669944 | 7.050670 | 8.187233 | 9.865682 | 6.547034 | 9.095644 | 8.043669 | 8.290176 | 6.805857 | 6.417115 | NA | NA | NA |
50pmol | 4 | 10.54837 | 9.856548 | 10.487397 | 8.106476 | 7.921629 | 8.733797 | 7.627267 | 8.578971 | 16.75298 | NA | 9.212217 | 10.346652 | 8.406582 | 4.913458 | 9.770142 | 13.939516 | 7.154078 | 8.187412 | 10.16831 | 9.792434 | 10.153070 | NA | 7.253251 | NA | 11.80386 | 14.66456 | 7.573424 | 7.782606 | 9.402638 | 9.961304 | 4.980612 | 9.031966 | 8.814051 | 7.359200 | 7.169527 | 7.079907 | NA | NA | NA |
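Cases 1 and 2 use the same selection and differ only in whether the matched proteins are dropped or kept. The base R sketch below mimics that logic on a handful of column names copied from the output above. The helper selectColumns() and its idCols argument are hypothetical names introduced for this illustration; they are not part of msDiaLogue, and the actual implementation of filterOutIn() may differ.

```r
# Column names copied from the toy output above; the first two are sample annotations
cols <- c("R.Condition", "R.Replicate", "NUD4B_HUMAN", "ALBU_BOVIN",
          "CYC_BOVIN", "ZN840_HUMAN", "ADH1_YEAST")

# selectColumns() is a hypothetical helper for illustration, not part of msDiaLogue
selectColumns <- function(cols, listName = character(0), regexName = character(0),
                          removeList = TRUE,
                          idCols = c("R.Condition", "R.Replicate")) {
  isId <- cols %in% idCols                 # annotation columns are always kept
  exact <- cols %in% listName              # exact identifier match
  partial <- rep(FALSE, length(cols))
  if (length(regexName) > 0) {
    # identifier contains at least one of the patterns
    partial <- grepl(paste(regexName, collapse = "|"), cols)
  }
  matched <- (exact | partial) & !isId     # union of both criteria
  if (removeList) cols[isId | !matched] else cols[isId | matched]
}

# Case 1: drop the matched proteins
keptCase1 <- selectColumns(cols, listName = "ALBU_BOVIN", regexName = "HUMAN",
                           removeList = TRUE)
print(keptCase1)

# Case 2: keep only the matched proteins
keptCase2 <- selectColumns(cols, listName = "ALBU_BOVIN", regexName = "HUMAN",
                           removeList = FALSE)
print(keptCase2)
```

Apart from the two annotation columns, the two results are complementary: every protein column appears in exactly one of them.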
Extension
Besides filtering on protein names, the function filterProtein()
offers similar functionality for filtering proteins by additional
protein information. The available fields depend on the data source:
For Spectronaut: “PG.Genes”, “PG.ProteinAccessions”, “PG.ProteinDescriptions”, and “PG.ProteinNames”.
For Scaffold: “ProteinDescriptions”, “AccessionNumber”, and “AlternateID”.
filterProtein(dataTran, proteinInformation = "preprocess_protein_information.csv",
              text = c("Putative zinc finger protein 840", "Bovine serum albumin"),
              by = "PG.ProteinDescriptions",
              removeList = FALSE)
where proteinInformation is the name of the protein information file
that preprocessing() generates automatically. In this case, the
proteins whose "PG.ProteinDescriptions" match “Putative zinc finger
protein 840” or “Bovine serum albumin” will be kept. Note that the
search value text is matched by exact equality.
R.Condition | R.Replicate | ZN840_HUMAN | ALBU_BOVIN |
---|---|---|---|
100pmol | 1 | 8.315348 | 16.76292 |
100pmol | 2 | 8.659134 | 16.76875 |
100pmol | 3 | 8.103769 | 16.69347 |
100pmol | 4 | 8.588923 | 16.67235 |
200pmol | 1 | 8.402065 | 16.73721 |
200pmol | 2 | NA | 16.79052 |
200pmol | 3 | 8.568416 | 16.80274 |
200pmol | 4 | 8.682375 | 16.82923 |
50pmol | 1 | 8.045720 | 16.84602 |
50pmol | 2 | 8.347621 | 16.74828 |
50pmol | 3 | 6.392210 | 16.68880 |
50pmol | 4 | 8.106476 | 16.75298 |
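To see what exact equality means in practice, the following base R sketch compares an exact match with a substring match on a small stand-in for the protein information table. The column names follow the Spectronaut fields listed above, but the table itself is an assumption made for illustration (the “Cytochrome c” row in particular is illustrative), and the code does not reproduce the internals of filterProtein().

```r
# Toy protein information table; rows are illustrative
proteinInfo <- data.frame(
  PG.ProteinNames = c("ZN840_HUMAN", "ALBU_BOVIN", "CYC_BOVIN"),
  PG.ProteinDescriptions = c("Putative zinc finger protein 840",
                             "Bovine serum albumin", "Cytochrome c"),
  stringsAsFactors = FALSE
)

# Exact equality: the whole description must equal the search value
exactHit <- proteinInfo$PG.ProteinDescriptions %in% "Bovine serum albumin"
print(proteinInfo$PG.ProteinNames[exactHit])

# A fragment of the description does not satisfy exact equality
fragmentHit <- proteinInfo$PG.ProteinDescriptions %in% "serum albumin"
print(any(fragmentHit))

# The same fragment would be found by a substring search
substringHit <- grepl("serum albumin", proteinInfo$PG.ProteinDescriptions,
                      fixed = TRUE)
print(proteinInfo$PG.ProteinNames[substringHit])
```

In other words, the search value passed as text should be copied in full from the protein information file rather than abbreviated.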
Details
In some cases, a researcher may wish to filter out a specific protein
or proteins from the dataset. The most common instance of this would be
proteins identified from the common contaminants database, where we
don’t want something like BSA to be matched to a human protein because
the search algorithm didn’t have the correct option available, but we
don’t actually care about BSA itself and want to leave it out of our
visualization. Other examples include filtering out entries from the
decoy database (present only in Scaffold files, not in Spectronaut
files), or a mixed-species experiment where the researcher
wants to evaluate data from only one species at a time. This step allows
you to set aside specific proteins from downstream analysis, using
either an exact-match identifier (the listName argument)
or text-containing identifiers (the regexName argument).
If both listName and regexName are defined, the proteins to be selected
or removed are the union of the two sets.
Keep in mind: Removal of any proteins, including common contaminants, will affect any global calculations performed after this step (such as normalization). This should not be done without a clear understanding of how this will affect your results.
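The effect on global calculations can be illustrated with a few values from the tables above (sample 100pmol, replicate 1). The sketch uses median centering purely as a simple stand-in for a global normalization, and it uses only five proteins to keep the arithmetic visible; both choices are assumptions made for this example, not a description of the normalization methods in msDiaLogue.

```r
# Log2 intensities for sample 100pmol, replicate 1, copied from the tables above
x <- c(ALBU_BOVIN = 16.76292, CYC_BOVIN = 13.39043, TRFE_BOVIN = 13.88204,
       ADH1_YEAST = 14.75235, ZN840_HUMAN = 8.315348)

# Median centering on the full set of proteins
centerAll <- x - median(x)

# The same sample after filtering out ALBU_BOVIN
xFiltered <- x[names(x) != "ALBU_BOVIN"]
centerFiltered <- xFiltered - median(xFiltered)

# The normalized value of an unrelated protein changes with the filtering step
shift <- unname(centerAll["CYC_BOVIN"] - centerFiltered["CYC_BOVIN"])
print(shift)
```

Because the sample median moves once a protein is removed, every remaining protein in that sample receives a different centered value, which is why filtering decisions should be made before, and with an eye on, any global normalization.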