Tuesday, February 3, 2015

Perl: Skills for reading text files in Perl



Abstract: How to open a file handle to access text contents.


A central operation in data processing is accessing files. Perl is a scripting language often described as a ‘glue language’, and plain-text formats are very common in bioinformatics data analysis, so file-processing skills are essential for manipulating text files.

Open a file for reading and write to another file
These are the basic file handle operations in Perl. The more complicated operations below build on this code.
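open my($IN), "<", "/home/yuan/data_2/test/test_R1.fastq" or die "Cannot open input file: $!"; # open a read-only file handle
open my($OUT), ">", "/home/yuan/data_2/test/test.txt" or die "Cannot open output file: $!"; # open a file handle for writing (an existing file is overwritten)
while(my $line=<$IN>){ # read the lines one by one
    chomp($line); # remove the trailing newline
    print $OUT $line, "\n"; # write the line to the output file
}
close($IN); # close the input file handle
close($OUT) or die "Cannot close output file: $!"; # close the output file handle and catch write errors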
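
The same read-and-write loop can be wrapped in a subroutine so that the paths are passed in as arguments. This is only a minimal sketch: the name copy_lines and its returned line count are assumptions made for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# copy_lines() is a hypothetical helper name used only for this sketch.
# It copies a text file line by line and returns the number of lines copied.
sub copy_lines {
    my ($in_path, $out_path) = @_;

    open my $IN,  "<", $in_path  or die "Cannot read $in_path: $!";
    open my $OUT, ">", $out_path or die "Cannot write $out_path: $!";

    my $count = 0;
    while (my $line = <$IN>) {
        chomp $line;               # drop the newline, if there is one
        print {$OUT} $line, "\n";  # every output line ends with a newline
        $count++;
    }

    close $IN;
    close $OUT or die "Cannot close $out_path: $!";
    return $count;
}
```

A call such as copy_lines("/home/yuan/data_2/test/test_R1.fastq", "/home/yuan/data_2/test/test.txt") would then do the same work as the snippet above.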

The first line and the last line in the file
open my($IN), "<", "/home/yuan/data_2/test/test_R1.fastq" or die "Cannot open input file: $!";
my $first=<$IN>; # read the first line
print $first;
while(my $line=<$IN>){
    print $line if eof($IN); # eof() is true once the last line has been read
}
close($IN);
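
A variant that does not rely on eof() is to remember the most recent line while looping. The sketch below is an illustration under that assumption; the name first_and_last is hypothetical. For a one-line file it returns that line as both the first and the last line, and for an empty file both values are undefined.

```perl
use strict;
use warnings;

# first_and_last() is a hypothetical helper name used only for this sketch.
# It returns the first and the last line of a text file in a single pass.
sub first_and_last {
    my ($path) = @_;
    open my $IN, "<", $path or die "Cannot read $path: $!";

    my ($first, $last);
    while (my $line = <$IN>) {
        $first = $line unless defined $first;  # set only once
        $last  = $line;                        # overwritten on every line
    }

    close $IN;
    return ($first, $last);
}
```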

Read multiple lines per loop iteration
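For example, each read record in a FASTQ file spans four lines.
open my($IN), "<", "/home/yuan/data_2/test/test_R1.fastq" or die "Cannot open input file: $!";
# read four lines at a time; && stops at the first failed read, and defined() keeps a line such as "0" from ending the loop
while(defined(my $L1=<$IN>) && defined(my $L2=<$IN>) && defined(my $L3=<$IN>) && defined(my $L4=<$IN>)){
    print $L2; # print only the second line, which holds the sequence
}
close($IN);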
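
The four-line read can also be moved into a subroutine that hands back one record at a time. This is a sketch that assumes the plain four-line FASTQ layout (header, sequence, separator, quality); the name next_fastq_record and the hash keys id, seq, sep and qual are hypothetical choices made for illustration.

```perl
use strict;
use warnings;

# next_fastq_record() is a hypothetical helper name used only for this sketch.
# It reads four lines from an open file handle and returns a hash reference,
# or nothing when the file ends or the final record is incomplete.
sub next_fastq_record {
    my ($fh) = @_;

    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        return unless defined $line;  # end of file or truncated record
        chomp $line;
        push @lines, $line;
    }

    my %record = (
        id   => $lines[0],  # header line starting with '@'
        seq  => $lines[1],  # nucleotide sequence
        sep  => $lines[2],  # separator line starting with '+'
        qual => $lines[3],  # quality string
    );
    return \%record;
}
```

A loop such as while(my $rec=next_fastq_record($IN)){ print $rec->{seq}, "\n"; } would then print only the sequences.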

Read lines into array
Perl code that slurps a whole file at once is more concise than reading the lines one by one. However, it is not a good choice for large files, because the entire content is held in memory.
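#read all text contents at a time
#first method
open my($IN), "<", "/home/yuan/data_2/test/test_R1.fastq" or die "Cannot open input file: $!";
my @array=<$IN>; # in list context, every line of the file is read into the array
close($IN);

#second method (File::Slurp is a CPAN module, not part of core Perl; use this instead of the first method, not together with it)
use File::Slurp;
my @array=File::Slurp::read_file("/home/yuan/data_2/test/test_R1.fastq");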

#export
#array loop from the first to the last
foreach my $line(@array){
    print $line;
}

# array loop from the last to the first
foreach my $line(reverse @array){
    print $line;
}
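
The first method can be wrapped in a small subroutine that uses only core Perl and strips the newlines in the same step. This is a sketch; the name slurp_lines is hypothetical, and the memory caveat for large files applies to it as well.

```perl
use strict;
use warnings;

# slurp_lines() is a hypothetical helper name used only for this sketch.
# It returns all lines of a text file as a list, without trailing newlines.
sub slurp_lines {
    my ($path) = @_;
    open my $IN, "<", $path or die "Cannot read $path: $!";
    chomp(my @lines = <$IN>);  # read every line, then strip the newlines
    close $IN;
    return @lines;
}
```

Because the newlines are removed, a loop over the result has to print them again, for example print "$_\n" for reverse slurp_lines($path);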


Writing date: 2014.01.21


