10/24 Dear motd. I need some perl to randomize the lines in a text file.
I know this is easy but I have no perl-fu. Please help.
\_ @lines = <FILEHANDLE>;
print splice(@lines,rand(@lines),1) while @lines;
\_ Let's hope your file is not bigger than the amount of virtual
memory of the machine.
\_ Yes, since motd so often grows to fill memory on soda....
\_ Let's... How completely pointless. This was a 2 minute perl
snippet, fer chrissake...
\_ Maybe I should show what I have and you can tell me where I'm
going wrong. This uses a Fisher-Yates shuffle to be unbiased:
#!/usr/bin/perl
open(FILE, "+< $_");
while (<FILE>) {
    push(@lines, $_);
}
while (@lines) {
    print splice(@lines,rand(@lines)%@arraylength);
}
@reordered = fisher_yates_shuffle(@lines);
foreach (@reordered) {
    print $_;
}
sub fisher_yates_shuffle {
    my $list = shift;    # this is an array reference
    my $i = @{$list};
    return unless $i;
    while ( --$i ) {
        my $j = int rand( $i + 1 );
        @{$list}[$i,$j] = @{$list}[$j,$i];
    }
}
\_ Your function isn't returning the array. @reordered is being
set to "0" (The return value of the "while"). Change
"foreach(@reordered)" to "foreach(@lines)" and this code
should work.
\_ Careful about using rand with %. You can get into distribution
problems there.
\_ The % doesn't do anything; perl rand called that way can
never return >= @lines.
\_ Upgraded as per dbushong. I didn't trust perl enough.
print splice(@lines, rand(@lines), 1) while (@lines);
\_ Where's $_ coming from in the open() line?
\_ #!/usr/bin/perl
die "usage: $0 file\n" if @ARGV != 1;
open(my $fh, '<', $ARGV[0]);
my @offsets = (0);
push(@offsets, tell($fh)) while <$fh>;
pop @offsets;
while (@offsets) {
    seek($fh, splice(@offsets, rand(@offsets), 1), 0);
    print scalar <$fh>;
}
close($fh);
## this is how i'd do it. --dbushong
\_ My (extremely short) version:
/msg dbushong hey, can you write a solution to that motd thing?
\- i have had problems using perl rand to do this on files with
more than 32k lines. you may want to test this out ... maybe
rand returns more values than it used to but i had to re-write
this for larger files ... this was +5yrs ago. if you want the
codes mail me. oh also for larger files performance can be an
issue. [i dont mean really large files ... i typically was
operating on about 130k entries ... 2x/16netblocks of addresses]
--psb
\_ Actually, it looks like it's not perl rand, it's just
manipulating the slices efficiently on large arrays that's
making things suck. I'll ponder. --dbushong
\_ OK, redone to use fisher-yates as above. Now it only takes
22 seconds on soda on /usr/share/dict/words: --dbushong
#!/usr/bin/perl
die "usage: $0 file\n" if @ARGV != 1;
open(my $fh, '<', $ARGV[0]);
my @offsets = (0);
push(@offsets, tell($fh)) while <$fh>;
pop @offsets;    # drop the EOF offset
# Fisher-Yates over the offsets
for (my $i = @offsets - 1; $i > 0; $i--) {
    my $j = int(rand($i + 1));
    @offsets[$i,$j] = @offsets[$j,$i] if $i != $j;
}
for (@offsets) {
    seek($fh, $_, 0);
    print scalar <$fh>;
}
close($fh);
\- hello my codes take about 5-6 sec on /usr/dict/words
on sloda but the sloda numbers are not that stable
it is interesting to see the memory growth variations
of the different approaches. ok tnx.
this time i didnt check the quality of the shuffle.
SSH-soda{12}[~/bin]% while 1
loop==> ./rand1.pl /usr/share/dict/words > /dev/null
loop==> end
0:05.46sec, [3.961u 0.100s 74.3%], [10080Kbmax 0pf+0#swap]
0:06.56sec, [3.949u 0.146s 62.1%], [10078Kbmax 0pf+0#swap]
0:05.42sec, [3.953u 0.108s 74.7%], [10080Kbmax 0pf+0#swap]
0:06.70sec, [3.921u 0.172s 61.0%], [10082Kbmax 0pf+0#swap]
0:08.29sec, [4.041u 0.182s 50.9%], [10074Kbmax 0pf+0#swap]
0:05.19sec, [3.870u 0.185s 78.0%], [10074Kbmax 0pf+0#swap]
0:04.79sec, [3.830u 0.176s 83.5%], [10078Kbmax 0pf+0#swap]
0:04.55sec, [3.902u 0.159s 89.0%], [10074Kbmax 0pf+0#swap]
0:06.07sec, [3.917u 0.182s 67.3%], [10076Kbmax 0pf+0#swap]
\_ How would an Intel Critical Asset randomize a file?