: ) wonderful world ( :

the metasyntactic variable

Archive for the ‘linux’ Category

join variable length multiline data entries with sed

% cat test.txt
data1-1
data1-2
data1-3
closing-form
data2-1
data2-2
data2-3
data2-4
closing-form
data3-1
data3-2
closing-form
% cat test.txt | sed -n -e ':x;N;/\nclosing-form$/!bx;s/\n/;/g;p'
data1-1;data1-2;data1-3;closing-form
data2-1;data2-2;data2-3;data2-4;closing-form
data3-1;data3-2;closing-form
%

A precondition is that there’s no empty data entry. If there is, you can introduce helper entries:

% cat testt.txt
data1-1
data1-2
closing-form
closing-form
closing-form
data3-1
data3-2
closing-form
% cat testt.txt | sed '/^closing-form$/ihelper-entry'
data1-1
data1-2
helper-entry
closing-form
helper-entry
closing-form
helper-entry
closing-form
data3-1
data3-2
helper-entry
closing-form
%
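
Putting the two steps together, a minimal sketch (assuming GNU sed, as in the transcripts above, and that no real data line equals helper-entry). The last sed, which drops the helper again, is an addition here and not part of the commands above; join_records is a made-up name:

```shell
# join_records: join the lines of each record (terminated by closing-form)
# with ';', tolerating empty records by way of a temporary helper entry.
# Assumes GNU sed and that "helper-entry" never occurs as real data.
join_records() {
  sed '/^closing-form$/i helper-entry' |
    sed -n -e ':x;N;/\nclosing-form$/!bx;s/\n/;/g;p' |
    sed 's/helper-entry;closing-form$/closing-form/'
}

printf '%s\n' data1-1 data1-2 closing-form closing-form data3-1 closing-form |
  join_records
```

For this input the empty record comes out as a line containing only closing-form.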

Written by grault

November 13, 2009 at 4:19 pm

Posted in command line, linux

manipulate k-th column in file with sed & friends

$ echo $0
bash
$

This means that everything below holds at least when using Bash. One of its important features used here is “Process Substitution”. It is not the most efficient approach (extra processes are spawned), but I just don’t care about that at the moment.

The first use case is to manipulate the k-th column of a file. Let’s see our test file:

$ cat -A if1.txt
a^Ib^Ic^Id^Ie^If^Ig^Ih$
b^Ic^Id^Ie^If^Ig^Ih^Ia$
c^Id^Ie^If^Ig^Ih^Ia^Ib$
d^Ie^If^Ig^Ih^Ia^Ib^Ic$
e^If^Ig^Ih^Ia^Ib^Ic^Id$
f^Ig^Ih^Ia^Ib^Ic^Id^Ie$
g^Ih^Ia^Ib^Ic^Id^Ie^If$
h^Ia^Ib^Ic^Id^Ie^If^Ig$
$ cat if1.txt
a       b       c       d       e       f       g       h
b       c       d       e       f       g       h       a
c       d       e       f       g       h       a       b
d       e       f       g       h       a       b       c
e       f       g       h       a       b       c       d
f       g       h       a       b       c       d       e
g       h       a       b       c       d       e       f
h       a       b       c       d       e       f       g
$

The problem is to search and replace with sed, but only in a given column.
The first solution uses paste and cut.

$ paste <(cut -f-4 if1.txt) <(cut -f5 if1.txt | sed 's/h/x/g') <(cut -f6- if1.txt)
a       b       c       d       e       f       g       h
b       c       d       e       f       g       h       a
c       d       e       f       g       h       a       b
d       e       f       g       x       a       b       c
e       f       g       h       a       b       c       d
f       g       h       a       b       c       d       e
g       h       a       b       c       d       e       f
h       a       b       c       d       e       f       g
$

The second one uses Bash itself: its read builtin can fill an array with the fields read from a line.

$ cat if1.txt | while read -a line; do
> line[4]=$(echo ${line[4]} | sed 's/h/x/')
> echo -e ${line[*]} | sed 's/\ /\t/g'
> done
a       b       c       d       e       f       g       h
b       c       d       e       f       g       h       a
c       d       e       f       g       h       a       b
d       e       f       g       x       a       b       c
e       f       g       h       a       b       c       d
f       g       h       a       b       c       d       e
g       h       a       b       c       d       e       f
h       a       b       c       d       e       f       g
$

Note that this multiline command can be written into a single line as well (obviously by removing the secondary prompt characters).
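
For comparison, the same column-only replacement can be sketched with awk. This is a swapped-in tool, not one of the two approaches above; replace_in_col5 is a made-up name and the sample row is generated on the fly instead of being read from if1.txt:

```shell
# replace_in_col5: replace every "h" with "x", but only in the 5th
# tab-separated column. awk stands in for paste/cut and the read loop.
replace_in_col5() {
  awk -F '\t' -v OFS='\t' '{ gsub(/h/, "x", $5); print }'
}

printf 'd\te\tf\tg\th\ta\tb\tc\n' | replace_in_col5
```

Because the field separator is set explicitly, fields containing spaces stay intact, which the read -a loop would split.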

Now what if you want to do computations based on other columns? Here are the same two approaches applied to that (removing the 2nd and 4th columns and introducing a new 3rd one which holds the sum of the two deleted ones):

$ cat -A if2.txt
a^I1^Id^I2$
b^I3^Ie^I4$
c^I5^If^I6$
$ cat if2.txt
a       1       d       2
b       3       e       4
c       5       f       6
$ paste <(cut -f1,3 if2.txt) <(cut -f2,4 if2.txt | sed 's/$/ + p/' | dc)
a       d       3
b       e       7
c       f       11
$ cat if2.txt | while read -a line; do
> echo -e ${line[0]}\\t${line[2]}\\t$(echo ${line[1]} ${line[3]} + p | dc)
> done
a       d       3
b       e       7
c       f       11
$
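
If dc is not at hand, shell arithmetic can do the addition. A minimal sketch, assuming tab-separated input with integer values in columns 2 and 4; sum_cols and the c1..c4 variable names are made up for the example:

```shell
# sum_cols: keep columns 1 and 3 and append the sum of columns 2 and 4,
# using $(( )) arithmetic instead of dc.
sum_cols() {
  tab=$(printf '\t')
  while IFS="$tab" read -r c1 c2 c3 c4; do
    printf '%s\t%s\t%s\n' "$c1" "$c3" "$((c2 + c4))"
  done
}

printf 'a\t1\td\t2\nb\t3\te\t4\nc\t5\tf\t6\n' | sum_cols
```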

Written by grault

October 15, 2009 at 3:48 pm

multiconnection download with scsh

There’s a server (of a radio station) I download audio files from. The thing is that the bandwidth per connection is limited to ~24Kb/sec nowadays (several years ago there wasn’t any limit). Fetching the file over multiple concurrent connections solves the problem somewhat. Unfortunately the number of connections from a given IP address is also limited, to ~15. Anyway, ~240Kb/sec (when using 10 connections) is much more than 24Kb/sec.

Parts of a file can be obtained with curl. I decided to use the Scheme Shell (scsh) to implement my idea because of its thread support and, being a shell, its close relationship with command line tools.

The solution is a fast hack. Let’s see..

$ ls
getItFast.scm  getItFast.scm~
$ ./getItFast.scm http://someserver/2200.mp3
$ ls
2200.mp3.00  2200.mp3.08  2200.mp3.16  2200.mp3.24  2200.mp3.32  2200.mp3.40
2200.mp3.01  2200.mp3.09  2200.mp3.17  2200.mp3.25  2200.mp3.33  2200.mp3.41
2200.mp3.02  2200.mp3.10  2200.mp3.18  2200.mp3.26  2200.mp3.34  2200.mp3.42
2200.mp3.03  2200.mp3.11  2200.mp3.19  2200.mp3.27  2200.mp3.35  getItFast.scm
2200.mp3.04  2200.mp3.12  2200.mp3.20  2200.mp3.28  2200.mp3.36  getItFast.scm~
2200.mp3.05  2200.mp3.13  2200.mp3.21  2200.mp3.29  2200.mp3.37
2200.mp3.06  2200.mp3.14  2200.mp3.22  2200.mp3.30  2200.mp3.38
2200.mp3.07  2200.mp3.15  2200.mp3.23  2200.mp3.31  2200.mp3.39
$ cat 2200.mp3.* > 2200.mp3
$ rm 2200.mp3.*
$ ls
2200.mp3  getItFast.scm  getItFast.scm~
$ cat getItFast.scm
#!/usr/bin/scsh \
-o placeholders -o threads -o locks -s
!#

; this many threads will be started,
; each of them represents a connection
(define POOL-SIZE 10)

; the length of a chunk in bytes
; (downloaded with one connection)
(define STEP 1000000)

(define URL (argv 1))
(define FNAME (file-name-nondirectory URL))

(define url-content-length
  (lambda (url)
    (string->number
     (cadr ((infix-splitter (rx (+ white)))
            (run/string
             (| (curl -s -S -I ,url)
                (grep "Content-Length"))))))))

(define LENGTH (url-content-length URL))

(define make-queue
  (lambda (data-list)
    (let ((lock (make-lock)))
      (lambda ()
        (let ((re '()))
          (obtain-lock lock)
          (if (null? data-list)
              (set! re '())
              (begin
                (set! re (car data-list))
                (set! data-list (cdr data-list))))
          (release-lock lock)
          re)))))

(define range-string
  (lambda (beg end)
    (let ((begs (number->string beg))
          (ends (number->string end)))
      (string-append begs "-" ends))))

(define get-part
  (lambda (beg end fn)
    (run (curl -o ,fn -s -S -r
               ,(range-string beg end) ,URL))))

; width of the zero-padded number field
; in the file names of the parts
(define PADLEN
  (string-length
   (number->string
    (ceiling
     (/ LENGTH STEP)))))

(define file-counter-string
  (lambda (i)
    (let loop ((s (number->string i)))
      (if (<= PADLEN (string-length s))
          s
          (loop (string-append "0" s))))))

(define counted-file-name
  (lambda (i)
    (string-append FNAME
                   "."
                   (file-counter-string i))))

; this contains the works to do
; (work ~ download a specific chunk)
; e.g. ((0 999999 "foo.mp3.00") (1000000 1999999 "foo.mp3.01") ... )
(define QUEUE
  (make-queue
   (let loop ((work-list '()) (low 0) (upp (- STEP 1)) (counter 0))
     ; valid byte offsets are 0 .. LENGTH-1
     (if (>= low LENGTH)
         work-list
         (loop (cons (list low upp (counted-file-name counter)) work-list)
               (+ upp 1)
               (min (- LENGTH 1) (+ upp STEP))
               (+ counter 1))))))

(define signal-thread-finish
  (lambda (waiter)
    (placeholder-set! waiter #f)))

(define start-worker
  (lambda ()
    (let ((waiter (make-placeholder)))
      (spawn
       (lambda ()
         (let loop ()
           (let ((work (QUEUE)))
             (if (null? work)
                 (signal-thread-finish waiter)
                 (begin
                   (apply get-part work)
                   (loop)))))))
      waiter)))

(let loop ((i POOL-SIZE) (waiters '()))
  (if (= i 0)
      (map placeholder-value waiters)
      (loop (- i 1) (cons (start-worker) waiters))))
$
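
The chunk boundaries that end up in QUEUE can be sketched in plain shell as well. This only illustrates the range arithmetic and is not a port of the script; chunk_ranges is a hypothetical helper, and the ranges are inclusive and end at the last byte (length - 1):

```shell
# chunk_ranges LENGTH STEP: print one inclusive "first-last" byte range
# per chunk, covering bytes 0 .. LENGTH-1.
chunk_ranges() {
  length=$1 step=$2 low=0
  while [ "$low" -lt "$length" ]; do
    upp=$((low + step - 1))
    # clamp the last chunk to the final byte of the file
    [ "$upp" -lt "$length" ] || upp=$((length - 1))
    printf '%s-%s\n' "$low" "$upp"
    low=$((upp + 1))
  done
}

chunk_ranges 2500000 1000000
```

Each printed line has the form curl expects after -r, the same option get-part passes.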

 

Written by grault

August 5, 2009 at 11:41 am

Posted in linux, lisp

join & filter multiline data records with sed

Grouping with sed:

$ cat nrs.txt
01
02
03
04
05
06
07
08
09
10
11
12
$ cat nrs.txt | sed -n -e 'N;s/\n/ /g;p'
01 02
03 04
05 06
07 08
09 10
11 12
$ cat nrs.txt | sed -n -e 'N;N;s/\n/ /g;p'
01 02 03
04 05 06
07 08 09
10 11 12
$ cat nrs.txt | sed -n -e 'N;N;N;s/\n/ /g;p'
01 02 03 04
05 06 07 08
09 10 11 12
$ cat nrs.txt | sed -n -e 'N;N;N;N;N;s/\n/ /g;p'
01 02 03 04 05 06
07 08 09 10 11 12
$
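
One caveat: when the number of lines is not a multiple of the group length, GNU sed stops at the N that finds no next line and, because of -n, the trailing partial group is never printed. A small sketch that guards each N with the `$!` address ("not on the last line") keeps that remainder; group3 is a made-up name:

```shell
# group3: join lines in groups of three; a shorter last group is kept.
# "$!N" only appends the next line when the current one is not the last.
group3() {
  sed -n -e '$!N;$!N;s/\n/ /g;p'
}

printf '%s\n' 01 02 03 04 05 | group3
```

With five input lines this prints "01 02 03" followed by "04 05".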

 

Good for the following task:

$ cat entries.txt
entry-1-data-1
entry-1-data-2
entry-2-data-1
entry-2-data-2
$ cat entries.txt | sed -n -e 'N;s/\n/ /;p'
entry-1-data-1 entry-1-data-2
entry-2-data-1 entry-2-data-2
$

Groups of length two, keeping only the 1st or only the 2nd element of each group:

$ cat nrs.txt | sed -n -e 'p;n'
01
03
05
07
09
11
$ cat nrs.txt | sed -n -e 'n;p'
02
04
06
08
10
12
$

In the general case you have groups of length k and the starting pattern is ‘n;n;…;n’, which contains k-1 copies of the letter n. A p can be placed before, between or after the n-s, which gives exactly k possible positions. If there’s a p in position k0 then the k0-th element of each group will be printed out.
So if you have groups of length 6 and you want the fifth element of each:

$ cat nrs.txt | sed -n -e 'n;n;n;n;p;n'
05
11
$
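
The rule above is mechanical enough to generate the sed program. A minimal sketch, with pick_script as a made-up helper name:

```shell
# pick_script N K: print the sed program that keeps the K-th line
# of every N-line group (meant to be run with sed -n).
pick_script() {
  n=$1 k=$2 script= i=1
  while [ "$i" -le "$n" ]; do
    if [ "$i" -eq "$k" ]; then script="${script}p;"; fi   # print the K-th line
    if [ "$i" -lt "$n" ]; then script="${script}n;"; fi   # move on within the group
    i=$((i + 1))
  done
  printf '%s\n' "${script%;}"
}

pick_script 6 5                                   # prints n;n;n;n;p;n
printf '%s\n' 01 02 03 04 05 06 07 08 09 10 11 12 |
  sed -n -e "$(pick_script 6 5)"                  # prints 05 and 11
```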

Written by grault

June 25, 2009 at 3:54 pm

Posted in command line, linux, script

cumulating minutes begun

$ cat seconds.txt
120
123
$ cat seconds.txt | sed 's/$/ 60 ~ 0 !=r +/' | sed '1i[1+] sr 0' | sed '$ap' | dc
5
$
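
The generated dc program divides each value by 60, adds one whenever the remainder is non-zero, and sums the results, so every started minute counts as a whole one. The same computation can be sketched with awk (a swapped-in tool, not what the one-liner uses), assuming non-negative integer seconds; started_minutes is a made-up name:

```shell
# started_minutes: sum the minutes started by each duration (in seconds),
# i.e. every value is rounded up to a whole minute before summing.
started_minutes() {
  awk '{ total += int(($1 + 59) / 60) } END { print total + 0 }'
}

printf '%s\n' 120 123 | started_minutes   # prints 5
```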

Written by grault

June 24, 2009 at 4:11 pm

Posted in command line, linux, script

connecting grepping into sed

I mostly use grep and sed in the following pattern:

cat file | grep some-pattern | sed s/other-pattern/replacement/

But what if some-pattern and other-pattern are the same, and moreover you want to refer to groups in the replacement? Here’s what sed offers for this:

cat file | sed -e 's/pattern/replacement/p; d'
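
A small illustration with a back-reference. The sample lines and the name= pattern are made up for the example, and so are the function names. The second form uses -n, a common equivalent that suppresses the default output instead of deleting every line:

```shell
# Print only the lines that match, already rewritten (the "p; d" form).
grep_and_replace() {
  sed -e 's/^name=\(.*\)$/\1/p; d'
}

# Equivalent with -n: nothing is printed unless the s command's p flag fires.
grep_and_replace_n() {
  sed -n -e 's/^name=\(.*\)$/\1/p'
}

printf '%s\n' 'name=foo' 'something else' 'name=bar' | grep_and_replace
```

Both functions print foo and bar for this input and drop the non-matching line.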

Written by grault

March 24, 2009 at 5:24 pm

Posted in command line, linux, notes

removing Hungarian accents with sed on XP

I’m about to create a backup of the family photo collection. To avoid further issues with character encoding I decided to remove accents from characters in file names. This is the sed file I wrote:

s/\o341/a/g
s/\o355/i/g
s/\o373/u/g
s/\o365/o/g
s/\o374/u/g
s/\o366/o/g
s/\o372/u/g
s/\o363/o/g
s/\o351/e/g
s/\o301/A/g
s/\o315/I/g
s/\o333/U/g
s/\o325/O/g
s/\o334/U/g
s/\o326/O/g
s/\o332/U/g
s/\o323/O/g
s/\o311/E/g
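
A hedged sketch of how such a file might be applied to names. It assumes GNU sed (for the \oNNN octal escapes), a POSIX shell such as the one shipped with Cygwin, and names stored in a single-byte encoding. The rule file is a temporary stand-in holding only two of the rules above, and the loop is a dry run that prints mv commands rather than renaming anything:

```shell
# Two of the rules from above, written to a temporary rule file
# (stand-in for the real sed file, whose name is not given).
rules=$(mktemp)
printf '%s\n' 's/\o341/a/g' 's/\o351/e/g' > "$rules"

# strip_accents: filter names on stdin through the rule file.
# LC_ALL=C makes sed treat the input as plain bytes.
strip_accents() {
  LC_ALL=C sed -f "$rules"
}

# Dry run: show what would be renamed in the current directory.
for f in *; do
  new=$(printf '%s\n' "$f" | strip_accents)
  [ "$f" = "$new" ] || printf 'mv -- "%s" "%s"\n' "$f" "$new"
done
```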

Written by grault

December 26, 2008 at 2:47 pm

Posted in command line, linux

file renamer improvement

I decided to create a context menu entry for files that removes spaces and other unwanted characters from their names. I use Nautilus, where you can do this with the nautilus-actions package. We need a script that runs when the menu item is selected. If multiple files are selected, a space-separated list of them will be the parameter. My script looks like this:

#!/usr/bin/zsh

pattern="[^a-zA-Z0-9.-]"
for (( i=1 ; i<=$# ; i+=1 ))
do
  source=$*[$i]
  target=${source:h}/${${source:t}//${~pattern}/_}
  if [[ ! -a $target ]] then
    mv "$source" "$target"
  fi
done

Written by grault

May 18, 2008 at 9:14 pm

Posted in linux

file renamer

Let’s suppose you have a bunch of files with various characters in their file names that you want to get rid of. I mean you want to eliminate those characters, not the files.

afroid-laptop% paste <(ls -1 | sed -e 's/^\(.*\)$/"\1"/') <(ls -1 | sed -e 's/\ /_/g') | sed -e 's/^/mv /'
mv "01 Track 01 13.mp3" 01_Track_01_13.mp3
mv "02 Track 02 22.mp3" 02_Track_02_22.mp3
mv "03 Track 03 32.mp3" 03_Track_03_32.mp3
mv "04 Track 04 42.mp3" 04_Track_04_42.mp3
mv "05 Track 05 52.mp3" 05_Track_05_52.mp3
mv "06 Track 06 62.mp3" 06_Track_06_62.mp3
mv "07 Track 07 72.mp3" 07_Track_07_72.mp3
mv "08 Track 08 82.mp3" 08_Track_08_82.mp3
mv "09 Track 09 91.mp3" 09_Track_09_91.mp3
afroid-laptop% paste <(ls -1 | sed -e 's/^\(.*\)$/"\1"/') <(ls -1 | sed -e 's/\ /_/g') | sed -e 's/^/mv /' | sh
afroid-laptop% ls
01_Track_01_13.mp3  04_Track_04_42.mp3  07_Track_07_72.mp3
02_Track_02_22.mp3  05_Track_05_52.mp3  08_Track_08_82.mp3
03_Track_03_32.mp3  06_Track_06_62.mp3  09_Track_09_91.mp3
afroid-laptop%
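
Generating mv commands from ls output and piping them to sh breaks as soon as a name contains a double quote, a dollar sign or a backquote. A more defensive sketch of the same rename as a plain loop (a swapped-in technique, not the pipeline above; spaces_to_underscores is a made-up helper name):

```shell
# spaces_to_underscores DIR: rename every entry in DIR whose name
# contains a space, replacing each space with an underscore.
spaces_to_underscores() {
  for f in "$1"/*\ *; do
    [ -e "$f" ] || continue              # the glob matched nothing
    name=${f##*/}
    new=$1/$(printf '%s' "$name" | tr ' ' '_')
    [ -e "$new" ] || mv -- "$f" "$new"   # never overwrite an existing file
  done
}
```

Calling spaces_to_underscores . would process the current directory.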

Written by grault

May 14, 2008 at 8:03 pm

Posted in command line, linux

useful key shortcuts on ubuntu

I’m evil sometimes ; )

As a first scenario, I wanted to save some pages of a book provided by a flash site. This site hides the pages once seen because of copyright issues. I had to create a bunch of screenshots : ) I also wanted to minimize the clicks and key strokes needed per screenshot.

On Ubuntu, start ‘gconf-editor’ from a terminal. Go to the path ‘apps/metacity/keybinding_commands/command_1’ and put ‘/somepath/sshot.sh’ there. The content of that script is like this:

afroid-laptop% cat sshot.sh
import -window root -quality 90 /pathtoscreenshotscollection/`date +%Y%m%d%H%M%S`.png
afroid-laptop%

Later on, under ‘global_keybindings/command_1’, insert the key binding string, e.g. ‘<Alt>T’.

Obviously you can change names and bindings as it’s appropriate for you.

The next one is to get descriptions for words from a dictionary.
The command to set is: ‘firefox "http://pewebdic2.cw.idm.fr/popup/popupmode.html?search_str="`xsel`’.
Of course you have to install the xsel package before usage. After that you only select the word with your mouse and press e.g. F8, and a page appears with the word’s entry from that dictionary…

Written by grault

April 27, 2008 at 9:39 pm

Posted in command line, linux