Wednesday, June 17, 2015

sed by Example


Syntax: sed [options] 'instruction' file

Actually you can have more than 1 instruction:
sed -e 'instruction1' -e 'instruction2' -e 'instruction3'
The most important sed option is -n. Without -n, sed prints every line after processing it, whether or not an instruction touched it. With -n, automatic printing is suppressed and only the lines you explicitly print (for example with p) appear in the output. That is why the print examples below need -n; without it, every line would be printed and the selected lines would appear twice.

Important Note: When deleting lines, we do not need the -n option.

Important Note 2: Whenever you use -n, you must use p in the instruction to print to STDOUT.
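
To see the effect of -n for yourself, here is a small sketch that runs the same p instruction with and without it. The three-line sample input is made up for illustration.

```shell
# Made-up three-line input (illustrative only)
sample=$(printf 'alpha\nbeta\ngamma\n')

# Without -n: every line is printed automatically, and line 2 is printed again by p
without_n=$(printf '%s\n' "$sample" | sed '2p')

# With -n: automatic printing is off, so only the line selected by p appears
with_n=$(printf '%s\n' "$sample" | sed -n '2p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```

The first result should contain beta twice, the second only once.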


Print


To print the 25th line of a file:
$ sed -n '25p' /etc/passwd
To print line 24 to line 26:
$ sed -n '24,26p' /etc/passwd

To print all lines but the 3rd one:
$ sed -n '3!p' /etc/passwd
To print all lines except 3 to 7:
$ sed -n '3,7!p' /etc/passwd
To print the last line of a file:
$ sed -n '$p' /etc/passwd
To print all lines containing test:
$ sed -n '/test/p' /etc/passwd

To ignore case and print all lines containing test, Test, teSt,  etc:
$ sed -n '/test/Ip' /etc/passwd
You can also use Bash wildcards like * and ? in the file name:
$ sed -n '/behnam/Ip' testfile*.txt
To print a range of lines that starts at one pattern and ends at another:
$ sed -n '/nagios/,/ntp/p' /etc/passwd
So it prints from the first line containing nagios through the next line containing ntp. If nagios appears again later in the file, a new range starts there.
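
A quick way to try a pattern range without touching /etc/passwd is to feed sed a made-up list. The account names below are only sample data.

```shell
# Sample account names, one per line (illustrative only)
accounts=$(printf 'root\nnagios\nmysql\nntp\npostfix\n')

# Print from the first line matching nagios through the next line matching ntp
range=$(printf '%s\n' "$accounts" | sed -n '/nagios/,/ntp/p')

printf '%s\n' "$range"
# nagios
# mysql
# ntp
```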



To print a line containing nagios plus the two lines that follow it:
$ sed -n '/nagios/,+2p' /etc/passwd
The following example does not do what you expect, because in sed's default (basic) regex syntax the quantifiers + and ? (but not *) must be escaped with the escape character \. An unescaped + is just a literal plus sign:
$ sed -n '/^b.+/Ip' /etc/passwd
So we should run:
$ sed -n '/^b.\+/Ip' /etc/passwd
Now it detects and prints lines beginning with Behnam, behnam, bp, BP, etc. As you saw earlier, I is for ignoring case.

The following example also works, by adding the -r option (extended regex) to sed and not using \ before +. Note that -r must not be written after -e, as in -ner, because -e would then take r as its instruction:
$ sed -nr '/^b.+/Ip' /etc/passwd
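
The sketch below compares the two spellings on made-up input. It assumes GNU sed, since \+, the I flag and -r are GNU extensions. Both commands should select the same lines.

```shell
# Made-up names; "b" alone must not match because .+ needs at least one more character
names=$(printf 'Behnam\nbp\nb\nalice\n')

# Basic syntax: + has to be escaped to act as a quantifier
basic=$(printf '%s\n' "$names" | sed -n '/^b.\+/Ip')

# Extended syntax (-r): + is a quantifier without the backslash
extended=$(printf '%s\n' "$names" | sed -nr '/^b.+/Ip')

printf 'basic:\n%s\n\nextended:\n%s\n' "$basic" "$extended"
```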

Delete

To delete 1st line: (do not use -n option)
$ sed '1d' /etc/passwd > ~/passwd
To delete last line:
$ sed '$d' /etc/passwd > ~/passwd
To delete lines other than the last line:
$ sed '$!d' /etc/passwd > ~/passwd
To delete every 2nd line starting at line 3, i.e. lines 3, 5, 7, ...:
$ sed '3~2d' /etc/passwd > ~/passwd
To delete all blank lines in the file:
$ sed '/^$/d' /etc/passwd > ~/passwd
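
The delete instructions are easy to check on a few made-up lines before pointing them at a real file. This sketch assumes GNU sed, because the first~step address is a GNU extension.

```shell
# Eight numbered lines as sample input
numbered=$(printf '%s\n' 1 2 3 4 5 6 7 8)

# Delete every 2nd line starting at line 3 (lines 3, 5 and 7)
stepped=$(printf '%s\n' "$numbered" | sed '3~2d')

# Delete blank lines from a small made-up text
no_blank=$(printf 'a\n\nb\n\n\nc\n' | sed '/^$/d')

printf '%s\n' "$stepped"
printf '%s\n' "$no_blank"
```

The first command should leave 1, 2, 4, 6 and 8; the second should leave a, b and c.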

Substitute

The most common syntax for substitution is
sed 's/LHS/RHS/g' file.txt
Or you can use /Ig instead of /g to ignore case.

LHS can be a literal or a regex; RHS can contain literals and back references like & and \1.

To delete all blank lines and replace behnam with bp as well:

$ sed -e '/^$/d' -e 's/behnam/bp/g' /etc/passwd

Note: As you see the syntax is similar to find and replace command in vi:

:s/behnam/bp/g
If you want to do the same but edit the file in place, keeping a copy of the original with a .bak extension:
$ sed -i.bak -e '/^$/d' -e 's/behnam/bp/g' /etc/passwd
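
Because -i rewrites the file, it is safer to experiment on a scratch copy than on /etc/passwd. A minimal sketch, using a throwaway directory and a made-up file name (users.txt):

```shell
# Work in a temporary directory so nothing real is modified
workdir=$(mktemp -d)
printf 'behnam\n\nbehnam again\n' > "$workdir/users.txt"

# Edit in place; the untouched original is saved as users.txt.bak
sed -i.bak -e '/^$/d' -e 's/behnam/bp/g' "$workdir/users.txt"

edited=$(cat "$workdir/users.txt")
backup=$(cat "$workdir/users.txt.bak")
printf 'edited:\n%s\n\nbackup:\n%s\n' "$edited" "$backup"

rm -r "$workdir"
```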

To change Behnam in the test01.txt file to bp in the test02.txt file:
$ sed s/Behnam/bp/ test01.txt > test02.txt
Another way to do that is:
$ cat test01.txt | sed s/Behnam/bp/ > test02.txt
Note: using quotes is highly recommended. If you have metacharacters in the command, quotes are necessary so you'd better type:
$ sed 's/Behnam/bp/' test01.txt > test02.txt
To change Behnam to Behdad:
$ echo Behnam | sed 's/nam/dad/'
As you know, sed is line oriented. So if you have such a file:
one two three, one two four
four two three two one
one hundred and one
And run:
$ sed 's/one/FIVE/' testfile.txt
The output would be:
FIVE two three, one two four
four two three two FIVE
FIVE hundred and one
Note that this changed one to FIVE once on each line and did not touch the 2nd occurrence.
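
To compare this with a global replacement, the sketch below runs both forms on the same three lines. Only the g flag differs between the two commands.

```shell
# The three sample lines from the example above
sample=$(printf 'one two three, one two four\nfour two three two one\none hundred and one\n')

# Without g: only the first "one" on each line is replaced
first_only=$(printf '%s\n' "$sample" | sed 's/one/FIVE/')

# With g: every "one" on each line is replaced
every=$(printf '%s\n' "$sample" | sed 's/one/FIVE/g')

printf 'first only:\n%s\n\nglobal:\n%s\n' "$first_only" "$every"
```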


To replace all Behnam just in lines which have Pournader:
$ sed '/Pournader/s/Behnam/Ben/g' testfile.txt
$ sed -n '/Pournader/s/Behnam/Ben/gp' testfile.txt
To replace all Pournader with 123 just in lines which start with Behnam (case insensitive):
$ sed '/^Behnam/Is/Pournader/123/g' testfile.txt

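If you want to change a pathname that contains slashes, you can use a backslash to quote each slash. For example, to change /etc/passwd to /etc/shadow (the replacement path is only an illustration):
$ sed 's/\/etc\/passwd/\/etc\/shadow/' old_file > new_file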
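
The backslashes can get hard to read. sed accepts other characters as the delimiter after s, so a common alternative is to pick one that does not occur in the paths. A small sketch with made-up paths:

```shell
# Made-up command line containing a path
line='cp /usr/local/bin/tool /tmp'

# Slash as delimiter: every slash inside the paths must be escaped
escaped=$(printf '%s\n' "$line" | sed 's/\/usr\/local\/bin/\/opt\/bin/')

# Pipe as delimiter: the paths can be written as they are
piped=$(printf '%s\n' "$line" | sed 's|/usr/local/bin|/opt/bin|')

printf '%s\n%s\n' "$escaped" "$piped"
```

Both commands should print the same line, cp /opt/bin/tool /tmp.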

Back References

We can use & as the matched string. & stands for the full text matched by the pattern.

To search for a pattern and add some characters, like parentheses, around the pattern:
$ sed 's/[a-z1-9]*/(&)/' old_file > new_file
You can also double a pattern
$ sed 's/[a-z1-9]*/& &/' old_file > new_file
$ echo "123 abc" | sed -n 's/[0-9]*/& &/p'
$ echo "123 abc" | sed -nr 's/[0-9]+/& &/p'
To put "item:" at the beginning of each line: 
$ sed -n 's/.*/item: &/p' testfile.txt
To put "item:" at the beginning of each word (.* would match the whole line, so we match runs of non-space characters instead):
$ sed -n 's/[^ ][^ ]*/item: &/gp' testfile.txt
To search and print lines with a particular pattern:
$ sed -n 's/^Behnam/&/gp' testfile.txt
It is just another way to do:
$ sed -n '/^Behnam/p' testfile.txt
To match a number between 100 and 99999 and print:

$ sed -n 's/[1-9][0-9]\{2,4\}/&/gp' testfile.txt

Note: be careful to escape { and } with the escape character \ unless you run sed with -r.

\1 is the first remembered pattern and the \2 is the second remembered pattern. We can continue up to \9

If you want to keep the 1st word of a line, and delete the rest of the line, mark the important part with the parenthesis:
$ echo "behnam pournader" | sed -n 's/\([a-z]*\).*/\1/p'
[a-z]* matches 0 or more lower case letters (behnam), and .* matches zero or more characters after the first match ( pournader).
Note: Do not forget to use \( and \) to group the pattern when using \1

This returns abc as again [a-z]* matches just abc and .* matches 123:
$ echo "abc123" | sed -n 's/\([a-z]*\).*/\1/p'
So to keep the 1st word of a line and delete the rest of the line, we use:
$ sed -n 's/\([a-z]*\) .*/\1/p' testfile.txt  
Note: Do not forget to put a space before the dot.

If you want to switch two words around:
$ echo "red dog" | sed -n 's/\([a-z]*\) \([a-z]*\)/\2 \1/p'
Note: The space between the 2 remembered patterns is there to make sure 2 words are found. If a line has just 1 word (or none), sed does not touch it in this case.

Again by using -r, backslash is not needed before ( and ):
$ echo "red dog" | sed -nr 's/([a-z]*) ([a-z]*)/\2 \1/p'
If you want to eliminate duplicated words, you can try:
$ echo "behnam behnam" | sed -n 's/\([a-z]*\) \1/\1/p'
To just detect duplicated words:
$ sed -n '/\([a-z][a-z]*\) \1/p'
To reverse the first three characters on a line:
$ echo "behnam" | sed -n 's/^\(.\)\(.\)\(.\)/\3\2\1/p'
Note: Instead of using [A-Za-z]*, which won't match words like "won't", we'd better use [^ ]*, which matches everything except a space. Bear in mind that it also matches the empty string, because * means 0 or more!

The following will put parentheses around just the 1st word in each line (no -n here, otherwise nothing would be written to new_file):
$ sed 's/[^ ][^ ]*/(&)/' old_file > new_file
Note: [^ ] is used 2 times in order to avoid matching the null string.
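
The null-string problem is easiest to see on a line that starts with a space. In this small sketch (the input line is made up), [^ ]* matches the empty string at the very start, while [^ ][^ ]* has to find a real word:

```shell
# Made-up line with a leading space
line=' red dog'

# [^ ]* matches zero characters at the start, so the parentheses wrap nothing
star_only=$(printf '%s\n' "$line" | sed 's/[^ ]*/(&)/')

# [^ ][^ ]* needs at least one non-space character, so it wraps the 1st word
one_or_more=$(printf '%s\n' "$line" | sed 's/[^ ][^ ]*/(&)/')

printf '[%s]\n[%s]\n' "$star_only" "$one_or_more"
# [() red dog]
# [ (red) dog]
```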

As you saw before, if you want to make changes for every word, add a g after the last delimiter. Otherwise it replaces just the 1st match on each line.
$ sed 's/[^ ][^ ]*/(&)/g' old_file > new_file
To keep the 1st word on the line but delete the 2nd one (the pattern ends with a space, so it only matches when something follows the 2nd word):
$ echo "red dog barks" | sed -n 's/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /p'


sed Script File

We can also put the instructions in a sed script file, using the following syntax:
sed -f SedScriptFile DataFile01.txt DataFile02.txt
So the contents of sed script file can be
/^$/d
s/behnam/bp/g
Important note: in a sed script file we do not need the quotes that protect the instructions from the shell. Regex escapes such as \+ and \( are still required.
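
Here is a minimal end-to-end sketch of the script-file approach. The file names (cleanup.sed, data.txt) and the sample text are made up, and everything lives in a temporary directory.

```shell
# Scratch directory for the script file and the data file
workdir=$(mktemp -d)

# The script file holds one instruction per line, with no shell quoting
printf '%s\n' '/^$/d' 's/behnam/bp/g' > "$workdir/cleanup.sed"

# Sample data with a blank line in the middle
printf 'behnam one\n\nbehnam two\n' > "$workdir/data.txt"

# Run every instruction in the script file against the data file
script_out=$(sed -f "$workdir/cleanup.sed" "$workdir/data.txt")
printf '%s\n' "$script_out"
# bp one
# bp two

rm -r "$workdir"
```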


Tuesday, June 16, 2015

Regex


Metacharacters have some special meanings in Regex: 


Backslash \
Caret ^
Dollar sign $
Dot .
Pipe |
Question mark ?
Asterisk *
Plus sign +
Parenthesis ()
Square bracket []
Curly brace {}

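Note: To use any of the above-mentioned characters as a literal in a regular expression, you have to escape it with a backslash, so to match 2+2=4, enter 2\+2=4. If you do not escape +, it keeps its special metacharacter meaning. (This describes the extended syntax used by egrep and sed -r. In the basic syntax that grep and sed use by default, +, ?, |, parentheses and curly braces are literal when unescaped and special when escaped.)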
The backslash escapes a special character, which means that character gets interpreted literally so \$ means $, rather than its Regex meaning. Likewise \\ has the literal meaning of \

If you are using grep, to find the literal asterisk character in a file, use single quotes. Otherwise the shell expands * into the file names of the current directory before grep ever sees it:
$ grep '*' /etc/profile


Brackets

Brackets enclose a set of characters, any one of which may match at that position. So to match an a or an e, use [ae]. You may use this in gr[ae]y to match either gray or grey.

Hyphen is used to show a range of characters so [0-9] matches a single digit between 0 and 9. You may use more than one range like [0-9a-mA-M]. You may also combine single characters and ranges like [0-9a-mzA-MZ] which matches 0 to 9, a to m and A to M plus z and Z. 

Now we are ready to match common word patterns by using combined sequences of characters in square brackets:

[0-9][0-9][0-9][0-9][0-9] matches any US zip code and [Bb][Ee][Hh][Nn][Aa][Mm] matches BEHNAM, Behnam, behNAm, etc 

Example: To list all files in the current directory which start with letter a, b, c, m, or in the range of u to z: (tip: use -d option to avoid getting messy stdout)
$ ls -ld [a-cmu-z]*

Caret

If the pattern within the square brackets starts with ^, any character not enclosed will be matched (in shell file name patterns, ! works the same way). In other words, inserting ^ after the opening bracket negates the character class, so it matches anything that is not in the class. As an example, b[^x] matches "be" in "behnam", but it does not match "pub" because there is no character after "b" in "pub". Likewise, [^x-zX-Z] matches any character except those in the range x to z, in either case.

For the use of caret as an anchor, see the Anchors section of this tutorial. Anchors do not match any character; they match a position before, after, or in between characters.


Dot

Dot matches any single character except line break characters, so we can say dot is the short form of [^\n]. As an example, Behn.m matches Behnam, Behnom, and Behn#m, but not Behnm or Behnaam, and B... matches Beer and Bear but not Bug.

Example: To get all six-character words starting with b and ending in m simply enter: 
$ grep '\<b....m\>' /usr/share/dict/words
Note: If the file /usr/share/dict/words does not exist, install the words package by issuing: yum install words


Asterisk, Plus and ?


  • Asterisk matches zero or more occurrences of the preceding character.
  • Plus sign is like asterisk but matches one or more occurrences of the preceding character.
  • ? is also similar to asterisk but matches 0 or 1 occurrence of the preceding character. It is generally used for optional single characters, like colou?r which matches colour or color.
Example: [a-zA-Z]* matches zero or more letters, and tries to match as many characters as possible to the end of the word. 

Example: <[A-Za-z][A-Za-z0-9]*> matches an HTML tag with no attributes. <[A-Za-z0-9]+> seems to be easier to write but it matches invalid tags such as <5>
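
The three quantifiers can be compared side by side with grep -E (extended syntax, so none of them needs a backslash). The word list below is made up, and the patterns are anchored with ^ and $ so that each line must match as a whole.

```shell
# Made-up spellings to test against
words=$(printf 'color\ncolour\ncolouur\ncolr\n')

# ? : zero or one u
optional_u=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# * : zero or more u
any_u=$(printf '%s\n' "$words" | grep -E '^colou*r$')

# + : one or more u
some_u=$(printf '%s\n' "$words" | grep -E '^colou+r$')

printf '?: %s\n' $optional_u
printf '*: %s\n' $any_u
printf '+: %s\n' $some_u
```

Here ? should accept color and colour, * should also accept colouur, and + should reject color. None of them accepts colr, because the second o is not optional.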


Anchors

Anchors do not match any characters. Anchors match a position. 
^ matches at the start of the string, and $ matches end of the string so ^Behnam matches Behnam at the beginning of a line and Pournader$ matches Pournader at the end of a line. 
^Behnam$ matches lines with only Behnam word. 
^B matches only the first B in BehBam.

Note: As you saw previously in the Caret section, the caret normally matches the beginning of a line, but right after an opening bracket it negates the set of characters.

Example: to display lines starting with the string "root":
$ grep ^root /etc/passwd

Example: in order to see which accounts have no shell assigned:
$ grep :$ /etc/passwd

As we said earlier, $ at the end of a Regex matches the end of a line so Pournader$ matches Pournader at the end of a line and ^$ matches blank lines.

Example: in order to see which accounts have bash as shell:
$ grep bash$ /etc/passwd


Word Boundaries 

The angle brackets must be escaped; otherwise they have their literal meanings. \< and \> mark word boundaries: \< matches the beginning of a word and \> matches the end of a word. As an example, \<the\> matches the word "the" itself but not the words "them", "there", "other", and so on.

\b Matches the empty string at the edge of a word.
\B Matches the empty string provided it's not at the edge of a word.
\< Matches the empty string at the beginning of a word.
\> Matches the empty string at the end of a word.
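
A short sketch of the word-boundary anchors, assuming GNU grep (\< and \> are GNU extensions) and a few made-up lines. The -w option gives the same result for a plain word.

```shell
# Made-up lines: only the 1st and the last contain "the" as a whole word
text=$(printf 'the\nthem\nother\nsee the cat\n')

# Explicit word boundaries around the pattern
bounded=$(printf '%s\n' "$text" | grep '\<the\>')

# -w asks grep to match whole words only
whole_word=$(printf '%s\n' "$text" | grep -w the)

printf '%s\n' "$bounded"
# the
# see the cat
```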


Alternation

Alternation is Regex equivalent of "or". As an example Toyota|Honda matches Toyota in "I have a Toyota and a Honda". If the regular expression is applied again, it matches Honda too. We can add as many alternatives as we want: Toyota|Honda|Ford|Subaru.
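
With grep, alternation needs the extended syntax (-E). A small sketch on made-up lines, which also shows how parentheses limit what the alternatives cover:

```shell
# Made-up lines to search
cars=$(printf 'Toyota\nHonda tire\nFord\nToyota tire\n')

# Plain alternation: any line containing Toyota or Honda
either=$(printf '%s\n' "$cars" | grep -E 'Toyota|Honda')

# Grouped alternation: Toyota or Honda, followed by " tire"
grouped=$(printf '%s\n' "$cars" | grep -E '(Toyota|Honda) tire')

printf 'either:\n%s\n\ngrouped:\n%s\n' "$either" "$grouped"
```

The first search should return three lines (everything except Ford); the grouped one should return only the two tire lines.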

Important Note: Alternation has the lowest precedence of all operators, so
"Toyota|Honda tire" matches "Toyota" or "Honda tire". To match "Toyota tire" or "Honda tire", we have to group them as: (Toyota|Honda) tire.

Example (grep needs -E, or egrep, so that | and the parentheses act as operators):
$ grep -E 'be(a|e)r' testfile.txt


Repeating a Pattern

To specify a specific amount of repetition, use curly braces:

  • [1-9][0-9]{3} matches a number between 1000 and 9999
  • [1-9][0-9]{2,4} matches a number between 100 and 99999
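  • [0-9]{5} matches exactly five digits

Note: In the basic syntax that grep and sed use by default, write the braces as \{ and \}, e.g. [0-9]\{5\}.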
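
A quick check of the repetition syntax with grep, on made-up numbers. The patterns are anchored with ^ and $ here; without the anchors, a longer number such as 100000 would still match on part of its digits.

```shell
# Made-up numbers: two inside the 100..99999 range, two outside
numbers=$(printf '99\n100\n99999\n100000\n')

# Extended syntax: braces are written as they are
extended=$(printf '%s\n' "$numbers" | grep -E '^[1-9][0-9]{2,4}$')

# Basic syntax: braces must be escaped
basic=$(printf '%s\n' "$numbers" | grep '^[1-9][0-9]\{2,4\}$')

printf '%s\n' "$extended"
# 100
# 99999
```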

Laziness and Greediness

Sometimes a Regex does not seem to behave the way you expected, because quantifiers are greedy and match as much as they can. For example, applying
^F.+: to the string "From: using the :abc" gives the largest possible match, which is
"From: using the :", not "From:".
The solution is adding ?, which makes the quantifier lazy so it stops at the first opportunity: ^F.+?: returns the smallest match, which is "From:". Lazy quantifiers are a Perl-style feature; the basic and extended syntaxes of grep and sed do not support them.

As another example, the regex <.+> matches <EM>second</EM> in the HTML string "This is my <EM>second</EM> test". Again, to make it lazy, place a question mark after the quantifier, so <.+?> matches <EM>.
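
sed has no lazy quantifiers, so a common workaround (not the +? syntax itself) is a negated character class that cannot run past the delimiter. A sketch with the same "From:" string, assuming GNU sed for -r; the replacement X is only there to make the matched part visible.

```shell
# The sample string from the text above
header='From: using the :abc'

# Greedy: .+ runs to the last colon, so everything up to ":" before abc is replaced
greedy=$(printf '%s\n' "$header" | sed -r 's/^F.+:/X/')

# Workaround: [^:]+ cannot cross a colon, so the match stops at the first one
bounded=$(printf '%s\n' "$header" | sed -r 's/^F[^:]+:/X/')

printf '%s\n%s\n' "$greedy" "$bounded"
# Xabc
# X using the :abc
```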



Back Reference

You can use the back reference \1 to match the same text that was matched by the capturing group. 

Example: ([xyz])=\1 matches x=x, y=y, and z=z
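
The same example can be tried with grep. In grep's default basic syntax the group is written with \( and \); the input lines below are made up.

```shell
# Made-up assignments: only lines with the same letter on both sides should match
pairs=$(printf 'x=x\nx=y\nz=z\n')

# \1 must repeat exactly what the group \([xyz]\) matched
same_both_sides=$(printf '%s\n' "$pairs" | grep '\([xyz]\)=\1')

printf '%s\n' "$same_both_sides"
# x=x
# z=z
```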


Non-Printable Characters

Use \t to match a tab character (ASCII 0x09), and \n for line feed (0x0A). 

Note: Bear in mind that text files in Microsoft Windows use \r\n to terminate lines. UNIX text files simply use \n


Shorthand Character Classes

\d matches a single character that is a digit 
\w matches a "word character" (alphanumeric characters plus underscore)
\s matches a white-space character (includes tabs and line breaks)


grep by Example


Syntax: grep 'word' file1 file2 file3

To find all lines containing behnam, ignoring case:
$ grep -i behnam /etc/passwd
To search recursively i.e. read all files under /etc:
$ grep -r 192.168.1.5 /etc/
To match behnam only as a whole word, not as part of behnamp:
$ grep -w behnam test.txt
To search for either of 2 different words (the quotes stop the shell from treating | as a pipe):
$ egrep -w 'behnam|behdad' test.txt
To report the number of times that the pattern has been matched:
$ grep -c behnam test.txt

To precede each line of output with the number of the line in the file:
$ grep -n root /etc/passwd

To print all lines that do not contain behnam:
$ grep -v behnam test.txt
To find out how many lines do not match the pattern:
$ grep -v -c behnam test.txt
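
Several of the options above can be tried together on a tiny made-up list, which makes the difference between matching, whole-word matching and counting easy to see:

```shell
# Made-up user names, one per line
list=$(printf 'behnam\nbehnamp\nbehdad\nroot\n')

# -w: behnam as a whole word only, so behnamp is skipped
whole_word=$(printf '%s\n' "$list" | grep -w behnam)

# -E with a quoted alternation: either of two whole words
either=$(printf '%s\n' "$list" | grep -E -w 'behnam|behdad')

# -c counts matching lines; -v -c counts the lines that do not match
matching=$(printf '%s\n' "$list" | grep -c behnam)
not_matching=$(printf '%s\n' "$list" | grep -v -c behnam)

printf 'whole word: %s\n' "$whole_word"
printf 'either: %s\n' $either
printf 'matching: %s, not matching: %s\n' "$matching" "$not_matching"
```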
To display lines starting with the string "root"
$ grep ^root /etc/passwd
To filter the name of the hard disk partitions in dmesg output: 
$ dmesg | egrep "(s|h)d[a-z][1-9]"

Note: The above example does not work with plain grep, which is why we used egrep. egrep is the same as grep -E, which makes grep evaluate the pattern as an Extended Regular Expression rather than a basic one.

To list text files whose contents mention behnam:
$ grep -l behnam *.txt
To display output in colors:
$ grep --color root /etc/passwd
To search a string in a Gzip compressed file:
$ zgrep -i behnam test.tar.gz
To see which accounts have no shell assigned
$ grep :$ /etc/passwd

To display all words starting with b and ending in m:
$ grep '\<b.*m\>' test.txt
To display the lines which do not match any of 2 or more patterns:
$ grep -v -e "pattern 1" -e "pattern 2"
To show the byte offset of each match:
$ grep -o -b root /etc/passwd
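
To make the offsets concrete, here is a sketch on a single passwd-style line (typed in by hand, not read from a real system). It assumes GNU grep, where -b combined with -o reports the 0-based byte offset of each match itself.

```shell
# A hand-written line in /etc/passwd format
entry='root:x:0:0:root:/root:/bin/bash'

# -o prints each match on its own line; -b prefixes its byte offset
offsets=$(printf '%s\n' "$entry" | grep -o -b root)

printf '%s\n' "$offsets"
# 0:root
# 11:root
# 17:root
```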


Friday, June 12, 2015

VMware FT vs VMware HA

Knowing your environment's uptime and data recovery needs for each server is key to determining whether you need HA, FT, or neither.


Fault Tolerance

FT keeps a fully synchronized copy of your virtual machine on a second ESXi host to maximize uptime. In other words, with FT you have two separate virtual machines that are exact mirrors, and recovery takes place in milliseconds as the secondary machine takes over rapidly. The price is that FT sucks up your CPU and storage, because you keep two synchronized copies of a running vm.

FT instantly moves virtual machines to a new ESXi host via vLockstep, which keeps a mirrored virtual machine synchronized with the primary vm and ready to take over in less than a second. If your company cannot tolerate even a few seconds of downtime for end users, you should definitely use FT.

In addition to zero downtime, in FT you don't lose the in-memory application state in the event of a failure. 

Note: When FT is configured for a vm, vCenter Server need not be online for Fault Tolerance to work and failover occurs from main virtual machine to its mirrored virtual machine.


High Availability

In High Availability (HA) you have just one virtual machine, managed by an ESXi host. If that ESXi host fails, a 2nd ESXi host takes over the role of running the vm.

It might take some seconds or minutes for the vm to boot up. Utilization-wise it would be a better solution as the vm has no mirrored copy to consume your resources like CPU, memory, network and storage. 


So in one sentence: An FT environment has no service interruption but a significantly higher cost, while an HA environment has a minimal service interruption.
