====== File, Network and Revision Control Utilities ======

----

===== File Utilities =====

----

==== Batch processing ====

=== find ===

  * find files/directories/named pipes/etc.
  * See examples [[http://www.devdaily.com/unix/edu/examples/find.shtml|here]].
    * ''-exec'' option allows running a command on the ''find'' results.
      * Not everything is possible with ''find's'' ''-exec'' option.
        * A shell ''[[cs_370_-_text_processing_utilities#nospace|for]]'' loop may be more appropriate instead.

  * ''find'' is sometimes recommended over ''ls'' in [[https://cssegit.monmouth.edu/jchung/csse370repo/-/blob/main/scripts/nospace?ref_type=heads#L10|certain situations]].

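  * Quickly ''locate'' a file on a local filesystem.
    * Use ''[[cs_370_-_unix_shells_and_shell_scripting#shell_command_substitution|locate]]''
    * If set up on the system, ''locate filename'' finds files rapidly by searching a prebuilt database of file names instead of walking the filesystem, so very recent files may be missing from the results.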
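
As a minimal sketch of the ''-exec'' option, the following builds a throwaway directory (the file names are made up for illustration) and runs ''wc -l'' on the ''find'' results. The ''{} \;'' form runs the command once per result, while the ''{} +'' form passes many results to one command invocation.

```shell
# Scratch directory with hypothetical sample files
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/notes.txt"
printf 'three\n'    > "$demo/todo.txt"
touch "$demo/image.png"

# Run "wc -l" once per matching file ({} is replaced by each result)
find "$demo" -type f -name '*.txt' -exec wc -l {} \;

# Run "wc -l" once on all matching files ({} + batches the results)
find "$demo" -type f -name '*.txt' -exec wc -l {} +

# Count the matches (image.png is not matched by '*.txt')
txt_count=$(find "$demo" -type f -name '*.txt' | wc -l)
echo "txt files: $txt_count"
```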

=== xargs ===

  * execute commands on a collection of arguments
  * See examples [[http://wikiless.tiekoetter.com/wiki/Xargs|here]].
  * Also see examples in the ''xargs'' man page.
  * Often used at the back end of a ''find'' command: it batches many results into one command invocation, and it works with any command that prints a list, not just ''find''.

<code>
# Find all hidden files (files that begin with period ".*"),
# starting in the current dir, and then use xargs to run "ls -l"
# on all the find results.
$ find . -type f -name '.*' | xargs ls -l

# Same output as 
# $ find . -type f -name '.*' -exec ls -l {} \;


# List all users logged in and run finger on each userid
$ w -h | awk '{ print $1 }' | sort | uniq | xargs finger
#
# w -h               - list logged in users and associated info, excluding header (-h)
# awk '{ print $1 }' - print only userid from w output
# sort | uniq        - reduce duplicates
# xargs finger       - run finger on list of users from uniq
</code>
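
One caveat with piping ''find'' into ''xargs'': by default ''xargs'' splits its input on whitespace, so a file name that contains a space is treated as two arguments. A common workaround, sketched here with made-up file names, is to pair ''find -print0'' with ''xargs -0'' (both are supported by the GNU and BSD versions of these tools).

```shell
# Scratch directory with hypothetical hidden files, one containing a space
demo=$(mktemp -d)
touch "$demo/.hidden one" "$demo/.hidden_two" "$demo/visible"

# -print0 ends each name with a NUL byte; xargs -0 splits on NUL,
# so ".hidden one" stays a single argument.
find "$demo" -type f -name '.*' -print0 | xargs -0 ls -l

# Count the arguments xargs actually passes along (one echo per argument)
hidden_count=$(find "$demo" -type f -name '.*' -print0 | xargs -0 -n1 echo | wc -l)
echo "hidden files: $hidden_count"
```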

==== Archiving and compression ====

=== tar ===

  * Tape Archive (See [[http://wikiless.tiekoetter.com/wiki/Tar_(file_format)|article]].)
  * Archive file/dirs, preserving file/dir attributes.
    * Usually operates on directories rather than individual files.
  * Common options:
<code>
        c - create archive
        t - list contents of an existing archive
        v - operate verbosely
        x - extract archive
        f - use the specified tar file ( or use "-" for stdout when
            creating, stdin when listing or extracting )
</code>

  * Examples:
<code>
# Create tar archive (cs370-${USER}.tar) of your ~/cs370 directory in /tmp:
# Dashes ("-") are optional for tar options.
$ tar cvf /tmp/cs370-${USER}.tar ~/cs370  # "tar -cvf ..." does the same thing

# Change to /tmp
$ cd /tmp

# "tar tvf cs370-${USER}.tar" shows the contents of cs370-${USER}.tar.
# "tar xvf cs370-${USER}.tar" extracts the contents of cs370-${USER}.tar to
# the CURRENT directory. (BE CAREFUL.)
</code>

  * Since ''.tar'' archives are not compressed, ''tar'' is often used in combination with a file compressor such as ''gzip'', ''bzip2'' or ''xz''. See examples below.
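
Because ''tar x'' extracts into the current directory, it can help to control where paths start and where they land. The sketch below uses ''tar'''s ''-C'' (change directory) option in a scratch directory. The directory and file names are hypothetical.

```shell
# Scratch area with a hypothetical project directory
work=$(mktemp -d)
mkdir -p "$work/src/project/docs" "$work/restore"
echo "hello" > "$work/src/project/docs/readme.txt"

# -C changes directory before archiving, so the stored paths
# are relative ("project/...") rather than absolute
tar cf "$work/project.tar" -C "$work/src" project

# List the contents before extracting anything
tar tf "$work/project.tar"

# Extract into a chosen directory instead of the current one
tar xf "$work/project.tar" -C "$work/restore"
cat "$work/restore/project/docs/readme.txt"
```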

=== gzip/bzip2/xz ===

  * Compression of single files
    * ''gzip'' is usually fastest; ''bzip2'' usually compresses more than ''gzip''; ''xz'' usually compresses more than ''bzip2'' and decompresses faster, though compressing with ''xz'' can be slower.
    * See [[http://www.debianadmin.com/create-and-extract-bz2-and-gz-files.html|article]] on gzip and bzip2.
    * See [[http://tukaani.org/xz/format.html|information on xz format]].

  * Compress individual files:
<code>
$ cd /tmp

# Copy nano config files to /tmp
$ cp /usr/share/nano/*.nanorc  /tmp

$ gzip *.nanorc    # will result in all .nanorc files in current dir
                   # being compressed and given the .nanorc.gz extension


$ gunzip *.gz      # will uncompress the .nanorc.gz files and
                   # leave files w/ .nanorc extensions


$ bzip2 *.nanorc
$ bunzip2 *.bz2    # same as above, but with bzip2


$ xz *.nanorc
$ unxz *.xz        # same as above, but with xz
</code>        

  * Use the "-c" option to make gzip, bzip2 and xz send compressed data to stdout

<code>
# Copy large wordlist to /tmp
$ cp /usr/share/dict/words  /tmp

# Compress wordlist to separate compressed files:
$ gzip -c words > words.gz
$ bzip2 -c words > words.bz2
$ xz -c words > words.xz
# Compare size of compressed file formats.
# Also try zip:
$ zip words.zip words
</code>
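
The size comparison suggested above can be scripted. This sketch generates its own repetitive sample file (a stand-in for the word list), compresses it with whichever of the three tools are installed, and prints the sizes. Each output file is simply named after the tool that produced it.

```shell
work=$(mktemp -d)

# Generate repetitive sample text (hypothetical stand-in for /usr/share/dict/words)
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$work/sample.txt"

orig_size=$(wc -c < "$work/sample.txt")
echo "original: $orig_size bytes"

# -c sends compressed data to stdout, leaving sample.txt untouched
for tool in gzip bzip2 xz; do
    if command -v "$tool" > /dev/null 2>&1; then
        "$tool" -c "$work/sample.txt" > "$work/sample.$tool"
        echo "$tool: $(wc -c < "$work/sample.$tool") bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

Results depend heavily on the input, so it is worth repeating the comparison on the kind of data you actually need to compress.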

  * View compressed text files
    * On most Linux systems, program documentation under /usr/share/doc is usually compressed to save space.

<code>
# View nano documentation
$ cd /usr/share/doc/nano
$ ls

# View a compressed file (NEWS.gz)
$ gunzip -c NEWS.gz | less

# or, more simply,

$ zless NEWS.gz

# Cat a compressed file (NEWS.gz)
$ zcat NEWS.gz
</code>

  * gzip/bzip2/xz are often used in combination with ''tar''.

<code>
# Tar your ~/cs370 dir to tar's stdout (-) and xz it,
# redirecting the result to cs370.tar.xz:
$ tar cvf -  ~/cs370 | xz -c > /tmp/cs370-${USER}.tar.xz

# Change to /tmp
$ cd /tmp

# Do the reverse to view contents of cs370-${USER}.tar.xz:
$ unxz -c cs370-${USER}.tar.xz | tar tvf -
</code>

  * GNU tar (the version most widely used) has command line options that make it much easier to compress tar archives with gzip, bzip2 and xz:

<code>
$ tar cvJf /tmp/cs370-${USER}.tar.xz ~/cs370 
# GNU tar compression options:
#   J - compress the tar archive with xz
#   z - compress the tar archive with gzip
#   j - compress the tar archive with bzip2

# to view tar.xz    
$ tar tvJf cs370-${USER}.tar.xz

# to extract tar.xz 
$ cd /tmp; tar xvJf cs370-${USER}.tar.xz
</code>
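
When reading an archive file, GNU tar can usually work out the compression format by itself, so the ''z''/''j''/''J'' letter is mainly needed when creating. A small sketch, using a hypothetical scratch directory and ''gzip'' compression:

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
echo "sample" > "$work/data/file.txt"

# Create a gzip-compressed archive (z is required when creating)
tar czf "$work/data.tar.gz" -C "$work" data

# List it without naming the compressor; tar detects gzip on its own
listing=$(tar tf "$work/data.tar.gz")
echo "$listing"
```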

----
----

===== Network Utilities =====

-----

==== telnet/ftp ====

  * Venerable remote login and file transfer programs
    * Both send data, including passwords, over the network unencrypted.
  * Avoid using them where possible, especially on older, legacy systems.
  * Telnet is sometimes useful for querying network ports for services:

  # Check if vnc service is running on plato (port 5900)
  # A successful connection means something is listening on that port
  telnet plato 5900

==== ssh/scp ====

  * Secure Shell and Secure Copy
    * verify ssh setup from [[cs_370_-_introduction_unix_fundamentals#secure_shell_ssh|week 1]]
  * More secure and versatile remote login and file transfer programs
    * See [[http://wikiless.tiekoetter.com/wiki/Secure_Shell|article]] on ssh security mechanisms.
  * Warning: programs run from ''~/.bashrc'' that print output can interfere with non-interactive ''ssh'' commands and can break ''scp'' transfers.

  * ssh is used both as a remote login program and as a remote command execution method:

<code>
#
# Remote login:
#

# Login remotely to rockhopper.
# Authenticate using either a password or encrypted
# key exchange:
$ ssh <your_userid>@rockhopper

# See verbose output of an ssh login process
$ ssh -v <your_userid>@rockhopper


#
# Remote command execution:
#

# Run the 'uptime' command on plato:
$ ssh plato 'uptime'

# See logins on rockhopper
$ ssh rockhopper 'finger'

# See jchung logins on rockhopper
$ ssh rockhopper 'finger | grep -i chung'

# Same thing, but stdout from rockhopper piped to local grep
$ ssh rockhopper 'finger' | grep -i chung


# Tar your ~/cs370 dir locally, pipe to gzip on rockhopper to
# create rockhopper:/tmp/$USER-cs370.tar.gz:
$ tar cvf -  ~/cs370 | ssh rockhopper "gzip -c > /tmp/$USER-cs370.tar.gz"
</code>
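
Regarding the ''~/.bashrc'' warning above: one common way to keep startup output away from ''scp'' and remote commands is to print only when the shell is interactive. The sketch below is an assumption about how you might structure your own ''~/.bashrc''. The helper name ''is_interactive'' and the greeting are made up.

```shell
# Sketch for ~/.bashrc. is_interactive is a hypothetical helper name.
# The shell option string "$-" contains "i" only in interactive shells.
is_interactive() {
    case $- in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Only print (or run chatty programs) for interactive logins;
# non-interactive ssh/scp sessions skip this block and stay silent.
if is_interactive; then
    echo "Welcome back, ${USER:-user}"
fi
```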

  * Remote file transfers with scp (uses same authentication mechanism as ssh):

<code>
# Create and transfer /tmp/cs370.tar.xz to your home dir on the plato server:
$ tar  cJf  /tmp/cs370.tar.xz  ~/cs370
$ scp  /tmp/cs370.tar.xz  plato:~


# Transfer ~/cs370.tar.xz from plato to local /tmp:
$ scp  plato:~/cs370.tar.xz  /tmp
</code>

==== rsync ====

  * Remote Sync (See [[http://wikiless.tiekoetter.com/wiki/Rsync|article]])
  * More efficient file transfer program that is useful for keeping remote directories synchronized with local ones
    * Rsync algorithm transfers differences between local and remote copies of files, rather than entire files. 
  * Uses ssh authentication by default

<code>
# Transfer entire ~/cs370 dir to a remote machine:/tmp
# rsync command options (similar to cp options)
#   -a archive (recursively copy dirs and preserve all file/dir attributes)
#   -u update  (only transfer files that are newer than destination)
#   -v verbose
# 
# In this rsync command, the source is ~/cs370 and the destination is localhost:/tmp.
$ rsync -auv  ~/cs370  localhost:/tmp

# Run it again.
# Nothing gets transferred this time: rsync skips files whose size and
# modification time already match at the destination. (-u additionally
# skips files that are newer at the destination.)
$ rsync -auv  ~/cs370  localhost:/tmp

# Update timestamp of ~/cs370/examples dir with 'touch',
# and run rsync again.
$ touch ~/cs370/examples
$ rsync -auv  ~/cs370  localhost:/tmp
</code>
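
Two ''rsync'' details that often surprise people are the meaning of a trailing slash on the source and the ''-n'' (dry run) option. The sketch below runs ''rsync'' between local scratch directories (hypothetical names), so no remote host is needed.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/cs370" "$work/dest_a" "$work/dest_b"
echo "notes" > "$work/src/cs370/notes.txt"

if command -v rsync > /dev/null 2>&1; then
    # No trailing slash: the cs370 directory itself is created inside dest_a
    rsync -a "$work/src/cs370" "$work/dest_a"

    # Trailing slash: only the contents of cs370 are copied into dest_b
    rsync -a "$work/src/cs370/" "$work/dest_b"

    # -n (dry run) with -v lists what would be transferred, without copying
    rsync -anv "$work/src/cs370/" "$work/dest_b"

    # Show where the files ended up
    find "$work/dest_a" "$work/dest_b" -type f
else
    echo "rsync is not installed"
fi
```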

==== wget ====

  * Web Get (See [[http://wikiless.tiekoetter.com/wiki/Wget|article]])
  * non-interactive URL download program
  * Default mode:  download and save html file at specified URL

<code>
$ wget "http://wikiless.tiekoetter.com/wiki/regular_expressions"
	  
# Saves article to file "regular_expressions".
</code>

  * ''-O file_name'' option saves to specified ''file_name''.
    * '' -O -'' sends html to stdout. 

  * The [[https://wikiless.tiekoetter.com/wiki/CURL|cURL]] utility has similar functionality; ''curl'' writes the retrieved data to stdout by default.

==== The "-" (STDOUT) convention ====

  * To work well with other programs (see the [[cs_370_-_introduction_unix_fundamentals#the_unix_philosophy_or_style | UNIX philosophy]]), utilities like ''tar'' and ''wget'' allow the use of the "-" (STDOUT) convention.
    * Output that would normally be sent to a file is sent instead to STDOUT with "-".
      * ''tar cvf - ~/cs370'' # (sends tar archive data to STDOUT instead of to a .tar file)
      * ''wget -O - <nowiki>http://monmouth.edu</nowiki>'' # (sends retrieved URL to STDOUT instead of to a file)
    * A third utility we've looked at, ''[[https://cssegit.monmouth.edu/jchung/csse370repo/-/blob/main/scripts/text2png|enscript]]'', also uses the "-" convention.
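
The same "-" can also stand for STDIN when a program reads a file, as in the ''unxz -c ... | tar tvf -'' example earlier. A minimal sketch that uses both directions to copy a directory tree through a pipe (the directory names are hypothetical):

```shell
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/copy"
echo "data" > "$work/src/sub/file.txt"

# The first tar writes the archive to STDOUT ("f -");
# the second tar reads the archive from STDIN ("f -") and extracts it.
tar cf - -C "$work/src" . | tar xf - -C "$work/copy"

cat "$work/copy/sub/file.txt"
```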

----
----

===== Diff/Patch =====

-----

==== diff - find differences between two files ====

  * Run the following commands first:

  mkdir -p ~/cs370/examples/revcontrol
  cd ~/cs370/examples/revcontrol
  wget -q http://bit.ly/2zZgGiV -O diffpatch.tar.xz # download diffpatch.tar.xz
  tar xvJf diffpatch.tar.xz                         # extract the diffpatch directory
  ls
  cd diffpatch

  * diff
    * compares 2 files and displays a list of editing changes that would convert the first file into the second file.
      * The 3 kinds of editing changes are ''a''-add lines, ''c''-change lines, and ''d''-delete lines.

<code>
        SYNOPSIS
               diff [options] from-file to-file
</code>

  * Examples:

<code>
# diff input file #1
# saved as seuss1
$ cat seuss1
If a packet hits a pocket on a socket on a port,
and the bus is interrupted at a very last resort,  
and the access of the memory makes your floppy disk abort,
then the socket packet pocket has an error to report.


# diff input file #2
# saved as seuss2
$ cat seuss2
If a pocket hits a rocket on a socket on a port,
and the bus is interrupted at a very last resort,  
and the access of the memory makes your floppy abort,
then the socket packet pocket has an error to report.


# Use diff to show differences between seuss1 and seuss2:
$ diff seuss1 seuss2
1c1
< If a packet hits a pocket on a socket on a port,
---
> If a pocket hits a rocket on a socket on a port,
3c3
< and the access of the memory makes your floppy disk abort,
---
> and the access of the memory makes your floppy abort,


# diff input file #3
# saved as seuss3
$ cat seuss3
If a pocket hits a rocket on a socket on a port,
and the bus is interrupted at a very last resort,  
and the access of the memory makes your floppy abort,
then the socket packet pocket has an error to report.
       
If your cursor finds a menu item followed by a dash,
and the double-clicking icon puts your window in the trash,
and your data is corrupted cause the index doesn't hash,
then your situation's hopeless and your system's gonna crash!


# Use diff to show differences between seuss2 and seuss3:
$ diff seuss2 seuss3
4a5,9
> 
> If your cursor finds a menu item followed by a dash,
> and the double-clicking icon puts your window in the trash,
> and your data is corrupted cause the index doesn't hash,
> then your situation's hopeless and your system's gonna crash!


# diff input file #4
# saved as seuss4
$ cat seuss4
If a packet hits a pocket on a socket on a port,
and the access of the memory makes your floppy disk abort,
and the bus is interrupted at a very last resort,  
then the socket packet pocket has an error to report.


# Use diff to show differences between seuss3 and seuss4:
$ diff seuss3 seuss4
1c1,2
< If a pocket hits a rocket on a socket on a port,
---
> If a packet hits a pocket on a socket on a port,
> and the access of the memory makes your floppy disk abort,
3d3
< and the access of the memory makes your floppy abort,
5,9d4
< 
< If your cursor finds a menu item followed by a dash,
< and the double-clicking icon puts your window in the trash,
< and your data is corrupted cause the index doesn't hash,
< then your situation's hopeless and your system's gonna crash!
</code>
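
The examples above use ''diff'''s default ("normal") output format. Many tools, including the revision control systems discussed later, show changes in the unified format produced by ''diff -u'' instead. A short sketch with made-up files:

```shell
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'        > "$work/old.txt"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$work/new.txt"

# Unified format: lines starting with "-" come from the first file,
# lines starting with "+" come from the second file.
# diff exits with 0 if the files are identical, 1 if they differ.
if diff -u "$work/old.txt" "$work/new.txt"; then
    diff_status=0
else
    diff_status=$?
fi
echo "diff exit status: $diff_status"
```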

==== patch - apply a diff file to an original ====

<code>
        SYNOPSIS
               patch [options] [originalfile [patchfile]]
</code>

  * Example:

<code>
# Using diff and patch to merge changes
$ diff seuss3 seuss4 > diff34        # Generate diff file


$ cp seuss3 seuss3.orig              # Backup original seuss3


$ patch --verbose seuss3 diff34      # Apply diff34 to seuss3
Hmm...  Looks like a normal diff to me...
Patching file seuss3 using Plan A...
Hunk #1 succeeded at 1.
Hunk #2 succeeded at 4.
Hunk #3 succeeded at 5.
done


$ cat seuss3         # seuss3 is now the same as seuss4
If a packet hits a pocket on a socket on a port,
and the access of the memory makes your floppy disk abort,
and the bus is interrupted at a very last resort,
then the socket packet pocket has an error to report.
</code>
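
A patch can also be undone. The sketch below (hypothetical file names, scratch directory) applies a unified diff with ''patch'' and then reverses it with ''patch -R''.

```shell
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'        > "$work/draft.txt"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$work/final.txt"
cp "$work/draft.txt" "$work/draft.bak"     # keep a copy for comparison

# diff exits with status 1 when files differ, hence "|| true"
diff -u "$work/draft.txt" "$work/final.txt" > "$work/changes.patch" || true

if command -v patch > /dev/null 2>&1; then
    # Apply the patch: draft.txt becomes identical to final.txt
    patch "$work/draft.txt" "$work/changes.patch"
    cmp "$work/draft.txt" "$work/final.txt" && echo "draft.txt matches final.txt"

    # -R applies the patch in reverse, restoring the original draft.txt
    patch -R "$work/draft.txt" "$work/changes.patch"
    cmp "$work/draft.txt" "$work/draft.bak" && echo "draft.txt restored"
else
    echo "patch is not installed"
fi
```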

----

===== Revision Control Utilities =====

-----

  * Also known as version control systems (VCS)
  * Help to keep track of versions of files.
    * Many store the differences between versions of files, rather than entire versions of files.
      * Saves space.
      * UNIX ''diff'' command or equivalent functionality plays a part in defining differences between versions of files.
  * Play an important role in software development, particularly team development
    * Single user version control systems:  [[http://wikiless.tiekoetter.com/wiki/Revision_Control_System|RCS]]
    * Multi-user version control systems: [[http://wikiless.tiekoetter.com/wiki/Concurrent_Versions_System|CVS]], [[http://wikiless.tiekoetter.com/wiki/Subversion_(software)|Subversion]], [[http://wikiless.tiekoetter.com/wiki/Git_(software)|Git]], [[https://wikiless.tiekoetter.com/wiki/Mercurial|Mercurial]]
      * Centralized: CVS, Subversion
      * Decentralized: Git, Mercurial

----

===== Lab Activities =====

-----

==== 1. **(Do in lab)** Find all files that contain a string or regular expression ====

In ''~/.bashrc'', define a shell function called ''searchfiles'' which uses ''find'' to list all files that contain the string (or regular expression) that you pass in as the first function parameter, ''$1''. Note that we don't want to search file ''names'' but file ''contents'' for a string, and then list the files that match. 

Answer:

<code>
# function searchfiles which uses find to list all files that 
# contain the string (or regular expression)
# that you pass in as the first function parameter, $1.
function searchfiles
{
   find . -type f |                 # list all files recursively starting in . (current dir)
   xargs grep -li "$1" 2> /dev/null # using xargs, make grep list files (-l) in which a match is found
	
   # Can also use command substitution, if not too many find results:
   # grep -li "$1" $(find . -type f) 2> /dev/null
   # 
   # If using GNU grep (most Linux systems), can use just grep recursively (-r):
   # grep -rli "$1" 2> /dev/null
}
</code>
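
The ''find | xargs'' pipeline in the answer splits on whitespace, so it can miss files whose names contain spaces. One possible variant, sketched here under the hypothetical name ''searchfiles_safe'', lets ''find'' hand the file names to ''grep'' directly with ''-exec ... {} +''.

```shell
# searchfiles_safe is a hypothetical name for a space-tolerant variant.
# "--" tells grep that "$1" is the pattern even if it starts with "-".
searchfiles_safe() {
    find . -type f -exec grep -li -- "$1" {} + 2> /dev/null
}

# Try it in a scratch directory with a file name that contains a space
demo=$(mktemp -d)
echo "needle in here" > "$demo/my notes.txt"
echo "nothing"        > "$demo/other.txt"
matches=$(cd "$demo" && searchfiles_safe needle)
echo "$matches"
```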

==== 2. **(Do in class)** Change to directory based on find result ====

In ~/.bashrc, define a shell function called ''findcd'' that changes to a directory based on a find result. If what you're searching for matches a filename, then change to the directory where that file resides. If what you're searching for matches a directory name, then change to that directory.

Example usage:

  # Change to a dir named randomwall or to a dir that contains a file called randomwall
  findcd randomwall 
  
  # Change to a dir named examples  
  findcd examples
  
  # Change to a dir that contains a file called roster
  findcd roster

  * Note: This should be a function and not a shell script because a shell script runs in its own sub-shell, so a ''cd'' inside the script would not change the directory of the calling shell.

Answer:

<code>
# function findcd - changes to a directory based on the first hit from a find
function findcd
{
   # "head -n1" chooses first find result;
   # use head instead of tail here, else may have 
   # to wait for find to print many search results;
   # if find finds nothing, $findresult is ""
   findresult=$(find . -iname "*$1*" | head -n1)

   # If $findresult is a file, can't cd to it,
   # so trim $findresult to the directory that contains it
   if [ -f "$findresult" ]; then
      findresult=$(dirname "$findresult") # see man dirname
   fi

   cd "$findresult" # If $findresult is "", nothing happens.
}
</code>

==== 3. Find maximum directory depth ====

Within your home directory, find the directory with the maximum depth. Your results should include the directory's name.

Note: You'll need to use the ''find'' command's ''-printf'' option. See ''man find''.

Answer:

<code>
# Starting in current directory (.), find directories (-type d), 
# print the depth of each directory found (-printf "%d "),
# print the path of each directory found (-print),
# do a descending, numeric sort (-rn), show only the first result (head -n1)
find . -type d -printf '%d ' -print | sort -rn | head -n1

# or

find . -type d -printf '%d ' -exec ls -ld {} ';' | sort -rn | head -n1
</code>
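
A slightly more compact variant puts both the depth and the path into a single ''-printf'' format (''%p'' is the path). This assumes GNU ''find''. The directory names in the scratch tree are made up.

```shell
# Scratch tree: a/b/c is three levels deep, x/y is two
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c" "$demo/x/y"

# %d = depth below the starting point, %p = path, \n = newline
deepest=$(cd "$demo" && find . -type d -printf '%d %p\n' | sort -rn | head -n1)
echo "$deepest"
```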

==== 4. **(Do in class)** Create and use a git repository with gitlab ====

  * Login to gitlab at [[http://cssegit.monmouth.edu|cssegit.monmouth.edu]].

  * Set up SSH login to gitlab. You should have created an SSH private/public key pair in [[cs_370_-_introduction_unix_fundamentals#secure_shell_ssh|Week 1]].
    * The [[https://cssegit.monmouth.edu/jchung/csse370repo/-/blob/main/scripts/ssh_setup.sh|ssh_setup.sh]] script can be used to check your SSH keys setup.
    * **NOTE:** Add an SSH key to your gitlab profile before creating any new projects on gitlab.

  * Back up your course directory (''~/cs370'' or ''~/se370'') using ''cp'' or ''rsync'':

  # Using cp
  cp  -av  ~/cs370  ~/cs370-$(date +%m%d%y)

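  # or using rsync (the trailing slash on the source copies the directory's
  # contents, so the result matches the cp command above)
  rsync -av  ~/cs370/  ~/cs370-$(date +%m%d%y)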

  * Create a new repository (project) on gitlab.
    * **DO NOT** include a README when creating the project.
    * Make it a private project.
    * Follow the instructions on gitlab under ''"Push an existing folder"'' to git-initialize your UNIX account course directory and push the contents to gitlab.

  * Add user jchung as a member of your gitlab project (member type: Reporter).

  * If ''git'' commandline retrieval and push operations require a userid and password to be entered even though you added your SSH public key to your gitlab profile, then see [[https://www.reddit.com/r/git/comments/nmpytz/comment/gzq5o5x/|this possible solution (reddit)]].
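
The instructions shown by gitlab are the ones to follow. To rehearse the general shape of "push an existing folder" without touching the server, the sketch below uses a local bare repository as a stand-in for the gitlab project. The paths, user name and e-mail address are hypothetical.

```shell
if command -v git > /dev/null 2>&1; then
    work=$(mktemp -d)

    # Stand-in for the empty gitlab project: a local bare repository
    git init -q --bare "$work/remote.git"

    # Stand-in for the course directory
    mkdir -p "$work/cs370"
    echo "notes" > "$work/cs370/notes.txt"

    # Initialize the existing folder and name the first branch "main"
    git -C "$work/cs370" init -q
    git -C "$work/cs370" symbolic-ref HEAD refs/heads/main

    # Point "origin" at the stand-in remote, commit everything, and push
    git -C "$work/cs370" remote add origin "$work/remote.git"
    git -C "$work/cs370" add .
    git -C "$work/cs370" -c user.name="Student" -c user.email="student@example.com" \
        commit -q -m "Initial commit"
    git -C "$work/cs370" push -q -u origin main

    # The commit is now visible in the remote repository
    git -C "$work/remote.git" log --oneline main
else
    echo "git is not installed"
fi
```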

----