Sed is dumping the entire file


I'm trying to parse the contents of an HTML file to scrape a download directory. I've reduced it to an MWE that reproduces my issue:

    sed -e 's|\(href\)|\1|' index.html

This prints the entirety of index.html. I originally thought the issue was with my expression, but even this very basic expression shows the same behaviour.

The same happens if I remove -e or if I add g at the end.

It's been a while since I've used sed. Am I doing something wrong here? Is sed getting confused by the characters in an HTML file?







command-line sed
















asked 2 hours ago by Brydon Gibson







  • For what you are looking for I suppose grep is the command to go with.

    – Ravexina
    2 hours ago

  • @Ravexina grep prints the entire line, I am looking for a small portion of a line.

    – Brydon Gibson
    2 hours ago

  • @zx485 Changing to / (or ,) does not change the behaviour

    – Brydon Gibson
    2 hours ago

  • Use grep -o so grep prints only the matched (non-empty) parts of a matching line.

    – Ravexina
    2 hours ago

  • s|\(href\)|\1| captures the string href, and simply replaces it with itself - leaving everything unchanged

    – steeldriver
    2 hours ago
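The point made in steeldriver's comment can be checked with a small sketch. The sample lines below are hypothetical stand-ins for index.html, not taken from the asker's file. The sketch assumes nothing beyond a standard sed: the group \(href\) is put back by \1, and sed prints every line by default, whether or not it matched.

```shell
# Hypothetical one-line sample standing in for index.html.
sample='<a href="https://example.com/file.tar.gz">file</a>'

# \(href\) is captured and replaced by \1, i.e. by itself, so the line is unchanged.
unchanged=$(printf '%s\n' "$sample" | sed -e 's|\(href\)|\1|')

# A line without href is printed too, because sed prints each line by default.
passthrough=$(printf '%s\n' '<p>no link here</p>' | sed -e 's|\(href\)|\1|')

printf '%s\n' "$unchanged" "$passthrough"
```

So the output being identical to the input is what this expression asks for; it does not suggest that sed is confused by HTML characters.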






















3 Answers






























































This may be overly cumbersome, but I think it would work for you, as long as your href contents contain no spaces.

    grep "href" index.html | tr ' ' '\n' | grep "^href" | cut -f2 -d'='

The first grep singles out only the lines that contain href. The tr converts spaces to newlines. The second grep grabs just the href section you were interested in. Finally, the cut grabs what follows "href=" (the second =-separated field).
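As a rough illustration of what this pipeline yields, here is a sketch run against a single hypothetical line instead of index.html. The last step (a second cut on the double quote) is an assumed addition, not part of the answer above; it is one possible way to trim the surrounding quotes and trailing markup.

```shell
# Hypothetical sample line standing in for index.html.
sample='<a href="https://www.w3schools.com">Visit W3Schools</a>'

# The pipeline from the answer, fed from a variable instead of a file.
raw=$(printf '%s\n' "$sample" | grep "href" | tr ' ' '\n' | grep "^href" | cut -f2 -d'=')

# Assumed extra step (not in the answer): keep only the text between
# the first pair of double quotes.
url=$(printf '%s\n' "$raw" | cut -d'"' -f2)

printf '%s\n' "$raw" "$url"
```

Because cut -f2 -d'=' keeps only the second =-separated field, a link whose value itself contains = (a query string, for instance) would be cut short by the first step.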

















answered 2 hours ago by S. Nixon












  • I am actually looking for what's after the href, so I'm looking for href="[I want this content]"

    – Brydon Gibson
    2 hours ago











































You should use grep to find text in a file; sed is better suited to text substitutions.

If you want to list the hypertext links, you can simply grep the file like this:

    grep -Po '(?<=href=")[^"]*' index.html
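The -P option depends on grep having been built with PCRE support, which may not be the case everywhere. As a hedged alternative, the sketch below gets a similar result with only -o and cut. The sample line and its two links are hypothetical, and this variant is an assumption of mine rather than something proposed in the answer.

```shell
# Hypothetical sample line with two links on it.
sample='<a href="a.html">A</a> <a href="b.html">B</a>'

# grep -o prints each match on its own line; cut then keeps the text
# between the double quotes of each href="..." match.
links=$(printf '%s\n' "$sample" | grep -o 'href="[^"]*"' | cut -d'"' -f2)

printf '%s\n' "$links"
```

Each link is printed on its own line even when several share one input line, which matches what the asker wanted from "a small portion of a line".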
















answered 2 hours ago by cmak.fr




































What you've explained sounds like the normal behaviour of sed when used with the substitute command. I suppose you are looking for something like this:

    sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html

Where:

• / is used as the delimiter in this case (you can use | or #, etc.).

• The option -n (--quiet, --silent) suppresses automatic printing of the pattern space, so along with this option we should use some additional command(s) to tell sed what to print.

• This additional command is the print command p, added to the end of the script. If sed weren't started with the -n option, each line on which the substitution succeeds would be printed twice.

• The option -r enables extended regular expressions. Without this option our command can be:

      sed -n 's/^.*href="\(http.*\)".*$/\1/p' index.html

• The command s means substitute: #<string-or-regexp>#<replacement>#.

• ^ matches the beginning of the line. $ matches the end of the line.

• Within the <string-or-regexp>, the capture group (http.*) is available in the <replacement> as the variable \1.

Example of usage:

    $ cat index.html
    <!DOCTYPE html>
    <html><head><title>Page Title</title></head><body>
    <h1>My First Heading</h1>
    <p>My first paragraph.</p>
    <a href="https://www.w3schools.com">Visit W3Schools</a>
    </body></html>

    $ sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html
    https://www.w3schools.com
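The interplay of -n and p described above can be seen in a short sketch. It uses the basic-regex form of the command so that -r is not needed, and feeds sed two lines borrowed from the example page through a variable (the variable names are made up for illustration).

```shell
# Two lines from the example page, held in a variable instead of index.html.
page=$(printf '%s\n' '<h1>My First Heading</h1>' \
    '<a href="https://www.w3schools.com">Visit W3Schools</a>')

# Neither -n nor p: every line is printed, substituted or not.
all=$(printf '%s\n' "$page" | sed 's/^.*href="\(http.*\)".*$/\1/')

# Both -n and p: only lines where the substitution succeeded are printed.
only=$(printf '%s\n' "$page" | sed -n 's/^.*href="\(http.*\)".*$/\1/p')

# p without -n: the substituted line appears twice.
twice=$(printf '%s\n' "$page" | sed 's/^.*href="\(http.*\)".*$/\1/p')

printf '%s\n' "--- all" "$all" "--- only" "$only" "--- twice" "$twice"
```

Note that .* is greedy, so on a line with several links this expression would likely return only the last one; that limitation is an observation about the regex, not something the answer claims to handle.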


More examples:

• Extract the content from a file between two match patterns (Extract only HTML from a file)

• Awk commands find and replace string and print every thing after key word

• Converting numbers in a CSV file to their corresponding URLs














edited 1 hour ago
answered 2 hours ago by pa4080

































