Sed is dumping the entire file
I'm trying to parse the contents of an HTML file to scrape a download directory, but I've reduced it to an MWE that reproduces my issue:
sed -e 's|\(href\)|\1|' index.html
This prints the entirety of index.html. I originally thought it was an issue with my expression, but this very basic expression proves that wrong.
The same happens if I remove -e or if I add g at the end.
It's been a while since I've used sed. Am I doing something wrong here? Is sed getting confused by the characters in an HTML file?
command-line sed
asked 2 hours ago
Brydon Gibson
For what you are looking for, I suppose grep is the command to go with.
– Ravexina
2 hours ago
@Ravexina grep prints the entire line, I am looking for a small portion of a line.
– Brydon Gibson
2 hours ago
@zx485 Changing to / (or ,) does not change the behaviour
– Brydon Gibson
2 hours ago
Use grep -o so grep prints only the matched (non-empty) parts of a matching line.
– Ravexina
2 hours ago
s|\(href\)|\1| captures the string href and simply replaces it with itself, leaving everything unchanged
– steeldriver
2 hours ago
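The point in the last comment can be checked on a tiny input. This is a minimal sketch that assumes a hypothetical two-line sample file (the asker's real index.html is not shown): the substitution writes href back over href, and because sed prints the pattern space after every line unless -n is given, the output should be identical to the input.

```shell
#!/bin/sh
# Hypothetical sample input; not the asker's real index.html.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="https://example.com/a.zip">a</a>
<p>no link here</p>
EOF

input=$(cat "$sample")

# \(href\) captures "href" and \1 puts it straight back: a no-op.
# Without -n, sed still prints every line after processing it.
out=$(sed -e 's|\(href\)|\1|' "$sample")
rm -f "$sample"

if [ "$out" = "$input" ]; then
    echo "sed output is identical to the input"
fi
```

So the "dump" is sed's default printing, not confusion caused by the HTML characters.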
3 Answers
This may be overly cumbersome, but I think it would work for you, as long as your href values contain no spaces.
grep "href" index.html | tr ' ' '\n' | grep "^href" | cut -f2 -d'='
The first grep singles out only the lines that contain href. The tr converts spaces to newlines. The second grep keeps just the href tokens you were interested in. Finally, the cut prints the second =-separated field, which is the text after "href=" (up to the next =, if there is one).
answered 2 hours ago
S. Nixon
I am actually looking for what's after the href, so I'm looking for href="[I want this content]"
– Brydon Gibson
2 hours ago
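To get only the value inside the quotes, it may help to look at what the pipeline actually emits. The sketch below assumes a hypothetical one-line sample file. On that input the final cut leaves the quotes and the text up to the next space attached. One possible variation (an assumption, not part of the answer) is to cut on the double quote instead, which only works when the value is always double-quoted.

```shell
#!/bin/sh
# Hypothetical sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="https://www.w3schools.com">Visit W3Schools</a>
EOF

# Pipeline from the answer: field 2 of the '='-separated token.
raw=$(grep "href" "$sample" | tr ' ' '\n' | grep "^href" | cut -f2 -d'=')

# Variation (assumption): split on the double quote, so that field 2
# is exactly the quoted attribute value, even if the URL contains '='.
clean=$(grep "href" "$sample" | tr ' ' '\n' | grep "^href" | cut -d'"' -f2)

printf 'raw:   %s\n' "$raw"
printf 'clean: %s\n' "$clean"
rm -f "$sample"
```

Both forms still rely on the href value containing no spaces, as the answer states.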
You should use grep to find text in a file; sed is better for text substitutions.
If you want to list the hypertext links, you can simply grep the file like this:
grep -Po '(?<=href=")[^"]*' index.html
answered 2 hours ago
cmak.fr
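The -P option needs a grep built with PCRE support (GNU grep usually is). As a rough sketch on a hypothetical sample file, the lookbehind form can be compared with a fallback that uses only -o and cut; the fallback is an assumption for systems without -P, not something from the answer.

```shell
#!/bin/sh
# Hypothetical sample input with two links on one line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="https://example.com/a.zip">a</a> <a href="https://example.com/b.zip">b</a>
<p>no link here</p>
EOF

# PCRE lookbehind: href=" must precede the match but is not part of it.
# The variable stays empty if this grep has no -P support.
links_pcre=$(grep -Po '(?<=href=")[^"]*' "$sample" 2>/dev/null)

# Fallback: match the whole attribute with -o, then keep the text
# between the first pair of double quotes.
links_cut=$(grep -o 'href="[^"]*"' "$sample" | cut -d'"' -f2)

printf '%s\n' "$links_cut"
rm -f "$sample"
```

Because -o prints every match on its own line, both links on the first line are listed separately.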
What you've explained sounds like the normal behaviour of sed when used with the substitute command. I suppose you are looking for something like this:
sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html
Where:
- / is used as the delimiter in this case (you can use | or #, etc.).
- The option -n (--quiet, --silent) suppresses automatic printing of the pattern space, so along with this option we need some additional command(s) to tell sed what to print.
- This additional command is the print command p, added to the end of the script. If sed wasn't started with the -n option, the p command would print every matching line twice.
- The option -r enables extended regular expressions. Without this option our command can be: sed -n 's/^.*href="\(http.*\)".*$/\1/p' index.html
- The command s means substitute: s#<string-or-regexp>#<replacement>#.
- ^ will match the beginning of the line.
- $ will match the end of the line.
- Within the <replacement>, the capture group (http.*) will be treated as the variable \1.
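The points about delimiters and about basic versus extended regular expressions can be tried on a single line. This sketch assumes a hypothetical input line and uses -E, which GNU sed accepts as another spelling of -r.

```shell
#!/bin/sh
# Hypothetical input line.
line='<a href="https://www.w3schools.com">Visit W3Schools</a>'

# ERE (-E, or -r in GNU sed): bare parentheses form the capture group.
ere=$(printf '%s\n' "$line" | sed -n -E 's/^.*href="(http.*)".*$/\1/p')

# BRE (the default): the same group needs backslash-escaped parentheses.
bre=$(printf '%s\n' "$line" | sed -n 's/^.*href="\(http.*\)".*$/\1/p')

# '#' as the delimiter instead of '/'.
hash=$(printf '%s\n' "$line" | sed -n 's#^.*href="\(http.*\)".*$#\1#p')

printf '%s\n' "$ere" "$bre" "$hash"
```

All three variables should hold the same URL.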
Example of usage:
$ cat index.html
<!DOCTYPE html>
<html><head><title>Page Title</title></head><body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<a href="https://www.w3schools.com">Visit W3Schools</a>
</body></html>
$ sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html
https://www.w3schools.com
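Two details may be worth checking before using this on a real page: what -n does without p, and how far the greedy .* inside the capture group reaches when the tag has more attributes. The sketch below assumes a hypothetical sample file with an extra class attribute. The [^"]* variant is an assumption, not part of the answer.

```shell
#!/bin/sh
# Hypothetical sample input; the anchor tag carries a second attribute.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>no link here</p>
<a href="https://example.com/a.zip" class="dl">a</a>
EOF

# -n without the p flag: nothing asks for output, so nothing is printed.
silent=$(sed -n -E 's/^.*href="(http.*)".*$/\1/' "$sample")

# -n with p: only lines where the substitution succeeded are printed.
# .* is greedy, so the capture runs to the LAST double quote on the line.
greedy=$(sed -n -E 's/^.*href="(http.*)".*$/\1/p' "$sample")

# Variation (assumption): [^"]* stops at the first closing quote.
tight=$(sed -n -E 's/^.*href="(http[^"]*)".*$/\1/p' "$sample")

printf 'greedy: %s\n' "$greedy"
printf 'tight:  %s\n' "$tight"
rm -f "$sample"
```

Note that the leading ^.* is greedy too, so when a line holds several links only the last one is extracted.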
More examples:
- Extract the content from a file between two match patterns (Extract only HTML from a file)
- Awk commands find and replace string and print every thing after key word
- Converting numbers in a CSV file to their corresponding URLs
edited 1 hour ago
answered 2 hours ago
pa4080