Sed is dumping the entire file


I'm trying to parse the contents of an HTML file to scrape a download directory. I've reduced my command to an MWE that reproduces the issue:

sed -e 's|\(href\)|\1|' index.html

This prints the entirety of index.html. I originally thought the problem was in my expression, but this very basic expression shows otherwise.

The same happens if I remove -e or if I add g at the end.

It's been a while since I've used sed. Am I doing something wrong here? Is sed getting confused by the characters in an HTML file?

asked 2 hours ago

Brydon Gibson


For what you are looking for I suppose grep is the command to go with.

– Ravexina
2 hours ago

@Ravexina grep prints the entire line, I am looking for a small portion of a line.

– Brydon Gibson
2 hours ago

@zx485 Changing to / (or ,) does not change the behaviour

– Brydon Gibson
2 hours ago


Use grep -o so grep prints only the matched (non-empty) parts of a matching line.

– Ravexina
2 hours ago

s|\(href\)|\1| captures the string href and simply replaces it with itself, leaving everything unchanged

– steeldriver
2 hours ago

Tags: command-line, sed


3 Answers

This may be overly cumbersome, but I think it would work for you, as long as your href contents contain no spaces.

grep "href" index.html | tr ' ' '\n' | grep "^href" | cut -f2 -d'='

The first grep keeps only the lines that contain href. The tr converts spaces to newlines, so each attribute ends up on its own line. The second grep keeps just the href tokens you are interested in. Finally, the cut prints the second '='-delimited field, i.e. what follows "href=" (up to the next '=', if the value contains one).
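
To see what the pipeline actually emits, it can be run on a small sample. This is only a sketch: the temporary file and its contents are made up for illustration and are not the asker's index.html.

```shell
# Hypothetical sample input, made up for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>No link here.</p>
<a href="https://example.com/file.tar.gz">Download</a>
EOF

# Stage 1: keep only lines that mention href.
# Stage 2: put every space-separated token on its own line.
# Stage 3: keep only the tokens that start with href.
# Stage 4: print the second '='-delimited field of each token.
result=$(grep "href" "$sample" | tr ' ' '\n' | grep "^href" | cut -f2 -d'=')
printf '%s\n' "$result"

rm -f "$sample"
```

On this input the result still carries the surrounding double quotes and the markup that followed the attribute in the same token, so some further trimming would be needed before the value is usable as a URL.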

answered 2 hours ago

S. Nixon


I am actually looking for what's after the href, so I'm looking for href="[I want this content]"

– Brydon Gibson
2 hours ago


You should use grep to find text in a file; sed is better suited to text substitutions.

If you want to list the hypertext links, you can simply grep the file like this:

grep -Po '(?<=href=")[^"]*' index.html
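
As a quick check of how this behaves, here is a sketch on a made-up file (the contents are an assumption, not the asker's index.html). It assumes GNU grep built with PCRE support: -o prints only the matched text, and -P is what allows the (?<=...) lookbehind, so the href=" prefix must be present but is not part of the output.

```shell
# Hypothetical sample input, made up for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="https://example.com/a.zip" class="dl">A</a> <a href="/files/b.zip">B</a>
<p>No link on this line.</p>
EOF

# Each match is printed on its own line, even when two links share a line.
links=$(grep -Po '(?<=href=")[^"]*' "$sample")
printf '%s\n' "$links"

rm -f "$sample"
```

Because [^"]* stops at the first closing quote, each value is returned without the quotes or any trailing attributes.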

answered 2 hours ago

cmak.fr


What you've described sounds like the normal behaviour of sed when used with the substitute command. I suppose you are looking for something like this:

sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html

Where:

/ is used as the delimiter in this case (you can use | or #, etc.).

The option -n (--quiet, --silent) suppresses automatic printing of the pattern space, so along with this option we should use some additional command(s) to tell sed what to print.

This additional command is the print command p, added to the end of the script. If sed isn't started with the -n option, the p command will print each line on which the substitution succeeded a second time.

The option -r enables extended regular expressions. Without this option the parentheses of the capture group must be escaped, so our command becomes:
```
sed -n 's/^.*href="\(http.*\)".*$/\1/p' index.html
```

The command s means substitute: s#<string-or-regexp>#<replacement>#.

^ matches the beginning of the line. $ matches the end of the line.

Within the <replacement>, the capture group (http.*) is referenced as \1.
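
The interplay between -n and p can be checked by counting output lines. The following is a minimal sketch, assuming GNU sed and a made-up three-line file:

```shell
# Hypothetical sample input, made up for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<h1>Heading</h1>
<a href="https://example.com/">Example</a>
<p>Paragraph</p>
EOF

# Without -n, sed prints every line, whether or not the substitution matched.
all=$(sed -r 's/^.*href="(http.*)".*$/\1/' "$sample" | wc -l)

# With -n and the p flag, only lines where the substitution succeeded are printed.
matched=$(sed -nr 's/^.*href="(http.*)".*$/\1/p' "$sample" | wc -l)

# With the p flag but without -n, the matching line is printed twice.
doubled=$(sed -r 's/^.*href="(http.*)".*$/\1/p' "$sample" | wc -l)

echo "$all" "$matched" "$doubled"

rm -f "$sample"
```

The three counts should come out as 3, 1 and 4: the first case is what the question describes, where the whole file is dumped even though only one line contains href.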

Example of usage:

$ cat index.html 
<!DOCTYPE html>
<html><head><title>Page Title</title></head><body>
 <h1>My First Heading</h1>
 <p>My first paragraph.</p>
 <a href="https://www.w3schools.com">Visit W3Schools</a>
</body></html>

$ sed -nr 's/^.*href="(http.*)".*$/\1/p' index.html
https://www.w3schools.com
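
One caveat worth checking before relying on this pattern: .* is greedy, so if anything containing a double quote follows the URL on the same line, the capture group may swallow it. The sketch below, assuming GNU sed and a made-up input line, compares the pattern above with a variant that uses [^"]* instead. The variant is a suggested alternative, not part of the original command.

```shell
# Hypothetical input line with a second quoted attribute after href.
line='<a href="https://example.com/a.zip" class="dl">A</a>'

# Pattern from above: (http.*) runs on to the last double quote of the line.
greedy=$(printf '%s\n' "$line" | sed -nr 's/^.*href="(http.*)".*$/\1/p')

# Variant: [^"]* stops at the first closing quote after href=".
strict=$(printf '%s\n' "$line" | sed -nr 's/^.*href="(http[^"]*)".*$/\1/p')

printf 'greedy: %s\nstrict: %s\n' "$greedy" "$strict"
```

Here the greedy form returns the URL plus the start of the class attribute, while the variant returns only the URL.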

More examples:

Extract the content from a file between two match patterns (Extract only HTML from a file)

Awk commands find and replace string and print every thing after key word

Converting numbers in a CSV file to their corresponding URLs

edited 1 hour ago

answered 2 hours ago

pa4080
