Print everything between two patterns, then delete first and last line of the resulting output

Question

otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata

The resulting output should just be:

one
two
three
four

This looked like a job for sed to me:

sed -n '/start_data/,/end_data/{1d;$d;p}' myfile

This did not work: the first line was deleted, but not the last one, for no reason I could explain so far.

OK, so let's try the ugly way:

sed -n '/start_data/,/end_data/{/start_data\|end_data/!p}' myfile

Fair enough, this works. But I'd like to make the shorter method work as well, as the resulting output will always contain the two patterns on first and last line, since we're only extracting the data in between.

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

Seriously. Define "between". Define "print, then delete". What do you mean, "... the resulting output will always contain the two patterns on first and last line ..."? Do you want start_data and end_data in your output or don't you? — G-Man Says 'Reinstate Monica', Jun 18 '15 at 05:44
@G-Man "Resulting output" refers to the output between the two pattern matches A and B (both of which I want to exclude). And from plain logic, what you get out will always have pattern A as first line and pattern B as the last, so a simple sed statement that does a d on first and last line will do. — syntaxerror, Jun 18 '15 at 08:47
@Cyrus Oops!! Good catch. Totally forgot to specify my output... — syntaxerror, Jun 18 '15 at 08:52
@StéphaneChazelas Thanks for that FAQ site, but -- that addresses a different problem, at least telling from the output (Contents of input.fil - Output of sed script). It will also print the lines preceding first pattern and following second pattern. Hence, that's not entirely the same... — syntaxerror, Jun 18 '15 at 09:29
That FAQ is about operating on a range in general while excluding the boundaries. In that example they prepend >> to each line, but of course you can use -n and print them instead, or perform whatever operation you want on the range. — Stéphane Chazelas, Jun 18 '15 at 09:54

score 4 · Accepted Answer · edited Jun 18 '15 at 09:59

You can reverse the logic:

sed '1,/start_data/d;/end_data/,$d'

That assumes start_data is not on the first line. To work around that, if you have GNU sed, you can make it instead:

sed '0,/start_data/d;/end_data/Q'

Both the 0 address and the Q command are GNU-specific. Q quits sed without printing the pattern space, which also makes this version more efficient: unlike the first solution, it does not keep reading and discarding the rest of the file.
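To see when the portable form comes up empty, here is a small sketch that runs it on the question's sample and on a variant where start_data is the very first line. The variable names (sample, normal, edge) are made up for illustration, and only POSIX sed features are used.

```shell
# Sample input from the question
sample='otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata'

# Normal case: start_data is preceded by other lines
normal=$(printf '%s\n' "$sample" | sed '1,/start_data/d;/end_data/,$d')

# Edge case: drop the two leading lines so that start_data is line 1.
# The range 1,/start_data/ then only looks for start_data from line 2 on,
# never finds it, and deletes everything up to the end of the input.
edge=$(printf '%s\n' "$sample" | sed '1,2d' | sed '1,/start_data/d;/end_data/,$d')

printf 'normal case:\n%s\n' "$normal"
printf 'start_data on line 1: [%s]\n' "$edge"
```

The first command prints the four data lines; the second prints an empty pair of brackets, which is the case the GNU 0 address works around.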

answered Jun 18 '15 at 04:37 by jimmij · edited Jun 18 '15 at 09:59 by Stéphane Chazelas

Though this solution is very elegant, it gives a null string. :(( Have you actually tested your line? – syntaxerror Jun 18 '15 at 08:43
@syntaxerror works fine on your example with GNU sed version 4.2.1. – jimmij Jun 18 '15 at 09:26
@syntaxerror. It certainly works on the sample you provided. It would only give an empty output if start_data was not found in the input or it was only found on the first line (with GNU sed you can replace 1 with 0 to work around that), or if there's end_data on the next line after the first one containing start_data. – Stéphane Chazelas Jun 18 '15 at 09:27
YES! This works. Thank you very much. I do have GNU sed here and the 1 must be replaced by 0. So what you're saying is, for the 1 case, pattern #1 must be preceded by some other data, or sed will fail. Of course, these things may happen :) Gladly the 0 variant will also work in both cases, just tried with one of my very large lists. – syntaxerror Jun 18 '15 at 09:36
Congrats, you've just gained a 50 percent clarity boost with the update of your answer. :) Looks near-perfect now, well done. (Those freaking GNU-isms every time, grrrr. :-@) – syntaxerror Jun 18 '15 at 14:31

score 3 · Answer 2 · edited Jun 18 '15 at 01:22

awk seems to be a good fit for this problem:

$ awk '/end_data/{f=0;};f{print;};/start_data/{f=1;}' myfile
one
two
three
four

The above uses the flag f to decide whether a line should be printed. When start_data is found, the flag is set to true (1). When end_data is found, the flag is set to false (0). When f is true, the line is printed.
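The order of the three rules is what keeps the boundary lines out of the output. As a rough illustration, assuming the sample input from the question (the variable names are only for this sketch), compare the order used above with the more "natural" start/print/end order:

```shell
# Sample input from the question
sample='otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata'

# Order used in the answer: clear the flag first, print, then set the flag.
# The start_data line is seen while f is still 0, and the end_data line
# resets f before the print rule runs, so neither boundary is printed.
answer_order=$(printf '%s\n' "$sample" |
  awk '/end_data/{f=0}; f{print}; /start_data/{f=1}')

# "Natural" order: set the flag, print, then clear it.
# Both boundary lines are printed along with the data.
natural_order=$(printf '%s\n' "$sample" |
  awk '/start_data/{f=1}; f{print}; /end_data/{f=0}')

printf 'answer order:\n%s\n\n' "$answer_order"
printf 'natural order:\n%s\n' "$natural_order"
```

The first variant prints one through four; the second also prints the start_data and end_data lines.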

Why does sed choke at the attempt of combining the 1d and $d statements in curly braces?

It is not "choking." It is just that 1d and $d refer to the first and last lines of the input file, not the first and last lines of the matched range.
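A quick way to check this, assuming the sample input from the question, is to run the original command on the full sample and on a trimmed copy in which start_data really is line 1 and end_data really is the last line. The variable names and the trimming step are only part of this sketch; the closing brace is preceded by a semicolon for portability.

```shell
# Sample input from the question
sample='otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata'

# Full sample: start_data is line 3 and end_data is line 8, so neither
# 1d nor $d ever applies to a line inside the range.
full=$(printf '%s\n' "$sample" |
  sed -n '/start_data/,/end_data/{1d;$d;p;}')

# Trimmed copy (lines 3-8 only): the range now starts on line 1 and ends
# on the last line, so 1d and $d remove exactly the two boundary lines.
trimmed=$(printf '%s\n' "$sample" | sed -n '3,8p' |
  sed -n '/start_data/,/end_data/{1d;$d;p;}')

printf 'full sample:\n%s\n\n' "$full"
printf 'trimmed sample:\n%s\n' "$trimmed"
```

On the full sample both boundary lines survive; on the trimmed copy only one through four are printed.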

answered Jun 18 '15 at 00:49 by John1024 · edited Jun 18 '15 at 01:22 by cuonglm

Yes, that's what I had assumed! That is, when I match between two patterns, that the 1d and $d will refer to the resulting output after I "filtered" the data by restricting the content between start_data and end_data, not the source file as-is. – syntaxerror Jun 18 '15 at 00:53

mikeserv · Answer 3 · 2015-06-19T00:06:13.300

3

Well, this works:

sed -ne/start_data/!d\;:n -e'n;/end_data/q;p;bn' <in

It doesn't even attempt to print until it encounters /start_data/. From that line on through to the last line, it replaces the current line with the next one, quits entirely if the line just pulled in matches /end_data/, or else prints it. And that's all. The output is, given your sample data:

one
two
three
four

It won't treat a line as an end_data match if that line is also the first start_data line in the input.
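For readability, the same commands can be written one per line. This is only a spelled-out sketch of the one-liner above, fed with the sample input from the question; the variable names are made up for the example.

```shell
# Sample input from the question
sample='otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata'

# The script below, command by command:
#   /start_data/!d  discard every line until one matches start_data
#   :n              label marking the top of the loop
#   n               replace the pattern space with the next input line
#   /end_data/q     stop reading as soon as that line matches end_data
#   p               otherwise print it
#   bn              and branch back to the label
between=$(printf '%s\n' "$sample" | sed -n '
/start_data/!d
:n
n
/end_data/q
p
bn
')

printf '%s\n' "$between"
```

Because n runs right after the start_data line is found, that line is replaced before any test for end_data happens, which is why it is never checked against /end_data/.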

answered Jun 18 '15 at 04:52 by mikeserv · edited Jun 19 '15 at 00:06

@StéphaneChazelas Right, just noticed that by trial-and-error. :) – syntaxerror Jun 18 '15 at 11:38
Great update, Mike. Good work. – syntaxerror Jun 18 '15 at 23:55
@syntaxerror - I really couldn't follow what you said before, but I read it right after I woke up, so, in fairness, I don't think I was all there when I did. – mikeserv Jun 19 '15 at 00:03
Heh, never do that. A cup of coffee always works wonders (at least for me). :P – syntaxerror Jun 19 '15 at 00:48
Thanks. :) Upvote done now, since you've begged for it. See, you got 20k rep, so my upvote is just like a water drop in a big sea. You can't be serious to require my upvote to be happy. ;-) With 20k, you've achieved everything imaginable that can be achieved on this site. I would accept tens of downvotes per week should I ever get to enter this rep zone...I'd just not care anymore. – syntaxerror Jun 19 '15 at 08:33
@syntaxerror - no, i don't care - you can vote however you want - i just thought it was really weird. that's never happened before. especially because the comment's a lot harder to write. I was confused, mostly. – mikeserv Jun 19 '15 at 08:35

score 1 · Answer 4 · answered Jun 18 '15 at 01:46

You have an answer to your question already; I'll throw in another way of doing this using Perl.

< inputfile perl -0777 -pe 's/^(.*\n)*?start_data.*\n((.*\n)*?)end_data(.*\n)*/$2/'

-0777: slurps the whole file at once instead of one line at a time
-p: places a while (<>) {[...]} loop around the script and prints the processed file
-e: reads the script from the arguments

Perl command breakdown:

s: asserts to perform a substitution
/: starts the pattern
^: matches the start of the file
(.*\n)*?: matches any number of any character greedily within the current line and a newline, zero or more times lazily within the current file (i.e. it matches as few times as possible, stopping when the following pattern starts to match)
start_data.*\n: matches a start_data string, any number of any character greedily within the current line and a newline
((.*\n)*?): groups and matches any number of any character greedily within the current line and a newline, zero or more times lazily within the current file (i.e. it matches as few times as possible, stopping when the following pattern starts to match)
end_data: matches an end_data string
(.*\n)*: matches any number of any character greedily within the current line and a newline, zero or more times greedily within the current file (i.e. it matches as many times as possible)
/: stops the pattern / starts the replacement string
$2: replaces with the second captured group
/: stops the replacement string / starts the modifiers

+1 because of your great effort in explaining perl crypto-lingo to rookies ;-) — syntaxerror, Jun 18 '15 at 08:36

Scott - Слава Україні · Answer 5 · 2015-06-18T18:02:20.780

1

Here, let me make a trivial, cosmetic modification to the input file provided in the question:

% cat myfile
red
orange
start_data
one
two
three
four
end_data
yellow
green

I have simply replaced the otherdata lines with distinct other data, so we can refer to every line in the input file uniquely, by content, without having to say “the first line”, since that is apparently subject to misinterpretation, or “the first otherdata line”, which is a little verbose (and, for all I know, also maybe subject to misinterpretation).

Now, probably the closest thing you're going to find to your first attempt is

% sed -n '/start_data/,/end_data/p' myfile | sed '1d;$d'
one
two
three
four

Your first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile) "chokes" because (as John1024 said) line 1 is the red line^* and line $ is the green line^**. The 1d;$d; has no effect because those lines (along with, in fact, all of the otherdata/colordata lines) are already excluded by the /start_data/,/end_data/ range.
__________
^* i.e., the first line in the entire input file, not just the matched range
^** i.e., the last line in the entire input file, not just the matched range
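As a rough check of the two-stage pipeline, here it is run on the question's sample and on a trimmed copy where start_data is the first line and end_data the last. The variable names and the trimming step are assumptions made for this sketch.

```shell
# Sample input from the question
sample='otherdata
otherdata
start_data
one
two
three
four
end_data
otherdata
otherdata'

# The second sed sees only the extracted range, so for it line 1 is
# start_data and line $ is end_data, whatever surrounded them originally.
full=$(printf '%s\n' "$sample" |
  sed -n '/start_data/,/end_data/p' | sed '1d;$d')

# Same pipeline on lines 3-8 only (start_data first, end_data last)
trimmed=$(printf '%s\n' "$sample" | sed -n '3,8p' |
  sed -n '/start_data/,/end_data/p' | sed '1d;$d')

printf 'full sample:\n%s\n\n' "$full"
printf 'trimmed sample:\n%s\n' "$trimmed"
```

Both runs print one through four, since the line numbers in the second sed are counted against the output of the first.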

By the way, are you saying that your command produced the following output?

one
two
three
four
end_data

Because that doesn't make sense, unless start_data was line 1 (i.e., if red and orange were absent).

answered Jun 18 '15 at 11:32 by Scott - Слава Україні · edited Jun 18 '15 at 18:02

"because those lines are already excluded by the /start_data/,/end_data/ range" Nope, in fact they are NOT. sed -n '/start_data/,/end_data/p' myfile WILL print both start_data and end_data patterns, each on their own line (first/last). If you don't believe me, try it out. :) – syntaxerror Jun 18 '15 at 11:41
@syntaxerror: You're not reading what John and I are saying!!! As long as you do a single sed command (in contrast to my answer, which does sed … | sed …), line 1 is the red line and line $ is line 10 which is the green line. And the /start_data/,/end_data/ range excludes the color lines (a.k.a. the otherdata lines in your question). If you don't believe me, try sed -n '/start_data/,/end_data/{4d;p}' myfile (but first, guess what the output will be). – Scott - Слава Україні Jun 18 '15 at 11:56
To begin with, I have no idea why you're always referring to red and green...is this some allusion to sports which I don't get perhaps? ;) Or to a traffic light? Symbolism is great, but it's always hard to grasp without explaining the symbols first... P.S. Nevertheless, the downvote is not from me. – syntaxerror Jun 18 '15 at 14:33
(0) @syntaxerror: I have edited my answer to clarify the use of the spectrum. … … … … … … … … … Also, would somebody care to explain the downvote? My answer (1) solves the problem, in sed (as the OP requested), with a command that, as far as I can tell, works correctly for all reasonable variations of the input data (e.g., start_data on the first line or end_data on the last) without requiring any GNUisms, and also (2) takes another crack at answering the (explicit) question, “Why does sed choke” on the OP’s first attempt (sed -n '/start_data/,/end_data/{1d;$d;p}' myfile)? – Scott - Слава Україні Jun 18 '15 at 18:05
