Stanford CS144 lab 1
这道题目,我觉得是比较困难的,如果你没有进行一些系统的刷题经验,那么这道题目可能会让你有点吃力。我两个月前第一次做是做了整整7天(包括debug),虽然并不是一直耗在这道题目上,但还是好了很久的,甚至最后过了其其他测试,speedtest还是过不了。这两天重新pickup了,只花了一整天,两三个小时写,三四个小时debug就顺利过了。
首先,你需要了解字节流重排器的概念。字节流重排器的作用是将一个字节流按照一定的规则重新排列,使得字节流中的数据按照一定的顺序排列。它要求把失序乱序、重叠重复、冗余数据的数据给变成合法的数据流。
如果你刷过一些关于Intervals的题目,这道题目你花个最多一天应该就能搞定。如果没有比较系统性的解决方案,而是去分类讨论硬做,其实是事倍功半的。
前置刷题推荐
这些经典的Interval问题,可以做做,对这个lab有帮助。
这类题基本套路就是排序,然后遍历合并。
我的解法
- reassembler.hh
#pragma once
#include "byte_stream.hh"#include <iostream>#include <set>#include <string>#include <algorithm>#include <cmath>
struct Interval { uint64_t start; uint64_t end; std::string data;
bool operator<(const Interval& other) const { if (start == other.start) { return end < other.end; } return start < other.start; }};
class Reassembler{public: // Construct Reassembler to write into given ByteStream. explicit Reassembler( ByteStream&& output ) : output_( std::move( output ) ) {}
void insert( uint64_t first_index, std::string data, bool is_last_substring );
// How many bytes are stored in the Reassembler itself? uint64_t bytes_pending() const;
// Access output stream reader Reader& reader() { return output_.reader(); } const Reader& reader() const { return output_.reader(); }
// Access output stream writer, but const-only (can't write from outside) const Writer& writer() const { return output_.writer(); }
private: ByteStream output_; // the Reassembler writes to this ByteStream std::set<Interval> buf_ {}; uint64_t nxt_expected_idx_ = 0; uint64_t eof_idx_ = UINT64_MAX;};
- reassembler.cc
#include "reassembler.hh"
using namespace std;
void Reassembler::insert( uint64_t first_index, string data, bool is_last_substring ){ uint64_t wd_start = nxt_expected_idx_; uint64_t wd_end = wd_start + output_.writer().available_capacity(); uint64_t cur_start = first_index; uint64_t cur_end = cur_start + data.size();
// set the eof index of this reassembling if (is_last_substring) { eof_idx_ = cur_end; }
if (cur_start >= wd_end) { return; }
uint64_t start_idx = max(wd_start, cur_start); uint64_t end_idx = min(wd_end, cur_end); if (start_idx >= end_idx) { if (nxt_expected_idx_ == eof_idx_) { output_.writer().close(); } return; } uint64_t len = end_idx - start_idx;
// insert the current data buf_.insert({start_idx, end_idx, data.substr(start_idx - first_index, len)});
// handle the overlapping of intervals std::vector<Interval> merged; auto it = buf_.begin(); Interval last = *it; it ++;
while (it != buf_.end()) { if (it->start <= last.end) { if (last.end < it->end) { last.end = it->end; last.data = last.data.substr(0, it->start - last.start) + it->data; } } else { merged.push_back(last); last = *it; } it ++; } merged.push_back(last);
buf_.clear(); for (const auto& interval : merged) { buf_.insert(interval); }
// push when it ready it = buf_.begin(); while (it->start == nxt_expected_idx_) { output_.writer().push(it->data); nxt_expected_idx_ = it->end; it = buf_.erase(it); }
// close when all bytes are pushed if (nxt_expected_idx_ == eof_idx_) { output_.writer().close(); }}
uint64_t Reassembler::bytes_pending() const{ uint64_t pendcnt = 0; for (const auto& interval : buf_) { pendcnt += interval.end - interval.start; } return pendcnt;}