New bulk import api failed during test with agitation #1260
Comments
I wrote a test that continually creates a table with splits and scans its metadata while another thread adds splits. I found one case where the LinkingIterator misses the first tablet if a split is added at the beginning of the table just as the scan starts. I am not sure whether this is the cause of the problem seen here, though.
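I wrote the following test to stress the LinkingIterator while researching this issue. It continually scans the metadata table while a table is splitting, looking for cases where the LinkingIterator does not handle a split properly. It did find one issue. I am not sure what to do with the test, though, as it just runs for a long time.

// Imports assume the Accumulo 2.0 package layout.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.admin.NewTableConfiguration;
import org.apache.accumulo.core.data.TableId;
import org.apache.accumulo.core.dataImpl.KeyExtent;
import org.apache.accumulo.core.metadata.schema.TabletMetadata;
import org.apache.accumulo.core.metadata.schema.TabletsMetadata;
import org.apache.accumulo.harness.AccumuloClusterHarness;
import org.apache.hadoop.io.Text;
import org.junit.Assert;
import org.junit.Test;

public class TmpIT extends AccumuloClusterHarness {

  @Override
  public int defaultTimeoutSeconds() {
    return 30;
  }

  @Test
  public void test() throws Exception {
    for (int i = 0; i < 1000; i++) {
      testScanWhileSplitting();
      System.out.println("I " + i);
    }
  }

  private void testScanWhileSplitting() throws AccumuloSecurityException, AccumuloException,
      TableExistsException, TableNotFoundException {
    Random rand = new Random();

    // Create the table with 10 to 1999 random, sorted split points.
    List<Text> initialSplits = new ArrayList<>();
    int initSize = rand.nextInt(1990) + 10;
    for (int i = 0; i < initSize; i++) {
      initialSplits.add(new Text(String.format("%016x", rand.nextLong())));
    }
    Collections.sort(initialSplits);

    String table = getUniqueNames(1)[0];

    try (AccumuloClient c = Accumulo.newClient().from(getClientProps()).build()) {
      ExecutorService es = Executors.newFixedThreadPool(1);

      NewTableConfiguration ntc = new NewTableConfiguration();
      ntc.withSplits(new TreeSet<>(initialSplits));
      c.tableOperations().create(table, ntc);

      String id = c.tableOperations().tableIdMap().get(table);

      // Add more random splits in a background thread while the loop below scans.
      Future<Void> splitFuture = es.submit(() -> {
        List<Text> nextSplits = new ArrayList<>();
        int nextsize = rand.nextInt(1990) + 10;
        for (int i = 0; i < nextsize; i++) {
          nextSplits.add(new Text(String.format("%016x", rand.nextLong())));
        }
        c.tableOperations().addSplits(table, new TreeSet<Text>(nextSplits));
        return null;
      });

      do {
        // TODO in LoadFiles following called overlapping() while building
        Iterator<TabletMetadata> tabletIter = TabletsMetadata.builder().forTable(TableId.of(id))
            .checkConsistency().fetchPrev().fetchLocation().fetchLoaded().build(c).iterator();

        ArrayDeque<Text> splits = new ArrayDeque<Text>(initialSplits);
        Text prev = null;

        // Every initial split must show up as a tablet end row, and each tablet's
        // prev end row must equal the end row of the tablet before it.
        while (tabletIter.hasNext() && !splits.isEmpty()) {
          Text split = splits.removeFirst();
          KeyExtent extent = tabletIter.next().getExtent();
          Assert.assertEquals(prev, extent.getPrevEndRow());

          // Skip over tablets created by the concurrent splits.
          while (extent.getEndRow().compareTo(split) < 0) {
            prev = extent.getEndRow();
            extent = tabletIter.next().getExtent();
            Assert.assertEquals(prev, extent.getPrevEndRow());
          }

          prev = extent.getEndRow();
          Assert.assertEquals(split, extent.getEndRow());
        }

        // Check the remaining tablets; a null end row must belong to the final tablet.
        while (tabletIter.hasNext()) {
          KeyExtent extent = tabletIter.next().getExtent();
          Assert.assertEquals(prev, extent.getPrevEndRow());
          prev = extent.getEndRow();
          if (extent.getEndRow() == null) {
            Assert.assertFalse(tabletIter.hasNext());
          }
        }

        Assert.assertTrue(splits.isEmpty());
        // System.out.println("Finished a pass");
      } while (!splitFuture.isDone());

      es.shutdown();
      c.tableOperations().delete(table);
    }
  }
}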
The LinkingIterator checks the structure of the Accumulo metadata table as it scans. It was not properly handling a split of the first tablet. It was also not ensuring that the last (default) tablet for a table was seen. This patch fixes those two issues. They were discovered while researching #1260; however, I am not sure whether they could have caused #1260.
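As a rough illustration of the two conditions described above, here is a minimal, self-contained sketch of a linking check over a list of tablet extents. It is not the actual LinkingIterator code: `Extent` and `LinkCheck` are hypothetical names introduced only for this example, and rows are plain strings rather than `Text`. The sketch assumes the first tablet must have a null previous end row, each tablet's previous end row must equal the preceding tablet's end row, and the scan must end on the default tablet, whose end row is null.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a tablet extent; null means unbounded on that side. */
final class Extent {
  final String prevEndRow;
  final String endRow;

  Extent(String prevEndRow, String endRow) {
    this.prevEndRow = prevEndRow;
    this.endRow = endRow;
  }
}

/** Hypothetical checker for illustration only. */
final class LinkCheck {

  /** Returns a description of the first problem found, or null if the tablets link up. */
  static String firstProblem(List<Extent> tablets) {
    if (tablets.isEmpty()) {
      return "no tablets seen";
    }
    String prev = null; // the first tablet must have a null prevEndRow
    for (int i = 0; i < tablets.size(); i++) {
      Extent e = tablets.get(i);
      if (!Objects.equals(prev, e.prevEndRow)) {
        return "tablet " + i + " does not link: expected prevEndRow " + prev
            + " but saw " + e.prevEndRow;
      }
      if (e.endRow == null && i < tablets.size() - 1) {
        return "tablet " + i + " has a null end row but is not the last tablet";
      }
      prev = e.endRow;
    }
    if (prev != null) {
      return "default tablet (null end row) was never seen";
    }
    return null;
  }
}
```

A scan that starts at a tablet whose previous end row is non-null (for example, because the first tablet split just as the scan began) fails the first condition, and a scan that stops before the tablet with a null end row fails the last one.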
I suspect that splitting could have been a cause of this issue. Yesterday I ran a test importing 100 directories into a table with a small split threshold (32M), which caused many splits while the imports were running, and I did not see any problems. I am currently running a test that imports 1000 directories with agitation, incorporating some of the recent changes. I opened apache/accumulo-testing#94 because generating the 1000 directories for import takes quite a while.
I have done a lot of testing since seeing this and have not been able to reproduce it so far. I am going to close this for now.
While running bulk import with agitation on a small EC2 cluster, I noticed a blip marker was not going away.
I looked in the master logs for the FATE transaction ID and found that it failed with the following error message.